Dear LLM Developers,
LLM hallucination is a significant issue that can jeopardize businesses and diminish users' trust in AI. Punya is thrilled to announce a solution: the LLM Hallucination Detection API. It helps you detect hallucinations in your application's responses so you can safeguard its quality.
The Problem
Hallucination in AI responses can lead to misinformation, irrelevance, and potential harm. Existing mitigations such as prompt engineering and temperature settings are non-deterministic, so they cannot guarantee a correct answer. Currently, there's no way to obtain a quantifiable measure of potential inaccuracy or irrelevance in a response to a user's question or prompt.
Introducing the Solution:
Punya.AI introduces a ready-to-use LLM Hallucination Detection API that works seamlessly across various LLM service providers, including OpenAI, Anthropic, Azure, and Hugging Face models.
What the API can detect:
- Overall Assessment of LLM Correctness: The API categorizes LLM responses into three groups: “Correct”, “Partially Correct”, and “Incorrect”. Use the “Correct” signal to maintain the quality of your AI and the “Incorrect” signal to identify instances when your AI may provide inaccurate answers.
- Detailed Scoring: We provide an LLM correctness score from 0 to 10. This lets you differentiate levels of accuracy, from the most inaccurate LLM responses to the slightly inaccurate ones, so you can prioritize accordingly.
- Providing Explanations: Our API also offers explanations for each score, making it easier for human evaluators to comprehend the detected inaccuracy and associated risks.
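To make the three signals concrete, here is a minimal sketch of how you might turn detection results into a review queue for human evaluators. The `Detection` class and `review_queue` function are hypothetical names for illustration, not part of the API. The sketch also assumes that the category arrives as a plain string and that a lower score means a less accurate response.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detection result, holding the three signals described above."""
    user_message: str
    result: str        # assumed values: "correct", "partially correct", "incorrect"
    score: float       # correctness score returned by the API
    explanation: str   # human-readable reasoning for the score


def review_queue(detections):
    """Return the responses that need human review, most inaccurate first.

    Assumption: a lower score means a less accurate response.
    """
    flagged = [d for d in detections if d.result.lower() != "correct"]
    return sorted(flagged, key=lambda d: d.score)
```

Anything categorized as "correct" is left out of the queue, and the remaining items are ordered so that evaluators see the lowest-scoring responses first.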
Example Results
User Message | Bot Response | API Detection Result | Score | Explanation |
---|---|---|---|---|
What type of mammal lays the biggest eggs? | Ostrich lays the biggest eggs. | incorrect | 0.8 | The bot's response is mostly correct. Ostriches do lay the largest eggs among all bird species, but not among mammals. The bot could have provided more accurate information by mentioning that the platypus and echidna are monotremes, a unique group of mammals that lay eggs. |
Who is the sole survivor of the Titanic? | The sole survivor of the Titanic is Jack Dawson. | incorrect | 0.5 | The output is incorrect. Jack Dawson is a fictional character from the movie 'Titanic' directed by James Cameron. The last living survivor of the Titanic was Millvina Dean. The output is incorrect. Jack Dawson is a fictional character from the movie 'Titanic' and not a real-life survivor. The last survivor of the Titanic was Millvina Dean. |
Name three founders of Finxter | The three founders of Finxter are Christian Mayer, Christopher Wagner, and Martin Dörre. | incorrect | 0.3 | The output is incorrect. The founders of Finxter are not Christian Mayer, Christopher Wagner, and Martin Dörre. The actual founders are not publicly disclosed. The output is incorrect. The founders of Finxter are not Christian Mayer, Christopher Wagner, and Martin Dörre. The actual founders are not known. |
Live demo
See a live demo of how our Hallucination Detection API can verify any prompt you provide.
Getting Started - Integrate with your apps
Call the Hallucination Detection API programmatically from your machines. Getting started is free, with no credit card required.
1. Sign up for a Punya account.
2. Create an LLM App Analytics and obtain your API key.
3. Call the REST API and receive the results.
   - Replace <YOUR_API_KEY> with the API key you received from Step 2 above.
   - Replace the user's message and bot response with the ones you'd like to evaluate for hallucination.
Correctness Testing API Call:
curl --location 'https://api.punya.ai/v1/correctness/test' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --data '{
    "user_message": "What type of mammal lays the biggest eggs?",
    "bot_response": "Ostrich lays the biggest eggs.",
    "testing": true
  }'
Output:
{
  "result": "incorrect",
  "score": 0.8,
  "reasoning": "The bot's response is mostly correct. Ostriches do lay the largest eggs among all bird species, but not among mammals. The bot could have provided more accurate information by mentioning that the platypus and echidna are monotremes, a unique group of mammals that lay eggs."
}
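If you'd rather call the API from application code than from curl, the same request can be sketched in Python using only the standard library. The endpoint, headers, and field names are taken from the example above. The function names (`build_request`, `parse_result`, `check_response`) and the 30-second timeout are our own assumptions for illustration, and the sketch assumes the response always has the `result`, `score`, and `reasoning` fields shown in the sample output.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.punya.ai/v1/correctness/test"


def build_request(api_key, user_message, bot_response, testing=True):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "user_message": user_message,
        "bot_response": bot_response,
        "testing": testing,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_result(body):
    """Parse a response body shaped like the sample output above."""
    data = json.loads(body)
    return data["result"], float(data["score"]), data["reasoning"]


def check_response(api_key, user_message, bot_response):
    """Send the request and return (result, score, reasoning)."""
    request = build_request(api_key, user_message, bot_response)
    # The timeout value is an arbitrary choice for this sketch.
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_result(response.read().decode("utf-8"))
```

Calling `check_response("<YOUR_API_KEY>", "What type of mammal lays the biggest eggs?", "Ostrich lays the biggest eggs.")` would send the same payload as the curl command. Keeping request building and response parsing in separate functions lets you unit test both without network access.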
Join Our Community
Stay updated with our latest developments, send feature requests, and provide feedback by joining our Discord community.
Need Support?
Your success is our mission. If you have any questions or need assistance, don't hesitate to contact us at steve@punya.ai.
Thank you for being a part of our journey. We're excited to see the innovative applications you'll build using our LLM Hallucination Detection API.
About Steve Norman
Revolutionize your AI applications with Punya: an AI-powered chatbot and analytics platform. For business inquiries, please email admin@punya.ai.