Open-Source LLM Testing Tools - AI Tools

BenchLLM is a tool for evaluating LLM-powered applications. It allows users to build test suites, generate quality reports, and choose between automated, interactive, or custom evaluation strategies.
- Other
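The automated-evaluation idea behind tools like BenchLLM can be sketched in plain Python: a test suite pairs inputs with expected answers, and a scoring function decides pass/fail and rolls results into a report. This is an illustrative stdlib-only sketch, not BenchLLM's actual API; `predict` stands in for whatever LLM call the application under test makes.

```python
from difflib import SequenceMatcher

def similar(output: str, expected: str, threshold: float = 0.8) -> bool:
    """Crude string-similarity check standing in for an LLM-based judge."""
    return SequenceMatcher(None, output.lower(), expected.lower()).ratio() >= threshold

def run_suite(cases, predict):
    """Run every test case through `predict` and return a simple quality report."""
    results = []
    for case in cases:
        ok = similar(predict(case["input"]), case["expected"])
        results.append({"input": case["input"], "passed": ok})
    passed = sum(r["passed"] for r in results)
    return {"passed": passed, "total": len(results), "results": results}

# Example suite against a fake deterministic "model" (canned answers).
suite = [
    {"input": "capital of France?", "expected": "Paris"},
    {"input": "2 + 2?", "expected": "4"},
]
canned = {"capital of France?": "paris", "2 + 2?": "4"}
report = run_suite(suite, lambda prompt: canned[prompt])
print(report["passed"], "/", report["total"])  # 2 / 2
```

A real harness would swap `similar` for a semantic or LLM-judge comparison, which is the kind of evaluation strategy these platforms automate.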

PromptsLabs is a community-driven platform providing copy-paste prompts to test the performance of new LLMs. Explore and contribute to a growing collection of prompts.
- Free

Gentrace is an LLM evaluation platform designed for AI teams to test and automate evaluations of generative AI products and agents. It facilitates collaborative development and ensures high-quality LLM applications.
- Usage Based

promptfoo is an open-source LLM testing tool designed to help developers secure and evaluate their language model applications, offering features like vulnerability scanning and continuous monitoring.
- Freemium
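promptfoo is driven by a declarative config file. A minimal `promptfooconfig.yaml` sketch, based on promptfoo's documented format (the model id and test values here are placeholders, not recommendations):

```yaml
# promptfooconfig.yaml — minimal sketch; model id and test data are placeholders
description: Translation smoke tests
prompts:
  - "Translate the following to French: {{text}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      text: Hello, world
    assert:
      - type: icontains
        value: bonjour
```

Running `npx promptfoo@latest eval` executes the test matrix and reports pass/fail per prompt-provider pair.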

Libretto offers comprehensive LLM monitoring, automated prompt testing, and optimization tools to ensure the reliability and performance of your AI applications.
- Freemium
- From 180$

Phoenix accelerates AI development with powerful insights, allowing seamless evaluation, experimentation, and optimization of AI applications in real time.
- Freemium

Autoblocks is a collaborative testing and evaluation platform for LLM-based products that automatically improves through user and expert feedback, offering comprehensive tools for monitoring, debugging, and quality assurance.
- Freemium
- From 1750$

Laminar is an open-source platform that enables developers to trace, evaluate, label, and analyze Large Language Model (LLM) applications with minimal code integration.
- Freemium
- From 25$

Langtail is a comprehensive testing platform that enables teams to test and debug LLM-powered applications with a spreadsheet-like interface, offering security features and integration with major LLM providers.
- Freemium
- From 99$

GPT-LLM Playground is a macOS application designed for advanced experimentation and testing with Large Language Models (LLMs). It offers features like multi-model support, versioning, and custom endpoints.
- Free

ModelBench enables teams to rapidly deploy AI solutions with no-code LLM evaluations. It allows users to compare over 180 models, design and benchmark prompts, and trace LLM runs, accelerating AI development.
- Free Trial
- From 49$
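The comparison workflow ModelBench describes, running one set of prompts against many models and tabulating results, can be sketched with stub providers in plain Python. This is illustrative only; the provider names are invented and real clients would be API calls.

```python
def benchmark(cases, providers, grade):
    """Score every provider on every case; return per-model pass rates."""
    scores = {}
    for name, provider in providers.items():
        passed = sum(grade(provider(c["input"]), c["expected"]) for c in cases)
        scores[name] = passed / len(cases)
    return scores

cases = [
    {"input": "ping", "expected": "PING"},
    {"input": "ok", "expected": "OK"},
]
# Stub "models" standing in for real API clients (names are invented).
providers = {
    "upper-model": lambda p: p.upper(),  # always matches the expected casing
    "echo-model": lambda p: p,           # never matches the expected casing
}
scores = benchmark(cases, providers, lambda out, exp: out == exp)
print(scores)  # {'upper-model': 1.0, 'echo-model': 0.0}
```

Benchmarking platforms add the hard parts on top of this loop: provider integrations, rate limiting, tracing, and better graders than exact match.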

Langfuse provides an open-source platform for tracing, evaluating, and managing prompts to debug and improve LLM applications.
- Freemium
- From 59$

Ottic empowers technical and non-technical teams to test LLM applications, ensuring faster product development and enhanced reliability. Streamline your QA process and gain full visibility into your LLM application's behavior.
- Contact for Pricing

RoostGPT is an AI-powered testing co-pilot that automates test case generation, targeting complete test coverage while detecting static vulnerabilities. It leverages Large Language Models to enhance software development efficiency and reliability.
- Paid
- From 25000$

EleutherAI is a research institute focused on advancing and democratizing open-source AI, particularly in language modeling, interpretability, and alignment. They train, release, and evaluate powerful open-source LLMs.
- Free

GeneratorLLMs is a tool that creates standardized `llms.txt` files by extracting core website content. This improves how Large Language Models (LLMs) understand websites and enhances AI visibility.
- Free
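The `llms.txt` proposal (llmstxt.org) that GeneratorLLMs implements specifies a markdown file served at `/llms.txt`: an H1 title, a blockquote summary, and H2 sections listing links with short descriptions. A minimal example for a hypothetical site (all URLs are placeholders):

```markdown
# Example Docs

> Concise documentation for the Example project, curated for LLM consumption.

## Docs

- [Quickstart](https://example.com/quickstart.md): install and first run
- [API Reference](https://example.com/api.md): endpoints and parameters

## Optional

- [Changelog](https://example.com/changelog.md): release history
```

The "Optional" section marks content an LLM can skip when context is tight, per the proposal.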

Inductor enables developers to rapidly prototype, evaluate, and improve LLM applications, ensuring high-quality app delivery.
- Freemium