Open-Source LLM Testing Tools - AI Tools

BenchLLM is a tool for evaluating LLM-powered applications. It allows users to build test suites, generate quality reports, and choose between automated, interactive, or custom evaluation strategies.
- Other
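The automated-evaluation idea behind tools like BenchLLM can be sketched in plain Python: a test suite pairs inputs with expected answers, and a scoring function decides pass/fail and rolls results into a report. This is an illustrative stdlib-only sketch, not BenchLLM's actual API; `predict` stands in for whatever LLM call the application under test makes.

```python
from difflib import SequenceMatcher

def similar(output: str, expected: str, threshold: float = 0.8) -> bool:
    """Crude string-similarity check standing in for an LLM-based judge."""
    return SequenceMatcher(None, output.lower(), expected.lower()).ratio() >= threshold

def run_suite(cases, predict):
    """Run every test case through `predict` and return a simple quality report."""
    results = []
    for case in cases:
        ok = similar(predict(case["input"]), case["expected"])
        results.append({"input": case["input"], "passed": ok})
    passed = sum(r["passed"] for r in results)
    return {"passed": passed, "total": len(results), "results": results}

# Example suite against a fake deterministic "model" (canned answers).
suite = [
    {"input": "capital of France?", "expected": "Paris"},
    {"input": "2 + 2?", "expected": "4"},
]
canned = {"capital of France?": "paris", "2 + 2?": "4"}
report = run_suite(suite, lambda prompt: canned[prompt])
print(report["passed"], "/", report["total"])  # 2 / 2
```

A real harness would swap `similar` for a semantic or LLM-judge comparison, which is the kind of evaluation strategy these platforms automate.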

PromptsLabs is a community-driven platform providing copy-paste prompts to test the performance of new LLMs. Explore and contribute to a growing collection of prompts.
- Free

Gentrace is an LLM evaluation platform designed for AI teams to test and automate evaluations of generative AI products and agents. It facilitates collaborative development and ensures high-quality LLM applications.
- Usage Based

promptfoo is an open-source LLM testing tool designed to help developers secure and evaluate their language model applications, offering features like vulnerability scanning and continuous monitoring.
- Freemium
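promptfoo is driven by a declarative config file. A minimal `promptfooconfig.yaml` sketch, based on promptfoo's documented format (the model id and test values here are placeholders, not recommendations):

```yaml
# promptfooconfig.yaml — minimal sketch; model id and test data are placeholders
description: Translation smoke tests
prompts:
  - "Translate the following to French: {{text}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      text: Hello, world
    assert:
      - type: icontains
        value: bonjour
```

Running `npx promptfoo@latest eval` executes the test matrix and reports pass/fail per prompt-provider pair.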

Libretto offers comprehensive LLM monitoring, automated prompt testing, and optimization tools to ensure the reliability and performance of your AI applications.
- Freemium
- From 180$

Phoenix accelerates AI development with powerful insights, allowing seamless evaluation, experimentation, and optimization of AI applications in real time.
- Freemium

Autoblocks is a collaborative testing and evaluation platform for LLM-based products that automatically improves through user and expert feedback, offering comprehensive tools for monitoring, debugging, and quality assurance.
- Freemium
- From 1750$

Laminar is an open-source platform that enables developers to trace, evaluate, label, and analyze Large Language Model (LLM) applications with minimal code integration.
- Freemium
- From 25$

Langtail is a comprehensive testing platform that enables teams to test and debug LLM-powered applications with a spreadsheet-like interface, offering security features and integration with major LLM providers.
- Freemium
- From 99$

GPT-LLM Playground is a macOS application designed for advanced experimentation and testing with Large Language Models (LLMs). It offers features like multi-model support, versioning, and custom endpoints.
- Free

ModelBench enables teams to rapidly deploy AI solutions with no-code LLM evaluations. It allows users to compare over 180 models, design and benchmark prompts, and trace LLM runs, accelerating AI development.
- Free Trial
- From 49$
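The comparison workflow ModelBench describes, running one set of prompts against many models and tabulating results, can be sketched with stub providers in plain Python. This is illustrative only; the provider names are invented and real clients would be API calls.

```python
def benchmark(cases, providers, grade):
    """Score every provider on every case; return per-model pass rates."""
    scores = {}
    for name, provider in providers.items():
        passed = sum(grade(provider(c["input"]), c["expected"]) for c in cases)
        scores[name] = passed / len(cases)
    return scores

cases = [
    {"input": "ping", "expected": "PING"},
    {"input": "ok", "expected": "OK"},
]
# Stub "models" standing in for real API clients (names are invented).
providers = {
    "upper-model": lambda p: p.upper(),  # always matches the expected casing
    "echo-model": lambda p: p,           # never matches the expected casing
}
scores = benchmark(cases, providers, lambda out, exp: out == exp)
print(scores)  # {'upper-model': 1.0, 'echo-model': 0.0}
```

Benchmarking platforms add the hard parts on top of this loop: provider integrations, rate limiting, tracing, and better graders than exact match.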

Langfuse provides an open-source platform for tracing, evaluating, and managing prompts to debug and improve LLM applications.
- Freemium
- From 59$

Ottic empowers technical and non-technical teams to test LLM applications, ensuring faster product development and enhanced reliability. Streamline your QA process and gain full visibility into your LLM application's behavior.
- Contact for Pricing

RoostGPT is an AI-powered testing co-pilot that automates test case generation, targeting complete test coverage while detecting static vulnerabilities. It leverages Large Language Models to enhance software development efficiency and reliability.
- Paid
- From 25000$

EleutherAI is a research institute focused on advancing and democratizing open-source AI, particularly in language modeling, interpretability, and alignment. They train, release, and evaluate powerful open-source LLMs.
- Free

GeneratorLLMs is a tool that creates standardized `llms.txt` files by extracting core website content. This improves how Large Language Models (LLMs) understand websites and enhances AI visibility.
- Free
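The `llms.txt` proposal (llmstxt.org) that GeneratorLLMs implements specifies a markdown file served at `/llms.txt`: an H1 title, a blockquote summary, and H2 sections listing links with short descriptions. A minimal example for a hypothetical site (all URLs are placeholders):

```markdown
# Example Docs

> Concise documentation for the Example project, curated for LLM consumption.

## Docs

- [Quickstart](https://example.com/quickstart.md): install and first run
- [API Reference](https://example.com/api.md): endpoints and parameters

## Optional

- [Changelog](https://example.com/changelog.md): release history
```

The "Optional" section marks content an LLM can skip when context is tight, per the proposal.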

Inductor enables developers to rapidly prototype, evaluate, and improve LLM applications, ensuring high-quality app delivery.
- Freemium