Open source LLM evaluation - AI tools

BenchLLM is a tool for evaluating LLM-powered applications. It allows users to build test suites, generate quality reports, and choose between automated, interactive, or custom evaluation strategies.
- Other

PromptsLabs is a community-driven platform providing copy-paste prompts to test the performance of new LLMs. Explore and contribute to a growing collection of prompts.
- Free

Gentrace is an LLM evaluation platform designed for AI teams to test and automate evaluations of generative AI products and agents. It facilitates collaborative development and ensures high-quality LLM applications.
- Usage Based

EleutherAI is a research institute focused on advancing and democratizing open-source AI, particularly in language modeling, interpretability, and alignment. They train, release, and evaluate powerful open-source LLMs.
- Free
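
EleutherAI's evaluation work is best known through its open-source lm-evaluation-harness, which is widely used to benchmark released models. The snippet below is a minimal sketch of its Python entry point; the exact argument names and task identifiers are assumptions based on typical documented usage and may differ between harness versions.

```python
# Sketch: running one benchmark task with EleutherAI's lm-evaluation-harness.
# Assumes `pip install lm-eval` and a Hugging Face model; check the harness
# docs, as argument names may vary across versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                     # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-1.4b", # any HF checkpoint
    tasks=["hellaswag"],                            # any registered task name
    batch_size=8,
)

# Per-task metrics (accuracy etc.) are returned under results["results"].
print(results["results"]["hellaswag"])
```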

Conviction is an AI platform designed for evaluating, testing, and monitoring Large Language Models (LLMs) to help developers build reliable AI applications faster. It focuses on detecting hallucinations, optimizing prompts, and ensuring security.
- Freemium
- From $249

Laminar is an open-source platform that enables developers to trace, evaluate, label, and analyze Large Language Model (LLM) applications with minimal code integration.
- Freemium
- From $25
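
The "minimal code integration" Laminar describes typically means initializing its SDK and decorating the functions you want traced. The sketch below illustrates that pattern only; the `lmnr` package name, `Laminar.initialize`, and the `observe` decorator are assumptions based on its published SDK and should be verified against the current Laminar documentation.

```python
# Hedged sketch of Laminar-style tracing via its Python SDK.
# Assumptions: the package is `lmnr` and exposes `Laminar.initialize` plus an
# `observe` decorator; confirm against the Laminar docs before use.
import os
from lmnr import Laminar, observe

Laminar.initialize(project_api_key=os.environ["LMNR_PROJECT_API_KEY"])

@observe()  # spans for this call are sent to Laminar for tracing/evaluation
def answer(question: str) -> str:
    # Call your LLM of choice here; the decorator records inputs and outputs.
    return "stub answer to: " + question

print(answer("What does Laminar trace?"))
```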

LLM Explorer is a comprehensive platform for discovering, comparing, and accessing over 46,000 open-source Large Language Models (LLMs) and Small Language Models (SLMs).
- Free

EvalsOne is a platform for rapidly evaluating Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines using various metrics.
- Freemium
- From $19

Phoenix is a platform for evaluating, experimenting with, and optimizing AI applications in real time, surfacing the insights teams need to debug and improve LLM behavior faster.
- Freemium
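
Assuming this entry refers to the open-source Arize Phoenix package (a reasonable guess given the context, but an assumption), local usage usually starts by launching the Phoenix app and sending traces and evaluations to it. A minimal sketch:

```python
# Hedged sketch: launching a local Phoenix instance, assuming this entry refers
# to the open-source `arize-phoenix` package (`pip install arize-phoenix`).
import phoenix as px

# Starts the Phoenix UI locally; traces and evaluations sent to it appear
# in the browser at the printed URL.
session = px.launch_app()
print(session.url)
```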

ModelBench enables teams to rapidly deploy AI solutions with no-code LLM evaluations. It allows users to compare over 180 models, design and benchmark prompts, and trace LLM runs, accelerating AI development.
- Free Trial
- From $49

Hegel AI provides a developer platform for building, monitoring, and improving large language model (LLM) applications, featuring tools for experimentation, evaluation, and feedback integration.
- Contact for Pricing

Superpipe is an open-source experimentation platform designed for building, evaluating, and optimizing Large Language Model (LLM) pipelines to improve accuracy and minimize costs. It allows deployment on user infrastructure for enhanced privacy and security.
- Free

Rhesis AI offers an open-source SDK to generate comprehensive, context-specific test sets for LLM applications, enhancing AI evaluation, reliability, and compliance.
- Freemium

TheFastest.ai provides reliable, daily updated performance benchmarks for popular Large Language Models (LLMs), measuring Time To First Token (TTFT) and Tokens Per Second (TPS) across different regions and prompt types.
- Free
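
The two metrics TheFastest.ai reports are easy to reproduce against any streaming LLM endpoint. The sketch below measures them with a placeholder token stream standing in for whatever streaming client you use; only the timing arithmetic is the point, and the client call in the usage comment is hypothetical.

```python
# Sketch of measuring Time To First Token (TTFT) and Tokens Per Second (TPS)
# for a streaming LLM response. The input is any iterable of tokens; replace it
# with the token stream from a real streaming API call.
import time
from typing import Iterable, Tuple

def measure(stream: Iterable[str]) -> Tuple[float, float]:
    """Return (TTFT in seconds, tokens per second) for a token stream."""
    start = time.perf_counter()
    first = None
    count = 0
    for _token in stream:
        if first is None:
            first = time.perf_counter()  # moment the first token arrived
        count += 1
    end = time.perf_counter()
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    gen_time = end - first
    tps = (count - 1) / gen_time if gen_time > 0 else float("inf")
    return ttft, tps

# Usage (hypothetical client): ttft, tps = measure(client.stream("prompt"))
```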

Langfuse provides an open-source platform for tracing, evaluating, and managing prompts to debug and improve LLM applications.
- Freemium
- From $59
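
Langfuse's tracing is usually wired in through its Python SDK decorator. The sketch below uses the v2-style `langfuse.decorators` import, which is an assumption about the installed SDK version; newer releases may expose `observe` from a different module, so check the current docs.

```python
# Hedged sketch of Langfuse tracing via its Python SDK decorator
# (v2-style import shown; newer SDK versions may expose `observe` elsewhere).
# Requires LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY in the environment.
from langfuse.decorators import observe

@observe()  # records this function call as a trace in Langfuse
def summarize(text: str) -> str:
    # Call your LLM here; inputs and outputs are captured on the trace.
    return text[:100]

print(summarize("Langfuse traces nested calls decorated with @observe."))
```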