Open-source LLM testing tools - AI tools

  • BenchLLM: The best way to evaluate LLM-powered apps

    BenchLLM is a tool for evaluating LLM-powered applications. It allows users to build test suites, generate quality reports, and choose between automated, interactive, or custom evaluation strategies.

    • Other
  • PromptsLabs: A Library of Prompts for Testing LLMs

    PromptsLabs is a community-driven platform providing copy-paste prompts to test the performance of new LLMs. Explore and contribute to a growing collection of prompts.

    • Free
  • Gentrace: Intuitive evals for intelligent applications

    Gentrace is an LLM evaluation platform designed for AI teams to test and automate evaluations of generative AI products and agents. It facilitates collaborative development and ensures high-quality LLM applications.

    • Usage Based
  • promptfoo: Test & secure your LLM apps with open-source LLM testing

    promptfoo is an open-source LLM testing tool designed to help developers secure and evaluate their language model applications, offering features like vulnerability scanning and continuous monitoring.

    • Freemium
  • Libretto: LLM Monitoring, Testing, and Optimization

    Libretto offers comprehensive LLM monitoring, automated prompt testing, and optimization tools to ensure the reliability and performance of your AI applications.

    • Freemium
    • From $180
  • Phoenix (phoenix.arize.com): Open-source LLM tracing and evaluation

    Phoenix is an open-source platform for tracing and evaluating LLM applications, giving developers the insights needed to evaluate, experiment with, and optimize AI applications in real time.

    • Freemium
  • Autoblocks: Improve your LLM Product Accuracy with Expert-Driven Testing & Evaluation

    Autoblocks is a collaborative testing and evaluation platform for LLM-based products that automatically improves through user and expert feedback, offering comprehensive tools for monitoring, debugging, and quality assurance.

    • Freemium
    • From $1,750
  • Laminar: The AI engineering platform for LLM products

    Laminar is an open-source platform that enables developers to trace, evaluate, label, and analyze Large Language Model (LLM) applications with minimal code integration.

    • Freemium
    • From $25
  • Langtail: The low-code platform for testing AI apps

    Langtail is a comprehensive testing platform that enables teams to test and debug LLM-powered applications with a spreadsheet-like interface, offering security features and integration with major LLM providers.

    • Freemium
    • From $99
  • GPT-LLM Playground: Your comprehensive testing environment for Large Language Models

    GPT-LLM Playground is a macOS application designed for advanced experimentation and testing with Large Language Models (LLMs). It offers features like multi-model support, versioning, and custom endpoints.

    • Free
  • ModelBench: No-Code LLM Evaluations

    ModelBench enables teams to rapidly deploy AI solutions with no-code LLM evaluations. It allows users to compare over 180 models, design and benchmark prompts, and trace LLM runs, accelerating AI development.

    • Free Trial
    • From $49
  • Langfuse: Open Source LLM Engineering Platform

    Langfuse provides an open-source platform for tracing, evaluating, and managing prompts to debug and improve LLM applications.

    • Freemium
    • From $59
  • Ottic: QA for LLM products done right

    Ottic empowers tech and non-technical teams to test LLM applications, ensuring faster product development and enhanced reliability. Streamline your QA process and gain full visibility into your LLM application's behavior.

    • Contact for Pricing
  • RoostGPT: Automated Test Case Generation using LLMs for Reliable Software Development

    RoostGPT is an AI-powered testing co-pilot that automates test case generation, targeting 100% test coverage while detecting vulnerabilities through static analysis. It leverages Large Language Models to improve software development efficiency and reliability.

    • Paid
    • From $25,000
  • EleutherAI: Empowering Open-Source Artificial Intelligence Research

    EleutherAI is a research institute focused on advancing and democratizing open-source AI, particularly in language modeling, interpretability, and alignment. They train, release, and evaluate powerful open-source LLMs.

    • Free
  • GeneratorLLMs: Extracts core website content into structured llms.txt files to improve LLM comprehension, search visibility, and AI training data quality

    GeneratorLLMs is a tool that creates standardized `llms.txt` files by extracting core website content. This improves how Large Language Models (LLMs) understand websites and enhances AI visibility.

    • Free
  • Inductor: Streamline Production-Ready LLM Applications

    Inductor enables developers to rapidly prototype, evaluate, and improve LLM applications, ensuring high-quality app delivery.

    • Freemium
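Despite their different interfaces, most of the evaluation tools above automate the same core loop: send a prompt to a model, run assertions against the response, and aggregate pass/fail results into a report. A minimal sketch of that loop in plain Python (the `call_model` stub and case names are illustrative, not any particular tool's API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # pass/fail assertion on the model's output

def call_model(prompt: str) -> str:
    # Stub standing in for a real LLM API call; returns a canned answer here.
    return "Bonjour, le monde !"

def run_suite(cases: list[EvalCase]) -> dict[str, bool]:
    # Run every case through the model and record whether its check passed.
    return {case.name: case.check(call_model(case.prompt)) for case in cases}

suite = [
    EvalCase("greeting-fr", "Translate to French: Hello, world!",
             lambda out: "Bonjour" in out),
    EvalCase("no-refusal", "Translate to French: Hello, world!",
             lambda out: "cannot help" not in out.lower()),
]

print(run_suite(suite))  # {'greeting-fr': True, 'no-refusal': True}
```

The hosted platforms layer reporting, collaboration, and model-graded or human-in-the-loop checks on top of this loop, but the underlying structure of a test case, a model call, and an assertion stays the same.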

    Elite AI Tools

    EliteAi.tools is the premier AI tools directory, exclusively featuring high-quality, useful, and thoroughly tested tools. Discover the perfect AI tool for your task using our AI-powered search engine.

    © 2025 EliteAi.tools. All Rights Reserved.