Promptfoo

Assess large language model output quality with customizable metrics, multiple provider support, and a command-line interface for easy integration and improvement.
Tags: Language Model Evaluation, AI Model Tuning, Natural Language Processing

Promptfoo is a command-line interface (CLI) and library for assessing the quality of large language model (LLM) output. It can help developers get the best possible model quality and spot regressions by offering a structured way to test and improve prompts and models.

Among promptfoo's features are:

  • Easy configuration: Specify prompts, models, and test cases in YAML.
  • Customizable evaluation metrics: Use built-in metrics or define your own to score model output.
  • Support for multiple LLM providers: Connect to OpenAI, Anthropic, Azure, Google, HuggingFace and others.
  • Command-line interface: Run evaluations from the command line for easy integration with existing processes.
  • Library usage: Use promptfoo as a Node.js library in your own code.
  • Web viewer: Look at output in a structured format for easier analysis.
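To illustrate the YAML-based configuration described above, here is a minimal sketch of a promptfooconfig.yaml; the model ID, prompt, and assertion values are illustrative placeholders, not a definitive setup:

```yaml
# promptfooconfig.yaml -- prompts, providers, and test cases in one file
prompts:
  - "Translate the following text to French: {{text}}"

providers:
  - openai:gpt-4o-mini   # swap in any supported provider ID

tests:
  - vars:
      text: "Hello, world"
    assert:
      - type: contains   # pass if the output contains the expected substring
        value: "Bonjour"
```

Running `promptfoo eval` against a file like this executes each test case, and `promptfoo view` opens the results in the web viewer for inspection.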

The main use case for promptfoo is tuning LLMs: finding the best prompts and evaluating output quality. It's aimed at developers and teams building LLM apps, particularly those serving a large user base.

Promptfoo's red teaming feature creates customized attacks and jailbreaks for your LLM app to help you find weaknesses and improve security. The scanner offers remediation advice, too.

Pricing isn't stated, but promptfoo is open-source software, so it's free to use and modify. The project is designed to be flexible across different configurations, so it should suit a broad range of LLM development needs.

Published on June 14, 2024
