Evaluate RAG systems with visual analytics
InspectorRAGet is a tool designed to evaluate and benchmark RAG (Retrieval-Augmented Generation) systems. It provides visual analytics and insights to help users understand the performance and behavior of their RAG models, enabling data-driven optimizations and improvements.
• Visual Analytics: Gain insights into RAG system performance through interactive visualizations.
• Benchmarking Capabilities: Compare multiple RAG models side-by-side to identify strengths and weaknesses.
• Efficient Evaluation: Streamline the evaluation process with automated workflows and reporting.
• Customizable Metrics: Define and track key performance indicators tailored to your needs.
• Integration Support: Easily integrate with popular RAG frameworks and tools.
pip install inspectorraget
from inspectorraget import InspectorRAGet
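A minimal end-to-end sketch is shown below. Only the InspectorRAGet import and the visualize() method are mentioned on this page; the evaluate() call, its arguments, and the model names are hypothetical placeholders, so consult the project documentation for the actual API.

# Hypothetical usage sketch: evaluate() and its arguments are assumptions;
# only InspectorRAGet and visualize() are named on this page.
from inspectorraget import InspectorRAGet

inspector = InspectorRAGet()

# Compare two RAG models side-by-side on the same queries
# (model identifiers are placeholders).
results = inspector.evaluate(
    models=["rag-baseline", "rag-tuned"],
    queries=["What is retrieval-augmented generation?"],
)

# Render the built-in interactive charts for the collected results.
inspector.visualize()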
What is a RAG system?
A Retrieval-Augmented Generation (RAG) system combines retrieval mechanisms (e.g., databases or search engines) with generative models (e.g., large language models) to produce more accurate and contextually relevant responses.
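As a rough illustration of that retrieve-then-generate flow, the sketch below wires a placeholder retriever and generator together; neither object is part of InspectorRAGet.

# Schematic retrieve-then-generate loop; retriever and generator are
# placeholder objects, not InspectorRAGet APIs.
def answer(query, retriever, generator, k=3):
    # 1. Retrieval: fetch the k passages most relevant to the query.
    passages = retriever.search(query, top_k=k)
    # 2. Augmentation: splice the retrieved evidence into the prompt.
    prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {query}"
    # 3. Generation: the LLM answers grounded in the retrieved context.
    return generator.complete(prompt)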
Can I customize the evaluation metrics?
Yes, InspectorRAGet allows you to define and use custom metrics to align with your specific evaluation goals.
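For example, a custom metric could measure how much of an answer is grounded in the retrieved context. The add_metric() registration call below is a hypothetical sketch; the page only states that custom metrics are supported, not how to register them.

# Hypothetical custom metric: fraction of answer tokens that also appear
# in the retrieved context. add_metric() is an assumed registration hook.
def context_overlap(answer: str, context: str) -> float:
    answer_tokens = set(answer.lower().split())
    if not answer_tokens:
        return 0.0
    context_tokens = set(context.lower().split())
    return len(answer_tokens & context_tokens) / len(answer_tokens)

inspector.add_metric("context_overlap", context_overlap)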
How do I visualize the results?
InspectorRAGet provides built-in visualization tools that generate interactive charts and graphs. You can access these by calling the visualize() method after running your queries.