InspectorRAGet is a tool designed to evaluate and benchmark RAG (Retrieval-Augmented Generation) systems. It provides visual analytics and insights to help users understand the performance and behavior of their RAG models, enabling data-driven optimizations and improvements.
• Visual Analytics: Gain insights into RAG system performance through interactive visualizations.
• Benchmarking Capabilities: Compare multiple RAG models side-by-side to identify strengths and weaknesses.
• Efficient Evaluation: Streamline the evaluation process with automated workflows and reporting.
• Customizable Metrics: Define and track key performance indicators tailored to your needs.
• Integration Support: Easily integrate with popular RAG frameworks and tools.
```bash
pip install inspectorraget
```

```python
from inspectorraget import InspectorRAGet
```
What is a RAG system?
A Retrieval-Augmented Generation (RAG) system combines retrieval mechanisms (e.g., databases or search engines) with generative models (e.g., large language models) to produce more accurate and contextually relevant responses.
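To make the idea concrete, here is a minimal, self-contained RAG sketch. It is purely illustrative and not part of InspectorRAGet: `CORPUS`, `retrieve`, and `generate` are hypothetical placeholders standing in for a real vector store lookup and an LLM call.

```python
# Minimal RAG loop: retrieve supporting passages, then condition generation on them.
# CORPUS, retrieve, and generate are illustrative placeholders, not InspectorRAGet APIs.

CORPUS = {
    "doc1": "InspectorRAGet evaluates RAG pipelines with visual analytics.",
    "doc2": "A RAG system pairs a retriever with a generative language model.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    # Toy lexical retrieval: rank documents by word overlap with the query.
    scored = sorted(
        CORPUS.values(),
        key=lambda doc: len(set(query.lower().split()) & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(query: str, context: list[str]) -> str:
    # Placeholder for an LLM call; a real system would prompt a model with the context.
    return f"Answer to '{query}' grounded in: {context[0]}"

question = "What does a RAG system combine?"
print(generate(question, retrieve(question)))
```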
Can I customize the evaluation metrics?
Yes, InspectorRAGet allows you to define and use custom metrics to align with your specific evaluation goals.
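As an illustration of what a custom metric might look like, the sketch below computes the fraction of answer tokens that appear in the retrieved context. The metric function itself is plain Python; how such a function is registered with InspectorRAGet depends on the tool's actual interface, which is not documented on this page.

```python
# Hypothetical custom metric: fraction of answer tokens present in the retrieved context.
# Registering it with InspectorRAGet is assumed to follow the tool's own metric interface.

def context_overlap(answer: str, context: str) -> float:
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

# Score a single response/context pair.
print(context_overlap("Paris is the capital of France", "France's capital city is Paris"))
```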
How do I visualize the results?
InspectorRAGet provides built-in visualization tools that generate interactive charts and graphs. You can access these by calling the visualize() method after running your queries.
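Putting the pieces together, a session might look like the sketch below. Only the `InspectorRAGet` import and the `visualize()` call come from this page; the default constructor and the `run_queries()` method are assumptions used for illustration, not confirmed parts of the API.

```python
# Hedged sketch only: constructor arguments and run_queries() are assumptions;
# the import and visualize() are taken from the description and FAQ above.
from inspectorraget import InspectorRAGet

inspector = InspectorRAGet()                       # assumed default construction
inspector.run_queries(["What is a RAG system?"])   # hypothetical query-running step
inspector.visualize()                              # renders interactive charts per the FAQ
```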