OR-Bench Leaderboard

Evaluate LLM over-refusal rates with OR-Bench

What is OR-Bench Leaderboard?

OR-Bench Leaderboard is a benchmarking platform that evaluates Large Language Models (LLMs) by their over-refusal rates, that is, how often a model refuses to respond to prompts it could safely answer. The results give insight into each model's reliability and responsiveness. The tool is aimed at researchers and developers who want to compare LLMs and reduce unnecessary refusals.
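As a rough illustration of the metric, a refusal rate can be read as the share of evaluated prompts that a model declined to answer. The sketch below is not the leaderboard's own code: the function name `over_refusal_rate` and the sample data are hypothetical, and it assumes each response has already been labeled as a refusal or not.

```python
def over_refusal_rate(refused_flags):
    """Return the fraction of prompts that were refused.

    `refused_flags` is an iterable of booleans, one per prompt,
    where True means the model refused to answer.
    """
    flags = list(refused_flags)
    if not flags:
        raise ValueError("need at least one prompt result")
    return sum(1 for flag in flags if flag) / len(flags)


# Hypothetical labels for five prompts (True = refused).
results = [True, False, False, True, False]
rate = over_refusal_rate(results)
print(f"over-refusal rate: {rate:.1%}")  # prints "over-refusal rate: 40.0%"
```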

Features

• Benchmarking of LLMs: Comprehensive evaluation of models based on their refusal rates.
• Performance Metrics: Detailed metrics on refusal rates across diverse scenarios and prompts.
• Model Comparisons: Side-by-side comparisons to identify top-performing models.
• Scenarios Support: Testing models against a wide range of scenarios.
• Transparency: Open and accessible results for community review.
• Community-Driven: Continuously updated with new models and data.

How to use OR-Bench Leaderboard?

Access the Platform: Visit the OR-Bench Leaderboard website or integrate its API into your workflow.
Select Models: Choose the LLMs you want to evaluate or compare.
Review Metrics: Analyze refusal rates and performance across different scenarios.
Compare Results: Use the leaderboard to identify models with the lowest refusal rates.
Consult Documentation: Use provided resources to understand methodologies and improve model performance.
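The comparison step above amounts to sorting models by refusal rate, lowest first. The following is a minimal sketch under stated assumptions: the model names and rates are made up, and the rows are assumed to have been copied from the leaderboard by hand rather than fetched through any particular API.

```python
# Hypothetical leaderboard rows: (model name, over-refusal rate between 0 and 1).
rows = [
    ("model-a", 0.12),
    ("model-b", 0.04),
    ("model-c", 0.31),
]


def rank_by_refusal(rows):
    """Return the rows sorted by refusal rate, lowest rate first."""
    return sorted(rows, key=lambda row: row[1])


ranked = rank_by_refusal(rows)
for position, (name, rate) in enumerate(ranked, start=1):
    print(f"{position}. {name}: {rate:.1%}")
```

With these sample values, `model-b` ranks first because it has the lowest rate.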

Frequently Asked Questions

What does the OR-Bench Leaderboard measure?
The leaderboard measures the over-refusal rates of LLMs, indicating how often models refuse to respond to prompts they could safely answer.

How are the models evaluated?
Models are evaluated using a standardized set of scenarios designed to test their responsiveness and reliability.
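To make the per-scenario view concrete, refusal labels can be grouped by scenario before computing a rate for each group. This is only a sketch: the scenario names, the `(scenario, refused)` record shape, and the function `refusal_rate_by_scenario` are assumptions for illustration, not the leaderboard's actual data format.

```python
from collections import defaultdict


def refusal_rate_by_scenario(records):
    """Compute the refusal rate for each scenario.

    `records` is an iterable of (scenario, refused) pairs, where
    `refused` is True when the model declined to answer.
    """
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for scenario, refused in records:
        totals[scenario] += 1
        if refused:
            refusals[scenario] += 1
    return {scenario: refusals[scenario] / totals[scenario] for scenario in totals}


# Hypothetical evaluation records.
records = [
    ("privacy", True),
    ("privacy", False),
    ("medical", False),
    ("medical", False),
    ("legal", True),
]
by_scenario = refusal_rate_by_scenario(records)
for scenario, rate in sorted(by_scenario.items()):
    print(f"{scenario}: {rate:.0%}")
```

Breaking the rate down this way shows whether refusals are spread evenly or concentrated in a few scenarios.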

Can I contribute to the leaderboard?
Yes, contributions are welcome. Submit your model or scenario suggestions through the platform's community portal.
