
OR-Bench Leaderboard

Evaluate LLM over-refusal rates with OR-Bench

You May Also Like

  • 🥇 Russian LLM Leaderboard: View and submit LLM benchmark evaluations (45)
  • 🐢 Newapi1: Load AI models and prepare your space (0)
  • 🏆 OR-Bench Leaderboard: Measure over-refusal in LLMs using OR-Bench (3)
  • 📊 MEDIC Benchmark: View and compare language model evaluations (6)
  • 💻 Redteaming Resistance Leaderboard: Display benchmark results (0)
  • 🎨 SD To Diffusers: Convert Stable Diffusion checkpoint to Diffusers and open a PR (72)
  • 🥇 TTSDS Benchmark and Leaderboard: Text-To-Speech (TTS) evaluation using objective metrics (22)
  • ⚡ ML.ENERGY Leaderboard: Explore GenAI model efficiency on the ML.ENERGY leaderboard (8)
  • 🧠 GREAT Score: Evaluate adversarial robustness using generative models (0)
  • 📈 Ilovehf: View RL Benchmark Reports (0)
  • 🥇 Hebrew Transcription Leaderboard: Display LLM benchmark leaderboard and info (12)
  • 🎙 ConvCodeWorld: Evaluate code generation with diverse feedback types (0)

What is OR-Bench Leaderboard?

OR-Bench Leaderboard is a benchmarking platform designed to evaluate Large Language Models (LLMs) based on their over-refusal rates. It provides a framework for assessing how often models refuse to answer prompts that merely sound sensitive but are actually benign, offering insight into their reliability and responsiveness. This tool is particularly useful for researchers and developers aiming to optimize LLM performance and transparency.

Features

  • Benchmarking of LLMs: Comprehensive evaluation of models based on their refusal rates.
  • Performance Metrics: Detailed metrics on refusal rates across diverse scenarios and prompts.
  • Model Comparisons: Side-by-side comparisons to identify top-performing models.
  • Scenario Support: Testing models against a wide range of scenarios.
  • Transparency: Open and accessible results for community review.
  • Community-Driven: Continuously updated with new models and data.
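The headline number behind these features is an over-refusal rate: the share of prompts a model declines to answer, overall and per scenario category. The sketch below shows one way such a metric could be computed from per-prompt refusal labels; the field names and the upstream refusal-labelling step are illustrative assumptions, not the leaderboard's actual schema or judge.

```python
# Illustrative sketch only: compute an over-refusal rate from per-prompt
# refusal labels, overall and per scenario category. The field names
# ("category", "refused") are assumptions for this example.
from collections import defaultdict

def over_refusal_rates(results):
    """results: iterable of dicts like {"category": str, "refused": bool}."""
    totals, refusals = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["category"]] += 1
        refusals[r["category"]] += int(r["refused"])
    per_category = {c: refusals[c] / totals[c] for c in totals}
    overall = sum(refusals.values()) / sum(totals.values())
    return overall, per_category

overall, per_category = over_refusal_rates([
    {"category": "privacy", "refused": True},
    {"category": "privacy", "refused": False},
    {"category": "self-harm", "refused": False},
])
print(f"overall over-refusal rate: {overall:.1%}")  # 33.3%
print(per_category)                                 # {'privacy': 0.5, 'self-harm': 0.0}
```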

How to use OR-Bench Leaderboard?

  1. Access the Platform: Visit the OR-Bench Leaderboard website or integrate its API into your workflow.
  2. Select Models: Choose the LLMs you want to evaluate or compare.
  3. Review Metrics: Analyze refusal rates and performance across different scenarios.
  4. Compare Results: Use the leaderboard to identify models with the lowest refusal rates.
  5. Consult Documentation: Use provided resources to understand methodologies and improve model performance.
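If you want to reproduce a score locally before comparing against the leaderboard, the loop looks roughly like this: fetch the OR-Bench prompts, generate answers with your model, label refusals, and compute the rate. In the sketch below, the Hugging Face dataset identifier, subset, split, and column names are assumptions (check the leaderboard page for the exact ones), and the keyword check is only a stand-in for whatever refusal judge the leaderboard actually uses.

```python
# Sketch of a local evaluation loop. The dataset id/subset/split/column below
# are assumptions -- verify them on the leaderboard page before running.
from datasets import load_dataset

dataset = load_dataset("bench-llm/or-bench", "or-bench-hard-1k", split="train")

def generate_answer(prompt: str) -> str:
    # Placeholder: swap in your model's generation call (transformers, an API, etc.).
    return "I'm sorry, but I can't help with that."

def looks_like_refusal(answer: str) -> bool:
    # Crude keyword heuristic standing in for the leaderboard's refusal judge.
    return any(p in answer.lower() for p in ("i can't", "i cannot", "i'm sorry"))

sample = dataset.select(range(100))  # small sample for illustration
refused = sum(looks_like_refusal(generate_answer(ex["prompt"])) for ex in sample)
print(f"over-refusal rate on sample: {refused / len(sample):.1%}")
```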

Frequently Asked Questions

What does the OR-Bench Leaderboard measure?
The leaderboard measures the over-refusal rates of LLMs, i.e. how often models refuse to respond to prompts that appear sensitive but are actually benign.
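For example, if a model declines 150 out of 1,000 such benign prompts, its over-refusal rate on that set is 15%; lower is better.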

How are the models evaluated?
Models are evaluated using a standardized set of scenarios designed to test their responsiveness and reliability.

Can I contribute to the leaderboard?
Yes, contributions are welcome. Submit your model or scenario suggestions through the platform's community portal.

Recommended Categories

  • 🕺 Pose Estimation
  • 🔍 Object Detection
  • ↔️ Extend images automatically
  • 🚫 Detect harmful or offensive content in images
  • 🌜 Transform a daytime scene into a night scene
  • 🖼️ Image Generation
  • 🎥 Convert a portrait into a talking video
  • 😂 Make a viral meme
  • 💬 Add subtitles to a video
  • 😊 Sentiment Analysis
  • 🎤 Generate song lyrics
  • ✂️ Separate vocals from a music track
  • 📐 Generate a 3D model from an image
  • 🌍 Language Translation
  • 🖼️ Image Captioning