AIDir.app
  • Hot AI Tools
  • New AI Tools
  • AI Tools Category
AIDir.app
AIDir.app

Save this website for future use! Free to use, no login required.

About

  • Blog

ยฉ 2025 โ€ข AIDir.app All rights reserved.

  • Privacy Policy
  • Terms of Service
Home
Model Benchmarking
OR-Bench Leaderboard

OR-Bench Leaderboard

Measure over-refusal in LLMs using OR-Bench

You May Also Like

View All
๐ŸŒธ

La Leaderboard

Evaluate open LLMs in the languages of LATAM and Spain.

71
๐Ÿ 

Nexus Function Calling Leaderboard

Visualize model performance on function calling tasks

92
๐Ÿš€

stm32 model zoo app

Explore and manage STM32 ML models with the STM32AI Model Zoo dashboard

2
๐Ÿ“Š

Llm Memory Requirement

Calculate memory usage for LLM models

2
๐ŸŽจ

SD To Diffusers

Convert Stable Diffusion checkpoint to Diffusers and open a PR

72
๐Ÿง 

GREAT Score

Evaluate adversarial robustness using generative models

0
๐Ÿฅ‡

Russian LLM Leaderboard

View and submit LLM benchmark evaluations

45
๐Ÿฅ‡

TTSDS Benchmark and Leaderboard

Text-To-Speech (TTS) Evaluation using objective metrics.

22
๐ŸŽ

Export to ONNX

Export Hugging Face models to ONNX

68
๐Ÿ†

Open LLM Leaderboard

Track, rank and evaluate open LLMs and chatbots

84
๐Ÿ“‰

Leaderboard 2 Demo

Demo of the new, massively multilingual leaderboard

19
โšก

ML.ENERGY Leaderboard

Explore GenAI model efficiency on ML.ENERGY leaderboard

8

What is OR-Bench Leaderboard ?

OR-Bench Leaderboard is a tool designed to measure and compare over-refusal (OR) behavior in large language models (LLMs). It provides a standardized framework to evaluate how models respond to refusal scenarios, ensuring consistent and fair benchmarking across different models. The leaderboard helps researchers and developers understand the limitations and capabilities of LLMs in handling refusal tasks.

Features

  • Comprehensive Benchmarking: Evaluate LLMs on a diverse set of refusal scenarios.
  • Customizable Metrics: Measure over-refusal using multiple defined metrics.
  • Model Tracking: Compare performance across different versions of models.
  • Version Support: Compatibility with various model architectures and frameworks.
  • Open Source: Transparent and accessible for the research community.
  • Community-Driven: Encourages contributions and improvements from users.

How to use OR-Bench Leaderboard ?

  1. Install the Tool: Clone the repository and install dependencies.
  2. Run Benchmarking: Execute the benchmarking script with the desired models.
  3. Analyze Results: Review the generated metrics and comparisons.
  4. Submit Results: Contribute your findings to the leaderboard for community sharing.

Frequently Asked Questions

What is over-refusal in LLMs?
Over-refusal refers to when a model refuses to respond to a query, even when it could provide a meaningful answer.

Why is benchmarking over-refusal important?
Benchmarking helps identify models that may excessively refuse to answer, potentially limiting their utility in real-world applications.

How do I interpret the results from OR-Bench Leaderboard?
Results show how often and in what contexts models refuse to respond, enabling comparisons of refusal behavior across different models.

Recommended Category

View All
๐Ÿง‘โ€๐Ÿ’ป

Create a 3D avatar

๐Ÿง 

Text Analysis

๐Ÿค–

Create a customer service chatbot

โญ

Recommendation Systems

๐Ÿ“

Generate a 3D model from an image

๐Ÿ–ผ๏ธ

Image Generation

๐Ÿ“

Convert 2D sketches into 3D models

๐Ÿ’ฌ

Add subtitles to a video

๐Ÿ’ป

Generate an application

๐ŸŽฎ

Game AI

๐ŸŒˆ

Colorize black and white photos

โœ‚๏ธ

Remove background from a picture

โœจ

Restore an old photo

๐Ÿ”‡

Remove background noise from an audio

๐Ÿ“ˆ

Predict stock market trends