
OR-Bench Leaderboard

Measure over-refusal in LLMs using OR-Bench

You May Also Like

  • Memorization Or Generation Of Big Code Model Leaderboard: Compare code model performance on benchmarks
  • Llm Memory Requirement: Calculate memory usage for LLM models
  • WebGPU Embedding Benchmark: Measure BERT model performance using WASM and WebGPU
  • Nexus Function Calling Leaderboard: Visualize model performance on function calling tasks
  • stm32 model zoo app: Explore and manage STM32 ML models with the STM32AI Model Zoo dashboard
  • GAIA Leaderboard: Submit models for evaluation and view leaderboard
  • Push Model From Web: Push an ML model to Hugging Face Hub
  • ML.ENERGY Leaderboard: Explore GenAI model efficiency on ML.ENERGY leaderboard
  • Converter: Convert and upload model files for Stable Diffusion
  • Low-bit Quantized Open LLM Leaderboard: Track, rank and evaluate open LLMs and chatbots
  • TTSDS Benchmark and Leaderboard: Text-To-Speech (TTS) evaluation using objective metrics
  • Model Explorer: Explore and visualize diverse models

What is OR-Bench Leaderboard?

OR-Bench Leaderboard is a tool for measuring and comparing over-refusal (OR) behavior in large language models (LLMs). It provides a standardized framework for evaluating how often a model refuses prompts that appear sensitive but are actually safe to answer, so benchmarking stays consistent and fair across models. The leaderboard helps researchers and developers understand how well LLMs balance safety refusals against helpfulness.
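
To make the idea concrete, here is a minimal sketch of how an over-refusal rate could be computed from model responses. The keyword-based refusal detector and the example responses are assumptions for illustration only, not the leaderboard's actual evaluation logic (real benchmarks typically use a stronger judge, such as an LLM classifier).

```python
# Minimal sketch: estimate an over-refusal rate on safe but sensitive-looking prompts.
# The keyword detector below is a naive heuristic used only for illustration.

REFUSAL_MARKERS = (
    "i can't", "i cannot", "i'm sorry", "i am sorry",
    "i won't", "i'm unable", "i am unable", "as an ai",
)

def is_refusal(response: str) -> bool:
    """Return True if the response looks like a refusal (heuristic)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def over_refusal_rate(responses: list[str]) -> float:
    """Fraction of responses to safe prompts that were refused."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)

if __name__ == "__main__":
    # Hypothetical responses to prompts that only look sensitive but are safe.
    sample = [
        "I'm sorry, but I can't help with that.",          # over-refusal
        "Here is a short overview of how locks work...",   # helpful answer
        "Sure, common first-aid steps for a burn are...",  # helpful answer
    ]
    print(f"Over-refusal rate: {over_refusal_rate(sample):.2f}")  # 0.33
```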

Features

  • Comprehensive Benchmarking: Evaluate LLMs on a diverse set of prompts that look unsafe but are benign.
  • Customizable Metrics: Measure over-refusal with multiple clearly defined metrics.
  • Model Tracking: Compare performance across different versions of a model.
  • Version Support: Compatibility with a range of model architectures and frameworks.
  • Open Source: Transparent and accessible to the research community.
  • Community-Driven: Encourages contributions and improvements from users.

How to use OR-Bench Leaderboard?

  1. Install the Tool: Clone the repository and install its dependencies.
  2. Run Benchmarking: Execute the benchmarking script against the models you want to evaluate (an illustrative sketch of this step follows the list).
  3. Analyze Results: Review the generated metrics and model comparisons.
  4. Submit Results: Contribute your findings to the leaderboard to share them with the community.
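
As a rough illustration of steps 2 and 3, the sketch below loads a prompt set, queries a model, and writes per-prompt refusal decisions plus an aggregate rate to a results file. Everything here is an assumption rather than the project's actual interface: the dataset identifier, the query_model stub, the "prompt" column name, and the output format are all placeholders.

```python
# Illustrative benchmarking loop (a sketch, not the project's actual script).
# Assumptions: a Hugging Face dataset of safe but sensitive-looking prompts with a
# "prompt" column, and a query_model() stub standing in for the model under test.
import json

from datasets import load_dataset  # pip install datasets

def query_model(prompt: str) -> str:
    """Placeholder for a call to the model being evaluated."""
    return "I'm sorry, but I can't help with that."

def looks_like_refusal(response: str) -> bool:
    markers = ("i can't", "i cannot", "i'm sorry", "i am unable")
    return any(m in response.lower() for m in markers)

def run_benchmark(dataset_id: str, split: str, out_path: str) -> None:
    prompts = load_dataset(dataset_id, split=split)  # dataset id is an assumption
    records = []
    for row in prompts:
        response = query_model(row["prompt"])
        records.append({
            "prompt": row["prompt"],
            "response": response,
            "refused": looks_like_refusal(response),
        })
    refusal_rate = sum(r["refused"] for r in records) / max(len(records), 1)
    with open(out_path, "w") as f:
        json.dump({"refusal_rate": refusal_rate, "records": records}, f, indent=2)
    print(f"Over-refusal rate: {refusal_rate:.2%}")

if __name__ == "__main__":
    # Hypothetical dataset id and split; substitute the benchmark's real prompt set.
    run_benchmark("your-org/or-bench-prompts", "test", "results.json")
```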

Frequently Asked Questions

What is over-refusal in LLMs?
Over-refusal occurs when a model refuses to respond to a query even though it could safely provide a meaningful answer.

Why is benchmarking over-refusal important?
Benchmarking helps identify models that may excessively refuse to answer, potentially limiting their utility in real-world applications.

How do I interpret the results from OR-Bench Leaderboard?
Results show how often, and in what contexts, a model refuses to respond, so you can compare refusal behavior across models. For example, a model that refuses 40% of safe but sensitive-sounding prompts over-refuses far more than one that refuses 5%.

Recommended Category

  • Generate music
  • Create a 3D avatar
  • Separate vocals from a music track
  • Fine Tuning Tools
  • Chatbots
  • Medical Imaging
  • Add realistic sound to a video
  • Anomaly Detection
  • Character Animation
  • Question Answering
  • Remove objects from a photo
  • Create a customer service chatbot
  • Transcribe podcast audio to text
  • Detect objects in an image
  • Image Captioning