OR-Bench Leaderboard

Measure over-refusal in LLMs using OR-Bench

You May Also Like

  • PaddleOCRModelConverter: Convert PaddleOCR models to ONNX format
  • ConvCodeWorld: Evaluate code generation with diverse feedback types
  • LLM HALLUCINATIONS TOOL: Evaluate AI-generated results for accuracy
  • README: Optimize and train foundation models using IBM's FMS
  • ARCH: Compare audio representation models using benchmark results
  • Model Drops Tracker: Find recent, highly liked Hugging Face models
  • GGUF Model VRAM Calculator: Calculate VRAM requirements for LLMs
  • Project RewardMATH: Evaluate reward models for math reasoning
  • SD To Diffusers: Convert a Stable Diffusion checkpoint to Diffusers and open a PR
  • HHEM Leaderboard: Browse and submit language model benchmarks
  • Pinocchio Ita Leaderboard: Display a leaderboard of language model evaluations
  • MEDIC Benchmark: View and compare language model evaluations

What is OR-Bench Leaderboard?

OR-Bench Leaderboard is a tool designed to measure and compare over-refusal (OR) behavior in large language models (LLMs). It provides a standardized framework for evaluating how often models refuse prompts they could safely answer, ensuring consistent and fair benchmarking across models. The leaderboard helps researchers and developers understand the limitations and capabilities of LLMs in handling such prompts.

Features

  • Comprehensive Benchmarking: Evaluate LLMs on a diverse set of refusal scenarios.
  • Customizable Metrics: Measure over-refusal using multiple defined metrics.
  • Model Tracking: Compare performance across different versions of models.
  • Version Support: Compatibility with various model architectures and frameworks.
  • Open Source: Transparent and accessible for the research community.
  • Community-Driven: Encourages contributions and improvements from users.

How to use OR-Bench Leaderboard?

  1. Install the Tool: Clone the repository and install dependencies.
  2. Run Benchmarking: Execute the benchmarking script with the desired models (see the sketch after this list).
  3. Analyze Results: Review the generated metrics and comparisons.
  4. Submit Results: Contribute your findings to the leaderboard for community sharing.
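
As a rough illustration of steps 2 and 3, here is what a benchmarking run might look like. This is a minimal sketch, not the actual OR-Bench code: the JSON prompt format, the toy model client, and the keyword-based refusal check are all assumptions made for illustration; a real evaluation would use the repository's own scripts and a stronger refusal classifier.

```python
# Minimal sketch of an over-refusal check (illustration only, not OR-Bench code).
# Assumption: "over-refusal rate" = fraction of safe prompts the model refuses.
import json
from typing import Callable

# Crude keyword heuristic for spotting refusals (an assumption for this sketch).
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")

def looks_like_refusal(reply: str) -> bool:
    return any(marker in reply.lower() for marker in REFUSAL_MARKERS)

def over_refusal_rate(ask: Callable[[str], str], prompts: list[str]) -> float:
    refusals = sum(looks_like_refusal(ask(p)) for p in prompts)
    return refusals / len(prompts)

if __name__ == "__main__":
    # Toy stand-in for a real model client: refuses anything mentioning "knife".
    def toy_model(prompt: str) -> str:
        if "knife" in prompt.lower():
            return "I'm sorry, I can't help with that."
        return "Sure, here is an explanation: ..."

    # Safe prompts that merely sound sensitive (invented examples).
    prompts = json.loads('["How are kitchen knives sharpened?", "What is a firewall?"]')
    print(f"over-refusal rate: {over_refusal_rate(toy_model, prompts):.0%}")
```

Running this toy example prints an over-refusal rate of 50%, since the stand-in model refuses one of the two safe prompts; with a real model client and the benchmark's prompt sets, the same loop structure yields the rate you would compare on the leaderboard.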

Frequently Asked Questions

What is over-refusal in LLMs?
Over-refusal occurs when a model refuses to respond to a query even though it could have provided a safe, meaningful answer.

Why is benchmarking over-refusal important?
Benchmarking helps identify models that may excessively refuse to answer, potentially limiting their utility in real-world applications.

How do I interpret the results from OR-Bench Leaderboard?
Results show how often and in what contexts models refuse to respond, enabling comparisons of refusal behavior across different models.
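
To make the comparison concrete, the snippet below ranks a few models by their over-refusal rate on safe prompts. The model names and counts are invented purely for illustration; the real leaderboard defines its own metrics and result format.

```python
# Hypothetical leaderboard-style results (numbers invented for illustration):
# for each model, how many safe prompts it refused out of how many it saw.
results = {
    "model-a": {"refused": 120, "total": 1000},
    "model-b": {"refused": 45, "total": 1000},
    "model-c": {"refused": 310, "total": 1000},
}

# Rank models from least to most over-refusing.
ranked = sorted(results.items(), key=lambda kv: kv[1]["refused"] / kv[1]["total"])
for name, counts in ranked:
    rate = counts["refused"] / counts["total"]
    print(f"{name}: {rate:.1%} over-refusal on safe prompts")
```

A lower rate is generally better here, but it should be read alongside how the model handles genuinely unsafe prompts: a model can trivially minimize over-refusal by refusing nothing at all.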

Recommended Categories

  • Transcribe podcast audio to text
  • Speech Synthesis
  • Face Recognition
  • Text Generation
  • Character Animation
  • Remove background noise from audio
  • Remove background from a picture
  • Background Removal
  • Music Generation
  • Style Transfer
  • Model Benchmarking
  • Recommendation Systems
  • Try on virtual clothes
  • Image
  • Generate a custom logo