
OR-Bench Leaderboard

Measure over-refusal in LLMs using OR-Bench

You May Also Like

  • Cetvel - Pergel: A Unified Benchmark for Evaluating Turkish LLMs
  • PaddleOCRModelConverter - Convert PaddleOCR models to ONNX format
  • Robotics Model Playground - Benchmark AI models by comparison
  • Submission Portal - Evaluate and submit AI model results for the Frugal AI Challenge
  • Modelcard Creator - Create and upload a Hugging Face model card
  • La Leaderboard - Evaluate open LLMs in the languages of LATAM and Spain
  • DécouvrIR - Leaderboard of information retrieval models in French
  • Llm Memory Requirement - Calculate memory usage for LLM models
  • OpenLLM Turkish leaderboard v0.2 - Browse and submit model evaluations in LLM benchmarks
  • Redteaming Resistance Leaderboard - Display model benchmark results
  • Hebrew LLM Leaderboard - Browse and evaluate language models
  • Open Object Detection Leaderboard - Request model evaluation on the COCO val 2017 dataset

What is OR-Bench Leaderboard?

OR-Bench Leaderboard is a tool designed to measure and compare over-refusal (OR) behavior in large language models (LLMs), that is, how often a model refuses prompts it could have answered meaningfully. It provides a standardized framework for evaluating how models respond to these refusal scenarios, so benchmarking stays consistent and fair across different models. The leaderboard helps researchers and developers understand the limitations and capabilities of LLMs in deciding when to refuse.

Features

  • Comprehensive Benchmarking: Evaluate LLMs on a diverse set of refusal scenarios.
  • Customizable Metrics: Measure over-refusal using multiple defined metrics (a minimal metric sketch follows this list).
  • Model Tracking: Compare performance across different versions of models.
  • Version Support: Compatibility with various model architectures and frameworks.
  • Open Source: Transparent and accessible for the research community.
  • Community-Driven: Encourages contributions and improvements from users.
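
At its simplest, an over-refusal metric is a refusal rate over prompts the model could have answered safely. The sketch below is a minimal illustration of that idea, assuming a crude keyword heuristic for detecting refusals and plain lists of response strings; it is not the leaderboard's actual scoring code.

    # Minimal sketch of an over-refusal rate (illustrative only).
    # The keyword heuristic and inputs are assumptions, not OR-Bench's own logic.
    REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")

    def is_refusal(response: str) -> bool:
        """Treat a response as a refusal if it opens with a typical refusal phrase."""
        return any(marker in response.strip().lower()[:80] for marker in REFUSAL_MARKERS)

    def over_refusal_rate(responses: list[str]) -> float:
        """Fraction of responses to safe prompts that the model refused."""
        if not responses:
            return 0.0
        return sum(map(is_refusal, responses)) / len(responses)

    # Two refusals out of three safe prompts -> roughly 0.67
    print(over_refusal_rate([
        "I'm sorry, but I can't help with that.",
        "Sure, here is an overview of the topic...",
        "I cannot assist with this request.",
    ]))

A real evaluation typically replaces the keyword check with a stronger classifier or an LLM judge, but the aggregation into a rate works the same way.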

How to use OR-Bench Leaderboard?

  1. Install the Tool: Clone the repository and install dependencies.
  2. Run Benchmarking: Execute the benchmarking script with the desired models.
  3. Analyze Results: Review the generated metrics and model comparisons (see the ranking sketch after this list).
  4. Submit Results: Contribute your findings to the leaderboard for community sharing.
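
As a rough picture of what steps 3 and 4 produce, the sketch below ranks models once each one's over-refusal rate has been computed, for instance with the metric sketched above. The model names and rates are invented for illustration; this is not the leaderboard's submission pipeline.

    # Illustrative ranking step (invented model names and rates, not real results).
    def rank(rates: dict[str, float]) -> list[tuple[str, float]]:
        """Sort models from least to most over-refusing."""
        return sorted(rates.items(), key=lambda item: item[1])

    demo_rates = {"model-a": 0.12, "model-b": 0.31, "model-c": 0.05}
    for position, (model, rate) in enumerate(rank(demo_rates), start=1):
        print(f"{position}. {model}: {rate:.0%} over-refusal")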

Frequently Asked Questions

What is over-refusal in LLMs?
Over-refusal occurs when a model refuses to respond to a query even though it could have provided a meaningful answer.

Why is benchmarking over-refusal important?
Benchmarking helps identify models that may excessively refuse to answer, potentially limiting their utility in real-world applications.

How do I interpret the results from OR-Bench Leaderboard?
Results show how often and in what contexts models refuse to respond, enabling comparisons of refusal behavior across different models.
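
Because the results describe how often and in what contexts a model refuses, a per-category breakdown is the most direct way to read them. The sketch below groups assumed per-prompt records into category-level refusal rates; the record schema (with "category" and "refused" fields), the categories, and the values are made up for the example.

    # Hypothetical per-category breakdown; the record schema and data are invented.
    from collections import Counter

    def refusal_breakdown(records: list[dict]) -> dict[str, float]:
        """Refusal rate per prompt category."""
        totals, refused = Counter(), Counter()
        for record in records:
            totals[record["category"]] += 1
            refused[record["category"]] += record["refused"]
        return {category: refused[category] / totals[category] for category in totals}

    demo = [
        {"category": "privacy", "refused": True},
        {"category": "privacy", "refused": False},
        {"category": "self-harm", "refused": True},
    ]
    print(refusal_breakdown(demo))  # {'privacy': 0.5, 'self-harm': 1.0}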

Recommended Category

  • Anomaly Detection
  • Generate song lyrics
  • Create a video from an image
  • Text Generation
  • Generate music
  • Style Transfer
  • Create a 3D avatar
  • Try on virtual clothes
  • Sentiment Analysis
  • Generate a 3D model from an image
  • Put a logo on an image
  • Chatbots
  • Language Translation
  • Generate music for a video
  • Add subtitles to a video