Display model benchmark results
Quantize a model for faster inference
Compare model weights and visualize differences
Download a TriplaneGaussian model checkpoint
Evaluate LLM over-refusal rates with OR-Bench
Pergel: A Unified Benchmark for Evaluating Turkish LLMs
Create and manage ML pipelines with ZenML Dashboard
Create and upload a Hugging Face model card
Find and download models from Hugging Face
Calculate memory usage for LLM models
Persian Text Embedding Benchmark
View and compare language model evaluations
Browse and submit language model benchmarks
The Redteaming Resistance Leaderboard benchmarks and ranks AI models by how well they resist adversarial attacks and maintain performance under challenging conditions. It gives users one place to compare models and judge their robustness and reliability in real-world scenarios.
• Real-time Performance Tracking: continuously updates model performance metrics
• Head-to-Head Comparisons: compares multiple models side by side
• Resistance Metrics: evaluates models based on their ability to withstand adversarial inputs
• Filtering System: allows users to filter models by specific criteria such as dataset, architecture, or performance thresholds
• Historical Data: provides access to past performance records for trend analysis
• Cross-Platform Compatibility: works across multiple devices and browsers
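As a rough illustration of the filtering feature, the sketch below filters a small in-memory table by dataset and a minimum score. Everything in it is an assumption made for illustration: the `Entry` fields, the `filter_entries` helper, the model names, and the 0-to-1 resistance score are hypothetical and do not reflect the leaderboard's actual data format or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """One hypothetical leaderboard row (field names are assumptions)."""
    model: str
    architecture: str
    dataset: str
    resistance: float  # assumed score in [0, 1]; higher means more resistant


def filter_entries(
    entries: List[Entry],
    dataset: Optional[str] = None,
    architecture: Optional[str] = None,
    min_resistance: float = 0.0,
) -> List[Entry]:
    """Keep rows matching every criterion that was supplied."""
    return [
        e
        for e in entries
        if (dataset is None or e.dataset == dataset)
        and (architecture is None or e.architecture == architecture)
        and e.resistance >= min_resistance
    ]


# Made-up rows, used only to exercise the filter.
entries = [
    Entry("model-a", "transformer", "set-1", 0.91),
    Entry("model-b", "transformer", "set-2", 0.78),
    Entry("model-c", "mixture-of-experts", "set-1", 0.85),
]

selected = filter_entries(entries, dataset="set-1", min_resistance=0.8)
print([e.model for e in selected])
```

Criteria left as `None` are ignored, so the same helper covers filtering by dataset only, by architecture only, or by any combination with a score threshold.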
What is Redteaming in the context of AI models?
Redteaming is the systematic testing of AI models to identify vulnerabilities and measure their resistance to adversarial attacks or unexpected inputs.
How are models ranked on the leaderboard?
Models are ranked based on their performance under stress tests, including their ability to maintain accuracy and reliability when exposed to challenging or adversarial conditions.
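One simple way such a ranking could be computed is sketched below. It assumes each model has a list of pass/fail outcomes from adversarial prompts and scores it by the fraction it resisted. The leaderboard's real scoring method is not described here, so the `resistance_rate` and `rank_models` helpers, the metric itself, and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple


def resistance_rate(outcomes: List[bool]) -> float:
    """Fraction of adversarial prompts the model resisted (True = resisted)."""
    if not outcomes:
        raise ValueError("at least one outcome is required")
    return sum(outcomes) / len(outcomes)


def rank_models(results: Dict[str, List[bool]]) -> List[Tuple[str, float]]:
    """Return (model, score) pairs sorted from most to least resistant."""
    scores = {name: resistance_rate(o) for name, o in results.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up outcomes for three hypothetical models.
results = {
    "model-a": [True, True, False, True],
    "model-b": [True, False, False, True],
    "model-c": [True, True, True, True],
}

ranking = rank_models(results)
for position, (name, score) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {score:.2f}")
```

A single pass rate is the simplest possible aggregate. A real leaderboard would likely combine several attack categories or weight them differently.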
Can I customize the metrics used for comparison?
Yes, the platform allows users to filter and customize the metrics used for comparison, enabling tailored analysis based on specific needs or use cases.