Track, rank and evaluate open LLMs and chatbots
The Low-bit Quantized Open LLM Leaderboard is a tool for tracking, ranking, and evaluating open large language models (LLMs) and chatbots, with a focus on low-bit quantization. It lets users explore and compare LLMs and see how their performance holds up under quantization. The leaderboard is particularly useful for developers, researchers, and enthusiasts who want to improve model efficiency with minimal loss of accuracy.
• Quantized Benchmarking: Evaluates models using low-bit quantization to reduce memory usage and increase inference speed.
• Model Comparison: Enables side-by-side comparison of different LLMs based on their quantized performance.
• Multi-bit Support: Covers models quantized to 4-bit, 8-bit, and other low-bit representations.
• Real-time Updates: Provides the latest rankings and performance metrics as new models emerge.
• Customizable Filters: Allows users to filter models by specific criteria such as quantization bit-width, model size, or benchmark results.
• Performance Metrics: Displays key metrics such as accuracy, inference speed, and memory usage for each model.
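As a rough sketch of the filtering described above, the snippet below narrows a set of leaderboard entries by bit-width and model size, then ranks by accuracy. The field names (`bits`, `params_b`, `accuracy`) are illustrative assumptions, not the leaderboard's actual schema.

```python
# Hypothetical leaderboard rows; field names are illustrative only.
rows = [
    {"model": "model-a", "bits": 4, "params_b": 7, "accuracy": 0.62},
    {"model": "model-b", "bits": 8, "params_b": 7, "accuracy": 0.65},
    {"model": "model-c", "bits": 4, "params_b": 13, "accuracy": 0.68},
]

# Keep 4-bit models under 10B parameters, best accuracy first.
filtered = sorted(
    (r for r in rows if r["bits"] == 4 and r["params_b"] < 10),
    key=lambda r: r["accuracy"],
    reverse=True,
)
print([r["model"] for r in filtered])
```

The same pattern extends to any combination of filter criteria the leaderboard exposes.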
What is low-bit quantization?
Low-bit quantization is a technique that reduces the precision of model weights, typically from 16- or 32-bit floating-point numbers to 8-bit or 4-bit integers, enabling faster inference and smaller model sizes.
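A minimal sketch of the idea, using symmetric per-tensor int8 quantization (one common scheme among several; assumes a nonzero weight tensor):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: pick a scale so the
    largest-magnitude weight maps to 127, then round to integers.
    Assumes the tensor is not all zeros."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from integers and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.03], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # close to w, stored in a quarter of the bytes
```

Storing `q` plus a single `scale` per tensor is what shrinks the model; the rounding step is where the (usually small) accuracy loss comes from.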
Which quantization bits are supported?
The leaderboard supports models quantized to 4-bit, 8-bit, and other low-bit representations, ensuring a wide range of optimized models are available for comparison.
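To illustrate why 4-bit formats halve storage again relative to 8-bit, here is a hedged sketch of nibble packing, where two 4-bit values share one byte (real 4-bit formats add scales and zero-points on top of this):

```python
def pack_nibbles(values):
    """Pack pairs of 4-bit unsigned values (0-15) into single bytes."""
    assert len(values) % 2 == 0 and all(0 <= v < 16 for v in values)
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))

def unpack_nibbles(packed):
    """Split each byte back into its two 4-bit values."""
    out = []
    for b in packed:
        out.extend(((b >> 4) & 0xF, b & 0xF))
    return out

vals = [1, 15, 7, 0]
packed = pack_nibbles(vals)   # 4 values stored in 2 bytes
restored = unpack_nibbles(packed)
```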
How are models ranked?
Models are ranked based on their performance in quantized benchmarks, considering metrics like accuracy, inference speed, and memory efficiency. Rankings are updated in real time as new models are added.
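One way such a multi-metric ranking could be combined is a weighted composite score. The weights and normalization caps below are illustrative assumptions, not the leaderboard's actual scoring rule:

```python
def composite_score(accuracy, tokens_per_sec, memory_gb,
                    w_acc=0.6, w_speed=0.25, w_mem=0.15):
    """Blend accuracy, speed, and memory into one score.
    Weights and the normalization caps (100 tok/s, 80 GB) are
    hypothetical choices for illustration."""
    speed_norm = min(tokens_per_sec / 100.0, 1.0)
    mem_norm = 1.0 - min(memory_gb / 80.0, 1.0)  # lower memory is better
    return w_acc * accuracy + w_speed * speed_norm + w_mem * mem_norm

scores = {
    "model-a": composite_score(0.62, 90, 6),
    "model-b": composite_score(0.65, 40, 14),
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here the faster, smaller model-a outranks the slightly more accurate model-b; shifting the weights would change that trade-off.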