View LLM Performance Leaderboard
Push an ML model to the Hugging Face Hub
GIFT-Eval: A Benchmark for General Time Series Forecasting
Browse and submit LLM evaluations
Submit models for evaluation and view leaderboard
Calculate memory needed to train AI models
Convert Hugging Face models to OpenVINO format
Evaluate adversarial robustness using generative models
Evaluate and submit AI model results for Frugal AI Challenge
Merge LoRA adapters with a base model
Convert PyTorch models to waifu2x-ios format
View and submit LLM evaluations
Compare LLM performance across benchmarks
The LLM Performance Leaderboard is a tool designed to benchmark and compare the performance of various large language models (LLMs). It provides a comprehensive overview of how different models perform across a wide range of tasks and datasets. Users can leverage this leaderboard to make informed decisions about which model best suits their specific needs.
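The leaderboard itself is browsed interactively, but the same kind of comparison can be reproduced offline. The sketch below is a minimal example, assuming the leaderboard results have been exported to a hypothetical results.csv file with model, task, and score columns (these names are assumptions, not part of the leaderboard itself); it ranks models by their average score across tasks.

```python
# Minimal sketch: rank models by average benchmark score.
# Assumes a hypothetical results.csv export with columns: model, task, score.
import pandas as pd

results = pd.read_csv("results.csv")

# Average each model's score over every task it was evaluated on,
# then sort from best to worst.
ranking = (
    results.groupby("model")["score"]
    .mean()
    .sort_values(ascending=False)
)

print(ranking.head(10))  # top 10 models by mean score
```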
1. How often is the leaderboard updated?
The leaderboard is updated regularly to reflect the latest advancements in LLM performance. Updates occur as new models are released or existing models are fine-tuned.
2. Can I compare models based on custom criteria?
Yes, the leaderboard allows users to filter models by specific criteria such as task type, dataset, model size, or architecture (see the filtering sketch after this FAQ).
3. What types of tasks are evaluated on the leaderboard?
The leaderboard evaluates models on a wide range of tasks, including but not limited to natural language understanding, text generation, reasoning, and code completion.
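To illustrate the custom filtering mentioned in question 2, here is a small sketch, again assuming a hypothetical results.csv export, this time with task, model_size_b (parameter count in billions), and architecture columns; none of these column names come from the leaderboard itself.

```python
# Sketch of custom filtering: keep only reasoning-task results from
# models under 13B parameters with a decoder-only architecture.
# Column names (task, model_size_b, architecture) are hypothetical.
import pandas as pd

results = pd.read_csv("results.csv")

filtered = results[
    (results["task"] == "reasoning")
    & (results["model_size_b"] <= 13)
    & (results["architecture"] == "decoder-only")
]

# Rank the remaining models by score on that task.
print(filtered.sort_values("score", ascending=False)[["model", "score"]])
```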