Browse and submit LLM evaluations
Browse and filter machine learning models by category and modality
Evaluate LLM over-refusal rates with OR-Bench
View NSQL Scores for Models
Merge LoRA adapters with a base model
Display genomic embedding leaderboard
Pergel: A Unified Benchmark for Evaluating Turkish LLMs
Evaluate adversarial robustness using generative models
Explore and benchmark visual document retrieval models
Visualize model performance on function calling tasks
Display model benchmark results
Display and filter leaderboard models
Compare LLM performance across benchmarks
The Open Tw Llm Leaderboard is an interactive tool designed to compare and evaluate large language models (LLMs). It provides a platform for users to browse, analyze, and submit evaluations of various LLMs, making it easier to understand their performance and capabilities. This tool is part of the broader OpenTW project, which focuses on advancing transparency and accessibility in AI research.
• Model Comparisons: View side-by-side comparisons of different LLMs based on performance metrics.
• Evaluations Browser: Explore a comprehensive database of LLM evaluations across diverse tasks and datasets.
• Submission Interface: Submit your own LLM evaluations for inclusion in the leaderboard.
• Filtering and Sorting: Narrow down models by performance, architecture, or specific use cases (a minimal filtering sketch follows this list).
• Interactive Visualizations: Access charts and graphs to better understand model strengths and weaknesses.
• Community-Driven: Leverage insights and contributions from the broader AI research community.
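As a rough illustration of the filtering and sorting workflow, the sketch below assumes the leaderboard table has been exported to a CSV file. The file name and column names (model_name, architecture, average_score) are assumptions made for the example, not the app's actual schema.

```python
import pandas as pd

# Hypothetical export of the leaderboard table; the file name and
# column names are illustrative, not the app's actual schema.
df = pd.read_csv("leaderboard_results.csv")

# Keep models of one architecture that clear a score threshold.
mask = (df["architecture"] == "LlamaForCausalLM") & (df["average_score"] >= 50)
filtered = df[mask]

# Sort best-first and print a compact side-by-side comparison.
comparison = filtered.sort_values("average_score", ascending=False)
print(comparison[["model_name", "average_score", "architecture"]].head(10))
```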
What is the purpose of the Open Tw Llm Leaderboard?
The leaderboard aims to standardize and simplify the evaluation of LLMs, enabling researchers and developers to make informed decisions about model selection and improvement.
How accurate are the evaluations on the leaderboard?
The evaluations are community-sourced and subject to peer review. While every effort is made to ensure accuracy, results should be interpreted in the context of the methodologies and datasets used.
Can I submit my own LLM evaluation?
Yes, the leaderboard provides a submission interface for users to contribute their evaluations. Submissions are typically reviewed before being added to the public leaderboard.
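Submissions to leaderboards of this kind usually reference a model hosted on the Hugging Face Hub. The snippet below is a minimal, hypothetical pre-submission check using the public huggingface_hub client, assuming the Hub is where the model lives; the leaderboard's own submission form and required fields are defined by the app itself.

```python
from huggingface_hub import HfApi
from huggingface_hub.utils import RepositoryNotFoundError

api = HfApi()
repo_id = "your-org/your-model"  # placeholder repository id

try:
    # Confirm the repository exists and is accessible before submitting.
    info = api.model_info(repo_id)
    print(f"Found {info.id} (downloads: {info.downloads})")
    print("Model is resolvable on the Hub and can be referenced in a submission.")
except RepositoryNotFoundError:
    print(f"{repo_id} is not accessible on the Hub; fix the id or visibility first.")
```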