Browse and submit LLM evaluations
The Open Medical-LLM Leaderboard is a platform for benchmarking and comparing large language models (LLMs) in the medical domain. It provides a centralized space to evaluate and track the performance of medical LLMs, helping researchers and practitioners identify the most suitable models for their use cases. The leaderboard is open and accessible: users can browse existing evaluations and submit their own LLM assessments.
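Browsing can also be done programmatically. The sketch below assumes the leaderboard publishes its results as a Hugging Face Hub dataset; the repo id and the column names ("model_name", "average_score") are placeholders for illustration, not the leaderboard's confirmed schema.

```python
# A minimal sketch of pulling leaderboard results programmatically.
# Assumption: evaluations are published as a Hub dataset. The repo id
# and column names below are hypothetical -- substitute the dataset
# actually linked from the leaderboard page.
from datasets import load_dataset

# Hypothetical repo id for the published results.
results = load_dataset("your-org/medical-leaderboard-results", split="train")

# Rank models by a hypothetical aggregate score column and show the top five.
top = sorted(results, key=lambda row: row.get("average_score", 0.0), reverse=True)[:5]
for row in top:
    print(row.get("model_name"), row.get("average_score"))
```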
What is the purpose of Open Medical-LLM Leaderboard?
The purpose is to provide a transparent and accessible platform for benchmarking and comparing medical LLMs, helping users identify the best models for their specific applications.
How do I submit an evaluation for a new LLM?
Use the platform's submission interface to provide your model and its evaluation results, and make sure the submission complies with the platform's guidelines and data requirements. A pre-submission sanity check is sketched below.
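Before submitting, it can help to confirm that the model is actually reachable on the Hugging Face Hub. This is a hedged sketch, not this leaderboard's documented workflow: the repo id is a placeholder, and the safetensors check reflects a common leaderboard convention rather than a confirmed requirement of this platform.

```python
# A minimal pre-submission sanity check using the Hugging Face Hub API.
# Assumption: the model to be evaluated must be publicly available on the Hub.
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-org/your-medical-llm"  # placeholder repo id

info = api.model_info(repo_id)
print(f"Model: {info.id}")
print(f"Last modified: {info.last_modified}")

# Many leaderboards expect weights in safetensors format; treat this as an
# assumption here, not a confirmed rule of this leaderboard.
has_safetensors = any(f.rfilename.endswith(".safetensors") for f in (info.siblings or []))
print(f"Has safetensors weights: {has_safetensors}")
```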
How often is the leaderboard updated?
The leaderboard is updated regularly as new models and evaluations are submitted. Check the platform's announcements or notifications to stay informed about the latest additions.