Visualize model performance on function calling tasks
The Nexus Function Calling Leaderboard is a tool for visualizing and benchmarking model performance on function calling tasks. It provides a platform for comparing and analyzing how effectively different models execute specific functions, helping users make informed decisions based on performance metrics.
• Real-time performance metrics: Track model accuracy, execution speed, and success rates in real time.
• Customizable benchmarks: Define specific function calling tasks to test models in scenarios relevant to your use case (see the sketch after this list).
• Comparison tools: Easily compare the performance of multiple models on the same task.
• Visual analytics: Detailed graphs and charts to help interpret performance data.
• Community-driven insights: Access a community-sourced repository of benchmarked models and tasks.
• User-friendly interface: Intuitive dashboard design for seamless navigation and analysis.
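As a concrete illustration of the kind of task the leaderboard benchmarks, here is a minimal sketch of how a single function calling case might be scored: the model's emitted call (function name plus arguments) is compared against a reference call. The JSON format, the score_call helper, and the get_weather example are illustrative assumptions only, not the platform's actual API.

```python
# Hypothetical scoring of one function-calling benchmark case.
# Assumption: the model emits its call as a JSON object with
# "name" and "arguments" keys; this is not Nexus's real format.
import json

def score_call(model_output: str, expected: dict) -> bool:
    """Return True if the model's JSON call exactly matches the reference."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return False  # unparseable output counts as a failed call
    return (
        call.get("name") == expected["name"]
        and call.get("arguments") == expected["arguments"]
    )

# Example case: the model should call get_weather with a city argument.
expected = {"name": "get_weather", "arguments": {"city": "Berlin"}}
model_output = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
print(score_call(model_output, expected))  # True
```

In practice, a leaderboard would aggregate such per-case scores into the accuracy and success-rate metrics listed above; exact-match scoring is just one simple choice, and looser argument matching is equally plausible.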
What models are supported by Nexus Function Calling Leaderboard?
The platform supports a wide range of models, including popular AI frameworks and custom models. Check the documentation for a full list of supported models.
How often are the benchmarks updated?
Benchmarks are updated in real-time as new models are added or existing ones are retested. You can also request specific models to be benchmarked.
Can I use Nexus Function Calling Leaderboard for private benchmarks?
Yes, the platform allows you to run private benchmarks for internal use. Contact support for details on setting up a private instance.