Cetvel: A Unified Benchmark for Evaluating Turkish LLMs
Cetvel is a unified benchmark tool designed for evaluating Turkish Large Language Models (LLMs). It provides a comprehensive framework for analyzing and comparing the performance of different models across a variety of tasks, making it a useful tool for researchers and developers working on Turkish NLP.
• Comprehensive Task Coverage: Evaluate models on tasks such as translation, summarization, and question-answering specific to the Turkish language.
• Customizable Benchmarks: Create tailored benchmarking suites to focus on specific aspects of model performance.
• Cross-Model Comparisons: Compare multiple Turkish LLMs side-by-side to identify strengths and weaknesses.
• Detailed Reporting: Generate in-depth reports highlighting model accuracy, efficiency, and robustness.
• Integration with Popular LLMs: Supports integration with widely-used Turkish and multilingual LLMs.
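To illustrate the cross-model comparison and reporting features above, here is a minimal, self-contained sketch of how per-task scores could be aggregated into a ranked leaderboard. The model names, task names, and scores are made-up placeholders, not real Cetvel output or API calls:

```python
# Hypothetical illustration of a cross-model comparison report.
# All names and scores below are invented placeholders.

results = {
    "model-a": {"translation": 0.71, "summarization": 0.63, "qa": 0.58},
    "model-b": {"translation": 0.68, "summarization": 0.70, "qa": 0.61},
}

def leaderboard(results):
    """Rank models by their mean score across all evaluated tasks."""
    return sorted(
        ((name, sum(scores.values()) / len(scores))
         for name, scores in results.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

for name, mean_score in leaderboard(results):
    print(f"{name}: {mean_score:.3f}")
```

A real report would also break scores down per task to expose the strengths and weaknesses mentioned above, rather than collapsing everything into a single mean.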
What models are supported by Cetvel?
Cetvel supports a wide range of Turkish and multilingual LLMs, including, but not limited to, models from leading NLP libraries.
Do I need NLP expertise to use Cetvel?
No, Cetvel is designed to be user-friendly. However, basic knowledge of NLP concepts may help in interpreting results.
Can I benchmark models in languages other than Turkish?
Cetvel is primarily optimized for Turkish, but it can be adapted for other languages with additional configuration.