GIFT-Eval: A Benchmark for General Time Series Forecasting
GIFT-Eval is a benchmark platform designed for general time series forecasting. It provides a standardized framework to evaluate and compare the performance of various forecasting models across diverse time series datasets. The platform aims to foster research and development in time series analysis by offering a comprehensive leaderboard and analysis tools.
• Diverse Datasets: Includes a wide range of time series datasets from different domains.
• Multiple Metrics: Evaluates forecasting models using various accuracy metrics (a toy example of one such metric appears after this list).
• Model Support: Compatible with popular time series forecasting models.
• Leaderboard: Displays performance rankings of different models.
• Open Source: Accessible for research and experimentation.
• Comprehensive Documentation: Provides detailed guidelines and best practices.
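As a minimal sketch of what an accuracy metric for point forecasts can look like, the snippet below computes MASE (Mean Absolute Scaled Error) on a toy series. This is an illustration only; the exact set of metrics and their implementation in GIFT-Eval are defined by the platform itself.

```python
import numpy as np

def mase(y_true, y_pred, y_train, seasonality=1):
    """Mean Absolute Scaled Error: forecast error scaled by the in-sample
    error of a naive seasonal forecast. Lower is better."""
    forecast_error = np.mean(np.abs(y_true - y_pred))
    # Naive seasonal baseline on the training series: predict the value
    # observed `seasonality` steps earlier.
    naive_error = np.mean(np.abs(y_train[seasonality:] - y_train[:-seasonality]))
    return forecast_error / naive_error

# Toy example with made-up numbers.
y_train = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
y_true = np.array([13.0, 15.0])
y_pred = np.array([12.5, 14.0])
print(mase(y_true, y_pred, y_train))  # ~0.47 for these values
```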
What is the purpose of GIFT-Eval?
GIFT-Eval is designed to provide a standardized benchmark for comparing time series forecasting models, enabling researchers and practitioners to evaluate model performance comprehensively.
How do I submit my model to GIFT-Eval?
To submit your model, run it on the benchmark datasets, format the resulting forecasts and metrics as described in the platform's documentation, and upload them through the provided submission interface. A rough sketch of assembling a results file appears below.
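For illustration only, the sketch below writes per-dataset scores to a CSV file. The column names (`dataset`, `model`, `MASE`, `CRPS`) and the dataset identifiers are assumptions made for this example, not the actual GIFT-Eval submission schema; the required format is specified in the platform's documentation.

```python
import csv

# Hypothetical per-dataset scores produced by your evaluation run.
results = [
    {"dataset": "electricity/H", "model": "my_model", "MASE": 0.82, "CRPS": 0.05},
    {"dataset": "traffic/H", "model": "my_model", "MASE": 0.91, "CRPS": 0.07},
]

# Write a results table; check the GIFT-Eval docs for the required columns.
with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["dataset", "model", "MASE", "CRPS"])
    writer.writeheader()
    writer.writerows(results)
```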
Can I use GIFT-Eval for my own datasets?
Yes, GIFT-Eval supports custom datasets. Format your data according to the platform's requirements and run the benchmarking process to evaluate your models; a sketch of one common time series layout follows.
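One common layout for time series benchmarks is a list of records, each with a start timestamp, a frequency, a target array, and an item id (the GluonTS-style convention). Whether GIFT-Eval expects exactly these field names is an assumption here, so confirm the required schema in the platform's documentation.

```python
import numpy as np
import pandas as pd

# A custom dataset as a list of univariate series records. The field names
# ("item_id", "start", "freq", "target") follow the common GluonTS-style
# convention and are assumptions for this illustration.
custom_dataset = [
    {
        "item_id": "sensor_1",
        "start": pd.Timestamp("2023-01-01 00:00"),
        "freq": "H",
        "target": np.random.default_rng(0).normal(size=168),  # one week, hourly
    },
    {
        "item_id": "sensor_2",
        "start": pd.Timestamp("2023-01-01 00:00"),
        "freq": "H",
        "target": np.random.default_rng(1).normal(size=168),
    },
]

# Split each series into an input context and a forecast horizon to evaluate.
horizon = 24
for record in custom_dataset:
    context, future = record["target"][:-horizon], record["target"][-horizon:]
    print(record["item_id"], len(context), len(future))
```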