Quantize a model for faster inference
NNCF (Neural Network Compression Framework) quantization is a technique used to reduce the size of deep learning models and improve inference speed by converting model weights and activations from floating-point to lower-bit integer representations. This process maintains model accuracy while enabling faster and more efficient deployment on resource-constrained devices.
pip install nncf
Add from nncf import NNCFConfig to your code.
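A basic post-training quantization flow looks roughly like the following sketch, assuming the PyTorch backend; the MobileNetV2 model and the random calibration tensors are illustrative placeholders for your own model and a small representative dataset.

# Minimal sketch: post-training INT8 quantization with NNCF (PyTorch backend).
# MobileNetV2 and the random tensors stand in for your own model and data.
import nncf
import torch
import torchvision.models as models

model = models.mobilenet_v2(weights=None)
model.eval()

# A few hundred representative samples are typically used to calibrate activation ranges.
calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(100)]
calibration_dataset = nncf.Dataset(calibration_data)

# Converts weights and activations to 8-bit integer representations.
quantized_model = nncf.quantize(model, calibration_dataset)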
What models are supported by NNCF quantization?
NNCF supports a wide range of models, including popular architectures like MobileNet, ResNet, and Inception. It is framework-agnostic and works with TensorFlow, PyTorch, and ONNX models.
Is NNCF quantization free to use?
Yes, NNCF is open-source and free to use under the Apache 2.0 license. It is actively maintained by Intel and the OpenVINO community.
How does NNCF ensure accuracy after quantization?
NNCF employs quantization-aware training and automatic accuracy recovery techniques to minimize accuracy loss. These methods fine-tune the model during quantization to maintain performance.
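As a rough sketch of what quantization-aware training looks like with the NNCFConfig API for PyTorch (the ResNet-18 model and the random initialization data are placeholders; a real workflow fine-tunes the wrapped model on your own dataset afterwards):

# Sketch: quantization-aware training setup with NNCFConfig (PyTorch).
import torch
from torch.utils.data import DataLoader, TensorDataset
import torchvision.models as models
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

model = models.resnet18(weights=None)

# Describe the input shape and select the quantization algorithm.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {"algorithm": "quantization"},
})

# Placeholder loader used to initialize quantizer ranges before fine-tuning.
init_data = TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,)))
nncf_config = register_default_init_args(nncf_config, DataLoader(init_data, batch_size=4))

# Wraps the model with fake-quantization operations; fine-tune it with your
# usual training loop afterwards to recover any lost accuracy.
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)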