Quantize a model for faster inference
NNCF (Neural Network Compression Framework) quantization reduces the size of deep learning models and speeds up inference by converting model weights and activations from floating-point to lower-bit integer representations. The process largely preserves model accuracy while enabling faster, more efficient deployment on resource-constrained devices.
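To make the idea concrete, here is a small, library-agnostic illustration of the symmetric INT8 mapping that such schemes build on; the NumPy helpers below are purely illustrative and are not part of the NNCF API.

import numpy as np

# Illustrative symmetric INT8 quantization: map a float tensor onto int8
# with a single scale, then map back to see the rounding error.
def quantize_int8(x):
    scale = max(np.max(np.abs(x)), 1e-8) / 127.0   # guard against all-zero input
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q_weights, scale = quantize_int8(weights)
print("max abs error:", np.max(np.abs(weights - dequantize(q_weights, scale))))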
How do I get started with NNCF quantization?
Install the package with pip install nncf, then add from nncf import NNCFConfig to your code and describe the compression you want in an NNCFConfig object; for post-training quantization you can also call nncf.quantize directly, as sketched below.
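A minimal sketch of the post-training path, assuming a torchvision MobileNet and random stand-in calibration data (replace these with your own model and a few hundred representative samples):

import nncf
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import mobilenet_v2

# Pretrained FP32 model to quantize.
model = mobilenet_v2(weights="DEFAULT").eval()

# Stand-in calibration data; use real, representative samples in practice.
calib_images = torch.randn(32, 3, 224, 224)
calibration_loader = DataLoader(TensorDataset(calib_images), batch_size=8)

# NNCF needs a transform that maps a dataloader item to the model's inputs.
def transform_fn(data_item):
    return data_item[0]

calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)

# Runs calibration and returns an INT8-quantized model.
quantized_model = nncf.quantize(model, calibration_dataset)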
What models are supported by NNCF quantization?
NNCF supports a wide range of models, including popular architectures like MobileNet, ResNet, and Inception. It is framework-agnostic and works with TensorFlow, PyTorch, and ONNX models.
Is NNCF quantization free to use?
Yes, NNCF is open-source and free to use under the Apache 2.0 license. It is actively maintained by Intel and the OpenVINO community.
How does NNCF ensure accuracy after quantization?
NNCF employs quantization-aware training and automatic accuracy recovery techniques to minimize accuracy loss. These methods fine-tune the model during quantization to maintain performance.
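For the quantization-aware training path, here is a hedged sketch of the classic NNCFConfig workflow on PyTorch; the ResNet-18 model, the dummy dataloader, and the config values are assumptions for illustration, and newer NNCF releases may expose the same functionality through different entry points.

import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

model = resnet18(weights="DEFAULT")

# Dummy data; in practice use your real training dataloader.
images = torch.randn(16, 3, 224, 224)
labels = torch.randint(0, 1000, (16,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=4)

nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {"algorithm": "quantization"},
})

# Let NNCF initialize quantizer ranges from the dataloader.
nncf_config = register_default_init_args(nncf_config, train_loader)

# Wraps the model with fake-quantization operations.
compression_ctrl, qat_model = create_compressed_model(model, nncf_config)

After wrapping, qat_model is fine-tuned with an ordinary training loop so the weights adapt to the inserted fake-quantization operations, while compression_ctrl provides utilities such as exporting the compressed model for deployment.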