Quantize a model for faster inference
NNCF (Neural Network Compression Framework) quantization reduces the size of deep learning models and speeds up inference by converting model weights and activations from floating-point to lower-bit integer representations. Done carefully, this preserves accuracy close to the original model while enabling faster and more efficient deployment on resource-constrained devices.
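To make the idea concrete, here is a small framework-free sketch of 8-bit affine quantization. The numbers are illustrative only and do not reflect NNCF's internal scheme.

import numpy as np

# Toy example: map float32 weights onto 8-bit unsigned integers.
weights = np.array([-1.2, 0.0, 0.7, 2.5], dtype=np.float32)

scale = (weights.max() - weights.min()) / 255.0   # step size per integer level
zero_point = round(-weights.min() / scale)        # integer that represents 0.0
quantized = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)

# Dequantize to see the small rounding error the technique trades for speed and size.
dequantized = (quantized.astype(np.float32) - zero_point) * scale
print(quantized, dequantized)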
Install NNCF with pip:

pip install nncf

Then add from nncf import NNCFConfig to your code.
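As a starting point, the sketch below shows NNCF's post-training quantization flow for a PyTorch model via nncf.quantize. The ResNet-18 model and FakeData calibration set are placeholders; substitute your own model and validation data.

import nncf
import torch
from torchvision import datasets, models, transforms

# Placeholder model and calibration data -- replace with your own.
model = models.resnet18(weights="DEFAULT")
val_data = datasets.FakeData(size=64, transform=transforms.ToTensor())
calibration_loader = torch.utils.data.DataLoader(val_data, batch_size=8)

# Tell NNCF how to pull model inputs out of each batch.
def transform_fn(data_item):
    images, _labels = data_item
    return images

calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)

# Post-training quantization: statistics are collected on the calibration
# set, then weights and activations are converted to 8-bit representations.
quantized_model = nncf.quantize(model, calibration_dataset)

The calibration set only needs a few hundred representative samples at most; no labels or retraining are required for this flow.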
What models are supported by NNCF quantization?
NNCF supports a wide range of models, including popular architectures like MobileNet, ResNet, and Inception. It is framework-agnostic and works with TensorFlow, PyTorch, and ONNX models.
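For instance, the same nncf.quantize entry point can be applied to an ONNX model. A rough sketch follows; the file paths and random calibration tensors are hypothetical stand-ins for a real exported model and dataset.

import numpy as np
import nncf
import onnx

# Hypothetical ONNX model path; substitute your own exported model.
model = onnx.load("mobilenet_v2.onnx")
input_name = model.graph.input[0].name

# A handful of calibration samples shaped like the model's input.
calibration_data = [
    {input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)}
    for _ in range(32)
]
calibration_dataset = nncf.Dataset(calibration_data)

quantized = nncf.quantize(model, calibration_dataset)
onnx.save(quantized, "mobilenet_v2_int8.onnx")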
Is NNCF quantization free to use?
Yes, NNCF is open-source and free to use under the Apache 2.0 license. It is actively maintained by Intel and the OpenVINO community.
How does NNCF ensure accuracy after quantization?
NNCF employs quantization-aware training and automatic accuracy recovery techniques to minimize accuracy loss. These methods fine-tune the model during quantization to maintain performance.
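A rough quantization-aware training setup using NNCFConfig might look like the following. The input shape, model, and exported filename are assumptions for illustration; the actual training loop is your own.

import torch
from nncf import NNCFConfig
from nncf.torch import create_compressed_model
from torchvision import models

model = models.resnet18(weights="DEFAULT")

# Assumed config: 8-bit quantization with a sample input shape for tracing.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {"algorithm": "quantization"},
})

# Wraps the model with fake-quantization operations so a normal training
# loop fine-tunes it while simulating 8-bit inference arithmetic.
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# ... run your usual training loop on compressed_model ...
# Afterwards, export the fine-tuned model for deployment:
# compression_ctrl.export_model("resnet18_int8.onnx")

Because the fake-quantization operations are present during fine-tuning, the optimizer compensates for rounding error directly, which is why accuracy after conversion stays close to the floating-point baseline.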