Quantize a model for faster inference
NNCF (Neural Network Compression Framework) quantization is a technique that reduces the size of deep learning models and improves inference speed by converting model weights and activations from floating-point to lower-bit integer representations, typically 8-bit. The process preserves most of the model's accuracy while enabling faster, more efficient deployment on resource-constrained devices.
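To make the conversion concrete, here is a toy sketch of symmetric 8-bit quantization in plain NumPy; it illustrates the float-to-integer mapping only and is not how NNCF is implemented internally:

```python
import numpy as np

# Toy symmetric int8 quantization: a float tensor is mapped onto int8
# values through a single scale factor, then mapped back.
weights = np.random.randn(4, 4).astype(np.float32)

scale = np.abs(weights).max() / 127.0                # largest magnitude maps to 127
q_weights = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)

# Dequantize to see the rounding error the lower-bit representation introduces.
dequantized = q_weights.astype(np.float32) * scale
print("max round-trip error:", np.abs(weights - dequantized).max())
```

Real toolkits like NNCF pick scales more carefully (e.g. per channel) and calibrate them on sample data, which is why a calibration dataset appears in the workflows below.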
Install NNCF with pip:
pip install nncf
Then add from nncf import NNCFConfig to your code.
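As a rough sketch of how NNCFConfig is typically used with the PyTorch backend (the ResNet-18 model, FakeData calibration set, and output file name below are illustrative placeholders, not part of the official instructions):

```python
import torch
from torchvision import datasets, models, transforms
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

# Placeholder calibration data so NNCF can initialize quantizer ranges;
# swap FakeData for your real dataset.
calib_data = datasets.FakeData(size=32, transform=transforms.ToTensor())
calib_loader = torch.utils.data.DataLoader(calib_data, batch_size=8)

nncf_config = NNCFConfig({
    "input_info": {"sample_size": [1, 3, 224, 224]},  # expected model input shape
    "compression": {"algorithm": "quantization"},     # 8-bit quantization
})
nncf_config = register_default_init_args(nncf_config, calib_loader)

# Wrap the model: NNCF inserts quantization ops around weights and activations.
compression_ctrl, quantized_model = create_compressed_model(
    models.resnet18(), nncf_config)

compression_ctrl.export_model("resnet18_int8.onnx")  # export for deployment
```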
What models are supported by NNCF quantization?
NNCF supports a wide range of models, including popular architectures like MobileNet, ResNet, and Inception. It is framework-agnostic and works with TensorFlow, PyTorch, and ONNX models.
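For instance, a recent NNCF release that ships the one-call nncf.quantize post-training API can quantize a torchvision ResNet-18 roughly like this (FakeData stands in for a real calibration set, and pretrained weights would normally be loaded first):

```python
import nncf
import torch
from torchvision import datasets, models, transforms

# Post-training quantization of a torchvision ResNet-18.
model = models.resnet18().eval()

calib_data = datasets.FakeData(size=300, transform=transforms.ToTensor())
calib_loader = torch.utils.data.DataLoader(calib_data, batch_size=1)

def transform_fn(item):
    images, _ = item  # nncf.quantize only needs model inputs, not labels
    return images

calibration_dataset = nncf.Dataset(calib_loader, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)
```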
Is NNCF quantization free to use?
Yes, NNCF is open-source and free to use under the Apache 2.0 license. It is actively maintained by Intel and the OpenVINO community.
How does NNCF ensure accuracy after quantization?
NNCF employs quantization-aware training and automatic accuracy recovery techniques to minimize accuracy loss. These methods fine-tune the model during quantization to maintain performance.
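Sketched below is what such a quantization-aware fine-tuning loop can look like with NNCF's PyTorch backend; the tiny model, fake data, and hyperparameters are illustrative placeholders:

```python
import torch
import torch.nn.functional as F
from torch import nn
from torchvision import datasets, transforms
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

# Tiny stand-in network and fake data; substitute your own model and dataset.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 10))
train_data = datasets.FakeData(size=64, image_size=(3, 32, 32),
                               transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_data, batch_size=8)

# Config-driven quantization-aware training: NNCF inserts fake-quantize ops,
# then ordinary fine-tuning recovers accuracy lost to quantization noise.
nncf_config = register_default_init_args(
    NNCFConfig({"input_info": {"sample_size": [1, 3, 32, 32]},
                "compression": {"algorithm": "quantization"}}),
    train_loader)
compression_ctrl, model = create_compressed_model(model, nncf_config)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for epoch in range(2):                       # short fine-tuning pass
    compression_ctrl.scheduler.epoch_step()  # advance the compression schedule
    for images, labels in train_loader:
        compression_ctrl.scheduler.step()
        optimizer.zero_grad()
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()
```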