codewithdark-git/QuantLLM

🧠 QuantLLM: Lightweight Library for Quantized LLM Fine-Tuning and Deployment

πŸ“Œ Overview

QuantLLM is a Python library designed for developers, researchers, and teams who want to fine-tune and deploy large language models (LLMs) efficiently using 4-bit and 8-bit quantization techniques. It provides a modular and flexible framework for:

  • Loading and quantizing models with advanced configurations
  • LoRA / QLoRA-based fine-tuning with customizable parameters
  • Dataset management with preprocessing and splitting
  • Training and evaluation with comprehensive metrics
  • Model checkpointing and versioning
  • Hugging Face Hub integration for model sharing

The goal of QuantLLM is to democratize LLM training, especially in low-resource environments, while keeping the workflow intuitive, modular, and production-ready.
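The savings from 4-bit and 8-bit precision come from storing each weight in fewer bits plus a shared scale factor. As a toy illustration of the general idea (a minimal sketch of absmax int8 quantization, not QuantLLM's actual implementation):

```python
def quantize_absmax_int8(weights):
    """Quantize float weights to int8 values with a per-tensor absmax scale.

    Assumes at least one nonzero weight (real libraries guard this case).
    """
    scale = max(abs(w) for w in weights) / 127  # largest magnitude maps to 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 1.0, -0.07]
q, scale = quantize_absmax_int8(weights)
approx = dequantize_int8(q, scale)
# q fits in int8; each entry of approx is within one scale step of the original
```

Storing `q` (1 byte per weight) plus one float scale is what cuts memory roughly 4x versus float32; 4-bit schemes push the same idea further with coarser grids and per-block scales.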

🎯 Key Features

| Feature | Description |
| --- | --- |
| ✅ Quantized Model Loading | Load any Hugging Face model in 4-bit or 8-bit precision with customizable quantization settings |
| ✅ Advanced Dataset Management | Load, preprocess, and split datasets with flexible configurations |
| ✅ LoRA / QLoRA Fine-Tuning | Memory-efficient fine-tuning with customizable LoRA parameters |
| ✅ Comprehensive Training | Advanced training loop with mixed precision, gradient accumulation, and early stopping |
| ✅ Model Evaluation | Flexible evaluation with custom metrics and batch processing |
| ✅ Checkpoint Management | Save, resume, and manage training checkpoints with versioning |
| ✅ Hub Integration | Push models and checkpoints to the Hugging Face Hub with authentication |
| ✅ Configuration Management | YAML/JSON config support for reproducible experiments |
| ✅ Logging and Monitoring | Comprehensive logging and Weights & Biases integration |
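LoRA's memory efficiency comes from freezing the base weight matrix W and training only a low-rank update, so the effective weight is W + (alpha/r) · B·A, where B is d×r and A is r×k with r much smaller than d and k. A toy pure-Python sketch of that arithmetic (illustrative only; QuantLLM's own API may differ):

```python
def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the effective LoRA-adapted weight."""
    delta = matmul(B, A)          # d x k low-rank update
    s = alpha / r                 # standard LoRA scaling factor
    return [[w + s * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

# Frozen 2x3 base weight; a rank-1 adapter trains only d*r + r*k = 5 numbers
# instead of d*k = 6 (the savings grow quickly for large matrices).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]            # d x r
A = [[0.1, 0.2, 0.3]]         # r x k
W_eff = lora_effective_weight(W, A, B, alpha=1.0, r=1)
```

QLoRA combines this with a quantized (frozen) W, which is why the two techniques appear together above.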

πŸš€ Getting Started

Installation

pip install quantllm

For detailed usage examples and API documentation, please refer to the project documentation.

πŸ’» Hardware Requirements

Minimum Requirements

  • CPU: 4+ cores
  • RAM: 16GB
  • Storage: 20GB free space
  • Python: 3.10+

Recommended Requirements

  • GPU: NVIDIA GPU with 8GB+ VRAM
  • RAM: 32GB
  • Storage: 50GB+ SSD
  • CUDA: 11.7+

Resource Usage Guidelines

| Model Size | 4-bit (GPU RAM) | 8-bit (GPU RAM) | CPU RAM (min) |
| --- | --- | --- | --- |
| 3B params | ~6GB | ~9GB | 16GB |
| 7B params | ~12GB | ~18GB | 32GB |
| 13B params | ~20GB | ~32GB | 64GB |
| 70B params | ~90GB | ~140GB | 256GB |
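These figures can be approximated with a back-of-the-envelope formula: raw weight storage is params × bits / 8 bytes, and activations, optimizer state, and runtime buffers multiply that by some factor. A hedged sketch (the 3x overhead factor is an assumption chosen to roughly match the table, not a formula from QuantLLM):

```python
def estimate_gpu_gb(params_billion, bits, overhead=3.0):
    """Rough GPU memory estimate in GB for a quantized model.

    params_billion: parameter count in billions
    bits: quantization precision (4 or 8)
    overhead: assumed multiplier (~2-4x) for activations/optimizer state
    """
    weight_gb = params_billion * 1e9 * bits / 8 / 1e9  # raw weight bytes -> GB
    return weight_gb * overhead

# A 7B model in 4-bit: 3.5 GB of raw weights, ~10.5 GB with 3x overhead,
# in the same ballpark as the ~12 GB figure in the table above.
```

In practice the overhead factor shrinks for very large models (the 70B rows imply closer to 2x), so treat this as a planning heuristic, not a guarantee.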

πŸ”„ Version Compatibility

| QuantLLM | Python | PyTorch | Transformers | CUDA |
| --- | --- | --- | --- | --- |
| latest | ≥3.10 | ≥2.0.0 | ≥4.30.0 | ≥11.7 |

πŸ—Ί Roadmap

  • Multi-GPU training support
  • AutoML for hyperparameter tuning
  • More quantization methods
  • Custom model architecture support
  • Enhanced logging and visualization
  • Model compression techniques
  • Deployment optimizations

🀝 Contributing

We welcome contributions! Please see our CONTRIBUTE.md for guidelines and setup instructions.

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

πŸ“« Contact & Support
