## Project Overview
I fine-tuned the Llama 3 8B base model into a conversational model using NVIDIA's ChatQA dataset, then quantized it to 4-bit with AWQ (Activation-aware Weight Quantization). This project demonstrates model fine-tuning, efficient quantization, and deployment-ready packaging.
## Details
- Model: Llama 3 8B
- Quantization: AWQ 4-bit
- Dataset: NVIDIA ChatQA training data
- Frameworks and Tools: PyTorch, Hugging Face, Safetensors
- Capabilities: Conversational QA, reasoning over tables and arithmetic, retrieval-augmented generation
- Quantization Benefits: AWQ preserves accuracy at low bit widths while delivering faster inference than GPTQ, making it well suited to high-throughput, multi-user serving.
## Model Repository
Hugging Face Model Repository - `Sreenington/Llama-3-8B-ChatQA-AWQ`
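A sketch of running inference against the quantized checkpoint with Hugging Face Transformers, which detects the AWQ configuration stored in the repo and dispatches to the AWQ kernels. This assumes `transformers`, `autoawq`, and a CUDA GPU are available; imports are kept inside the function so the file can be inspected without heavy dependencies.

```python
def load_and_generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the AWQ checkpoint and generate a completion for `prompt`."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "Sreenington/Llama-3-8B-ChatQA-AWQ"
    tokenizer = AutoTokenizer.from_pretrained(repo)
    # Transformers reads the quantization_config from the checkpoint,
    # so no extra AWQ-specific arguments are needed here.
    model = AutoModelForCausalLM.from_pretrained(
        repo, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens, returning only the newly generated text.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

For high-throughput, multi-user serving (the scenario the benefits above target), the same repo name can instead be passed to an AWQ-aware serving engine such as vLLM.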