Why always Python? Let's fine-tune a code generation model for TypeScript only!
Dataset
I created a TypeScript-Instruct 20K dataset: 20,000 {instruction, output} pairs that you won't find in any current code generation LLM dataset (or maybe you will).
For the outputs (thank you, HuggingFace), I took TypeScript code from The Stack dataset.
For the instructions (thank you, OpenAI), I made 20K API requests to generate an instruction and explanation for each code sample.
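The instruction-generation step can be sketched like this. The prompt wording and helper names below are my own illustration, not the actual prompts used for the dataset, and the real pipeline sent each prompt to the OpenAI API 20K times:

```python
import json

def build_prompt(ts_code: str) -> str:
    # Prompt asking a model to write an instruction for a TypeScript snippet.
    # The exact wording here is an assumption, not the dataset's real prompt.
    return (
        "Below is a TypeScript code snippet. Write a short instruction that "
        "a developer could give to produce this code, plus a one-sentence "
        "explanation of what the code does.\n\n"
        f"```typescript\n{ts_code}\n```"
    )

def to_record(instruction: str, ts_code: str) -> str:
    # One {instruction, output} pair, serialized as a JSONL line.
    return json.dumps({"instruction": instruction, "output": ts_code})

snippet = "const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);"
prompt = build_prompt(snippet)
record = to_record("Write a function that sums an array of numbers.", snippet)
print(record)
```

In the real run, the model's reply to each prompt becomes the `instruction` field, and the original Stack snippet stays as the `output`.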
Training
Base model: Code Llama - Instruct 13B
Parameter-Efficient Fine-Tuning method: LoRA
Instruction tuning on two A100 GPUs
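As a rough numeric illustration (not the actual training code), LoRA freezes the pretrained weight matrix W and learns only a low-rank update B @ A, cutting trainable parameters from d*k down to r*(d + k):

```python
import numpy as np

d, k, r = 512, 512, 8               # layer dims and LoRA rank (illustrative values)
alpha = 16                          # LoRA scaling factor

rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))         # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                # trainable, zero-initialized so the update starts at 0

x = rng.normal(size=(k,))
# Forward pass: frozen path plus the scaled low-rank update.
y = W @ x + (alpha / r) * (B @ (A @ x))

full_params = d * k                 # 262144 trainable params without LoRA
lora_params = r * (d + k)           # 8192 trainable params with LoRA
print(full_params, lora_params)
```

Because B starts at zero, the adapted layer initially behaves exactly like the base layer, and training only has to move the small A and B matrices.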
Everything else about training (hyperparameters, logs, ...) can be found here (link the HuggingFace Training Metrics link later)
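For instruction tuning, each {instruction, output} pair has to be rendered into the base model's chat format. Code Llama - Instruct follows the Llama 2 [INST] convention, so a training sample might look roughly like this (the exact special tokens are an assumption here; the tokenizer's own chat template is authoritative):

```python
def format_sample(instruction: str, output: str) -> str:
    # Llama 2 / Code Llama - Instruct style prompt. The precise placement of
    # <s>, </s>, and spaces may differ from the tokenizer's chat template.
    return f"<s>[INST] {instruction} [/INST] {output} </s>"

sample = format_sample(
    "Write a TypeScript function that checks if a string is a palindrome.",
    "const isPalindrome = (s: string): boolean => s === [...s].reverse().join('');",
)
print(sample)
```

The instruction sits inside the [INST] ... [/INST] span, and the loss is typically computed only on the output tokens that follow it.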