XIN CHUN
Published under the Apache-2.0 license

Generative BI (ChatBI) with AMD MI210/MI300X and EPYC

How to train a high-accuracy private large language model (LLM) to enhance efficiency in data analysis scenarios.

Advanced · Full instructions provided

Things used in this project

Hardware components

AMD Instinct™ MI210 Accelerators
AMD Instinct MI210 accelerators power enterprise, research, and academic HPC and AI workloads for single-server solutions and more.
×4
AMD Instinct™ MI300X Accelerators
AMD Instinct™ MI300X accelerators are designed to deliver leadership performance for Generative AI workloads and HPC applications.
×1
AMD EPYC 9654 96-Core Processor
AMD EPYC™ Processors power the highest-performing x86 servers for the modern data center.
×1

Software apps and online services

vLLM
vLLM is a fast and easy-to-use library for LLM inference and serving (see the serving sketch after this list).
llama.cpp
Inference of Meta's LLaMA model (and others) in pure C/C++
DB-GPT
DB-GPT-Hub is an experimental project that leverages Large Language Models (LLMs) to achieve Text-to-SQL parsing.
Meta-Llama-3.1-70B-Instruct
Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.
CodeLlama-34b-Instruct-hf
Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters.
Qwen2-57B-A14B-Instruct
Qwen2 is the new series of Qwen large language models, which includes a Mixture-of-Experts model (57B-A14B).
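
vLLM handles inference serving in this project. As a rough sketch of how the fine-tuned Text-to-SQL model could be queried once training is done, the snippet below uses vLLM's offline Python API with the same "### Question / ### Answer" prompt format as the training script further down; the merged model path, table schema, and question are hypothetical placeholders, not part of the original project.

from vllm import LLM, SamplingParams

# Load a merged fine-tuned model (placeholder path) onto the available GPUs.
llm = LLM(model="./codellama-34b-sql-merged")
sampling_params = SamplingParams(temperature=0.1, max_tokens=256)

# Prompt in the same format used for fine-tuning in sft-model.py below.
prompt = (
    "### Question: Given the table orders(id, customer_id, amount, created_at), "
    "write a SQL query that returns total sales per customer.\n ### Answer: "
)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)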

Story


Code

sft-model.py

Python
Code for fine-tuning the model with LoRA using SFTTrainer.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer

base_model_name = "codellama/CodeLlama-34b-Instruct-hf"

# Load the base model into GPU memory.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    trust_remote_code=True,
    device_map="auto",
)

# Load the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(
    base_model_name,
    trust_remote_code=True,
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Dataset for fine-tuning (Text-to-SQL training pairs).
training_dataset_name = "xinchun/spider"
training_dataset = load_dataset(
    training_dataset_name,
    data_files=["data/train_sql.json", "data/train_udf.json"],
    split="train",
)

# Check the data.
print(training_dataset)

# Training parameters for SFTTrainer.
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=50,
    logging_steps=50,
    learning_rate=4e-5,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="tensorboard",
)

# Configure LoRA.
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)

# View the number of trainable parameters.
peft_model = get_peft_model(base_model, peft_config)
peft_model.print_trainable_parameters()

# Format each training example as a question/answer prompt.
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

# Initialize the SFT trainer.
sft_trainer = SFTTrainer(
    model=base_model,
    train_dataset=training_dataset,
    peft_config=peft_config,
    formatting_func=formatting_prompts_func,
    tokenizer=tokenizer,
    args=training_arguments,
)

# Run the trainer.
sft_trainer.train()
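
After training, a common next step is to merge the LoRA adapter back into the base weights so the result can be loaded directly by vLLM or converted to GGUF with llama.cpp for CPU inference on the EPYC host. The sketch below assumes the adapter was saved by the run above under ./results; the checkpoint and output paths are hypothetical.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "codellama/CodeLlama-34b-Instruct-hf"
adapter_path = "./results/checkpoint-50"    # hypothetical checkpoint directory
merged_path = "./codellama-34b-sql-merged"  # output directory for the merged model

# Reload the base model in half precision and attach the trained adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
merged_model = PeftModel.from_pretrained(base_model, adapter_path).merge_and_unload()

# Save the merged weights and tokenizer for serving or GGUF conversion.
merged_model.save_pretrained(merged_path)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.save_pretrained(merged_path)

The merged directory can then be converted and quantized with llama.cpp (for example, via its convert_hf_to_gguf.py script followed by llama-quantize) when CPU-side inference is wanted.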

Credits

XIN CHUN
