Description
Hello, I have some questions regarding my fine-tuning results. I fine-tuned the Llama-3-8B-Instruct model for 18,010 steps with all-MiniLM-L6-v2 as the encoder. However, the generation results during testing are extremely poor: the model outputs in generation_result.txt are completely incoherent. In contrast, the model's outputs during the training phase looked relatively normal. Why is this happening?
My training command is as follows:
```
setsid env WANDB_MODE=offline PYTHONUNBUFFERED=1 accelerate launch --multi_gpu --num_processes 4 /data2/home/KBLaM_LLama3_8B/experiments/train.py \
    --dataset_dir datasets \
    --train_dataset "synthetic" \
    --N 120000 \
    --B 16 \
    --hf_model_spec "/data2/home/models/Meta-Llama-3-8B-Instruct" \
    --encoder_spec all-MiniLM-L6-v2 \
    --model_save_dir "/data2/home/KBLaM_LLama3_8B/output/test4" \
    --hf_token "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" \
    --sep_query_head \
    --llm_type "llama3" \
    --kb_size 500 \
    --total_steps 20010 \
    --use_cached_embd \
    --use_data_aug \
    --gradient_accm_step 32 > train4.log 2>&1 &
```
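Since --use_cached_embd is set, train.py reads precomputed key/value embeddings rather than encoding the knowledge base on the fly. For reference, I built the cached .npy files with all-MiniLM-L6-v2 roughly along these lines (a minimal sketch, not the repo's actual embedding script; the "key_string"/"description" field names are my assumption about the synthetic dataset's schema):

```python
# Rough sketch of how the cached KB embeddings were produced.
# Assumptions: datasets/synthetic.json is a JSON list whose entries carry
# "key_string" and "description" fields; the real schema may differ.
import json

import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim sentence embeddings

with open("datasets/synthetic.json") as f:
    kb = json.load(f)

keys = [entry["key_string"] for entry in kb]     # e.g. "the description of Velvet Pulse"
values = [entry["description"] for entry in kb]  # the property text itself

key_embds = encoder.encode(keys, show_progress_bar=True)      # shape (N, 384)
value_embds = encoder.encode(values, show_progress_bar=True)  # shape (N, 384)

np.save("datasets/synthetic_all-MiniLM-L6-v2_embd_key.npy", key_embds)
np.save("datasets/synthetic_all-MiniLM-L6-v2_embd_value.npy", value_embds)
```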
My testing command is as follows:
```
CUDA_VISIBLE_DEVICES=1 python /data2/home/KBLaM_LLama3_8B/experiments/eval.py generation \
    --dataset_dir datasets \
    --test_dataset "synthetic.json" \
    --encoder_dir "/data2/home/KBLaM_LLama3_8B/output/test4/stage1_lr_0.0001KBTokenLayerFreq3UseOutlier1KBSize500SepQueryHeadUseDataAugKeyFromkey_all-MiniLM-L6-v2_synthetic_llama3_step_18000_encoder/encoder.pt" \
    --encoder_spec "all-MiniLM-L6-v2" \
    --model_dir "/data2/home/KBLaM_LLama3_8B/output/test4/stage1_lr_0.0001KBTokenLayerFreq3UseOutlier1KBSize500SepQueryHeadUseDataAugKeyFromkey_all-MiniLM-L6-v2_synthetic_llama3_step_18000" \
    --llm_base_dir "/data2/home/models/Meta-Llama-3-8B-Instruct" \
    --llm_type "llama3" \
    --save_dir "/data2/home/KBLaM_LLama3_8B/eval_results/generationtest4" \
    --kb_size 200 \
    --eval_mode kb \
    --kb_token_layer_frequency 3 \
    --precomputed_embed_keys_path "/data2/home/KBLaM_LLama3_8B/datasets/synthetic_all-MiniLM-L6-v2_embd_key.npy" \
    --precomputed_embed_values_path "/data2/home/KBLaM_LLama3_8B/datasets/synthetic_all-MiniLM-L6-v2_embd_value.npy"
```
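One mismatch I noticed while writing this up: training ran with --kb_size 500 while eval uses --kb_size 200, and eval loads the step-18000 checkpoint rather than one at the final step 20010; I don't know whether either of these matters. Before evaluating, I ruled out a corrupted embedding cache with a throwaway check like the one below (all-MiniLM-L6-v2 should give 384-dim vectors; this snippet is mine, not part of KBLaM):

```python
# Throwaway sanity check (not part of KBLaM): confirm the cached embeddings
# load, share a shape, and have the 384 dims that all-MiniLM-L6-v2 produces.
import numpy as np

keys = np.load("/data2/home/KBLaM_LLama3_8B/datasets/synthetic_all-MiniLM-L6-v2_embd_key.npy")
values = np.load("/data2/home/KBLaM_LLama3_8B/datasets/synthetic_all-MiniLM-L6-v2_embd_value.npy")

print(keys.shape, values.shape)  # expected: (N, 384) for both
assert keys.shape == values.shape, "key/value row counts disagree"
assert keys.shape[1] == 384, "dim mismatch: cache built with a different encoder?"
assert np.isfinite(keys).all() and np.isfinite(values).all(), "NaN/inf in cache"
```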
Here is an example of the model's output during training:
```
INFO INPUT IDs SHAPE: torch.Size([16, 48])   train.py:607
INFO KB SHAPE: torch.Size([16, 501, 45056])  train.py:636
INFO GT: <|end_header_id|> What description does Velvet Pulse have?<|eot_id|><|start_header_id|>assistant<|end_header_id|>The description of Velvet Pulse is a high-end audio equipment brand known for its superior sound quality.<|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|>  train.py:637
INFO PRED: def<|eot_id|><|end_header_id|> What insights does The V have?<|eot_id|><|start_header_id|>assistant<|end_header_id|>The description of Velvet Pulse is a luxury-end fashion equipment brand known for its exceptional sound quality.<|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|>  train.py:638
INFO step: 17820, loss: 0.6719280518591404  train.py:680
INFO step: 17821, loss: 0.6504949191585183  train.py:680
INFO step: 17822, loss: 0.6908198706805706  train.py:680
INFO step: 17823, loss: 0.670011792331934   train.py:680
INFO step: 17824, loss: 0.670219199731946   train.py:680
INFO step: 17825, loss: 0.6978795994073153  train.py:680
INFO step: 17826, loss: 0.6766274720430374  train.py:680
INFO step: 17827, loss: 0.6636326257139444  train.py:680
INFO step: 17828, loss: 0.6245416700839996  train.py:680
INFO step: 17829, loss: 0.6708870176225901  train.py:680
```
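For what it's worth, my understanding is that the training-time PRED above is teacher-forced (each token is predicted from the ground-truth prefix), while eval.py generates autoregressively, feeding the model its own previous tokens, so a model can look fine under teacher forcing and still degenerate during free-running generation. A toy illustration of the two decoding modes, using gpt2 as a stand-in (illustrative only, not KBLaM code):

```python
# Toy contrast between teacher-forced prediction (what the training log shows)
# and free-running generation (what eval.py exercises). gpt2 is a stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "The description of Velvet Pulse is a high-end audio equipment brand"
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    # Teacher forcing: position t is predicted from the ground-truth tokens
    # 0..t-1, so one bad prediction never contaminates the next step.
    logits = model(ids).logits
    forced = logits[0, :-1].argmax(-1)
    print("teacher-forced:", tok.decode(forced))

    # Free-running: each new token conditions on the model's own outputs,
    # so errors compound; this is where degenerate repetition shows up.
    out = model.generate(ids[:, :4], max_new_tokens=20, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    print("free-running:  ", tok.decode(out[0]))
```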
Here is the model's output during the testing/generation phase:
```
Model output:
What is the most beautiful? most most beautiful most beautiful most beautiful? beautiful? beautiful most beautiful? beautiful? beautiful? beautiful most beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful? beautiful?
True answer: The objectives of HyperGlide Systems is study bee behavior, assess the impact of pesticides, and develop conservation strategies.
Model output:
the noble??
What a wonderful??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
What a noble??
```