5 Comments
Feb 20 · Liked by Benjamin Marie

Yes, I'm using SFTTrainer. I have never tried `packing=True` due to the incredibly poorly written documentation on the matter: "Note that if you use a packed dataset and if you pass max_steps in the training arguments you will probably train your models for more than few epochs, depending on the way you have configured the packed dataset and the training protocol. Double check that you know and understand what you are doing."

I just trained a new Qwen1.5 adapter with packing set to True. It did seem to help a little: some predictions now stop generating where they should; some still don't. However, I don't know (or understand) what I am doing with packing.
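
For reference, enabling packing is mostly just a flag on SFTTrainer. Here is a minimal sketch with placeholder names (model_id, dataset, tokenizer) and TRL's SFTTrainer API as of early 2024; packing concatenates training examples into fixed-length blocks and inserts EOS tokens between them:

```
# Minimal sketch (placeholder names; TRL SFTTrainer API as of early 2024).
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer

model = AutoModelForCausalLM.from_pretrained(model_id)  # model_id defined elsewhere

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,             # tokenizer defined elsewhere
    train_dataset=dataset,           # a dataset with a "text" column (placeholder)
    dataset_text_field="text",
    max_seq_length=1024,             # length of each packed block (illustrative)
    packing=True,                    # concatenate examples, adding EOS between them
    args=TrainingArguments(output_dir="./out"),  # plus the usual hyperparameters
)
trainer.train()
```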

author

Yes, that's a regular complaint about packing: it seems useful, but the documentation is useless.

For instance, I discovered recently that you don't need to set add_eos_token=True when instantiating the tokenizer if you use packing, since packing adds EOS tokens by itself... Before I understood that, all my training examples had double EOS tokens. In theory this doesn't have much impact, but it's not very clean.
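
A quick way to check whether you are in that situation: tokenize one formatted example with your configured tokenizer and look at the tail of the token IDs (tokenizer and sample_text below are placeholders for your own objects):

```
# Sanity check (placeholder names): does the tokenizer already append EOS?
ids = tokenizer(sample_text)["input_ids"]
# If this prints True and you also use packing=True, the packed stream gets doubled EOS tokens.
print(ids[-1] == tokenizer.eos_token_id)
```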

Feb 19 · Liked by Benjamin Marie

Just like with Phi, when I LoRA-tune Qwen1.5 7B, it won't stop generating text until max_tokens is reached. Mistral 7B definitely doesn't have this issue. I'm using these arguments, with no quantization and 10k training examples (a fuller sketch follows the list):

```
learning_rate = 2e-4
lr_scheduler_type = 'linear'
num_train_epochs = 5
warmup_ratio = 0.0
weight_decay = 0.01
optim = 'adamw_torch_fused'
target_modules = 'all-linear'
bf16 = True
```
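
In case it helps, here is roughly how those values get wired into TrainingArguments and a PEFT LoraConfig; the rank, alpha, and output directory below are illustrative placeholders, not values from the list above:

```
# Rough sketch of the setup above (r, lora_alpha, and output_dir are illustrative placeholders).
from peft import LoraConfig
from transformers import TrainingArguments

peft_config = LoraConfig(
    r=16,                           # illustrative rank
    lora_alpha=16,                  # illustrative
    target_modules="all-linear",    # targets every linear layer (requires peft >= 0.8)
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./qwen1.5-7b-lora", # hypothetical path
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    num_train_epochs=5,
    warmup_ratio=0.0,
    weight_decay=0.01,
    optim="adamw_torch_fused",
    bf16=True,
)
```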

Feb 19 · edited Feb 19 · Liked by Benjamin Marie

And here's how I instantiate my tokenizer:

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
    add_bos_token=True,
    add_eos_token=True,
    padding_side='right',
)

if not tokenizer.pad_token:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
```

author

I have observed the same.

Do you use SFTTrainer?

If you do, maybe setting packing=True will solve the issue, since packing manages the EOS tokens and masking differently. I haven't tried it yet.
