The Weekly Kaitchup #19

Benjamin Marie

Dec 15, 2023

The Falcon Recipe - Phi-2 - DeciLM 7B

Read →

17 Comments

Matt

Dec 15, 2023·edited Dec 15, 2023Liked by Benjamin Marie

Yep, I had already done that, but the problem remains. In your Medium article about Phi-1.5, you mentioned this:

"The problem here is that phi-1.5 was pre-trained without padding and the implementation of MixFormerSequentialForCausalLM released by Microsoft with the model doesn’t support attention masking during training. In other words, we can’t properly fine-tune the model to learn when to stop generating. Pad tokens are interpreted as normal tokens. You would have to modify MixFormerSequentialForCausalLM to add support for the attention mask."

Is the same true with Phi-2?

https://medium.com/@bnjmn_marie/how-to-fine-tune-quantize-and-run-microsoft-phi-1-5-e14a1e22ec12

Expand full comment

Reply (1)

Benjamin Marie

Dec 15, 2023Author

I didn't try phi-2, yet. I would guess this is still true. I'll investigate and will reply if I find something.

Expand full comment

Reply (2)

Matt

Dec 26, 2023Liked by Benjamin Marie

I'm still stumped. Have you taken a crack at instruction finetuning Phi-2 yet? I do see a finetuned chat model on HF, but I haven't played with it yet: https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2

Expand full comment

Reply (1)

Benjamin Marie

Dec 26, 2023Author

I'll write an article on Phi-2 fine-tuning for next week. Not sure whether I'll succeed to teach it when to stop generating but I have several ideas. I'll let you know here as soon as I have something that works.

Expand full comment

Matt

Dec 15, 2023

This looks interesting. https://huggingface.co/microsoft/phi-1_5/commit/de35f900d3fbba84d3f7c9a72e60488fa2c86221

Expand full comment

Reply (1)

Matt

Dec 15, 2023

as does this: https://huggingface.co/microsoft/phi-1_5/commit/3128bb636a3de36f8204901e4310c4449a2c6ddc

Expand full comment

Matt

Dec 15, 2023Liked by Benjamin Marie

I just LoRA-tuned Phi-2, but it refuses to stop generating until `max_new_tokens` is reached. Phi-1.5 suffered from the same problem. Do you know how to correct it?

Expand full comment

Reply (2)

Benjamin Marie

Dec 15, 2023Author

When you load the tokenizer, do you set "add_eos_token=True" ? This adds eos to all the training examples.

Expand full comment

Benjamin Marie

Dec 28, 2023Author

Did you try to set "eos_token_id=tokenizer.eos_token_id" when calling "model.generate"?

For me, it works. The model stops generating when it generates the EOS token. Without that, the model generates the EOS token but ignores it and continues to generate.

The problem that remains is that it tends to never output the EOS token for several of my testing prompts. But maybe that's just because my model is under-trained to learn when to stop, so I'm fine-tuning it again.

Expand full comment

Reply (1)

Matt

Dec 28, 2023

I just tried adding `eos_token = tokenizer.eos_token` and `eos_token_id = tokenizer.eos_token_id` in every possible place:

* AutoModelForCausalLM.from_pretrained()

* model.config

* model.generation_config

* TextGenerationPipeline()

None of them worked. :(

My PEFT adapter was trained with over 10,000 examples. :'(

Expand full comment

Reply (1)

Benjamin Marie

Dec 28, 2023Author

Does it generate the EOS token and ignores it or it never generates the EOS token? (to see the EOS token, set skip_special_tokens=False when calling decode).

Currently, 1/4 of my testing prompts generate an EOS token.

Since Phi-2 doesn't seem to use an attention mask, 10k examples might not be enough to teach the model when to generate an EOS token.

Expand full comment

Reply (1)