Dropout

#116

by Muennighoff - opened Sep 22, 2022

Discussion

Muennighoff

BigScience Workshop org Sep 22, 2022

Shouldn't the dropouts in the config be 0.1, since the model was pre-trained with dropout? @TimeRobber @ybelkada

TimeRobber

BigScience Workshop org Sep 27, 2022

I don't know about this. I think this depends on what we want those configs to reflect:

1) training procedure? In that sense, yes, we did use dropout 0.1, so we can update those.
2) best training procedure? My strong intuition is that we shouldn't have used dropout. PaLM didn't set it, for example.
3) best config for finetuning? I think in this case we've seen that dropout has a substantial impact on downstream tasks: https://arxiv.org/abs/2204.05832

Muennighoff

BigScience Workshop org Sep 27, 2022

I think either 1) or 3), so we should change the config, no?
2) could be the default parameters in transformers, but imo not for a model on the hub that was trained differently.

TimeRobber

BigScience Workshop org Sep 28, 2022

No strong opinion, but I feel this should already be answered somewhere. cc @patrickvonplaten

ybelkada

BigScience Workshop org Sep 29, 2022

edited Sep 29, 2022

I second what @TimeRobber said; I don't have a strong opinion on this. But it would be nice to update the config with the value used for training, i.e. 0.1, so that the config file reflects the training setup.
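
For reference, a minimal sketch of what such a config update could look like. The key names `hidden_dropout` and `attention_dropout` are an assumption (they follow the BLOOM-style config in transformers; other architectures name these fields differently), and the sample config dict and the `set_dropout` helper are hypothetical, not taken from the actual repo.

```python
import json

# Assumed key names: BLOOM-style configs in transformers use these two fields.
# Other architectures may use different names.
DROPOUT_KEYS = ("hidden_dropout", "attention_dropout")


def set_dropout(config: dict, rate: float) -> dict:
    """Return a copy of `config` with every dropout key set to `rate`."""
    if not 0.0 <= rate < 1.0:
        raise ValueError(f"dropout rate must be in [0, 1), got {rate}")
    updated = dict(config)  # shallow copy so the input stays untouched
    for key in DROPOUT_KEYS:
        updated[key] = rate
    return updated


# Hypothetical config fragment; the 0.0 values are an assumption about
# what the published config currently contains.
published = {"hidden_dropout": 0.0, "attention_dropout": 0.0, "n_layer": 2}

# Config that reflects the training procedure (dropout 0.1).
training_like = set_dropout(published, 0.1)
print(json.dumps(training_like, indent=2))
```

Whichever value ends up in the hub config, anyone finetuning can still set the other one on their side, so the choice mostly decides what the default communicates.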
