Hacker News

If anyone wants to fine-tune the 1.5B model, I ported the GPT-2 code to TPUs. You can fine-tune it in Colab. Snapshots are 5.8 GB.

notebook: https://twitter.com/theshawwn/status/1191800180192010246

code: https://github.com/shawwn/gpt-2

It's a fork of nshepperd's gpt-2 codebase (https://github.com/nshepperd/gpt-2) which lets you fine-tune 117M and 345M on GPUs.

For a tutorial on how to fine-tune GPT-2, see http://gwern.net/GPT-2



Cool, this is awesome!

I’m going to try to retrain this with a Twitter dataset called Sentiment140 (I have already processed it with GPT-2 345M).


Is your fine-tuned model available somewhere?


I can provide it to you. I have only done 355M. I was trying this for 1.5B but ran into memory issues.


Sorry about the memory issue! I’ll have a fix up later today. Some info: https://twitter.com/theshawwn/status/1192038627854946304?s=2...


I would be very interested! My email is on my profile.


Email sent



