The document discusses recent developments in pre-trained language models, including ELMo, ULMFiT, BERT, and GPT-2. It provides overviews of each model's core structure and implementation, noting that they achieve strong performance on natural language tasks without requiring labeled data for pre-training, much as pre-training helps in computer vision tasks. The document also includes a chart comparing the natural language tasks each model can perform.
2. Why ‘pre-trained language model’?
● Achieved great performance on a variety of language tasks.
● Similar to how ImageNet classification pre-training helps many vision tasks (*)
● Even better than in the CV case, it does not require labeled data for pre-training (see the sketch below).
(*) Although He et al. (2018) recently found that pre-training might not be necessary for the image segmentation task.
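To make the last bullet concrete: language-model pre-training is self-supervised, so the training labels come from the raw text itself rather than from human annotation. Below is a minimal Python sketch (illustrative only, not from the original slides) of how next-token prediction pairs are manufactured from unlabeled text:

    # Self-supervised language modeling: each next token serves as the
    # "label" for the tokens that precede it, so raw text needs no annotation.
    raw_text = "pre-trained language models learn from unlabeled text"
    tokens = raw_text.split()

    # Build (context, next-token) training pairs directly from the text.
    pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

    for context, target in pairs[:3]:
        print(context, "->", target)

This is the contrast with ImageNet-style pre-training in vision, which relies on human-labeled class annotations.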