2. Overview
This document covers:
1. What is GPT-3?
2. How is GPT-3 different?
3. OpenAI’s API strategy
4. Potential commercial implications
Details and charts are from the OpenAI paper and a talk given on 7/24/2020 by Ben Mann, the second author of the paper.
Please send feedback to Raven Jiang (raven@cs.stanford.edu)
Disclaimer: I am not affiliated with OpenAI, nor an expert in deep learning; I have practical knowledge of its implementation.
3. 1. What is GPT-3?
2. How is GPT-3 different?
3. OpenAI’s API strategy
4. Potential commercial implications
4. What is GPT-3?
• A text-generating deep learning model trained by OpenAI
• Uses the Transformer architecture pioneered by Google
  • Same family as GPT-2, BERT, XLNet, and RoBERTa
• Task-agnostic
• Trained with unsupervised learning
• 100x larger (by parameter count) than its predecessor GPT-2 (2019)
• Estimated to have cost $12 million in compute to train
• Trained on text data from books and websites
5. What does it do?
• Seemingly many things:
  • Translate between languages
  • Write new poetry
  • Generate stories
  • Hold a conversation
  • Answer questions
  • Generate working React code
  • Generate Figma designs
  • Act as a magical VLOOKUP backed by the Internet
• Maybe creativity is not hard for AI
6. GPT-3 training data
• Common Crawl: 59% (scraped web data, manually filtered for some quality issues)
• WebText2: 22%
• Books1: 8% (mostly fiction)
• Books2: 8% (mostly fiction; includes non-English content)
• Wikipedia: 3%
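These percentages are weights in the training mix: during training, examples are drawn from each corpus with roughly these probabilities rather than in proportion to raw corpus size. A toy sketch of that weighted sampling in Python (the sampling mechanics shown here are an illustration; the slide only reports the weights):

import random

# Training-mix weights from the chart above (share of sampled examples).
mix = {"Common Crawl": 0.59, "WebText2": 0.22,
       "Books1": 0.08, "Books2": 0.08, "Wikipedia": 0.03}

# Decide which corpus each training example is drawn from.
sources = random.choices(list(mix), weights=list(mix.values()), k=10)
print(sources)  # e.g. ['Common Crawl', 'WebText2', 'Common Crawl', ...]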
7. Transformer architecture
Transformer-based models vs. older NLP neural network models:

• Examples
  • Transformer-based: Google’s BERT, OpenAI’s GPT-3, Microsoft’s Turing-NLG
  • Older: Google’s GNMT
• Task
  • Transformer-based: task-agnostic; the same model is successful at many different language tasks without additional training
  • Older: trained for a specific task; models are usually trained for a certain task and fine-tuned for a related task with additional training data (e.g., transfer learning)
• Training
  • Transformer-based: unsupervised; the model is trained on large collections of text without special annotations
  • Older: supervised; trained on large quantities of input annotated with expected output, usually human-generated
Transformers are the state of the art for NLP neural networks
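To make the “unsupervised training” row concrete: the training signal is just the text itself shifted by one token, so no human annotation is needed. A minimal toy sketch in Python (tiny vocabulary and a fake uniform model, not OpenAI’s actual training code):

import numpy as np

# Raw, unannotated text is both the input and the label source.
text = "the cat sat on the mat".split()
vocab = {w: i for i, w in enumerate(sorted(set(text)))}
tokens = np.array([vocab[w] for w in text])

inputs = tokens[:-1]   # the model reads tokens 0..n-2 ...
targets = tokens[1:]   # ... and must predict tokens 1..n-1

# A real Transformer would output a probability distribution over the
# vocabulary at each position; here we fake one with uniform probabilities.
probs = np.full((len(inputs), len(vocab)), 1.0 / len(vocab))

# Cross-entropy against the next token is the entire training objective.
loss = -np.log(probs[np.arange(len(targets)), targets]).mean()
print(f"loss: {loss:.3f}")  # log(vocab size) for this uniform dummy model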
8. Example translation workflow
Goal 1: Find the French translation for “You pass butter.”

Transformer:
1. Train on unrelated English and French text
2. Query describes the desired pattern:
"""Translate these sentences:
Hello => Bonjour
That is a cat => C'est un chat
You pass butter =>"""
3. Result:
"""Translate these sentences:
Hello => Bonjour
That is a cat => C'est un chat
You pass butter => Tu passes du beurre"""

Pre-Transformer language models:
1. Create a dataset of English-French text examples
2. Train on the dataset
3. Query:
"You pass butter"
4. Result:
"Tu passes du beurre"
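For concreteness, this is roughly how such a few-shot query might be issued through OpenAI’s beta Python client; the engine name and parameters are assumptions based on the early gated API:

import openai  # beta client; access to the API itself is gated

openai.api_key = "YOUR_API_KEY"  # placeholder

# The prompt is the whole "program": two worked examples establish the
# pattern, and the model is left to complete the third line.
prompt = """Translate these sentences:
Hello => Bonjour
That is a cat => C'est un chat
You pass butter =>"""

response = openai.Completion.create(
    engine="davinci",   # assumed engine name from the beta
    prompt=prompt,
    max_tokens=20,
    temperature=0.0,    # low temperature: translation wants determinism
    stop="\n",          # stop at the end of the completed line
)
print(response.choices[0].text.strip())  # e.g. "Tu passes du beurre"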
9. Generalizability of Transformer models
Goal 2: Tell some dad jokes.

Transformer:
1. Use the same model as the previous task
2. Query describes the new pattern:
"""Here are some great dad jokes:
Q: How do you make a lemon drop? A: Let it fall.
Q: What has ears but cannot hear? A: A cornfield.
Q:"""
3. Result:
"""Here are some great dad jokes:
Q: How do you make a lemon drop? A: Let it fall.
Q: What has ears but cannot hear? A: A cornfield.
Q: How does a vampire start a letter? A: Dear blood."""

Pre-Transformer language models:
1. Create or source a new annotated dataset suited for the new task
2. Retrain the model, either with transfer learning or from scratch
3. Query
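The key point: nothing about the model changes between tasks, only the prompt text does. A sketch reusing the same assumed client call as above:

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Same model, same API call; only the prompt describes the new task.
prompt = """Here are some great dad jokes:
Q: How do you make a lemon drop? A: Let it fall.
Q: What has ears but cannot hear? A: A cornfield.
Q:"""

response = openai.Completion.create(
    engine="davinci",   # same assumed engine as in the translation sketch
    prompt=prompt,
    max_tokens=40,
    temperature=0.8,    # higher temperature: jokes benefit from variety
    stop="\n",
)
print("Q:" + response.choices[0].text)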
10. 1. What is GPT-3?
2. How is GPT-3 different?
3. OpenAI’s API strategy
4. Potential commercial implications
11. How is GPT-3 different?
• It is huge.
  • 175 billion parameters
  • Its predecessor GPT-2 has 1.5 billion parameters
  • The previous record holder, Microsoft’s Turing-NLG, has 17 billion parameters
• An innovation of scale, not technique
[Chart: parameters (billions) by model: GPT-2 (2019/02): 1.5; Turing-NLG (2020/02): 17; GPT-3 (2020/07): 175]
12. Power of scale
• Scale made a dramatic difference in performance
• On one benchmarking task, accuracy increased from 25% to 65% going from 13B to 175B parameters
13. Uncanny Valley
• Participants were asked to spot fake news articles generated by GPT-3
• More parameters = harder to spot
• Detection accuracy is very close to chance (50-50) at GPT-3 scale
14. Returns to scale
• Task performance appears to continue improving with scale
• How will GPT-4 perform?
15. Consequences of scale
• Querying is extremely powerful
• Unexpectedly good performance on a large variety of tasks
• Compared to older task-specific models, API-only access is useful for a broader range of applications
• Caveat: performance is probably still inferior to task-specific models
• Caveat 2: performance may continue to improve with scale
16. 1. What is GPT-3?
2. How is GPT-3 different?
3. OpenAI’s API strategy
4. Potential commercial implications
17. OpenAI’s API strategy
• Gated API access for selected partners
• No access to the underlying GPT-3 model or its trained weights
• Turns NLP from an annotation/training problem into a meta-programming problem
  • Designing queries that yield useful results on a range of language problems (see the prompt-template sketch after this list)
  • A much friendlier paradigm for small teams and product-driven startups
• MaaS (Model as a Service) may be a viable business
  • Extremely large NLP models become OpEx instead of CapEx
  • No need to fine-tune models for each problem, even when more training data is available
• Concerns over AGI risk
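“Meta-programming” here means the engineering effort shifts from collecting labeled data to designing prompt templates. A hypothetical helper to illustrate the workflow (the function and its names are illustrative, not OpenAI’s):

# Hypothetical prompt builder: the template itself is the "program".
def make_prompt(task_description, examples, query):
    """Build a few-shot prompt: a task description, worked examples in
    an `input => output` pattern, then the unanswered query."""
    lines = [task_description]
    lines += [f"{x} => {y}" for x, y in examples]
    lines.append(f"{query} =>")
    return "\n".join(lines)

# Switching tasks means switching template data, not retraining a model.
print(make_prompt(
    "Translate these sentences:",
    [("Hello", "Bonjour"), ("That is a cat", "C'est un chat")],
    "You pass butter",
))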
18. 1. What is GPT-3?
2. How is GPT-3 different?
3. OpenAI’s API strategy
4. Potential commercial implications
19. Potential commercial implications
• Access to the GPT-3 (or a future GPT-4) API accelerates the go-to-market speed of a startup doing applied NLP
  • Build an MVP using GPT-3 without investing in any training data or infrastructure
  • Switch to better-performing fine-tuned models over time (see the interface sketch after this list)
• Companies like Grammarly may face low-cost competitors
• Building NLP-powered product features may be as simple as programming GPT-3 to answer the right questions
• Caveat: only if GPT-3 (or GPT-4) turns out to be Good Enough for these applications; unclear without wider access to the API
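A startup planning the MVP-then-fine-tune path could hide the model behind a thin interface from day one, making the later switch cheap. A hypothetical sketch (class names and structure are my assumptions, not from the talk):

from abc import ABC, abstractmethod

class TextModel(ABC):
    """Thin interface so product code never depends on one backend."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class GPT3Backend(TextModel):
    """Day-one MVP: defer all NLP to the hosted, gated API."""
    def complete(self, prompt: str) -> str:
        import openai  # assumed beta client, as in the earlier sketches
        resp = openai.Completion.create(engine="davinci", prompt=prompt,
                                        max_tokens=50, stop="\n")
        return resp.choices[0].text

class FineTunedBackend(TextModel):
    """Later swap-in: a task-specific model trained on data collected
    while operating the MVP. Stubbed here."""
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("serve your own fine-tuned model")

def answer(model: TextModel, question: str) -> str:
    # Product code sees only the interface; moving from GPT3Backend to
    # FineTunedBackend is then a one-line change at the call site.
    return model.complete(f"Q: {question}\nA:")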
21. Conclusion
• Task-agnostic NLP models that deliver acceptable performance may soon be available as a service
• GPT-3 may be that model
• Potential explosion of startups building MVPs on such an API
• Investor warning: startups dependent on the API may lack the expertise and tools to iterate beyond the MVP
• Happy to chat more: raven@cs.stanford.edu