Slide 3
Overview
Suggest novel methods to apply NLP approaches to the music domain
Introduce MusicBERT, a large-scale pre-trained model for symbolic music understanding
Evaluate its performance on four tasks
Slide 5
Contributions
Construct a large-scale symbolic music corpus: the Million MIDI Dataset (MMD)
Design mechanisms to enhance pre-training with symbolic music data (OctupleMIDI encoding & masking strategies)
Achieve state-of-the-art results on four music understanding tasks: melody completion, accompaniment suggestion, genre classification, and style classification
Slide 6
Related Work
Symbolic Music Understanding
• Word2vec models for music: Huang et al., 2016; Madjiheurem et al., 2016
• Dividing music pieces into fixed-duration slices: Herremans et al., 2017; Chuan et al., 2020
• Limitation: small NN models with only a few music tokens as inputs
Symbolic Music Encoding
• MIDI-based: MIDI, REMI (Huang and Yang, 2020), CP (Hsiao et al., 2021)
• Pianoroll-based: Brunner et al., 2018; Ji et al., 2020
• Limitation: still need long input token sequences
Masking Strategies in Pre-training
• MASS (Song et al., 2019), SpanBERT (Joshi et al., 2020)
• Limitation: do not consider the differences between NLP and music
Slide 8
Model Overview
MusicBERT, a large-scale Transformer model for symbolic music understanding
• Based on the Transformer encoder (Vaswani et al., 2017)
• Uses a novel encoding method, OctupleMIDI, to encode music sequences more efficiently
• Predicts music tokens as output
Slide 15
OctupleMIDI Encoding
OctupleMIDI, a compact symbolic music encoding method
• Encodes each note as a single token (e.g., 6 notes → 6 tokens)
• Much shorter than REMI & CP
• Applies to various kinds of music
Each octuple token:
• Corresponds to one note
• Contains 8 elements
Slide 16
OctupleMIDI Encoding
Time Signature: a fraction (e.g., 2/4) giving
• the length of a beat (note duration, e.g., a quarter note in 2/4)
• the number of beats in a bar (e.g., 2 beats in 2/4)
Tempo: beats per minute (BPM)
• the pace of the music
• from 16 to 256 in OctupleMIDI
Bar and Position: the onset time of a note
• 256 bars in a music piece (0 to 255)
• position within a bar in 1/64-note steps (from 0)
Slide 17
OctupleMIDI Encoding
Instrument: follows the MIDI format
• 129 tokens to represent instruments
• 0 to 127: general instruments (e.g., piano and bass)
• 128: the special percussion instrument (e.g., drums)
Pitch
• general instruments: 128 tokens for pitch values (following the MIDI format)
• percussion instruments: 128 pitch tokens for the percussion type
Duration
• 128 duration tokens (percussion: all set to 0)
Velocity: quantized into 32 values
• interval of 4 (e.g., 2, 6, 10, …, 122, 126)
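The eight elements above can be sketched as a single data structure with the vocabulary ranges listed on these slides. This is a minimal illustration: the field names, validation checks, and the velocity quantizer are my own, not the paper's implementation.

```python
from dataclasses import dataclass

# Sketch of one OctupleMIDI token: one note = one token with 8 elements.
# Field names and ranges follow the slides' descriptions, but are
# illustrative, not taken from the official MusicBERT code.
@dataclass
class OctupleToken:
    time_signature: int  # index into a table of time signatures (e.g., 2/4)
    tempo: int           # quantized BPM, 16..256
    bar: int             # bar index, 0..255
    position: int        # onset within the bar, in 1/64-note steps
    instrument: int      # 0..127 general MIDI programs, 128 = percussion
    pitch: int           # 0..127 (percussion: separate 128 drum-type tokens)
    duration: int        # one of 128 duration tokens (percussion: 0)
    velocity: int        # one of 32 quantized bins

    def __post_init__(self):
        # Sanity checks mirroring the ranges stated on the slides.
        assert 16 <= self.tempo <= 256
        assert 0 <= self.bar <= 255
        assert 0 <= self.instrument <= 128
        assert 0 <= self.velocity < 32

def quantize_velocity(v: int) -> int:
    """Map a raw MIDI velocity (0..127) into one of 32 bins (interval of 4)."""
    return min(v // 4, 31)
```

A token is then built per note, e.g. `OctupleToken(0, 120, 3, 16, 0, 60, 8, quantize_velocity(100))` for a piano note in bar 3.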
Slide 18
Masking Strategy
Bar-level masking strategy:
Elements of the same type in the same bar are masked simultaneously
Avoids information leakage & helps learn contextual representations well
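The bar-level rule above can be sketched as follows. This is a schematic, assuming octuple tokens are 8-element lists with the bar index at position 2; the mask ratio, mask id, and function name are illustrative choices, not the paper's hyperparameters.

```python
import random

def bar_level_mask(tokens, element_idx, mask_ratio=0.15, mask_id=-1, seed=0):
    """Bar-level masking sketch: pick bars at random and, for a chosen
    element type (one of the 8 octuple positions), mask that element in
    EVERY token of each picked bar. Masking the whole bar at once means
    the model cannot simply copy the masked value from a neighbouring
    note in the same bar (information leakage).

    tokens: list of 8-element lists; element 2 is assumed to be the bar index.
    """
    rng = random.Random(seed)
    bars = sorted({t[2] for t in tokens})
    picked = {b for b in bars if rng.random() < mask_ratio}
    masked = [list(t) for t in tokens]  # leave the input untouched
    for t in masked:
        if t[2] in picked:
            t[element_idx] = mask_id
    return masked
```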
Slide 19
Pre-training Corpus
Table 2. Size of different music datasets
OctupleMIDI encoding is universal: most MIDI files can be converted without noticeable loss of musical information
After cleaning and deduplication, obtain the Million-MIDI Dataset (MMD): 1.5 million songs with 2 billion octuple tokens (musical notes)
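The slide does not spell out the cleaning pipeline, but one common deduplication approach is to hash each song's note content and drop repeats. The sketch below is only a plausible illustration of that idea, not the MMD pipeline.

```python
import hashlib

def dedup_songs(songs):
    """Hypothetical deduplication sketch: two songs count as duplicates
    when their (pitch, duration) note sequences hash identically. The
    actual MMD cleaning procedure is not described on this slide, so this
    is one plausible approach, not the paper's method.

    songs: list of songs, each a list of (pitch, duration) pairs.
    """
    seen, unique = set(), []
    for song in songs:
        key = hashlib.md5(repr(song).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(song)
    return unique
```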
Slide 20
Experiments & Results
Pre-training Setup / Fine-tuning MusicBERT / Method Analysis
Table 4. Model configurations of MusicBERT
MusicBERT-small: to compare with baselines (similar data size)
MusicBERT-base: to achieve the SOTA results
Slide 21
Fine-tuning MusicBERT
Four downstream tasks: melody completion, accompaniment suggestion, genre & style classification
Table 3. Results of different models on the four downstream tasks
Slide 22
Melody Completion
Task: find the best-matching consecutive phrase for a given melodic phrase from a set of candidates
Evaluation: the rate of correctly chosen phrases among the top-k candidates
Best performance: MusicBERT-small, MusicBERT-base
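The top-k hit-rate metric used here can be sketched generically. This is a standard HITS@k computation under my own function names, not the paper's exact evaluation code.

```python
def hits_at_k(scores, correct_idx, k):
    """One query: scores[i] is the model's match score for candidate i;
    count a hit when the ground-truth candidate ranks within the top k."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return correct_idx in ranked[:k]

def mean_hits_at_k(all_scores, all_correct, k):
    """Average the per-query hits to get the reported rate."""
    hits = [hits_at_k(s, c, k) for s, c in zip(all_scores, all_correct)]
    return sum(hits) / len(hits)
```

The same metric applies unchanged to accompaniment suggestion on the next slide, only with harmonic phrases as candidates.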
Slide 23
Accompaniment Suggestion
Task: find the most related accompaniment phrase for a given melodic phrase from a set of harmonic phrase candidates
Evaluation: the rate of correctly chosen phrases among the top-k candidates
Best performance: MusicBERT-small, MusicBERT-base
Slide 24
Genre & Style Classification
Task: classify the genre and style of a song
Dataset: TOP-MAGD for genre, MASD for style
Evaluation: F1-micro score
Best performance: MusicBERT-small, MusicBERT-base
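For reference, micro-averaged F1 pools true-positive, false-positive, and false-negative counts over all samples and labels before computing a single F1. A minimal sketch, assuming multi-label predictions given as label sets (the slides do not show the evaluation code):

```python
def f1_micro(y_true, y_pred):
    """Micro-averaged F1: aggregate TP/FP/FN over every (sample, label)
    pair, then compute precision, recall, and F1 once on the totals.

    y_true, y_pred: lists of label sets, one set per sample.
    """
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        tp += len(t & p)  # labels predicted and correct
        fp += len(p - t)  # labels predicted but wrong
        fn += len(t - p)  # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```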
Slide 25
Method Analysis (experiments on MusicBERT-small)
• Effectiveness of OctupleMIDI
• Effectiveness of Bar-Level Masking
• Effectiveness of Pre-training
Effectiveness of OctupleMIDI:
OctupleMIDI significantly outperforms REMI and CP: with its compact encoding, the model learns from a larger proportion of each song
Table 5. Results of different encoding methods
Slide 26
Effectiveness of Bar-Level Masking (experiment on MusicBERT-small)
Random: randomly masks individual elements within the octuple tokens
Octuple: randomly masks whole octuple tokens (all eight elements of a token)
Bar: elements of the same type in the same bar are masked simultaneously
Slide 27
Effectiveness of Pre-training (experiment on MusicBERT-small)
Pre-training is critical for symbolic music understanding
Slide 28
Conclusion
Propose the OctupleMIDI encoding & a bar-level masking strategy for the music domain
Develop MusicBERT, a large-scale pre-trained model for symbolic music understanding
Achieve state-of-the-art performance on all four evaluated symbolic music understanding tasks
Slide 29
For my research
Acquire baseline models & datasets to review
Understand the new symbolic music representation method
Learn how to design experiments that measure each feature of a model