Music Data Start to End

Dataya Nolja (데이터야 놀자)
From Music Data Collection
to Web Application
Music and Audio Computing Lab., KAIST
2017.10.13
 
금상은 (S. Kum), 이종필 (J. Lee)

Speaker Introduction
Music and Audio Computing Lab., KAIST, Ph.D. students

금상은 (S. Kum)
• S. Kum, C. Oh, and J. Nam, "Melody Extraction on Vocal Segments Using Multi-Column Deep Neural Networks," ISMIR, pp. 819-825, 2016.
• S. Kum and J. Nam, "Classification-based Singing Melody Extraction Using Deep Convolutional Neural Networks," Applied Science, 2017 (submitted).

이종필 (J. Lee)
• J. Lee and J. Nam, "Multi-level and multi-scale feature aggregation using pre-trained convolutional neural networks for music auto-tagging," IEEE Signal Processing Letters, vol. 24, no. 8, pp. 1208–1212, 2017.
• J. Lee and J. Nam, "Multi-level and multi-scale feature aggregation using sample-level deep convolutional neural networks for music classification," ICML, Machine Learning for Music Discovery Workshop, 2017.

Music Information Retrieval (MIR) [1]
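• Extract or infer meaningful features from music  
(from audio signals, scores, web pages, and other external sources) 
• Use the extracted features to index music and to build search and other systems  
(e.g., content-based search, music recommendation systems, or user interfaces for browsing large music collections) [1]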
[1] J. Stephen Downie. Music Information Retrieval. Annual Review of Information Science and Technology, 37:295–340, 2003.
https://goo.gl/kZ1ZDe

Music Information Retrieval (MIR) [2]

FEATURE EXTRACTION :  
timbre description, music transcription and melody extraction, onset detection, beat tracking, and tempo estimation,  
tonality estimation: chroma, chord, and key, structural analysis, segmentation and summarization

SIMILARITY :  
similarity measurement, cover song identification, query by humming

CLASSIFICATION :  
emotion and mood recognition, genre classification, instrument classification, composer, artist and singer identification

APPLICATIONS :  
audio fingerprinting, content-based querying and retrieval, music recommendation, playlist generation,  
audio-to-score alignment and music synchronization, song/artist popularity estimation, music visualization
[2] Schedl, Markus, Emilia Gómez, and Julián Urbano. "Music information retrieval: Recent developments and applications." Foundations and Trends® in Information Retrieval 8.2-3 (2014): 127-261.

Deep learning !!
[3] https://goo.gl/kZ1ZDe

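Music recommendation based on the singing voice
https://i.ytimg.com/vi/ynCBeEEP5ws/maxresdefault.jpg
Singing voice:  
- Tied to patterns of change in loudness, pitch, and timbre
- Singing style and the singer's unique voice characteristics
- An important factor in understanding a listener's ‘preference’ for a singer or a piece of music.
→ Voice-based music search and recommendation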

Examples of music recommendation systems

Dataset for MIR
https://www.audiocontentanalysis.org/data-sets/
[Diagram labels: Deep learning, MIR, Tasks, Data]

Let's build a Korean pop music dataset and vocal tags
Problems
>> No dataset of Korean pop music
>> No well-established, high-quality ‘vocal tags’
: Labels for pop and music in general exist (MagnaTagATune [5]), but vocal-related labels are not well defined.
https://github.com/keunwoochoi/magnatagatune-list

Investigation on Vocal Tags and Singer Similarity of K-pop [4]
[4] Investigation on Vocal Tags and Singer Similarity of K-pop, Proceedings of the Acoustical Society of Korea Spring Conference, 2016
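1. Select representative Korean singers and collect their representative songs
: Based on the Melon charts by era, representative singers and songs were chosen for each year (181 singers, 900 songs)
2. Collect words that describe the voice (54 words) and carry out tagging
: A confidence score for each tag, from "strongly disagree" (1 point) to "strongly agree" (5 points)
3. Validate the usefulness of the voice-descriptive tags
: Inter-user agreement, activation frequency, redundancy between tags
4. Analyze singer similarity based on the vocal tags
: Hierarchical clustering → analysis of similarity between singers → confirmation of the vocal tags' usefulness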

Investigation on Vocal Tags and Singer Similarity of K-pop [4]
Figure 1. Dendrogram based on the correlation similarity of voice-descriptive words
[4] Investigation on Vocal Tags and Singer Similarity of K-pop, Proceedings of the Acoustical Society of Korea Spring Conference, 2016

Figure 2. Cluster distribution of singers based on the similarity of their voice-descriptive words
[4] Investigation on Vocal Tags and Singer Similarity of K-pop

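Let's build a K-pop VOCAL dataset
Problems
>> No K-POP VOCAL dataset
>> No well-established, high-quality ‘vocal labels’
: Labels for pop and music in general exist (MagnaTagATune [5]),  
but vocal-related labels are not well defined.
https://kpopvocalanalysis.net/
>> Limits of tagging a whole song 
: The result changes as the song moves through its introduction, development, turn, and conclusion,  
so whole-song tags can be unsuitable for describing a particular section.
http://mac-bach.kaist.ac.kr:8001/
[5] http://mirg.city.ac.uk/codeapps/the-magnatagatune-dataset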

k-pop vocal analysis.net
• https://kpopvocalanalysis.net/
The analysis will be based solely on VOCAL TECHNIQUE
- Tones/Semitones/Notes/Key
- Intonation
- Larynx Position/High Larynx/Low Larynx/Neutral Larynx
- Tonality/Tone Production
- Vibrato
- Stability
- Registers
- Support
- Placement vs Resonance vs Projection
- Vocal Range vs Supported Range vs Tessitura
- Musicianship/Musicality
- Passaggi/Vocal Bridges
- Legato/Staccato
- Agility

K-POP vocal dataset
• Song data
‣ K-pop singers: 114 (singers who have solo songs)

‣ Up to 5 songs per singer → 469 songs 
(excluding duets, chorus-heavy songs, rap, etc.)

K-POP vocal dataset
• Vocal Label
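‣ 1. Word List: 3000+ → 334 → 70 words 
(survey of 12 people; only words marked ‘yes’ by 10 or more were kept)

‣ 2. Tagging over each whole song, with 3 raters per song.  
Tag usefulness check (similarity between tags, activation frequency, tag reliability): 70 → 42 tags
‣ 3. Tagging on audio cut into 10-second clips (those in which the singer's voice is present above a certain level)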
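The survey filter in step 1 (keep a word only if at least 10 of the 12 respondents said ‘yes’) can be sketched as below. This is a minimal illustration, not the authors' script: the function name `select_words`, the data layout (one set of ‘yes’ words per respondent), and the example words are all assumptions.

```python
from collections import Counter

def select_words(responses, min_yes=10):
    """Keep words that at least `min_yes` respondents marked 'yes'.

    responses: list of sets, one per respondent, holding the words that
    respondent accepted (hypothetical data layout).
    """
    yes_counts = Counter(word for answer in responses for word in answer)
    return sorted(word for word, count in yes_counts.items() if count >= min_yes)

# Hypothetical survey with 12 respondents.
responses = [{"husky", "warm"} for _ in range(10)] + [{"husky"}, {"husky", "shiny"}]
selected = select_words(responses)  # "husky": 12 votes, "warm": 10, "shiny": 1
```

With the threshold at 10, a word survives only when it is accepted by nearly everyone, which trades vocabulary size for agreement on meaning.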

Song level: Tag Frequency
[ MagnaTagATune ] [ K-POP vocal dataset ]

Song level: Tag Agreement
• Each song is tagged by 3 people
• Score: n/3
• Averaged per tag
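One possible reading of the n/3 score, sketched in Python: n is taken to be the number of the three raters who applied a tag to a song, and the per-tag average runs over the songs where at least one rater used the tag. Both readings are assumptions, as are the function name `tag_agreement` and the example annotations.

```python
from collections import defaultdict

N_ANNOTATORS = 3  # each song is tagged by 3 people

def tag_agreement(annotations):
    """Return {tag: mean of n/3}, averaged over songs where the tag was used.

    annotations: {song_id: [set_of_tags_rater1, set_of_tags_rater2, set_of_tags_rater3]}
    (hypothetical data layout).
    """
    scores = defaultdict(list)
    for tag_sets in annotations.values():
        counts = defaultdict(int)
        for tags in tag_sets:
            for tag in tags:
                counts[tag] += 1  # n: how many raters applied this tag
        for tag, n in counts.items():
            scores[tag].append(n / N_ANNOTATORS)
    return {tag: sum(values) / len(values) for tag, values in scores.items()}

# Hypothetical annotations for two songs.
annotations = {
    "song_a": [{"Husky/Throaty", "Sad"}, {"Husky/Throaty"}, {"Husky/Throaty", "Warm"}],
    "song_b": [{"Sad"}, {"Sad", "Warm"}, set()],
}
agreement = tag_agreement(annotations)
```

A tag whose average stays near 1/3 is applied by only one rater at a time, which marks it as a candidate for removal from the vocabulary.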

Song level: Tag similarity - Hierarchical Clustering

Song level: Tag similarity - Hierarchical Clustering_ver.2
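A small sketch of how tags could be clustered hierarchically from their per-song scores. The slides do not state the distance or the linkage, so the choices here (distance = 1 - Pearson correlation, average linkage) are assumptions, and the tag scores are made up for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length score lists (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_tags(tag_vectors):
    """Average-linkage agglomerative clustering with distance 1 - r.

    tag_vectors: {tag: [score per song]}. Returns the merge history as
    (cluster_1, cluster_2, linkage_distance) tuples, closest pair first.
    """
    dist = {frozenset((a, b)): 1.0 - pearson(tag_vectors[a], tag_vectors[b])
            for a, b in combinations(tag_vectors, 2)}

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(tag,) for tag in tag_vectors]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical mean tag scores (1 to 5) over five songs.
tag_vectors = {
    "Sad":    [5, 4, 1, 1, 2],
    "Lonely": [5, 5, 1, 2, 2],
    "Bright": [1, 1, 5, 4, 3],
    "Cute":   [1, 2, 5, 5, 3],
}
merges = cluster_tags(tag_vectors)
```

The merge history is what a dendrogram draws: tags merged at a small distance are close to redundant, and only the two broad groups join at the top.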

Singing voice tagging system: segment level
• Singing voice detection

• Melody extraction

• Voice-related tagging

Singing voice detection and melody extraction
[Block diagram components: Audio; spec. (11 frames × 513 bins); mel-spec. (115 frames × 80 bins); CNNs; FC; singing pitch extractor; singing voice detector; HMM; Voice False Alarm Detector; Melody Contour]

Melody pitch extraction
[Same block diagram as on the singing voice detection and melody extraction slide]

Singing voice detection
[Same block diagram as on the singing voice detection and melody extraction slide]
• 10 seconds

• Segments in which the singer's voice is judged to be present for at least 70% of the 10 seconds → TEST !!
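The 70% rule can be sketched as a filter over frame-level voice decisions. This is an illustration under assumptions, not the authors' code: the segments are taken to be non-overlapping, a trailing partial segment is dropped, and the frame rate of 100 frames per second and the function name `select_vocal_segments` are hypothetical.

```python
def select_vocal_segments(voiced, frames_per_second, segment_seconds=10, min_ratio=0.7):
    """Return (start_sec, end_sec, voiced_ratio) for segments that pass the threshold.

    voiced: list of 0/1 decisions, one per frame, from a singing voice detector.
    """
    seg_len = int(frames_per_second * segment_seconds)
    selected = []
    # Step through full, non-overlapping segments only.
    for start in range(0, len(voiced) - seg_len + 1, seg_len):
        ratio = sum(voiced[start:start + seg_len]) / seg_len
        if ratio >= min_ratio:
            selected.append((start / frames_per_second,
                             (start + seg_len) / frames_per_second,
                             ratio))
    return selected

# Hypothetical 30-second track at 100 frames per second:
# 80% voiced, then 50% voiced, then fully voiced.
voiced = [1] * 800 + [0] * 200 + [1] * 500 + [0] * 500 + [1] * 1000
segments = select_vocal_segments(voiced, frames_per_second=100)
```

Here the middle segment is rejected because only half of its frames contain singing, so it would not be sent on for tagging.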

K-POP VOCAL dataset
• Song data
‣ K-pop singers: 114

‣ Audio:

✓ 469 songs
✓ Segments cut into 10-second units: 6,808 (~19 h) 
• Vocal Label: 42 tags
Timbre
‣ Low-level : Low-Range, Mid-Range, High-Range, Husky/Throaty, Thick, Thin

‣ High-level : Embellishing, Warm, Lonely, Passion, Pretty, Sad, Bright, Clear,
Relaxed, Cute, Dark, Delicate, Emotional, Energetic, Male, Female, Mild/Soft,
Charismatic, Whisper/Quiet, Sharp, Pure, Rich, Sweet, Young, Rounded

Singing style
‣ Genre : Soulful/R&B, Ballad

‣ well-known : Shouty, Vibrato, Falsetto

‣ Qualitative : Robotic/Artificial, Stable, Breathy, Compressed, Dynamic,
Speech-Like

Music Galaxy Hitchhiker: 3D web music navigation through
audio space trained with tag and artist labels
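Real-time tagging visualization

Music search and recommendation in a space built from a tag network

Music search and recommendation in a space built from an artist network

Music search in 3D

서동우: undergraduate researcher - Backend

이경연: undergraduate researcher - Frontend

이종필: Ph.D. student - DB & tag networks

박지영: M.S. student - artist networks

남주한: professor - advisor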

Search by song (Pop, K-pop)
tag network
artist network

Music auto-tagging
Sample-level
Raw waveform model
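Convolutional Neural Networks are trained with supervision on tag data to predict, from the music audio, the words that describe the music.

For a new audio sample, the trained model then automatically analyzes what mood the music has.
이종필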

Representation learning using artist labels
Happy Rock 60s Beyonce Kygo 초아 김동률
…
박지영

Representation learning using artist labels

Music Retrieval Examples – Focusing on General Music Style
• Artist-level

Music Retrieval Examples – Focusing on General Music Style
• Song-level

Music Retrieval Examples – Focusing on Singing Voice
• Focusing on Singing Voice
• Focusing on General Music Style
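Retrieval of this kind is commonly done as a nearest-neighbor search in a learned embedding space. The sketch below assumes cosine similarity and uses made-up 3-dimensional vectors; the slides do not state the similarity measure or the embedding size, and `retrieve` is a hypothetical helper.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def retrieve(query_id, embeddings, top_k=3):
    """Return the ids of the top_k items most similar to the query item."""
    query = embeddings[query_id]
    scored = [(cosine(query, vector), item_id)
              for item_id, vector in embeddings.items() if item_id != query_id]
    return [item_id for _, item_id in sorted(scored, reverse=True)[:top_k]]

# Hypothetical song embeddings (real ones would come from a trained model).
embeddings = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.8, 0.2, 0.1],
    "song_c": [0.0, 0.9, 0.4],
    "song_d": [0.1, 0.2, 0.9],
}
results = retrieve("song_a", embeddings, top_k=2)
```

The same routine serves both retrieval modes: swapping in embeddings trained on general music style or on the singing voice changes which songs count as neighbors, while averaging song vectors per artist would give an artist-level search.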

Neuroscape (박승순, 이종필)
https://vimeo.com/230440161

Neuroscape (박승순, 이종필)
A technique that analyzes an image with an AI algorithm and automatically links it to a suitable sound
[Diagram labels: Image AI Machine, Audio AI Machine, Language AI Machine, Word space; example words: Forest, Train, Home, Tree, Park, path]

Making music and audio search even easier…
Thank you
