This document discusses genetic genealogy and the analysis of Y-chromosome DNA sequences. It explains how haplogroup classifications and private mutations are established in family lineages. Key points include the importance of genome builds, variant naming conventions, and the creation of new branches in the Y-chromosome tree from shared private mutations. The document emphasizes that not all captured positions will pass quality control and that results from different sequencing platforms can be concordant. It also notes that additional analysis features are planned for release during 2017.
1. Doron M Behar, MD, PhD
Family Tree DNA, Houston, Texas
The 12th Genetic Genealogy Conference for Family Tree DNA Group Administrators
November 11-13, 2016
4. It is commonly thought that human genetic diversity in non-African populations was shaped primarily by an out-of-Africa dispersal 50-100 thousand yr ago (kya). Here, we present a study of 456 geographically diverse high-coverage Y chromosome sequences, including 299 newly reported samples. Applying ancient DNA calibration, we date the Y-chromosomal most recent common ancestor (MRCA) in Africa at 254 (95% CI 192-307) kya and detect a cluster of major non-African founder haplogroups in a narrow time interval at 47-52 kya, consistent with a rapid initial colonization model of Eurasia and Oceania after the out-of-Africa bottleneck. In contrast to demographic reconstructions based on mtDNA, we infer a second strong bottleneck in Y-chromosome lineages dating to the last 10 ky. We hypothesize that this bottleneck is caused by cultural changes affecting variance of reproductive success among males.
7. Written vs molecular genealogy
Written           Molecular
1450 = 566 ybp    546 ybp
1615 = 401 ybp    399 ybp
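The "ybp" (years before present) figures above follow from simple subtraction. A minimal sketch, assuming that "present" is the conference year 2016 (inferred from 1450 = 566 ybp, not stated on the slide) and that each written date is paired with the molecular estimate closest to it:

```python
# Assumption: "present" is 2016, inferred from the slide's "1450 = 566 ybp".
PRESENT_YEAR = 2016


def years_before_present(calendar_year: int, present: int = PRESENT_YEAR) -> int:
    """Convert a calendar year (CE) to years before present (ybp)."""
    return present - calendar_year


# Written year -> molecular estimate in ybp. The pairing is an assumption
# based on which numbers sit closest to each other on the slide.
written_vs_molecular = {1450: 546, 1615: 399}

for year, molecular_ybp in written_vs_molecular.items():
    written_ybp = years_before_present(year)
    # Print the written date, both ybp values, and their difference in years.
    print(year, written_ybp, molecular_ybp, written_ybp - molecular_ybp)
```

Under these assumptions the written and molecular dates differ by 20 years for the first pair and by 2 years for the second.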
8. This is what you need, right?!
Good, 'cause we are building it!
9. Whole Y Chromosome
60M bp long
Karmin et al.: We exclude all of Chr Y outside 10.8-Mb sequence; >5x unique coverage
FTDNA: Around 11.5 to 12.5 million base pairs of reliably mapped positions of the non-recombining Y chromosome
14. Whole Y Chromosome
In which Capture
17. Inter-platform performance
Genotyping platforms: Complete Genomics, Illumina
5 samples were run on both platforms
The overlapping region is 6M bp
Are we identifying the same variants?
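The question of whether two platforms identify the same variants can be framed as a set comparison restricted to the region both platforms cover. The following is a minimal sketch, not the method used in the presentation: the `Variant` type, the `compare_call_sets` function, the region boundaries, and the two call lists are all hypothetical illustrations (the positions and alleles are borrowed from the variant table later in the deck).

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    position: int
    ref: str
    alt: str


Region = Tuple[int, int]  # inclusive (start, end) coordinates


def in_any_region(position: int, regions: Iterable[Region]) -> bool:
    """True if the position falls inside at least one region."""
    return any(start <= position <= end for start, end in regions)


def compare_call_sets(
    calls_a: Iterable[Variant],
    calls_b: Iterable[Variant],
    shared_regions: List[Region],
) -> Dict[str, float]:
    """Compare variant calls from two platforms inside the shared regions only."""
    a = {v for v in calls_a if in_any_region(v.position, shared_regions)}
    b = {v for v in calls_b if in_any_region(v.position, shared_regions)}
    union = a | b
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Fraction of all called variants that both platforms agree on.
        "concordance": len(a & b) / len(union) if union else 1.0,
    }


# Hypothetical example data: one overlapping region and two call sets.
overlap = [(2_800_000, 8_800_000)]
platform_a = [Variant(2842113, "G", "A"), Variant(7305102, "G", "A"),
              Variant(19568371, "C", "G")]  # last call lies outside the overlap
platform_b = [Variant(2842113, "G", "A"), Variant(7305102, "G", "A"),
              Variant(8513272, "T", "C")]

result = compare_call_sets(platform_a, platform_b, overlap)
print(result)
```

This sketch only compares sites where at least one platform reports a variant. A fuller comparison would also check that both platforms passed quality control at each site.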
21. What is a reference genome?
The reference genome does not represent the ancestral genome!
The reference genome represents a haploid mosaic of different DNA sequences from different donors. For example, GRCh37, the Genome Reference Consortium human genome (build 37), is derived from thirteen anonymous volunteers from Buffalo, New York.
Accordingly, the Y chromosome sequence is an assembly of a few haplogroups.
23. Genome builds
Release name      Date of release   Equivalent UCSC version
GRCh38            Dec 2013          hg38
GRCh37            Feb 2009          hg19
NCBI Build 36.1   Mar 2006          hg18
NCBI Build 35     May 2004          hg17
NCBI Build 34     Jul 2003          hg16
The same variant can be in "different" positions in different genome builds.
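Because a position is only meaningful together with its build, one defensive habit is to carry the build name alongside every coordinate and refuse to compare across builds. A minimal sketch using the release names listed above; the function names `normalize_build` and `same_site` are hypothetical helpers, not part of any existing tool:

```python
# UCSC version -> release name, taken from the build table on this slide.
UCSC_TO_RELEASE = {
    "hg38": "GRCh38",
    "hg19": "GRCh37",
    "hg18": "NCBI Build 36.1",
    "hg17": "NCBI Build 35",
    "hg16": "NCBI Build 34",
}


def normalize_build(name: str) -> str:
    """Map either a UCSC version or a release name to the release name."""
    key = name.strip().lower()
    if key in UCSC_TO_RELEASE:
        return UCSC_TO_RELEASE[key]
    for release in UCSC_TO_RELEASE.values():
        if key == release.lower():
            return release
    raise ValueError(f"unknown genome build: {name!r}")


def same_site(build_a: str, pos_a: int, build_b: str, pos_b: int) -> bool:
    """Compare two positions, but only if they are on the same build."""
    if normalize_build(build_a) != normalize_build(build_b):
        raise ValueError("positions are on different builds; convert one first")
    return pos_a == pos_b


# hg38 and GRCh38 name the same build, so these coordinates are comparable.
print(same_site("hg38", 13357844, "GRCh38", 13357844))
```

Converting coordinates between builds requires a dedicated conversion tool and is outside the scope of this sketch.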
26. Whole Y sequencing
DNA sample
Library preparation
Whole Y capturing
Whole Y sequencing
Raw data (~100M reads in FASTQ format)
Raw data statistics report
Read quality filtering and trimming
Mapping to the reference genome (hg19)
Mapping statistics report
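The read quality filtering and trimming step can be illustrated with a small sketch. The deck does not say which tool or thresholds are used, so everything here is an assumption: Phred+33 quality encoding, a minimum base quality of 20, a minimum read length of 5, and the helper names `parse_fastq`, `trim_3prime`, and `filter_and_trim`.

```python
from typing import Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text with exactly four lines per read."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must have four lines per read")
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header, seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Remove low-quality bases from the 3' end (assumes Phred+33 encoding)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def filter_and_trim(records, min_q: int = 20, min_len: int = 5) -> List[FastqRecord]:
    """Trim each read, then drop reads that became too short."""
    kept = []
    for header, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            kept.append((header, seq, qual))
    return kept


# Hypothetical two-read example: 'I' encodes quality 40, '#' encodes quality 2.
EXAMPLE = """@read1
ACGTACGTAC
+
IIIIIIII##
@read2
ACGT
+
####
"""

kept_reads = filter_and_trim(parse_fastq(EXAMPLE))
print(kept_reads)
```

In this example the first read loses its last two bases and the second read is discarded entirely.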
34. Pipeline for Whole Y analysis
DNA sample
Library preparation
Whole Y capturing
Whole Y sequencing
Raw data (~100M reads in FASTQ format)
Raw data statistics report
Read quality filtering and trimming
Mapping to the reference genome (hg19)
Mapping statistics report
Variant calling
Annotation of variants
Variants statistics report
Variants tables
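To give a feel for the variant calling step, here is a deliberately simplified sketch for a haploid chromosome: given the bases observed in the reads covering one position, it reports the reference, a variant, or no call. Production callers use statistical models; the function name `call_haploid` and the depth and agreement thresholds are assumptions chosen for illustration only.

```python
from collections import Counter


def call_haploid(ref_base: str, observed_bases: str,
                 min_depth: int = 5, min_fraction: float = 0.9) -> str:
    """Toy haploid call from the bases observed at one position.

    Returns "no-call" when coverage is too low or the reads disagree,
    "reference" when the consensus matches the reference base, and a
    string such as "G>A" when the consensus differs from it.
    """
    depth = len(observed_bases)
    if depth < min_depth:
        return "no-call"
    base, count = Counter(observed_bases).most_common(1)[0]
    if count / depth < min_fraction:
        return "no-call"
    return "reference" if base == ref_base else f"{ref_base}>{base}"


print(call_haploid("G", "AAAAAA"))  # six reads, all showing A
print(call_haploid("G", "AA"))      # only two reads cover the position
```

The "no-call" outcome is why a position that was captured does not always appear in the final variant tables.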
35. VCF (Variant Call Format)
Variant   GRCh38 position   Reference   Derived
P305      2842113           G           A
L1085     2922685           T           C
Z4762     3953196           A           C
V171      5030624           C           G
M523      6885478           A           G
M522      7305102           G           A
M578      7334662           C           T
L666      7702775           G           A
V221      7721262           G           T
M2308     7822141           A           T
F1154     8513272           T           C
F1206     8572376           C           T
F1329     8720990           C           T
P143      12077161          G           A
M168      12702062          C           T
P97       12774339          G           T
P108      13314368          C           T
M231      13357844          G           A
M214      13360045          T           C
M213      13414871          T           C
L1130     14549130          T           G
P14       15286718          C           T
V168      15835792          G           A
L729      17319728          A           C
F549      17401190          C           T
F3163     19069977          G           A
M9        19568371          C           G
M42       19704954          A           T
M89       19755427          C           T
L1155     20029380          G           C
L735      20977731          G           T
M526      21389038          A           C
F650      21455120          G           A
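For readers who have not opened a VCF file, the sketch below writes a few rows of the table above in the tab-separated layout defined by the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). Two assumptions are made for illustration: the chromosome label `chrY`, and treating the slide's "Reference" and "Derived" columns as REF and ALT. The helper name `to_vcf` is hypothetical.

```python
from typing import List, Tuple

# (variant name, GRCh38 position, reference, derived), copied from the table.
ROWS: List[Tuple[str, int, str, str]] = [
    ("M231", 13357844, "G", "A"),
    ("P305", 2842113, "G", "A"),
    ("M168", 12702062, "C", "T"),
]


def to_vcf(rows: List[Tuple[str, int, str, str]], chrom: str = "chrY") -> str:
    """Render rows as a minimal VCF document, sorted by position."""
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for name, pos, ref, alt in sorted(rows, key=lambda row: row[1]):
        # "." marks a missing value for QUAL, FILTER, and INFO.
        lines.append("\t".join([chrom, str(pos), name, ref, alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"


vcf_text = to_vcf(ROWS)
print(vcf_text)
```

A real VCF produced by a pipeline would also carry quality scores, filter status, and per-sample genotype columns.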
36. VCF (Variant Call Format)
Variant   GRCh38 position   Reference   Derived
P305      2842113           G           A
L1085     2922685           T           C
Z4762     3953196           A           C
V171      5030624           C           G
M523      6885478           A           G
M522      7305102
M578      7334662           C           T
L666      7702775           G           A
V221      7721262           G           T
M2308     7822141           A           T
F1154     8513272           T           C
F1206     8572376           C           T
F1329     8720990           C           T
P143      12077161          G           A
M168      12702062          C           T
P97       12774339          G           T
P108      13314368          C           T
M231      13357844          G           A
M214      13360045          T           C
M213      13414871          T           C
L1130     14549130          T           G
P14       15286718          C           T
V168      15835792          G           A
L729      17319728          A           C
F549      17401190          C           T
F3163     19069977          G           A
M9        19568371          C           G
M42       19704954          A           T
M89       19755427          C           T
L1155     20029380          G           C
L735      20977731          G           T
M526      21389038          A           C
F650      21455120          G           A
1. The position failed, nothing to worry about…
2. The position did not fail and shows the reference, which means it is a private back mutation!
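The two cases above can be written as a small decision function. This is a sketch of the reasoning only; the function name `interpret_expected_snp` and its return strings are hypothetical, and the example uses M522 (reference G, derived A) from the variant table.

```python
from typing import Optional


def interpret_expected_snp(passed_qc: bool, observed: Optional[str],
                           reference: str, derived: str) -> str:
    """Interpret the call at a SNP the sample is expected to carry."""
    if not passed_qc or observed is None:
        # Case 1: the position failed, so it carries no information.
        return "position failed: no information"
    if observed == derived:
        return "derived: as expected"
    if observed == reference:
        # Case 2: a confident reference call where derived was expected.
        return "reference observed: candidate private back mutation"
    return "unexpected allele: needs review"


# M522: reference G, derived A.
print(interpret_expected_snp(False, None, "G", "A"))
print(interpret_expected_snp(True, "G", "G", "A"))
```

The key distinction is that a missing call is not evidence of anything, while a confident reference call at an expected derived position is.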
39. Establishing a new branch
John Smith
N-M231, N-L735, N-F1206, N-F1154, N-F3163
g.2654329C>T, g.4448652G>A, g.7598733A>G
Mike Smith
N-M231, N-L735, N-F1206, N-F1154, N-F3163
g.3447764C>T, g.6853865A>G
40. John Smith
N-M231, N-L735, N-F1206, N-F1154, N-F3163
g.2654329C>T, g.4448652G>A, g.7598733A>G
Mike Smith
g.3447764C>T, g.6853865A>G
~1400 ybp
41. Establishing a new branch
John Smith
N-M231, N-L735, N-F1206, N-F1154, N-F3163
g.2654329C>T, g.4448652G>A, g.7598733A>G
Mike Smith
N-M231, N-L735, N-F1206, N-F1154, N-F3163
g.3447764C>T, g.4448652G>A, g.6853865A>G
42. John Smith
N-M231, N-L735, N-F1206, N-F1154, N-F3163
g.2654329C>T, g.7598733A>G
Mike Smith
g.3447764C>T, g.6853865A>G
g.4448652G>A
~1000 ybp
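The logic of the last four slides can be sketched with set operations: a variant that is private to one man stays on his own line, while a variant found in both men marks a branch they share. The variant lists below are copied from the slides; the function name `split_private_variants` is a hypothetical helper.

```python
from typing import Dict, List, Set


def split_private_variants(a: Set[str], b: Set[str]) -> Dict[str, List[str]]:
    """Separate variants shared by both men from those unique to each."""
    shared = a & b
    return {
        "new_branch": sorted(shared),    # variants that define a shared branch
        "private_a": sorted(a - shared),
        "private_b": sorted(b - shared),
    }


# Slide 39: the two men share no variants below N-F3163.
john = {"g.2654329C>T", "g.4448652G>A", "g.7598733A>G"}
mike_slide_39 = {"g.3447764C>T", "g.6853865A>G"}
print(split_private_variants(john, mike_slide_39))

# Slide 41: Mike also carries g.4448652G>A, so it is shared by both men.
mike_slide_41 = {"g.3447764C>T", "g.4448652G>A", "g.6853865A>G"}
print(split_private_variants(john, mike_slide_41))
```

In the first case no new branch is formed. In the second, g.4448652G>A is shared by both men and each keeps two private variants, matching the layout on slide 42.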
43. Message No 5:
Help is on the way!
These features will be released during 2017!
44. Acknowledgements
Estonian Biocentre, Tartu: Lauri Saag, Monika Karmin, Hovhannes Sahakyan, Ene Metspalu, Mait Metspalu, Siiri Rootsi, Richard Villems
Genealogical peers
All Big Y friends
Family Tree DNA: Connie Bormans, Luisa Fernanda Sanchez, Brent Manning, Elliott Greenspan, Bennett Greenspan