Building a Mutation History Tree
1. Combining SNPs, STRs, & Genealogy
to build a Surname Origins Tree
Dr Maurice Gleeson
11th Annual FTDNA Conference
15th Nov 2015
http://gleesondna.blogspot.co.uk/
YouTube – DNA and Family History Research
3. A Combined Mutation / Family History Tree
… using DNA markers when people run out
… is it possible? Can you do it?
4. Topics for Discussion
• Building a tree with STRs
• Building a tree with SNPs
• Combining STRs & SNPs
• Dating branching points in the tree
• Combining STRs, SNPs & genealogy
• Opportunities for the years ahead
5. Topics for Discussion
• Building a tree with STRs
• Building a tree with SNPs
• Combining STRs & SNPs
• Dating branching points in the tree
• Combining STRs, SNPs & genealogy
• Challenges for the years ahead
8. Modal Haplotype for Lineage II
• Lots of Parallel Mutations!
o Back Mutations remain hidden
• Is resolution enough to define the tree?
• Is this the “best fit” model?
[Figure: hand-drawn tree with branch numbers; labelled mutations include 570 (17-18) and CDYa (38>39), the latter shown twice]
9. Fluxus cladogram
Courtesy of Ralph Taylor
[Figure: Fluxus cladogram; labelled members include G64 and G39]
• It can help - useful to check against the Hand-Drawn Tree
• Shows “maximum parsimony” version
• Cumbersome, fiddly, easy to make mistakes, difficult to interpret, time-consuming
• Difficult to visualise as a “Family Tree”
• Gives all markers equal weight & ignores differing mutation rates
www.isogg.org/wiki/Cladogram
10. Fluxus cladogram
Courtesy of Ralph Taylor
[Figure: Fluxus cladogram; labelled members include G64 and G39]
• Several “Best Fit” models
- at least 8 BF models …
- Tree is not anchored
• No single “most likely” option
• So not enough information at 37 markers to define the branching pattern
• Parallel Mutations still persist
- 390, 392, CDYa&b
• Back Mutations also possible
• Not clear which mutation came before which
www.isogg.org/wiki/Cladogram
11. Hand Drawn Tree vs Fluxus Tree v1
[Figure: Hand Drawn Tree and Fluxus Tree v1 side by side, with branch numbers; each shows mutations 570 (17-18) and CDYa (38>39), the latter twice]
16. Fluxus Cladogram (37 markers) vs Fluxus Cladogram (111 markers)
Courtesy of Ralph Taylor
[Figure: two Fluxus cladograms, one at 37 markers and one at 111 markers; labelled members include G64, G39 and G73]
• No weighting … but mutation rates vary by a factor of 400
• James Irvine developed an algorithm for weighting markers
weighting = 99 * (1 - mutation rate / 0.04)^2
https://en.wikipedia.org/wiki/List_of_Y-STR_markers
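The weighting formula on this slide, reading the trailing 2 as an exponent, can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the range check are assumptions of this sketch, and the two example rates are the CDY and 464 rates quoted later in the talk. How the resulting weights are actually entered into the Fluxus software is not covered here.

```python
# Sketch of the marker-weighting formula quoted on the slide:
#   weighting = 99 * (1 - mutation rate / 0.04)^2
# Function and constant names are illustrative, not from the talk.

MAX_RATE = 0.04    # rate ceiling used in the slide's formula
MAX_WEIGHT = 99    # weight given to a marker with a mutation rate of zero


def marker_weight(mutation_rate, max_rate=MAX_RATE, max_weight=MAX_WEIGHT):
    """Return the weight for one marker; slow markers score high, fast ones low."""
    if not 0 <= mutation_rate <= max_rate:
        # Assumption of this sketch: rates outside the range are rejected.
        raise ValueError("mutation rate must lie between 0 and max_rate")
    return max_weight * (1 - mutation_rate / max_rate) ** 2


# Per-generation rates quoted elsewhere in the talk.
rates = {"CDY": 0.03531, "464": 0.00566}
for marker, rate in rates.items():
    print(f"{marker}: rate {rate} -> weight {marker_weight(rate):.1f}")
```

On these figures the fast-mutating CDY marker gets a weight of about 1.4 and 464 about 73, so a CDY mutation counts for far less than a mutation in a slower marker.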
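17. Fluxus Cladogram (111 markers) vs Fluxus Cladogram (111 markers, weighted)
Courtesy of Ralph Taylor
[Figure: unweighted and weighted 111-marker Fluxus cladograms; labelled members include G64, G39 and G73]
In the weighted version:
• Torso disappears
• No alternative pathways = 1 single “Best Fit” model
www.isogg.org/wiki/Cladogram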
19. Some markers behave unusually
• Marker 389: this is tested in 2 parts – mutation in Part 1 is also counted in Part 2 => so just use Part 2 (389ii) … and we did!
– www.familytreedna.com/learn/y-dna-testing/y-str/different-str-markers-dys389i-dys398ii-dys389-2-result-family-tree-dna-different-genographic-project/
• Multi-copy markers 464abcd
(but also 385, 459, YCAII, CDY, DYF395S1, 413)
– mutations in multi-copy markers may not be in the correct order
– Kittler test defines relative positions for 385 … not applicable here?
– www.familytreedna.com/learn/y-dna-testing/y-str/infinite-allele-palindromic-markers/
– http://www.isogg.org/wiki/DYS_464
• Multi-copy marker 464abcd: 2 types = c & g
– 464x test defines which type (but not position) … not accounted for!
– http://www.dna-fingerprint.com/static/PalindromicPres.pdf
• 464abcd, CDYa & b: fast-mutating palindromic markers
– http://www.isogg.org/wiki/RecLOH
22. Which is more accurate?
with or without CDY & 464?
or some version in between?
23. How likely is it that 464 & CDY will screw things up?
• Gleeson surname origin = 1000 AD
Surname has had 1000 years to mutate
= 33.3 generations (30 y/gen)
• How many mutations would you expect in 1000 years?
• CDY mutation rate = 0.03531 / gen
= 1.176 per member over 33.3 generations = c.16 mutations across all 14 branches of Lineage II
Observed count is 4 for CDYa, and 3 for CDYb
=> 12/16 and 13/16 mutations respectively are hidden?
– So predictions based on CDY will be incorrect (12/16 + 13/16)/2 = 78% of the time?
• 464 mutation rate = 0.00566 / gen
= 0.188 per member over 33.3 generations = 2.6 per 14 members (on each of 464abcd)
Observed count is 0 for 464a & d, and 2 for 464b & c
=> 2.6/2.6 & 0.6/2.6 mutations respectively are hidden?
– So predictions based on 464 will be incorrect (2.6/2.6 + 0.6/2.6)/2 = 62% of the time?
https://en.wikipedia.org/wiki/List_of_Y-STR_markers
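The back-of-envelope calculation on this slide can be reproduced as a short sketch. The rates, the 1000-year span, the 30 years per generation, the 14 branches and the observed counts are all taken from the slide; the function names are illustrative. It follows the slide's simplification that every expected mutation not observed is "hidden", which is a rough guide rather than a proper statistical model.

```python
# Sketch of the slide's arithmetic: expected mutations since the surname's
# origin versus the number actually observed in the tree.

YEARS = 1000
YEARS_PER_GENERATION = 30
GENERATIONS = YEARS / YEARS_PER_GENERATION   # about 33.3 generations
MEMBERS = 14                                 # branches of Lineage II


def expected_mutations(rate_per_generation):
    """Expected mutation count on one marker across all members."""
    return rate_per_generation * GENERATIONS * MEMBERS


def hidden_fraction(expected, observed):
    """Share of expected mutations that were not observed (never below zero)."""
    if expected <= 0:
        return 0.0
    return max(expected - observed, 0.0) / expected


def average_hidden(rate_per_generation, observed_counts):
    """Average hidden share over several copies of a marker."""
    expected = expected_mutations(rate_per_generation)
    shares = [hidden_fraction(expected, n) for n in observed_counts]
    return sum(shares) / len(shares)


cdy = average_hidden(0.03531, [4, 3])    # CDYa, CDYb
m464 = average_hidden(0.00566, [0, 2])   # 464a/d versus 464b/c, as on the slide
print(f"CDY: expected {expected_mutations(0.03531):.1f}, hidden share {cdy:.0%}")
print(f"464: expected {expected_mutations(0.00566):.1f}, hidden share {m464:.0%}")
```

Because the sketch does not round the expected CDY count down to 16, its CDY figure comes out roughly one percentage point above the slide's 78%; the 464 figure agrees with the slide at about 62%.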
24. How likely is it that 464 & CDY will screw things up?
• Less of a problem in those branches related within the last
200-300 years?
– less time to mutate back
– lower chance of back mutations
– more useful for branch-defining
• More of a problem with those branches more distantly
related (600-1000 yrs)?
– more time to mutate back
– higher chance of back mutations
– less useful for branch-defining
Choose v3a (i.e. use CDY & 464 data)
• Tree will be less than 100% correct
• Be especially wary of mutations in more distant reaches of
the tree
https://en.wikipedia.org/wiki/List_of_Y-STR_markers
26. Caveats & Limitations
• Missing data
– Fluxus fills in the blanks - is its “best guess” valid?
– No adequate mutation rates for many markers
• The Tree is not yet “anchored”
– More so in the upper reaches of the tree (sub-branches seem stable)
– Several interpretations are still possible, even at 111 markers (v3a vs v4)
– Will this reduce as more people test? or upgrade?
– Are there hidden Back Mutations?
• Tree may be skewed by recent mutations (last 5-6 generations)
=> Triangulate on each MDKA
– Test at least 2 known distant cousins from each family branch in order to
characterise the haplotype of each MDKA
– Helps eliminate recent mutations which might cloud the interpretation
– Costly … $339 for a 111 marker test … x2 = $678
• Is there Convergence in the Tree? (e.g. 3/111)
www.isogg.org/wiki/Fluxus
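The idea of triangulating on an MDKA from two known distant cousins can be sketched as below. This is a minimal illustration, not a described procedure from the talk: where the cousins agree, that value is taken as the MDKA's; where they differ, the sketch assumes the value matching the group's modal haplotype is the older one; anything else is left unresolved. The marker values shown are invented for the example.

```python
# Sketch: infer an MDKA's haplotype from two of his known descendants.
# The tie-break against the modal haplotype is an assumption of this sketch.

def triangulate_mdka(cousin_a, cousin_b, modal):
    """Return (inferred MDKA values, markers that stay unresolved)."""
    inferred = {}
    unresolved = []
    for marker in sorted(set(cousin_a) & set(cousin_b)):
        a, b = cousin_a[marker], cousin_b[marker]
        if a == b:
            inferred[marker] = a            # both lines agree
        elif a == modal.get(marker):
            inferred[marker] = a            # b looks like a recent mutation
        elif b == modal.get(marker):
            inferred[marker] = b            # a looks like a recent mutation
        else:
            unresolved.append(marker)       # needs a third tester
    return inferred, unresolved


# Hypothetical values, for illustration only.
cousin_a = {"390": 24, "456": 16, "570": 18}
cousin_b = {"390": 24, "456": 15, "570": 17}
modal = {"390": 24, "456": 15, "570": 19}

mdka, open_markers = triangulate_mdka(cousin_a, cousin_b, modal)
print(mdka, open_markers)
```

In the example, cousin A's 456 value is treated as a recent mutation, while marker 570 cannot be settled without testing another descendant.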
27. Topics for Discussion
• Brief overview of key concepts
• Building a tree with STRs
• Building a tree with SNPs
• Combining STRs & SNPs
• Dating branching points in the tree
• Combining STRs, SNPs & genealogy
• Challenges for the years ahead
29. Is fine-scale SNP testing
the best method of determining
branching patterns within a Genetic Family?
… how to do it as cheaply &
efficiently as possible?
31. Working with SNPs
– Opportunities & Challenges
• Declaring SNPs - false positives
• Missing SNPs - false negatives
• Constant change
– “Known, Novel, Shared & Private”
• No name, just a location
• SNP naming process unregulated
– Same SNP, different names
• Making results user-friendly
• Lots of help available
– independent verification & interpretation possible
32. Problems encountered with “declaring a genuine SNP”
Problem | Reason(s) | Implication
Detection | No coverage | False Negative - SNP is present on Y but remains undetected
Low no. of Calls | Poor coverage | False Negative - SNP present but fails to meet threshold criteria
Recognition | Detection Filter / Threshold too strict? | False Negative - SNP is present in data but missed by analysis - detectable by manual analysis of possible SNPs on BAM file
Localisation | Difficult location on Y (centromere, palindrome, in STR / repetitive region) | False Positive or Negative - SNP may be genuine but its exact position cannot be known for sure or may vary
Instability | Unstable SNP - frequent & unpredictable mutation | False Positive or Negative - SNP may or may not be genuine
InDels | Not SNPs, but rather a deletion (usually) | False Positive or Negative - may or may not be genuine
So is the SNP really present?
… or absent?
Just because it is detected, doesn’t mean it is there …
Just because it’s not detected, doesn’t mean it isn’t there
43. … aka BY2853
Jan 2015
Apr 2015
Jun 2015
Oct 2015
www.ytree.net/DisplayTree.php?blockID=319&star=false
Clicking on a marker or name
brings up further analysis
44. www.ytree.net/MutMatrix.php
Grey = no coverage
Pink = marginal coverage
My simplistic interpretation
+ Definite
* Probable
** Possible
*** Unlikely
The Big Tree: R-A5629 Mutation Matrix of Shared SNPs
47. Inconsistency in “declaring a genuine SNP”
• Are they really SNPs?
- different thresholds & filters
• SNPs trapped in Private Collections
- Private SNPs will be liberated as more people test & SNPs become “not private” anymore – move up into the shared area of the tree … but they will run out! When?
• No names, just locations
- will need to be translated into SNP names in time => consult Ybrowse, other utilities??
48. Different strokes for different folks
Who is right?
… or more accurately …
who has estimated correctly?
End Result
SNP = definite, probable, possible, or unlikely
… subject to change ... & Sanger Sequencing?
49. Some Bold Predictions …
Despite NGS, Sanger Sequencing will still be required
• Chip-based SNP testing will still be needed to confirm or refute discoveries made by NGS
• Multiple Deep Clade Panels will need to be created
… for subclades, surnames, & genetic clusters
51. • SNP results consistent?
• Need to tidy it up
456 15-16
52. • SNPs are further up the tree than STRs
• Tell us nothing about branches on left
• Only use “definite SNPs” (not probable/possible)
• Private SNPs are still trapped in Private Collections
Mutation sequence?
BY2853 > A5629 > 456 …
> G68 (Glisson, Branch 14)
> A5628
> Y16880 (Branch 2,7,6)
> A660 (Branch 9)
54. Nigel McCarthy’s Z255 Group E
http://freepages.genealogy.rootsweb.ancestry.com/~skibbgirl/McCarthyDNAProject/
[Figure: Nigel McCarthy’s tree; members shown: G54, G39, G51, G73, G66, G22, G42, G55, G57, G21, G68]
Annotations on the figure:
• Differing Modal Haplotype
• No BY2852 block
• Extra marker
• Private SNPs (x3)
• 2 pink SNPs omitted
• <67 markers excluded
56. Iain McDonald, The 2015 report to the U106 group (Sep 2015)
www.jb.man.ac.uk/~mcdonald/genetics/u106-geography-2015-revised.pdf
64. Will additional STR markers help refine TMRCA estimates?
• But … 5% differ? ... some are missing? ... not detected by NGS?
• 35 mutations between G21 & G55
• 24 mutations between G21 & G57
• 9 mutations between G21 & G57
73. A Combined Mutation / Family History Tree
… using DNA markers when people run out
… is it possible?
74. Topics for Discussion
• Brief overview of key concepts
• Building a tree with STRs
• Building a tree with SNPs
• Combining STRs & SNPs
• Dating branching points in the tree
• Combining STRs, SNPs & genealogy
• Opportunities for the years ahead
75. Lessons Learned & Future Opportunities
• Transcription errors are easy => triple-check, automate
• Re STRs
– Lots of Parallel Mutations … where are the Back Mutations?
– 111 markers best define the branching pattern
– Placement of CDY & 464 is likely to be incorrect (esp. in
upstream generations)
– Most project members have not tested other male cousins
to triangulate on their MDKA
– Convergence may be a problem (even at 3/111)
– We need more people to test
– We need more people to upgrade to 111 markers
– YFULL analysis liberates 495 STRs
76. Lessons Learned & Future Opportunities
• Re SNPs
– Difficult to declare a genuine SNP
– Different SNPs from different lips
– Definite, probable, possible, unlikely
– Likely to be lots of false negatives (& false positives)
– No names (locations too long)
– Naming is unregulated
– Many SNPs trapped in Private Collections
– Current NGS is discovery, not confirmatory =>
further testing (with other NGS?) needed to confirm
77. Lessons Learned & Future Opportunities
• Re combining STRs & SNPs
– Adding SNPs changed the upper reaches of the tree
– SNPs are still located relatively upstream - STRs offer better
definition downstream
– Start with the Modal of your Haplogroup subgroup
• Re TMRCA estimates
– SNP-based estimates work best for distant branching
points (haplogroup projects)
– STR-based estimates have wide ranges, and are skewed toward distant generations
– Even at 111, upper range ~ double the mid-value
– Even 495 markers has a wide range (+/- 33%)
78. Lessons Learned & Future Opportunities
• Re combining STRs, SNPs & genealogy
– We need to overlay documentary data on DNA
– Some pedigrees not supplied / incomplete
– Need to add MPRs to all (MDKA Profile)
– Need to take a One Name Study approach?
• Collate all Gleeson data worldwide
• Establish a relational database (Access?)
• Assign data to different family branches
• This early draft MHT serves as a useful basis
– Will evolve over time as more people test & upgrade
– Will facilitate collaboration between project members
– Will help attract new project members
80. What would happen if …
• Everyone upgraded to 111 markers?
– Better definition of branching pattern
– More precise TMRCA estimates (with narrower range)
• Everyone did the Big Y?
– SNPs only good for upstream branches? (<1500 AD)
– We will run out of Private SNPs
• Everyone tested on a Surname Specific Panel?
– Would elucidate branching pattern up to 1500 AD? Later?
• Everyone did Whole Genome Sequencing?
– No better than Big Y? Better coverage? Better read length?
– What will happen to Probable / Possible / Unlikely SNPs?
81. Some Bold Predictions …
• (To help stimulate discussion & to learn)
• What is most useful for Surname Projects –
more SNPs or more STRs?
– More STRs … we will run out of Private SNPs
– 111 vs 50,000
– 500 vs 40?
• In 2020, FTDNA will offer 500 STRs for $129
82. Some Bold Predictions …
• How do we best generate a Surname-Specific
SNP Panel?
– Q: How many discovery Big Y tests are needed to
liberate sufficient Private SNPs to adequately
define the Surname Panel?
– A: 5-10 Big Y tests per genetic cluster
– We need another few people to Big Y test, then
generate the Surname Panel for Lineage II
• In 2020, FTDNA will offer over 4000
Surname Specific SNP Panels
for $100 each
84. Acknowledgements
• Bennett Greenspan
• Max Blankfield
• Janine Cloud
• FTDNA team
• Judy Claassen
• Lisa Little
• James Irvine
• Ralph Taylor
• John Cleary
• Haplogroup Admins
• John Murphy
• Neal Downing
• James Kane
• Alex Williamson
• Nigel McCarthy
• Dennis Wright
• Alasdair MacDonald
• YFULL team
The Genetic Genealogy Community
Editor's notes
What are the chances of 5 parallel mutations!??
Several “best fit” models
So not enough resolution at 37 markers to define the branching pattern
Parallel mutations unavoidable – either 390 &392, CDYa&b …
Several pathways to the final mutations per member ... But not clear which came before which
Looks like the constellation of Ursa Major
Or instructions on how to assemble Swedish furniture
Several “best fit” models
So not enough resolution at 37 markers to define a single “most probable” option for the branching pattern
Parallel mutations unavoidable – either 390 &392, CDYa&b …
Several pathways to the final mutations per member ... But not clear which came before which
Best Fit option also includes possible Back Mutations
If we compare the Hand Drawn Tree with the Fluxus-based tree (or rather 1 version thereof – as several different versions are possible)
The main area where the Fluxus improves on what we already have in the HDT is in the amalgamation of Branches 2 and 6
So if we move them over beside each other you can see that both branches have parallel mutations on marker 456
If we take a closer look at these branches, let’s assume that the mutation in marker 456 on Branch 6 occurred before the mutation in marker 389
This allows us to create branch 6 as a sub-branch of branch 2 rather than both branches being offshoots from the modal haplotype
This new branching configuration is then reinserted into our tree and branches 3, 4 & 5 moved over to make room
Only Branches 2 and 6 are reconfigured … Everything else remains the same … All other Parallel Mutations still persist
So our Hand Drawn Tree comes out pretty well compared to the Fluxus-based Tree
But that is only at 37 markers … when we have to deal with 111 markers, and many more project members, the option of a Hand Drawn Tree becomes unfeasible
and we have to turn to Fluxus or other software to help us achieve the Best Fit Tree
Even at 111 markers there is no overall most likely Best Fit Tree
there are 2 possible pathways to G71, and 2 to G22
However the rest of the branches appear to be relatively well demarcated by the increase in the number of markers
One problem however is that not everyone has tested to 111 markers
and whereas “99” can be put in place of missing marker values, thus allowing the programme to insert the “best fit” marker value for those that are missing,
there is no guarantee that the programme has chosen the appropriate / correct values
Nevertheless, this cladogram can now be converted into a Family Tree
Some additional members have been added to the tree, namely G02, G68, G70, G71, & G73
Essential piece of technology
Compared to the 37-marker based Fluxus Tree …
- some of the branches haven’t changed at all
- Some new branches have been added as new members have joined - 11-14 in green
- Branch 1 has accumulated a few more mutations
- Branches 4 & 5 have retained their relative position, with 5 being an offshoot of 4
- Branches 2 & 6 have retained their shape (6 an offshoot of 2), as have 9 & 10, and have accumulated a few more mutations as well
But major changes have happened to Branches 7 and 8
- Branch 7 (G55) used to be most closely positioned to Branch 8 (G66). GD was 2/37 but is now 11/111. Now it is closest to Branch 6 (G21); GD was 4/37, now 8/111.
- similarly Branch 8 (G66) has moved over to an entirely different part of the tree. It is now closest to Branch 4 (G22): GD was 4/37, now 2/111 (reconfiguring the tree has removed a parallel mutation at 464b).
Parallel Mutations still exist (CDYa 38-39, 464c 17-16, CDYb 40-39) and others have appeared (461, 390) but others have disappeared completely (464b, 456)
A Back Mutation is now evident - in Branch 1 (G05): 464b mutates forward from 16-17, then back from 17-16, then mutates again from 16-15 in Branch 11 (G02)
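The genetic distances quoted in these notes (e.g. 2/37, 11/111) and the 464b back mutation can be illustrated with a small sketch. It uses a simple stepwise count, one step per unit of difference on each shared marker, which is an assumption of this sketch; testing companies may treat multi-copy markers differently. The 464b values follow the note above, while the CDYa values are placeholders.

```python
# Sketch: stepwise genetic distance between two haplotypes, and why a
# back mutation stays hidden from it. Haplotypes are dicts of marker -> repeats.

def genetic_distance(hap_a, hap_b):
    """Return (distance, markers compared) over the markers both men share."""
    shared = hap_a.keys() & hap_b.keys()
    distance = sum(abs(hap_a[m] - hap_b[m]) for m in shared)
    return distance, len(shared)


modal = {"464b": 16, "CDYa": 38}   # CDYa value is a placeholder
g05 = {"464b": 16, "CDYa": 38}     # 464b went 16 -> 17 -> 16: net change 0
g02 = {"464b": 15, "CDYa": 38}     # then 16 -> 15 in Branch 11

print(genetic_distance(modal, g05))   # two mutation events, none visible
print(genetic_distance(modal, g02))   # three mutation events, one visible
```

The distance only sees the net change on each marker, so the forward and back steps on 464b leave no trace in G05's result; they only show up once the tree is drawn.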
Generally these developments seem to represent a significant step forward, even if James and Ralph aren’t too confident they have hit upon a reliable weighting algorithm, or that the basic mutation rates used (Chandler/Ancestry) are reliable. But it does seem that the use of some weighting algorithm, even if its exact form and content are unreliable, is better than none.
From James Irvine:
“Conventional fluxus diagrams give equal weight to all markers, regardless of the fact that their mutation rates vary by a factor of 400, or perhaps arbitrarily exclude the fastest moving markers such as the CDYs. James Irvine introduced me to the concept of weighting markers, and he and Ralph Taylor, who has kindly produced all my fluxus diagrams, have come up with the simple weighting algorithm of: weighting = 99 * (1 - mutation rate / 0.04)^2.
Note the application of this algorithm has the effect of making the “torso” or green ring disappear, although this is only significant if it clearly explains the data in a way that is still consistent with a most parsimonious version of the tree.”
Changes only apparent in the upper reaches of the tree
No change in lower area - relationship of sub-branches remains the same … with the exception of Branch 1 (G05)
This no longer seems as closely related to Branches 8,4,5 as previously
It may be more closely related to Branch 13 (G70)
G70 (Branch 13) now does not come directly off the modal but has a mutation in CDYb
Following a further mutation (in 464c), 2 other branches now split off = Branch 10 (G54) and Branch 1 (G05)
This new configuration allows us to …
remove some of the mutations that were indicated in some of the branches (now crossed-out in red)
And reposition them (in black) to other areas of the tree (eg on Branch 1, the crossed-out markers CDYb & 464c are repositioned to further up the tree)
But during this process, Nigel McCarthy spotted an error in Branch 6 …
GATA A10 is only present in Branch 6 (G21) and not in Branch 7 (G55)
This turned out to be a transcription error during the initial transfer of the data from FTDNA to WFN
The lesson here is: transcription errors are easy to make and happen all the time => we need to double-check and triple-check all these values
So now Branch 7 is no longer a sub-branch of Branch 6
Caveats:
1) Fluxus fills in the blanks / missing data - the question remains: is its “best guess” valid?
2) Some markers behave unusually …
- 464: mutations may not be in the correct order - Kittler test needed to define relative positions
- 389: marker is in two parts. A mutation in the first part is also counted as a mutation in the second part
3) Some markers (esp. 68-111) have no modal value - need more people to test, & at higher levels - may become differentiating in the future
4) the Tree may be skewed by recent mutations (i.e. within the past 5-6 generations). Ideally, at least 2 known distant cousins from each family branch would be tested in order to characterise the haplotype of each MDKA. Triangulation on all MDKAs would help eliminate recent mutations which might cloud the interpretation of the tree beyond the level of the MDKAs.
G71 has become part of G22
G22 is now a sub-branch of G66
G02 is a separate branch from the Modal and no longer a sub-branch of G05
G70 remains in roughly the same relation to other branches, as do G57, G55, G21, G68, G54, & G51
So removing the unreliable markers CDY and 464 does not result in substantial changes to most of the tree,
because there are other mutations that maintain the tree structure / branching pattern
SNP testing
SNP testing is required to confirm haplogroup assignments, to learn more about your deep ancestry and to rule out false positive matches. Y-SNP chip tests are available from the Genographic Project and BritainsDNA. More comprehensive sequencing tests using next generation sequencing technology are available from Full Genomes Corporation and Family Tree DNA. Single SNPs can be ordered from Family Tree DNA and YSEQ.
For advice on SNP testing consult the project administrator of the relevant Y-DNA haplogroup project.
Single SNPs
There are currently two companies that offer single SNPs.
Family Tree DNA offer a range of single SNPs at US $39 per SNP. SNPs can only be ordered by existing FTDNA customers who have already taken a Y chromosome DNA test with the company. In October 2013 over 3500 individual SNPs were available to order from FTDNA. The placement on the phylogenetic tree is unknown for most of them. SNPs with a known placement are highlighted in the customer's personal results page under the Y-DNA Haplotree & SNPs section. Customers can also request that new SNPs are added to the catalogue.
YSEQ is a new company established by Thomas and Astrid Krahn in November 2013. YSEQ offer a potentially unlimited menu of single SNPs to order. For further details of this company see the blog post by Debbie Kennett entitled YSEQ a new company offering single SNPs.
Full Genomes Corporation have indicated that they will soon be offering single SNPs for sale.
Deep clade tests
Family Tree DNA used the term "deep clade test" to refer to a panel of Y-chromosome SNP tests. This panel was intended to establish which particular subclade the Y-chromosome belonged to. The deep clade test was effectively superseded by the new Geno 2.0 test from the Genographic Project. This new test was introduced in the autumn/fall of 2012 and tests over 12,000 Y-DNA SNPs.
In January 2013 FTDNA announced that they were removing the deep clade test from sale. A link is now provided that will allow people to order the Genographic 2.0 test, whose Y-SNP results can be transferred back to Family Tree DNA.
Family Tree DNA announced at their conference in November 2013 that they would be reintroducing some deep clade tests in 2014, probably in the first quarter of the year.
Sometimes true SNPs are not detected by the machine (FALSE NEGATIVE) due to poor coverage
Sometimes they are detected by the machine but not recognised by the analyst / analyser
Some true SNPs are missed (and James / John picks them up) (FALSE NEGATIVE)
If they are detected (by machine & analyst), some are clearly true positives
Others are acceptable quality
Others are ambiguous
Others are unreliable
And others are clearly not true positives (ie false matches)
But each of these assessments could be true or false or unsure
How many of each will eventually be true or false SNPs?
What is the sensitivity & specificity?
Find out the L21 story
Detection Threshold / Filter can include criteria of coverage, quality, region of the Y, multiple copy on the Y, known presence in multiple haplogroups
From John Cleary:
As for the Recognition one, I’m not convinced by the “Problem with recognition algorithm” issue. Do we have concrete examples of such problems? After all, we’re talking about something that is essentially comparing two strings of text symbols with each other, which sounds like a pretty simple kind of algorithm to write, for those who can do such a thing. It seems to me that when a SNP is missed – if there are such cases – it will be because of other filters written into the algorithm, in other words it is a type of the problem in the line above when something is rejected because it doesn’t meet the threshold criteria. These can be criteria of coverage, quality, region of the Y, multiple copy on the Y, known presence in multiple haplogroups etc. If these filters are set too strictly, then viable SNP candidates can be rejected and never be thrown up for manual analysis.
I doubt whether ‘manual analysis of BAM’ is really feasible if positions for investigation are not being thrown up by a prior automated search of the BAM file. We can’t eyeball 14 million positions manually. We might get lucky in some cases, but what we really want is a soft set of filters that will throw up borderline cases for manual investigation, and err on the side of the false positive, so we can investigate and reject them if they are flawed. Do we know that our friends at YFull / FGC / ClarifY are not in fact doing exactly this?
So perhaps the Recognition one could be something like – Filters/algorithms insensitive to borderline cases?? False Positive – SNP can be investigated by manual analysis of BAM file; False Negative – SNP will be missed??
And I think we should build a log of cases of the latter type, so that we know it is actually happening.
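John's suggestion of a "soft set of filters" that throws up borderline cases for manual investigation could look something like the sketch below. Everything specific in it is hypothetical: the field names, the threshold values and the three outcomes are assumptions chosen for illustration, and nothing here describes how YFull, FGC or any other analyst actually filters candidates.

```python
# Sketch: strict and soft thresholds for candidate SNPs, so that borderline
# cases go to manual review of the BAM file instead of being discarded.
# All field names and threshold values are hypothetical.
from dataclasses import dataclass


@dataclass
class Candidate:
    position: int              # location on the Y chromosome
    coverage: int              # number of reads covering the position
    alt_fraction: float        # share of reads showing the derived value
    in_difficult_region: bool  # e.g. palindrome, centromere, STR region


def triage(c, strict_cov=10, soft_cov=2, strict_frac=0.9, soft_frac=0.6):
    """Classify one candidate as accept, manual review, reject or no call."""
    if c.coverage == 0:
        return "no call"        # no coverage: presence or absence is unknown
    if (c.coverage >= strict_cov and c.alt_fraction >= strict_frac
            and not c.in_difficult_region):
        return "accept"
    if c.coverage >= soft_cov and c.alt_fraction >= soft_frac:
        return "manual review"  # borderline: err on the side of a false positive
    return "reject"


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate(1000001, 25, 1.00, False),
    Candidate(1000002, 3, 1.00, False),
    Candidate(1000003, 30, 0.95, True),
    Candidate(1000004, 12, 0.20, False),
    Candidate(1000005, 0, 0.00, False),
]
for c in candidates:
    print(c.position, triage(c))
```

Keeping a record of everything that lands in "manual review" would also give the log of borderline cases that John asks for.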
Only interested in zero difference
Known SNPs – high & medium confidence
Novel Variants – high only
Shared NV – high & medium
Unique NV – high & medium
Men whose NGS data have been fully analyzed are indicated with a grey background color. Red is used for men whose data has not yet been fully analyzed. His position on the tree is not yet final, and will in general be downstream of the current position. He may not be positive for all the SNPs/INDELs in the block he descends from.
A green SNP name with a '?' indicates that the SNP's status for the block is uncertain, but assumed to be positive. The same SNP probably occurs in an upstream block. It will be necessary to check BAM files or perhaps Sanger sequence some men to prove the result.
A red SNP name with a '?' indicates that the SNP's status for the block is uncertain, but assumed to be negative. The same SNP probably occurs in a parallel block. It will be necessary to check BAM files or perhaps Sanger sequence some men to prove the result.
Mutations written with a red background fall within a region of the Y chromosome, such as the palindromic region, which has left the position of the mutation ambiguous. The true mutation may be at the indicated position, or at any one of a number of alternate positions.
In Alex’s Big Tree, we are looking at only those SNPs that are shared between the men who have currently tested.
Private SNPs are excluded
His Tree is characterised by Undifferentiated SNP blocks.
It’s useful to look at the progression of Alex’s Big Tree over time
Most of this happened in 2015 so these results are really very new & the science is changing rapidly
In the early days, the 2 Gleason men were lumped together with a Carroll man pending analysis of the second Gleeson man’s results (indicated by the red background)
But by April, Alex’s analysis split the Carroll man from the Gleason men
And when my Dad did his Big Y, his results split the Gleason group in two
Notice how all the SNPs apart from 1 (A660) have NOT been named and are identified by their location numbers only
- this is a challenge because quoting these numbers or using them in conversation or checking them is unwieldy
But by June, some of the SNPs had names, some were still referred to by location only, and several had 2 separate names
- another challenge: a single name would help avoid confusion
And by Oct, a Glisson man split the Gleeson bunch into 3 distinct branches, and also split the A5629 SNP block into 2
- instead of a block of 5 SNPs, it is now a block of 4 SNPs with 1 SNP breaking away to form its own sub-branch
And look at the Carroll man, he has now been joined by a second Carroll man and now there is a whole block of Shared SNPs above their names.
These would have been in the first Carroll man’s Private Collection of SNPs prior to the arrival of the second Carroll man.
This nicely shows how, as more people test, more SNPs will move from individuals’ Private Collections into the Shared SNP sections of the Big Tree
Grey = no coverage
Pink = marginal coverage
+ Definite
* Probable
** Possible
*** Unlikely
L21 story
First, let’s identify those members who have undergone SNP-testing (all have done the Big Y test)
Red arrow = Branch 9
Yellow arrow = Branch 14
Purple arrow = Branch 2 (with sub-branches 6 and 7)
So at first glance, the SNPs seem consistent with the present version of the Fluxus-based Tree
The ancient Glisson branch (14) is completely separate from all others and characterised by the absence of SNP A5628
2 of the 3 brothers in Branch 9 have tested and are grouped together under the same SNP mutation block A660
And the 3 remaining Big Y testees are all closely related to each other by STR analysis & are all grouped under SNP Y16880
So at first glance, the STR grouping and SNP grouping seem consistent with each other
The BY2853 SNP block can be imagined to be positioned far up in the tree, above the Gleeson MRCA
Similarly for the next block of SNPs (A5629 block)
Then Glisson & the A5628 block split off … but this is where an inconsistency emerges
All the remaining testees (Branches 2,7,8,9,10) sit underneath A5628
So we have to create a different link to their branches
A second problem is that Branches 9 & 10 do not have a 456 16-15 mutation
So to make this fit, we could imagine that mutation 456 was a relatively early mutation for ALL the branches in question, followed some time later by A5628, and somewhere along the way Branch 9 (& 10) developed a Back Mutation in marker 456, which explains why this mutation is missing from the haplotypes of the present-day members of Branch 9
This further reconfiguration of the tree suggests that other branches that have not yet been tested may also have the A5628 mutation
i.e. seeing as how Glisson is an ancient branch, it may be that Branches 8,4,5,12 also share the A5628 mutation – single SNP testing?
The major difference is the differing Modal Haplotype at the top of the tree
I use the MH for the surname group,
But Nigel uses the MH from the overall Z255 group … which is probably a better way of doing it
The branching pattern is identical – it’s just that sometimes his mutations are mirror-images of mine
Noticeable in 464, CDY, 461, 390, & 389
But overall, the two trees are consistent with each other
I take mine one step further by including 37-marker data
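The "mirror-image" effect of choosing a different Modal Haplotype can be shown with a small sketch. It takes the modal value of each marker to be the most common value among the men included, and leaves a marker undefined where there is a tie; both choices are assumptions of this sketch, and the marker values are invented for the example.

```python
# Sketch: modal haplotype as the most common value per marker, and how the
# choice of reference group flips the apparent direction of a mutation.
from collections import Counter


def modal_haplotype(haplotypes):
    """Return marker -> most common value (None where the top values tie)."""
    modal = {}
    markers = set().union(*haplotypes)
    for marker in sorted(markers):
        values = [h[marker] for h in haplotypes if marker in h]
        counts = Counter(values).most_common()
        top_value, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        modal[marker] = None if tied else top_value
    return modal


# Hypothetical values for one marker, for illustration only.
surname_group = [{"390": 24}, {"390": 24}, {"390": 25}]
wider_group = surname_group + [{"390": 25}, {"390": 25}, {"390": 25}]

print(modal_haplotype(surname_group))   # modal taken from the surname group
print(modal_haplotype(wider_group))     # modal taken from the wider subclade
```

Against the surname modal, the man with 25 appears to carry a 24-25 mutation; against the wider group's modal, the two men with 24 appear to carry a 25-24 mutation instead. The branching is the same, only the direction of the mutation is reversed.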
Using the 495 STR marker TMRCA estimates, the branching point can be further refined
We can also calculate the missing value between Branch 8,5,4,12 and the MH
Are these TMRCA estimates consistent with the Branching Pattern?
Yes, except for the red circle (Branch 13) but this could be because it is based on a Y37 result in G70