Bina Technologies, now part of Roche Sequencing
Trio Analysis on GiaB High-Confidence SVs
Bina Technologies, Roche Sequencing
Marghoob Mohiyuddin, Jian Li, Hugo Lam
Motivation
• GiaB recently released high-confidence SVs for NA12878 for validating SV methods
  • 2,676 deletions, 68 insertions
• Trio sequences are available from Illumina
• MetaSV, recently published, calls SVs with high confidence using multiple methods
• Validating the GiaB SV gold set by trio analysis with MetaSV helps assure its quality
For Research Use Only. Not for use in diagnostic procedures.
Methodology
• Validation using trio analysis (50x coverage)
  • MetaSV calls on parents (NA12891, NA12892)
  • MetaSV calls on NA12878
• Criteria
  • Deletions ≥ 100 bp considered (2,348, or 88% of the GiaB deletions)
  • Reciprocal overlap of 50% used
  • A GiaB deletion is validated if it is
    • detected by MetaSV in any parent, or
    • detected by MetaSV as a PASS call in NA12878, or
    • reported in previous literature with validation
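The validation criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the `Deletion` tuple, the half-open coordinate convention, and the function names `reciprocal_overlap`, `considered`, and `is_validated` are all assumptions made for the example.

```python
from typing import Iterable, NamedTuple


class Deletion(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Overlap length as a fraction of the longer interval (0.0 if disjoint).

    Requiring this value to be >= 0.5 is the same as requiring the overlap
    to cover at least 50% of both intervals.
    """
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def considered(d: Deletion, min_len: int = 100) -> bool:
    """Only deletions of at least 100 bp enter the analysis."""
    return d.end - d.start >= min_len


def is_validated(giab: Deletion,
                 parent_calls: Iterable[Deletion],
                 child_pass_calls: Iterable[Deletion],
                 in_literature: bool = False,
                 min_ro: float = 0.5) -> bool:
    """Apply the three 'validated if' rules in the order listed above."""
    def hit(calls: Iterable[Deletion]) -> bool:
        return any(reciprocal_overlap(giab, c) >= min_ro for c in calls)

    return hit(parent_calls) or hit(child_pass_calls) or in_literature
```

Dividing by the longer interval keeps the check symmetric, so it does not matter which of the two calls is treated as the truth.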
What is MetaSV?
An Ensemble Approach
MetaSV Workflow
● Ensemble SV calling
  ○ Merge SVs from multiple methods and tools
  ○ SVs detected by multiple methods are high-confidence
● Enhanced insertion detection
  ○ Existing tools weak in detecting insertions
  ○ Use a combination of soft-clip analysis and assembly
● Assembly and alignment to refine breakpoints
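The ensemble idea can be illustrated with a small Python sketch. It is not MetaSV's actual merging algorithm: the greedy clustering, the 50% reciprocal-overlap merge rule, the use of distinct tool names as a stand-in for distinct methods, and the names `merge_calls` and `high_confidence` are all assumptions chosen for illustration.

```python
def reciprocal_overlap(a, b):
    """Overlap length divided by the longer interval; 0.0 if disjoint."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])


def merge_calls(calls_by_tool, min_ro=0.5):
    """Greedily merge (chrom, start, end) calls from several tools.

    Assumption: two calls describe the same SV when they are on the same
    chromosome and reach the reciprocal-overlap threshold.
    """
    flat = sorted(
        (chrom, start, end, tool)
        for tool, calls in calls_by_tool.items()
        for chrom, start, end in calls
    )
    merged = []
    for chrom, start, end, tool in flat:
        for m in merged:
            same_chrom = m["chrom"] == chrom
            if same_chrom and reciprocal_overlap((m["start"], m["end"]), (start, end)) >= min_ro:
                # Extend the merged call and remember the supporting tool.
                m["start"] = min(m["start"], start)
                m["end"] = max(m["end"], end)
                m["tools"].add(tool)
                break
        else:
            merged.append({"chrom": chrom, "start": start, "end": end, "tools": {tool}})
    return merged


def high_confidence(merged, min_tools=2):
    """Keep merged calls supported by at least `min_tools` distinct tools."""
    return [m for m in merged if len(m["tools"]) >= min_tools]


calls = {
    "BreakDancer": [("1", 1000, 2000)],
    "Pindel": [("1", 1010, 1990)],
    "CNVnator": [("2", 5000, 9000)],
}
merged_calls = merge_calls(calls)
confident_calls = high_confidence(merged_calls)
```

In this toy input the BreakDancer and Pindel calls on chromosome 1 collapse into one call with two supporting tools, while the CNVnator call on chromosome 2 stays single-tool.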
MetaSV Accuracy
● VarSim simulation of 50x Illumina 2x100bp reads for NA12878
● Reciprocal overlap of 100% and wiggle of 100bp to assess both breakpoint precision and accuracy
● Performance varies for tools/methods across sizes
● MetaSV has the best and most stable accuracy across all SV sizes
● Achieved 90.2% sensitivity against Complete Genomics high-confidence SVs for NA12878
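One way to read the matching rule above is that a call counts as correct only when both of its breakpoints fall within the wiggle distance of the true breakpoints. The sketch below encodes that reading; the interpretation itself, the `(chrom, start, end)` tuples, and the names `breakpoints_match` and `sensitivity` are assumptions, not the evaluation code behind the reported numbers.

```python
def breakpoints_match(truth, call, wiggle=100):
    """True when both breakpoints of `call` lie within `wiggle` bp of `truth`."""
    return (
        truth[0] == call[0]
        and abs(truth[1] - call[1]) <= wiggle
        and abs(truth[2] - call[2]) <= wiggle
    )


def sensitivity(truth_set, call_set, wiggle=100):
    """Fraction of true SVs recovered by at least one matching call."""
    if not truth_set:
        return 0.0
    found = sum(
        any(breakpoints_match(t, c, wiggle) for c in call_set)
        for t in truth_set
    )
    return found / len(truth_set)


truth_svs = [("1", 1000, 2000), ("1", 5000, 6000)]
called_svs = [("1", 1050, 1980), ("1", 5300, 6000)]
recovered = sensitivity(truth_svs, called_svs)
```

Here the first call is within 100 bp at both ends and matches, while the second misses its left breakpoint by 300 bp, so half of the true SVs are recovered.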
Are GiaB SVs of high quality?
GiaB Deletion Validation

                                             Total Validated   Total Not Validated   Additionally Validated
GiaB HC                                      0 (0%)            2,348 (100%)          0 (0%)
GiaB HC Validated by Parents (MetaSV ALL)    2,302 (98.0%)     46 (2.0%)             2,302 (98.0%)
GiaB HC Validated by Child (MetaSV PASS)     2,306 (98.2%)     42 (1.8%)             4 (0.2%)
GiaB HC Validated by Child (curated)         2,342 (99.7%)     6 (0.3%)              36 (1.5%)
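The percentages in the validation table follow directly from the cumulative validated counts and the 2,348 deletions considered. The short sketch below recomputes them; the variable names and the `pct` helper are illustrative only.

```python
TOTAL = 2348  # GiaB deletions >= 100 bp

# Cumulative number of validated deletions after each validation step.
cumulative_validated = {
    "Parents (MetaSV ALL)": 2302,
    "Child (MetaSV PASS)": 2306,
    "Child (curated)": 2342,
}


def pct(n, total=TOTAL):
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * n / total, 1)


rows = []
previous = 0
for step, validated in cumulative_validated.items():
    rows.append({
        "step": step,
        "validated_pct": pct(validated),
        "not_validated": TOTAL - validated,
        "not_validated_pct": pct(TOTAL - validated),
        "additional": validated - previous,
        "additional_pct": pct(validated - previous),
    })
    previous = validated
```

Each step's "additionally validated" count is the difference from the previous cumulative total, which is how 4 and 36 arise in the last two rows.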
Manual Inspection of Unvalidated Calls Using IGV & SVVIZ (1:19151770-19152035)
● Paired-end support: only reported by BreakDancer
Manual Inspection of Unvalidated Calls (2:233764771-233765484)
● Split-read support: only reported by Pindel
Manual Inspection of Unvalidated Calls (7:89737335-89738051)
● Not reported by any tool. Read-depth support is weak. Read-pair support present.
Manual Inspection of Unvalidated Calls (11:29000583-29012888)
● A PASS but imprecise call with a reciprocal overlap of 0.48, just under the 50% threshold. The call seems to be misaligned.
Manual Inspection of Unvalidated Calls (14:106798247-106822961)
● Both paired-end and read-depth support are present, but BreakDancer reports a much larger deletion (298 kb)
Manual Inspection of Unvalidated Calls (16:83984084-83984359)
● Coverage around the region is unusually high (> 200x, versus the expected 50x).
Is GiaB SV missing anything?
GiaB Deletion Validation using Mendelian Rule
Venn diagram: GiaB HC deletions vs. MetaSV trio deletions
GiaB HC Dels: 2,348
GiaB Private: 222
Common: 2,126
MetaSV Private: 456
MetaSV Trio Dels: 2,582
(142 not in literature)
● MetaSV High Quality Trio Deletions:
  ○ Mendelian inheritance consistency with genotypes
  ○ PASS in child and ALL in parents
  ○ A no-call is treated as a reference call
● MetaSV PASS Dels: 2,671, of which 96.7% are Mendelian consistent (98.7% if ignoring genotypes)
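A Mendelian consistency check of the kind described above can be sketched as follows. The VCF-style genotype strings, the handling of partially missing genotypes as no-calls, and the function names are assumptions for illustration; the slide only states that genotypes must be consistent and that a no-call counts as a reference call.

```python
from itertools import product


def parse_gt(gt):
    """Parse a VCF-style genotype such as '0/1' into a pair of alleles.

    Assumption: any genotype containing '.' is a no-call and is treated
    as homozygous reference, following the rule stated above.
    """
    if gt is None or "." in gt:
        return (0, 0)
    first, second = gt.replace("|", "/").split("/")
    return (int(first), int(second))


def is_mendelian_consistent(child, father, mother):
    """True if the child's genotype can be built from one allele per parent."""
    child_alleles = sorted(parse_gt(child))
    return any(
        sorted(pair) == child_alleles
        for pair in product(parse_gt(father), parse_gt(mother))
    )


def is_consistent_ignoring_genotypes(child, father, mother):
    """Presence-only check: a deletion seen in the child must be seen in a parent."""
    def carries(gt):
        return any(parse_gt(gt))

    return (not carries(child)) or carries(father) or carries(mother)
```

The presence-only variant is the looser of the two: a child genotyped 1/1 with only one carrier parent fails the strict check but passes when genotypes are ignored, which is consistent with the higher 98.7% figure. With the counts on the slide, 2,582 of 2,671 PASS deletions works out to 96.7%.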
Manual Inspection of MetaSV Private Calls (1:95690510-95690829)
● Reported by all 4 tools, genotyped as 1/1
Manual Inspection of MetaSV Private Calls (1:153215877-153216600)
● Reported by CNVnator and BreakDancer, genotyped as 0/1
Manual Inspection of MetaSV Private Calls (12:50272565-50273865)
● Reported by BreakDancer and Pindel, genotyped as 0/1
What’s next?
Conclusions
• GiaB SVs have a high validation rate using MetaSV trio analysis
• Only 6 SVs remain unvalidated, and they do not have strong support in IGV or SVVIZ
• ⇒ GiaB SVs are of high quality
• Almost all (up to 98.7%) MetaSV PASS calls are Mendelian consistent, making them high quality
• A significant number (456) of MetaSV trio calls are not in the GiaB set, possibly missed due to stringent GiaB requirements, since 321 of them are reported in the literature
• MetaSV trio validation can help validate and extend the existing gold set
Future Work
• Using MetaSV genotype information on the two additional trios from GiaB
• Four levels of quality classification for the child
  • HighQual (validated by strict Mendelian inheritance)
  • PASSII (validated by presence in parents)
  • PASSI (validated by multiple methods)
  • LowQual (none of the above)
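The proposed four-level classification could be expressed as a simple decision cascade. This is only a sketch of the idea: the precedence order (strictest evidence first), the threshold of two methods for "multiple", and the function and parameter names are assumptions, since the slide lists the levels without specifying how ties are resolved.

```python
def classify_child_call(mendelian_consistent, present_in_parents, n_methods):
    """Assign one of the four proposed quality levels to a child SV call.

    mendelian_consistent: passes the strict genotype-based Mendelian check
    present_in_parents:   the SV is also called in at least one parent
    n_methods:            number of distinct detection methods supporting it
    """
    if mendelian_consistent:
        return "HighQual"
    if present_in_parents:
        return "PASSII"
    if n_methods >= 2:  # assumed meaning of "multiple methods"
        return "PASSI"
    return "LowQual"
```

Checking the strictest criterion first means every call receives the highest level it qualifies for.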
Acknowledgement
• Genome in a Bottle
  • Hemang Parikh
  • Justin Zook
  • The SV Team
• Bina Technologies
  • Jian Li
  • Hugo Lam
  • The Science Team