Making the Most of Genome Sequencing: The Need for Detection of Sequence Variants in Notoriously Tricky Regions

Author: Marina DiStefano

Genome sequencing has opened opportunities for detecting multiple variant types across the genome with a single technology. Broad Clinical Labs (BCL) previously evaluated copy number variant (CNV) detection from genome data using the DRAGEN™ 4.2.4 CNV pipeline. While CNV detection significantly expands the diagnostic power of genome data, there are still many known regions of the genome where sequence variant detection is difficult because of biological challenges. For example, many genes in which variants are associated with genetic conditions and are reported in a clinical setting contain repetitive regions (e.g. NEB) or have high homology with a pseudogene (e.g. PMS2). These genes currently require specialized orthogonal assay design to detect variants with confidence.

With these challenges in mind, BCL has initiated efforts to evaluate new bioinformatic approaches to calling variants in these difficult genes. Illumina has designed targeted callers that run within the DRAGEN software to detect sequence variants in these tricky regions. Some of these callers are currently available. Examples include the HBA caller, which aids detection in the highly homologous genes HBA1 and HBA2 (associated with α-thalassemia), and the SMN1 caller, which detects copy number changes in the highly homologous genes SMN1 and SMN2 (associated with spinal muscular atrophy).

This blog details the clinical need for these callers and introduces our latest effort in evaluating the performance of additional DRAGEN targeted callers, which will be detailed in subsequent blog posts.

Genome-wide tests make domain-specific expertise more difficult

With the decreasing cost of sequencing, Mendelian disease genes are being discovered more rapidly than ever before [1]. Per a policy statement of the American College of Medical Genetics and Genomics [2], all genes that have a gene-disease relationship of Moderate or above per the semi-quantitative framework developed by the Clinical Genome Resource (ClinGen) [3] are reportable in a diagnostic setting. As of December 2023, approximately 4000 genes in the Gene Curation Coalition database (search.thegencc.org) [4] had a classification strength of Moderate or above and could be reportable on a clinical test.
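The "Moderate or above" rule can be sketched as a simple filter. This is only an illustration: the record layout, the function name, and the placeholder genes below are assumptions made for the example, not the GenCC data model or real curations.

```python
from dataclasses import dataclass

# Classification terms that sit at "Moderate or above" in the ClinGen framework.
REPORTABLE_CLASSIFICATIONS = frozenset({"Definitive", "Strong", "Moderate"})


@dataclass(frozen=True)
class GeneDiseaseAssertion:
    """One gene-disease curation (hypothetical record layout)."""
    gene: str
    disease: str
    classification: str


def reportable_genes(assertions):
    """Return the sorted, de-duplicated genes with at least one reportable assertion."""
    return sorted(
        {a.gene for a in assertions if a.classification in REPORTABLE_CLASSIFICATIONS}
    )


# Hypothetical placeholder records, not real curations.
EXAMPLE_ASSERTIONS = [
    GeneDiseaseAssertion("GENE_A", "condition 1", "Definitive"),
    GeneDiseaseAssertion("GENE_B", "condition 2", "Limited"),
    GeneDiseaseAssertion("GENE_C", "condition 3", "Moderate"),
    GeneDiseaseAssertion("GENE_C", "condition 4", "Disputed"),
]

print(reportable_genes(EXAMPLE_ASSERTIONS))
```

A gene counts as reportable here if any one of its gene-disease assertions meets the threshold, which is why GENE_C is kept despite its second, disputed assertion.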

If a clinical laboratory is going to offer a diagnostic test to report variants in a particular gene, the lab must understand all of the technical limitations of detecting variants in that gene. Depending on the particular limitations, laboratories may be expected to develop multiple technologies to offer a highly sensitive and specific test [5]. For some genes, this is more difficult than for others because of biology. This blog will speak to two different biological scenarios that often require special ancillary assays for a complete clinical test. Developing these assays takes time, can be costly, and often requires detailed knowledge of the gene, the disease, and the underlying biology of the system. This was more feasible when labs focused on small test menus and truly developed expertise in certain gene/disease areas, but as clinical laboratories try to move to genome-wide testing technologies, this task becomes exceedingly difficult.

Homologous Regions: Where is my variant?

Next-generation sequencing (NGS) involves fragmenting genomic DNA into small pieces, usually ~350 base pairs each, ligating, barcoding, sequencing, and aligning them to a reference genome, much like putting together a jigsaw puzzle. The overall success of this method, particularly because the sequenced reads are so short (usually ~150 bp), relies on the principle that many sections of the genome have a unique sequence. However, much like a puzzle that has certain colors or images where the pieces are indistinguishable, there are certain sections of the genome that are notoriously difficult to align. Regions of high homology fall into this bucket because they have near-identical sequences elsewhere in the genome (Figure 1). Homologous regions may have formed during the evolution of the mammalian genome through gene duplication events. One copy retains all of the functional elements of the active gene, while the other is an inactive copy, usually referred to as a pseudogene. Pseudogenes may contain the full gene sequence, or may be “processed,” which means that all of the introns were spliced out of that copy before it was reincorporated into the genome. Pseudogenes can be particularly subject to variation because of the lack of evolutionary constraint on the nonfunctional gene elements [6].

Figure 1: Mapping short reads is difficult in regions of high homology
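The alignment ambiguity described above can be sketched with a toy example. The two sequences below are made up for illustration (they are not PMS2 or PMS2CL sequence), and exact substring matching stands in for a real aligner, which would tolerate mismatches and report a mapping quality.

```python
# Hypothetical toy sequences: the "pseudogene" differs from the "gene"
# at a single base (index 16, C -> G).
GENE = "ACGTTGCAAGGCTTACCGATGCAT"
PSEUDOGENE = "ACGTTGCAAGGCTTACGGATGCAT"


def classify_read(read, gene=GENE, pseudogene=PSEUDOGENE):
    """Say where a read matches exactly: gene, pseudogene, both, or neither."""
    in_gene = read in gene
    in_pseudogene = read in pseudogene
    if in_gene and in_pseudogene:
        return "ambiguous"
    if in_gene:
        return "gene"
    if in_pseudogene:
        return "pseudogene"
    return "unmapped"


# A read that misses the single distinguishing base fits both copies.
print(classify_read(GENE[0:10]))
# A read that covers the distinguishing base can be placed.
print(classify_read(GENE[10:20]))
print(classify_read(PSEUDOGENE[10:20]))
```

Only reads that overlap a base where the two copies differ can be assigned with confidence. The more similar the copies are, the fewer such reads exist.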

 

Distinguishing between genes and pseudogenes can be critical because variants in some genes with a known pseudogene pair are associated with severe, highly actionable genetic conditions. This is illustrated by the fictitious case example below:

John Smith is a 45-year-old patient with a history of colorectal cancer. Histology of a biopsy of this cancer reveals microsatellite instability (MSI). His clinical genetics team orders a comprehensive cancer panel test. The clinical lab detects a nonsense variant in exon 11 of PMS2. PMS2 is definitively associated with autosomal dominant Lynch syndrome, a syndrome with a high risk of early-onset colorectal cancer, among other cancers. The lab may have found the cause of John’s cancer. However, PMS2 has a highly homologous pseudogene, PMS2CL, that overlaps with exons 9 and 11-15 of PMS2 with more than 98% identical sequence [7]. Before the lab can report this variant, they must determine whether it is located in PMS2 (likely explains his condition, can inform familial cascade testing) or PMS2CL (pseudogene variant without clinical impact, no identified cause for his condition). Short-read NGS alone cannot reliably differentiate between these regions; thus the laboratory must design and validate an ancillary assay to detect bona fide PMS2 variants in the pseudogene region. This usually consists of Multiplex Ligation-dependent Probe Amplification (MLPA), long-range PCR, or other methods. Reflexing to this assay when a variant is detected can add time and cost to a clinical sequencing test.

Differentiating between variants in genes and pseudogenes is also critical in a screening situation. Suppose we alter the fictitious clinical scenario above slightly: John Smith is a 45-year-old, seemingly healthy individual who opted for genetic screening, and a nonsense variant was identified in exon 11 of PMS2. It would again be critical for the lab to determine whether this variant is in PMS2 or PMS2CL to help inform screening protocols and prophylactic measures for this individual. If the variant were in PMS2, this individual would be at risk for Lynch syndrome cancers, but this would not be the case if the variant were in PMS2CL.

Repetitive Regions: Is my variant real?

A second bucket of difficult-to-align sequence in short-read sequencing comprises regions that contain long runs of repetitive material. To return to the puzzle analogy, these would be sections of a puzzle with a repeating pattern, where the first repeat of the pattern is not distinguishable from the third. It may be difficult to accurately detect variants in such a region because the sequence is so repetitive (Figure 2). While it may not be as critical to determine exactly where in a repetitive region a variant sits, it is critical to determine whether the variant is actually present at all or is merely a sequencing artifact. This is illustrated in the fictitious example below:

Figure 2: Mapping short reads is difficult in repetitive regions

 

Jane Doe is a patient with nemaline myopathy and a muscle biopsy that has identified nemaline rods. Her clinical team orders a comprehensive myopathy panel, and a variant in NEB is identified. However, it is in the region of exons 82-105, which is a highly repetitive region that is a triplication of 8 exons [8]. The lab must determine whether this variant is in fact real because it could be an explanation for Jane’s condition. However, because sequencing alone cannot determine this, an ancillary assay must be performed.
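The placement problem in a repeated region can be sketched in the same toy style. The flanks and repeat unit below are invented for illustration and are not NEB sequence. Exact matching again stands in for a real aligner.

```python
def find_all(reference, read):
    """Return every 0-based start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one so overlapping matches are also found.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical toy reference: unique flanks around three identical copies of one unit.
LEFT_FLANK = "GATTACA"
REPEAT_UNIT = "CAGGTCTTAGC"
RIGHT_FLANK = "TGCCATG"
REFERENCE = LEFT_FLANK + REPEAT_UNIT * 3 + RIGHT_FLANK

# A read that lies entirely inside the repeat unit fits once per copy.
inside_repeat = REPEAT_UNIT[2:10]
print(find_all(REFERENCE, inside_repeat))

# A read that reaches into the unique left flank has a single placement.
anchored = REFERENCE[3:14]
print(find_all(REFERENCE, anchored))
```

With three equally good placements for the first read, an aligner has no basis for choosing a copy. Evidence for a variant can then be split across copies or confused with alignment artifacts, which is why confirming that the variant is real becomes the central question.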

Future Directions

BCL, in collaboration with Illumina, has prioritized testing certain targeted callers that would help improve the sensitivity of our current genome sequencing product without the need to perform ancillary testing to detect variants in tricky regions. Work is underway to source samples that have been tested with orthogonal methods so that these callers can be validated. Future blog posts will be dedicated to the performance of new targeted callers for genes like PMS2, NEB, STRC, HBA1, and HBA2, to name a few on the road map. Iterative development of bioinformatic algorithms to leverage the power of genome sequencing will help improve variant calling and, in turn, accurate clinical reporting from genome sequencing, pushing it further toward that “one stop shop” genetic testing method of the future.

 

References:

1. Boycott, K.M., et al., International Cooperation to Enable the Diagnosis of All Rare Genetic Diseases. Am J Hum Genet, 2017. 100(5): p. 695-705.

2. Bean, L.J.H., et al., Diagnostic gene sequencing panels: from design to report-a technical standard of the American College of Medical Genetics and Genomics (ACMG). Genet Med, 2020. 22(3): p. 453-461.

3. Strande, N.T., et al., Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet, 2017. 100(6): p. 895-906.

4. DiStefano, M.T., et al., The Gene Curation Coalition: A global effort to harmonize gene-disease evidence resources. Genet Med, 2022. 24(8): p. 1732-1742.

5. Rehder, C., et al., Next-generation sequencing for constitutional variants in the clinical laboratory, 2021 revision: a technical standard of the American College of Medical Genetics and Genomics (ACMG). Genet Med, 2021. 23(8): p. 1399-1415.

6. https://www.ncbi.nlm.nih.gov/books/NBK535152/

7. Li, J., et al., A Comprehensive Strategy for Accurate Mutation Detection of the Highly Homologous PMS2. J Mol Diagn, 2015. 17(5): p. 545-53.

8. Yuen, M. and C.A.C. Ottenheijm, Nebulin: big protein with big responsibilities. J Muscle Res Cell Motil, 2020. 41(1): p. 103-124.


Important note for this blog: Posts do not equal endorsements. Opinions expressed in this blog are those of the author, on behalf of the genomics group at Broad. We make every effort to ensure the accuracy of data/figures presented here, but these are not peer-reviewed and errors may occur from time to time. Broad has a collaboration agreement with Illumina that in part funds this work.

 
