
Improving Diagnosis in Rare Diseases: The Role of ExpansionHunter in Whole Genome Sequencing

Author: Andrea Oza

Many individuals and families with genetic disorders are all too familiar with the concept of a “diagnostic odyssey”: the long and arduous search for a diagnosis, which can take years or decades, require many different tests and evaluations, and even involve incorrect, dead-end diagnoses along the way. Reducing the number of genetic tests required to reach a diagnosis could significantly shorten the time from symptom onset to diagnosis. Whole genome sequencing (WGS) has already been used to identify multiple variant types that previously required separate tests (e.g., single nucleotide variants and copy number variants).


Identifying short tandem repeat expansions in WGS data

Expansions of short tandem repeats (STRs) are a type of variant responsible for many neurological disorders, including Fragile X syndrome, Friedreich ataxia, Huntington disease, myotonic dystrophy, and spinocerebellar ataxia. Testing for STR expansions traditionally required targeted methods (triplet-repeat primed PCR or Southern blot); while effective, these methods were typically ordered as single-gene analyses. Enter DRAGEN ExpansionHunter, a computational tool packaged into the Illumina DRAGEN pipeline that can detect STR expansions across many different genes from PCR-free short-read WGS data. This allows a diagnostic WGS test to report STR expansions in the genes covered by the tool.
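To make "calls" concrete: ExpansionHunter-style callers write their results as VCF records, one per repeat locus. The sketch below pulls the locus ID, repeat unit, and per-allele repeat counts out of one such record. It assumes the INFO keys `REPID` and `RU` and the FORMAT key `REPCN` (slash-separated repeat counts), as we understand the ExpansionHunter VCF layout; the field names should be checked against the VCF header of the DRAGEN version in use. `StrCall` and `parse_str_record` are hypothetical names, and the example record is made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StrCall:
    locus: str                # locus identifier, e.g. "FMR1"
    repeat_unit: str          # repeat motif, e.g. "CGG"
    repeat_counts: list[int]  # one repeat count per called allele


def parse_str_record(line: str, sample_index: int = 0) -> StrCall:
    """Parse one tab-separated VCF data line from a repeat-expansion caller.

    Assumption: INFO carries REPID (locus) and RU (repeat unit), and the
    FORMAT field REPCN holds slash-separated per-allele repeat counts.
    """
    cols = line.rstrip("\n").split("\t")
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT, then samples.
    info = dict(item.split("=", 1) for item in cols[7].split(";") if "=" in item)
    fmt_keys = cols[8].split(":")
    sample = dict(zip(fmt_keys, cols[9 + sample_index].split(":")))
    # "." marks a missing allele call, so it is skipped.
    counts = [int(c) for c in sample.get("REPCN", ".").split("/") if c != "."]
    return StrCall(info["REPID"], info["RU"], counts)


# Hypothetical record for illustration only (coordinates and counts are made up).
record = (
    "chrX\t1000\t.\tC\t<STR85>\t.\tPASS\t"
    "END=1060;REF=20;RL=60;RU=CGG;VARID=FMR1;REPID=FMR1\t"
    "GT:SO:REPCN\t0/1:SPANNING/INREPEAT:20/85"
)
call = parse_str_record(record)
print(call)
```

A haploid call (for example, chrX in a male sample) would carry a single count in `REPCN`, which the list handles without special-casing.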

The Rare Genomes Project (RGP) at the Broad Institute has been at the forefront of using ExpansionHunter to identify variants responsible for rare conditions. Through these efforts, individuals spanning a wide age range, with features of ataxia, myopathy, or muscular dystrophy, have been diagnosed with STR expansions in various genes. This success underscores the potential of computational tools like ExpansionHunter to unravel the genetic causes of rare diseases. The success of ExpansionHunter in RGP led the Broad Clinical Laboratory to validate Illumina DRAGEN ExpansionHunter for clinical WGS testing.


Testing of DRAGEN ExpansionHunter

A cohort of 22 samples from the Coriell Institute with known STR expansions in six genes (FMR1, ATXN1, FXN, HTT, C9ORF72, and DMPK), varying in repeat size, motif, inheritance pattern, and patient sex, was used. Coriell sizing was performed by Southern blot and/or PCR analysis. For each gene, the normal, premutation, and full expansion repeat ranges were determined, along with a flagging cutoff used to flag potentially expanded alleles. The samples were sequenced on the Illumina NovaSeq 6000 system and called using DRAGEN v3.10.4 ExpansionHunter, and the results were compared to the Coriell truth data.
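The per-gene ranges and flagging cutoff work as two separate checks on each allele: which repeat class the count falls into, and whether it crosses the flag. The sketch below shows that logic under stated assumptions. The threshold values are HYPOTHETICAL placeholders, not the ranges or cutoffs established in this validation, and not every gene has a premutation range. `LocusThresholds` and `classify` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LocusThresholds:
    normal_max: int       # largest repeat count considered normal
    premutation_max: int  # largest repeat count considered premutation
    flag_cutoff: int      # calls at or above this count are flagged as potentially expanded


def classify(repeat_count: int, t: LocusThresholds) -> tuple[str, bool]:
    """Return (repeat class, flagged?) for a single allele."""
    if repeat_count <= t.normal_max:
        category = "normal"
    elif repeat_count <= t.premutation_max:
        category = "premutation"
    else:
        category = "full mutation"
    return category, repeat_count >= t.flag_cutoff


# HYPOTHETICAL thresholds for illustration only.
demo = LocusThresholds(normal_max=40, premutation_max=100, flag_cutoff=41)
for n in (30, 60, 150):
    print(n, classify(n, demo))
```

Keeping the flag separate from the class matters for the results that follow: when a repeat is too long to size reliably, the class may be uncertain while the flag can still be assessed.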

As expected, the sequencing read length (~150 bp) limited ExpansionHunter's ability to size repeats accurately. All repeats shorter than 150 bp were called accurately (±1 repeat), whereas none of the repeats longer than 150 bp were (Figures 1a and 1b). Classifying an expansion (normal, premutation, or full mutation) is likewise limited for expansions beyond ~150 bp. Based on these data, clinical interpretive reporting using ExpansionHunter will need to rely on the flagging cutoffs to distinguish normal from potentially expanded alleles. All of the loci in this validation had flagging cutoffs under the 150 bp limit; however, this may not be true for every locus included in the ExpansionHunter caller.
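The 150 bp limit is in base pairs, not repeats, so the same repeat count can fall on either side of it depending on motif length. The sketch below converts counts to tract length and applies a reporting rule of the kind described above: trust the size below the read length, otherwise rely only on the flag, and treat a cutoff that itself exceeds the read length as not assessable. The rule, the function names, and the 150 bp default are our illustrative assumptions, not the laboratory's validated procedure.

```python
READ_LENGTH_BP = 150  # approximate read length in this validation


def repeat_length_bp(repeat_count: int, motif: str) -> int:
    """Length of a repeat tract in base pairs."""
    return repeat_count * len(motif)


def spans_within_read(repeat_count: int, motif: str,
                      read_length: int = READ_LENGTH_BP) -> bool:
    """True if the tract is shorter than a read, the range where sizing was accurate."""
    return repeat_length_bp(repeat_count, motif) < read_length


def interpret_call(called_repeats: int, motif: str, flag_cutoff: int,
                   read_length: int = READ_LENGTH_BP) -> str:
    """Hypothetical reporting rule: trust sizes below the read length, else rely on the flag."""
    if not spans_within_read(flag_cutoff, motif, read_length):
        # A cutoff longer than a read cannot be judged reliably from short reads.
        return "cutoff beyond read length"
    if called_repeats < flag_cutoff:
        return "below flagging cutoff"
    if spans_within_read(called_repeats, motif, read_length):
        return "flagged; size estimate within read length"
    # Calls this long tend to be underestimates, so only the flag is reported.
    return "flagged; size not reportable"


print(repeat_length_bp(50, "CGG"), repeat_length_bp(25, "GGGGCC"))
```

For a trinucleotide repeat such as CGG, 50 repeats reach 150 bp; for a six-base motif such as the GGGGCC repeat in C9ORF72, only 25 repeats fill the same read length.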

Figures 1a and 1b: ExpansionHunter repeat number compared to the number reported by Coriell. 1a) Samples with Coriell repeats <50 (<150 bp) in length. 1b) Samples with Coriell repeats >50 (>150 bp) in length, with log-scaled x and y axes. Repeats >150 bp are consistently undercalled by ExpansionHunter.
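The comparison in Figures 1a and 1b boils down to a concordance check: for each sample, is the called repeat count within ±1 of the truth, and does that hold separately below and above the read length? The sketch below computes that stratified concordance. `Comparison` and `concordance_by_length` are hypothetical names, and the example rows are invented for illustration; they are not the study's data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Comparison:
    truth_repeats: int   # repeat count from the reference sizing (e.g. Southern blot/PCR)
    called_repeats: int  # repeat count reported by the caller
    motif: str           # repeat unit, used to convert counts to base pairs


def concordance_by_length(rows: List[Comparison], read_length: int = 150,
                          tolerance: int = 1) -> Dict[str, Optional[float]]:
    """Fraction of calls within +/- tolerance repeats, split by truth length vs. read length."""
    hits: Dict[str, List[bool]] = {"below_read_length": [], "at_or_above_read_length": []}
    for r in rows:
        truth_bp = r.truth_repeats * len(r.motif)
        key = "below_read_length" if truth_bp < read_length else "at_or_above_read_length"
        hits[key].append(abs(r.called_repeats - r.truth_repeats) <= tolerance)
    # None marks an empty stratum rather than reporting a misleading 0% or 100%.
    return {k: (sum(v) / len(v) if v else None) for k, v in hits.items()}


# Invented rows for illustration only.
rows = [
    Comparison(30, 30, "CAG"),
    Comparison(45, 46, "CGG"),
    Comparison(200, 60, "CGG"),
    Comparison(700, 55, "CTG"),
]
print(concordance_by_length(rows))
```

Stratifying on truth length in base pairs, rather than on repeat count, keeps loci with different motif lengths on the same footing.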


Future directions

The integration of ExpansionHunter into clinical WGS testing is not without its challenges. Validation efforts, such as those undertaken by the Broad Clinical Laboratory, are crucial to ensuring accuracy and reliability, and validation and incorporation into clinical WGS interpretation are still in progress. These validations pave the way for including ExpansionHunter calls in interpretive reports, further enhancing the diagnostic utility of WGS.

One of the key advantages of incorporating ExpansionHunter into diagnostic workflows is the broader range of variant types reported. This translates into increased sensitivity, particularly for neurological disorders that may be caused by STR expansions. By reducing the need for separate test orders and streamlining the diagnostic process, it can help patients with rare diseases receive timely and accurate diagnoses, shortening the diagnostic odyssey they often face.

In conclusion, including ExpansionHunter in diagnostic WGS represents a significant step forward for rare disease diagnostics. Its ability to detect STR expansions, sizing repeats accurately when they are shorter than the read length and flagging potentially expanded alleles beyond it, together with its integration into clinical workflows, holds great promise for improving patient outcomes and reducing the burden of the diagnostic odyssey. As computational tools continue to evolve, so too will our ability to uncover the genetic causes of rare diseases, bringing hope to patients and their families worldwide.





Important note for this blog: Posts do not equal endorsements. Opinions expressed in this blog are those of the author, on behalf of the genomics group at Broad. We make every effort to ensure the accuracy of the data and figures presented here, but these posts are not peer-reviewed and errors may occur from time to time.

