Blended Genome-Exome Sequencing is now available for clinical applications! Visit our Clinical BGE page.

Single-Cell Isoform Sequencing of Fluent Libraries with MAS-seq

By: James T. Webber, Daniel A. Bartlett, Ghamdan Al-Eryani, Aziz Al’Khafaji

At the Methods Development Lab (MDL) of Broad Clinical Labs we are always looking to innovate on the latest genomics technologies. One promising new approach is from Fluent BioSciences for single-cell RNA sequencing. Their PIPseq platform can rapidly generate single-cell emulsions by vortexing a cell suspension with particles they call PIPs (Particle-templated Instant Partitions)1. This technology has the potential to generate very large single-cell libraries at a low cost per cell. Given these favorable characteristics, we were keen to investigate PIPseq’s compatibility with our recently developed high-throughput RNA isoform sequencing approach, Multiplexed Arrays Sequencing (MAS-seq). Together, these approaches could enable the large-scale measurement of isoform expression in 10⁴ to 10⁶ cells.

MAS-seq works by concatenating a defined number of cDNAs, typically 16, into long single molecules, termed MAS arrays2. The length of the MAS arrays are tailored to be in the optimal size range for the long-read PacBio sequencing platforms, approximately 10-18kb. After sequencing, the MAS arrays are deconcatenated informatically, boosting sequencing output by the number of constituent cDNAs per array. PacBio has commercialized MAS-seq and provides kits that boost output for bulk and single-cell RNA sequencing libraries by 16 fold.

The current MAS-seq protocols were developed using 10x Genomics Single-cell Gene expression libraries. In theory, MAS-seq is compatible with any method that generates full-length cDNA molecules and contains the appropriate PCR adapter handles. However, unforeseen obstacles can arise when integrating protocols. In this post, we demonstrate successful integration of MAS-seq and PIPseq in the context of peripheral blood, and show that the combination of both technologies provide high-quality single-cell isoform data.

Experiment

To test out this new method we looked at a classic single-cell sample: peripheral blood mononuclear cells (PBMCs). We used Fluent’s T20 3′ kit to encapsulate single cells into droplets according to the manufacturer’s protocol. After single-cell library construction, we had a pool of full-length cDNA molecules with barcodes and UMIs attached. Here, we separated the full-length cDNA library into two parts: one continued with the standard protocol and went on an Illumina sequencer to get short read gene expression data. The other pool was used as input for MAS-seq.

Figure 1: Overview of the experimental workflow. PBMCs were encapsulated into droplets with the PIPseq T20 3′ kit. After cDNA synthesis the pool was split in two and input to both short- and long-read protocols. Illustration created with BioRender.com.

 

Short-read PIPseq

First we examined the short-read data that we generated from the Fluent platform. We sequenced the library on a NovaSeq X 10B, generating 1.1 billion 246-bp cDNA reads. We used Fluent’s computational pipeline, called PIPseeker, to assign reads to barcodes, count UMIs, and map to the genome. The result is a familiar MTX-formatted file of UMI counts, along with the list of barcodes and features. These files are compatible with the standard tools and processes for analyzing single-cell sequencing data.

To identify cells, we selected cell barcodes containing a minimum of 1,000 UMIs and between 2-7% mitochondrial reads (Fig 2). After applying these filtering steps, 7,029 cell barcodes remained.

Figure 2: Left: knee plot showing the distribution of UMIs per barcode sorted in descending order. Selected barcodes are highlighted in orange. Right: histogram of the percent mitochondrial UMIs per cell, with dotted lines showing the cutoffs used to select barcodes.

 

After performing gene selection and Leiden clustering, we identified eight canonical cell types that should be familiar to people who work with peripheral blood. There is certainly more complexity in this dataset and further clustering would resolve some of the structure visible, but for our purposes it was sufficient to identify the major cell types in the data.

Figure 3: Graph embedding of short read data. The plot on the left shows the number of UMIs per cell, on a log scale. On the right the cells are colored by cell type.

 

Consistent with the described behavior of PIP-seq’s V4 chemistry, we saw lower counts in the CD14 monocytes than in other cell types3. This is a known challenge when processing cells that express high levels of RNAses, which is the case in the CD14 monocyte population.

 

Full-length cDNA using PIPseq

We next sought to determine if cDNA generated by PIPseq is compatible with MAS-seq for single-cell isoform-specific long reads. We split a portion of the pool of PIPseq full-length cDNA to be used as input into the MAS-seq library prep kit from PacBio (cat# 102-659-600) to generate 16-mer cDNA arrays.

After array creation, we sequenced the MAS-seq library on a PacBio Revio, yielding 3,062,399 HiFi reads. After segmentation with PacBio’s tool skera, we ended up with 46,116,893 s-reads, demonstrating a >15-fold boost in output. A total of 2,744,221 reads (89.6%) of the HiFi reads were full-length arrays, with an average length of 11,889 bp (Fig 4). The segmented reads had an average length of 725 bp.

We extracted the barcodes and UMIs from these reads, and mapped the long-read cDNA to the same genome that we used for the short-read data. Following barcode extraction and mapping, we quantified isoforms with IsoQuant and associated those calls with the Fluent barcodes to get a cell-by-isoform matrix4.

 

Figure 4: Quality control for MAS-seq data. Left: concatenation factor and read length distribution for HiFi reads. Right: heatmap of ligation adapter pairs.

When we looked at our long-read data we found nearly all of the cell barcodes that we selected in the short-read data: 7,023 out of 7,029, or 99.9%. Consistent with the short-read data, long-read counts for CD14 monocytes were considerably lower than for other cell types. This effect was potentially exacerbated by our long-read processing protocols that preferentially filter out shorter and likely degraded cDNAs.

Reads after segmentation

46,116,893

Bead barcodes in whitelist

498,380

Reads with barcodes in whitelist

30,766,203

Total UMIs, all barcodes

18,160,470

Total UMIs, selected cell barcodes

12,834,983

Median UMIs/cell

1,253

Median isoforms/cell

462

Table 1: Key metrics describing the single-cell MAS-seq results

 

Isoform-level quantification in PIPseq

While RNA degradation is apparent in the CD14 monocytes, the other cell types were sequenced at high depth. We found many isoforms to be specifically or preferentially expressed in specific cell types, recapitulating the findings of Inamo et al., 2022 and other groups5. Below we show two examples.

Figure 5: FCGR3A expression plotted across the cell population. The left panel shows FCGR3A gene expression, while the middle and right panels show FCGR3A isoform expression.


Example 1: FCGR3A (CD16)

FCGR3A (CD16) is an Fc receptor gene and a canonical marker of non-classical monocytes. When the protein is expressed on the cell surface, it serves as a receptor for IgG antibodies and triggers antibody-dependent cell-mediated cytotoxicity. At the gene level FCGR3A is highly expressed in the associated monocyte population, with lower expression in cytotoxic T cells and innate lymphoid cells (Fig 5). When we look at the data at the isoform level, we see that the latter two cell types express an alternate isoform that is absent from the CD16 monocyte cluster. This isoform has an additional exon in the 5′ UTR, suggesting that its inclusion may serve a regulatory role in these cell types (Fig 6).

Figure 6: IGV plot of the differential isoform usage in three cell types. The top panel shows the Gencode annotation for the two isoforms identified.

 

Figure 7: S100A6 expression plotted across the cell population. The left panel shows S100A6 gene expression, while the middle and right panels show S100A6 isoform expression.


Example 2: S100A6

The gene S100A6 codes for a calcium-binding protein in the S100 family that is expressed in a wide range of cell types. In these data we see the gene expressed primarily as two isoforms, with one isoform particularly highly expressed in CD16 monocytes, while other cell types show a more balanced expression profile (Fig 7).

Figure 8: IGV plot of the differential isoform usage in three cell types. The top panel shows the Gencode annotation for the two isoforms identified by IsoQuant.

Again, the long-read data shows differential transcriptional start site (TSS) usage manifesting as differences in 5′ UTR sequences observed among the cell types (Fig 8). Moreover, the MAS-seq data affords us enough resolution to inspect the read pileup and see that the data are not fully explained by this combination of reference isoform annotations or indeed any other known form of this gene. This points us toward strategies for improving our isoform-calling methods, and exploring uncharted areas of the transcriptome.

Conclusion

Here we demonstrate that PIPseq is compatible with MAS-seq, expanding the validated platforms for single-cell RNA isoform sequencing. Future efforts will include cross-platform benchmarking and characterizations of single-cell platforms for RNA isoform sequencing. This early exploration highlights the transformative capability of MAS-seq and PIPseq for large-scale single-cell RNA isoform sequencing studies to efficiently identify alternative splicing in the context of development and disease.

Disclaimers

Fluent Biosciences provided PIPseq T20 3′ Single Cell RNA Kit v4.0, PIPseq starter equipment kit, and cryopreserved PBMC samples to conduct this study. PacBio provided MAS-seq kits, Revio sequencing reagents, and funding as part of a collaboration agreement.

Important note for this blog: Posts do not equal endorsements. Opinions expressed in this blog are those of the author, on behalf of the genomics group at Broad. We make every effort to ensure the accuracy of data/figures presented here but these are not peer reviewed and errors may occur from time to time.

Citations
  1. Clark, I. C. et al. Microfluidics-free single-cell genomics with templated emulsification. Nat. Biotechnol. 1–10 (2023) doi:10.1038/s41587-023-01685-z.
  2. Al’Khafaji, A. M. et al. High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat. Biotechnol. (2023) doi:10.1038/s41587-023-01815-7.
  3. Fontanez, K. et al. High–throughput single cell analysis with Particle–templated Instant Partitions (PIPseqᵀᴹ).
  4. Prjibelski, A. D. et al. Accurate isoform discovery with IsoQuant using long reads. Nat. Biotechnol. 1–4 (2023) doi:10.1038/s41587-022-01565-y.
  5. Inamo, J. et al. Immune Isoform Atlas: Landscape of alternative splicing in human immune cells. 2022.09.13.507708 Preprint at https://doi.org/10.1101/2022.09.13.507708 (2022).

Sean Hofherr

Chief of Clinical Strategy and Product Development, Broad Clinical Labs

Sean Hofherr is dual board certified by ABMGG in Clinical Biochemical Genetics and Clinical Molecular Genetics. Sean serves as the Chief of Clinical Strategy and Product Development at Broad Clinical Labs. In this role at BCL, Sean is able to leverage his extensive experience to guide the clinical vision and delivery across the organization. Sean most recently served as the Chief Operating Office at Fabric Genomics, which focuses on the use of AI and Bioinformatics for Clinical Interpretation of whole genome sequencing. Prior to Fabric, Sean was the Chief Scientific Officer and CLIA Director at the commercial reference laboratory, GeneDx.

Sean received his B.S. degree in Microbiology and Cell Sciences from the University of Florida before earning his Ph.D. in Molecular and Human Genetics from Baylor College of Medicine. Sean completed clinical fellowships in Clinical Biochemical Genetics and Clinical Molecular Genetics at the Mayo Clinic.

Danielle Perrin

Chief of Staff, Broad Clinical Labs

As Broad Clinical Labs’ Chief of Staff, Danielle Perrin advises and supports colleagues on the executive leadership team in BCL’s strategic planning and execution. She builds and leads new organizational functions and processes and leads critical projects, as well as driving effective information flow, decision making, and execution throughout the organization. An operations leader with a business, engineering, and biology background and 20+ years of experience in the genomics field, Perrin has a track record of driving operational excellence and building and scaling both physical and business processes. During her career at Broad, which started in 2003 at the tail end of the Human Genome Project, Perrin has led laboratory operations and R&D teams in Broad’s Genomics Platform, as well as fulfilling senior advisory and leadership roles in the Broad Institute’s COO and CFO offices.

Perrin received her B.S. in Biology and M.E. in Biotechnology Engineering from Tufts University and her M.B.A. from the MIT Sloan School of Management.

Tim De Smet

Chief Commercial Officer, Broad Clinical Labs

As Chief Commercial Officer of Broad Clinical Labs, Tim De Smet leads BCL’s business development, alliance management, external project management, and customer support teams. A Broad Institute employee since 2008, De Smet has held leadership roles and managed teams of various sizes in Broad’s Genomics Platform and clinical lab, spanning laboratory operations, finance, and informatics, and has expertise in work design, financial modeling, and high scale laboratory and business operations.

De Smet received his B.S. in Biochemistry and M.B.A. from Northeastern University.

Jim Meldrim

Chief Technology Officer, Broad Clinical Labs

As Chief Technology Officer, Jim Meldrim sets the vision for Broad Clinical Labs’ informatics systems, including the hardware and software used for sample intake and tracking, data production, analysis, and delivery. Having held a variety of laboratory and informatics-focused leadership roles at Broad, spanning R&D and production operations, Meldrim has been a leader and innovator in the generation, management, and analysis of genomic data since 1999, beginning with sequencing data generation for the Human Genome Project.

Meldrim received his B.S. in Biology from Cornell University.

Sheila Dodge

Chief Operating Officer, Broad Clinical Labs

As Chief Operating Officer, Sheila Dodge leads Broad Clinical Labs’ process development and implementation activities, as well as lab operations, financial planning and operations, quality & compliance, and core business processes. A Six Sigma Black Belt with extensive experience in process development and high throughput genomics operations, Dodge is an expert in work design and in collaborating with a range of collaborators, scientists, engineers, and technology partners to rapidly integrate new technologies and operationalize innovations. A member of the Broad Institute since 2001, Dodge is an Institute Scientist and lectures at the MIT Sloan School of Management on operations, dynamic work design, and visual management techniques.

Dodge received her B.A. in biochemistry and molecular biology from Boston University and her master’s degree in biology from Harvard University. She earned her M.B.A. from MIT Sloan School of Management.

Heidi Rehm, Ph.D., FACMG

Chief Medical Officer and Clinical Laboratory Director, Broad Clinical Labs

Heidi Rehm is board-certified by ABMGG in Clinical Molecular Genetics and Genomics and serves as BCL’s Chief Medical Officer and Clinical Laboratory Director. She oversees BCL’s regulatory requirements, leads the clinical team performing genomic interpretation and variant analysis, and guides BCL’s efforts in genomic testing for clinical and research use. She is also an Institute Member of the Broad and co-director of the Medical and Population Genetics Program. Rehm is also the Chief Genomics Officer in the Department of Medicine and Genomic Medicine Unit Director at the Center for Genomic Medicine at Massachusetts General Hospital, working to integrate genomics into medical practice. She is a principal investigator of ClinGen, providing free and publicly accessible resources to support the interpretation of genes and variants. She co-leads both the Broad Center for Mendelian Genomics, focused on discovering novel rare disease genes, and the Matchmaker Exchange, which aids in gene discovery. She is Chair of the Global Alliance for Genomics and Health, a principal investigator of the Broad-LMM-Color All of Us Genome Center, co-leader of the Genome Aggregation Database (gnomAD), and a Board Member and Vice President of Laboratory Genetics for the American College of Medical Genetics and Genomics.

Rehm received her B.A. degree in molecular biology and biochemistry from Middlebury College before earning her M.S. in biomedical science from Harvard Medical School and Ph.D. in genetics from Harvard University. She completed her post-doctoral training with David Corey in neurobiology and a fellowship in clinical molecular genetics at Harvard Medical School.

Niall Lennon, Ph.D.

Chair and Chief Scientific Officer, Broad Clinical Labs

As Chair and Chief Scientific Officer of Broad Clinical Labs, Niall Lennon leads the team and sets the scientific and clinical vision for the organization. Dr. Lennon joined the Broad Institute in 2006 and has since contributed to the development of applications for every major massively parallel sequencing platform across a range of fields. In 2013 Dr. Lennon led the effort to establish a CLIA licensed, CAP-accredited clinical laboratory at the Broad Institute to facilitate return of results to patients and to support clinical trials. More recently, he has led efforts to achieve FDA approval for large-scale genomics projects (NIH’s All of Us Research Program) and for Broad’s own clinical diagnostic for COVID-19 testing operation, which returned 37+ million results to patients. Dr. Lennon is a principal investigator of the eMerge and All of Us projects, an Institute Scientist at Broad, Associate Director of Broad’s Gerstner Center for Cancer Diagnostics, and an adjunct professor of biomedical engineering at Tufts University, where he teaches Molecular Biotechnology.

Dr. Lennon received a Ph.D. in pharmacology from University College Dublin and completed his postdoctoral studies at Harvard Medical School and Massachusetts General Hospital. He holds an executive certificate in management from the MIT Sloan School of Management.