Skip to main content

REVIEW

Pathol. Oncol. Res., 16 May 2024

Can long-read sequencing tackle the barriers, which the next-generation could not? A review

Nikolett Szakllas
Nikolett Szakállas1*Barbara K. BartkBarbara K. Barták2Gbor Valcz,Gábor Valcz2,3Zsfia B. NagyZsófia B. Nagy2Istvn TakcsIstván Takács2Bla MolnrBéla Molnár2
  • 1Department of Biological Physics, Faculty of Science, Eötvös Loránd University, Budapest, Hungary
  • 2Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
  • 3HUN-REN-SU Translational Extracellular Vesicle Research Group, Budapest, Hungary

The large-scale heterogeneity of genetic diseases necessitated the deeper examination of nucleotide sequence alterations enhancing the discovery of new targeted drug attack points. The appearance of new sequencing techniques was essential to get more interpretable genomic data. In contrast to the previous short-reads, longer lengths can provide a better insight into the potential health threatening genetic abnormalities. Long-reads offer more accurate variant identification and genome assembly methods, indicating advances in nucleotide deflect-related studies. In this review, we introduce the historical background of sequencing technologies and show their benefits and limits, as well. Furthermore, we highlight the differences between short- and long-read approaches, including their unique advances and difficulties in methodologies and evaluation. Additionally, we provide a detailed description of the corresponding bioinformatics and the current applications.

Introduction

The complete genetic information of the organisms is stored and transferred in single- and double-stranded ribonucleic (RNA) and deoxyribonucleic (DNA) acids [1]. The mystery behind rare genetic conditions, like chromosomal irregularities or unique sequence variation and mutation profiles in cancer, induced the need of molecular examination at deeper levels. For many years, only a deficient tool set was available to get a better insight into the genetic attributes of genomes. This encouraged the development of novel technologies, such as RNA and DNA sequencing methods. Provoked by the technical and computational progress of the past 50 years, the features of sequence determination changed and evolved. In the early periods, only a few hundred bases were reachable in length; however, the emergence of long-read technologies allowed the reading of longer genomic sequences even with thousands of kilo bases.

The timeline of the sequencing techniques’ evolution can be divided into three main parts: first-generation (FGS), next-generation (NGS), and third-generation (TGS) sequencing. Before short-read NGS approaches became available, FGS techniques were the only tools capable of describing the nucleic acid sequence of different organisms. Thus, their main advantage is that they emphasized the need to use and develop novel sequencing methods to get a deeper knowledge regarding DNA and RNA sequences with repetitive regions, alternative bases, splicing variants and telomeric regions. Later in time, the NGS and mainly TGS methods were capable of opening closed doors for the detection of the listed alteration types, thereby exploring many reasons (and also the curing solution) for diseases. In our review, we strived to show FGS techniques from this point of view, without explaining their applications and attributes in more detail. In this scope, FGS were the pioneers of sequencing around the 1980s, including Sanger’s chromatography and Maxam-Gilbert’s chemical modification-based assays. In the early times, these technologies allowed focusing on relatively small genomes with a few hundred base pairs (bp) in length [2]. Sanger’s idea was to sequence the DNA strand by chain termination. Consequently, in this case, the DNA fragments were converted into chains by DNA polymerases and by the incorporation of nucleotides [3]. Maxam and Gilbert provided a process, during which the sequences of DNA fragments were determined using the combination of radiolabeling, chemical cleaving, and gel electrophoresis of nucleotides, and autoradiography served as the detection method [4].

The second generation, namely, NGS, includes pyrosequencing [5] and sequencing-by-synthesis [6] approaches. They have a feature in common, which is that DNA polymerase moves along the template DNA and sequencing is performed by catalyzing the incorporation of deoxynucleotide triphosphates (dNTPs) in a new complementary DNA strand [7]. Pyrosequencing is a sequence-based form, where a pyrophosphate is released, when dNTPs are sequentially added to the end of a nascent DNA fragment [8]. Sequencing-by-synthesis is the construction of a nucleic acid chain from the emission spectra of fluorescently labeled nucleotides [6].

Although NGS provides more acceptable error rates and more sophisticated sequencing results than FGS, they have several weaknesses that should be mentioned. Read lengths are shorter than demanded, that is why they are referred to as short read techniques nowadays. Consequently, their shortness limits the study of full-length transcript variants, centromere and telomere genomic regions, and gene fusions [9]. Additionally, they are unable to resolve repetitive regions of the genome, making genetic variations challenging to identify, including repeat expansion disorders and structural variants [10]. Extreme guanine-cytosine (GC) content or sequences with multiple homologous elements in the genome and the epigenetically modified bases of DNA and RNA, like N6-methyladenosine (6mA), 5-methylcytosine (5mC), and 5-hydroxymethylcytosine (5hmC) are challenging to characterize with NGS [11]. PCR amplification is essential, which results in higher costs and longer times in the overall sequencing and evaluation process, involves the usage of large equipment and laborious experimental procedures, and expands the bioinformatics analysis with a data preprocessing step. To overcome the limitations, further sequencing techniques have been developed, the representatives of the TGS family, often referred to as long-read sequencing methods [12, 13].

Many scientific papers describing the methodology, evaluation and use of different sequencing assays become available yearly. Currently, long-read TGS and short-read NGS methods are used problem-specifically, either interchangeably or in combination. Although both methods have their own advantages and disadvantages, reviews setting against the methods cannot be found among the currently available publications. Encouraged by this, our goal is to provide a general comparison of long- and short-read techniques. In the present paper, we aimed to review the development of sequencing assays, presenting a brief characteristics of FGS, NGS and TGS, with special emphasis on the possibilities offered by the TGS methods. We also detail the bioinformatics approaches along with the aspects considered during evaluation, as well as related clinical and biological applications.

Long-read sequencing

TGS provides more precise mapping of reads for reference genomes, promotes different variant detection methods, and offers new solutions for characterizing the epigenetic diversity [14]. In contrast to NGS systems, the generated data is analyzed in real-time and generally, PCR amplification steps are not required before sequencing due to natural isolated nucleic acid strands can be read as well. The longer sequenced reads are the consequences of the improved sequencing chemistries [15, 16]. The increased sequencing speed and accuracy during experiments and the higher quality bioinformatics results also mark the effectiveness of the newly emerged technologies and the inherent chemical kits [17].

TGS technologies conceal the opportunity to emerge as long-term applicable tools in the future. As they provide the long-read sequencing of whole genomes, their usage in the field of genomics entails the chance of more and more accurate description of both human and non-human genetic diversities. Furthermore, improvements aimed to decrease costs and analysis time could invoke their application in routine diagnostics.

Nanopore sequencing

The nanopore sequencing (NS) method, distributed by Oxford Nanopore Technologies (ONT), is based on the detection of the electric current changes provoked by the disorganization of nanopore proteins [1619]. The alterations in the real-time produced electric current can be measured directly. During NS, dsDNA molecules are denatured, and the motor protein directs ssDNAs through the nanochannels (pores) one after the other. The passage of ssDNA molecules leads to disturbances in the electric current, which is detected by specific reader sensor proteins. The deflections are distinct for all nucleotides resulting in unique signatures for each base. The entire process happens inside a device-specific flow cell [20], which contains thousands of nanopore channels. The schematics of NS sequencing is presented on Figure 1A.

Figure 1
www.frontiersin.org

Figure 1. Schematics of long-read sequencing approaches: nanopore sequencing (Oxford Nanopore Technologies) and SMRT sequencing (Pacific BioSciences). (A) NS relies on the passage of the ssDNA through the membrane driven by the motor protein. The nucleotide bases are identified using the directed ionic current intensities arising from the motion of particles through the membrane. (B) During SMRT sequencing, dsDNAs are circularized into a SMRT bell. The SMRT bell contains the four fluorescently labeled nucleotides with unique emission spectra. The bases are distinguished by the alterations in the light emission spectra.

Since the release of the first ONT sequencing device—named MinION—in the mid-2010s, the continuous improvement of the key factors, like accuracy, read length, and sequencing throughput is present. The throughput is determined by the number of active pores on the flow cells and by the DNA/RNA translocation speed. To provide the maximal amount of available active pores on the flow cells, their periodical revision is secured [16, 21]. The read length and the accuracy are highly dependent on the released version and quality of the sequencing chemistry—which in this case includes the traits of nanopores and motor proteins,—however, by introducing special adapters during penetration, an increase in accuracy measure can also be reached with higher ∼420 bps per sec sequencing speeds compared to the previous ∼70 bps per sec rate [22].

NS reads are characterized by longer lengths of 10 kb up to 100 kb, which means more sequenced bases, more generated data and increased informatics resource needs compared to NGS. The large amount of data means longer bioinformatics analysis time and more expensive informatics hardware park. However, due to the increased information amount, a more accurate identification of alterations becomes available. As the most important disadvantage of the increased generated data, higher error rate and read misclassification can be experienced on the ONT platforms compared to NGS [23, 24].

SMRT sequencing

Pacific BioSciences provided the first nanosensor-based technology in the early 2010s relying on the single molecule real-time (SMRT) sequencing model [25]. The key factor in this method is the detection of alterations in light emission when the DNA polymerase incorporates a nucleotide [26]. In more detail, SMRT sequencing is done by the immobilization of the DNA polymerase in each well of a special silicone chip (SMRTcell) using DNA as the mobile molecule. DNA templates are presented as closed, single-stranded DNA (ssDNA) molecules, named SMRTbells, which are created by ligating hairpin adaptors to both ends of a target double-stranded DNA (dsDNA). The SMRTcells contain four fluorescently labeled nucleotides with unique emission spectra. Zero mode waveguides (ZMW) are optical waveguides developed for rapid light sensing and provide the interface for the detection of light emitted by the incorporation of phosphate-labeled dNTPs of SMRTcells [27, 28]. The process of SMRT sequencing is illustrated on Figure 1B.

Compared to NGS, the precision of SMRT sequencing is lower, as an example, due to the many inaccuracies during base identification. Although, in contrast to the experienced higher error rates and costs-per-base, the technology grants several orders of magnitude increase in read lengths (few Mbps in contrast to the previous few hundred bps) and faster sequencing runs. As a possibility, the arising conflict regarding the advantages and disadvantages of NGS and SMRT sequencing suggests the consideration of hybrid-sequencing solutions in the future. Hybrid units are the combinations of different sequencing methods and can be promising to overcome the deficiencies [29].

Technical advances and difficulties of long reads

Following a brief histological and methodological overview of long-read approaches, we detail the technical background and give comprehensive knowledge regarding sequencing challenges and advances. Compared to NGS approaches, the main difficulties of TGS are the overall lower per read accuracy and poorer read quality [30]. In contrast to short reads, long reads are much noisier. Prolonged lengths induce an increase in the number of bases and in reading time. Both contribute to a higher probability of collecting false information, promoting more noise and uncertainty [31]. The continuous change of the sequencing reads in length during a single run also indicates the higher chance of inaccuracies. Due to the above-listed reasons, the proper handling of deflections cannot be emphasized enough and the problem-concentrated improvements are published continuously [32]. Although in the early times base identifying accuracy was around 85% (indicating the error rate to be nearly 15%), nowadays almost 99% (SMRT) and 95% (NS) can be reached [31, 33]. Error correction methods [34, 35] provide a solution to resolve the inaccuracies and are divided into two groups: hybrid and non-hybrid approaches [36]. Hybrid methods take the advantage of the high accuracy of short reads for correcting errors in the long threads, while non-hybrid methods perform self-correction with long reads using overlap information. The effectiveness of error correction methods is highly dependent on the sequencing coverage [36], thus shows a dependence on the percentage of all sequenced base pairs or loci of the genome.

In SMRT devices, the read quality is proportional to the number of DNA fragment transitions. For example, the reading accuracy is around 85%–87% in a 10 kb long sequence if it is passing only once [37]; however, with multiple reading, it can be further improved reaching 99%. In contrast, the quality of NS reads is independent of the reading repetition times and the length of nucleic acid sequences. It only depends on the ratchet rate per base through the nanopores. Fragments traverse only once, the median sign-pass accuracy is around 95% [38], and read length depends only on the amount and the quality of the high-molecular weight input DNA. To reach the maximal sequencing precision, companies focusing on long-reads tend to release chemistry, software, and hardware updates regularly [16, 39].

Reference genomes are integral parts of sequencing assays as they provide the organism-specific support during base order construction [40]. The progression of sequencing methods derived the breakthroughs regarding the imprecision of reference genomes [41], variant identification, genomic assemblies, and other specialized data analyses in the field of genetics. The Genome Reference Consortium (GRC) released the current form of human reference genome (GRCh38. p13) in 2013 with an origin tracing to the Human Genome Project [42, 43]. In contrast to the continuous improvement of the GRCh38.p13 genome, over the last years, due to the technical limitations of NGS short reads, many problems remained unsolved. The underrepresentation of repetitive sequences, the unsolved assembly gaps due to structural polymorphisms and the unfinished polymorphic regions resulted in the need of further investigation. The 151 mega-base (Mbp) pair long unknown sequence data distributed throughout the GRCh38. p13 genome turned out to be fundamental and included centromere and telomere regions, segmental duplications, amplicon gene arrays and ribosomal DNA (rDNA) arrays, all highly affecting cellular processes [44]. Long-read sequencing proved to be the problem-solver, indicating the birth of the Telomere-to-Telomere (T2T) Consortium to construct a new and almost complete human reference genome, the T2T-CHM13 assembly [44]. In this cooperation, the advances of long-read techniques, including the multi-kilobase single-molecule reads of SMRT and the ultra-long reads of NS were combined, providing evidence to the beneficial applications of hybrid sequencing methods. The T2T-CHM13 assembly resulted in a 3 billion-base pair long complete human haplotype, contributing to the recognition of almost 4,000 new genes, with high rates of protein coding nature. In addition, T2T-CHM13 includes the gapless telomere-to-telomere assemblies for all 22 human autosomes and chromosome X, contains the corrected version of the 151 Mbp long unknown genomic sequence data, and has the chance to arise as the mainly applied reference genome in human genomics-related fields. The successful application of the combination of NS and SMRT reads as a hybrid solution in the T2T Consortium projects that the further development of sequencing methods can be still expected, and the seeking to eliminate their limitations is continuous.

Bioinformatics of long reads

After exploring the scientific literature in detail, it clarified that sequencing techniques cannot address questions in genomics without bioinformatics. With the rise of new sequencing approaches, a new generation of bioinformatics tools emerged, being compatible with the unique features of long reads and trying to overcome their biases. As long reads, their analysis also presented many opportunities and challenges. Increased read lengths particularly affects how aligners, assemblers, variant callers store and analyze the data. Many software tools specialized for long-read sequencing data are provided by ONT and PacBio with continuous monitorization [45, 46]. Additional sources and packages are also presented, as it is demonstrated in Table 1.

Table 1
www.frontiersin.org

Table 1. Summary of the most recent and common-used long-read bioinformatics tools.

As a summary of bioinformatics steps, the following section will provide a brief general discussion regarding base calling, detection of base modifications, variant calling, genome assembly, and a bit of specialized evaluation possibilities including both long-read and NGS techniques, emphasizing their unique prominences.

Base calling

The first main step in bioinformatics analysis is always a process named base calling during which the specific electric signals are translated into known nucleotides. The phrase of translation in this case means the conversion process from electric signals to nucleic acid sequences [73]. Raw current and light pulse data and read information are stored in specific format files. In the NGS system, the primary analysis of sequencing data is a critical step before base calling. These sequencing platforms have their own chemical- or sensor-origin biases which should be eliminated before or during base calling [74]. As a result of the pre-sequencing PCR amplification, many redundant PCR duplicates are present among aligned reads, which are marked and excluded in later analysis stages [75]. Considering the two long-read techniques, base calling means the conversion of fluorescent light pulses in SMRT devices, while during NS, the translation of current intensities into k-mers of bases. The alignment of sequencing reads to a reference sequence is a compulsory step after base calling in NGS bioinformatics, however many TGS base callers [5557] execute the alignment in parallel with base identification [55, 56]. As a side note, we would like to emphasize the importance of quality check of sequencing reads [4754, 76, 77] preferably before and after every principal step, paying special attention to base calling and variant calling.

Epigenetic modifications: modified base calling

In addition to traditional bases, like adenine (A), thymine (T), uracil (U), guanine (G) and cytosine (C), DNA and RNA molecules can contain modified bases that alter from their original mates in nature and frequency and have different functional roles. In nucleic acids, the most frequently occurring modified bases are 6mA, 5mC, and 5hmC. Considering the location of 5mC and 5hmC in DNA, they are mostly observed on CpG dinucleotide sites. RNA modifications, including 6mA, are frequent in non-coding RNA like ribosomal RNA (rRNA), transfer RNA (tRNA), and also in coding mRNA. Modified DNA and RNA nucleotides play a key role in many biological processes including development, aging, and cancer [7880]. Their identification secures the analysis of open chromatin regions, the detection of DNA replication and the measurement of RNA metabolism using base analogs [8183].

The methylation signature is not preserved in PCR amplification—which is essential before NGS assays -, thus approaches have been developed to conserve the epigenetic information. These pretreatments rely on methylation-dependent enzymatic restriction, methyl-DNA enrichment, and direct bisulfite conversion [84]. In NGS base modification analysis bisulfite-treated DNAs require specialized alignment to account for the C to T conversion. Encouraged by this, short read alignment algorithms were implemented that can be configured for bisulfite-converted DNA alignment [85].

However, the available NGS methods provided some sort of identification of modified bases in nucleic acid sequences as well, but the real landscape demonstration became fulfilled with TGS assays. The detection of modified bases in SMRT is based on the delay between fluorescence pulses [86]. NS relies on the recognition of the signal shifts resulting from the different current flow through nanopores [19, 87]. Most TGS computational tools are capable of modified base detection from reference-aligned reads [34, 5759, 62, 63, 88], and are based on machine learning training models and statistical tests. Algorithms using neural networks show the highest performance, although statistics-based approaches are the best suited for the identification of de novo modifications [34, 89]. Because of software developmental progress, long-read base callers became capable of calling modified bases directly [55, 56]. The key is the application of specific base calling configuration models indicating in their labels the name of the modified bases of interest [56].

Variant calling

Sequence variations can be grouped based on their somatic or germline nature. Germline variants are presented in all cells of the body, including the germ cells, while somatic mutations arise during lifetime. The standard pipeline of somatic mutation calling is the paired tumor-normal sequencing strategy [90]. It can provide the true somatic mutations by filtering out the germlines of the normal from the tumor mutation data according to some known tissue-specific non-tumorous variant profiles. Germline and somatic groups also involve subtypes like structural variants, single nucleotide variations, short insertions/deletions, and copy number variations.

The shortest variations are single nucleotide polymorphisms, which are germline substitutions of single nucleotides at specific genomic positions. Copy number variation (CNV) is an alteration type describing the uniqueness among individual genomes, meaning a few and thousands of base-scale variations in the copy numbers of specific DNA segments. SVs are large genomic alterations, like insertions, deletions, inversions, and translocations. They are typically longer than 50 bp, describing different combinations of DNA losses, gains, or rearrangements [91]. Structures shorter than 50 bp and longer than few bases are usually referred to as indels.

The key aspect of variant calling is the choice of a robust variant caller concerning NGS and TGS assays as well. To achieve the optimal performance, a prior fine-tuning considering the features of the input is needed. This optimal performance is reached by training and pre-testing the variant callers using the characteristics of the datasets. The exclusion of redundant and duplicate reads from binary alignment mapping (.bam) files, the quality control of .bams, and the identification and the reduction of false-positive variant calls caused by alignment artifacts are crucial steps in input preparation. The accuracy of variant calling can be validated by benchmarking datasets, which are publicly available. The quality of the collected variants is dependent on the precision (and version) of the reference genome, and on the error rate and accuracy of the base and variant identification method. Sequencing coverage affects the sensitivity in a hidden manner, since the appropriateness of the variant caller input is highly dependent on the coverage [92]. We must consider the variant representation differences when searching valid variations from the reference by excluding the low coverage b(i)ases. The appropriate post-filtering of the output data is often required; it prevents us from artificial and false-positive calls [75].

TGS variant callers [5861, 88] are built upon de novo assembly, short-read alignment, or long-read mapping approaches. De novo-assembled sequences cover the alignment of the current assembly to another, or to a reference sequence, and the alterations can be identified by a pointwise positional comparison. During short-read alignment, the presence of SVs induces the appearance of abnormally oriented and spaced reads replacing the organized paired-end form. Long-read mapping approaches can span repetitive and other problematic regions simply, showing an overall better performance [93].

In nowadays-used techniques, long-read sequencing is the most suitable and the most accurate variant calling approach, but especially for the detection of structural variants (SVs) [94]. The special role of genetic variations, especially SVs, has been highlighted primarily in medicine and molecular biology, e.g., in neurological diseases [95, 96], or during the detection of oncogene-specific variations in breast, prostate, or primary gastric tumors [97]. Although their importance is unquestionable, they have been understudied in the past. The origin of this issue arises from the fact that they can overlap or be nested giving rise to complex patterns, which are hard to identify with short-read approaches [93].

Genome assembly

Probably the most important benefit of long-read computational biology can be experienced in the fields of genomic de novo assemblies [6468]. The phrase assembly in this case means the comparison and coupling of the read sequences to each other. Assembly construction is crucial to understand the impact of genomic diversity on health and disease [98]. In the last few years, the process has been simplified and the results are more accurate due to the improvements in the bioinformatics routines [99]. Besides the sequence construction, another important application of genomic assemblies is the reassembling and fixing of the errors of former reference genomes (of fungal, plant, animal, and human) [44]. Unfortunately, repetitive sequences with unresolved repeats are still problematic, enhancing confusion while joining assembled sequences. Linked sequences contain many gaps. To get rid of these, the scaffolding of sequences is a crucial aspect. The term scaffolding means the proper ordering and orientation of assembled sequences using genetic markers, optical maps, or linked reads [100]. Assemblies of the short and long reads are both presented taking their advantages in different issues. Besides the success of their combination in T2T Consortium, many other hybrid applications have been published recently [29, 101], invoking that for accurate genomic assemblies we need error-free short and long sequences.

Applications of long-read sequencing

Although the topic of long-read sequencing is quite recent, its successful application in several fields is highly presented in the scientific literature including cancer genomics, laboratory medicine, methylation studies and rare genetic conditions, as well.

In laboratory medicine, the currently applied diagnostic strategies involve the use of targeted NGS gene panels, exome sequencing, and genome sequencing. Targeted gene panels are somatic and hereditary disease-specific with the ability to maximize coverage, sensitivity, and specificity of characteristic genes. They offer higher diagnostic yield thanks to lower costs and faster diagnostic times, than exome or genome sequencing [102]. The combination of whole-genome and long-read targeted sequencing has already been applied in hematology. Hematologic disorders, like hemophilia A, often involve the appearance of gene fusions and other pathologic events, thus the characterization of fusion transcripts is often done by the combination of NGS and TGS assay-based methods [103]. Another example of laboratory medicine related application of long reads is the characterization of the human leukocyte antigen (HLA) system. The HLA system contains the genes that encode key components of the adaptive immune system, and accounts for the major genetic differences among ethnic populations [104]. HLA-genotyping information is often yielded from targeted exome and non-targeted genome sequence data [105].

For diploid genomes, chromosomal DNA has two haplotypes. These are combinations of alleles from multiple genetic loci on the same chromosome including complex structural variants, one inherited from each parent. Distinguishing the maternal and paternal haplotypes allows the recognition of homozygous and heterozygous mutations in the human genome. Haplotypes within a diploid chromosome are determined by finding a partitioning of reads to two sets, one for each haplotype, such that the reads within subsets have a minimal number of errors compared to a consensus [106]. Their presence helps to discover the nested structural variations, inversions, and other complex rearrangements and studies the interactions between variants in regulatory elements, aneuploidy, evolutionary processes, and drug resistance in viral infections. The key concept to derive haplotypes using sequencing reads is the phasing of heterozygous variants. The advancements in sequencing associated computational tools like reference-based phasing, de novo assembly, or strain-resolved metagenome assembly [107] entail the potential for the near-complete human haplotype structure reconstruction.

The investigation of genomes containing segments with small allele fraction variants and observed rearrangements in regions of associated genes is still challenging even for current long-read methods [108]. The appearance of sequencing techniques with higher-depths and longer-lengths is expected. Regardless, many successful applications can be discussed already. The characterization of tumor genomes and transcriptomes with the analysis of mRNA expression, mutation detection, gene fusions, or chromosomal copy number alterations can highlight new markers of malignancy. With better depiction of the genome-wide landscape and the extent of mutational processes, whole-genome long-read sequencing yields better treatment options in advanced thyroid [109] and other cancers [110]. Improvements in sequencing technologies allowed the recognition and the description of long non-coding RNAs (lncRNAs). They are non-protein coding nucleic acids with lengths greater than 200 nucleotides and characterized by high cell type specificity [111]. LncRNAs are found to be key players in tumorigenesis and immune responses, and evidence supports their unique cellular functions in the tumor immune microenvironment [112]. Most studies related to lncRNAs relied on bulk RNA-sequencing; however, the potentials of scRNAseq can open new possibilities to understand the cell type-specific functions of lncRNA genes [112].

The examination of abnormal RNA expressions helps to understand the molecular mechanisms behind human cancer initiation, development, progression, and metastasis. RNA techniques include the classic bulk RNA (RNAseq), the single-cell RNA (scRNAseq), the spatial RNA (spRNAseq) [113] and the direct RNA (DRS) [114] sequencing methods. Bulk RNAseq means the sequencing of mRNA-only or whole transcriptome libraries with single-end short or paired-end longer approaches. scRNAseq procedures always include single-cell isolation and capture, cell lysis, reverse transcription, cDNA amplification, and library preparation [115]. spRNAseq combines the transcriptional analysis of bulk RNAseq and in situ hybridization providing whole transcriptome data with spatial information [113]. As a novelty, NS terminology offers the direct sequencing of individual polyadenylated RNAs without the need of any amplification step [114].

Circulating cell-free DNA (cfDNA) in the blood of cancer patients can be the signal of worsening tumor progression. Sequencing analyses revealed that tumor-derived cfDNA accounts for only a fraction of the total amount of cfDNAs and this fraction varies according to the tumor burden [116]. Due to the low level and high fragmentation of cfDNAs, their analysis is challenging. In the past few years, NGS techniques were suitable tools for this assay [117], however, the long-reads will possibly promote the provision of deeper cfDNA characteristics providing higher clinical sensitivity for the detection of cancers [117].

The clinical diagnosis of rare genetic disorders often requires the identification of CNVs or repeat variants. Long-read genome sequencing provides an improved opportunity for CNV detection and broadens the possibilities of gene and variant level annotation [118]. As an interesting example, primary mitochondrial diseases (PMD) comprise a group of rare genetic conditions characterized by impaired mitochondrial oxidative phosphorylation. The presence of mixed populations of mitochondria, named heteroplasmy, and the fact that those mitochondria contain its own genome consisting of mitochondrial DNA (mtDNA) poses a challenge in identifying PMD. Long-read sequencing enables the entire mitochondrial genome to be sequenced in one read, ensuring the overcome of the obstacles mentioned-above [119].

Using epigenetic alterations as biomarkers presents a unique opportunity for early cancer detection, monitoring, and prognosis. Methylation is the most widely studied epigenetic modification of nucleic acids and its landscape in cancer tissues is evidently complex and highly variable. DNA methylation plays an important role in the regulation of gene expression. The methylation-associated transcriptional inactivation of genes involved in cell cycle control and damage repair suggest that aberrant nucleotide methylations are hallmarks of carcinogenesis [120, 121]. NS provides the most precise detection and description of methylation landscapes [122]. Studies showed that the methylation of both 5mC and 5hmC has a role in the pathogenesis of pediatric cancer [123], while the presence of 6mA in pancreatic tumors is highly upregulated and has a lower occurrence compared to 5mC [124]. Thus, the idea to use methylation as a biomarker for cancer detection is not far to seek. Due to its prognostic property, DNA methylation was already applied as a prognostic marker in several cancer types, including prostate, bladder, colorectal, non-small-cell lung, breast, ovarian, cervical cancer, and liver malignancies [125, 126].

Although we presented the potentials of TGS long-read sequencing, their utilization in routine diagnostics has not widespread yet. NGS whole exome and targeted sequencing techniques offer well applicable results in routine diagnostics including inborn discrepancy detection, cancer research and diagnostics, hematology, and neurological disorders [72, 127130]. Their instrumentation, the corresponding chemicals, and flow cells are more affordable, and the generated data are more targeted [131]. On the other hand, as long-read techniques offer a wider genomic picture, thus providing a deeper insight into nucleic acid traits, their introduction into routine examinations has started [132136] and their spreading is expected in the near future.

Conclusion

In this review, we discussed the milestones of sequencing techniques, their progression, current applications, and future opportunities. We also provided a general comparison between short- and long-read assays highlighting their strengths and drawbacks from various aspects including methodology, data analysis, and applications. As we introduced in the last chapter, the spread of long-read techniques has led to a rapid progress in genomics-related areas. By expanding and refining sequencing routines, it becomes possible to explore the genetic complexity of biological systems in greater depths facilitating a radical future advance in the field of sequence variances.

Author contributions

NS, BK, and BM: conceptualization and revision; NS: literature research and drafting; BK, GV, BM, ZN, and IT: critical revision of the manuscript. All authors contributed to the article and approved the submitted version.

Funding

The authors declare that financial support was received for the research, authorship, and/or publication of this article. This project was financed from the Hungarian Scientific Research Fund (NKFI-143002), from the NRDI Fund FK0201NEPE/TKPNKTA-47 and from the Fund National Cardiovascular Laboratory RRF-2.3.1-21-2022-00003. This project has been implemented with the support provided by the Ministry of Culture and Innovation of Hungary and from the National Research, Development, and Innovation Fund, financed under the KDP-2023/C2270480.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Alberts, B. 4th chapter: DNA, chromosomes and genomes. In: Molecular biology of the cell. 6th ed. W.W. Norton & Company (2015).

Google Scholar

2. Adewale, BA. Will long-read sequencing technologies replace short-read sequencing technologies in the next 10 years? Afr J Lab Med (2020) 9(1):1340. doi:10.4102/ajlm.v9i1.1340

PubMed Abstract | CrossRef Full Text | Google Scholar

3. Sanger, F, Nicklen, S, and Coulson, AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA (1977) 74(12):5463–7. doi:10.1073/pnas.74.12.5463

PubMed Abstract | CrossRef Full Text | Google Scholar

4. Maxam, AM, and Gilbert, W. A new method for sequencing DNA. Proc Natl Acad Sci USA (1977) 74(2):560–4. doi:10.1073/pnas.74.2.560

PubMed Abstract | CrossRef Full Text | Google Scholar

5. Marulies, M, Egholm, M, Altman, WE, Attiya, S, Bader, JS, Bemben, LA, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature (2005) 437:376–80. doi:10.1038/nature03959

PubMed Abstract | CrossRef Full Text | Google Scholar

6. Guo, J, Yu, L, Turro, NJ, and Ju, J. An integrated system for DNA sequencing by synthesis using novel nucleotide analogues. Acc Chem Res (2010) 43(4):551–63. doi:10.1021/ar900255c

PubMed Abstract | CrossRef Full Text | Google Scholar

7. Liu, L, Li, Y, Li, S, Hu, N, He, Y, Pong, R, et al. Comparison of next-generation sequencing systems. J Biomed Biotechnol (2012) 2012:251364. doi:10.1155/2012/251364

PubMed Abstract | CrossRef Full Text | Google Scholar

8. Harrington, CT, Lin, EI, Olson, MT, and Eshleman, JR. Fundamentals of pyrosequencing. Arch Pathol Lab Med (2013) 137(9):1296–303. doi:10.5858/arpa.2012-0463-RA

PubMed Abstract | CrossRef Full Text | Google Scholar

9. Grigorev, K, Foox, J, Bezdan, D, Butler, D, Luxton, JJ, Reed, J, et al. Haplotype diversity and sequence heterogeneity of human telomeres. Genome Res (2021) 31(7):1269–79. doi:10.1101/gr.274639.120

PubMed Abstract | CrossRef Full Text | Google Scholar

10. Kumar, KR, Cowley, MJ, and Davis, RL. Next-generation sequencing and emerging technologies. Semin Thromb Hemost (2019) 45(7):661–73. doi:10.1055/s-0039-1688446

PubMed Abstract | CrossRef Full Text | Google Scholar

11. Chen, X, Xu, H, Shu, X, and Song, CX. Mapping epigenetic modifications by sequencing technologies. Cell Death Differ (2023). doi:10.1038/s41418-023-01213-1

CrossRef Full Text | Google Scholar

12. Xiao, T, and Zhou, W. The third generation sequencing: the advanced approach to genetic diseases. Transl Pediatr (2020) 9(2):163–73. doi:10.21037/tp.2020.03.06

PubMed Abstract | CrossRef Full Text | Google Scholar

13. Athanasopoulou, K, Boti, MA, Adamopoulos, PG, Skourou, PC, and Scorilas, A. Third-generation sequencing: the spearhead towards the radical transformation of modern genomics. Life (Basel) (2021) 12(1):30. doi:10.3390/life12010030

PubMed Abstract | CrossRef Full Text | Google Scholar

14. Kaplun, L, Krautz-Peterson, G, Neerman, N, Stanley, C, Hussey, S, Folwick, M, et al. ONT long-read WGS for variant discovery and orthogonal confirmation of short read WGS derived genetic variants in clinical genetic testing. Front Genet (2023) 14:1145285. doi:10.3389/fgene.2023.1145285

PubMed Abstract | CrossRef Full Text | Google Scholar

15. Roberts, RJ, Carneiro, MO, and Schatz, MC. The advantages of SMRT sequencing. Genome Biol (2013) 14(7):405. doi:10.1186/gb-2013-14-6-405

PubMed Abstract | CrossRef Full Text | Google Scholar

16. ONT Nanopore Technologies. Continuous development and improvement (2023). Available from: https://nanoporetech.com/about-us/continuous-development-and-improvement (Accessed 2024).

Google Scholar

17. Pollard, MO, Gurdasani, D, Mentzer, AJ, Porter, T, and Sandhu, MS. Long reads: their purpose and place. Hum Mol Genet (2018) 27(R2):R234–R241. doi:10.1093/hmg/ddy177

PubMed Abstract | CrossRef Full Text | Google Scholar

18. Quick, J, and Loman, NJ. Nanopore sequencing: an introduction. World Scientific Press (2019).

Google Scholar

19. Deamer, D, Akeson, M, and Branton, D. Three decades of nanopore sequencing. Nat Biotechnol (2016) 34:518–24. doi:10.1038/nbt.3423

PubMed Abstract | CrossRef Full Text | Google Scholar

20. ONT Nanopore Technologies. Flow cells (2023). Available from: https://nanoporetech.com/how-it-works/flow-cells-and-nanopores (Accessed 2023).

Google Scholar

21. Nicholls, SM, Quick, JC, Tang, S, and Loman, NJ. Ultra-deep, long-read nanopore sequencing of mock microbial community standards. GigaScience (2019) 8(5):giz043. doi:10.1093/gigascience/giz043

PubMed Abstract | CrossRef Full Text | Google Scholar

22. Ni, Y, Liu, X, Simeneh, ZM, Yang, M, and Li, R. Benchmarking of Nanopore R10.4 and R9.4.1 flow cells in single-cell whole-genome amplification and whole-genome shotgun sequencing. Comput Struct Biotechnol J (2023) 21:2352–64. doi:10.1016/j.csbj.2023.03.038

PubMed Abstract | CrossRef Full Text | Google Scholar

23. Jennings, W. Illumina sequencing (2016). doi:10.1201/9781315181431-7

CrossRef Full Text | Google Scholar

24. Stefan, CP, Hall, AT, Graham, AS, and Minogue, TD. Comparison of illumina and Oxford nanopore sequencing technologies for pathogen detection from clinical matrices using molecular inversion probes. J Mol Diagn (2022) 24(4):395–405. doi:10.1016/j.jmoldx.2021.12.005

PubMed Abstract | CrossRef Full Text | Google Scholar

25. Harris, TD, Buzby, PR, Babcock, H, Beer, E, Bowers, J, Braslavsky, I, et al. Single-molecule DNA sequencing of a viral genome. Science (2008) 320(5872):106–9. doi:10.1126/science.1150427

PubMed Abstract | CrossRef Full Text | Google Scholar

26. Eid, J, Fehr, A, Gray, J, Luong, K, Lyle, J, Otto, G, et al. Real-time DNA sequencing from single polymerase molecules. Science (2009) 323(5910):133–8. doi:10.1126/science.1162986

PubMed Abstract | CrossRef Full Text | Google Scholar

27. Levene, MJ, Korlach, J, Turner, SW, Foquet, M, Craighead, HG, and Webb, WW. Zero-mode waveguides for single-molecule analysis at high concentrations. Science (2003) 299(5607):682–6. doi:10.1126/science.1079700

PubMed Abstract | CrossRef Full Text | Google Scholar

28. Garrido-Cardenas, JA, Garcia-Maroto, F, Alvarez-Bermejo, JA, and Manzano-Agugliaro, F. DNA sequencing sensors: an overview. Sensors (Basel) (2017) 17(3):588. doi:10.3390/s17030588

PubMed Abstract | CrossRef Full Text | Google Scholar

29. Vasudevan, K, Devanga Ragupathi, NK, Jacob, JJ, and Veeraraghavan, B. Highly accurate-single chromosomal complete genomes using IonTorrent and MinION sequencing of clinical pathogens. Genomics (2020) 112(1):545–51. doi:10.1016/j.ygeno.2019.04.006

PubMed Abstract | CrossRef Full Text | Google Scholar

30. Warburton, PE, and Sebra, RP. Long-read DNA sequencing: recent advances and remaining challenges. Annu Rev Genomics Hum Genet (2023) 24:109–32. doi:10.1146/annurev-genom-101722-103045

PubMed Abstract | CrossRef Full Text | Google Scholar

31. Ebler, J, Haukness, M, Pesout, T, Marschall, T, and Paten, B. Haplotype-aware diplotyping from noisy long reads. Genome Biol (2019) 20:116. doi:10.1186/s13059-019-1709-0

PubMed Abstract | CrossRef Full Text | Google Scholar

32. Delahaye, C, and Nicolas, J. Sequencing DNA with nanopores: troubles and biases. PLoS One (2021) 16(10):e0257521. doi:10.1371/journal.pone.0257521

PubMed Abstract | CrossRef Full Text | Google Scholar

33. Amarasinghe, SL, Su, S, Dong, X, Zappia, L, Ritchie, ME, and Gouil, Q. Opportunities and challenges in long-read sequencing data analysis. Genome Biol (2020) 21:30. doi:10.1186/s13059-020-1935-5

PubMed Abstract | CrossRef Full Text | Google Scholar

34. Liu, Q, Fang, L, Yu, G, Wang, D, Xiao, CL, and Wang, K. Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data. Nat Commun (2019) 10(1):2449. doi:10.1038/s41467-019-10168-2

PubMed Abstract | CrossRef Full Text | Google Scholar

35. Baid, G, Cook, DE, Shafin, K, Yun, T, Llinares-López, F, Berthet, Q, et al. DeepConsensus: gap-aware sequence transformers for sequence correction. Nat Biotechnol (2023) 41(2):232–8. doi:10.1038/s41587-022-01435-7

PubMed Abstract | CrossRef Full Text | Google Scholar

36. Zhang, H, Jain, C, and Aluru, S. A comprehensive evaluation of long read error correction methods. BMC Genomics (2020) 21(6):889. doi:10.1186/s12864-020-07227-0

PubMed Abstract | CrossRef Full Text | Google Scholar

37. Ardui, S, Ameur, A, Vermeesch, JR, and Hestand, MS. Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics. Nucleic Acids Res (2018) 46(5):2159–68. doi:10.1093/nar/gky066

PubMed Abstract | CrossRef Full Text | Google Scholar

38. ONT Nanopore Technologies. Clive Brown’s keynote at nanopore community meeting (2018). Available from: https://nanoporetech.com/resource-centre/clive-brown-ncm-2018 (Accessed 2018).

Google Scholar

39. Pacific BioSciences HiFi sequencing (2023). Available from: https://www.pacb.com/technology/hifi-sequencing/ (Accessed 2024).

Google Scholar

40. Completing Human Genomes. Completing human genomes. Nat Methods (2022) 19:629. doi:10.1038/s41592-022-01537-9

PubMed Abstract | CrossRef Full Text | Google Scholar

41. Goodwin, S, Gurtowski, J, Ethe-Sayers, S, Deshpande, P, Schatz, MC, and McCombie, WR. Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res (2015) 25(11):1750–6. doi:10.1101/gr.191395.115

PubMed Abstract | CrossRef Full Text | Google Scholar

42. Hood, L, and Rowen, L. The Human Genome Project: big science transforms biology and medicine. Genome Med (2013) 5:79. doi:10.1186/gm483

PubMed Abstract | CrossRef Full Text | Google Scholar

43. Lander, ES, Linton, LM, Birren, B, Nusbaum, C, Zody, MC, Baldwin, J, et al. Initial sequencing and analysis of the human genome. Nature (2001) 409:860–921. doi:10.1038/35057062

PubMed Abstract | CrossRef Full Text | Google Scholar

44. Nurk, S, Koren, S, Rhie, A, Rautiainen, M, Bzikadze, AV, Mikheenko, A, et al. The complete sequence of a human genome. Science (2022) 376:44–53. doi:10.1126/science.abj6987

PubMed Abstract | CrossRef Full Text | Google Scholar

45. Suzuki, Y. Informatics for PacBio long-reads. Single molecule and single cell sequencing. In: Y Suzuki, editor. Advances in experimental medicine and biology. Springer (2019).

Google Scholar

46. Oxford Nanopore Technologies. Oxford nanopore community (2023). Available from: https://nanoporetech.com/community (Accessed 2024).

Google Scholar

47. Bioinformatics. Babraham bioinformatics (2023). Available from: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (Accessed 2023).

Google Scholar

48. Ewels, P, Magnusson, M, Lundin, S, and Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics (2016) 32(19):3047–8. doi:10.1093/bioinformatics/btw354

PubMed Abstract | CrossRef Full Text | Google Scholar

49. Fukasawa, Y, Ermini, L, Wang, H, Carty, K, and Cheung, MS. LongQC: a quality control tool for third generation sequencing long read data. G3 Genes, Genomes, Genet (2020) 10(4):1193–6. doi:10.1534/g3.119.400864

CrossRef Full Text | Google Scholar

50. Coster, WD, D'Hert, S, Schultz, DT, Cruts, M, and Van Broeckhoven, C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics (2018) 34(15):2666–9. doi:10.1093/bioinformatics/bty149

PubMed Abstract | CrossRef Full Text | Google Scholar

51. Lanfear, R, Schalamun, M, Kainer, D, Wang, W, and Schwessinger, B. MinIONQC: fast and simple quality control for MinION sequencing data. Bioinformatics (2019) 35(3):523–5. doi:10.1093/bioinformatics/bty654

PubMed Abstract | CrossRef Full Text | Google Scholar

52. Bolognini, D, Bartalucci, N, Mingrino, A, Vannucchi, AM, and Magi, A. NanoR: a user-friendly R package to analyze and compare nanopore sequencing data. PLoS One (2019) 14(5):e0216471. doi:10.1371/journal.pone.0216471

PubMed Abstract | CrossRef Full Text | Google Scholar

53. Graubert, A, Aguet, F, Ravi, A, Ardlie, KG, and Getz, G. RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts. Bioinformatics (2021) 37(18):3048–50. doi:10.1093/bioinformatics/btab135

PubMed Abstract | CrossRef Full Text | Google Scholar

54. PacBio SMRT®. Tools reference guide (v11.0) (2022). Available from: https://www.pacb.com/wp-content/uploads/SMRT_Tools_Reference_Guide_v11.0.pdf (Accessed 2022).

Google Scholar

55. Oxford Nanopore Technologies. Oxford nanopore technologies (2023). Available from: https://github.com/nanoporetech/dorado (Accessed 2023).

Google Scholar

56. Oxford Nanopore Technologies. Oxford nanopore technologies (2024). Available from: https://community.nanoporetech.com/docs/prepare/library_prep_protocols/Guppy-protocol/v/gpb_2003_v1_revax_14dec2018/guppy-software-overview (Accessed 2024).

Google Scholar

57. Zheng, Z, Li, S, Su, J, Leung, AWS, Lam, TW, and Luo, R. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat Comput Sci (2022) 2(12):797–803. doi:10.1038/s43588-022-00387-x

PubMed Abstract | CrossRef Full Text | Google Scholar

58. Sedlazeck, FJ, Rescheneder, P, Smolka, M, Fang, H, Nattestad, M, von Haeseler, A, et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods (2018) 15:461–8. doi:10.1038/s41592-018-0001-7

PubMed Abstract | CrossRef Full Text | Google Scholar

59. Romagnoli, S, Bartalucci, N, and Vannucchi, AM. Resolving complex structural variants via nanopore sequencing. Front Genet (2023) 14:1213917. doi:10.3389/fgene.2023.1213917

PubMed Abstract | CrossRef Full Text | Google Scholar

60. Oxford Nanopore Technologies. Oxford nanopore technologies (2018). Available from: https://github.com/nanoporetech/medaka (Accessed 2018).

Google Scholar

61. Ni, P, Huang, N, Zhang, Z, Wang, DP, Liang, F, Miao, Y, et al. DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning. Bioinformatics (2019) 35(22):4586–95. doi:10.1093/bioinformatics/btz276

PubMed Abstract | CrossRef Full Text | Google Scholar

62. Kolmogorov, M, Yuan, J, Lin, Y, and Pevzner, P. Assembly of long error-prone reads using repeat graphs. Nat Biotechnol (2019) 37(5):540–6. doi:10.1038/s41587-019-0072-8

PubMed Abstract | CrossRef Full Text | Google Scholar

63. Koren, S, Walenz, BP, Berlin, K, Miller, JR, Bergman, NH, and Phillippy, AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res (2017) 27(5):722–36. doi:10.1101/gr.215087.116

PubMed Abstract | CrossRef Full Text | Google Scholar

64. Nurk, S, Walenz, BP, Rhie, A, Vollger, MR, Logsdon, GA, Grothe, R, et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res (2020) 30(9):1291–305. doi:10.1101/gr.263566.120

PubMed Abstract | CrossRef Full Text | Google Scholar

65. Chaisson, MJ, and Tesler, G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics (2012) 13:238. doi:10.1186/1471-2105-13-238

PubMed Abstract | CrossRef Full Text | Google Scholar

66. Chin, CS, Peluso, P, Sedlazeck, FJ, Nattestad, M, Concepcion, GT, Clum, A, et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods (2016) 13(12):1050–4. doi:10.1038/nmeth.4035

PubMed Abstract | CrossRef Full Text | Google Scholar

67. Mayakonda, A, Lin, DC, Assenov, Y, Plass, C, and Koeffler, HP. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res (2018) 28(11):1747–56. doi:10.1101/gr.239244.118

PubMed Abstract | CrossRef Full Text | Google Scholar

68. Wickham, H ggplot2: elegant graphics for data analysis. New York: Springer-Verlag (2016). Available from: https://ggplot2.tidyverse.org (Accessed 2009).

Google Scholar

69. Hunter, JD. Matplotlib: a 2D graphics environment. Comput Sci Eng (2007) 9(3):90–5. doi:10.1109/MCSE.2007.55

CrossRef Full Text | Google Scholar

70. Bruce, J, Abeel, T, Shea, T, Priest, M, Abouelliel, A, Sakthikumar, S, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE (2014) 9(11):e112963. doi:10.1371/journal.pone.0112963

PubMed Abstract | CrossRef Full Text | Google Scholar

71. Vaser, R, Sović, I, Nagarajan, N, and Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res (2017) 27(5):737–46. doi:10.1101/gr.214270.116

PubMed Abstract | CrossRef Full Text | Google Scholar

72. Yépez, VA, Gusic, M, Kopajtich, R, Mertes, C, Smith, NH, Alston, CL, et al. Clinical implementation of RNA sequencing for Mendelian disease diagnostics. Genome Med (2022) 14(1):38. doi:10.1186/s13073-022-01019-9

PubMed Abstract | CrossRef Full Text | Google Scholar

73. Perešíni, P, Boža, V, Brejová, B, and Vinař, T. Nanopore base calling on the edge. Bioinformatics (2021) 37(24):4661–7. doi:10.1093/bioinformatics/btab528

PubMed Abstract | CrossRef Full Text | Google Scholar

74. Ledergerber, C, and Dessimoz, C. Base-calling for next-generation sequencing platforms. Brief Bioinform (2011) 12(5):489–97. doi:10.1093/bib/bbq077

PubMed Abstract | CrossRef Full Text | Google Scholar

75. Koboldt, DC. Best practices for variant calling in clinical sequencing. Genome Med (2020) 12(91):91. doi:10.1186/s13073-020-00791-w

PubMed Abstract | CrossRef Full Text | Google Scholar

76. Schmieder, R, and Edwards, R. Quality control and preprocessing of metagenomic datasets. Bioinformatics (2011) 27(6):863–4. doi:10.1093/bioinformatics/btr026

PubMed Abstract | CrossRef Full Text | Google Scholar

77. Bolognini, D, Semeraro, R, and Magi, A. Versatile quality control methods for nanopore sequencing. Evol Bioinform Online (2019) 15:1176934319863068. doi:10.1177/1176934319863068

PubMed Abstract | CrossRef Full Text | Google Scholar

78. Frye, M, Harada, BT, Behm, M, and He, C. RNA modifications modulate gene expression during development. Science (2018) 361(6409):1346–9. doi:10.1126/science.aau1646

PubMed Abstract | CrossRef Full Text | Google Scholar

79. Field, AE, Robertson, NA, Wang, T, Havas, A, Ideker, T, and Adams, PD. DNA methylation clocks in aging: categories, causes, and consequences. Mol Cel (2018) 71(6):882–95. doi:10.1016/j.molcel.2018.08.008

CrossRef Full Text | Google Scholar

80. Esteller, M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat Rev Genet (2007) 8:286–98. doi:10.1038/nrg2005

PubMed Abstract | CrossRef Full Text | Google Scholar

81. Kumar, S, Chinnusamy, V, and Mohapatra, T. Epigenetics of modified DNA bases: 5-methylcytosine and beyond. Front Genet (2018) 9(18):640. doi:10.3389/fgene.2018.00640

PubMed Abstract | CrossRef Full Text | Google Scholar

82. Duffy, K, Arangundy-Franklin, S, and Holliger, P. Modified nucleic acids: replication, evolution, and next-generation therapeutics. BMC Biol (2020) 18(112):112. doi:10.1186/s12915-020-00803-6

PubMed Abstract | CrossRef Full Text | Google Scholar

83. Kumar, S, and Mohapatra, T. Deciphering epitranscriptome: modification of mRNA bases provides a new perspective for post-transcriptional regulation of gene expression. Front Cel Dev. Biol. (2021) 9(16):628415. doi:10.3389/fcell.2021.628415

CrossRef Full Text | Google Scholar

84. Soto, J, Rodriguez-Antolin, C, Vallespín, E, de Castro Carpeño, J, and Ibanez de Caceres, I. The impact of next-generation sequencing on the DNA methylation–based translational cancer research. Translational Res (2016) 169:1–18. doi:10.1016/j.trsl.2015.11.003

PubMed Abstract | CrossRef Full Text | Google Scholar

85. Hirst, M, and Marra, MA. Next generation sequencing based approaches to epigenomics. Brief Funct Genomics (2010) 9(5-6):455–65. doi:10.1093/bfgp/elq035

PubMed Abstract | CrossRef Full Text | Google Scholar

86. Flusberg, BA, Webster, DR, Lee, JH, Travers, KJ, Olivares, EC, Clark, TA, et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods (2010) 7:461–5. doi:10.1038/nmeth.1459

PubMed Abstract | CrossRef Full Text | Google Scholar

87. Xu, L, and Seki, M. Recent advances in the detection of base modifications using the Nanopore sequencer. J Hum Genet (2020) 65:25–33. doi:10.1038/s10038-019-0679-0

PubMed Abstract | CrossRef Full Text | Google Scholar

88. Smolka, M, Paulin, LF, Grochowski, CM, Horner, DW, Mahmoud, M, Behera, S, et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol (2024). doi:10.1038/s41587-023-02024-y

CrossRef Full Text | Google Scholar

89. Stoiber, M, Quick, J, Egan, R, Lee, JE, Celniker, S, Neely, RK, et al. De novo identification of DNA modifications enabled by genome-guided nanopore signal processing. bioRxiv 094672 (2017). doi:10.1101/094672

CrossRef Full Text | Google Scholar

90. Mandelker, D, and Ceyhan-Birsoy, O. Evolving significance of tumor-normal sequencing in cancer care. Trends Cancer (2020) 6(1):31–9. doi:10.1016/j.trecan.2019.11.006

PubMed Abstract | CrossRef Full Text | Google Scholar

91. Alkan, C, Coe, B, and Eichler, E. Genome structural variation discovery and genotyping. Nat Rev Genet (2011) 12:363–76. doi:10.1038/nrg2958

PubMed Abstract | CrossRef Full Text | Google Scholar

92. Zverinova, S, and Guryev, V. Variant calling: considerations, practices, and developments. Hum Mutat (2022) 43(8):976–85. doi:10.1002/humu.24311

PubMed Abstract | CrossRef Full Text | Google Scholar

93. Mahmoud, M, Gobet, N, Cruz-Dávalos, DI, Mounier, N, Dessimoz, C, and Sedlazeck, FJ. Structural variant calling: the long and the short of it. Genome Biol (2019) 20(246):246. doi:10.1186/s13059-019-1828-7

PubMed Abstract | CrossRef Full Text | Google Scholar

94. Mitsuhashi, S, and Matsumoto, N. Long-read sequencing for rare human genetic diseases. J Hum Genet (2020) 65:11–9. doi:10.1038/s10038-019-0671-8

PubMed Abstract | CrossRef Full Text | Google Scholar

95. Schüle, B, McFarland, KN, Lee, K, Tsai, YC, Nguyen, KD, Sun, C, et al. Parkinson’s disease associated with pure ATXN10 repeat expansion. Parkinson's Dis (2017) 3(27):27. doi:10.1038/s41531-017-0029-x

CrossRef Full Text | Google Scholar

96. McColgan, P, and Tabrizi, SJ. Huntington's disease: a clinical review. Eur J Neurol (2018) 25(1):24–34. doi:10.1111/ene.13413

PubMed Abstract | CrossRef Full Text | Google Scholar

97. Sakamoto, Y, Zaha, S, Suzuki, Y, Seki, M, and Suzuki, A. Application of long-read sequencing to the detection of structural variants in human cancer genomes. Comput Struct Biotechnol J (2021) 19:4207–16. doi:10.1016/j.csbj.2021.07.030

PubMed Abstract | CrossRef Full Text | Google Scholar

98. Phillippy, A. New advances in sequence assembly. Genome Res (2017) 27(5):xi–xiii. doi:10.1101/gr.223057.117

PubMed Abstract | CrossRef Full Text | Google Scholar

99. van Dijk, EL, Jaszczyszyn, Y, Naquin, D, and Thermes, C. The third revolution in sequencing technology. Trends Genet (2018) 34(9):666–81. doi:10.1016/j.tig.2018.05.008

PubMed Abstract | CrossRef Full Text | Google Scholar

100. Nagarajan, N, and Pop, M. Sequence assembly demystified. Nat Rev Genet (2013) 14:157–67. doi:10.1038/nrg3367

PubMed Abstract | CrossRef Full Text | Google Scholar

101. Chen, Z, Erickson, DL, and Meng, J. Benchmarking hybrid assembly approaches for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing. BMC Genomics (2020) 21:631. doi:10.1186/s12864-020-07041-8

PubMed Abstract | CrossRef Full Text | Google Scholar

102. Zhong, Y, Xu, F, Wu, J, Schubert, J, and Li, MM. Application of next generation sequencing in laboratory medicine. Ann Lab Med (2021) 41(1):25–43. doi:10.3343/alm.2021.41.1.25

PubMed Abstract | CrossRef Full Text | Google Scholar

103. Bartalucci, N, Romagnoli, S, and Vannucchi, AM. A blood drop through the pore: nanopore sequencing in hematology. Trends Genet (2022) 38(6):572–86. doi:10.1016/j.tig.2021.11.003

PubMed Abstract | CrossRef Full Text | Google Scholar

104. Erlich, RL, Jia, X, Anderson, S, Banks, E, Gao, X, Carrington, M, et al. Next-generation sequencing for HLA typing of class I loci. BMC Genomics (2011) 12:42. doi:10.1186/1471-2164-12-42

PubMed Abstract | CrossRef Full Text | Google Scholar

105. Klasberg, S, Surendranath, V, Lange, V, and Schöfl, G. Bioinformatics strategies, challenges, and opportunities for next generation sequencing-based HLA genotyping. Transfus Med Hemother (2019) 46(5):312–25. doi:10.1159/000502487

PubMed Abstract | CrossRef Full Text | Google Scholar

106. Garg, S. Computational methods for chromosome-scale haplotype reconstruction. Genome Biol (2021) 22(1):101. doi:10.1186/s13059-021-02328-9

PubMed Abstract | CrossRef Full Text | Google Scholar

107. Cilibrasi, R, van Iersel, L, Kelk, S, and Tromp, J. The complexity of the single individual SNP haplotyping problem. Algorithmica (2007) 49:13–36. doi:10.1007/s00453-007-0029-z

CrossRef Full Text | Google Scholar

108. Sakamoto, Y, Sereewattanawoot, S, and Suzuki, A. A new era of long-read sequencing for cancer genomics. J Hum Genet (2020) 65:3–10. doi:10.1038/s10038-019-0658-5

PubMed Abstract | CrossRef Full Text | Google Scholar

109. Tarabichi, M, Demetter, P, Craciun, L, Maenhaut, C, and Detours, V. Thyroid cancer under the scope of emerging technologies. Mol Cel Endocrinol (2022) 541:111491. doi:10.1016/j.mce.2021.111491

CrossRef Full Text | Google Scholar

110. Muñoz-Barrera, A, Rubio-Rodríguez, LA, Díaz-de Usera, A, Jáspez, D, Lorenzo-Salazar, JM, González-Montelongo, R, et al. From samples to germline and somatic sequence variation: a focus on next-generation sequencing in melanoma research. Life (Basel) (2022) 12(11):1939. doi:10.3390/life12111939

PubMed Abstract | CrossRef Full Text | Google Scholar

111. Vollmers, AC. Long noncoding RNA. Introduction and overview. In: WE Crusio, H Dong, HH Radeke, N Rezaei, O Steinlein, and J Xiao, editors. Advances in experimental medicine and biology. Springer (2022).

Google Scholar

112. Park, EG, Pyo, SJ, Cui, Y, Yoon, SH, and Nam, JW. Tumor immune microenvironment lncRNAs. Brief Bioinform (2022) 23(1):bbab504. doi:10.1093/bib/bbab504

PubMed Abstract | CrossRef Full Text | Google Scholar

113. Li, X, and Wang, C-Y. From bulk, single-cell to spatial RNA sequencing. Int J Oral Sci (2021) 13(36):36. doi:10.1038/s41368-021-00146-0

PubMed Abstract | CrossRef Full Text | Google Scholar

114. Depledge, DP, Srinivas, KP, Sadaoka, T, Bready, D, Mori, Y, Placantonakis, DG, et al. Direct RNA sequencing on nanopore arrays redefines the transcriptional complexity of a viral pathogen. Nat Commun (2019) 10:754. doi:10.1038/s41467-019-08734-9

PubMed Abstract | CrossRef Full Text | Google Scholar

115. Jovic, D, Liang, X, Zeng, H, Lin, L, Xu, F, and Luo, Y. Single-cell RNA sequencing technologies and applications: a brief overview. Clin Trans Med (2022) 12(3):e694. doi:10.1002/ctm2.694

CrossRef Full Text | Google Scholar

116. Razavi, P, Li, BT, Brown, DN, Jung, B, Hubbell, E, Shen, R, et al. High-intensity sequencing reveals the sources of plasma circulating cell-free DNA variants. Nat Med (2019) 25:1928–37. doi:10.1038/s41591-019-0652-7

PubMed Abstract | CrossRef Full Text | Google Scholar

117. Song, P, Wu, LR, Yan, YH, Zhang, JX, Chu, T, Kwong, LN, et al. Limitations and opportunities of technologies for the analysis of cell-free DNA in cancer diagnostics. Nat Biomed Eng (2022) 6(3):232–45. doi:10.1038/s41551-021-00837-3

PubMed Abstract | CrossRef Full Text | Google Scholar

118. Shieh, JTC. Genomic technologies to improve variation identification in undiagnosed diseases. Ped Neonatal (2023) 64(S1):S18–S21. doi:10.1016/J.pedneo.2022.10.002

CrossRef Full Text | Google Scholar

119. Macken, WL, Vandrovcova, J, Hanna, MG, and Pitceathly, RDS. Applying genomic and transcriptomic advances to mitochondrial medicine. Nat Rev Neurol (2021) 17:215–30. doi:10.1038/s41582-021-00455-2

PubMed Abstract | CrossRef Full Text | Google Scholar

120. Esteller, M. Epigenetic gene silencing in cancer: the DNA hypermethylome. Hum Mol Genet (2007) 16(Spec No 1):R50–9. doi:10.1093/hmg/ddm018

PubMed Abstract | CrossRef Full Text | Google Scholar

121. Lakshminarasimhan, R, and Liang, G. The role of DNA methylation in cancer. Adv Exp Med Biol (2016) 945:151–72. doi:10.1007/978-3-319-43624-1_7

PubMed Abstract | CrossRef Full Text | Google Scholar

122. Abante, J, Kambhampati, S, Feinberg, AP, and Goutsias, J. Estimating DNA methylation potential energy landscapes from nanopore sequencing data. Sci Rep (2021) 11(1):21619. doi:10.1038/s41598-021-00781-x

PubMed Abstract | CrossRef Full Text | Google Scholar

123. Jhanwar, S, and Deogade, A. 5-Methylcytosine and 5-hydroxymethylcytosine signatures underlying pediatric cancers. Epigenomes (2019) 3(2):9. doi:10.3390/epigenomes3020009

PubMed Abstract | CrossRef Full Text | Google Scholar

124. Zhou, D, Guo, S, Wang, Y, Zhao, J, Liu, H, Zhou, F, et al. Functional characteristics of DNA N6-methyladenine modification based on long-read sequencing in pancreatic cancer. Brief Funct Genomics (2023) 23:150–62. doi:10.1093/bfgp/elad021

CrossRef Full Text | Google Scholar

125. Brockley, LJ, Souza, VGP, Forder, A, Pewarchuk, ME, Erkan, M, Telkar, N, et al. Sequence-based platforms for discovering biomarkers in liquid biopsy of non-small-cell lung cancer. Cancers (Basel) (2023) 15(8):2275. doi:10.3390/cancers15082275

PubMed Abstract | CrossRef Full Text | Google Scholar

126. Ibrahim, J, Peeters, M, Van Camp, G, and Op de Beeck, K. Methylation biomarkers for early cancer detection and diagnosis: current and future perspectives. Eur J Cancer (2023) 178:91–113. doi:10.1016/j.ejca.2022.10.015

PubMed Abstract | CrossRef Full Text | Google Scholar

127. Sahm, F, Schrimpf, D, Jones, DTW, Meyer, J, Kratz, A, Reuss, D, et al. Next-generation sequencing in routine brain tumor diagnostics enables an integrated diagnosis and identifies actionable targets. Acta Neuropathol (2016) 131(6):903–10. doi:10.1007/s00401-015-1519-8

PubMed Abstract | CrossRef Full Text | Google Scholar

128. Arts, P, Simons, A, AlZahrani, MS, Yilmaz, E, AlIdrissi, E, van Aerde, KJ, et al. Exome sequencing in routine diagnostics: a generic test for 254 patients with primary immunodeficiencies. Genome Med (2019) 11:38. doi:10.1186/s13073-019-0649-3

PubMed Abstract | CrossRef Full Text | Google Scholar

129. Breinholt, MF, Nielsen, K, Schejbel, L, Fassi, DE, Schöllkopf, C, Novotny, GW, et al. The value of next-generation sequencing in routine diagnostics and management of patients with cytopenia. Int J Lab Hematol (2022) 44(3):531–7. doi:10.1111/ijlh.13802

PubMed Abstract | CrossRef Full Text | Google Scholar

130. Fogel, BL, Lee, H, Strom, SP, Deignan, JL, and Nelson, SF. Clinical exome sequencing in neurogenetic and neuropsychiatric disorders. Ann N Y Acad Sci (2016) 1366(1):49–60. doi:10.1111/nyas.12850

PubMed Abstract | CrossRef Full Text | Google Scholar

131. Schmidt, J, Blessing, F, Fimpler, L, and Wenzel, F. Nanopore sequencing in a clinical routine laboratory: challenges and opportunities. Clin Lab (2020) 66(6). doi:10.7754/Clin.Lab.2019.191114

PubMed Abstract | CrossRef Full Text | Google Scholar

132. Olivucci, G, Iovino, E, Innella, G, Turchetti, D, Pippucci, T, and Magini, P. Long read sequencing on its way to the routine diagnostics of genetic diseases. Front Genet (2024) 15:1374860. doi:10.3389/fgene.2024.1374860

PubMed Abstract | CrossRef Full Text | Google Scholar

133. Eagle, SHC, Robertson, J, Bastedo, DP, Liu, K, and Nash, JHE. Evaluation of five commercial DNA extraction kits using Salmonella as a model for implementation of rapid Nanopore sequencing in routine diagnostic laboratories. Access Microbiol (2023) 5(2):000468v3. doi:10.1099/acmi.0.000468.v3

CrossRef Full Text | Google Scholar

134. Erdmann, H, Schöberl, F, Giurgiu, M, Leal Silva, RM, Scholz, V, Scharf, F, et al. Parallel in-depth analysis of repeat expansions in ataxia patients by long-read sequencing. Brain (2023) 146(5):1831–43. doi:10.1093/brain/awac377

PubMed Abstract | CrossRef Full Text | Google Scholar

135. Matern, BM, Olieslagers, TI, Groeneweg, M, Duygu, B, Wieten, L, Tilanus, MGJ, et al. Long-read nanopore sequencing validated for human leukocyte antigen class I typing in routine diagnostics. J Mol Diagn (2020) 22(7):912–9. doi:10.1016/j.jmoldx.2020.04.001

PubMed Abstract | CrossRef Full Text | Google Scholar

136. Buenestado-Serrano, S, Herranz, M, Otero-Sobrino, Á, Molero-Salinas, A, Rodríguez-Grande, C, Sanz-Pérez, A, et al. Accelerating SARS-CoV-2 genomic surveillance in a routine clinical setting with nanopore sequencing. Int J Med Microbiol (2024) 314:151599. doi:10.1016/j.ijmm.2024.151599

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: sequencing, short-read, long-read, bioinformatics, DNA

Citation: Szakállas N, Barták BK, Valcz G, Nagy ZB, Takács I and Molnár B (2024) Can long-read sequencing tackle the barriers, which the next-generation could not? A review. Pathol. Oncol. Res. 30:1611676. doi: 10.3389/pore.2024.1611676

Received: 10 January 2024; Accepted: 30 April 2024;
Published: 16 May 2024.

Edited by:

Attila Patocs, Semmelweis University, Hungary

Copyright © 2024 Szakállas, Barták, Valcz, Nagy, Takács and Molnár. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Nikolett Szakállas, szakallasn3@gmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.