The number of complete duplicated BUSCOs was slightly lower in the ONT filled assembly (11.9%) than in the PacBio filled assembly (14.3%). Virtual long reads were generated using the stLFR protocol. The DNA was then stored at −20°C until used for sequencing. The first draft genome assembly of the widely grown M. integrifolia cultivar HAES 741 was constructed from short-read Illumina sequence data and was highly fragmented (518 Mb, 193,493 scaffolds, N50 = 4,745 bp) [9]. wrote the manuscript with input from all authors. performed stLFR library preparation and sequencing. For ONT data, 4 rounds of error correction were performed using Racon v1.4.9 (Racon, RRID:SCR_017642) [32] with recommended parameters (-m 8 -x -6 -g -8 -w 500) based on minimap2 v2.17-r943-dirty [33] overlaps, followed by 1 round of Medaka v0.8.1 [34] using the r941_prom_high model to create the consensus sequence. The polymerase-bound library was sequenced on 8 SMRT Cells with a 10 h movie time using the Sequel Sequencing Kit 3.0 (PacBio, 101-597-900, Mulgrave, Victoria, Australia) and a Sequel SMRT Cell 1M v3 (PacBio, 101-531-000, Mulgrave, Victoria, Australia). The number of gaps within scaffolds was computed using the formula: number of contigs − number of scaffolds. The purified gDNA (10 μg) was treated with Exonuclease VII, followed by a DNA damage repair reaction, an end-repair reaction, and purification with AMPure PB beads. The PCR reaction was purified using Agencourt® AMPure XP beads (Beckman Coulter, Brea, CA) and quantified using the Qubit™ dsDNA HS Assay Kit (Invitrogen, Carlsbad, CA). Sequencing of complex genomes, which are very large and have a high content of repetitive sequences or many copies of similar sequences, remains challenging. ONT and PacBio reads were filtered using Filtlong v0.2.0 [21] by removing 10% of the worst reads and reads shorter than 1 kb. 
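The gap-count formula quoted above (number of contigs − number of scaffolds) can be written out as a minimal sketch. The function name `count_gaps` is hypothetical and used only for illustration; it is not taken from any tool used in the study.

```python
def count_gaps(num_contigs: int, num_scaffolds: int) -> int:
    """Return the number of gaps within scaffolds.

    Each scaffold built from k contigs contains k - 1 gaps, so summing over
    all scaffolds gives (number of contigs) - (number of scaffolds).
    """
    if num_scaffolds < 0 or num_contigs < num_scaffolds:
        # A scaffold holds at least one contig, so contigs >= scaffolds.
        raise ValueError("expected num_contigs >= num_scaffolds >= 0")
    return num_contigs - num_scaffolds
```

For example, 5 contigs arranged into 2 scaffolds imply 3 gaps, and an assembly with no scaffolding (contigs equal to scaffolds) has none.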
BGI, PacBio, ONT, and Illumina sequencing data generated in this study have been deposited in the SRA under BioProject PRJNA609013 and BioSample SAMN14217788. To meet the requirements of the assembler, the barcodes with < 10 reads were removed, which resulted in 373 million reads representing 74.6 Gb of data and corresponding to ∼96× coverage of the genome (Table 1). E.A., Q.M., R.D., O.W., and B.A.P. The BGI + ONT and BGI + PacBio assemblies were polished with the BGI stLFR reads using 1 iteration of NextPolish. The Raven assembly (879 Mb) consisted of the fewest contigs (n = 1,730) with a contig N50 of 919 kb. designed stLFR experiments and performed stLFR analyses. GPU-accelerated computing greatly reduced the computing time for some tools such as Racon, Medaka, or Raven. As an alternative method to long-read–only assembly followed by polishing with short reads, a hybrid assembly was generated using MaSuRCA. Sequencing of wild crop relatives is urgent because many populations are critical to diversification of crop genetics to ensure food security in response to climate change [5] but are also threatened with extinction due to changes in land use or climate [6]. M.X. Number of mismatches and indels identified in the long-read assemblies as compared to the Illumina short-read assembly generated by SPAdes. A.F. The Nuclease flushing mix was loaded into the flow cell and incubated for 30 min. The assemblies were compared for contiguity, base accuracy, and completeness, as well as sequencing costs and DNA material requirements. Chloroform (10 mL) was added to the tubes and gently mixed by inverting the tubes 50 times. The Purge Haplotigs pipeline identified 569 primary contigs representing 112 Mb as likely alternate haplotypes (Supplementary Table S9). The Flye assembly was highly contiguous (contig N50 = 1.47 Mb) and the smallest in size (767 Mb). 
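The relationship between data yield and coverage used above (for example, 74.6 Gb corresponding to roughly 96× coverage) is total sequenced bases divided by genome size. A minimal sketch follows; the function name `sequencing_depth` is hypothetical, and the genome size is left as an input because the value used for the reported coverage figures is not stated in this text.

```python
def sequencing_depth(total_bases: float, genome_size_bp: float) -> float:
    """Return average coverage (fold) as total bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_bases / genome_size_bp


def implied_genome_size(total_bases: float, depth: float) -> float:
    """Invert the same relation: genome size implied by a yield and a depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return total_bases / depth
```

The second helper is useful as a sanity check: given a reported yield and coverage, it shows which genome size the two figures jointly imply.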
We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION), and BGI (single-tube Long Fragment Read) technologies for the same sample. Conserved BUSCO gene analysis revealed that the stLFR assembly contained 88.3% of complete genes from the eudicotyledons dataset (Fig. 3). and P.W. The length and sequence quality delivered by the available sequencing platforms have continued to improve. The library was sequenced on an SP flow cell (14%) of the Illumina NovaSeq 6000 sequencing platform (Ramaciotti Centre, University of New South Wales, Australia) using the paired-end protocol to produce 112 million 150-bp reads in pairs, an estimated 43× genome coverage. The total assembly length is plotted against the contig N50 for each assembler and sequencing dataset. The prepared DNB library was loaded onto 2 lanes of a DNBSEQ-G400RS flow cell (MGI, Shenzhen, China) and then sequenced on a DNBSEQ-G400RS (MGI, Shenzhen, China) using the DNBSEQ-G400RS stLFR sequencing set (MGI, Shenzhen, China). Assemblies and other supporting data are available from the GigaScience GigaDB repository [56]. Raven was the only tool run on a GPU-accelerated server and it was the fastest assembler, followed by Redbean and Flye. Therefore, the Canu assembly likely contains uncollapsed haplotypes corresponding to artefactually duplicated regions, as reported recently [50]. *Australian dollar costs were converted to US dollars at an exchange rate of 0.685 USD/AUD. Published by Oxford University Press GigaScience. Although small, the difference is still enough to make Illumina the current gold standard for reading the letters of the human genome. Young leaves were harvested, placed on ice in bags, and within 3 h snap-frozen under liquid nitrogen and stored at −20°C until further processed for tissue pulverization using either a mortar and pestle or the Mixer Mill as outlined below. 
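Contig N50, the contiguity metric plotted against total assembly length above, has a standard definition: the length L such that contigs of length at least L together cover at least half of the assembly. A minimal sketch of that definition is given below; it is not the implementation of any specific assembly-statistics tool used in the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig where the cumulative sum reaches half
        # of the total assembly length.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")
```

Using integer arithmetic (`2 * running >= total`) avoids floating-point rounding when the total length is odd.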
stLFR is based on DNA co-barcoding [15,16], i.e., adding the same barcode sequence to subfragments from the original long DNA molecule. Polishing decreased the number of missing BUSCOs but increased the number of duplicated BUSCOs for the Redbean, Flye, and Raven assemblies (Supplementary Table S10). Illumina reads were assembled using SPAdes v3.13.1 (SPAdes, RRID:SCR_000131) [41]. The flow cell was then primed as mentioned above and loaded with the fresh library mix (150 μL) containing 390 ng of adapter-ligated DNA and the standard 64-h run script was rerun using MinKNOW. Accession numbers are as follows: BGI (SRR11191908), PacBio (SRR11191909), ONT PromethION (SRR11191910), ONT MinION (SRR11191911), and Illumina (SRR11191912). The quality of the DNA sample was assessed using NanoDrop, Qubit, and the Agilent 4200 TapeStation system. The histograms of the k-mer occurrences were processed by GenomeScope (GenomeScope, RRID:SCR_017014) [26], which estimated a haploid genome size of 653 and 616 Mb with ∼71% and 74% of unique content and a heterozygosity level of 0.65% and 0.77% from Illumina and BGI reads, respectively. Since the sequence data were generated, the PacBio SMRT platform has transitioned from the Sequel I to the Sequel II instrument, with an 8-fold increase in the data yield. Genome Innovation Hub, The University of Queensland, Institute for Molecular Bioscience, The University of Queensland. At a much larger scale, PromethION offers the same technology for real-time, long-read, direct DNA and RNA sequencing as MinION and GridION. 
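Because co-barcoded subfragments share one barcode, reads can be grouped by barcode and sparsely populated barcodes dropped, as in the filtering step that removed barcodes with fewer than 10 reads. The sketch below illustrates the idea only. The function name `filter_sparse_barcodes` and the representation of reads as `(barcode, sequence)` tuples are assumptions made for this example, not the format or code used by the stLFR pipeline.

```python
from collections import Counter
from typing import List, Tuple

Read = Tuple[str, str]  # (barcode, sequence); a hypothetical in-memory form


def filter_sparse_barcodes(reads: List[Read], min_reads: int = 10) -> List[Read]:
    """Keep only reads whose barcode is shared by at least `min_reads` reads."""
    # First pass: count how many reads carry each barcode.
    counts = Counter(barcode for barcode, _ in reads)
    # Second pass: keep reads from sufficiently populated barcodes,
    # preserving the original read order.
    return [(barcode, seq) for barcode, seq in reads if counts[barcode] >= min_reads]
```

Two passes keep the logic simple; for data of the scale described here a streaming or on-disk approach would be needed in practice.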
(A) ONT assemblies before and after Illumina short-read polishing using 1 iteration of NextPolish (Flye, Canu, Raven, Redbean) and MaSuRCA hybrid assembly; (B) PacBio assemblies before and after Illumina short-read polishing using 1 iteration of NextPolish (Falcon, Flye, Canu, Raven, Redbean) and MaSuRCA hybrid assembly; (C) BGI stLFR assemblies before and after gap filling using ONT or PacBio data and after polishing with stLFR reads using 1 iteration of NextPolish. PacBio reads were mapped to the primary FALCON-Unzip assembly using minimap2 v2.17-r954-dirty [33]. Single-molecule real-time (SMRT) sequencing, developed by PacBio, can generate reads in the tens of kilobases using the continuous long-read sequencing mode, thus enabling high-quality de novo genome assembly. In addition to that, the longer reads make it more likely that mapped reads will uniquely hit an isoform, and more likely that mapped reads cover the entirety of a transcript. The purified sample was size selected using the Blue Pippin with a dye-free, 0.75% agarose cassette and U1 marker (Sage Science, BUF7510, Mulgrave, Victoria, Australia) and the 0.75% DF Marker U1 high-pass 30–40 kb vs3 run protocol, with a BPstart cut-off of 35,000 bp. Illumina sequencing generated 112.5 million 150-bp paired-end reads, which correspond to ∼41× coverage of the genome. Currently, PacBio and ONT are the most commonly used technologies to generate long reads. 
A quantity of 20 ng of PCR product from the stLFR library was used to prepare DNA Nanoballs (DNBs) using the DNBSEQ-G400RS High Throughput stLFR Sequencing Set (MGI, Shenzhen, China) following the manufacturer's protocol. Interestingly, the gap-filling step only used 1.7% of the ONT reads, suggesting that a real-time selective sequencing approach could be used to select specific molecules that would be informative for filling the gaps [53]. The PacBio cost includes library preparation (1,187 USD) and sequencing on 8 SMRT cells (11,373 USD). We also performed gene expression level quantification and differentially expressed gene (DEG) analysis, using the data obtained from PromethION and Illumina sequencers. We also demonstrated that stLFR could be used as a complementary technology to ONT. Furthermore, the assembly generated by Supernova was phased. Young leaves (40 g) of M. jansenii were sourced from a tree with accession No. In addition, the genome completeness was improved in the gap-filled assemblies, with BUSCO detecting 4.8% (ONT) and 5.8% (PacBio) more complete genes. Therefore, depending on the assembler and the polisher used, the number of recommended polishing iterations might be different. stLFR reads were assembled using Supernova2 into an assembly of 40,789 scaffolds totaling 880 Mb in length (Table S11). Several assemblers were benchmarked in the assembly of Pacific Biosciences and Nanopore reads. 