Original guide published in July . Updated August .
Sample preparation is the process of getting DNA ready for Next Generation Sequencing (NGS). This requires a few steps.
Why is sample preparation important? Increasingly, NGS is being asked to handle more challenging samples, from diverse origins, of lower quality or of small size. Before these samples can be analyzed, they must be treated and prepared. This helps to prevent contamination, improve accuracy and minimize the risk of biases. Sample preparation is no longer just the warm-up for NGS: if any of the processes are done poorly, sequencing will not produce successful results.
Sample preparation varies depending on the type of material being sampled and the purpose of the experiment. Different types of genetic material (DNA or RNA) have slightly different sample preparation processes. On top of that, the different applications of NGS add another dimension. Therefore, no preparation protocol is always optimal, and there are a number of questions that need to be asked before the experiment to determine the best methods. It would be impossible to cover each and every route that could be taken in one guide, but we have compiled a wealth of information about some of the most important aspects of sample preparation for NGS.
Sample preparation processes differ depending on the type of sequencing being performed because each technology has unique considerations. The abundance of new applications for sequencing data is constantly growing, in turn driving the need for more diverse sample preparation protocols.
Here are some common types of sequencing:
What are ChIP-seq and bisulfite sequencing?
For more information about sequencing technologies, download the Sequencing Buyers Guide. This comprehensive report explores the latest advances in sequencing, sample prep and everything you need to know about NGS.
Sample preparation is essentially the steps that need to be taken to transform mixtures of nucleic acids from biological samples into different types of libraries ready to be sequenced by NGS technologies. If the protocols are not followed correctly, the success of sequencing will be compromised. Each step of the preparation is fundamental and has different considerations depending on the type of sample and NGS platform. Therefore, it is important to consider how the most efficient protocols can be carried out before starting the experiment to ensure the highest quality results.
The general steps for sample preparation are as follows:
Challenge 1: Many samples are extracted from a limited number of cells or even a single cell. These don't provide enough genetic material alone and so need to be amplified by PCR. However, this amplification step is prone to introducing bias to the sample. PCR duplication is when there are multiple copies of exactly the same DNA fragment. Too many PCR duplicates can lead to uneven sequencing coverage of the experiment.
Solution 1: It is practically impossible to eliminate all sources of bias, but it is important to know where the bias occurs and take all practical steps to minimize it. A high PCR duplication rate indicates that the library preparation needs some modification; it is probably necessary to improve the complexity of the NGS library. Many programs exist that can mark or remove PCR duplicates; the most commonly used are Picard MarkDuplicates and SAMtools. Also, specific PCR enzymes have been shown to minimize amplification bias. Ultimately, the goal in library preparation is to maximize the complexity of the sample and minimize the bias due to amplification.
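To make the idea of a PCR duplication rate concrete, here is a minimal Python sketch. It assumes that reads have already been aligned and that two reads count as duplicates when they share the same chromosome, start position and strand. This is a simplification for illustration only, not the algorithm used by the tools named above, and the function name and example reads are hypothetical.

```python
from collections import Counter


def duplication_rate(reads):
    """Return the fraction of reads that repeat an alignment already seen.

    Each read is a (chromosome, start, strand) tuple. Assumption: identical
    coordinates are treated as PCR duplicates, which is a simplification.
    """
    if not reads:
        return 0.0
    counts = Counter(reads)
    # For every group of identical alignments, all but one copy are duplicates.
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)


# Hypothetical aligned reads: three copies at chr1:100 and two at chr2:40.
example_reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 250, "-"),
    ("chr2", 40, "+"),
    ("chr2", 40, "+"),
]
rate = duplication_rate(example_reads)
```

A rate that stays high across repeated preparations of the same sample type would point towards the library preparation itself, as described above.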
Challenge 2: Inefficient library construction is a problem faced during sample preparation. It is reflected by a low percentage of fragments with the correct adapters. The consequences are a decreased amount of sequencing data being obtained and an increased number of chimeric fragments. Chimeric reads are derived from portions of the genome that are not next to each other and are a source of errors during sequencing.
Solution 2: Efficient A-tailing of PCR products has been reported to prevent chimera formation; the procedure is universal and can be applied to a number of different library construction techniques. Additionally, strand-split artifact reads (SSARs) have been suggested to reduce the number of chimeric artifacts in a sample, and chimera detection programs can be used to filter the raw sequences to achieve an overall chimera rate of just 1%.
Challenge 3: Sample contamination is an inherent problem because separate libraries are usually prepared in parallel. The most probable primary source of contamination is pre-amplification, which is a method that increases the amount of nucleotide sequences before PCR.
Solution 3: Contamination risk can be reduced by lowering human contact with the samples. Also, one room or area should be dedicated to pre-PCR testing. This room could further be divided into two areas: one for PCR mixture preparation and another for the addition of the extracted nucleic acids into the mixture.
Challenge 4: The large costs of library preparation are mostly due to lab equipment, the need for trained personnel and reagent costs.
Solution 4: The introduction of tagmentation reactions, which fragment the DNA and attach adapters in a single reaction, has significantly reduced costs. The price per sample will decrease as less hands-on time is required. As automation becomes increasingly popular, the accuracy and efficiency of sample preparation are likely to increase, although the initial cost of the instruments and their maintenance may still be high.
The very first step of sample preparation is the isolation of nucleic acids. This involves a series of steps to obtain pure DNA or RNA. As this is the very starting point for a number of downstream applications, the high quality of nucleic acids is crucial for the success of sequencing later on.
The first question to ask is: what source are the nucleic acids being extracted from?
The best sample type to isolate nucleic acids from is probably a homogenous population of cells from an in vitro culture (a group of uniform cells obtained from studies conducted outside of the organism). For example, white blood cells isolated from a blood sample would be relatively homogenous. However, some clinical samples are not so homogenous and offer only very limited amounts of nucleic acids to work with; a fine needle biopsy of a small tumor, for example, would most likely prove difficult to isolate from.
The quality of extracted nucleic acids depends on the quality of the starting sample. Fresh starting material is always recommended, but this is often not possible. So, samples should be stored appropriately, which usually involves freezing or cooling at specific temperatures.
The next question is what are the nucleic acids going to be used for? This comes with a range of further considerations. In particular, these depend on the type of sequencing machine.
Table of nucleic acid extraction techniques: the main characteristics of chemical and mechanical methods are described. Image credit: Harrison,
What are the different types of nucleic acid extraction?
The choice of isolation method depends on the aim of the study, the type of analysis and the type of nucleic acids. It is also important to consider the sample type. For example, if you are looking at mucus samples, such as nasal discharges or sputum, the viscosity of the material would need to be decreased. This could be done using a mucus-dissolving agent, like the mucolytic acetylcysteine. The spleen and liver are transcriptionally active organs and so have a very high RNA content. If the samples were intended for DNA analysis, they would have to be treated with ribonuclease (RNase) before purification to break the RNA down.
There are a number of other points that need considering when choosing a DNA extraction method:
The preparation of a sequencing library is necessary before NGS analysis. A sequencing library is essentially a pool of DNA fragments with adapters attached. Numerous kits for making sequencing libraries are available commercially from a variety of vendors. Competition has steadily driven prices down and quality up.
It is important to try to obtain the highest possible complexity in an NGS library because this will reduce the amount of bias. Library complexity refers to the number of unique DNA fragments present; in other words, the library should reflect the starting material as much as possible. Reductions in complexity usually result from PCR amplification during library preparation, which elevates the number of duplicate reads. Also, shorter fragments are typically less specific in terms of alignment and so further decrease the complexity of a sample. As NGS technology steadily evolves, sample requirements will become less strict and starting materials will require less amplification, thus improving library complexity.
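One way to put a number on library complexity is to estimate how many unique molecules a library contains from the total and unique read counts. The Python sketch below assumes a Lander-Waterman style relation, C = X * (1 - exp(-N / X)), where N is the number of reads sequenced, C is the number of unique reads observed and X is the unknown number of unique molecules. That model treats sequencing as random sampling with replacement; the model, the function name and the example counts are assumptions made for illustration and are not taken from this guide.

```python
import math


def estimate_library_size(total_reads, unique_reads, tol=1e-6):
    """Estimate the number of unique molecules X in a library.

    Solves C = X * (1 - exp(-N / X)) for X by bisection, where N is
    total_reads and C is unique_reads. Assumes random sampling with
    replacement, which is only an approximation of real sequencing.
    """
    if not 0 < unique_reads < total_reads:
        raise ValueError("need 0 < unique_reads < total_reads")

    def expected_unique_minus_observed(x):
        return x * (1.0 - math.exp(-total_reads / x)) - unique_reads

    # The library cannot be smaller than the unique reads already observed.
    lo = float(unique_reads)
    hi = lo * 2.0
    # Grow the upper bound until the model predicts at least C unique reads.
    while expected_unique_minus_observed(hi) < 0:
        hi *= 2.0
    # Bisection: the expected unique count increases monotonically with X.
    while hi - lo > tol * lo:
        mid = (lo + hi) / 2.0
        if expected_unique_minus_observed(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical run: 1,000,000 reads, of which 800,000 are unique.
estimated_molecules = estimate_library_size(1_000_000, 800_000)
```

Under this model, a library with many duplicates for its sequencing depth yields a small estimate, which is the numerical counterpart of "low complexity" in the paragraph above.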
Tagmentation is an alternative method for library preparation. It uses an engineered transposase enzyme to fragment the DNA and add specific adapters to both ends of the fragments, all at the same time. Therefore, it improves on traditional preparation procedures, as it combines DNA fragmentation, end-repair and adapter ligation all into a single step. However, this method is much more sensitive to the amount of DNA input compared to other fragmentation methods.
Unsurprisingly, as the number of sequencing platforms has rapidly expanded, so too has the number of library preparation kits available. Many third-party kits are now being sold alongside core library preparation kits from sequencing instrument providers. The diverse range of products available ensures that researchers have a wealth of library preparation options, regardless of their sequencing requirements.
In a world where high-throughput sequencing is becoming increasingly common, the creation of NGS libraries with fewer steps, fewer reagents and simple instructions makes the sequencing process more manageable, especially when multiple samples are being processed in parallel. One prominent source of complexity is magnetic bead-based DNA purification, a labour-intensive step present in some kits that can result in errors in library preparation [1]. Therefore, selecting a kit with minimal pipetting steps or one with processes that can be automated (see below) may result in fewer errors and higher reproducibility.
One of the most prominent factors of complexity is time: researchers often want an efficient workflow with minimal hands-on time. In response to this demand, newer NGS library preparation products with streamlined workflows have been launched that can create functional libraries in just a couple of hours.
Preparing hundreds or thousands of samples at once is now a reality for some research labs. Luckily, many vendors, including Illumina, New England Biolabs and Qiagen, now offer automation solutions that aim to efficiently process samples and produce NGS libraries without human intervention. Such systems typically include liquid handling instruments capable of autonomously running the full library preparation protocol. In addition to reducing hands-on time for researchers, automation also decreases contamination errors, improves sample quality and allows laboratories to scale up their sequencing preparations as needed [2].
For some samples, the amount and type of DNA required for library preparation need to be considered. Many kits are now compatible with low input amounts (often as little as 1 ng or less). Although this small amount may affect the quality or coverage of downstream sequencing data, this low input requirement is helpful when dealing with uncommon or low abundance samples. Additionally, there are several kits on the market that are specialised for dealing with damaged, low-quality DNA samples, such as the xGen ssDNA & Low-Input DNA Library Preparation Kit from Integrated DNA Technologies (IDT). Specialised products like these allow researchers to rescue valuable sequencing data from rare sources, including ancient DNA.
Library amplification via PCR is a required step for many library preparation kits. Although PCR allows researchers to sequence samples with low DNA content, it may introduce GC bias, amplification bias and duplicates that may hinder downstream genome assembly or data analysis. To counteract this problem, many vendors have created PCR-free kits that offer reduced assay times and increased coverage across genomic regions that are traditionally challenging to sequence. One well-established example is Illumina's TruSeq DNA PCR-Free kit, which shows impressive coverage improvement for G-rich, high GC and promoter regions when compared to data from their TruSeq DNA kit featuring PCR amplification [3].
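To make the notion of GC bias concrete, the short Python sketch below computes the GC fraction of a fragment, which is the quantity that coverage is typically compared against when checking for this kind of bias. The function name and the example sequences are illustrative assumptions and are not part of any kit's software.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases among the A/C/G/T bases.

    Ambiguous bases such as N are ignored. Returns 0.0 when the sequence
    contains no A, C, G or T at all.
    """
    sequence = sequence.upper()
    called = sum(sequence.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / called


# Hypothetical fragments spanning low, medium and high GC content.
fragments = {
    "at_rich": "ATATATTAAT",
    "balanced": "ATGCATGC",
    "gc_rich": "GGCGCCGCGG",
}
gc_by_fragment = {name: gc_fraction(seq) for name, seq in fragments.items()}
```

If fragments at either extreme are consistently under-represented after amplification, that pattern is what is usually described as GC bias.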
Sample multiplexing (occasionally referred to as indexing) is the process of tagging each DNA fragment to identify which sample that DNA fragment originated from. By doing this, researchers can pool the libraries for multiple samples together and sequence several samples in parallel. After sequencing, the unique tag (often referred to as a barcode or index) can then be used to group the data into their respective samples before analysis takes place. Therefore, multiplexing is an attractive way to conduct high-throughput sequencing and save both time and money.
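The grouping step described above, often called demultiplexing, can be sketched in a few lines of Python. This is a minimal illustration that assumes each read arrives with its barcode already extracted and that one mismatch in the barcode is tolerated; the barcodes, sample names and the mismatch tolerance are hypothetical choices, not values from any particular platform.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_barcodes, max_mismatches=1):
    """Group read sequences by sample using their barcode.

    reads: iterable of (barcode, sequence) pairs.
    sample_barcodes: dict mapping barcode -> sample name.
    A read is assigned only if exactly one sample barcode lies within
    max_mismatches; otherwise it goes to the "undetermined" bin.
    """
    bins = {sample: [] for sample in sample_barcodes.values()}
    bins["undetermined"] = []
    for barcode, sequence in reads:
        matches = [
            sample
            for known, sample in sample_barcodes.items()
            if len(known) == len(barcode)
            and hamming_distance(known, barcode) <= max_mismatches
        ]
        if len(matches) == 1:
            bins[matches[0]].append(sequence)
        else:
            bins["undetermined"].append(sequence)
    return bins


# Hypothetical sample sheet and pooled reads.
sample_barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
pooled_reads = [
    ("ACGT", "AAAA"),  # exact match to sample_A
    ("ACGA", "CCCC"),  # one mismatch from sample_A
    ("TGCA", "GGGG"),  # exact match to sample_B
    ("GGGG", "TTTT"),  # too far from both barcodes
]
reads_by_sample = demultiplex(pooled_reads, sample_barcodes)
```

Sending ambiguous barcodes to a separate bin, rather than guessing, keeps sequencing errors in the barcode from assigning reads to the wrong sample.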
Ultimately, the best library preparation kit is one that produces the highest quality data for the lowest price. Hidden reagent costs, expenses associated with analysis, researcher time and kit usage (e.g., wasting resources due to reagent expiration) can all factor into the total cost per reaction, in addition to the initial expense of the library preparation kit itself. If the experiment requires a small number of samples, or if sequencing is not regularly required, labs may instead opt to outsource sequencing to NGS service providers.
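The cost factors listed above can be combined into a rough per-sample figure. The Python sketch below is one possible way to do that arithmetic; the function, its parameters and every number in the example are hypothetical and only illustrate how unused reactions and hands-on time change the effective price.

```python
def cost_per_sample(kit_price, reactions_used, extra_reagents_per_sample=0.0,
                    hands_on_hours_per_sample=0.0, hourly_rate=0.0):
    """Estimate the total cost of preparing one library.

    The kit price is spread over the reactions actually used, so reactions
    lost to reagent expiration raise the cost of the remaining ones.
    """
    if reactions_used <= 0:
        raise ValueError("reactions_used must be positive")
    kit_share = kit_price / reactions_used
    labour = hands_on_hours_per_sample * hourly_rate
    return kit_share + extra_reagents_per_sample + labour


# Hypothetical 96-reaction kit costing 2400, of which only 48 reactions
# are used before the reagents expire.
example_cost = cost_per_sample(
    kit_price=2400.0,
    reactions_used=48,
    extra_reagents_per_sample=5.0,
    hands_on_hours_per_sample=0.5,
    hourly_rate=40.0,
)
```

In this made-up case the list price suggests 25 per reaction, but the effective cost is three times that once expiry, extra reagents and hands-on time are included.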
In our recent Methods And Tech review, we list the newest and most popular products from the leaders within the NGS library preparation space.
RNA sequencing allows you to look at alternatively spliced gene transcripts, post-transcriptional modifications, gene fusions, mutations and changes in gene expression over time. The cost of RNA sequencing is continually falling, enabling varied investigations of molecular biology in a more precise and thorough manner. It has been used clinically to help determine the optimal treatment strategies based on molecular alterations detected in cancers. It is thought that the future use of RNA sequencing alongside DNA sequencing would be a powerful tool to help cancer patients, although this has not yet been implemented.
How to do RNA sequencing
When setting up an RNA sequencing experiment, one of the first considerations should be what is being investigated about transcription. Is gene expression being explored? Are the transcripts being characterized of high or low abundance? Is the aim to find out which strand each transcript was derived from? Answering these questions should help to determine what type of RNA sequencing libraries should be generated.
For example, if the objective of an RNA sequencing experiment is the discovery of complex transcriptional events, then the library should capture the entire transcriptome: coding, non-coding, antisense and intergenic RNAs. But if the aim is to study only the coding messenger RNA (mRNA) transcripts, the processes for library preparation will differ.
RNA library preparation tends to be specific to the sequencing platform. In all RNA sequencing experiments, RNA is isolated and converted into cDNA. This is so that the information can be input into an NGS platform. Also, DNA is more stable than RNA and it allows for amplification using DNA polymerases. Once the cDNA library has been constructed, the molecules are fragmented and amplified where appropriate. Adapters are then added to each end of the fragments. Next, a selection strategy may be used to enrich the library for the type of RNA of interest.
rRNA is the most abundant component of total RNA isolated from human cells and tissues; it comprises up to 90% of an RNA sample. It must be removed from total RNA before sequencing to allow efficient gene detection. There are two main approaches: the selection of polyadenylated (polyA) RNA using oligo(dT) primers, and the depletion of rRNAs through hybridization capture followed by magnetic bead separation. PolyA selection is used for most transcriptome studies because it only requires a low sequencing depth. Targeted depletion of rRNA is particularly useful when studying transcripts that lack a polyA tail, such as non-coding RNAs or partially degraded transcripts.
The accuracy of the detection of particular RNA species is largely dependent on the nature of library construction. Each stage of the RNA sequencing library preparation can be manipulated to enhance the detection of certain transcripts, whilst limiting the ability to detect other transcripts. For example, one protocol modification to consider is the timing of RNA fragmentation: if it is done before cDNA synthesis, it reduces strand-specific bias and provides a more accurate estimate of transcript abundance. Other possible improvements include the use of unique molecular identifiers (UMIs) to detect PCR duplicates and enhancing the analysis of degraded RNA, such as that obtained from formalin-fixed paraffin-embedded (FFPE) blocks.
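The role of UMIs in detecting PCR duplicates can be illustrated with a small Python sketch. It assumes each read carries its alignment coordinates and the UMI that was attached before amplification, and it treats two reads as copies of the same molecule only when both match exactly. The field names and example reads are hypothetical, and real tools also tolerate sequencing errors in the UMI, which this sketch does not.

```python
def deduplicate(reads, use_umi=True):
    """Keep the first read seen for each original molecule.

    reads: list of dicts with "chrom", "pos", "strand" and "umi" keys.
    With use_umi=True, reads at the same position with different UMIs are
    kept as distinct molecules; with use_umi=False only the coordinates
    are compared.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"])
        if use_umi:
            key += (read["umi"],)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Hypothetical reads: the first two are PCR copies of one molecule, the
# third is a different molecule that happens to start at the same position.
example_reads = [
    {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "AAC"},
    {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "AAC"},
    {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "GTT"},
    {"chrom": "chr1", "pos": 500, "strand": "-", "umi": "AAC"},
]
molecules_with_umi = len(deduplicate(example_reads, use_umi=True))
molecules_without_umi = len(deduplicate(example_reads, use_umi=False))
```

Comparing the two counts shows why UMIs matter for abundance estimates: without them, genuinely distinct transcripts that share a start position are collapsed and the transcript is undercounted.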
RNA sequencing of large numbers of cells does not allow for detailed assessment of a single cell, or the individual nuclei that package the genome. This is a relatively new field as the first single cell RNA sequencing study was published in . Since then, there has been a growing interest in conducting similar studies. Now, there are a number of vendors that produce kits for single cell RNA sequencing, such as Illumina, ThermoFisher, Cellecta and New England Biolabs.
Single cell RNA sequencing is being used more frequently because there are multiple copies of most transcripts in all cells, and the cost of carrying out single cell RNA sequencing is much less than whole genome sequencing. In addition, sequencing single cells provides better insights into the state of the cell at any given moment, allowing for analysis of gene expression. Assessments of transcriptional differences between individual cells have been used to identify rare cell populations. These may have remained undetected in pooled analysis, but now malignant tumor cells can be detected within a tumor mass, and single cells can be examined where each one is unique, like T-lymphocytes that express highly diverse T-cell receptors.
The single cell RNA library preparation procedure consists of isolating the single cells and disrupting these cells to allow for the capture of as many RNA molecules as possible. Primers are often used to enhance the capture of a specific RNA species, which are then converted into cDNA by a reverse transcriptase. The extremely small amount of cDNA needs to be amplified by PCR, which may introduce bias.
Currently, most of the costs associated with RNA sequencing are linked to cDNA preparation, but these are likely to follow NGS prices and decrease as RNA sequencing becomes more popular. The reduced costs will likely drive the trend of examining a larger number of individual cells in each study.
To find out more about the latest single-cell technologies, check out our Single-Cell and Spatial Buyers Guide, which you can download for free here.
Targeted sequencing is a broad category that includes any technique that is focused on specific genes: everything from whole exome sequencing to small gene panels. Targeted gene sequencing produces a smaller and more manageable dataset, making analysis easier. Key genes of interest can be sequenced to a high depth, enabling the identification of rare variants. This usually provides cost-effective findings for studies of disease-related genes.
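The link between a smaller target and a higher depth is simple arithmetic: mean depth is roughly the number of sequenced bases that land on the target divided by the target size. The Python sketch below works through that calculation; the read count, read length and target sizes are round, hypothetical numbers, and the default assumption that every read falls on target is optimistic.

```python
def mean_depth(read_count, read_length, target_size_bp, on_target_fraction=1.0):
    """Approximate mean sequencing depth over a target region.

    depth = reads * read length * on-target fraction / target size
    """
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    return read_count * read_length * on_target_fraction / target_size_bp


# The same hypothetical run (10 million reads of 150 bp) spread over
# targets of very different sizes. All sizes are illustrative round numbers.
run = {"read_count": 10_000_000, "read_length": 150}
depth_whole_genome = mean_depth(target_size_bp=3_000_000_000, **run)
depth_exome = mean_depth(target_size_bp=30_000_000, **run)
depth_gene_panel = mean_depth(target_size_bp=1_000_000, **run)
```

With identical sequencing effort, the small panel in this example is covered thousands of times more deeply than the whole genome, which is what makes rare variants detectable.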
Targeted gene panels have become popular in mainstream clinical care due to their relative affordability and focused application. They have been developed for studying many aspects of cancer, such as monitoring somatic changes and exploring the landscape of genetic aberrations to identify novel therapies or repurpose existing ones. These days, targeted gene panels are also being produced for liquid biopsies. These are non-invasive tests that can reflect all individual tumor mutations in real-time and hold promise for monitoring cancer initiation or relapse.
For further reading, the Liquid Biopsy report provides additional information about transformative technologies in the field.
Generating libraries for targeted DNA analysis requires an extra step: target enrichment. It can be achieved through a variety of techniques, and the choice depends on several factors such as cost, ease of use and reproducibility.
Some key methods of enrichment are:
Hybridization capture-based target enrichment: DNA is fragmented and prepared for sequencing as normal, by adding adapters and barcodes. The DNA is then hybridized to biotinylated single-stranded probes, which can be recovered using streptavidin magnetic beads. This enrichment method has many advantages, including its scalability, the retention of start-stop codons and the ability to detect duplicates. However, it does require multiple amplification steps before sequencing, creating a lengthy and complex workflow and incurring high costs.
Amplicon-based target enrichment: Primers are used to amplify specific fragments of interest, enabling the simultaneous targeting of several regions whilst only needing a limited amount of DNA input. It is possible to generate multiple products using multiplex PCR. PCR-based enrichment methods may not be ideal for targeting very large genomic regions because of the cost of primers and reagents, on top of the larger DNA input that such regions require.
What is multiplex PCR?
Currently, the target enrichment methods that are typically used can be complex and lengthy; workflows can sometimes take multiple days to complete. Novel target enrichment methods are continually being developed to increase the efficiency of library preparation for targeted sequencing.
Molecular inversion probes (MIP): These allow adjustable targeting of specific regions of the genome using a pair of single stranded DNA probes that contain sequences complementary to the target, joined by a loop linker. This loop ensures that the probes bind close to one another, reducing the chance that they will bind off-target. PCR is then used to close the circle by copying the target region. There are only four steps to the MIP process, making the workflow much simpler than other target enrichment methods. It is also easy to automate and is readily scalable.
Diagram of an MIP: the single strand of DNA is joined by a loop linker (orange) in between the probes (light blue). Image credit: T. Au Yeung,
Linked Target Capture (LTC) has also been designed to reduce the hybridization process to less than eight hours by replacing the lengthy existing methods with a combined target-capture-PCR workflow.
Here are some further reading resources relating to sample preparation for NGS:
Every Cell Matters
The Next Decade of Human Genomics
Advances in NGS technologies that will drive greater sequence output and higher sequence accuracy are inevitable. With the rise of single-cell sequencing in particular, never have new technologies been developed at such a rapid rate. Plus, with new options on the market for automation, the sample prep process is becoming increasingly effective. The opportunities that this will provide for biomedical research will be hugely exciting, but no matter what sequencing platform is used or what the applications are, sample preparation remains at the forefront of every successful experiment.
Image credit: Nucleus Biotech