The Basic Structure of DNA
We may hear a lot about deoxyribonucleic acid (DNA) and how scientists discover mutations and other information about the specific makeup of DNA; however, how this information is actually obtained may elude many. The process to determine the actual make-up of a strand of DNA is called DNA sequencing. To know how sequencing is done, it is important to know the basic structure or nature of DNA.
What Makes Up DNA
DNA is made up of bases (nitrogen-based molecules). There are four such bases that make up DNA: adenine, guanine, cytosine, and thymine.
These bases are represented by the letters A, G, C, and T, respectively. In DNA, each base is bound to a sugar molecule called deoxyribose. Another molecular group, phosphate, is attached to the sugar. The three connected molecules (the base, sugar, and phosphate) form what is called a nucleotide. A string or sequence of nucleotides forms a DNA strand.
DNA in living organisms is usually found as two connected strands that exist in a well-defined twisted configuration called a double helix. The two strands are held together by hydrogen bonds between bases. For example, guanine on one strand bonds with cytosine on the second strand, and adenine bonds with thymine. Each such bonded pair is referred to as a base pair.
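The pairing rule can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it assumes the standard pairing of A with T and G with C, with the two strands running in opposite directions.

```python
# Standard base-pairing rules: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base that pairs with each position of `strand`."""
    return "".join(PAIRS[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own direction (reversed)."""
    return complement(strand)[::-1]

print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```

Knowing one strand is therefore enough to work out the other, which is why sequencing a single strand reveals the whole double helix.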
How DNA Sequencing Works
DNA sequencing works by reading the order of the bases along a given DNA strand. In the classic methods, DNA fragments are separated by size so that this order can be read off.
Background of DNA Sequencing
DNA sequencing has a long, varied history and continues to evolve as technology advances. Several different sequencing methods are in use.
DNA sequencing is a biochemical method to determine the order of nucleotides in DNA. The technology of DNA sequencing began in the 1970s. The first methods developed were the Maxam-Gilbert and Sanger methods. For DNA to be sequenced, the strands of the DNA double helix must be separated, a process called denaturation that is usually carried out by applying a high temperature. In the Maxam-Gilbert method, the basic process involves radioactively tagging or labeling the DNA by adding a phosphate molecule containing radioactive phosphorus. The DNA strand is then chemically modified at certain locations and cleaved at the sites where it was modified.
The result is a set of DNA fragments whose lengths correspond to the cleavage locations. Each reaction cleaves at 1 or 2 of the 4 possible nucleotides. Let's say four reactions are employed and there is 1 tube per reaction. The contents of each tube are separated using gel electrophoresis, with each tube's contents in its own lane on the gel. The DNA fragments in each lane are separated by size. Because the reaction used in each lane determines where cleavage could occur, it is known which nucleotide (or pair of nucleotides) a band in that lane represents.
A sheet of radiographic (X-ray) film is exposed to the gel so that the radioactive bands can be seen. Starting at the bottom of the film, the first band(s) is located, and the nucleotide is identified based on the lane it is in (that is, the reaction that lane represents). Let's say the band sits in the lane of a reaction that cleaves at guanine; the nucleotide at that position is then guanine. Then the next band(s) above that is found and identified, then the next, and so on. It is the presence or absence of a band in each lane that makes this identification possible. Upon reaching the top of the film and the last band(s), the whole sequence has been ascertained.
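The bottom-to-top reading of the film can be sketched as a small Python simulation. It assumes the four classic Maxam-Gilbert reactions (G, A+G, C, and C+T), which the text above does not name, and it simplifies the chemistry so that a band at position i simply stands for cleavage at base i. The function names are illustrative only.

```python
# Assumed reactions: each lane cleaves at one or two of the four bases.
REACTIONS = {"G": "G", "A+G": "AG", "C": "C", "C+T": "CT"}

def run_gel(strand):
    """Simulate the gel: for each lane, the set of positions showing a band."""
    return {
        lane: {i for i, base in enumerate(strand, start=1) if base in targets}
        for lane, targets in REACTIONS.items()
    }

def read_gel(lanes):
    """Read the bands from the bottom (shortest fragment) to the top."""
    top = max(pos for bands in lanes.values() for pos in bands)
    sequence = []
    for pos in range(1, top + 1):
        if pos in lanes["G"]:        # band in G (and also in A+G)
            sequence.append("G")
        elif pos in lanes["A+G"]:    # band in A+G only
            sequence.append("A")
        elif pos in lanes["C"]:      # band in C (and also in C+T)
            sequence.append("C")
        elif pos in lanes["C+T"]:    # band in C+T only
            sequence.append("T")
        else:
            sequence.append("N")     # no band anywhere: base unknown
    return "".join(sequence)

lanes = run_gel("GATCCGTA")
print(read_gel(lanes))  # GATCCGTA
```

The sketch shows why the absence of a band matters as much as its presence: an A is recognized only because the A+G lane has a band where the G lane does not.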
The Sanger sequencing method (ultimately favored over the Maxam-Gilbert method) was developed at about the same time as the Maxam-Gilbert method. The basis of this method is called the chain termination technique. With this method, elongation of DNA is terminated by a process using a special nucleotide called a dideoxynucleotide. Four reaction tubes are also employed here, and each one represents 1 of the 4 dideoxynucleotides (the regular nucleotides previously mentioned are deoxynucleotides). These nucleotides, unlike the regular nucleotides, lack the 3’-OH group that is necessary for bond formation between two nucleotides. The reaction mix here contains radiolabeled primers (short single-stranded pieces of DNA that start off the elongation reaction).
When a dideoxynucleotide is incorporated, the elongation reaction stops there. The result is a collection of DNA fragments of various lengths, each ending at the point where elongation was terminated. Each reaction is also run on a gel, which is exposed to X-ray film. Each of the 4 lanes represents only 1 nucleotide, making base calling easier. As described, the bands are identified from the bottom to the top of the film. This method was also considered more efficient than the Maxam-Gilbert method, and longer sequences can be read (up to roughly 1,000 bases).
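Chain termination and base calling can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it works directly with the newly synthesized strand, assumes every possible termination point yields a fragment, and uses hypothetical function names.

```python
def sanger_lanes(new_strand):
    """One lane per dideoxynucleotide: lengths of the fragments it terminates."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A fragment of this length ends with the dideoxy version of `base`.
        lanes[base].append(length)
    return lanes

def call_bases(lanes):
    """Read the gel bottom to top: shortest fragment first, one base per band."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

lanes = sanger_lanes("ATGCCGTA")
print(lanes["C"])         # [4, 5]
print(call_bases(lanes))  # ATGCCGTA
```

Because each fragment length appears in exactly one lane, every band maps to a single base, which is what makes base calling simpler here than in the Maxam-Gilbert method.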
Massively Parallel Sequencing/Next-Generation Sequencing
Massively parallel sequencing, also known as next-generation or second-generation sequencing, differs from Sanger sequencing in the speed with which sequence information can be obtained. It takes only weeks to produce the amount of sequence information that would take years with Sanger sequencing. Next-generation sequencing (NGS) also requires less sample DNA and is much more cost-effective [1].
Massively parallel sequencing refers to the simultaneous sequencing of millions of small fragments. This results in an enormous or massive amount of data to process. Millions to a billion bases of DNA sequence can result. It is estimated that approximately 250 gigabases per week can be sequenced using NGS technology [2].
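To put the weekly figure in perspective, a rough back-of-the-envelope calculation can be written in Python. The genome size (about 3.1 gigabases for a human genome) and the 30-fold coverage are assumptions chosen for illustration and do not come from the estimate above.

```python
def genomes_per_week(gigabases_per_week, genome_gigabases, coverage):
    """How many genomes can be sequenced per week at a given coverage."""
    # Each genome needs (genome size x coverage) gigabases of raw sequence.
    return gigabases_per_week / (genome_gigabases * coverage)

# Assumed values: 3.1 Gb human genome, each base read 30 times on average.
print(round(genomes_per_week(250, 3.1, 30), 2))  # 2.69
```

Under these assumptions, 250 gigabases per week corresponds to a little under three human genomes per week.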
The preparation of DNA sequencing libraries is an important part of NGS. Although automated sequencing machines were developed to perform Sanger sequencing [3], newer sequencers with NGS-based technology have since been developed [4, 5]. In one representative method, known as pyrosequencing, the addition of a nucleotide during the DNA extension reaction releases a pyrophosphate molecule. This molecule is converted to adenosine triphosphate (the familiar ATP molecule). Using the ATP, luciferin is converted to oxyluciferin, generating an amount of light that is proportional to the amount of ATP present. It is this light that the sequencer's camera detects and that is then analyzed. These reactions are carried out on millions of DNA strands at once, hence the term massively parallel sequencing.
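The light-based readout can be mimicked with a small Python sketch. It assumes that nucleotides are supplied one type at a time in a repeating order (the order "TACG" used here is only an example) and that the light signal equals the number of bases added in that step. The function names are illustrative.

```python
FLOW_ORDER = "TACG"  # assumed dispensing order, repeated cyclically

def flowgram(new_strand, flow_order=FLOW_ORDER):
    """Return (nucleotide, signal) per flow; signal = bases added in that flow."""
    if any(base not in flow_order for base in new_strand):
        raise ValueError("strand contains a base that is never dispensed")
    signals = []
    pos = 0
    flow = 0
    while pos < len(new_strand):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        # Every consecutive matching base is added, releasing more light.
        while pos < len(new_strand) and new_strand[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
        flow += 1
    return signals

def decode(signals):
    """Turn light intensities back into a sequence of bases."""
    return "".join(nucleotide * round(signal) for nucleotide, signal in signals)

signals = flowgram("TTAGC")
print(signals)          # [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
print(decode(signals))  # TTAGC
```

A signal of 2 for T means two T bases were added in a row, which is how the proportionality between light and ATP translates into sequence.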
Massively parallel (next-generation) sequencing runs a very large number of reactions in parallel on a microscale, making it faster and more cost-effective. This process also works with shorter DNA fragments.
DNA Sequencing Machines
There are different machines capable of handling DNA sequencing.
Sanger Sequencing machines include:
- 3500xL Dx Genetic Analyzer
- 3500 Dx Genetic Analyzer
- Applied Biosystems 7900HT Fast Real-Time PCR System
Next-Generation Sequencing machines include:
- NovaSeq 6000
- PacBio Sequel Systems
It is clear that an immense amount of data can be generated by a single NGS run. When taking into account the numerous reactions being performed over a matter of just months, a mammoth amount of data results. The challenges are many, including data processing and interpretation, secure storing of the data, and translation to the applied science and medical arenas.
There are new bioinformatics programs available for NGS data analysis [6], and more are being developed. Other technological advances include sequencers such as Illumina's HiSeq X, which can produce nearly 2 terabases of data from a single 3-day run, and the NovaSeq 6000, which can read up to 3 terabases per flow cell. These advances will make it possible to diagnose genetics-based diseases more quickly and accurately and will help scientists learn more about the connection between DNA, the environment, and how we function.
1. Tucker T, Marra M, Friedman JM. Massively parallel sequencing: the next big thing in genetic medicine. American Journal of Human Genetics. 2009;85(2):142-54.
2. Lander ES. Initial impact of the sequencing of the human genome. Nature. 2011;470(7333):187-97.
3. Martin WJ. New technologies for large-genome sequencing. Genome. 1989;31(2):1073-80.
4. Kato K. Impact of the next generation DNA sequencers. International Journal of Clinical and Experimental Medicine. 2009;2(2):193-202.
5. Mukhopadhyay R. DNA sequencers: the next generation. Analytical Chemistry. 2009;81(5):1736-40.
6. Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, et al. A survey of tools for variant analysis of next-generation genome sequencing data. Briefings in Bioinformatics. 2014;15(2):256-78.