DNA Sequencing: Methods, Technology, and Scientific Applications
What DNA Sequencing Actually Is
DNA sequencing is the process of determining the exact order of nucleotides in a DNA molecule. Those nucleotides are adenine (A), guanine (G), cytosine (C), and thymine (T). Get the sequence wrong, and your entire analysis falls apart.
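As a minimal sketch of what "a sequence" looks like in software, the snippet below treats a read as a plain string and counts each base. The function name `base_composition` is made up for illustration, and rejecting every character other than A, C, G, and T is a simplifying assumption: real data often contains `N` for bases the instrument could not call.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def base_composition(seq: str) -> dict:
    """Count each nucleotide, rejecting anything other than A, C, G, T."""
    seq = seq.upper()
    unexpected = set(seq) - VALID_BASES
    if unexpected:
        # Assumption for this sketch: ambiguity codes such as N are errors.
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    counts = Counter(seq)
    return {base: counts[base] for base in "ACGT"}


print(base_composition("GATTACA"))  # {'A': 3, 'C': 1, 'G': 1, 'T': 2}
```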
Scientists have been sequencing DNA since the 1970s. The methods have changed dramatically since then. What once took years now takes hours. But the core goal remains the same: read the genetic code accurately.
The Main Sequencing Methods
Sanger Sequencing
The original method. Frederick Sanger developed it in the 1970s, and it remained the gold standard for decades.
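How it works: A DNA polymerase copies the template using normal nucleotides (dNTPs) mixed with a small fraction of modified, chain-terminating dideoxynucleotides (ddNTPs). When a ddNTP gets incorporated, the DNA chain stops. The result is a set of fragments that end at every position in the template. Separate them by size on a gel or capillary, and you can read the sequence from the terminating bases.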
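The chain-termination idea can be sketched in a few lines. This is an idealized toy, not a model of real chemistry: it assumes exactly one terminated fragment per template position, and the function names are invented for the example. The template is written 3' to 5' so that the newly synthesized strand reads 5' to 3'.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_fragments(template: str) -> list[tuple[int, str]]:
    """Return (fragment_length, terminating_base) for every stop position.

    Idealized: one terminated fragment per position of the template.
    """
    fragments = []
    for position, base in enumerate(template, start=1):
        # The ddNTP paired opposite `base` ends the new strand at this length.
        fragments.append((position, COMPLEMENT[base]))
    return fragments


def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Order fragments shortest-first, as electrophoresis does, and read the labels."""
    return "".join(label for _, label in sorted(fragments))


fragments = sanger_fragments("TACGGT")
random.shuffle(fragments)       # fragments start out mixed together in one tube
print(read_by_size(fragments))  # ATGCCA
```

Sorting by length is the whole trick: the identity of the last base at each length spells out the new strand one position at a time.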
Sanger is still used today for sequencing single genes or short fragments. It's accurate, but it's slow and expensive for large-scale work. If you need to sequence an entire genome with Sanger, you're looking at years of work and millions of dollars.
Next-Generation Sequencing (NGS)
NGS revolutionized the field when it emerged in the mid-2000s. The key difference: it sequences millions of DNA fragments simultaneously instead of one at a time.
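Common platforms include the following (PacBio and Nanopore are usually classed as third-generation, covered in the next section):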
- Illumina — uses fluorescently labeled nucleotides and imaging to read sequences
- Ion Torrent — detects hydrogen ions released during nucleotide incorporation
- Pacific Biosciences (PacBio) — uses single-molecule real-time (SMRT) sequencing with long reads
- Nanopore sequencing — DNA passes through a protein pore and changes the electrical signal
Each platform has trade-offs between read length, accuracy, speed, and cost.
Third-Generation Sequencing
This refers to methods that sequence individual DNA molecules in real time without amplification. PacBio and Oxford Nanopore are the main players here.
The biggest advantage is read length. Nanopore can produce reads over 1 million bases long. Compare that to Illumina, which typically produces reads of 100-300 bases. Long reads make it easier to assemble genomes and identify structural variants.
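The trade-off has historically been lower accuracy. Raw single-pass reads from third-generation methods have error rates around 5-15%, compared to less than 1% for Illumina. Algorithms exist to correct these errors, but they add processing time. PacBio HiFi reads, which build a consensus from repeated passes over the same molecule, are the exception and exceed 99.9% accuracy.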
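To see why combining several noisy observations of the same position helps, here is a rough back-of-the-envelope sketch. It assumes errors are independent between reads and that every wrong read agrees on the same wrong base, which is pessimistic. Real correction and consensus algorithms are considerably more sophisticated; `majority_error` is a hypothetical helper, not part of any tool.

```python
from math import comb


def majority_error(per_read_error: float, n_reads: int) -> float:
    """Probability that more than half of n_reads are wrong at one position.

    Use an odd n_reads so that a tie cannot occur.
    """
    first_majority = n_reads // 2 + 1
    return sum(
        comb(n_reads, k)
        * per_read_error**k
        * (1 - per_read_error) ** (n_reads - k)
        for k in range(first_majority, n_reads + 1)
    )


for n in (1, 5, 11):
    print(f"{n:>2} reads at 10% error -> consensus error {majority_error(0.10, n):.6f}")
```

Under these assumptions, five overlapping observations at 10% error already push the per-position consensus error below 1%.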
Comparing Sequencing Technologies
| Method | Read Length | Accuracy | Speed | Cost per Gb |
|---|---|---|---|---|
| Illumina | 50-300 bp | >99.9% | 1-3 days | $5-15 |
| Ion Torrent | 200-600 bp | 98-99% | 2-7 hours | $10-20 |
| PacBio HiFi | 10-25 kb | >99.9% | 8-30 hours | $50-150 |
| Nanopore | Up to 1+ Mb | 85-95% | Hours to days | $10-50 |
| Sanger | Up to 1 kb | >99.99% | Hours | $500-1000 per Mb |
Choose based on your project needs. Long-read assembly? Go PacBio or Nanopore. High-accuracy short variants? Illumina. Single gene? Sanger.
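One way to make that choice explicit is to filter the table by your hard requirements. The sketch below transcribes the upper read length and lower accuracy bound from each row. The dictionary layout and the `shortlist` function are assumptions made for illustration, and the numbers are the rough figures from the table, not vendor specifications.

```python
# Rough figures transcribed from the comparison table above.
PLATFORMS = {
    "Illumina":    {"max_read_bp": 300,       "min_accuracy_pct": 99.9},
    "Ion Torrent": {"max_read_bp": 600,       "min_accuracy_pct": 98.0},
    "PacBio HiFi": {"max_read_bp": 25_000,    "min_accuracy_pct": 99.9},
    "Nanopore":    {"max_read_bp": 1_000_000, "min_accuracy_pct": 85.0},
    "Sanger":      {"max_read_bp": 1_000,     "min_accuracy_pct": 99.99},
}


def shortlist(min_read_bp: int = 0, min_accuracy_pct: float = 0.0) -> list[str]:
    """Return the platforms that meet both minimum requirements."""
    return [
        name
        for name, spec in PLATFORMS.items()
        if spec["max_read_bp"] >= min_read_bp
        and spec["min_accuracy_pct"] >= min_accuracy_pct
    ]


print(shortlist(min_read_bp=10_000))     # ['PacBio HiFi', 'Nanopore']
print(shortlist(min_accuracy_pct=99.9))  # ['Illumina', 'PacBio HiFi', 'Sanger']
```

A filter like this only narrows the field. Speed, cost, and throughput still have to be weighed by hand.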
How DNA Sequencing Is Actually Used
Whole Genome Sequencing
You sequence the entire genome of an organism. Humans, bacteria, plants, whatever. The cost of a human genome has dropped from billions of dollars for the first one to around $1,000-$2,000 with Illumina or Nanopore.
Researchers use this for de novo genome assembly, identifying all variant types (SNPs, indels, structural variants), and population genomics studies.
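The variant types mentioned above differ mainly in size. A simplified sketch of telling them apart from the reference and alternate alleles is shown below. The 50 bp cutoff for structural variants is a common convention, treated here as an assumption, and `classify_variant` is a hypothetical helper, not a function from any variant caller.

```python
def classify_variant(ref: str, alt: str, sv_min_len: int = 50) -> str:
    """Classify a variant by comparing reference and alternate allele lengths."""
    size_change = abs(len(ref) - len(alt))
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"                 # single base substituted
    if size_change >= sv_min_len:
        return "structural variant"  # large insertion or deletion
    if size_change > 0:
        return "indel"               # small insertion or deletion
    return "other"                   # e.g. a multi-base substitution


print(classify_variant("A", "G"))             # SNP
print(classify_variant("AT", "A"))            # indel
print(classify_variant("A", "A" + "T" * 60))  # structural variant
```

This ignores structural variants that do not change length, such as inversions and translocations, which need their own representation.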
Whole Exome Sequencing
You sequence only the protein-coding regions, which make up about 1-2% of the genome. This costs less than whole genome sequencing and focuses on regions most likely to affect protein function.
Clinical labs use exome sequencing for diagnosing genetic diseases when gene panels come back negative.
Targeted Sequencing
You sequence specific genes or regions using hybrid capture or amplicon-based methods. This is the cheapest option for focused studies.
Oncology labs use targeted panels to identify mutations in cancer genes. Genetic testing companies use panels for carrier screening and hereditary disease testing.
RNA Sequencing (RNA-Seq)
You sequence the transcriptome — all the RNA molecules in a sample. This tells you which genes are being expressed and at what levels.
Researchers use RNA-Seq to compare healthy vs. diseased tissue, identify novel transcripts, and study gene expression changes in response to treatments.
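Raw read counts are not directly comparable between genes, because longer transcripts collect more reads. One widely used normalization is transcripts per million (TPM), sketched here as an example; the article itself does not prescribe a unit. The gene names, counts, and lengths are invented for the example.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts to transcripts per million (TPM)."""
    # Step 1: reads per kilobase of transcript, to correct for length.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    # Step 2: scale so that all values in the sample sum to one million.
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


counts = {"geneA": 100, "geneB": 100, "geneC": 50}        # made-up counts
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1000}   # made-up lengths
for gene, value in tpm(counts, lengths).items():
    print(gene, round(value))
```

Here geneA and geneB have the same raw count, but geneB is twice as long, so its TPM comes out at half of geneA's.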
Metagenomic Sequencing
You sequence DNA directly from environmental samples without culturing organisms. Soil, water, gut contents, whatever.
This is how researchers discovered most of the microbial diversity that can't be grown in labs. It's also used for pathogen detection in clinical samples.
Getting Started: Practical Steps
If you're setting up DNA sequencing in a lab, here's what you're actually dealing with:
Sample Preparation
- Extract high-quality DNA — degradation ruins everything downstream
- Check your DNA with a fluorometer or bioanalyzer, not just a spectrophotometer
- For Illumina: fragment DNA to the right size range with acoustic shearing or enzymatic digestion
- For long-read sequencing: minimize shearing during extraction
Library Preparation
This is where most of the cost and hands-on time goes. Steps typically include:
- End repair and A-tailing
- Adapter ligation
- Size selection (agarose gels, beads, or automated systems)
- PCR amplification (if needed)
- Quality control with qPCR or fragment analysis
Commercial kits make this more reproducible but they're expensive. Budget $200-500 per library depending on the method.
Sequencing
For Illumina:
- Load the flow cell — proper loading density matters
- Run the sequencer — 1-3 days depending on read length and depth needed
- Monitor metrics during the run (cluster density, Q30 scores)
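The Q30 metric is the fraction of bases with a Phred quality score of 30 or higher, which corresponds to an estimated error rate of 1 in 1,000 or better. As a minimal sketch, this is how it could be computed from one FASTQ quality line, assuming the common Phred+33 ASCII encoding. The function name and the example quality string are made up.

```python
def fraction_q30(quality_line: str, offset: int = 33, threshold: int = 30) -> float:
    """Return the fraction of bases whose Phred score is >= threshold."""
    # Each character encodes one score: Q = ASCII code minus the offset.
    scores = [ord(char) - offset for char in quality_line]
    return sum(q >= threshold for q in scores) / len(scores)


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred score to its estimated error probability."""
    return 10 ** (-q / 10)


# 'I' is Q40, '?' is Q30, '5' is Q20, '#' is Q2 under Phred+33.
print(fraction_q30("IIII??55##"))  # 0.6
print(phred_to_error_rate(30))     # 0.001
```

The sequencer's own software reports this for the whole run. A calculation like this is mainly useful for spot-checking individual files afterwards.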
For Nanopore:
- Load the library into the flow cell
- Start the run — sequencing continues until you stop it
- Longer runs = more data, but you can stop when you have enough coverage
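"Enough coverage" can be estimated as total sequenced bases divided by genome size. The sketch below walks through read lengths in the order they arrive and reports when a target depth is reached. The 5 Mb genome and uniform 20 kb reads are hypothetical numbers chosen to keep the arithmetic easy, and the estimate assumes every read maps to the target genome.

```python
def mean_coverage(read_lengths, genome_size_bp: int) -> float:
    """Average depth: total bases sequenced divided by genome size."""
    return sum(read_lengths) / genome_size_bp


def reads_until_target(read_lengths, genome_size_bp: int, target_depth: float):
    """Return how many reads it took to reach target_depth, or None if never."""
    total_bases = 0
    for n_reads, length in enumerate(read_lengths, start=1):
        total_bases += length
        if total_bases / genome_size_bp >= target_depth:
            return n_reads
    return None


reads = [20_000] * 10_000   # hypothetical run: 10,000 reads of 20 kb each
genome = 5_000_000          # hypothetical 5 Mb bacterial genome

print(reads_until_target(reads, genome, target_depth=30))  # 7500
print(mean_coverage(reads, genome))                        # 40.0
```

Mean depth hides uneven coverage, so treat the target as a floor, not a guarantee that every region is covered.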
Data Analysis
This is where most people underestimate the work. You'll need:
- Quality control (FastQC is the standard tool)
- Trimming adapters and low-quality bases (Trimmomatic, cutadapt)
- Alignment to a reference genome (BWA-MEM2, minimap2)
- Variant calling (GATK, FreeBayes, DeepVariant)
- Annotation and interpretation
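To give a feel for what the trimming step does, here is a deliberately simplified sketch. It cuts a read at the first exact adapter match and then drops trailing bases below a quality threshold. This is not how Trimmomatic or cutadapt are implemented: real tools handle partial and mismatched adapters, sliding windows, and paired reads. The function name, the default threshold of Q20, and the example read are assumptions.

```python
def trim_read(seq: str, qual: str, adapter: str,
              min_q: int = 20, offset: int = 33) -> tuple[str, str]:
    """Remove an exact-match adapter and low-quality 3' bases from one read."""
    # 1. Adapter: cut at the first exact occurrence, if there is one.
    adapter_start = seq.find(adapter)
    if adapter_start != -1:
        seq, qual = seq[:adapter_start], qual[:adapter_start]

    # 2. Quality: walk back from the 3' end while scores are below min_q.
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


adapter = "AGATCGGAAGAGC"             # example adapter sequence
seq = "ACGTACGT" + adapter            # insert followed by adapter read-through
qual = "IIIIII##" + "I" * len(adapter)

print(trim_read(seq, qual, adapter))  # ('ACGTAC', 'IIIIII')
```

Sequence and quality strings have to be trimmed together, otherwise every downstream tool that reads the FASTQ will reject the record.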
Plan for significant compute resources. A human genome analysis pipeline can keep dozens of CPU cores busy for hours, and raw reads plus alignments for even a modest number of samples add up to terabytes of storage.
The Bottom Line
DNA sequencing technology has matured significantly. The methods work. The accuracy is sufficient for most applications. The cost is manageable.
What trips people up is:
- Poor sample quality
- Underestimating library prep complexity
- Skimping on coverage depth
- Ignoring bioinformatics until after data is generated
Know your goals before you start. Different projects require different approaches. A clinical diagnostic lab has different requirements than a research core facility, which has different requirements than a population genetics study.
Pick your platform based on read length, accuracy, and cost trade-offs. Not marketing claims. Not what everyone else is using. What your specific project actually needs.