DNA Sequencing- Methods, Technology, and Scientific Applications

What DNA Sequencing Actually Is

DNA sequencing is the process of determining the exact order of nucleotides in a DNA molecule. Those nucleotides are adenine (A), guanine (G), cytosine (C), and thymine (T). Get the sequence wrong, and your entire analysis falls apart.

Scientists have been sequencing DNA since the 1970s. The methods have changed dramatically since then. What once took years now takes hours. But the core goal remains the same: read the genetic code accurately.

The Main Sequencing Methods

Sanger Sequencing

The original method. Frederick Sanger developed it in the 1970s, and it remained the gold standard for decades.

How it works: You mix normal nucleotides with modified, chain-terminating dideoxy nucleotides. When a ddNTP gets incorporated, the DNA chain stops. Run the products on a gel or capillary, and you can read the sequence.

Sanger is still used today for sequencing single genes or short fragments. It's accurate, but it's slow and expensive for large-scale work. If you need to sequence an entire genome with Sanger, you're looking at years of work and millions of dollars.

Next-Generation Sequencing (NGS)

NGS revolutionized the field when it emerged in the mid-2000s. The key difference: it sequences millions of DNA fragments simultaneously instead of one at a time.

Common NGS platforms include:

Each platform has trade-offs between read length, accuracy, speed, and cost.

Third-Generation Sequencing

This refers to methods that sequence individual DNA molecules in real time without amplification. PacBio and Oxford Nanopore are the main players here.

The biggest advantage is read length. Nanopore can produce reads over 1 million bases long. Compare that to Illumina, which typically produces reads of 100-300 bases. Long reads make it easier to assemble genomes and identify structural variants.

The trade-off is lower accuracy. Third-generation methods have error rates around 5-15%, compared to less than 1% for Illumina. Algorithms exist to correct these errors, but they add processing time.

Comparing Sequencing Technologies

Method Read Length Accuracy Speed Cost per Gb
Illumina 50-300 bp >99.9% 1-3 days $5-15
Ion Torrent 200-600 bp 98-99% 2-7 hours $10-20
PacBio HiFi 10-25 kb >99.9% 8-30 hours $50-150
Nanopore Up to 1+ Mb 85-95% Hours to days $10-50
Sanger Up to 1 kb >99.99% Hours $500-1000

Choose based on your project needs. Long-read assembly? Go PacBio or Nanopore. High-accuracy short variants? Illumina. Single gene? Sanger.

How DNA Sequencing Is Actually Used

Whole Genome Sequencing

You sequence the entire genome of an organism. Humans, bacteria, plants, whatever. The cost has dropped from billions to around $1,000-$2,000 per human genome with Illumina or Nanopore.

Researchers use this for de novo genome assembly, identifying all variant types (SNPs, indels, structural variants), and population genomics studies.

Whole Exome Sequencing

You sequence only the protein-coding regions, which make up about 1-2% of the genome. This costs less than whole genome sequencing and focuses on regions most likely to affect protein function.

Clinical labs use exome sequencing for diagnosing genetic diseases when gene panels come back negative.

Targeted Sequencing

You sequence specific genes or regions using hybrid capture or amplicon-based methods. This is the cheapest option for focused studies.

Oncology labs use targeted panels to identify mutations in cancer genes. Genetic testing companies use panels for carrier screening and hereditary disease testing.

RNA Sequencing (RNA-Seq)

You sequence the transcriptome — all the RNA molecules in a sample. This tells you which genes are being expressed and at what levels.

Researchers use RNA-Seq to compare healthy vs. diseased tissue, identify novel transcripts, and study gene expression changes in response to treatments.

Metagenomic Sequencing

You sequence DNA directly from environmental samples without culturing organisms. Soil, water, gut contents, whatever.

This is how researchers discovered most of the microbial diversity that can't be grown in labs. It's also used for pathogen detection in clinical samples.

Getting Started: Practical Steps

If you're setting up DNA sequencing in a lab, here's what you're actually dealing with:

Sample Preparation

Library Preparation

This is where most of the cost and hands-on time goes. Steps typically include:

Commercial kits make this more reproducible but they're expensive. Budget $200-500 per library depending on the method.

Sequencing

For Illumina:

For Nanopore:

Data Analysis

This is where most people underestimate the work. You'll need:

Plan for significant compute resources. A human genome analysis pipeline requires dozens of CPU cores and terabytes of storage.

The Bottom Line

DNA sequencing technology has matured significantly. The methods work. The accuracy is sufficient for most applications. The cost is manageable.

What trips people up is:

Know your goals before you start. Different projects require different approaches. A clinical diagnostic lab has different requirements than a research core facility, which has different requirements than a population genetics study.

Pick your platform based on read length, accuracy, and cost trade-offs. Not marketing claims. Not what everyone else is using. What your specific project actually needs.