DNA Sequencing: Methods, Technology, and Scientific Applications

What DNA Sequencing Actually Is

DNA sequencing is the process of determining the exact order of nucleotides in a DNA molecule. Those nucleotides are adenine (A), guanine (G), cytosine (C), and thymine (T). Get the sequence wrong, and your entire analysis falls apart.

Scientists have been sequencing DNA since the 1970s. The methods have changed dramatically since then. What once took years now takes hours. But the core goal remains the same: read the genetic code accurately.

The Main Sequencing Methods

Sanger Sequencing

One of the original methods. Frederick Sanger and colleagues published it in 1977, and it remained the gold standard for decades.

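How it works: You mix normal nucleotides (dNTPs) with modified, chain-terminating dideoxynucleotides (ddNTPs). When the polymerase incorporates a ddNTP, chain extension stops, so the reaction produces fragments that end at every position. Separate the products by size on a gel or capillary, and the terminal base of each fragment, read from shortest to longest, gives you the sequence.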
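
To make the chain-termination logic concrete, here is a toy sketch in Python. It is not a model of real chemistry: it assumes exactly one terminated fragment per position and ignores the primer, signal strength, and noise. The function name and the seed parameter are made up for illustration.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_read(template: str, seed: int = 0) -> str:
    """Toy chain-termination model: return the synthesized strand as read from fragments."""
    # The polymerase builds the reverse complement of the template (written 5'->3').
    new_strand = "".join(COMPLEMENT[base] for base in reversed(template))

    # Assumption: one terminated fragment per position, recorded as
    # (fragment length, base of the terminating ddNTP).
    fragments = [
        (length, new_strand[length - 1])
        for length in range(1, len(new_strand) + 1)
    ]

    # In the tube the fragments are mixed together in no particular order.
    random.Random(seed).shuffle(fragments)

    # Electrophoresis separates fragments by size, shortest first.
    fragments.sort(key=lambda fragment: fragment[0])

    # Reading the terminal base of each size class recovers the sequence.
    return "".join(base for _, base in fragments)


print(sanger_read("ATGC"))  # GCAT
```

The read comes back as the reverse complement of the template because that is the strand the polymerase synthesizes.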

Sanger is still used today for sequencing single genes or short fragments. It's accurate, but it's slow and expensive for large-scale work. If you need to sequence an entire genome with Sanger, you're looking at years of work and millions of dollars.

Next-Generation Sequencing (NGS)

NGS revolutionized the field when it emerged in the mid-2000s. The key difference: it sequences millions of DNA fragments simultaneously instead of one at a time.

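Common high-throughput platforms include the following (PacBio and Nanopore are often classed separately as third-generation, discussed below):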

Illumina — uses fluorescently labeled nucleotides and imaging to read sequences
Ion Torrent — detects hydrogen ions released during nucleotide incorporation
Pacific Biosciences (PacBio) — uses single-molecule real-time (SMRT) sequencing with long reads
Nanopore sequencing — DNA passes through a protein pore and changes the electrical signal

Each platform has trade-offs between read length, accuracy, speed, and cost.

Third-Generation Sequencing

This refers to methods that sequence individual DNA molecules in real time without amplification. PacBio and Oxford Nanopore are the main players here.

The biggest advantage is read length. Nanopore can produce reads over 1 million bases long. Compare that to Illumina, which typically produces reads of 100-300 bases. Long reads make it easier to assemble genomes and identify structural variants.

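The trade-off has historically been lower raw-read accuracy. Raw single-pass reads from third-generation platforms have had error rates of around 5-15%, compared to less than 1% for Illumina. Consensus approaches such as PacBio HiFi, which reads the same molecule repeatedly, push accuracy above 99.9%, and error-correction algorithms exist for other long reads, but they add processing time.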

Comparing Sequencing Technologies

Method	Read Length	Accuracy	Run Time	Cost per Gb (USD)
Illumina	50-300 bp	>99.9%	1-3 days	$5-15
Ion Torrent	200-600 bp	98-99%	2-7 hours	$10-20
PacBio HiFi	10-25 kb	>99.9%	8-30 hours	$50-150
Nanopore	Up to 1+ Mb	85-95%	Hours to days	$10-50
Sanger	Up to 1 kb	>99.99%	Hours	$500-1000

Choose based on your project needs. Long-read assembly? Go PacBio or Nanopore. High-accuracy short variants? Illumina. Single gene? Sanger.
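
If you want to turn that rule of thumb into something you can filter on, a rough sketch might look like the following. The numbers are copied from the table above (upper end of each read-length range, lower bound of each accuracy range). The `Platform` class and `shortlist` function are hypothetical names, and a real decision would also weigh cost, turnaround, and what your core facility actually runs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Platform:
    name: str
    max_read_bp: int     # upper end of the read-length range in the table
    min_accuracy: float  # lower bound of the accuracy column, as a fraction


# Values transcribed from the comparison table; treat them as rough guides.
PLATFORMS = [
    Platform("Illumina", 300, 0.999),
    Platform("Ion Torrent", 600, 0.98),
    Platform("PacBio HiFi", 25_000, 0.999),
    Platform("Nanopore", 1_000_000, 0.85),
    Platform("Sanger", 1_000, 0.9999),
]


def shortlist(min_read_bp: int, min_accuracy: float) -> List[str]:
    """Return platforms whose table values meet both requirements."""
    return [
        p.name
        for p in PLATFORMS
        if p.max_read_bp >= min_read_bp and p.min_accuracy >= min_accuracy
    ]


# Long-read assembly: reads of at least 10 kb, any accuracy.
print(shortlist(10_000, 0.0))   # ['PacBio HiFi', 'Nanopore']
# High-accuracy calls from reads of at least 100 bp.
print(shortlist(100, 0.999))    # ['Illumina', 'PacBio HiFi', 'Sanger']
```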

How DNA Sequencing Is Actually Used

Whole Genome Sequencing

You sequence the entire genome of an organism. Humans, bacteria, plants, whatever. The cost has dropped from billions of dollars for the first human genome to around $1,000-$2,000 per human genome with Illumina or Nanopore.

Researchers use this for de novo genome assembly, identifying all variant types (SNPs, indels, structural variants), and population genomics studies.

Whole Exome Sequencing

You sequence only the protein-coding regions, which make up about 1-2% of the genome. This costs less than whole genome sequencing and focuses on regions most likely to affect protein function.

Clinical labs use exome sequencing for diagnosing genetic diseases when gene panels come back negative.

Targeted Sequencing

You sequence specific genes or regions using hybrid capture or amplicon-based methods. This is the cheapest option for focused studies.

Oncology labs use targeted panels to identify mutations in cancer genes. Genetic testing companies use panels for carrier screening and hereditary disease testing.

RNA Sequencing (RNA-Seq)

You sequence the transcriptome — all the RNA molecules in a sample. This tells you which genes are being expressed and at what levels.

Researchers use RNA-Seq to compare healthy vs. diseased tissue, identify novel transcripts, and study gene expression changes in response to treatments.
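
"At what levels" needs a normalization step, because raw read counts depend on gene length and sequencing depth. One widely used unit is transcripts per million (TPM). The sketch below assumes you already have per-gene read counts and gene lengths; the gene names and numbers are invented for illustration, and real pipelines typically use effective lengths estimated by the quantification tool.

```python
from typing import Dict


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts to transcripts per million (TPM)."""
    # Step 1: reads per kilobase, which corrects for gene length.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}

    # Step 2: scale so the values sum to one million, which corrects for depth.
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Invented example: same read count, but geneB is twice as long as geneA.
example_counts = {"geneA": 100, "geneB": 100}
example_lengths = {"geneA": 1000, "geneB": 2000}
print(tpm(example_counts, example_lengths))
```

In this example geneA ends up with twice the TPM of geneB, because the same number of reads spread over a longer gene implies fewer transcripts.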

Metagenomic Sequencing

You sequence DNA directly from environmental samples without culturing organisms. Soil, water, gut contents, whatever.

This is how researchers discovered much of the microbial diversity that can't be grown in labs. It's also used for pathogen detection in clinical samples.

Getting Started: Practical Steps

If you're setting up DNA sequencing in a lab, here's what you're actually dealing with:

Sample Preparation

Extract high-quality DNA — degradation ruins everything downstream
Check your DNA with a fluorometer or bioanalyzer, not just a spectrophotometer
For Illumina: fragment DNA to the right size range with acoustic shearing or enzymatic digestion
For long-read sequencing: minimize shearing during extraction

Library Preparation

This is where most of the cost and hands-on time goes. Steps typically include:

End repair and A-tailing
Adapter ligation
Size selection (agarose gels, beads, or automated systems)
PCR amplification (if needed)
Quality control with qPCR or fragment analysis

Commercial kits make this more reproducible, but they're expensive. Budget $200-500 per library depending on the method.
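
One calculation that comes up at the quality-control step is converting a library's mass concentration and mean fragment size into molarity, since loading amounts are usually specified in molar units. A minimal sketch, assuming double-stranded DNA at an average of about 660 g/mol per base pair (the function name and example values are illustrative):

```python
def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/uL to nM.

    Assumes an average mass of 660 g/mol per base pair of double-stranded DNA.
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    # ng/uL equals mg/L; dividing by g/mol and scaling by 1e6 gives nmol/L.
    return conc_ng_per_ul * 1_000_000 / (660.0 * mean_fragment_bp)


# Illustrative values: a 2 ng/uL library with a 400 bp mean fragment size.
print(round(library_molarity_nm(2.0, 400), 2))  # 7.58
```

The fragment size should include the adapters, since that is what the fragment analyzer measures and what actually gets loaded.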

Sequencing

For Illumina:

Load the flow cell — proper loading density matters
Run the sequencer — 1-3 days depending on read length and depth needed
Monitor metrics during the run (cluster density, Q30 scores)
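
The Q30 metric refers to Phred quality scores, defined as Q = -10 * log10(P), where P is the probability that a base call is wrong. Q30 therefore means a 1 in 1,000 chance of error per base, or 99.9% accuracy. A short sketch of the conversion (function names are illustrative):

```python
import math


def phred_from_error(error_probability: float) -> float:
    """Phred quality score for a given per-base error probability."""
    return -10 * math.log10(error_probability)


def error_from_phred(q: float) -> float:
    """Per-base error probability for a given Phred quality score."""
    return 10 ** (-q / 10)


def expected_errors(read_length_bp: int, q: float) -> float:
    """Expected miscalled bases in a read if every base had quality q."""
    return read_length_bp * error_from_phred(q)


print(round(phred_from_error(0.001)))     # 30
print(round(expected_errors(150, 30), 2)) # 0.15
```

Real reads have a different score at every base, usually dropping toward the 3' end, so the uniform-quality assumption in `expected_errors` is only a back-of-the-envelope estimate.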

For Nanopore:

Load the library into the flow cell
Start the run — sequencing continues until you stop it
Longer runs = more data, but you can stop when you have enough coverage
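
To judge when you have "enough coverage", the usual back-of-the-envelope estimate is mean coverage = total sequenced bases / genome size. The sketch below assumes a genome size of roughly 3.1 Gb for human; the function names and the 30x target are illustrative, and this ignores duplicates, unmapped reads, and uneven coverage.

```python
import math


def mean_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Average depth if the sequenced bases were spread evenly over the genome."""
    return total_bases / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Number of reads of a given length needed to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


HUMAN_GENOME_BP = 3_100_000_000  # approximate size, an assumption for this example

# Illumina-style run: how many 150 bp reads for 30x?
print(reads_needed(30, 150, HUMAN_GENOME_BP))  # 620000000

# Nanopore-style check: 93 Gb of yield so far, regardless of read lengths.
print(mean_coverage(93_000_000_000, HUMAN_GENOME_BP))  # 30.0
```

For a nanopore run, plug in the running yield reported by the instrument software to decide whether to stop.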

Data Analysis

This is where most people underestimate the work. You'll need:

Quality control (FastQC is the standard tool)
Trimming adapters and low-quality bases (Trimmomatic, cutadapt)
Alignment to a reference genome (BWA-MEM2, minimap2)
Variant calling (GATK, FreeBayes, DeepVariant)
Annotation and interpretation
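
To give a feel for what the quality control and trimming steps operate on, here is a minimal pure-Python sketch that parses FASTQ records and trims low-quality bases from the 3' end. It assumes well-formed four-line records with Phred+33 quality encoding. The trimming rule is deliberately simple and is not the algorithm used by Trimmomatic or cutadapt; use those tools for real data.

```python
from typing import Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read name, sequence, quality string) from four-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, sequence, _separator, quality = lines[i:i + 4]
        yield header[1:], sequence, quality  # drop the leading '@'


def phred33_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(char) - 33 for char in quality]


def trim_3prime(sequence: str, quality: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose quality is below min_q."""
    scores = phred33_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality[:end]


# Invented record: 'I' encodes Q40 and '#' encodes Q2.
example = "@read1\nACGTACGT\n+\nIIIIII##\n"
for name, seq, qual in parse_fastq(example):
    print(name, trim_3prime(seq, qual))  # read1 ('ACGTAC', 'IIIIII')
```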

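Plan for significant compute resources. Aligning and variant-calling a single 30x human genome benefits from dozens of CPU cores and can generate hundreds of gigabytes of raw and intermediate files, so multi-sample projects quickly reach terabytes of storage.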

The Bottom Line

DNA sequencing technology has matured significantly. The methods work. The accuracy is sufficient for most applications. The cost is manageable.

What trips people up is:

Poor sample quality
Underestimating library prep complexity
Skimping on coverage depth
Ignoring bioinformatics until after data is generated

Know your goals before you start. Different projects require different approaches. A clinical diagnostic lab has different requirements than a research core facility, which has different requirements than a population genetics study.

Pick your platform based on read length, accuracy, and cost trade-offs. Not marketing claims. Not what everyone else is using. What your specific project actually needs.