How to Build a Phylogenetic Tree: Complete Guide

What Is a Phylogenetic Tree?

A phylogenetic tree is a diagram showing how species or genes evolved from common ancestors. It maps relationships between organisms based on genetic or morphological similarities.

Biologists use these trees to understand evolution. They answer questions like: which species are most closely related? When did a particular trait evolve? How did pathogens branch out?

If you're working in genetics, microbiology, systematics, or evolutionary biology, you'll need to build one eventually. This guide tells you exactly how.

Types of Phylogenetic Trees

Not all trees are the same. The format you choose depends on what you're trying to show.

Rooted vs. Unrooted Trees

A rooted tree has a single common ancestor at the base. It shows the direction of evolution and the order in which lineages split.

An unrooted tree shows relationships without specifying a common ancestor. Useful when you're still figuring out outgroups.

Cladogram vs. Phylogram

A cladogram shows branching order only. Branch lengths don't represent time or amount of change.

A phylogram scales branch lengths to reflect the amount of genetic change. Longer branches = more substitutions.

Ultrametric Trees

All branch tips sit at the same distance from the root, representing equal time since divergence. Useful for molecular clock analyses.
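One practical way to tell these tree types apart is to look at root-to-tip distances: in an ultrametric tree they are all equal. The sketch below assumes a simple Newick string with unquoted labels and a length on every branch; the function names are illustrative and not taken from any library.

```python
def root_to_tip_distances(newick):
    """Return {tip name: distance from the root} for a simple Newick string.

    A sketch, not a full parser: assumes unquoted labels, and skips any
    labels attached to internal nodes.
    """
    text = newick.strip().rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ":,()":
            pos += 1
        return text[start:pos]

    def read_length():
        # Read an optional ':<number>'; a missing length counts as 0.0
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            return float(text[start:pos])
        return 0.0

    def parse_node():
        # Returns {tip: distance from the parent of this node}
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume '('
            tips = {}
            while True:
                tips.update(parse_node())
                if text[pos] == ",":
                    pos += 1
                    continue
                break
            pos += 1                      # consume ')'
            read_label()                  # skip an optional internal label
        else:
            tips = {read_label(): 0.0}
        length = read_length()
        return {name: dist + length for name, dist in tips.items()}

    return parse_node()


def is_ultrametric(newick, tol=1e-9):
    """True if every tip is the same distance from the root (within tol)."""
    distances = root_to_tip_distances(newick).values()
    return max(distances) - min(distances) <= tol


print(is_ultrametric("((A:1,B:1):2,C:3);"))  # tips all at distance 3
print(is_ultrametric("((A:1,B:4):2,C:3);"))  # B sits further from the root
```

A cladogram has no meaningful branch lengths at all, so a check like this only makes sense for phylograms and time-calibrated trees.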

Data You Need to Build a Phylogenetic Tree

You can't build a tree without data. Here's what works:

The quality of your tree depends almost entirely on the quality and quantity of your input data. Garbage in, garbage out.

How to Build a Phylogenetic Tree: Step by Step

Step 1: Collect and Align Your Sequences

Grab your sequences from databases like NCBI GenBank or UniProt. Format them as FASTA files.
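If you want to sanity-check a FASTA file before aligning it, a few lines of Python are enough. This is a minimal sketch using only the standard library; the record names in the example are made up, and for real work a dedicated parser such as Biopython's SeqIO is the safer choice.

```python
def read_fasta(lines):
    """Parse FASTA text lines into {record_id: sequence}."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the ID
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    # Sequences may be wrapped over several lines, so join the pieces
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seq1 hypothetical record
ACGT
ACGT
>seq2 hypothetical record
ACGA
""".splitlines()

for name, seq in read_fasta(example).items():
    print(name, len(seq), seq)
```

To read a file from disk, pass an open file handle instead of the example list, since iterating over a handle also yields lines.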

Then align them. This is where most people screw up. Alignment quality determines tree quality. Use tools like:

Inspect your alignment manually. Remove poorly aligned regions. Check for conserved motifs.
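Manual inspection can be backed up with a simple automated filter. The sketch below drops columns that are mostly gaps; the 50% threshold and the function name are assumptions chosen for illustration, and dedicated trimming tools apply more careful criteria.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove alignment columns whose gap fraction exceeds the threshold.

    alignment: {name: aligned sequence}, all sequences the same length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences in an alignment must have equal length")

    keep = []
    for i in range(length):
        column = [alignment[name][i] for name in names]
        if column.count("-") / len(column) <= max_gap_fraction:
            keep.append(i)

    return {name: "".join(alignment[name][i] for i in keep) for name in names}


aligned = {"a": "AC-GT", "b": "AC-GA", "c": "ACTGA"}
print(drop_gappy_columns(aligned))  # the third column (2 of 3 gaps) is removed
```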

Step 2: Choose a Substitution Model

DNA sites evolve at different rates and in different patterns (for example, transitions are usually more common than transversions). Your model accounts for this.

Common models include:

Most phylogenetic software can select the best-fitting model automatically using the AIC or BIC. Don't guess; let the software decide.
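To see what the software is doing when it picks a model, here is a minimal sketch of the two criteria: AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL, where k is the number of free parameters, n the number of alignment sites, and lnL the log-likelihood. The log-likelihood values below are invented for illustration, and the parameter counts cover only the substitution model (branch lengths are the same across models on a fixed tree, so they do not change the ranking).

```python
import math


def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    return n_params * math.log(n_sites) - 2 * log_likelihood


def best_model(fits, n_sites, criterion="bic"):
    """fits: {model name: (log-likelihood, free parameters)}; lowest score wins."""
    def score(item):
        lnl, k = item[1]
        return bic(lnl, k, n_sites) if criterion == "bic" else aic(lnl, k)
    return min(fits.items(), key=score)[0]


# Hypothetical fits for three nested DNA models on a 1000-site alignment
fits = {
    "JC69": (-5200.0, 0),
    "HKY85": (-5100.0, 4),
    "GTR": (-5098.0, 8),
}
print(best_model(fits, n_sites=1000, criterion="aic"))
print(best_model(fits, n_sites=1000, criterion="bic"))
```

In this made-up case the richer GTR model improves the likelihood only slightly, so both criteria prefer HKY85: the extra parameters are not worth their penalty.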

Step 3: Select a Tree-Building Method

Two main approaches exist. Each has trade-offs.

Distance-Based Methods

Calculate genetic distance between all pairs of sequences. Build a tree that best fits these distances.
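As a rough illustration of the first step, the sketch below computes the raw proportion of differing sites (the p-distance) and the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3), which accounts for multiple substitutions at the same site. The function names and the choice to skip gapped positions are assumptions for this example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping positions gapped in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance in expected substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment, distance=jc69_distance):
    """Return dist[a][b] for every pair of sequences in {name: sequence}."""
    names = list(alignment)
    return {a: {b: distance(alignment[a], alignment[b]) for b in names} for a in names}


aligned = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACCTACGA"}
for name, row in distance_matrix(aligned).items():
    print(name, {other: round(d, 3) for other, d in row.items()})
```

The corrected distance is always at least as large as the p-distance, and the gap between the two grows as sequences diverge.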

UPGMA: assumes a constant rate of evolution across all lineages (a strict molecular clock). Rarely used for real data because this assumption is usually wrong.
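UPGMA is simple enough to write out in full, which makes the clock assumption easy to see: every merge places the new node at half the distance between the two clusters, so all tips end up equidistant from the root. This is a minimal sketch with a toy three-taxon matrix; the nested-tuple output format is an assumption made for readability.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon names
    dist:  dict of dicts with dist[a][b] for every pair of names
    Returns nested tuples (left, right, node_height) with names as leaves.
    """
    n = len(names)
    # cluster id -> (subtree, number of leaves in it)
    clusters = {i: (names[i], 1) for i in range(n)}
    d = {
        i: {j: dist[names[i]][names[j]] for j in range(n) if j != i}
        for i in range(n)
    }
    next_id = n
    while len(clusters) > 1:
        # Join the two closest clusters
        a, b = min(combinations(d, 2), key=lambda pair: d[pair[0]][pair[1]])
        height = d[a][b] / 2
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance to the merged cluster is the size-weighted average
        new_row = {
            c: (d[a][c] * size_a + d[b][c] * size_b) / (size_a + size_b)
            for c in clusters
        }
        del d[a], d[b]
        for c in clusters:
            del d[c][a], d[c][b]
            d[c][next_id] = new_row[c]
        d[next_id] = new_row
        clusters[next_id] = ((tree_a, tree_b, height), size_a + size_b)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


toy = {
    "A": {"A": 0, "B": 2, "C": 6},
    "B": {"A": 2, "B": 0, "C": 6},
    "C": {"A": 6, "B": 6, "C": 0},
}
print(upgma(["A", "B", "C"], toy))
```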

Neighbor-Joining: doesn't assume a molecular clock. Much faster than character-based methods. Good for initial exploration.
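The reason neighbor-joining copes without a clock is its pair-selection rule. Instead of joining the closest pair, it joins the pair with the smallest Q(i, j) = (n - 2) d(i, j) - sum of d(i, k) - sum of d(j, k), which corrects for taxa that are far from everything else. The sketch below shows only this first selection step on a toy matrix, not the full algorithm.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """Return (pair, Q value) for the pair neighbor-joining would join first."""
    n = len(names)
    row_sum = {a: sum(dist[a][b] for b in names if b != a) for a in names}
    q = {
        (a, b): (n - 2) * dist[a][b] - row_sum[a] - row_sum[b]
        for a, b in combinations(names, 2)
    }
    best = min(q, key=q.get)
    return best, q[best]


# Toy distance matrix for five taxa
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {name: dict(zip(names, row)) for name, row in zip(names, matrix)}

print(nj_pick_pair(names, dist))  # (('a', 'b'), -50)
```

Note that d and e are the closest pair in raw distance (3), yet a and b are joined first. UPGMA would have started with d and e.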

Character-Based Methods

These use the actual sequence data at each position, not just distances.

Maximum Likelihood (ML): finds the tree under which your observed data are most probable, given the evolutionary model. Best balance of accuracy and computational cost. Most researchers use this.
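The idea is easiest to see on the smallest possible tree: two sequences joined by one branch. The sketch below, assuming the JC69 model, scores candidate branch lengths by log-likelihood and keeps the best one with a plain grid search. Real ML software optimizes the topology, all branch lengths, and the model parameters together, with far better optimizers; the function names and grid settings here are illustrative only.

```python
import math


def jc69_log_likelihood(n_same, n_diff, d):
    """Log-likelihood of branch length d (substitutions per site) under JC69."""
    e = math.exp(-4 * d / 3)
    p_same = 0.25 + 0.75 * e  # probability that a site is unchanged
    p_diff = 0.25 - 0.25 * e  # probability of one specific different base
    # The leading 0.25 is the equilibrium frequency of the starting base
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)


def ml_branch_length(n_same, n_diff, step=0.0001, max_d=2.0):
    """Grid search for the branch length that maximizes the log-likelihood."""
    candidates = [i * step for i in range(1, int(max_d / step) + 1)]
    return max(candidates, key=lambda d: jc69_log_likelihood(n_same, n_diff, d))


# 100 aligned sites, 10 of which differ
print(round(ml_branch_length(n_same=90, n_diff=10), 4))
```

For 90 identical and 10 differing sites the search lands near 0.107, which agrees with the closed-form JC69 distance -3/4 ln(1 - 4p/3) at p = 0.1.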

Bayesian Inference: calculates the probability that a tree is correct given your data, model, and priors. Produces a posterior distribution of trees rather than a single best tree. Computationally intensive, but it quantifies uncertainty directly.

Maximum Parsimony: finds the tree requiring the fewest evolutionary changes. Simple but prone to long-branch attraction artifacts. Avoid for molecular data unless you have a specific reason.
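Counting "fewest changes" on a given tree can be done with Fitch's algorithm. The sketch below scores a single alignment column on a fixed rooted binary tree written as nested tuples; the tree encoding and function name are assumptions for this example, and a full parsimony search would repeat this over every site and many candidate trees.

```python
def fitch_score(tree, states):
    """Minimum number of changes for one site on a fixed rooted binary tree.

    tree:   nested 2-tuples with leaf names as strings
    states: {leaf name: character state at this site}
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        changes += 1  # the children share no state, so one change is needed
        return left | right

    visit(tree)
    return changes


site = {"A": "A", "B": "G", "C": "A", "D": "G"}
print(fitch_score((("A", "B"), ("C", "D")), site))  # 2
print(fitch_score((("A", "C"), ("B", "D")), site))  # 1
```

For this site the second tree needs one change instead of two, so parsimony prefers the grouping of A with C and B with D.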

Step 4: Assess Tree Support

A tree without support values is useless. You need to know how reliable each branch is.

Bootstrap resampling: most common. Resample columns of your alignment with replacement, rebuild the tree, and see whether the same branches appear. Values above 70% generally indicate reasonable support. Below 50%, treat that branch with skepticism.
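The resampling step itself is short. The sketch below builds one bootstrap replicate by drawing alignment columns with replacement, and computes a support percentage from a list of replicate trees. Representing each tree as a set of clades (frozensets of taxon names) is an assumption made for this example; the tree-building step between the two functions is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def clade_support(replicate_trees, clade):
    """Percentage of replicate trees that contain the given clade.

    replicate_trees: list of sets of frozensets, one set of clades per tree
    clade:           iterable of taxon names
    """
    target = frozenset(clade)
    hits = sum(target in tree for tree in replicate_trees)
    return 100 * hits / len(replicate_trees)


rng = random.Random(42)  # fixed seed so the replicate is reproducible
aligned = {"a": "ACGTAC", "b": "ACGAAC", "c": "TCGAAC"}
print(bootstrap_alignment(aligned, rng))

# Hypothetical clade sets from four replicate trees
replicates = [
    {frozenset({"a", "b"})},
    {frozenset({"a", "b"})},
    {frozenset({"b", "c"})},
    {frozenset({"a", "b"})},
]
print(clade_support(replicates, ["a", "b"]))  # 75.0
```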

Bayesian posterior probabilities: for Bayesian trees. Values above 0.95 are generally taken as strong support.

Step 5: Visualize and Interpret Your Tree

Export your tree in Newick or Nexus format. Open it in:

Root your tree correctly. Include an outgroup: a taxon known to fall outside the group you're studying. The outgroup fixes where the root sits, which determines how you read the direction of evolution.

Phylogenetic Tree Software Comparison

Software   Type               Best For                                     Cost         Learning Curve
MEGA       GUI                Beginners, teaching, small datasets          Free         Low
RAxML      Command-line       Large datasets, ML analysis                  Free         Medium
IQ-TREE    Command-line       ML with model testing, ultrafast bootstrap   Free         Medium
MrBayes    Command-line       Bayesian inference                           Free         Medium-High
PhyML      Command-line       Fast ML analysis                             Free         Medium
PAUP*      GUI/Command-line   Parsimony analysis                           Commercial   High

Getting Started: Quick Workflow

Here's a practical starting point for beginners:

  1. Download sequences in FASTA format from NCBI
  2. Align with MAFFT online (no install needed)
  3. Open alignment in MEGA
  4. Select Model Test to find best substitution model
  5. Run Maximum Likelihood tree building
  6. Apply 100 bootstrap replicates for support
  7. Visualize in FigTree

This gets you a basic, defensible tree in under an hour.
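If you later move from the GUI to the command line, the same align, model, infer, support sequence can be scripted. The sketch below assumes MAFFT and IQ-TREE 2 are installed and on your PATH under the names `mafft` and `iqtree2` (the executable name differs between IQ-TREE versions), and the file names are placeholders. It swaps the 100 standard bootstrap replicates for IQ-TREE's ultrafast bootstrap, which expects at least 1000.

```python
import subprocess


def build_commands(fasta_path, aligned_path):
    """Return the MAFFT and IQ-TREE command lines as argument lists."""
    # --auto lets MAFFT pick an alignment strategy for the dataset size
    mafft_cmd = ["mafft", "--auto", fasta_path]
    # -m MFP runs model selection first; -B 1000 requests ultrafast bootstrap
    iqtree_cmd = ["iqtree2", "-s", aligned_path, "-m", "MFP", "-B", "1000"]
    return mafft_cmd, iqtree_cmd


def run_pipeline(fasta_path, aligned_path):
    """Align with MAFFT, then infer a supported ML tree with IQ-TREE."""
    mafft_cmd, iqtree_cmd = build_commands(fasta_path, aligned_path)
    # MAFFT writes the alignment to standard output, so redirect it to a file
    with open(aligned_path, "w") as out:
        subprocess.run(mafft_cmd, stdout=out, check=True)
    subprocess.run(iqtree_cmd, check=True)


for cmd in build_commands("sequences.fasta", "aligned.fasta"):
    print(" ".join(cmd))
```

Calling `run_pipeline("sequences.fasta", "aligned.fasta")` would execute both steps; the resulting tree file can then be opened in a viewer such as FigTree.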

Common Mistakes to Avoid

Advanced Considerations

Once you've mastered the basics, these areas will improve your trees:

Concatenation vs. coalescence — combine all genes into one supermatrix, or analyze each gene separately and compare trees? Coalescent methods are increasingly popular for species-level phylogenies.
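The concatenation route is mechanically simple. This sketch joins per-gene alignments into one supermatrix, pads taxa that are missing from a gene with gaps, and records which columns belong to which gene so each partition can get its own model. The data layout and function name are assumptions for illustration.

```python
def concatenate(gene_alignments, missing="-"):
    """Join per-gene alignments into one supermatrix.

    gene_alignments: list of {taxon: aligned sequence} dicts, one per gene
    Returns (supermatrix, partitions), where partitions holds the 1-based
    (start, end) column range of each gene.
    """
    taxa = sorted({taxon for gene in gene_alignments for taxon in gene})
    supermatrix = {taxon: "" for taxon in taxa}
    partitions = []
    start = 1
    for gene in gene_alignments:
        length = len(next(iter(gene.values())))
        for taxon in taxa:
            # Taxa absent from this gene are filled with the missing character
            supermatrix[taxon] += gene.get(taxon, missing * length)
        partitions.append((start, start + length - 1))
        start += length
    return supermatrix, partitions


genes = [
    {"A": "ACG", "B": "ACT"},
    {"A": "GG", "C": "GT"},
]
matrix, parts = concatenate(genes)
print(matrix)  # {'A': 'ACGGG', 'B': 'ACT--', 'C': '---GT'}
print(parts)   # [(1, 3), (4, 5)]
```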

Divergence time estimation — add fossil calibrations or known mutation rates to estimate when lineages split.
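Under the simplest assumption, a strict clock with one rate everywhere, the arithmetic is t = d / (2r): the pairwise distance d is divided by twice the per-lineage rate r because both lineages have been accumulating changes since the split. The sketch below calibrates r from one pair with a known split time and applies it to another pair; all numbers are invented, and real dating analyses use relaxed clocks and report uncertainty.

```python
def clock_rate(calibration_distance, calibration_age_years):
    """Per-lineage rate (substitutions/site/year) from one dated split."""
    return calibration_distance / (2 * calibration_age_years)


def divergence_time(pairwise_distance, rate_per_site_per_year):
    """Strict-clock split time in years; the factor 2 covers both lineages."""
    return pairwise_distance / (2 * rate_per_site_per_year)


# Hypothetical calibration: a pair with distance 0.02 split 10 million years ago
rate = clock_rate(0.02, 10e6)

# Apply that rate to an undated pair with distance 0.05
print(f"{divergence_time(0.05, rate):.3g} years")
```

With these toy values the undated pair comes out at about 25 million years, simply because its distance is 2.5 times that of the calibration pair.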

Species tree vs. gene tree — gene trees can disagree with species trees due to incomplete lineage sorting, hybridization, or horizontal transfer. Know which you're actually trying to reconstruct.

Whole genome approaches — for closely related taxa, whole-genome alignments or SNP-based methods outperform single-gene trees.

When to Use What

The Bottom Line

Building a phylogenetic tree isn't magic. It's a pipeline: align → model → infer → support → visualize. Each step has standard tools and accepted practices.

Start simple. Use MEGA or IQ-TREE. Get one working tree. Then refine from there.

Don't overthink the theory before you've built your first tree. You'll learn more from doing than from reading documentation.