Learn SQL- From Basics to Advanced
What SQL Actually Is (And Why You Can't Avoid It)
SQL stands for Structured Query Language. It's the standard language for talking to relational databases. If you're working with data of any kind, you'll encounter SQL eventually.
Most data jobs require it: data analysis, data science, backend development, even marketing analysts pulling reports. This guide covers everything from writing your first query to advanced optimization techniques. No fluff.
Setting Up Your SQL Environment
Before writing queries, you need somewhere to run them. Here are your options:
- Local databases β MySQL, PostgreSQL, SQLite. Full control, runs on your machine.
- Browser-based sandboxes β W3Schools, SQLZoo, Mode Analytics. No install required, good for practice.
- Cloud platforms β BigQuery, Snowflake, Amazon Redshift. What most companies actually use in production.
For beginners, start with SQLite or a browser sandbox. Install nothing until you know you need more.
SQL Syntax Basics
SQL reads almost like English. The core pattern:
SELECT column_name
FROM table_name
WHERE condition;
That's it. Most queries follow this structure. Learn this pattern first, then expand.
The Core Clauses
- SELECT β which columns to retrieve
- FROM β which table to query
- WHERE β filtering conditions
- ORDER BY β sorting results
- LIMIT β restricting row count
Example query:
SELECT name, email, signup_date
FROM users
WHERE status = 'active'
ORDER BY signup_date DESC
LIMIT 10;
This grabs the 10 most recent active users. Simple, readable, useful.
Filtering and Conditional Logic
The WHERE clause is where most filtering happens. Common operators:
- = (equals)
- != or <> (not equals)
- > < >= <= (comparison)
- IN (matches any value in a list)
- LIKE (pattern matching)
- IS NULL / IS NOT NULL (null checks)
Example with LIKE:
SELECT * FROM products
WHERE product_name LIKE '%coffee%';
This finds any product with "coffee" anywhere in the name. The % is a wildcard.
Combining Conditions
Use AND and OR to combine conditions. Watch operator precedenceβuse parentheses to be explicit:
SELECT * FROM orders
WHERE (status = 'shipped' AND amount > 100)
OR (status = 'pending' AND amount > 500);
Aggregations and Grouping
Raw data is rarely the end goal. You need summaries. SQL has built-in aggregate functions:
- COUNT() β row count
- SUM() β total
- AVG() β average
- MIN/MAX() β extremes
Aggregate functions require GROUP BY when used with non-aggregated columns:
SELECT category, COUNT(*) as product_count, AVG(price) as avg_price
FROM products
GROUP BY category;
This gives you product counts and average prices per category.
HAVING vs WHERE
Both filter, but at different stages:
- WHERE β filters rows before aggregation
- HAVING β filters groups after aggregation
SELECT customer_id, SUM(amount) as total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 1000;
This finds customers who spent over $1,000 total.
JOINs β Connecting Tables
Real databases split data across multiple tables. JOINs combine them. This is where most beginners struggle.
Types of JOINs
- INNER JOIN β only matching rows from both tables
- LEFT JOIN β all rows from left table, matching from right (nulls for no match)
- RIGHT JOIN β opposite of left (rarely used)
- FULL OUTER JOIN β all rows from both tables
Example INNER JOIN:
SELECT orders.order_id, customers.name, orders.amount
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id;
This links orders to customer names using the customer_id relationship.
Multiple JOINs
You can chain JOINs for more complex relationships:
SELECT o.order_id, c.name, p.product_name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
INNER JOIN products p ON o.product_id = p.id;
Aliases (o, c, p) make queries shorter and more readable.
Subqueries
A subquery is a query nested inside another query. Use them when you need to filter based on calculated values:
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
This finds employees earning above average. The subquery calculates the average first, then the outer query filters.
Subqueries in SELECT
You can also embed subqueries in the SELECT clause:
SELECT name, salary,
(SELECT COUNT(*) FROM employees WHERE department_id = e.department_id) as dept_count
FROM employees e;
This includes the department headcount for each employee.
Advanced SQL Techniques
Window Functions
Window functions perform calculations across a set of rows related to the current row. Unlike GROUP BY, they don't collapse rows.
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM employees;
This ranks employees by salary within each department. The OVER clause defines the window.
Common window functions:
- RANK() β ranking with gaps
- ROW_NUMBER() β sequential numbering
- LAG/LEAD() β previous/next row values
- SUM()/AVG() OVER β running totals or averages
Common Table Expressions (CTEs)
CTEs make complex queries readable. They're temporary named result sets:
WITH high_value_customers AS (
SELECT customer_id, SUM(amount) as total
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 5000
)
SELECT c.name, hvc.total
FROM high_value_customers hvc
INNER JOIN customers c ON hvc.customer_id = c.id;
CTEs break complex logic into readable chunks. Use them when your query gets hard to follow.
Conditional Aggregation
Use CASE WHEN inside aggregate functions for conditional sums or counts:
SELECT
COUNT(*) as total_orders,
COUNT(CASE WHEN status = 'completed' THEN 1 END) as completed,
COUNT(CASE WHEN status = 'cancelled' THEN 1 END) as cancelled
FROM orders;
This gives you totals and breakdowns in one query.
Query Optimization Basics
Slow queries kill performance. Here's what actually matters:
- Index columns used in WHERE, JOIN, and ORDER BY clauses
- Select only columns you need β avoid SELECT * in production
- Filter early β apply WHERE conditions before JOINs when possible
- Understand execution order β FROM β WHERE β GROUP BY β HAVING β SELECT β ORDER BY
Use EXPLAIN or EXPLAIN ANALYZE (depending on your database) to see how your query executes:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;
This shows the query plan and identifies bottlenecks.
SQL Tools Comparison
| Tool | Best For | Cost | Learning Curve |
|---|---|---|---|
| PostgreSQL | General purpose, complex data types | Free | Medium |
| MySQL | Web applications, CMS platforms | Free | Low |
| SQLite | Local apps, testing, learning | Free | Low |
| BigQuery | Large datasets, cloud analytics | Pay per query | Medium |
| Snowflake | Enterprise data warehousing | Credit-based | Medium |
Getting Started: Your First Real Query
Here's a practical exercise. Create this setup and run the queries:
-- Create a simple table
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT,
department TEXT,
salary INTEGER,
hire_date TEXT
);
-- Insert sample data
INSERT INTO employees VALUES
(1, 'Alice', 'Engineering', 95000, '2021-03-15'),
(2, 'Bob', 'Engineering', 87000, '2020-07-22'),
(3, 'Carol', 'Marketing', 72000, '2022-01-10'),
(4, 'Dave', 'Engineering', 110000, '2019-11-05'),
(5, 'Eve', 'Marketing', 68000, '2021-09-30');
-- Query 1: Average salary by department
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;
-- Query 2: Employees earning above company average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Query 3: Rank employees by salary within their department
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;
Run these in SQLite or any online sandbox. Once you understand the output, you're ready to apply this to real data.
What to Learn Next
After mastering the basics here, focus on what your specific job requires:
- Data analysis β window functions, complex aggregations, CTEs
- Backend development β indexing, query optimization, transactions
- Data engineering β SQL dialects across platforms, ETL patterns
Pick one database (PostgreSQL is the safest bet), install it, and practice on real datasets. Kaggle has free datasets you can import and query.