SQL is one of the most important tools in data science. Whether you are a student, a fresher, or someone who just wants to understand how data works, this tutorial will teach you SQL from zero to hero - in the simplest words possible.

Think of SQL like a language you use to talk to a database. Just like you use English to ask your teacher a question, you use SQL to ask a database for data. By the end of this tutorial, you will know how to store data, retrieve it, filter it, sort it, combine it, and even analyze it for real data science projects.

What is SQL?

SQL stands for Structured Query Language. It is a programming language designed specifically to manage and retrieve data from databases. SQL is used by data scientists, data analysts, software developers, and business intelligence teams all around the world. It is not just a skill - it is a superpower for anyone working with data.

Here is a simple analogy. Imagine a huge school library with thousands of books. Now imagine you want to find all books written by a specific author. You walk up to the librarian and say, "Please give me all books by Author X." In SQL, that conversation is replaced by a simple command. SQL is that librarian - fast, precise, and always available.

SQL is not case-sensitive, which means writing SELECT and select both work the same way. However, by convention, SQL keywords are always written in UPPERCASE to make the code easier to read.

Why SQL is Important in Data Science

Before going deeper into how to write SQL, it is important to understand why data scientists need SQL. In real-world data science jobs, data is almost never in a neat Excel file. It is stored in large databases - sometimes containing millions or billions of rows. SQL is the standard tool to access and work with this data efficiently.

Here is why every data scientist must know SQL:

  • Data lives in databases. Most companies store their customer, sales, and operational data in relational databases.
  • SQL handles big data well. SQL databases are optimized to run queries across millions of records in seconds.
  • It is easy to learn. SQL reads almost like English, making it one of the most beginner-friendly programming tools.
  • It is widely used. SQL is consistently ranked as one of the top skills required in data science job listings.
  • It works everywhere. SQL works in MySQL, PostgreSQL, SQL Server, Oracle, SQLite, and many more platforms.

Understanding Databases and Tables

Before writing your first SQL query, you need to understand two things: databases and tables.

A database is like a big folder on a computer that holds all the data of a company or application. Inside that folder, data is organized into tables. A table is just like a spreadsheet - it has rows and columns.

For example, a school database might have:

  • A table called Students (with columns like Name, Age, Grade, City)
  • A table called Marks (with columns like StudentID, Subject, Score)
  • A table called Teachers (with columns like TeacherID, Name, Subject)

Each row in a table is one record. Each column is one type of information. Together, rows and columns make up the data.

The type of database SQL works with is called a Relational Database. "Relational" means the tables in the database can be connected to each other using common columns (called keys). For example, the Students table and the Marks table can be connected using the StudentID column.

Setting Up Your SQL Environment

To practice SQL, you need a tool where you can write and run queries. Here are the most popular options:

  • MySQL - Free, widely used, great for beginners. Download from [mysql.com].
  • PostgreSQL - Free and very powerful. Great for data science use cases.
  • SQLite - Lightweight, no installation needed, works inside Python.
  • DB Fiddle / SQLiteOnline - Online tools where you can practice SQL in your browser with zero installation.

For beginners, the fastest way to start is to go to sqliteonline.com or sqlfiddle.com and start writing queries right in your browser.

SQL Command Types

SQL commands are divided into five main categories. Think of them as five different jobs SQL can do:

Category Full Name What It Does Example Commands
DDL Data Definition Language Creates and changes the database structure CREATE, ALTER, DROP, TRUNCATE
DML Data Manipulation Language Adds, updates, and deletes data INSERT, UPDATE, DELETE
DQL Data Query Language Retrieves data SELECT
DCL Data Control Language Controls access to data GRANT, REVOKE
TCL Transaction Control Language Manages transactions COMMIT, ROLLBACK, SAVEPOINT

For data science, you will mostly use DQL (SELECT) and some DML commands. Let's now learn each one step by step.

Creating Your First Database and Table (DDL)

Let's start from the very beginning - creating a database and a table.

CREATE DATABASE

CREATE DATABASE school_db;

This one line creates a brand new database called school_db. It is like creating a new folder on your computer.

USE (Select a Database)

Before you can work inside a database, you need to tell SQL which one to use:

USE school_db;

CREATE TABLE

Now let's create a Students table inside this database:


CREATE TABLE Students (
    StudentID   INT PRIMARY KEY,
    Name        VARCHAR(100),
    Age         INT,
    City        VARCHAR(50),
    Grade       VARCHAR(10)
);

Let's understand each part:

  • StudentID INT PRIMARY KEY - This is a number column. PRIMARY KEY means every student has a unique ID. No two students can have the same ID.
  • VARCHAR(100) - This stores text up to 100 characters.
  • INT - This stores whole numbers.

SQL Data Types

Data types tell SQL what kind of data a column will hold. Here are the most common ones:

Data Type What It Stores Example
INT Whole numbers 1, 25, 1000
VARCHAR(n) Text up to n characters "Rahul", "Delhi"
FLOAT / DECIMAL Decimal numbers 9.5, 23.75
DATE Date values 2024-04-01
BOOLEAN True or False values TRUE, FALSE

Inserting Data (DML – INSERT)

Now that the table exists, let's fill it with data using the INSERT INTO command.


INSERT INTO Students (StudentID, Name, Age, City, Grade)
VALUES (1, 'Rahul Sharma', 17, 'Delhi', 'A');

INSERT INTO Students (StudentID, Name, Age, City, Grade)
VALUES (2, 'Priya Verma', 16, 'Mumbai', 'B');

INSERT INTO Students (StudentID, Name, Age, City, Grade)
VALUES (3, 'Aman Singh', 18, 'Bangalore', 'A');

INSERT INTO Students (StudentID, Name, Age, City, Grade)
VALUES (4, 'Sneha Gupta', 17, 'Delhi', 'C');

INSERT INTO Students (StudentID, Name, Age, City, Grade)
VALUES (5, 'Rohit Jain', 16, 'Chennai', 'B');

Each INSERT statement adds one new row of data into the table. Think of it as filling one new row in a register.

Retrieving Data (DQL – SELECT)

The SELECT statement is the most important SQL command in data science. It is how you ask the database to show you data.

Select All Columns

SELECT * FROM Students;

The * means "all columns." This will show you all rows and all columns from the Students table.

Output:

StudentID Name Age City Grade
1 Rahul Sharma 17 Delhi A
2 Priya Verma 16 Mumbai B
3 Aman Singh 18 Bangalore A
4 Sneha Gupta 17 Delhi C
5 Rohit Jain 16 Chennai B

Select Specific Columns

SELECT Name, City FROM Students;

This shows only the Name and City columns. In data science, you often only need a few columns, not all of them.

DISTINCT – Remove Duplicates

SELECT DISTINCT City FROM Students;

DISTINCT removes duplicate values. In the output above, Delhi appears twice. Using DISTINCT, it will appear only once.

Output:

CityDelhiMumbaiBangaloreChennai

Filtering Data with WHERE

The WHERE clause lets you filter rows based on a condition. It is like saying,"Show me only the students who live in Delhi."

SELECT * FROM Students WHERE City = 'Delhi';

Output:

StudentID Name Age City Grade
1 Rahul Sharma 17 Delhi A
4 Sneha Gupta 17 Delhi C
Comparison Operators in WHERE

You can use many types of conditions in WHERE:

Operator Meaning Example
= Equal to City = 'Delhi'
!= or <> Not equal to Grade != 'C'
> Greater than Age > 16
< Less than Age < 18
>= Greater than or equal to Age >= 17
<= Less than or equal to Age <= 17

AND, OR, NOT

You can combine multiple conditions:

-- Students from Delhi with Grade ASELECT * FROM StudentsWHERE City = 'Delhi' AND Grade = 'A';
-- Students from Delhi or MumbaiSELECT * FROM StudentsWHERE City = 'Delhi' OR City = 'Mumbai';
-- Students not from DelhiSELECT * FROM StudentsWHERE NOT City = 'Delhi';

IN Operator

IN is a shortcut for checking multiple values at once:

SELECT * FROM StudentsWHERE City IN ('Delhi', 'Mumbai', 'Chennai');

BETWEEN Operator

BETWEEN filters values within a range:

SELECT * FROM StudentsWHERE Age BETWEEN 16 AND 17;

LIKE Operator

LIKE is used for pattern matching in text. It uses two wildcards:

  • % means zero or more characters
  • _ means exactly one character

-- Find students whose name starts with 'R' 
SELECT * FROM Students 
WHERE Name LIKE 'R%'; 

-- Find students whose name has exactly 5 characters 
SELECT * FROM Students 
WHERE Name LIKE '_____';

Sorting Data with ORDER BY

ORDER BY sorts your results. You can sort in ascending (A to Z, 1 to 100) or descending order (Z to A, 100 to 1).


-- Sort by Age from youngest to oldest (ASC is default)
SELECT * FROM Students
ORDER BY Age ASC;

-- Sort by Name in reverse alphabetical order
SELECT * FROM Students
ORDER BY Name DESC;

Limiting Results with LIMIT

In real data science, tables can have millions of rows. You don't want to load all of them every time. LIMIT lets you see only a few rows:


-- Show only the first 3 students
SELECT * FROM Students
LIMIT 3;

This is especially useful when you are exploring a new dataset and want to quickly see the first few rows - similar to df.head() in Python Pandas.

Updating and Deleting Data (DML)

UPDATE – Change Existing Data

The UPDATE command changes the value of an existing row:

UPDATE Students SET Grade = 'A' WHERE StudentID = 4;

This changes Sneha's grade from C to A. Always use WHERE with UPDATE, or else it will change ALL rows!

DELETE – Remove Rows

DELETE FROM Students WHERE StudentID = 5;

This removes Rohit's record. Again, always use WHERE with DELETE, or it will delete the entire table's data.

Aggregate Functions

Aggregate functions are used to calculate values across multiple rows. These are extremely important in data science for summarizing and analyzing data.

The five main aggregate functions are:

Function What It Does Example
COUNT() Counts the number of rows COUNT()
SUM() Adds up all values in a column SUM(Score)
AVG() Calculates the average AVG(Age)
MAX() Finds the highest value MAX(Score)
MIN() Finds the lowest value MIN(Age)

Let's create a Marks table and use these functions:


CREATE TABLE Marks (
    MarkID    INT PRIMARY KEY,
    StudentID INT,
    Subject   VARCHAR(50),
    Score     INT
);

INSERT INTO Marks VALUES (1, 1, 'Math', 85);
INSERT INTO Marks VALUES (2, 2, 'Math', 72);
INSERT INTO Marks VALUES (3, 3, 'Math', 91);
INSERT INTO Marks VALUES (4, 4, 'Math', 65);
INSERT INTO Marks VALUES (5, 5, 'Math', 78);

Now use aggregate functions:

-- Total number of recordsSELECT COUNT(*) AS TotalStudents FROM Marks;
-- Average score in MathSELECT AVG(Score) AS AverageScore FROM Marks;
-- Highest and lowest scoreSELECT MAX(Score) AS Highest, MIN(Score) AS Lowest FROM Marks;
-- Total of all scoresSELECT SUM(Score) AS TotalScore FROM Marks;

The keyword AS gives the output column a custom name. This is called an alias. For example, COUNT(*) AS TotalStudents names the output column "TotalStudents."

GROUP BY and HAVING

GROUP BY

GROUP BY groups rows that have the same value and then applies an aggregate function on each group. Think of it as organizing data into categories and then calculating something for each category.

Let's add more data to the Marks table:

INSERT INTO Marks VALUES (6, 1, 'Science', 90);INSERT INTO Marks VALUES (7, 2, 'Science', 68);INSERT INTO Marks VALUES (8, 3, 'Science', 88);

Now find the average score per subject:

SELECT Subject, AVG(Score) AS AvgScoreFROM MarksGROUP BY Subject;

Output:

Subject AvgScore
Math 78.2
Science 82.0

HAVING

HAVING is like WHERE, but it works after GROUP BY. You use it to filter grouped results.

-- Show only subjects where average score is above 80SELECT Subject, AVG(Score) AS AvgScoreFROM MarksGROUP BY SubjectHAVING AVG(Score) > 80;

Key rule: Use WHERE to filter rows before grouping. Use HAVING to filter groups after grouping.

SQL JOINs - Combining Tables

This is one of the most powerful features of SQL. JOINs are used to combine data from two or more tables using a common column.

Think of it like this: You have a table of students with their IDs and names, and another table of marks with student IDs and scores. A JOIN brings these two tables together so you can see each student's name along with their score.

There are four main types of JOINs:

INNER JOIN

Returns only the rows that have matching values in both tables.

SELECT Students.Name, Marks.Subject, Marks.ScoreFROM StudentsINNER JOIN Marks ON Students.StudentID = Marks.StudentID;

Output: Only students who have a mark entry will appear.

LEFT JOIN

Returns all rows from the left (first) table, and matching rows from the right table. If no match, the right side shows NULL.

SELECT Students.Name, Marks.Subject, Marks.ScoreFROM StudentsLEFT JOIN Marks ON Students.StudentID = Marks.StudentID;

Even if a student has no marks, they will still appear in the results.

RIGHT JOIN

The opposite of LEFT JOIN. Returns all rows from the right table and matching rows from the left.

SELECT Students.Name, Marks.Subject, Marks.ScoreFROM StudentsRIGHT JOIN Marks ON Students.StudentID = Marks.StudentID;

FULL OUTER JOIN

Returns all rows from both tables. Where there is no match, NULLs are shown.

SELECT Students.Name, Marks.Subject, Marks.ScoreFROM StudentsFULL OUTER JOIN Marks ON Students.StudentID = Marks.StudentID;
Visual Summary of JOINs
JOIN Type What it Returns
INNER JOIN Only matching rows from both tables
LEFT JOIN All rows from the left table + matching rows from the right
RIGHT JOIN All rows from the right table + matching rows from the left
FULL OUTER JOIN All rows from both tables

Subqueries – Queries Inside Queries

A subquery is a query written inside another query. It is like asking a question inside a question. You put the inner query inside parentheses ().

-- Find students whose score is above the average scoreSELECT StudentID, ScoreFROM MarksWHERE Score > (SELECT AVG(Score) FROM Marks);

Here, the inner query (SELECT AVG(Score) FROM Marks) runs first and finds the average score. Then the outer query uses that average to filter results.

Subquery in FROM Clause

You can also use a subquery as a temporary table:


SELECT Name, Score
FROM Students
JOIN (
    SELECT StudentID, MAX(Score) AS Score
    FROM Marks
    GROUP BY StudentID
) AS TopScores
ON Students.StudentID = TopScores.StudentID;

This finds the highest score for each student and shows it alongside their name.

CASE Statement – IF-ELSE in SQL

The CASE statement works like an IF-ELSE condition in other programming languages. It lets you create conditional logic inside a query.


SELECT Name, Grade,
    CASE
        WHEN Grade = 'A' THEN 'Excellent'
        WHEN Grade = 'B' THEN 'Good'
        WHEN Grade = 'C' THEN 'Average'
        ELSE 'Needs Improvement'
    END AS Performance
FROM Students;

Output:

Name Grade Performance
Rahul Sharma A Excellent
Priya Verma B Good
Aman Singh A Excellent
Sneha Gupta C Average

In data science, CASE statements are very useful to create new categories or labels from existing data - a process called feature engineering.

SQL String Functions

SQL has built-in functions to work with text data. These are very useful when cleaning and preparing data for analysis.


-- Convert name to uppercase
SELECT UPPER(Name) FROM Students;

-- Convert name to lowercase
SELECT LOWER(Name) FROM Students;

-- Get length of the name
SELECT Name, LENGTH(Name) AS NameLength FROM Students;

-- Extract first 5 characters
SELECT SUBSTRING(Name, 1, 5) AS ShortName FROM Students;

-- Replace a word in text
SELECT REPLACE(City, 'Delhi', 'New Delhi') FROM Students;

-- Remove spaces from beginning and end
SELECT TRIM('  Rahul  ') AS CleanName;

-- Combine two columns into one
SELECT CONCAT(Name, ' - ', City) AS NameCity FROM Students;

SQL Date and Time Functions

Working with dates is a very common task in data science. Here are some useful date functions:


-- Get today's date
SELECT CURDATE();

-- Get current date and time
SELECT NOW();

-- Extract year from a date
SELECT YEAR('2024-04-01') AS YearOnly;

-- Extract month
SELECT MONTH('2024-04-01') AS MonthOnly;

-- Calculate difference between two dates
SELECT DATEDIFF('2024-12-31', '2024-01-01') AS DaysBetween;

Date functions are especially useful when analyzing time-series data, sales trends, or user activity logs.

NULL Values in SQL

A NULL value means there is no data in that cell. It is not zero. It is not an empty string. It simply means "no value."

In data science, you will often deal with missing or NULL data. Here is how to handle it in SQL:


-- Find students with no Grade assigned
SELECT * FROM Students
WHERE Grade IS NULL;

-- Find students who have a Grade
SELECT * FROM Students
WHERE Grade IS NOT NULL;

-- Replace NULL with a default value using COALESCE
SELECT Name, COALESCE(Grade, 'Not Assigned') AS Grade
FROM Students;

COALESCE returns the first non-NULL value. It is very useful for cleaning messy datasets.

Common Table Expressions (CTEs)

A CTE (Common Table Expression) is a temporary result set that you can reference within a query. It is written using the WITH keyword and makes your queries much easier to read and maintain.


WITH TopStudents AS (
    SELECT StudentID, AVG(Score) AS AvgScore
    FROM Marks
    GROUP BY StudentID
    HAVING AVG(Score) >= 80
)
SELECT Students.Name, TopStudents.AvgScore
FROM Students
JOIN TopStudents ON Students.StudentID = TopStudents.StudentID;

Think of a CTE like giving a name to a mini-query so you can reuse it. CTEs are much cleaner than writing long nested subqueries. In data science projects, CTEs are used heavily for step-by-step data transformation.

Window Functions - Advanced Data Analysis

Window functions are one of the most powerful SQL features for data science. Unlike aggregate functions that collapse all rows into one result, window functions calculate values across rows while keeping each row separate.

You use window functions with the OVER() clause:


-- Rank students by score within each subject
SELECT
    StudentID,
    Subject,
    Score,
    RANK() OVER (PARTITION BY Subject ORDER BY Score DESC) AS SubjectRank
FROM Marks;

Here, PARTITION BY Subject creates a separate window for each subject, and ORDER BY Score DESC ranks within that window.

Common Window Functions

Function What It Does
ROW_NUMBER() Assigns a unique sequential number to each row
RANK() Ranks rows; ties get the same rank with gaps
DENSE_RANK() Ranks rows; ties get the same rank without gaps
SUM() OVER() Running total across rows
AVG() OVER() Moving average across rows
LAG() Access the value from the previous row
LEAD() Access the value from the next row
Running Total Example
SELECT    StudentID,    Score,    SUM(Score) OVER (ORDER BY StudentID) AS RunningTotalFROM Marks;

This gives a running (cumulative) total of scores. This is very useful in financial data analysis and time-series analytics.

Views - Saving Queries as Virtual Tables

A VIEW is a saved SQL query that you can treat like a real table. Once you create a view, you can query it anytime without writing the original query again.


-- Create a view that shows student names with their scores
CREATE VIEW StudentScores AS
SELECT Students.Name, Marks.Subject, Marks.Score
FROM Students
INNER JOIN Marks ON Students.StudentID = Marks.StudentID;

-- Use the view like a table
SELECT * FROM StudentScores
WHERE Score > 80;

Views are useful in data science when you have a complex join or transformation that you need to use repeatedly. Instead of writing the same 20-line query every day, you just call the view.

Indexes - Making SQL Faster

When a table has millions of rows, searching through all of them is slow. An INDEX is a special data structure that speeds up search queries.

-- Create an index on the City columnCREATE INDEX idx_city ON Students(City);

Now when you run WHERE City = 'Delhi', SQL will use the index to find results much faster - just like an index at the back of a textbook.

When to use an index:

  • On columns you frequently use in WHERE clauses
  • On columns used in JOIN conditions
  • On columns used in ORDER BY

When not to over-index:

  • Too many indexes slow down INSERT, UPDATE, and DELETE operations
  • Index only the columns you really need

SQL for Real Data Science Projects

Now that you know all the key SQL concepts, let's see how they come together in a real data science workflow.

Step 1 – Understand the Data

Before writing any query, explore the tables:


-- See all tables in the database
SHOW TABLES;

-- See the structure of a table
DESCRIBE Students;

-- Quick preview of the data
SELECT * FROM Students LIMIT 5;

Step 2 – Data Profiling

Check for missing values, duplicates, and data quality issues:


-- Count total rows
SELECT COUNT(*) FROM Students;

-- Count rows with NULL City
SELECT COUNT(*) FROM Students WHERE City IS NULL;

-- Check for duplicate names
SELECT Name, COUNT(*) AS Count
FROM Students
GROUP BY Name
HAVING COUNT(*) > 1;

Step 3 - Data Cleaning

Fix or handle bad data:


-- Replace NULLs in Grade with 'Unknown'
UPDATE Students SET Grade = 'Unknown' WHERE Grade IS NULL;

-- Delete duplicate rows (keep the one with lower ID)
DELETE FROM Students
WHERE StudentID NOT IN (
    SELECT MIN(StudentID)
    FROM Students
    GROUP BY Name, City
);

Step 4 - Exploratory Data Analysis (EDA)

Use SQL to answer business questions:


-- How many students are from each city?
SELECT City, COUNT(*) AS StudentCount
FROM Students
GROUP BY City
ORDER BY StudentCount DESC;

-- What is the average score per grade?
SELECT s.Grade, AVG(m.Score) AS AvgScore
FROM Students s
JOIN Marks m ON s.StudentID = m.StudentID
GROUP BY s.Grade
ORDER BY AvgScore DESC;

Step 5 – Feature Engineering

Create new meaningful columns for your analysis:


SELECT
    StudentID,
    Score,
    CASE
        WHEN Score >= 90 THEN 'High Performer'
        WHEN Score >= 75 THEN 'Average Performer'
        ELSE 'Low Performer'
    END AS PerformanceCategory
FROM Marks;

SQL vs Python for Data Science

Many beginners ask: "Should I use SQL or Python (Pandas) for data science?" The answer is: both are needed, and they work together.

Task SQL Python (Pandas)
Querying large databases ✅ Better ❌ Slower
Data cleaning ✅ Good ✅ Better
Machine learning models ❌ Not for this ✅ Better
Visualization ❌ Limited ✅ Better
Joins and aggregations ✅ Excellent ✅ Good
Working with billions of rows ✅ Built for this ❌ Struggles

In a typical data science project, you use SQL to extract and prepare the data from a database, and then use Python or R to model, analyze, and visualize it.

SQL Best Practices for Data Scientists

Following good habits makes your SQL code cleaner, faster, and easier to understand.

  1. Always use WHERE with UPDATE and DELETE - Without it, you will accidentally change or delete all rows.
  2. Use aliases - SELECT AVG(Score) AS AvgScore is much clearer than just AVG(Score).
  3. Avoid SELECT * in production - Always specify the columns you need. It is faster and more intentional.
  4. Use CTEs for complex queries - CTEs are much easier to read than deeply nested subqueries.
  5. Comment your code - Use -- to add comments and explain what your query does.
  6. Test on small data first - Use LIMIT to preview results before running a query on millions of rows.
  7. Understand your data before querying - Use DESCRIBE and SELECT * LIMIT 5 to explore before diving deep.
  8. Use indexes on large tables - Indexes make filtering and joining on large tables dramatically faster.

SQL Learning Roadmap

Here is a clear roadmap to go from beginner to advanced in SQL for data science:

Level 1 – Beginner (Week 1-2)

  • Understand databases and tables
  • Learn SELECT, WHERE, ORDER BY, LIMIT
  • Practice INSERT, UPDATE, DELETE
  • Learn data types and constraints

Level 2 – Intermediate (Week 3-4)

  • Master aggregate functions (COUNT, SUM, AVG, MAX, MIN)
  • Learn GROUP BY and HAVING
  • Understand all four types of JOINs
  • Practice LIKE, IN, BETWEEN, NULL handling

Level 3 – Advanced (Week 5-6)

  • Write subqueries and nested queries
  • Master CTEs (WITH clause)
  • Learn window functions (RANK, ROW_NUMBER, LAG, LEAD)
  • Understand views and indexes
  • Practice on real datasets (Kaggle, public databases)

Level 4 – Expert (Week 7+)

  • Stored procedures and triggers
  • Performance optimization and execution plans
  • Working with large datasets and partitioned tables
  • Connecting SQL with Python (using pandas.read_sql() or SQLAlchemy)

Practice Problems

The best way to learn SQL is by practicing. Here are some exercises you can try on the Students and Marks tables:

  1. Find all students who scored more than 80 marks.
  2. Find the student with the highest average score across all subjects.
  3. Count how many students are in each grade category (A, B, C).
  4. Find all students who have NOT submitted any marks (use LEFT JOIN).
  5. Display each student's name along with their rank based on their total score.
  6. Find the top 2 scoring students in each subject using window functions.
  7. Calculate the percentage of students from each city out of the total.
  8. Find students whose name starts with 'R' and score is between 70 and 90.

You can practice these on SQLZoo, DataLemur, or SQLBolt for free.

Quick Reference: Most Used SQL Commands

Here is a handy cheat sheet of the most important SQL commands in data science:


-- Select data
SELECT column1, column2 FROM table_name;

-- Filter data
WHERE condition;

-- Sort data
ORDER BY column ASC/DESC;

-- Limit rows
LIMIT n;

-- Remove duplicates
SELECT DISTINCT column FROM table;

-- Aggregate functions
COUNT(*), SUM(col), AVG(col), MAX(col), MIN(col)

-- Group data
GROUP BY column HAVING condition;

-- Join tables
INNER JOIN table2 ON table1.id = table2.id;

-- Subquery
WHERE column > (SELECT AVG(column) FROM table);

-- CTE
WITH cte_name AS (SELECT ...) SELECT * FROM cte_name;

-- Window function
RANK() OVER (PARTITION BY col ORDER BY col2 DESC)

-- Create view
CREATE VIEW view_name AS SELECT ...;

-- Handle NULLs
COALESCE(column, 'default')

-- Conditional logic
CASE WHEN ... THEN ... ELSE ... END

Conclusion

SQL is not just a tool - it is the language of data. Every data scientist, no matter how advanced, uses SQL daily to explore, clean, and prepare data. It is simple enough for beginners to learn in a week, yet powerful enough for experts to use on billion-row datasets. Start with the basics, practice on real data, and keep building complexity step by step. The more you practice SQL, the more confident and capable you will become as a data scientist.