A set of related information.
Something of interest to the database user community. Examples include customers, parts, geographic locations, etc.
An individual piece of data stored in a table.
A set of columns that together completely describe an entity or some action on an entity. Also called a record or a tuple.
A set of rows, held either in memory (nonpersistent) or on permanent storage (persistent).
Another name for a nonpersistent table, generally the result of an SQL query.
One or more columns that can be used as a unique identifier for each row in a table.
One or more columns that can be used together to identify a single row in another table.
Copy this code in the CLI
$ sudo service postgresql start
$ sudo sudo -u postgres psql
-- Get the current date
SELECT CURRENT_DATE;
/**
* date
* ------------
* 2016-02-16
* (1 row)
*/
sql
in your File exporer01-awesome-name.sql
SELECT CURRENT_DATE;
cmd + s
or ctrl + s
or File > Save
\i sql/01-awesome-name.sql
$ sudo sudo -u postgres psql
postgres=# \i PATH/to-script.sql
postgres=# \q
All the values of a column have to be of the same type
I want to store some information about the students of GA
A Student has only one City
Add a Foreign Key in the Student table referring to the City table
A Student has one or many Classes
Create a pivot table with 2 Foreign Keys referring to Student table and to Class table
The Foreign key couple is in fact the Primary key of this pivot table
-- Sample query
select o.order_id "Order ID", count(distinct i.id) "Quantity" from orders o inner join items i on o.order_id = i.order_id where o.delivered_at is not null and not i.is_sample group by o.order_id order by 1 desc limit 100;
-- Indented sample query
SELECT
o.order_id AS "Order ID",
COUNT(DISTINCT i.id) AS "Quantity"
FROM
orders o
INNER JOIN
items i
ON o.order_id = i.order_id
WHERE
o.delivered_at IS NOT NULL AND NOT i.is_sample
GROUP BY
o.order_id
ORDER BY o.created_at DESC
LIMIT 100;
Not only for you, but also for everyone else!
-- Create table query
DROP TABLE IF EXISTS student;
CREATE TABLE student (
id SERIAL PRIMARY KEY,
first_name VARCHAR(255),
last_name VARCHAR(255),
email VARCHAR(255),
age INTEGER
);
-- Insert rows
INSERT INTO student (
first_name,
last_name,
email,
age
) VALUES (
'Leo',
'Di Caprio',
'leo@gmail.com',
40
), (
'Angelina',
'Joli',
'angi@gmail.com',
42
);
-- Select rows
SELECT
id,
first_name,
last_name
FROM
student
LIMIT 10;
Create the tables City and Class and insert some data inside
\d+ city
-- Update a table
ALTER TABLE student
ADD city_id INTEGER REFERENCES city(id);
-- Update rows
UPDATE
student -- Table name
SET
city_id = 1 -- Field to update
WHERE
id = 1 -- Row to update
;
Create the pivot table student_class
-- Select some data
SELECT
s.first_name AS "Student first name",
s.last_name AS "Student last name",
c.name AS "City"
FROM
student s
INNER JOIN city c ON c.id = s.city_id
WHERE
c.name = 'NYC'
ORDER BY
s.first_name DESC
LIMIT 10;
# Leave PSQL if opened with \q
$ wget https://raw.githubusercontent.com/aeud/sql-pres/master/movielens.sql # Download the dataset
$ sudo sudo -u postgres psql < movielens.sql # Add the dataset to your database
$ sudo sudo -u postgres psql # Restart PSQL client
occupations
users
ratings
movies
genres
genres_movies
A User has one Occupation
A Rating has one User and one Movie
A Movie has many Genres (via genre_movie
)
-- PostgreSQL
\d+ occupations
\d+ users
\d+ ratings
\d+ movies
\d+ genres
\d+ genres_movies
-- Indented sample query
SELECT
o.order_id AS "Order ID",
COUNT(DISTINCT i.id) AS "Quantity"
FROM
orders o
INNER JOIN
items i
ON o.order_id = i.order_id
WHERE
o.delivered_at IS NOT NULL AND NOT i.is_sample
GROUP BY
o.order_id
ORDER BY o.created_at DESC
LIMIT 100;
-- Apply these queries
SELECT * FROM users LIMIT 5;
SELECT id, age, gender FROM users LIMIT 5;
SELECT id AS "User ID", age AS "Age", gender AS "Gender" FROM users LIMIT 5;
-- Apply these queries
SELECT * FROM users LIMIT 5;
SELECT * FROM movies LIMIT 5;
-- Apply these queries
SELECT * FROM users WHERE id = 1 LIMIT 5;
SELECT * FROM users WHERE gender = 'M' LIMIT 5;
How to determine how many females / males are recorded in our dataset?
Gender | Count |
---|---|
M | 670 |
F | 273 |
-- Apply this queries
SELECT
u.gender AS "Gender", -- Dimension(s)
count(u.*) AS "Count" -- Aggregate function
FROM
users AS u
GROUP BY
gender -- Dimension(s)
;
-- Apply these queries
SELECT * FROM users ORDER BY age ASC LIMIT 5;
SELECT * FROM users ORDER BY age DESC LIMIT 5;
SELECT * FROM users ORDER BY gender ASC LIMIT 5;
-- Apply these queries
SELECT * FROM users ORDER BY age ASC LIMIT 5;
SELECT * FROM users ORDER BY age ASC LIMIT 10;
SELECT * FROM users ORDER BY age ASC LIMIT 100; -- use \q to escape
How can I get the distribution of the movies, by release year?
The WHERE
clause is made of a serie of expressions, linked by logical operators
AND
FALSE = FALSEOR
FALSE = TRUEAND
(FALSE OR
TRUE) = TRUEAND
NOT
(FALSE OR
TRUE) = FALSE=
, !=
, <
, >
, <>
, LIKE
, IN
+
, −
, *
, /
-- Conditions
SELECT
id,
age,
gender
FROM
users u
WHERE
u.gender = 'M' AND u.age < 35
ORDER BY
id desc
LIMIT 5;
With the documentation, write a query to fetch the movies released in 1996 with a title starting with the letter D or T
Get the distribution of movies released in 1996, by title first letter
Get the distribution of movies with a title starting with P, by year of release
How to get how many time a movie has been rated? And by which user?
-- Join, use aliases!!
SELECT
m.title, -- 1
count(r.id) reviews, -- 2
count(distinct r.user_id) unique_reviews -- 3
FROM
movies m
INNER JOIN ratings r ON r.movie_id = m.id
ORDER BY
2 desc
LIMIT 5;
-- Join, use aliases!!
SELECT
m.title, -- 1
count(r.id) reviews, -- 2
count(distinct r.user_id) unique_reviews -- 3
FROM
movies m
INNER JOIN ratings r ON r.movie_id = m.id
GROUP BY
1
ORDER BY
2 desc
LIMIT 5;
Get the top 10 movies, by average rate
Get some statistics by year, about the number of rates, the average rates, etc.. Be creative
Try to find an answer to this question: Does users give rates to movies only if they liked it? only if they hated it? only if they liked or hated it? (tip: analyse the distribution of rates)