跳至主要内容

Modern SQL

SQL History

1971 年,IBM 發明了第一個查詢語言叫做 SEQUEL (Structured English Query Language), 接著又在 1983 年推出了採用了其語法的 DB2,SEQUEL 也隨之改名為 SQL (Structured Query Language), 並加入了 ANSI (1986) 和 ISO (1987) 的標準。

  • SQL:1999 Regular expressions, Triggers
  • SQL:2003 XML, Windows, Sequences
  • SQL:2008 Truncation, Fancy sorting
  • SQL:2011 Temporal DBs, Pipelined DML
  • SQL:2016 JSON, Polymorphic tables
  • SQL:2023 Property Graph Queries, Muti-Dim. Arrays

SQL Introduction

與傳統的 relational algebra 相比,SQL 採用的是 bag (無序可重覆) 而非 set (無序不可重覆)。

SQL 語法可以被分為三類 :

  • DML (Data Manipulation Language) : 查詢、插入、更新、刪除資料
  • DDL (Data Definition Language) : 定義 table、view、index 等等
  • DCL (Data Control Language) : 定義權限 (資安相關)

接下來會介紹常用語法,使用的資料表如下 :

sample table

CREATE TABLE student(
sid INT PRIMARY KEY,
name VARCHAR(16),
login VARCHAR(32) UNIQUE,
age SMALLINT,
gpa FLOAT
);

CREATE TABLE course(
cid VARCHAR(32) PRIMARY KEY,
name VARCHAR(32) NOT NULL
);

CREATE TABLE enrolled(
sid INT REFERENCES student(sid),
cid VARCHAR(32) REFERENCES course(cid),
grade CHAR(1)
);

INSERT INTO student (sid, name, login, age, gpa) VALUES
(53666, 'RZA', 'rza@cs', 44, 4.0),
(53688, 'Bieber', 'jbieber@cs', 27, 3.9),
(53655, 'Tupac', 'shakur@cs', 25, 3.5);

INSERT INTO course (cid, name) VALUES
('15-445', 'Database Systems'),
('15-721', 'Advanced Database Systems'),
('15-826', 'Data Mining'),
('15-799', 'Special Topics in Databases');

INSERT INTO enrolled (sid, cid, grade) VALUES
(53666, '15-445', 'C'),
(53688, '15-721', 'A'),
(53688, '15-826', 'B'),
(53655, '15-445', 'B'),
(53666, '15-721', 'C');

Joins

可以用於連接兩張 table,並且可以依據不同的條件來連接。

  • Inner Join
  • Left Join
  • Right Join
  • Cross Join
SELECT *
FROM student AS s
JOIN enrolled AS e ON s.sid = e.sid;

Aggregates

常見的包括 COUNT, SUM, AVG, MIN, MAX 等等。

SELECT COUNT(login) AS cnt
FROM student
WHERE login LIKE '%@cs';

SELECT COUNT(*) AS cnt
FROM student
WHERE login LIKE '%@cs';

SELECT COUNT(1) AS cnt
FROM student
WHERE login LIKE '%@cs';

SELECT COUNT(1+1+1) AS cnt
FROM student
WHERE login LIKE '%@cs';
SELECT AVG(gpa), COUNT(sid)
FROM student
WHERE login LIKE '%@cs';
SELECT COUNT(DISTINCT login)
FROM student
WHERE login LIKE '%@cs';

Aggregates function 不能與其他欄位同時出現在 SELECT 子句中,因此下面的寫法是錯誤的 :

SELECT AVG(s.gpa), e.cid
FROM enrolled AS e JOIN student AS s
ON e.sid = s.sid;

需要使用 GROUP BY 子句來解決這個問題,所有非聚合函數的欄位都必須出現在 GROUP BY 子句中。

SELECT AVG(s.gpa), e.cid
FROM enrolled AS e JOIN student AS s
ON e.sid = s.sid
GROUP BY e.cid;

如果要過濾的話,可以使用 HAVING 子句

SELECT AVG(s.gpa), e.cid
FROM enrolled AS e JOIN student AS s
ON e.sid = s.sid
GROUP BY e.cid
HAVING AVG(s.gpa) > 3.8;

String Operations

不同的資料庫有不同的比較字串以及大小寫敏感度,如下圖所示 :

string operations

也可以在 LIKE 中使用 %_ 來進行字串比較, % 代表任意長度的字串 (包括空字串),_ 代表任意單一字元。

SELECT *
FROM enrolled AS e
WHERE e.cid LIKE '15-%';

SELECT *
FROM student AS s
WHERE s.login LIKE '%@c_';

SQL-92 中定義了一些字串函數,例如 :

SELECT SUBSTRING(name,1,5) AS abbrv_name
FROM student WHERE sid = 53688

SELECT * FROM student AS s
WHERE UPPER(s.name) LIKE 'KAN%'

除此之外,各個資料庫有不同的字串函數,例如 :

-- SQL-92
SELECT name FROM student
WHERE login = LOWER(name) || '@cs'

-- MSSQL
SELECT name FROM student
WHERE login = LOWER(name) + '@cs'

-- MySQL
SELECT name FROM student
WHERE login = CONCAT(LOWER(name), '@cs')

Date and Time

不同的 DBMS 差異非常大,所以需要依據不同的 DBMS 來撰寫 SQL 語法。

-- PostgreSQL
SELECT NOW();

SELECT CURRENT_TIMESTAMP;

SELECT CAST('2023-08-01' AS DATE) - CAST('2023-01-02' AS DATE) AS diff;

Output Redirection

可以將查詢結果儲存到另一張已經存在且有相同欄位結構的 table 中,

-- MySQL
CREATE TABLE CourseIds (
SELECT DISTINCT cid FROM enrolled
);

-- PostgreSQL
SELECT DISTINCT cid
INTO TEMPORARY CourseIds
FROM enrolled;

也可以用於 insert 語句中,將查詢結果插入到另一張 table 中。

-- PostgreSQL
INSERT INTO CourseIds (cid)
(SELECT DISTINCT cid FROM enrolled);

Output Control

可以使用 ORDER BY 子句來排序查詢結果, 語法為 ORDER BY <column*> [ASC|DESC], 預設為 ASC,可以依據多個欄位進行排序。

SELECT sid, grade FROM enrolled
WHERE cid = '15-721'
ORDER BY 2;

SELECT sid FROM enrolled
WHERE cid = '15-721'
ORDER BY grade DESC, sid ASC;

也可以使用 LIMIT 子句來限制查詢結果的數量, 語法為 LIMIT <number> [OFFSET <number>]

SELECT * FROM student
ORDER BY sid
LIMIT 1 OFFSET 1;

還有一種使用 FETCH 的方法,語法如下 (WITH TIES 會將相同的資料列全部取出) :

SELECT sid, name
FROM student
WHERE login LIKE '%@cs'
ORDER BY gpa
OFFSET 10 ROWS
FETCH FIRST 10 ROWS WITH TIES;
SELECT sid, name
FROM student
WHERE login LIKE '%@cs'
FETCH FIRST 10 ROWS ONLY;

Window Functions

可以對每一個欄位進行一些運算,例如 RANKROW_NUMBERDENSE_RANK 等等。

SELECT *, ROW_NUMBER() OVER () AS row_num
FROM enrolled;
SELECT cid, sid,
ROW_NUMBER() OVER (PARTITION BY cid)
FROM enrolled
ORDER BY cid;

選每堂課成績最高的學生

SELECT * FROM (
SELECT *,
RANK() OVER (PARTITION BY cid ORDER BY grade ASC) AS rank
FROM enrolled
) AS ranking
WHERE ranking.rank = 1;

Nested Queries

包含了 inner query 以及 outer query,inner query 會先執行,然後再執行 outer query,且 inner query 可以使用 outer query 中的欄位。

可以支持 NOTINANYALLEXISTS 等等。

SELECT sid, name
FROM student
WHERE sid >= ALL (
SELECT sid FROM enrolled
);

SELECT sid, name FROM student
WHERE sid IN (
SELECT MAX(sid) FROM enrolled
);

SELECT sid, name FROM student
WHERE sid IN (
SELECT sid
FROM enrolled
ORDER BY sid DESC LIMIT 1
);

除此之外也可以使用 LATERAL 子句來進行查詢, LATERAL 子句可以將一個查詢的結果作為另一個查詢的輸入。

SELECT *
FROM (
SELECT 1 AS x
) AS t1,
LATERAL (
SELECT t1.x+1 AS y
) AS t2;
SELECT * FROM course AS c,
LATERAL (
SELECT COUNT(*) AS cnt
FROM enrolled
WHERE enrolled.cid = c.cid
) AS t1,
LATERAL (
SELECT AVG(gpa) AS avg
FROM student AS s
JOIN enrolled AS e ON s.sid = e.sid
WHERE e.cid = c.cid
) AS t2;

Common Table Expressions

ctes 可以幫助我們將複雜的查詢分解成更小的部分, 並且重複使用。

WITH cteName AS (
SELECT 1
)
SELECT * FROM cteName;
WITH cteSource(maxId) AS (
SELECT MAX(sid)
FROM enrolled
)
SELECT name
FROM student, cteSource
WHERE student.sid = cteSource.maxId;

References