Author Tony, who doesn’t cut her hair
This paper introduces the design idea of relational database: everything is relational in SQL.
There are many great design concepts and ideas in the computer field, such as:
- In UNIX, everything is a file.
- In object-oriented programming languages, everything is an object.
The same is true of relational database design.
The relational model was proposed by Dr. e.f.codd in 1970. It is based on the concept of relationship in set theory. Both the entity objects in the real world and the relationship between them are expressed by relation. The relationship we see in the database system is a two-dimensional table, which is composed of rows and columns. Therefore, it can also be said that a relational table is a collection of data rows.
The relational model consists of data structure, relation operation and integrity constraint.
- The data structure in relational model is relational table, including basic table, derived table (query result) and virtual table (view).
- Common relational operations include add, delete, modify and query (crud), which uses SQL language. The query operation is the most complex, including selection, projection, union, intersection, exception and Cartesian product.
- Integrity constraints are used to maintain data integrity or meet the requirements of business constraints, including entity integrity (primary key constraint), referential integrity (foreign key constraint), and user-defined integrity (non NULL constraint, unique constraint, check constraint, and default value).
SQL is the standard operation language of database. SQL is very close to English and very easy to use. At the beginning of its design, it takes into account the needs of non-technical personnel. We usually only need to explain the desired results (what) and hand over the data processing process (how) to the database management system. So, SQL is the real programming language for people!
Next, we will analyze the various operation statements of relation. The purpose is to let you understand that SQL is a set oriented programming language, and its operation object is set, and the result of operation is also a set.
In relational database, relation, table and set usually represent the same concept.
Here is a simple query statement:
SELECT employee_id, first_name, last_name, hire_date FROM employees;
Its function is to query employee information from the employees table. Obviously, we all know that after from is a table (relation, set). Not only that, the result of the entire query statement is also a table. Therefore, we can use the above query as a table:
SELECT * FROM (SELECT employee_id, first_name, last_name, hire_date FROM employees) t;
The query in parentheses is called a derived table, and we give it an alias called t. Again, the entire query result is a table; this means that we can continue nesting, even if it’s boring.
Let’s take another example from PostgreSQL:
-- PostgreSQL SELECT * FROM upper('sql'); | upper | |-------| | SQL |
Upper () is a uppercase conversion function. It appears in the from clause, which means that its result is also a table, just a special table with 1 row and 1 column.
The select clause is used to specify the fields to be queried, and can contain expressions, function values, and so on. Select is called projection in relation operation. It should be easier to understand by looking at the diagram below.
In addition to select, there are some common SQL clauses.
Where is used to specify data filtering conditions, which is called selection in relation operation. The schematic diagram is as follows:
Order by is used to sort the query results. The schematic diagram is as follows:
In short, SQL can perform various data operations, such as filtering, grouping, sorting, and limiting the number of data. The objects of all these operations are relational tables, and the results are relational tables.
Among these relational operations, one of the more special is grouping.
The group by operation is different from other relational operations in that it changes the structure of the relationship. Here’s an example:
SELECT department_id, count(*), first_name FROM employees GROUP BY department_id;
The purpose of this statement is to count the number of employees by department, but there is a syntax error: first_ Name cannot appear in the query list. The reason is that if you group by department, each department contains multiple employees; it is not possible to determine which employee’s name needs to be displayed, which is a logical error.
So group by changes the structure of set elements (data rows) and creates a new relationship. The schematic diagram of grouping operation is as follows:
Nevertheless, the result of group by is still a set.
The most obvious embodiment of SQL set oriented feature is union (Union operation), intersect (intersection operation) and Excel / min (difference set operation).
The function of these set operators is to combine two sets into a set, so the following conditions need to be met:
- The number and order of fields in the set on both sides must be the same;
- The types of the corresponding fields in the collection on both sides must match or be compatible.
Specifically, union and union all are used to compute the union of two sets and return the data that appears in the first query result or the second query result. The difference between them is that union excludes duplicate data in the result, while union all retains duplicate data. The following is a schematic diagram of union operation:
The intersect operator is used to return the common part of the two sets, that is, the data that appears in both the first query result and the second query result, and excludes duplicate data in the result. The schematic diagram of intersect operation is as follows:
The except or minus operators are used to return the difference set of two sets, that is, the records that appear in the first query result but not in the second query result, and exclude duplicate data in the result. The schematic diagram of the except operator is as follows:
In addition, the distinct operator is used to eliminate duplicate data, that is, to exclude duplicate elements from the collection.
The concept of relation in SQL comes from the set theory in mathematics, so Union, intersect and excel come from Union (∪), intersection (∩) and difference set (∖) in set theory. It should be noted that the set in the set theory does not allow duplicate data, but SQL does. Therefore, the set in SQL is also called multiset; both the multiple set and the set in set theory are out of order, but SQL can sort the query results through the order by clause.
In SQL, not only entity objects are stored in relational tables, but also relationships between objects are stored in relational tables. Therefore, when we want to get the relevant data, we need to use another operation: join.
Common SQL connection query types include internal connection, external connection, cross connection, etc. Among them, the outer connection can be divided into left outer connection, right outer connection and all outer connection.
Inner join returns the data that meets the join conditions in two tables. The principle of inner join is shown in the following figure:
The left outer join returns all the data in the left table; for the right table, it returns the data that meets the connection conditions; if not, it returns a null value. The principle of the left outer connection is shown in the following figure:
The right outer join returns all the data in the right table; for the left table, it returns the data that meets the connection conditions, and returns a null value if there is none. The right outer connection and the left outer connection are interchangeable, and the following two are equivalent:
t1 RIGHT JOIN t2 t2 LEFT JOIN t1
Full outer join is equivalent to left outer join plus right outer join, and all data in left table and right table are returned at the same time; null value is returned for data in two tables that does not meet join conditions. The principle of full external connection is shown in the following figure:
Cross linking is also known as Cartesian product. A cross join of two tables is equivalent to a pairwise combination of all rows in one table and all rows in another table. The number of results is multiplied by the number of rows in two tables. The principle of cross connection is shown in the following figure:
Other types of join include semi join and anti join.
A set operation combines two sets into a larger or smaller set; a join query transforms the two sets into a larger or smaller set, while obtaining a larger element (more columns). In many cases, collection operations can be implemented through join queries, for example:
SELECT department_id FROM departments UNION SELECT department_id FROM employees;
SELECT COALESCE(d.department_id, e.department_id) FROM departments d FULL JOIN employees e ON (e.department_id = d.department_id);
Now that we’ve covered many examples of queries, let’s look at other data operations.
DML stands for data manipulation language, that is, insert, update and delete. Here is an example of an insert statement:
CREATE TABLE test(id int);
We insert three records through an insert statement, or a relational table containing three records. Because union all returns a relational table. Values also specifies a relational table. The following statements are supported in SQL server and PostgreSQL:
SELECT *FROM ( VALUES(1),(2),(3)) test(id);
As we have said before, after from is a relation table, so the values here are the same. Because we often insert a single record, we don’t realize that we actually operate on a table basis.
Similarly, the update and delete statements are based on relational tables; however, we are used to saying that a row of data is updated or several records are deleted.