Six paradigms of database design

Time:2022-5-8

Recently, I’m looking at database design ethics. By the way, let’s clarify the six paradigms of database design.
First, let’s understand some concepts.

  • Normal form (NF)
The set of relationship patterns that conform to a certain level represents the rationalization degree of the relationship between the attributes within a relationship.
In fact, you can roughly understand it as the level of some design standard that the table structure of a data table conforms to.
  • The difference between “relationship mode” and “relationship”
Similar to the difference between "class" and "object" in object-oriented programming.
”Relationship "is an example of" relationship pattern ". You can understand" relationship "as a table with data, and" relationship pattern "is the table structure of this data table
  • code
An attribute in a relationship or a combination of several attributes,
It is used to distinguish each tuple (tuple can be understood as each record in a table, that is, each row).
  • Candidate code
Set K as an attribute or attribute group in a table. If all attributes except K are completely functionally dependent on K (this "complete" should not be omitted),
Then we call K as candidate code, abbreviated as code
  • Main attribute
The attributes of all candidate codes are called primary attributes.
Attributes that are not included in any candidate code are called non primary attributes or non code attributes.
  • Functional dependency
We can understand it this way (but not particularly strict definition): if in a table, the value of attribute y can be determined when the value of attribute (or attribute group) x is determined, then it can be said that the Y function depends on X and is written as X → y.
In other words, there are no two records in the data table. They have the same value on the X attribute (or attribute group) and different value on the Y attribute.
This is the origin of the name "function dependence", which is similar to the function relationship y = f (x). When the value of X is determined, the value of y must be determined

For example, for the data in Table 3, no record can be found. Their student numbers are the same and their corresponding names are different. So we can say that the name function depends on the student number, and writing student number → name. On the contrary, because there may be students with the same name, there may be two different student records. Their values on the name are the same, but the corresponding student numbers are different. Therefore, we can’t say that the student number function depends on the name.

  • Complete functional dependence
In a table, if x → y, and for any true subset of X (if attribute group x contains more than one attribute),
If x '→ y does not hold, then we call y fully functional dependence on X, which is recorded as X f → y.
Six paradigms of database design

image.png

For example, student number f → name, (student number, class name) f → score

  • Partial functional dependency
If y function depends on X, but y is not completely dependent on X,
Then we call the Y part function dependent on X
Six paradigms of database design

image.png

For example (student number, class name) P → name

  • Transfer function dependency
If Z function depends on Y and Y function depends on X, here it is changed to: "Y is not included in X and X function does not depend on Y"),
Then we call the Z transfer function dependent on X
First normal form (1NF)
Each field cannot be further divided, that is, the atomicity of the field

As shown in the table of 1NF

Six paradigms of database design

image.png

However, if the design only conforms to 1NF, there will still be problems such as excessive data redundancy, inserting exceptions, deleting exceptions and modifying exceptions, such as the design in Table 3

Six paradigms of database design

image.png
Second paradigm (2NF)
I will not introduce the strict definition in relationship theory here (because it involves a lot of foreshadowing),
Just know what improvements 2NF has made to 1NF. The improvement is that 2NF eliminates the partial functional dependence of non primary attributes on codes on the basis of 1NF.

According to the definition of 2NF, the basis of judgment is actually to look at the data tableIs there a partial functional dependency of non primary attributes on the code。 If it exists, the data sheet only meets the requirements of 1NF at most; if it does not exist, it meets the requirements of 2NF. The judgment method is:

  • Step 1: find all the data in the data tablecode
  • Step 2: find out all the codes according to the codes obtained in step 1Main attribute
  • Step 3: in the data table, remove all the main attributes, and the rest areNon primary attributeYes.
  • Step 4: check whether there are non primary attribute pairsPartial functional dependency

Step 1:

1. Check every single attribute. When its value is determined, whether all the remaining attribute values can be determined.
2. Check all attribute groups containing two attributes. When its value is determined, whether all the remaining attribute values can be determined.
3.……
6. Check all attribute groups containing six attributes, that is, all attributes. When its value is determined, whether all the remaining attribute values can be determined.

It seems troublesome, but here’s a trick. If a is a code, then all attribute groups containing a, such as (a, b), (a, c), (a, B, c), etc., are not codes (because there is a “complete functional dependency” in the requirements as codes). Figure 4 shows all functional dependencies in the table:

Six paradigms of database design

image.png

After this step is completed, you can get that there is only one code in Table 3, that is (student number and class name).
Step 2:
There are two main attributes: student number and class name

Step 3:
There are four non primary attributes: name, department name, department head and score

Step 4:

  • For (student number, class name) → name, there is a student number → name, and there is some functional dependence of non primary attribute surname + name on code (student number, class name).
  • For (student number, class name) → department name, there is a student number → department name, there is some functional dependence of non primary attribute department name on code (student number, class name).
  • For (student number, class name) → department head, with student number → department head, there is partial functional dependence of non primary attribute on code (student number, class name).

Therefore, table 3 has partial functional dependence of non primary attributes on codes, which only meets the requirements of 1NF at most and does not meet the requirements of 2NF.

In order to make table 3 meet the requirements of 2NF, we must eliminate these partial functional dependencies. The only way is to split the big data table into two or more smaller data tables. In the process of splitting, we should meet the requirements of a higher-level paradigm, This process is called “pattern decomposition”. The method of pattern decomposition is not unique. Here is one of the methods: Course Selection (student number, class name, score) students (student number, name, department name, Dean). Let’s judge whether the following course selection table and student table meet the requirements of 2NF? For the course selection table, its code is (student number, class name) , the primary attribute is the student number and class name, while the non primary attribute is the score. If the student number is determined, the score cannot be uniquely determined, and if the class name is determined, the score cannot be uniquely determined. Therefore, there is no partial functional dependence of the non primary attribute score on the code (student number and class name), so this table meets the requirements of 2NF. For the student table, the code is the student number, the main attribute is the student number, and the non main attribute is the name, department name and department head. Because the code has only one attribute, there can be no partial functional dependence of the non main attribute on the code, so this table meets the requirements of 2NF.

Third paradigm (3NF)
Based on 2NF, 3NF eliminates the transfer function dependence of non primary attributes on codes.
In other words, if there is a transfer function dependency of non primary attributes on the code, it does not meet the requirements of 3NF.

For the student table, the primary code is the student number, the primary attribute is the student number, and the non primary attribute is the name, department name and Dean. Because the student number → department name and the Department name → department head, there is a transfer function dependence of the department head on the code student number, so the design of the student table does not meet the requirements of 3NF.
In order to make the data table design reach 3NF, we must further decompose the mode into the following forms: Course Selection (student number, class name, score), students (student number, name, department name) and department (Department name, department director). The course selection table meets the requirements of 3NF, which has been analyzed before. For the student table, the code is the student number, the main attribute is the student number, and the non main attribute is the Department name. There can be no transfer function dependence of the non main attribute on the code, so it meets the requirements of 3NF. For the Department table, the code is the Department name, the main attribute is the Department name, and the non main attribute is the dean. There can be no transfer function dependency of the non main attribute on the code (there must be at least three attributes before there can be transfer function dependency), so it meets the requirements of 3NF.

conclusion
It can be seen that the database design meeting the requirements of 3NF basically solves the problems of excessive data redundancy, inserting exceptions, modifying exceptions and deleting exceptions. Of course, in practice, 2NF or 1NF is often achieved in order to meet the needs of performance or expansion, but as a database designer, at least we should know what the requirements of 3NF are.

BCNF paradigm
On the basis of 3NF, the dependence of the main attribute on the code and the transfer function is eliminated.
Fourth paradigm (4NF)
Meet 3NF and eliminate multi value dependency in the table

Understanding: obviously, if a relationship model is 4NF, it must be BCNF. In other words, when the non primary attributes in a table are independent of each other (3NF), these non primary attributes should not have multiple values. If there are multiple values, it violates 4NF

Fifth paradigm (5NF)
The fifth paradigm has the following requirements: (1) it must meet the fourth paradigm; (2) Tables must be able to be broken down into smaller tables unless those tables logically have the same primary key as the original table.

The fifth paradigm is further standardized on the basis of the fourth paradigm. The fourth paradigm deals with independent multivalued cases, while the fifth paradigm deals with interdependent multivalued cases.