Computer fundamentals summary for the 2021 autumn recruitment interviews: databases and Redis

Time: 2020-11-24

Database

Stored procedures

A stored procedure is a database object that stores a complex program inside the database for external programs to call. It is compiled and optimized once, stored on the database server, and then used simply by calling it.

Advantages
  • It encapsulates code and guarantees data integrity, allowing a group of related operations to be executed together, and it can be called multiple times for reuse.
  • A stored procedure is a precompiled code block: the database optimizes it and fixes its execution plan before it is ever called, so execution is relatively efficient.
  • Stored procedures can be written with flow-control statements, making them very flexible and able to perform complex judgments and complex operations.
  • Replacing a large number of transactional SQL statements with one call reduces network traffic and improves communication speed.
Disadvantages
  • The stored-procedure syntax of each database is different, which makes stored procedures difficult to maintain and port.
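A minimal MySQL sketch of a stored procedure (the students table and its columns are hypothetical):

```sql
-- Hypothetical schema: students(id, name, age).
DELIMITER //
CREATE PROCEDURE get_students_older_than(IN min_age INT, OUT cnt INT)
BEGIN
    -- Return the matching rows as a result set.
    SELECT id, name, age FROM students WHERE age > min_age;
    -- Report how many rows matched via the OUT parameter.
    SELECT COUNT(*) INTO cnt FROM students WHERE age > min_age;
END //
DELIMITER ;

-- Usage: executed as an independent statement, not inside a query.
CALL get_students_older_than(20, @cnt);
SELECT @cnt;
```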

Functions

A function is a subroutine consisting of one or more SQL statements; it can be used to encapsulate code for reuse.
Comparison with stored procedures

Storage process function
parameter in,out, inout in
Return value 0 or more return values There must be and only one return value
Return type Can return parameters, such as a recordset Cannot return parameters, only values and table objects can be returned
implement As a separate part Can be called as part of a query statement
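For contrast, a minimal MySQL function sketch (same hypothetical schema): exactly one return value, callable inside a query:

```sql
DELIMITER //
CREATE FUNCTION age_level(age INT) RETURNS VARCHAR(10)
DETERMINISTIC
BEGIN
    -- Exactly one scalar return value.
    RETURN IF(age >= 18, 'adult', 'minor');
END //
DELIMITER ;

-- Usage: embedded in a query, unlike a stored procedure.
SELECT name, age_level(age) FROM students;
```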

Triggers

A trigger is a program that executes automatically. It fires before or after an operation on a table or its data, such as UPDATE, INSERT, or DELETE. The difference from a stored procedure is that a trigger is fired by an event, while a stored procedure is invoked directly by name. (Triggers are typically used to perform a series of follow-up operations when data changes; for example, when the core data of a system is modified, a trigger can record the operation in a log.)
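A sketch of such an audit-log trigger in MySQL (the accounts and account_log tables are hypothetical):

```sql
-- Record every balance change in a log table.
CREATE TRIGGER log_account_update
AFTER UPDATE ON accounts
FOR EACH ROW
INSERT INTO account_log (account_id, old_balance, new_balance, changed_at)
VALUES (OLD.id, OLD.balance, NEW.balance, NOW());
```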

Transactions

Transactions were introduced to solve the problem of data consistency under concurrency (analogous to the problems raised by multithreading).

ACID

  • Atomicity
    All the operations in a transaction succeed or fail as a whole; if any part fails, the entire transaction fails and is rolled back.
  • Consistency
    Data integrity must be consistent before and after a transaction. (In a bank transfer, if A transfers to B, then B's balance increases and A's balance must decrease by the same amount.)
  • Durability
    Once a transaction is committed, its changes to the data are permanent; even a database failure will not affect them.
  • Isolation
    When multiple transactions run concurrently, modifications made by other transactions have no effect on the current transaction's data before they are committed. After other transactions commit, the current transaction may be affected, depending on the isolation level.
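The bank-transfer example from the consistency bullet, as a MySQL sketch (hypothetical accounts table):

```sql
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 'A';
UPDATE accounts SET balance = balance + 100 WHERE id = 'B';
-- If both updates succeeded, make the changes permanent (durability):
COMMIT;
-- If anything went wrong, undo both updates instead (atomicity):
-- ROLLBACK;
```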

Isolation levels

From low to high:

  • Read uncommitted. Problems: dirty reads, non-repeatable reads, phantom reads
  • Read committed. Problems: non-repeatable reads, phantom reads
  • Repeatable read. Problem: phantom reads
  • Serializable. Solves all of the above problems

Setting it: SET GLOBAL transaction_isolation = '<isolation level>';

Dirty reads, non-repeatable reads, and phantom reads

  • Dirty read: transaction A reads data that transaction B has not committed; B then rolls back, so A has read dirty data.
  • Non-repeatable read: while transaction A reads the same set of rows several times, transaction B modifies and commits those rows, so A reads different values. (Caused by UPDATE. Solution: use row-level locks so other transactions cannot modify those rows.)
  • Phantom read: while transaction A repeatedly reads totals over the data, transaction B inserts or deletes rows and commits, so the totals A reads before and after differ. (Caused by INSERT and DELETE. Solution: use table-level locks so other transactions cannot modify the table before transaction A ends.)
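A sketch of reproducing a non-repeatable read under read committed, reusing the hypothetical accounts table:

```sql
-- Session 1:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 'A';  -- first read
-- Session 2 (meanwhile): UPDATE accounts SET balance = 0 WHERE id = 'A'; COMMIT;
SELECT balance FROM accounts WHERE id = 'A';  -- second read differs: non-repeatable read
COMMIT;
```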

The three normal forms

  • 1NF emphasizes the atomicity of columns: a column cannot be split into multiple columns.
    • Student information forms a student table with age, gender, student number, and so on. None of these fields can be subdivided, so the table satisfies first normal form.
  • 2NF builds on 1NF and requires that the table have a primary key and that every non-key column fully depend on the primary key.
    • The student with number 1024 is named Borris and is 23 years old; the name and age fields depend on the student-number primary key.
  • 3NF builds on 1NF and 2NF: non-key columns must depend directly on the primary key, with no transitive dependency. In other words, there must be no non-key column A that depends on a non-key column B which in turn depends on the primary key.
    • For example, a university is divided into many departments (Chinese, English, Computer Science, etc.), and the department table holds fields such as department number, department head, department profile, and department structure. You should not add the department head, profile, and structure fields to the student table: that would be redundant, and non-key fields of the department table would come to depend on the student table transitively. The correct approach is to add only a department-number field to the student table (see the schema sketch below).
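A minimal schema sketch of the student/department example in 3NF (all names are illustrative):

```sql
CREATE TABLE department (
    dept_no INT PRIMARY KEY,
    name    VARCHAR(50),
    head    VARCHAR(50),
    profile VARCHAR(200)
);

CREATE TABLE student (
    student_no INT PRIMARY KEY,
    name       VARCHAR(50),
    age        INT,
    dept_no    INT,  -- only the department number; no transitive department fields
    FOREIGN KEY (dept_no) REFERENCES department(dept_no)
);
```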

SQL classification

  • DQL: Data Query Language; select, where, from
  • DML: Data Manipulation Language; insert, update, delete
  • DCL: Data Control Language; revoke, grant, setting user and group access rights
  • DDL: Data Definition Language; create, drop

Views

  • A view is a virtual table used in the same way as a physical table, but it is built on top of existing tables.

  • Its content is defined by a select query and is usually a subset of the rows or columns of one or more tables.

  • A view does not store real data; the data remains in the base tables.

  • Selective access to part of the data can simplify queries, and multiple views can be derived from a single table.

  • Changes made through a view affect the base tables.

  • Disadvantages:
    Performance: a query against a view must be converted into queries against the base tables, so query performance is not improved.
    Modification: if a view is defined by a complex multi-table query, its data cannot be modified, because the queried data comes from multiple tables.
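A view sketch over the hypothetical student/department schema above:

```sql
CREATE VIEW cs_students AS
SELECT s.student_no, s.name, s.age
FROM student s
JOIN department d ON s.dept_no = d.dept_no
WHERE d.name = 'Computer Science';

-- Queried like a table; the data still lives in student and department.
SELECT * FROM cs_students WHERE age > 20;
```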

Indexes

Main types (by structure)

  • Full-text index
    Addresses the inefficiency of fuzzy queries by searching for keywords in the text instead of comparing whole values in the index.
  • Hash index
    Based on a hash table; it is only effective for queries that exactly match all columns of the index. For each row, the storage engine computes a hash code over all the index columns; rows with different key values produce different hash codes. The index stores the hash codes and, in the hash table, a pointer to each data row.
  • B-tree index
    Builds the index in the form of a B-tree. The InnoDB engine uses a B+ tree, conceptually similar to a binary search tree.
    The slots of the root node store pointers to child nodes, and the storage engine follows these pointers downward. By comparing the values in a node page with the value being searched for, it finds the appropriate pointer into the next-level child node. These pointers effectively define the upper and lower bounds of the values in each child page. Finally, the pointers in the leaf nodes lead to the indexed data.
    In a B+ tree index, all the user data is stored in the leaf nodes; the search descends from the root level by level through the directory entries of the upper nodes until it finds the matching data.
  • Covering index
    An index that contains (covers) the values of all the fields a query needs, so there is no need to go back to the table for the data. A covering index must store the values of its index columns, so MySQL can only use B-tree indexes as covering indexes.

Types (by creation keyword)

  • Normal index: created with the key or index keyword
  • Unique index: created with unique; the indexed values must be unique (for a composite index, the combination must be unique), and NULL values are allowed
  • Primary key index: created with primary key, a special unique index; NULL is not allowed, and InnoDB automatically builds a clustered index on the primary key
  • Composite index: an index built on multiple fields; queries must follow the leftmost-prefix principle (walk through the details in your mind)
  • Full-text index: created with fulltext, used to find keywords in text and to address the inefficiency of fuzzy queries
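DDL sketches of these index types in MySQL (the articles table is hypothetical):

```sql
CREATE TABLE articles (
    id     INT AUTO_INCREMENT PRIMARY KEY,     -- primary key index (clustered in InnoDB)
    email  VARCHAR(100),
    author VARCHAR(50),
    title  VARCHAR(200),
    body   TEXT,
    UNIQUE KEY uk_email (email),               -- unique index
    KEY idx_author (author),                   -- normal index
    KEY idx_author_title (author, title),      -- composite index (leftmost prefix applies)
    FULLTEXT KEY ft_body (body)                -- full-text index
) ENGINE=InnoDB;
```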

Clustered vs non-clustered indexes

  • Clustered index: data storage and index are kept together; the leaf nodes of the index structure hold the row data
  • Non-clustered index: data and index are stored separately; the leaf nodes of the index structure point to the location of the data

Why use a B+ tree?

  • In a B+ tree, the data is stored in the leaf nodes and the branch nodes hold only index entries, so scanning the database is convenient: one pass over the leaf nodes suffices. In a B-tree, by contrast, the branch nodes also store data, so scanning all the data requires an in-order traversal. A B+ tree is therefore better suited to range queries, which is why databases usually use B+ trees for indexes.
  • Range queries are very frequent in databases, and a B-tree either does not support them or handles them inefficiently.

How does InnoDB implement B+ tree index lookups?

  • Each node at every level of the B+ tree is a page; pages are connected by a doubly linked list, and the records inside a page are connected by a singly linked list, sorted in ascending order of the index column value.

  • How the index is built

    • When a B+ tree index is created on a table, a root-node page is created.
      • The root node is never moved afterwards; when InnoDB uses the index, it fetches the page number of the index's root node from the data dictionary.
    • At first, each user record inserted into the table is stored in the root node.
    • When the root node runs out of free space (each InnoDB page is 16 KB), all of its user records are copied into a newly allocated page, which is split into two pages. The user records are distributed between the two new pages in ascending index order and these become leaf nodes, while the root node becomes a page that stores directory-entry records.
    • Each directory entry records the minimum index value and the page number of one child page, and the entries are arranged in ascending index order.
    • When the leaf nodes fill up, the same split operation is performed again, so that the leaf nodes keep storing user data while the pages above them become directory-entry pages. Repeating this process builds a multi-level directory.
  • Lookup by primary key index

    • Start from the root page and search downward by primary key value.
    • Since the directory entries in the root node are arranged in ascending order of primary key value, binary search finds the directory-entry page covering the key.
    • Descend into that child page and binary-search again, continuing down through the lower directory-entry pages.
    • On reaching the leaf level, the specific primary key value (and its row) is found within the page.
  • Lookup via a user-created secondary index

    • In the B+ tree built for a secondary index, the leaf nodes store the secondary index value plus the primary key value.
    • Using the index value, find all matching entries in the secondary index's B+ tree, then look each one up in the primary key index by its primary key value (going back to the table) to assemble the complete query result.
Conditions for using a B+ tree index
  • Full value match
  • Leftmost-column match (see the sketch after this list)
  • Column-prefix match (matching a prefix of a string column)
  • Range match
  • ORDER BY sorting, provided DESC and ASC are not mixed
  • GROUP BY
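A sketch of how the leftmost-prefix rules play out for a hypothetical composite index idx_abc(a, b, c):

```sql
CREATE TABLE t (
    id INT PRIMARY KEY,
    a INT, b INT, c INT,
    KEY idx_abc (a, b, c)
);

SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;  -- full value match: uses the index
SELECT * FROM t WHERE a = 1 AND b = 2;            -- leftmost columns: uses the index
SELECT * FROM t WHERE a = 1 AND b > 2;            -- range on b: uses the (a, b) part only
SELECT * FROM t WHERE b = 2 AND c = 3;            -- skips a: generally cannot use the index
SELECT * FROM t WHERE a = 1 ORDER BY b, c;        -- ORDER BY in index order: can use the index
```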
The cost of going back to the table
  • If a secondary index matches too many results, too many back-to-table lookups follow, so a query through that secondary index performs poorly.
    • Reason: accessing the secondary index is sequential I/O, because its data is stored in consecutive pages in sorted order and connected by linked lists. But after the secondary index results are retrieved, the primary key lookups are not in primary key order; they are random I/O and touch many more data pages, which hurts performance.

Index selection

  • Create indexes only for columns used in searching, grouping, and sorting
  • The cardinality of the column should be as large as possible (fewer duplicate values mean fewer matches and fewer back-to-table lookups)
  • Keep the index column's data type as small as possible (prefer mediumint over int over bigint when the range allows)
  • Index string prefixes (index only the first few characters of a string to save B+ tree space and comparison time; see the sketch below)
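A prefix-index sketch, reusing the hypothetical articles table from above; the selectivity query helps choose a prefix length:

```sql
-- Index only the first 8 characters of email.
ALTER TABLE articles ADD KEY idx_email_prefix (email(8));

-- The closer this ratio is to 1, the more selective the prefix:
SELECT COUNT(DISTINCT LEFT(email, 8)) / COUNT(*) AS selectivity FROM articles;
```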

Why should the primary key be an auto-increment integer?

  • If the table uses an auto-increment primary key, each new record is appended in order after the current position in the index; when a page fills up, a new page is opened automatically.
  • If a non-sequential primary key is used (e.g. an ID-card number or a student number), each inserted key is essentially random, so every new record lands in the middle of an existing index page. MySQL then has to move data to fit the new record into place, and the target page may already have been written back to disk and evicted from the cache, forcing a disk read: a lot of extra overhead. Frequent moves and page splits also create fragmentation, leaving the index structure less compact, and eventually the table has to be rebuilt with OPTIMIZE TABLE to repack the pages.
  • Using an integer saves index storage space, so as many index entries as possible fit in one page.

Differences between InnoDB and MyISAM engines

Both engines use B+ trees as their index structure!

| InnoDB                                                       | MyISAM                                                                                                              |
|--------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|
| Supports transactions and foreign keys                       | Does not support transactions or foreign keys                                                                         |
| Must have a primary key, which serves as the clustered index | No clustered index: all indexes are secondary, data and index are stored separately, and the index stores pointers into the data file |
| Does not store the exact row count; a full table scan is needed to get it | Stores the row count                                                                                     |
| Supports table-level and row-level locks                     | Supports only table-level locks                                                                                        |

Joins

Inner join: returns only the matching rows.
Left outer join: returns all the rows of the left table (whether or not the right table has rows matching them), plus all the matching rows of the right table.
Right outer join: returns all the rows of the right table (whether or not the left table has rows matching them), plus all the matching rows of the left table.
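Join sketches over the hypothetical student/department tables:

```sql
SELECT s.name, d.name
FROM student s
INNER JOIN department d ON s.dept_no = d.dept_no;  -- only students with a matching department

SELECT s.name, d.name
FROM student s
LEFT JOIN department d ON s.dept_no = d.dept_no;   -- all students; NULLs when no department

SELECT s.name, d.name
FROM student s
RIGHT JOIN department d ON s.dept_no = d.dept_no;  -- all departments; NULLs when no student
```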

Database optimization

Query optimization

  • Before MySQL 8.0, eliminate the interference of the query cache when measuring query speed (the query cache was removed in 8.0).
  • Use EXPLAIN to analyze the execution plan (see the sketch after this list)
    • For example, if the rows estimate in EXPLAIN differs too much from the row count obtained with count(*), the wrong index may be chosen; FORCE INDEX can be used to force the correct one.
    • Or, if the statement turns out not to use an index at all, adjust the query conditions.
  • Use covering indexes
    • A covering-index query finds all the required data in the index itself, avoiding the back-to-table lookup and reducing the number of queries.
  • Build appropriate composite indexes
    • For some high-frequency queries, build a composite index (for example: student number, name) so that the query does not have to go back to the primary key index, reducing the number of lookups.
  • Index condition pushdown (an official optimization): since MySQL 5.6, while traversing a composite index, MySQL first evaluates the conditions on the other fields contained in the index and filters out non-matching records directly, and only then looks up the primary key index, reducing back-to-table lookups.
  • Follow the leftmost-prefix principle and query in the field order of the index.
  • Use prefix indexes. For example, index the first few characters of an email address to reduce the space the index consumes and fit as many index values into a data page as possible, improving search efficiency.
    • For a very long field that needs a prefix index, the key is to keep the index selective. You can store a hash of the field, or reverse the string before indexing, but mind the overhead of calling functions, and avoid applying functions to columns in query conditions.
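The EXPLAIN / FORCE INDEX sketch referenced in the list above (idx_dept_no is a hypothetical index name on the student table):

```sql
-- Inspect the plan: the key column shows the chosen index, rows the estimate.
EXPLAIN SELECT * FROM student WHERE dept_no = 3;

-- If the rows estimate is far from count(*) and the wrong index is chosen:
SELECT * FROM student FORCE INDEX (idx_dept_no) WHERE dept_no = 3;
```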

Database structure optimization

  • Choose the right storage engine
  • Normalization: eliminate redundant columns in a table
  • Denormalization: add redundant columns to a table where appropriate to reduce join queries
  • Split tables: horizontal splitting and vertical splitting
    • Very large tables still need to be divided into separate tables according to their performance characteristics
    • Horizontal splitting divides one table into different tables or databases according to certain rules.
    • Put simply: a horizontal split divides rows, spreading row data across different tables; a vertical split divides columns, spreading column data across different tables.

Server hardware optimization

  • Upgrade the hardware.

The EXPLAIN keyword

Analysis of how to use EXPLAIN, MySQL's performance-optimization tool

${} and #{} in MyBatis

  • MyBatis does not process parameters passed in with ${}; they are substituted into the SQL as-is. Parameters passed in with #{}, however, are treated by MyBatis as string values by default.
  • # and $ differ in precompilation: #{} works like a PreparedStatement in JDBC; the incoming parameter is bound to a placeholder (?) during the preparation phase, which effectively prevents SQL injection.
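A sketch of the difference, assuming a hypothetical mapper statement over a users table; with #{} the value travels as a bound parameter, with ${} it is spliced into the SQL text:

```sql
-- Mapper: select * from users where name = #{name}
-- MyBatis prepares a parameterized statement; the value cannot alter the SQL:
select * from users where name = ?

-- Mapper: select * from users where name = '${name}'
-- With the hostile input  x' or '1'='1  the raw text is spliced in:
select * from users where name = 'x' or '1'='1'   -- injection succeeds
```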

SQL injection

A SQL injection attack inserts a malicious SQL query or additional statements into the input parameters of an application, which are then parsed and executed on the backend SQL server.

  • Solutions:
    • SQL precompilation with variable binding
      • A precompiled (prepared) statement replaces the values in a SQL statement with placeholders; that is, the SQL statement is templated so that its structure is fixed. After precompilation, the next time the same statement is encountered, only the variable values need to be passed into the compiled execution code.
      • With precompilation, if an injection payload is supplied, it is passed purely as a variable: any or / and inside it is treated as part of the variable's text and loses its original function.
      • For example:
        • Precompiled statement: select * from users where username = ? and password = ?
        • Injection attempt: select * from users where username='' or true or'' and password=''. Under precompilation, the or true loses its always-true effect.
    • Strict parameter validation: check that the variables' data types and formats meet the requirements before executing the SQL statement.
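The same idea with MySQL's server-side prepared statements (hypothetical users table); the hostile input is bound as plain text and cannot change the statement's structure:

```sql
PREPARE login_stmt FROM 'SELECT * FROM users WHERE username = ? AND password = ?';
SET @u = 'alice', @p = ''' or true or ''';  -- hostile input, bound as a plain string
EXECUTE login_stmt USING @u, @p;            -- matches nothing; the quotes are just data
DEALLOCATE PREPARE login_stmt;
```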

Database locks

By type

  • Locks are divided into shared locks (S locks) and exclusive locks (X locks), also known as read locks and write locks. Read locks are shared and do not block each other: multiple clients can read the same resource at the same time. A write lock is exclusive and blocks other write locks and read locks, ensuring that only one client can write at a given time.
  • A write-lock request may be inserted ahead of queued read locks, but a read-lock request cannot be inserted ahead of queued write locks.

By scope

  • A table lock locks the whole table; it has low overhead and no deadlocks, but a high probability of lock conflicts and low concurrency.
  • A row lock supports concurrency to the greatest extent and has a low probability of lock conflicts, but its overhead is higher and deadlocks can occur. Row locks are implemented only at the storage-engine layer; InnoDB implements them.

Deadlocks
1. Transactions access resources in alternating order

  • Cause:
    User A accesses table A (locking it) and then accesses table B; meanwhile user B accesses table B (locking it) and then tries to access table A. Now user A must wait for user B to release table B before continuing, and user B must likewise wait for user A to release table A: neither can proceed.
  • Solution:
    This kind of deadlock is common and comes from a program bug; apart from adjusting the program's logic, there is no other fix. Analyze the logic carefully: when operating on multiple tables, always process them in the same order and avoid locking two resources at the same time. For example, when operating on tables A and B, always process A before B; when two resources must be locked in one transaction, lock them in the same order every time.

2. Concurrent modification of the same record

  • Cause: mainly failing to request a lock of sufficient strength in one step.
    Reference: a write-up of troubleshooting one such deadlock
    User A queries a record and then modifies it; meanwhile user B also modifies the record. The lock in user A's transaction must upgrade from the shared lock taken by the query to an exclusive lock, but B's exclusive-lock request is already waiting for A to release its shared lock, while A's upgrade cannot proceed because of B's pending exclusive lock, so A can never release the shared lock: a deadlock. This kind of deadlock is well hidden but occurs often in larger projects.
  • Solutions:
    a. Optimistic locking: allows concurrent writes by detecting conflicts instead of holding locks (see the sketch below)
    b. Pessimistic locking: rely on the database's lock mechanism, such as Oracle's SELECT ... FOR UPDATE statement, to guarantee maximum exclusivity for the operation. The price is a lot of database overhead, especially for long transactions, which is often unbearable.
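A sketch of option (a), optimistic locking with a version column (hypothetical schema and values):

```sql
-- Read the row together with its version:
SELECT balance, version FROM accounts WHERE id = 'A';  -- suppose it returns (1000, 7)

-- Write back only if no one changed the row in between:
UPDATE accounts
SET balance = 900, version = version + 1
WHERE id = 'A' AND version = 7;
-- If the affected row count is 0, another writer won; re-read and retry.
```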

3. Deadlocks caused by inadequate indexes

  • Cause:
    If a statement in a transaction has no suitable index, it performs a full table scan, and the row-level locks escalate to a table-level lock. After several such transactions run, deadlock and blocking arise easily. Similarly, when a table holds a large amount of data and has too few or unsuitable indexes, full table scans become common, the application system grows slower and slower, and blocking or deadlock eventually results.
  • Solution:
    Avoid overly complex multi-table queries in SQL statements; analyze SQL statements with the execution plan, and build the indexes needed to optimize any statement that performs a full table scan.

So, how can deadlocks be avoided as much as possible?
1) Access tables and rows in a fixed order, i.e. acquire locks in a consistent order, so that transactions do not end up waiting for each other in a cycle.
2) Break large transactions into small ones. Large transactions are more deadlock-prone; if the business allows, split them up.
3) Within one transaction, try to lock all the needed resources at once, reducing the probability of deadlock.
4) Lower the isolation level. If the business permits, lowering the isolation level, for example from RR (repeatable read) to RC (read committed), avoids many deadlocks caused by gap locks.
5) Add reasonable indexes to the table. Without a suitable index, a lock is placed on every row of the table scanned, and the probability of deadlock rises sharply.

The difference between drop, truncate and delete

In brief:

  • Drop is DDL (data definition language); it drops the whole table directly.
  • Truncate is also DDL; it deletes the data in the table but not the table structure. After data is inserted again, the auto-increment ID restarts from the beginning.
  • Delete is DML (data manipulation language); it deletes rows from the table one at a time and can filter with a where clause.

Details:

|                 | DROP                                                      | TRUNCATE                                   | DELETE                                                                              |
|-----------------|-----------------------------------------------------------|--------------------------------------------|-------------------------------------------------------------------------------------|
| SQL category    | DDL                                                       | DDL                                        | DML                                                                                 |
| Scope           | Deletes the whole table, structure included               | Deletes all data, keeping the structure    | Deletes the rows in the table matching the condition                                |
| Speed           | Fast                                                      | Medium                                     | Slow                                                                                |
| Commit behavior | Implicit commit, cannot roll back, does not fire triggers | Same as DROP                               | Each operation is recorded in the redo and undo table spaces and can be rolled back |
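The three statements side by side (hypothetical table t):

```sql
DELETE FROM t WHERE id > 100;  -- DML: row by row, filterable, can be rolled back
TRUNCATE TABLE t;              -- DDL: empties the table, auto-increment restarts
DROP TABLE t;                  -- DDL: removes the data and the table structure itself
```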

Relational vs non-relational databases

One side’s advantage is the other side’s disadvantage.
Advantages of non-relational databases:

  • Performance: NoSQL is based on key-value pairs and needs no parsing by a SQL layer, so read and write performance is very high.
  • Scalability: likewise because key-value data has no coupling between records, horizontal scaling is very easy.
  • A variety of data storage formats are available.

Advantages of relational databases:

  • Simple structure: the two-dimensional table structure has clear logic and is easy to understand.
  • Complex queries: using general-purpose SQL statements, even complex queries are convenient.
  • Transaction support: makes data-access scenarios with high safety requirements achievable.

varchar() and char()

  • The length of char is fixed, while the length of varchar is variable; varchar takes an additional 1 byte to store length information.
    Therefore, when reading data, trim() is needed to remove the padding spaces from char values, while varchar needs no such step.

  • char is faster to store and retrieve than varchar.
    Because its length is fixed, char inevitably pads unused positions, occupying extra space; it trades space for time.

  • char stores an English (ASCII) character in one byte and a Chinese character in two bytes.
    varchar stores each English character in two bytes and each Chinese character in two bytes as well.

  • char usage scenarios: the data has a fixed length; the data is short; the column changes frequently (because varchar has to recalculate the length on every store).

References

SQL injection details
MySQL stored procedure – detailed description of principle, syntax and function
Database interview questions (developers must see)
What are the common database optimization methods?

This work is licensed under a CC agreement; reprints must credit the author and link to this article.