Still generating database documents manually? Three steps are completed automatically. Let’s have a look


Automatic generation of database documents can be completed in three simple steps. Learn about it.

1 Preface

In daily work, everyone should encounter the need to export database description documents (also known as data dictionaries), that is, sort the field information of each data table into table descriptions one by one, and then save them in document formats such as Excel / word / HTML / MD. Many small partners are still using the original manual method to copy and paste the field descriptions (name, type, length, comments, etc.) of the database. I have to say that this method is too inefficient. As a programmer, the problem that can be solved by programming is not a problem. The method described below is very simple and requires only 3 steps. This paper will use Python to encode these three steps, and output the data sheet information description to excel document. Therefore, it mainly includes the following contents:

  • 3-step instructions for generating database documents
  • Get database table meta information
  • Get meta information of data table columns
  • Generate database description excel document
  • (optional) set excel document format

2 3 steps of generating database description document

Since the database will store corresponding metadata information (that is, information describing the database, data table and data field itself, such as table name, field table, type, etc.), generally speaking, the idea of generating database description document is very simple, which is divided into three steps:

  • 1) According to the database name, obtain the data table meta information from the database, mainly including table name, table comments, etc
  • 2) According to the data table name, obtain the meta information of the data field, mainly including field name, field type, whether it can be empty, field comment, etc
  • 3) Generate documents based on metadata information

According to this idea, these three steps can be encoded to automatically generate documents. To obtain metadata information, various databases will have different query statements. Specifically, you can query relevant official documents. The following is a brief list of MySQL and Oracle:

#MySQL query table information and field information
SELECT * FROM information_schema.`TABLES` WHERE TABLE_SCHEMA = %db_name%
SELECT * FROM information_schema.`COLUMNS` WHERE TABLE_SCHEMA = %db_name% AND TABLE_NAME = %table_name%

#Oracle query table information and field information
SELECT * FROM all_tables WHERE where owner= %db_name%
SELECT * FROM all_COL_COMMENTS WHERE owner = %db_name%  and TABLE_NAME=%table_name%

The implementation method can also be implemented according to everyone’s favorite programming language. In this paper, taking MySQL as an example, we use python programming to output data information to EXCEL documents (for specific excel operations, please refer to my last article)《Python processing Excel files》)。 The output effect is as follows:

Still generating database documents manually? Three steps are completed automatically. Let's have a look

Now follow me to realize it.

3 get database table meta information

3.1 basic usage of client pymysql

Use Python to read and write MySQL usingpymysql, readers can access itOfficial documentsUnderstand its installation and use. To put it simply, the following steps are required to read data from the database:

  • Connect to database:connect
  • Get cursor for reading data:connection.cursor()
  • Execute SQL statement to obtain data:cursor.execute(sql,args)cursor.fetchall()cursor.fetchone(),cursor.fetchmany()
  • Close cursor and connection:connection.close()
  • Get worksheet in use:workbook.activecursor.close()

Therefore, we are in class initialization(__init__)And off(__del__Database connection and closing operations are performed. The code is as follows:

def __init__(self, host, port, user, password, db_name, charset):
    #Initialize database operation

    self.db = pymysql.connect(host=host, port=port, user=user,password=password, database=db_name, charset=charset)
    self.cursor = self.db.cursor()

def __del__(self):
    #Close database connection


3.2 obtaining database table meta information

According to the above basic operations of pymysql, you only need to execute SQL to query the meta information of the data table. As mentioned earlier, the SQL statement for querying table meta information in MySQL isSELECT * FROM information_schema.TABLES WHERE TABLE_SCHEMA = %db_name%, and we only need the table name and table comments. Therefore, the implementation is as follows:

def get_table_info(self, db_name):
    #Get data table information

    sql = '''SELECT table_schema, table_name, table_comment FROM information_schema.`TABLES` WHERE TABLE_SCHEMA = %s order by table_name'''
    params = [db_name]
    #Query data
    self.cursor.execute(sql, params)
    return self.cursor.fetchall()

This function functions: pass in the database name and return all table information.

4 get meta information of data table fields

After obtaining the table information, similarly, you need to traverse each table and obtain the field information of each table according to the table name. As mentioned earlier, the fields of the MySQL get table are query tablesinformation_schema.COLUMNSFor field information, we mainly focus on the field name, field type, whether it is allowed to be empty, comment description of the field, etc. The code is as follows:

def get_table_column_info(self, database_name, table_name):
    #Get data table column information

    params = [database_name, table_name]
    sql = '''SELECT
                TABLE_ Schema as' Library name ', table_ Name as' table name ',
                COLUMN_ Name as' column name ', official_ Position as' arrangement order of columns',
                COLUMN_ Default as' default ', is_ Nullable as' is empty ',
                DATA_ Type as' data type ', character_ MAXIMUM_ Length as' maximum length of characters',
                NUMERIC_ Precision as' numerical precision (maximum digits) ", numeric_ Scale as' decimal precision ',
                COLUMN_ Type as' column type ', column_ Comment as' comment '
            FROM information_schema.`COLUMNS`
    #Query data
    self.cursor.execute(sql, params)
    return self.cursor.fetchall()

This function can obtain the field information of this table according to the database name and table name.

5 generate excel document

5.1 output table field information to excel document

For Excel operation, we use openpyxl to read and write. For specific excel operation, please refer to my last article《Python processing Excel files》。 The function we need to realize now is to write the field information of each table into excel table in the form of table, and output it according to field name, allowed to be empty, field type and field description.

Another point is that we often have some public fields in the process of designing tables, such as ID, creation time, creator, modification time, modifier, etc. we can choose to filter them out when exporting dictionaries. Therefore, the following code is used for implementation:

def create_file(self, file_path):
    #Get the file. If the file does not exist, it will be created. If it does exist, it will be deleted and recreated

    if os.path.exists(file_path):
    wb = Workbook()
def save_column_info_to_excel(self, table_name, table_comment, column_info, file_path, col_names_skip):
    #Write table information to excel file

    workbook = openpyxl.load_workbook(file_path)
    #Get by subscript (subscript starts from 0)
    sheet = workbook.worksheets[0]
    row_data = [table_name]
    if table_comment:
        row_data = [table_name + "(" + table_comment + ")"]
        rurrent_max_row = sheet.max_row
        #Blank line separation
        col_ name_ Data = ["field name", "null allowed", "type", "field description"]
        for row in column_info:
            #To be filtered
            if col_names_skip and row[2].lower() in col_names_skip:
                Print ("#" * 10, "skip this field:", row [2])
            print(row[2] + "," + row[5] + "," + row[10] + "," + row[11])
            row_data = [row[2], row[5], row[10], row[11]]
        #Save document

There are two functions here,create_fileThe main function is to create documents. If documents exist, they will be deleted first.save_column_info_to_excelThe function is to use the for statement to output to excel by line according to the table field information and the field name to be filtered. During the output process, if there are fields to be filtered, skip it, and finally save the document to the specified path.

5.2 connect various functions

Previously, three functions have been realized: database table meta information acquisition, data table field meta information acquisition and field information output to excel document. Now connect these three functions to form a complete database document export function. The idea is to traverse the generated table meta information (get)_ table_ Info) to obtain the field information (Gen) of the table according to the table element information_ table_ column_ Info), and then output the excel document (save)_ column_ info_ to_ Excel), as follows:

def gen_db_table_info_skip_col(self, db_name, file_path, col_names_skip):
    #Filter the specified column and export the data table information to the document

    table_info_rows = self.get_table_info(db_name)
    for table_row in table_info_rows:
        Print ("\ n", "*" * 10, "generate table information:", table_ row[1])
        self.gen_table_column_info(table_row, file_path, col_names_skip)

def gen_table_column_info(self, table_info_row, file_path, col_names_skip=None):
    #Export field information table to document

    database_name = table_info_row[0]
    table_name = table_info_row[1]
    table_comment = table_info_row[2]
    #Get table information from database
    column_info = self.get_table_column_info(database_name, table_name)
    #Write excel file
    self.save_column_info_to_excel(table_name, table_comment, column_info, file_path, col_names_skip)

There are two functions here,gen_db_table_info_skip_colThe function is to export the table element information according to the database name, file saving path and field name to be filtered, and then use the for statement to traverse.gen_table_column_infoIt is based on the table information and the fields to be filtered. First read the table field information, and then write it to the excel document. Note herecol_names_skipThe default value is none, that is, if you do not need to filter, do not enter this parameter. So far, our function of automatically generating database documents has been completed. stay__main__Take a look at the output during execution:

if __name__ == '__main__':
    #Output document address
    excel_path = "E:/pythontest/test_tableinfo.xlsx"
    #Database connection information
    host = "localhost"
    port = 3306
    user = "root"
    password = "123456"
    db_name = "test"
    charset = 'utf8'
    #Fields to be filtered
    col_names_to_skip = ["id", "sys_create_time", "sys_create_user", "sys_update_time", "sys_update_user", "record_version"]
    #Initialize classes, create files, and generate database documentation
    dbInfoGenerator = DbInfoGenerator(host, port, user, password, db_name, charset)
    dbInfoGenerator.gen_db_table_info_skip_col(db_name, excel_path, col_names_to_skip)

The results are as follows:

Still generating database documents manually? Three steps are completed automatically. Let's have a look

The field description of the table has been output to the excel document, and the corresponding fields have been filtered. However, the format is not very good-looking. Therefore, if necessary, you can use openpyxl to set the format of EXCEL documents.

6 (optional) set excel document format

If you need to format EXCEL documents, the following is one of my basic format settings. If necessary, you can refer to it and make your own document format. The main ideas of format setting are as follows:

  • Sets the width of each column
  • Traverse each row of Excel table. If it is a table name, it will be merged into one row as the header. Set the header format to bold, black border, center alignment and fill background color.
  • If the table field contains information, you can set a black border.
def set_file_format(self, file_path):
    #Format table

    if not os.path.exists(file_path):
        Print ("the file does not exist and will not be processed")
    workbook = openpyxl.load_workbook(file_path)
    sheet = workbook.worksheets[0]
    #Set the width of each column
    sheet.column_dimensions["A"].width = 16
    sheet.column_dimensions["B"].width = 10
    sheet.column_dimensions["C"].width = 20
    sheet.column_dimensions["D"].width = 40

    #Format table name
    max_row = sheet.max_row
    for i in range(1, max_row + 1):
        col1_value = sheet.cell(i, 1).value
        col2_value = sheet.cell(i, 2).value
        #If the first column has data and the second column has no data, it is the table name
        if col1_value and not col2_value:
            #Consolidated table name
            sheet.merge_cells(start_row=i, start_column=1, end_row=i, end_column=4)
            #Bold font
            Font = font (name = "Microsoft YaHei", size = 12, bold = true, italic = false, color = "000000")
            #Black border
            side_style = Side(style="thin", color="000000")
            border = Border(left=side_style, right=side_style, top=side_style, bottom=side_style)
            #Center alignment
            cell_alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)
            #Fill background color
            p_fill = PatternFill(fill_type="solid", fgColor="BFBFBF")
            #Table name cell format
            for j in range(1, 5):
                sheet.cell(i, j).font = font
                sheet.cell(i, j).border = border
                sheet.cell(i, j).alignment = cell_alignment
                sheet.cell(i, j).fill = p_fill
         #If both the first and second columns have data, it is the table content
         if col1_value and col2_value:
            #Black border
            side_style = Side(style="thin", color="000000")
            border = Border(left=side_style, right=side_style, top=side_style, bottom=side_style)
            #Table name cell format
            for j in range(1, 5):
                sheet.cell(i, j).border = border
    #Save document

After generating the database specification document, this function can be called to modify its document format. The results are as follows:

Still generating database documents manually? Three steps are completed automatically. Let's have a look

7 Summary

This paper mainly explains the automatic generation of database description document (data dictionary). By using SQL to read database tables and field meta information, and then output it to excel document, the function of automatic document generation is completed in Python. If you are still manually generating database description documents, you can try this method, which will greatly increase your efficiency. I hope I can help people in need. If you want to see the complete code, please come to megithubView in address:

According to the ideas of this paper, we can leave several thinking questions for you to think about how to do:

  • Instead of using python, use other languages you are familiar with to implement this function.
  • How to generate only the field information of the specified table or filter the specified table?
  • Database table names usually have prefixes or suffixes. Can they be generated or filtered according to prefixes or suffixes? What should I do?
  • This article is to generate EXCEL documents. What should I do if I need to generate documents in word, HTML, MD, PDF and other formats?

reference material

Previous articles

My official account (search)Mason technical record), for more technical records:

Still generating database documents manually? Three steps are completed automatically. Let's have a look

Recommended Today

Implementation example of go operation etcd

etcdIt is an open-source, distributed key value pair data storage system, which provides shared configuration, service registration and discovery. This paper mainly introduces the installation and use of etcd. Etcdetcd introduction etcdIt is an open source and highly available distributed key value storage system developed with go language, which can be used to configure sharing […]