Automatic generation of database documents can be completed in three simple steps. Learn about it.
In daily work, everyone should encounter the need to export database description documents (also known as data dictionaries), that is, sort the field information of each data table into table descriptions one by one, and then save them in document formats such as Excel / word / HTML / MD. Many small partners are still using the original manual method to copy and paste the field descriptions (name, type, length, comments, etc.) of the database. I have to say that this method is too inefficient. As a programmer, the problem that can be solved by programming is not a problem. The method described below is very simple and requires only 3 steps. This paper will use Python to encode these three steps, and output the data sheet information description to excel document. Therefore, it mainly includes the following contents:
- 3-step instructions for generating database documents
- Get database table meta information
- Get meta information of data table columns
- Generate database description excel document
- (optional) set excel document format
2 3 steps of generating database description document
Since the database will store corresponding metadata information (that is, information describing the database, data table and data field itself, such as table name, field table, type, etc.), generally speaking, the idea of generating database description document is very simple, which is divided into three steps:
- 1) According to the database name, obtain the data table meta information from the database, mainly including table name, table comments, etc
- 2) According to the data table name, obtain the meta information of the data field, mainly including field name, field type, whether it can be empty, field comment, etc
- 3) Generate documents based on metadata information
According to this idea, these three steps can be encoded to automatically generate documents. To obtain metadata information, various databases will have different query statements. Specifically, you can query relevant official documents. The following is a brief list of MySQL and Oracle:
#MySQL query table information and field information SELECT * FROM information_schema.`TABLES` WHERE TABLE_SCHEMA = %db_name% SELECT * FROM information_schema.`COLUMNS` WHERE TABLE_SCHEMA = %db_name% AND TABLE_NAME = %table_name% #Oracle query table information and field information SELECT * FROM all_tables WHERE where owner= %db_name% SELECT * FROM all_COL_COMMENTS WHERE owner = %db_name% and TABLE_NAME=%table_name%
The implementation method can also be implemented according to everyone’s favorite programming language. In this paper, taking MySQL as an example, we use python programming to output data information to EXCEL documents (for specific excel operations, please refer to my last article)《Python processing Excel files》）。 The output effect is as follows:
Now follow me to realize it.
3 get database table meta information
3.1 basic usage of client pymysql
Use Python to read and write MySQL usingpymysql, readers can access itOfficial documentsUnderstand its installation and use. To put it simply, the following steps are required to read data from the database:
- Connect to database:
- Get cursor for reading data:
- Execute SQL statement to obtain data:
- Close cursor and connection:
- Get worksheet in use:
Therefore, we are in class initialization（
__del__）Database connection and closing operations are performed. The code is as follows:
def __init__(self, host, port, user, password, db_name, charset): #Initialize database operation self.db = pymysql.connect(host=host, port=port, user=user,password=password, database=db_name, charset=charset) self.cursor = self.db.cursor() def __del__(self): #Close database connection self.db.close() self.cursor.close()
3.2 obtaining database table meta information
According to the above basic operations of pymysql, you only need to execute SQL to query the meta information of the data table. As mentioned earlier, the SQL statement for querying table meta information in MySQL is
SELECT * FROM information_schema.TABLES WHERE TABLE_SCHEMA = %db_name%, and we only need the table name and table comments. Therefore, the implementation is as follows:
def get_table_info(self, db_name): #Get data table information sql = '''SELECT table_schema, table_name, table_comment FROM information_schema.`TABLES` WHERE TABLE_SCHEMA = %s order by table_name''' params = [db_name] #Query data self.cursor.execute(sql, params) return self.cursor.fetchall()
This function functions: pass in the database name and return all table information.
4 get meta information of data table fields
After obtaining the table information, similarly, you need to traverse each table and obtain the field information of each table according to the table name. As mentioned earlier, the fields of the MySQL get table are query tables
information_schema.COLUMNSFor field information, we mainly focus on the field name, field type, whether it is allowed to be empty, comment description of the field, etc. The code is as follows:
def get_table_column_info(self, database_name, table_name): #Get data table column information params = [database_name, table_name] sql = '''SELECT TABLE_ Schema as' Library name ', table_ Name as' table name ', COLUMN_ Name as' column name ', official_ Position as' arrangement order of columns', COLUMN_ Default as' default ', is_ Nullable as' is empty ', DATA_ Type as' data type ', character_ MAXIMUM_ Length as' maximum length of characters', NUMERIC_ Precision as' numerical precision (maximum digits) ", numeric_ Scale as' decimal precision ', COLUMN_ Type as' column type ', column_ Comment as' comment ' FROM information_schema.`COLUMNS` WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s ORDER BY TABLE_NAME, ORDINAL_POSITION''' #Query data self.cursor.execute(sql, params) return self.cursor.fetchall()
This function can obtain the field information of this table according to the database name and table name.
5 generate excel document
5.1 output table field information to excel document
For Excel operation, we use openpyxl to read and write. For specific excel operation, please refer to my last article《Python processing Excel files》。 The function we need to realize now is to write the field information of each table into excel table in the form of table, and output it according to field name, allowed to be empty, field type and field description.
Another point is that we often have some public fields in the process of designing tables, such as ID, creation time, creator, modification time, modifier, etc. we can choose to filter them out when exporting dictionaries. Therefore, the following code is used for implementation:
def create_file(self, file_path): #Get the file. If the file does not exist, it will be created. If it does exist, it will be deleted and recreated if os.path.exists(file_path): os.remove(file_path) wb = Workbook() wb.save(file_path) def save_column_info_to_excel(self, table_name, table_comment, column_info, file_path, col_names_skip): #Write table information to excel file workbook = openpyxl.load_workbook(file_path) #Get by subscript (subscript starts from 0) sheet = workbook.worksheets row_data = [table_name] if table_comment: row_data = [table_name + "(" + table_comment + ")"] sheet.append(row_data) rurrent_max_row = sheet.max_row #Blank line separation sheet.insert_rows(rurrent_max_row) #Listing col_ name_ Data = ["field name", "null allowed", "type", "field description"] sheet.append(col_name_data) for row in column_info: #To be filtered if col_names_skip and row.lower() in col_names_skip: Print ("#" * 10, "skip this field:", row ) continue print(row + "," + row + "," + row + "," + row) row_data = [row, row, row, row] sheet.append(row_data) #Save document workbook.save(file_path)
There are two functions here,
create_fileThe main function is to create documents. If documents exist, they will be deleted first.
save_column_info_to_excelThe function is to use the for statement to output to excel by line according to the table field information and the field name to be filtered. During the output process, if there are fields to be filtered, skip it, and finally save the document to the specified path.
5.2 connect various functions
Previously, three functions have been realized: database table meta information acquisition, data table field meta information acquisition and field information output to excel document. Now connect these three functions to form a complete database document export function. The idea is to traverse the generated table meta information (get)_ table_ Info) to obtain the field information (Gen) of the table according to the table element information_ table_ column_ Info), and then output the excel document (save)_ column_ info_ to_ Excel), as follows:
def gen_db_table_info_skip_col(self, db_name, file_path, col_names_skip): #Filter the specified column and export the data table information to the document table_info_rows = self.get_table_info(db_name) for table_row in table_info_rows: Print ("\ n", "*" * 10, "generate table information:", table_ row) self.gen_table_column_info(table_row, file_path, col_names_skip) def gen_table_column_info(self, table_info_row, file_path, col_names_skip=None): #Export field information table to document database_name = table_info_row table_name = table_info_row table_comment = table_info_row #Get table information from database column_info = self.get_table_column_info(database_name, table_name) #Write excel file self.save_column_info_to_excel(table_name, table_comment, column_info, file_path, col_names_skip)
There are two functions here,
gen_db_table_info_skip_colThe function is to export the table element information according to the database name, file saving path and field name to be filtered, and then use the for statement to traverse.
gen_table_column_infoIt is based on the table information and the fields to be filtered. First read the table field information, and then write it to the excel document. Note here
col_names_skipThe default value is none, that is, if you do not need to filter, do not enter this parameter. So far, our function of automatically generating database documents has been completed. stay
__main__Take a look at the output during execution:
if __name__ == '__main__': #Output document address excel_path = "E:/pythontest/test_tableinfo.xlsx" #Database connection information host = "localhost" port = 3306 user = "root" password = "123456" db_name = "test" charset = 'utf8' #Fields to be filtered col_names_to_skip = ["id", "sys_create_time", "sys_create_user", "sys_update_time", "sys_update_user", "record_version"] #Initialize classes, create files, and generate database documentation dbInfoGenerator = DbInfoGenerator(host, port, user, password, db_name, charset) dbInfoGenerator.create_file(excel_path) dbInfoGenerator.gen_db_table_info_skip_col(db_name, excel_path, col_names_to_skip)
The results are as follows:
The field description of the table has been output to the excel document, and the corresponding fields have been filtered. However, the format is not very good-looking. Therefore, if necessary, you can use openpyxl to set the format of EXCEL documents.
6 (optional) set excel document format
If you need to format EXCEL documents, the following is one of my basic format settings. If necessary, you can refer to it and make your own document format. The main ideas of format setting are as follows:
- Sets the width of each column
- Traverse each row of Excel table. If it is a table name, it will be merged into one row as the header. Set the header format to bold, black border, center alignment and fill background color.
- If the table field contains information, you can set a black border.
def set_file_format(self, file_path): #Format table if not os.path.exists(file_path): Print ("the file does not exist and will not be processed") return workbook = openpyxl.load_workbook(file_path) sheet = workbook.worksheets #Set the width of each column sheet.column_dimensions["A"].width = 16 sheet.column_dimensions["B"].width = 10 sheet.column_dimensions["C"].width = 20 sheet.column_dimensions["D"].width = 40 #Format table name max_row = sheet.max_row for i in range(1, max_row + 1): col1_value = sheet.cell(i, 1).value col2_value = sheet.cell(i, 2).value #If the first column has data and the second column has no data, it is the table name if col1_value and not col2_value: #Consolidated table name sheet.merge_cells(start_row=i, start_column=1, end_row=i, end_column=4) #Bold font Font = font (name = "Microsoft YaHei", size = 12, bold = true, italic = false, color = "000000") #Black border side_style = Side(style="thin", color="000000") border = Border(left=side_style, right=side_style, top=side_style, bottom=side_style) #Center alignment cell_alignment = Alignment(horizontal="center", vertical="center", wrap_text=True) #Fill background color p_fill = PatternFill(fill_type="solid", fgColor="BFBFBF") #Table name cell format for j in range(1, 5): sheet.cell(i, j).font = font sheet.cell(i, j).border = border sheet.cell(i, j).alignment = cell_alignment sheet.cell(i, j).fill = p_fill #If both the first and second columns have data, it is the table content if col1_value and col2_value: #Black border side_style = Side(style="thin", color="000000") border = Border(left=side_style, right=side_style, top=side_style, bottom=side_style) #Table name cell format for j in range(1, 5): sheet.cell(i, j).border = border #Save document workbook.save(file_path)
After generating the database specification document, this function can be called to modify its document format. The results are as follows:
This paper mainly explains the automatic generation of database description document (data dictionary). By using SQL to read database tables and field meta information, and then output it to excel document, the function of automatic document generation is completed in Python. If you are still manually generating database description documents, you can try this method, which will greatly increase your efficiency. I hope I can help people in need. If you want to see the complete code, please come to megithubView in address:
According to the ideas of this paper, we can leave several thinking questions for you to think about how to do:
- Instead of using python, use other languages you are familiar with to implement this function.
- How to generate only the field information of the specified table or filter the specified table?
- Database table names usually have prefixes or suffixes. Can they be generated or filtered according to prefixes or suffixes? What should I do?
- This article is to generate EXCEL documents. What should I do if I need to generate documents in word, HTML, MD, PDF and other formats?
- Openpyxl official documentation:
- Pymysql official document:
- Python processing Excel files
- Basic operations of Python – files, directories and paths
- Distributed deployment of Minio
- Easily build static resource services with Minio
- Handle springboot multiple data sources (3): parameterized change source
- Handle springboot multiple data sources (2): dynamic data sources
- Handle springboot multiple data sources (1): multiple source strategies
- Necessary knowledge for java development: dynamic agent
- 2019 recommended good books
My official account (search)
Mason technical record）, for more technical records: