Simple exploration of data internal storage structure of SQL Server

Time:2020-7-13

Databases often need to deal with each other, but never thought about how to store data inside the database.

Today, we will explore how to store data in the database from the following aspects

  1. How to store data in database
  2. How to store index data
  3. Operational data impact on storage
  4. summary

How to store data in database

1. To verify, prepare the data first. Here, create a table and add 3 pieces of data

create table DataTable(Id int identity(1,1), [Name] varchar(50), [Address] varchar(200), CreateTime datetime2)
    
					insert into DataTable
					select 'Wilson','Tianhe District, Guangzhou',GETDATE() union all
					select 'Alice','Chaoyang District of Beijing City',GETDATE() union all
			select 'Key','Panyu District, Guangzhou',GETDATE()

View Code

2. Use DBCC to view page data, database name demo

DBCC TRACEON(2588,3604)            
				DBCC IND(Demo,DataTable,-1)        
		DBCC PAGE(Demo,1,224,1)            

There are too many page contents, and some data are intercepted.

Slot 0, Offset 0x60, Length 43, DumpStyle BYTE
    
				Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
				Record Size = 43                    
				Memory Dump @0x000000557C77A060
    
				0000000000000000:   30001000 01000000 c0bb04c8 7fea400b 04000002  [email protected]
				0000000000000014:   001f002b 0057696c 736f6eb9 e3d6ddca d0ccecba  ...+.Wilson.........
				0000000000000028:   d3c7f8                                        ...    
    
				Slot 1, Offset 0x8b, Length 42, DumpStyle BYTE
    
				Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
				Record Size = 42                    
				Memory Dump @0x000000557C77A08B
    
				0000000000000000:   30001000 02000000 c0bb04c8 7fea400b 04000002  [email protected]
				0000000000000014:   001e002a 00416c69 6365b1b1 bea9cad0 b3afd1f4  ...*.Alice..........
				0000000000000028:   c7f8                                          ..     
    
				Slot 2, Offset 0xb5, Length 40, DumpStyle BYTE
    
				Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
				Record Size = 40                    
				Memory Dump @0x000000557C77A0B5
    
				0000000000000000:   30001000 03000000 c0bb04c8 7fea400b 04000002  [email protected]
				0000000000000014:   001c0028 004b6579 b9e3d6dd cad0b7ac d8aec7f8  ...(.Key............
    
				OFFSET TABLE:
    
				Row - Offset                        
				2 (0x2) - 181 (0xb5)                
				1 (0x1) - 139 (0x8b)                
		0 (0x0) - 96 (0x60)                 

The above is hexadecimal, take the first data to analyze, and then verify with the other two data

Slot 0, Offset 0x60, Length 43, DumpStyle BYTE
				0000000000000000: 30001000 01000000 c0bb04c8 7fea400b 04000002 [email protected]
				0000000000000014: 001f002b 0057696c 736f6eb9 e3d6ddca d0ccecba ...+.Wilson.........
		0000000000000028: d3c7f8

30001000: the flag bit of the field. No official description can be found. At present, it is understood that there is inconsistency between variable field and non variable field. The deleted line will change from 30 to 3C
01000000: This obviously represents ID, which can be inferred from the other two data
C0bb04c8 7fea400b: because the name is at the end of the article, there are just 8 characters. This should be the time datetime2. I wrote a code to convert the hexadecimal to datetime2, and now I will verify it with the code. The code is put out at the end of the article. Because the precision of the time conversion is lost, the two times are not exactly the same.

04000002:04 represents a total of 4 fields, and 02 represents a total of consecutive variable length fields
001f002b: here is the offset of two variable characters
0057696c 736f6eb9 e3d6ddca d0ccecba d3c7f8: this string is saved as a name and address. The first two 00 should be a separator of the length above. The following is also verified with code

At this point, it is also clear how SQL stores data. It does not store data in the order of fields. First, it stores fixed length fields, then inserts delimited symbols, and then stores variable length fields. This may reduce the cost of moving data, which will be discussed later.

How to store index data

So far, the index has not been involved, because the above data is not created an index, which is a heap table.

Index is divided into nonclustered index and clustered index

1. Create a nonclustered index

create index IX_DataTable_Name on DataTable(Name ASC)

1.1 view data page allocation

DBCC IND(Demo,DataTable,-1)        
		DBCC PAGE(Demo,1,280,1)            

1.2 page allocation

Slot 0, Offset 0x60, Length 21, DumpStyle BYTE
    
				Record Type = INDEX_RECORD          Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
				Record Size = 21                    
				Memory Dump @0x00000055721FA060
    
				0000000000000000:   36000100 00010001 00020000 01001500 416c6963  6...............Alic
				0000000000000014:   65                                            e      
				Slot 1, Offset 0x75, Length 22, DumpStyle BYTE
    
					Record Type = INDEX_RECORD          Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
					Record Size = 22                    
					Memory Dump @0x00000055721FA075
    
					0000000000000000:   36000100 00010002 00020000 01001600 4b657920  6...............Key 
					0000000000000014:   4c69                                          Li     
    
					Slot 2, Offset 0x8b, Length 22, DumpStyle BYTE
    
					Record Type = INDEX_RECORD          Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
					Record Size = 22                    
					Memory Dump @0x00000055721FA08B
    
					0000000000000000:   36000100 00010000 00020000 01001600 57696c73  6...............Wils
		0000000000000014:   6f6e                                          on

1.3 page page page analysis

0100 000100010001: This is the row ID, and the escaped one is (1:256:1). Because it is still a heap table, it points to the row ID

00020000: This is also the body start tag

01001500: the length of the index is recorded here

What follows is the index content

You can see the general structure of the index storage

Pointer > index information (length) – > index content

To view the line ID, you can use sys.fn_ PhysLocFormatter(%%physloc%%)

select sys.fn_PhysLocFormatter(%%physloc%%),* from DataTable

2. Create a clustered index

create clustered index IX_DatTaable_Name on DataTable(Name ASC)

2.1 view data page allocation

DBCC IND(Demo,DataTable,-1)        
		DBCC PAGE(Demo,1,234,3)

2.2 page allocation

0000000000000000:   30001000 02000000 20d1565f 0deb400b 05000003  0....... [email protected]
				0000000000000014:   001b0020 002c0041 6c696365 b1b1bea9 cad0b3af  ... .,.Alice........
		0000000000000028:   d1f4c7f8   

2.3 page page page analysis

Here, the data is roughly in the format of line ID + fixed length field + line information + variable length data.

It is worth noting that the number of fields and the number of variable length fields here is 05000003

The reason why there is an extra field here is that the clustered index we added is not specified to be unique. SQL server will automatically add a 4-byte field to ensure that the clustered index is unique.

Operational data impact on storage

1. INSERT

insert into DataTable(Name,Address,CreateTime)
		Select 'Jack', 'Tianhe District, Guangzhou', getdate ()

1.1 view page

Slot 0, Offset 0x60, Length 44, DumpStyle BYTE
    
				Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
				Record Size = 44                    
				Memory Dump @0x0000005CD11FA060
    
				0000000000000000:   30001000 02000000 40f61b57 1aeb400b 05000003  [email protected]@.....
				0000000000000014:   001b0020 002c0041 6c696365 b1b1bea9 cad0b3af  ... .,.Alice........
				0000000000000028:   d1f4c7f8                                      ....   
    
				Slot 1, Offset 0xe3, Length 43, DumpStyle BYTE
    
				Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
				Record Size = 43                    
				Memory Dump @0x0000005CD11FA0E3
    
				0000000000000000:   30001000 04000000 350f285e 1aeb400b 05000003  0.......5.(^[email protected]
				0000000000000014:   001b001f 002b004a 61636bb9 e3d6ddca d0ccecba  .....+.Jack.........
		0000000000000028:   d3c7f8                                        ...    

You can see that data is inserted in the slot after Alice (because the clustered index of this table is in ascending order of name)

2. UPDATE

update DataTable set Address = '8 huangbian Road, Baiyun District, Guangzhou' where Id  = 1

2.1 check PAG, use 2 to view the whole page

0000005CD387A064:   02000000 40f61b57 1aeb400b 05000003 001b0020  [email protected]@........ 
				0000005CD387A078:   002c0041 6c696365 b1b1bea9 cad0b3af d1f4c7f8  .,.Alice............
				0000005CD387A08C:   30001000 03000000 40f61b57 1aeb400b 05000003  [email protected]@.....
				0000005CD387A0A0:   001b001e 002a004b 6579b9e3 d6ddcad0 b7acd8ae  .....*.Key..........
				0000005CD387A0B4:   c7f83000 10000100 000040f6 1b571aeb 400b0500  [email protected]@...
				0000005CD387A0C8:   0003001b 0021002d 0057696c 736f6eb9 e3d6ddca  .....!.-.Wilson.....
				0000005CD387A0DC:   d0b0d7d4 c6c7f830 00100004 00000035 0f285e1a  .......0.......5.(^.
				0000005CD387A0F0:   eb400b05 00000300 1b001f00 2b004a61 636bb9e3  [email protected]+.Jack..
				0000005CD387A104:   d6ddcad0 ccecbad3 c7f83000 10000100 000040f6  [email protected]
				0000005CD387A118:   1b571aeb 400b0500 0003001b 00210036 0057696c  [email protected]!.6.Wil
		0000005CD387A12C:   736f6eb9 e3d6ddca d0b0d7d4 c6c7f8bb c6b1dfc2  son.................

You can see that there are two Wilson records. Because the updated field is longer than the original one, the original place can not be placed. Copy the whole record in the idle place of the current page, and then the offset points to the new address. In fact, this migration happens when the fields are shorter.

So the original place becomes a fragment. In fact, the maintenance of index pages is the same.

3.DELETE

delete DataTable where Id = 2

3.1 check PAG, use 2 to view the whole page

3c001000  ................<...
				0000005CD627A064:   02000000 40f61b57 1aeb400b 05000003 001b0020  [email protected]@........ 
				0000005CD627A078:   002c0041 6c696365 b1b1bea9 cad0b3af d1f4c7f8  .,.Alice............
		0000005CD627A08C:   30001000 03000000 40f61b57 1aeb400b 05000003  [email protected]@.....

You can see that Alice’s record still exists on the page. The four flag bits of the original line header are from 30001000 to 3c001000

summary

In fact, how the database is stored has no impact on the normal development, just boring research.

In fact, it still has some influence. I can think of the following, which may not be accurate and complete.

1. Try to select fields with fixed length, such as status and type, and define int

2. Char is to trade space for time, and the fields that are often updated and relatively short can be considered to define char

3. The extra long varchar field can not be placed in the main table, otherwise a page cannot be saved, and row overflow will occur

4. There are advantages and disadvantages of indexes. We should try our best to use indexes reasonably, especially clustered indexes. We should use them carefully.

Example code: https://github.com/WilsonPan/Net.Demos/tree/master/Demo.SQLTools