Analysis and Solution of Sql Server Deadlock Monitoring

Time:2019-8-20

1 Background

1.1 Alarm

Recently, I’ve been sorting out my notes and intending to migrate them all to EVERNOTE. Tidy up to the lock part, there happens to be a case recorded by myself, reorganize and share it to you.

One day at noon, received an alarm message, DB deadlock abnormal, one minute deadlock 120.

The deadlock XML file is as follows:

<deadlock-list>
<deadlock victim="process810b00cf8">
<process-list>
<process task priority = "0" logused = "0" logused= "0" waitresource = "RID: 13:1:1541136:62" waittime = "RID: 13:13:1:1:1541136:62" waittime = "7682" ownerId = "7682" ownerownerId = "3396587959" transactionname = "UPDATE" lasttarted = "2016-01-08T12:03:03:51.067:51.067" XDES = "0xa99746d08" locklockmode= "U" lockmode duld= "41" kpid= "17308" status status status = "suspesuspesuspesuspended" spid= "status status" spispispid= "spispid status status="0" "Trancount=" 2 "lastbatch start=" 2016-01-08T12:03:51.067 "lastbatch completed=" 2016-01-08T12:03:51.067 "lastattention=" 1900-01-01T00:00.067 "clientapp=" Microsoft SQL Server Management Studio - Query "hostname=" test-server "hostpid=" 1433 "loginname=" level "isolation xin=" read committed (2), "xa attention=" Ctid= "3396587959" current db= "13" lockTimeout= "4294967295" clientoption1= "671098976" clientoption2= "390200">
<executionStack>
<frame procname="adhoc" line="7" stmtstart="214" stmtend="484" sqlhandle="0x020000003acf4f010561e479685209fb09a7fd15239977c60000000000000000000000000000000000000000">
UPDATE FinanceReceiptNoRule SET [email protected],ISRUNNING='0',LastWriteTime=GETDATE() WHERE IsRunning='1' AND [email protected] </frame>
</executionStack>
<inputbuf>
declare @SeqCode varchar(60)
declare @ReturnNum bigint
set @SeqCode='CGJS20160106'
while(1=1)
begin
UPDATE FinanceReceiptNoRule SET [email protected],ISRUNNING='0',LastWriteTime=GETDATE() WHERE IsRunning='1' AND [email protected]
end </inputbuf>
</process>
<process task priority= "0" logused= "248" waitresource= "KEY: 13:72057594040090624" (b3ade7c5980c)"waittime="4"ownerId="3396522828"transactionname="user_transaction"lasttranstarted="2016-01-08T12:03:05.310"XDES="0x18c1db63a8"lockmode="U"scheerid="57"scheduled="16448"status="suspended"status="161" Sbid = "0" ECID = "0" priority = "0" trancount = "2" lastbatch start = "2016-01-08T12:03:58.737" lastbatch complete = "2016-01-08T12:03:33.847" lastattention = "2016-01-08T12:03:33.847" lastattention = "2016-01-08T12:03:33.850" clientapp = "Microsoft SQL Server Management Studio" hosting = "test-server" PID = "1433" loginname "xinysu isolation" name Nlevel= "read committed (2)" xactid= "3396522828" current db= "13" lockTimeout= "4294967295" clientoption1= "671090784" clientoption2= "390200">
<executionStack>
<frame procname="adhoc" line="6" stmtstart="210" stmtend="400" sqlhandle="0x020000001b4f23368af7bba99098c10dec46585804f1b4ce0000000000000000000000000000000000000000">
Update dbo.FinanceReceiptNoRule Set [IsRunning]='1' where [email protected] and IsRunning='0' </frame>
</executionStack>
<inputbuf>
declare @SeqCode varchar(60)
declare @ReturnNum bigint
set @SeqCode='CGJS20160106'
while(1=1)
begin
Update dbo.FinanceReceiptNoRule Set [IsRunning]='1' where [email protected] and IsRunning='0' 
end
</inputbuf>
</process>
</process-list>
<resource-list>
<ridlock fileid="1" pageid="1541136" dbid="13" objectname="fin_test.dbo.FinanceReceiptNoRule" mode="X" associatedObjectId="72057594040025088">
<owner-list>
<owner mode="X" />
</owner-list>
<waiter-list>
<waiter mode="U" requestType="wait" />
</waiter-list>
</ridlock>
<keylock hobtid="72057594040090624" dbid="13" objectname="fin_test.dbo.FinanceReceiptNoRule" indexname="PK_FINANCERECEIPTNORULE" mode="U" associatedObjectId="72057594040090624">
<owner-list>
<owner mode="U" />
</owner-list>
<waiter-list>
<waiter mode="U" requestType="wait" />
</waiter-list>
</keylock>
</resource-list>
</deadlock>
</deadlock-list>

The table structure and simulation data are as follows:

- Relevant tables:
CREATE TABLE [dbo].[FinanceReceiptNoRule](
[SeqCode] [varchar](60) NOT NULL,
[NowSeqValue] [bigint] NULL,
[SeqDate] [varchar](14) NOT NULL,
[IsRunning] [varchar](1) NULL,
[LastWriteTime] [datetime] NULL,
[Prefix] [varchar](4) NULL
) ON [PRIMARY]
GO
Data simulation
INSERT [dbo].[FinanceReceiptNoRule] ([SeqCode], [NowSeqValue], [SeqDate], [IsRunning], [LastWriteTime], [Prefix]) VALUES (N'TEST20150108', 1469, N'20150108', N'0', CAST(N'2015-01-08 05:05:49.163' AS DateTime), N'TEST')
GO
INSERT [dbo].[FinanceReceiptNoRule] ([SeqCode], [NowSeqValue], [SeqDate], [IsRunning], [LastWriteTime], [Prefix]) VALUES (N'TEST20150109', 1377, N'20150109', N'0', CAST(N'2015-01-09 04:50:26.610' AS DateTime), N'TEST')
GO
 
ALTER TABLE [dbo].[FinanceReceiptNoRule] ADD CONSTRAINT [pk_FinanceReceiptNoRule] PRIMARY KEY NONCLUSTERED 
(
[SeqCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

1.2 How to monitor

There are many ways to capture deadlocks. Here we introduce two ways: the SQL SERVER Profiler tool and Extended Events. Profiler is relatively resource-intensive, but because it only monitors deadlock, its performance impact is not great, and its visual interface is easy to use; Extended Events consumes less resources, and records the penultimate deadlock in real time, and needs SQL statement to analyze the query record file.

How to use Profiler monitoring?

Open SSMS, click Tools, and select SQL Server Profiler, as shown below.

Log in to the DB instance that needs to be monitored and fill in the corresponding tracking attributes, starting with the < General > page, as shown below. Here we pay attention to two aspects: first, choose the < TSQL-Locks > template, which can be used to monitor deadlocks, also can be used to observe the lock application and release situation, very detailed, you can use SELECT UPDATE DELETE and other statements to see the lock application and release situation; second, monitoring results storage, recommendations can be used. It can be stored in a table to facilitate periodic analysis and statistics.

Then fill in the < Event Selection > item, just select < deadlock graph > Events, no other ticks are needed, and finally click on the run to start monitoring.

A common example for ten thousand years can be used to check whether the monitoring is normal, open three query windows, and execute in the following order will result in resource occupation and application mutually exclusive deadlock. When the fifth step is completed, the deadlock will occur while waiting for 1-3 seconds. The script is provided as follows:

--session 1
CREATE TABLE Test_DL(
id int not null primary key ,
name varchar(100));

INSERT INTO Test_DL(id,name) select 1,'a';
INSERT INTO Test_DL(id,name) select 2,'b';

--session2 2 2 2 2 2 2 2 2 2 
BEGIN TRANSACTION
UPDATE Test_DL SET Name='a-test' WHERE ID=1

--session3 3 3 3 3 3 3 3 3 3 
BEGIN TRANSACTION
UPDATE Test_DL SET Name='b-test' WHERE ID=2

--session2 2 2 2 2 2 2 2 2 2 
 SELECT * FROM Test_DL WHERE ID=2

--session3 3 3 3 3 3 3 3 3 3
 SELECT * FROM Test_DL WHERE ID=1

Simulated deadlock SQL

The deadlock Interface monitored is as follows:

 

How to use Extended Events for monitoring?

The script for establishing extended event monitoring is as follows: (Extended event is very good, the 2012 version supports visual operation, and you can learn about it on MSDN if you are interested: https://msdn.microsoft.com/zh-cn/library/bb630282.aspx, this article will not analyze the grammar and other knowledge points)


CREATE EVENT SESSION [DeadLock] ON SERVER 
ADD EVENT sqlserver.xml_deadlock_report 
ADD TARGET package0.event_file(SET filename=N'F:\events\deadlock\deadlock.xel',max_file_size=(20)),
ADD TARGET package0.ring_buffer(SET max_events_limit=(100),max_memory=(10240),occurrence_number=(50))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
GO

Query SQL as follows, here we need to note: query is based on buffer or file analysis, the number of buffer storage is generally limited, such as the above we only allocated 4M storage, file analysis is complete, but it depends on the number of retained files. Here we give buffer’s query SQL as follows. File’s query is interesting and can be written down by hand.


DECLARE @deadlock_xml XML
SELECT @deadlock_xml=(
      SELECT 
        ( 
        SELECT
          CONVERT(XML, target_data)
        FROM sys.dm_xe_session_targets st
        JOIN sys.dm_xe_sessions s ON s.address = st.event_session_address
        WHERE s.name = 'deadlock' AND st.target_name = 'ring_buffer'
        ) AS [x]
      FOR XML PATH('') , TYPE
      )

SELECT 
dateadd(hour,+6,tb.col.value('@timestamp[1]','varchar(max)')) TimePoint,
tb.col.value('(data/value/deadlock/process-list/process/executionStack/frame)[1]','VARCHAR(MAX)') statement_parameter_k,
tb.col.value('(data/value/deadlock/process-list/process/executionStack/frame)[2]','VARCHAR(MAX)') statement_k,
tb.col.value('(data/value/deadlock/process-list/process/executionStack/frame)[3]','VARCHAR(MAX)') statement_parameter,
tb.col.value('(data/value/deadlock/process-list/process/executionStack/frame)[4]','VARCHAR(MAX)') [statement],
tb.col.value('(data/value/deadlock/process-list/process/@waitresource)[1]','VARCHAR(MAX)') waitresource_k,
tb.col.value('(data/value/deadlock/process-list/process/@waitresource)[2]','VARCHAR(MAX)') waitresource,
tb.col.value('(data/value/deadlock/process-list/process/@isolationlevel)[1]','VARCHAR(MAX)') isolationlevel_k,
tb.col.value('(data/value/deadlock/process-list/process/@isolationlevel)[2]','VARCHAR(MAX)') isolationlevel,
tb.col.value('(data/value/deadlock/process-list/process/@waittime)[1]','VARCHAR(MAX)') waittime_k,
tb.col.value('(data/value/deadlock/process-list/process/@waittime)[2]','VARCHAR(MAX)') waittime,
tb.col.value('(data/value/deadlock/process-list/process/@clientapp)[1]','VARCHAR(MAX)') clientapp_k,
tb.col.value('(data/value/deadlock/process-list/process/@clientapp)[2]','VARCHAR(MAX)') clientapp,
tb.col.value('(data/value/deadlock/process-list/process/@hostname)[1]','VARCHAR(MAX)') hostname_k,
tb.col.value('(data/value/deadlock/process-list/process/@hostname)[2]','VARCHAR(MAX)') hostname
FROM @deadlock_xml.nodes('//event') as tb(col)

This SQL can query very detailed resource contention situation, if you want to effectively use extended events, we recommend that you take a detailed look at the XML grammar of the official website (SQL SERVER support for XML is also great, looking forward to JSON support in the 2016 edition)

 

Is it very clear, clear at a glance, with this you can go to analysis pull!

2 Analysis

Depending on the content of the XML file or the monitoring content of the extended event, you can organize the following information (the deadlock analysis at the beginning):

 

Look at the execution plans for transaction 1 and transaction 2 as follows:

 

Combining tables and execution plans, deadlock processes can be roughly inferred:

Session 1:

  • Find index_Page of index page where the key value is located according to the primary key SeqCode, find index_key of key hashvalue key line above the page, hold IU lock for index_Page and U lock for index_key;
  • Because the table is a heap table, bookmark lookup is searched by RID, that is, by line identifier, it finds the data page where the row data corresponding to RID is located, and then finds the row data pointing to the slot number on the page, and holds the U lock on the row data.
  • At this time, we have found the row data that need to be updated. We can upgrade the IU lock on the data page Data_Page to the IX lock. The row data pointed by RID is upgraded from the U lock to the X lock. After the upgrade, we release the IU lock and the U lock on the index page and the key value row.
  • At this point, session 1 holds IX locks on Data_Page and X locks on RID rows.

During this process, Session 2 happens to apply for such a lock:

  • Find Lock Resources in Transaction 2According to sys. partitions, you can see that 72057594038910976 is the primary key pk_FinanceReceiptNoRule and the primary key column is SeqCode.
  • According to the primary key SeqCode, find index_Page of index page where the key value is located, find the key line index_key above the page, hold IU locks for index_Page and U locks for index_key.
  • Because the table is a heap table, bookmark lookup is searched by RID, that is, by line identifier, to find the data page of the row data corresponding to RID, and then find the row data of the RID pointing to the slot number on the page, prepare the row data to hold U locks, but find that the row of RID is held by session 1 X locks. To apply for U lock Timeout.
  • Session 2 holds IU locks on Index_Page, U locks on Index_key, and IU locks on Data_Page, requesting U locks on RID rows.

Suppose that at this point, Session 1 performs an update operation (in the same transaction):

Find where the key value is based on the primary key SeqCodeIndex Page Index_Page, find the one on the pageKey rowIndex_key, which holds an IU lock for Index_Page, intends to hold a U lock for Index_key, but finds that Index_key holds a U lock by session 2.

Then a deadlock occurs (see figure below for details):

  • Session 1 holds IX locks on Data_Page, X locks on RID rows, and applies for U locks on Index_key (waiting for Session 2 to release)
  • Session 2 holds IU locks on Index_Page, U locks on Index_key, and IU locks on Data_Page, requesting U locks on RID rows (waiting for Session 1 to release)

 

3. Solution

This deadlock will not occur if RID lookup is removed and data is directly indexed. That is to say, the aggregated index is rebuilt on the primary key and the original non-aggregated index primary key is discarded. Because this excludes the application and holding of U-lock in RID, it keeps X-lock until the end of the transaction, and can modify the data page where the key value is located directly according to the primary key, thus reducing the time of inquiring rows in RID.

The revised implementation plan is as follows:

 

The process of applying for release of the lock is as follows (see screenshot for details):

  • Find index_Page of index page where the key value is located according to the primary key SeqCode, find index_key of key hashvalue key line above the page, hold IU lock for index_Page and U lock for index_key;
  • Since the table is already an aggregated index table and the page where the primary key is located contains row data, the IU lock can be upgraded to IX lock directly for Index_Page, and the U lock for Index_key can be upgraded to X lock, thus avoiding the lock application of RID for row-by-row data.

 

Above is the whole content of this article, I hope that the content of this article can bring some help to everyone’s study or work, but also hope to support more developpaer!