Heartbeat registration and offline cleanup for massive numbers of IoT devices: putting a mutex into practice under multithreaded high concurrency

Time:2020-9-18

Contents
  • Heartbeat registration and offline cleanup for massive numbers of IoT devices: putting a mutex into practice under multithreaded high concurrency
  • 1. Application background
  • 2. Overall framework
    • 2.1. Heartbeat registration framework
      • 2.1.1. Massive devices
      • 2.1.2. Heartbeat report handler process
    • 2.2. Offline cleanup framework
      • 2.2.1. Clearing offline devices from the active dictionary table
      • 2.2.2. Flow chart of offline cleanup
  • 3. Multithreading and high concurrency
    • 3.1. Multithreading description
    • 3.2. High concurrency description
  • 4. Exceptions caused by multithreaded high concurrency
    • 4.1. Null reference
    • 4.2. Failed assignment to a dictionary element
    • 4.3. Incorrect total device count
  • 5. Analysis of the causes
    • 5.1. Causes of null references
    • 5.2. Reasons for failed device IP assignment
    • 5.3. Reasons for the incorrect total device count
  • 6. Solutions
  • 7. Code implementation
  • 8. Summary

1. Application background

In Internet of Things applications, we need to maintain connections to a large number of devices, for example long-lived connections over TCP sockets, in order to collect the data each device gathers and to control its digital switches or analog values in the reverse direction. We keep these long-lived TCP connections in a thread-safe ConcurrentDictionary, the active dictionary table, with the IP address as the key and the device box domain model as the value. This table of active device boxes has to be maintained: devices whose heartbeat has timed out must be removed from the active dictionary table and written to the offline alarm dictionary table. When an offline device next sends a heartbeat, it is moved back into the active dictionary, a recovery alarm is generated, and a series of other actions follows.

2. Overall framework

2.1. Heartbeat registration framework

2.1.1. Massive devices

To simulate a TCP scenario with massive numbers of devices, we use a simulator to generate 12000 simulated devices, in addition to 8 real devices, for 12008 in total.

2.1.2. Heartbeat report handler process

The detailed heartbeat reporting process is shown in the framework diagram above:

  • The device establishes a long-lived TCP connection for the first time and reports a heartbeat message;
  • The socket cache first handles TCP sticky packets. For the specific method, see the blog post on TCP sticky packet handling and its solution, based on the length-based frame splitting of the NewLife.Net network library's pipeline;
  • The OnReceive event is then triggered, and the reassembled message is passed in through its event argument e;
  • Check the validity of the packet. This part is relatively simple: you can write one class per protocol to handle it, so it is not expanded on here;
  • Verify the CRC of the packet payload; see the blog post comparing the performance of three Modbus-based CRC16 verification methods;
  • Resolve the packet type (here, specifically, identifying heartbeat packets);
  • Parse the heartbeat packet. See these two blog posts for details: a plain explanation of C# structs, using an encapsulated Ethernet heartbeat packet as an example, and a performance comparison of classes and structs, using an encapsulated network heartbeat packet as an example;
  • Finally, the device is added to the active dictionary table (first heartbeat), or its heartbeat time in the table is refreshed (subsequent heartbeats).
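The CRC check in the steps above can be sketched as follows. This is a minimal bit-by-bit CRC-16/Modbus routine, not the implementation from the referenced blog post. The class name `Crc16Sketch` is hypothetical, and the assumption that a frame ends with the CRC stored low byte first follows the usual Modbus RTU convention rather than anything stated in this article.

```csharp
using System;

public static class Crc16Sketch
{
    // Bit-by-bit CRC-16/Modbus: initial value 0xFFFF, reflected polynomial 0xA001.
    public static ushort Compute(byte[] data, int offset, int count)
    {
        ushort crc = 0xFFFF;
        for (int i = offset; i < offset + count; i++)
        {
            crc ^= data[i];
            for (int bit = 0; bit < 8; bit++)
            {
                bool lsbSet = (crc & 0x0001) != 0;
                crc >>= 1;
                if (lsbSet) crc ^= 0xA001;
            }
        }
        return crc;
    }

    // Assumption: the last two bytes of the frame carry the CRC, low byte first.
    public static bool IsFrameValid(byte[] frame)
    {
        if (frame == null || frame.Length < 3) return false;
        int payloadLength = frame.Length - 2;
        ushort expected = (ushort)(frame[payloadLength] | (frame[payloadLength + 1] << 8));
        return Compute(frame, 0, payloadLength) == expected;
    }
}
```

A table-driven variant is usually faster; the bitwise form is shown only because it is the easiest to check by hand.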

It just occurred to me that I could write a whole series on IoT acquisition systems and organize it into a directory. I hope I can stick with it.

2.2. Offline cleanup framework

2.2.1. Clearing offline devices from the active dictionary table

The principle is simple: traverse the active dictionary table, filter the entries whose last heartbeat is older than the configured detection period into an IEnumerable, and then delete the corresponding timed-out keys (here, IP addresses) from the active dictionary table. The _internal period can of course be a multiple (n times) of the heartbeat period; you can set it yourself in the configuration file. The configuration entry and the cleanup method are as follows:

"ipboxNumStaticInternal": 12
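// Cleanup method: removes timed-out device boxes from the active dictionary
public static void DeleteDeadBoxFromActiveBox(int _internal)
    {
        // Entries last updated before this moment are considered timed out
        var outTime = DateTime.Now.AddSeconds(-_internal);
        var iboxTimeOutList = iboxActiveDictionary.Where(q => outTime > q.Value.UpdateTime);
        foreach (var item in iboxTimeOutList)
        {
            // ConcurrentDictionary exposes TryRemove rather than a public Remove(key)
            iboxActiveDictionary.TryRemove(item.Key, out _);
        }
    }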
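For reference, the same sweep can be written in a self-contained form against a `ConcurrentDictionary`. This is a sketch under assumptions: `BoxSketch`, `ActiveTableSketch`, and the explicit `now` parameter are hypothetical names introduced so the example can run on its own, and returning the removed entries is just one possible way to hand them over to an offline alarm table.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for the device box domain model.
public class BoxSketch
{
    public uint IP;
    public DateTime UpdateTime;
}

public static class ActiveTableSketch
{
    public static readonly ConcurrentDictionary<uint, BoxSketch> Active =
        new ConcurrentDictionary<uint, BoxSketch>();

    // Removes every entry whose last heartbeat is older than timeoutSeconds
    // and returns the removed boxes to the caller.
    public static List<BoxSketch> RemoveTimedOut(int timeoutSeconds, DateTime now)
    {
        var deadline = now.AddSeconds(-timeoutSeconds);
        var removed = new List<BoxSketch>();
        // Enumerating a ConcurrentDictionary while removing from it does not throw.
        foreach (var pair in Active)
        {
            if (pair.Value.UpdateTime < deadline &&
                Active.TryRemove(pair.Key, out var box))
            {
                removed.Add(box);
            }
        }
        return removed;
    }
}
```

Note that a heartbeat can still refresh an entry between the timestamp check and `TryRemove`; the dictionary being thread-safe does not make the check-then-remove sequence atomic.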

2.2.2. Flow chart of offline cleanup

Here a system timer is started to call the offline cleanup method periodically. The call interval is ipboxNumStaticInternal. The code is as follows:

public void systemTimerStart()
    {
        // Reads the interval in milliseconds and also sets the _internal field (in seconds)
        var interval = ReadTheInternalFromSetting();
        _systemTimer = new Timer(state =>
        {
            IBoxActiveDicManager.DeleteDeadBoxFromActiveBox(_internal);
            Console.WriteLine("{1}, number of activated devices: {0}\n", IBoxActiveDicManager.iboxActiveDictionary.Count, DateTime.Now);
        }, null, interval, interval);
        Console.WriteLine("pemscom acquisition system clock is on");
        LoggerHelper.Info("pemscom acquisition system clock is on");
    }

    /// <summary>
    /// Reads the timer interval from the configuration file
    /// </summary>
    /// <returns>The interval in milliseconds</returns>
    private int ReadTheInternalFromSetting()
    {
        _internal = int.Parse(Appsettings.app(new string[] { "ipboxNumStaticInternal" }));
        Console.WriteLine("the clock configuration parameters of pemscom acquisition system have been read");
        LoggerHelper.Info("the clock configuration parameters of pemscom acquisition system have been read");
        return Convert.ToInt32(TimeSpan.FromSeconds(_internal).TotalMilliseconds);
    }

3. Multithreading and high concurrency

3.1. Multithreading description

Many threads take turns on CPU time slices, for example:

  • 12008 receive-event threads;
  • the thread that periodically clears offline devices;
  • the main thread, which monitors command-line input and executes the corresponding commands.

Take a practical example, as shown in the figure.

For 12008 devices, the peak receive rate is 9218 network packets per second. That is, in that one second the CPU has to run 9218 threads in total. On a dual-core, 4-thread system, for example, that is 9218 / 4 = 2304.5, so each hardware thread executes about 2305 times in 1 second, or once every 0.43 ms.
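The arithmetic above can be written out as a small helper. This is only a back-of-the-envelope sketch: `LoadEstimateSketch` and its parameter names are hypothetical, and it assumes the receive handlers are spread evenly across the hardware threads.

```csharp
public static class LoadEstimateSketch
{
    // Average time budget per handler, in milliseconds, when packetsPerSecond
    // handlers are divided evenly among hardwareThreads.
    public static double MillisecondsPerHandler(int packetsPerSecond, int hardwareThreads)
    {
        double handlersPerThread = (double)packetsPerSecond / hardwareThreads;
        return 1000.0 / handlersPerThread;
    }
}
```

With the figures from the text, `MillisecondsPerHandler(9218, 4)` comes out at roughly 0.43 ms.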

3.2. High concurrency description

Section 3.1 has in fact already illustrated high concurrency: in a given second, nearly 10000 receive events need to be processed. The execution order at that moment is nondeterministic. With as many as 9218 threads, we do not know which runs first and which runs later. Without some deliberate control logic, such as the mutex introduced in this article, exceptions will occur.

4. Exceptions caused by multithreaded high concurrency

This section only describes the symptoms; the causes are explained in detail in section 5.

4.1. Null reference

Exception location: the heartbeat processing class is as follows.

public class HeartHandler
    {
        static string _deviceIndex = Appsettings.app(new string[] { "DeviceIndex" });
        private static IBoxActive iboxActive;
        public static void Register(TcpHeartPacket heartPacket,int sessId)
        {
            UInt32 IP;
            UInt64 mac;
            if (_deviceIndex == "IP")
            {
         
                IP =(UInt32)BitConverter.ToUInt32(heartPacket.IP, 0);
                if (IBoxActiveDicManager.GetBoxActive(IP, out iboxActive) != true)
                {       
                    IBoxActiveDicManager.iboxActiveDictionary.TryAdd(IP, iboxActive);
                    iboxActive.SessID = sessId;
                }
               
            }
            else
            {
                 mac = (UInt64)BitConverter.ToUInt64(heartPacket.Mac, 0);
                if (IBoxActiveDicManager.GetBoxActive(mac, out iboxActive) != true)
                {
                    IBoxActiveDicManager.iboxActiveDictionary.TryAdd(mac, iboxActive);
                    iboxActive.SessID = sessId;
                }
            }

            // IBoxActive is a reference type, so this also updates the instance held in the dictionary
            iboxActive.UpdateTime = DateTime.Now;
        }
    }

4.2. Failed assignment to a dictionary element

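/// <summary>
        /// Checks whether the reporting device box is already in the active device box dictionary.
        /// Returns true if it is; otherwise returns false and creates a new device box model.
        /// </summary>
        /// <param name="IP">IP address of the reporting device box</param>
        /// <param name="iboxActive">The existing or newly created device box model</param>
        /// <returns>true if the device box was already in the dictionary</returns>
        public static bool GetBoxActive(UInt32 IP, out IBoxActive iboxActive)
        {
            if (iboxActiveDictionary.TryGetValue(IP, out iboxActive))
            {
                return true;
            }

            iboxActive = new IBoxActive();
            iboxActive.IP = IP;

            // Logged when the value just assigned does not read back as expected
            if (iboxActive.IP != IP)
            {
                LoggerHelper.Error(string.Format("instantiation assignment failed iboxActive.IP :{0};IP{1}", iboxActive.IP, IP));
            }

            return false;
        }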

Isn't it strange that the value assigned on one line fails the equality check on the very next line? Under multithreaded concurrency, however, this can happen, as analyzed in detail below.

4.3. Incorrect total device count

Because running 12008 devices at high concurrency is error-prone, the number was reduced to 1000. Even so, the device count statistics still contain errors, which are again caused by multithreaded high concurrency.

5. Analysis of the causes

5.1. Causes of null references

The three exceptions in section 4 all share the same root cause, so it is explained in detail here in 5.1 and only briefly in 5.2 and 5.3. Pay attention, this is the key point in analyzing exceptions under multithreaded high concurrency. A running program squeezes into any gap it finds, like a veteran driver cutting into traffic; in short, threads run in no fixed order relative to each other. For example, while the heartbeat thread is updating a device's heartbeat time, the offline cleanup thread may clean that device up. The time then cannot be assigned, because the object is empty (it has already been cleaned up by the offline thread), so a null reference exception is thrown. Yes, it is that simple, yet it took me a long time of debugging and thinking to understand this exception.

5.2. Reasons for failed device IP assignment

Similarly, right after the device instance is created and its IP is assigned, the device is cleared because it has gone offline. The comparison then reads through the original reference, and the address it refers to already holds the IP of another device box, so the two IP values are not equal.
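The effect of several handlers reading through one shared reference can be reproduced deterministically on a single thread. The sketch below is an illustration under assumptions, not the author's code: `SharedSlotSketch`, `Slot`, and `Lookup` are hypothetical names. It relies on one documented behavior, that a failed `TryGetValue` writes the default value (`null` for reference types) through its `out` argument. In the code from 4.1, the static field `iboxActive` is passed as that `out` argument, so a lookup made on behalf of one device can replace or clear the reference another handler is about to use. This is offered as a possible contributing factor, not as the author's stated diagnosis.

```csharp
using System.Collections.Concurrent;

public static class SharedSlotSketch
{
    public static readonly ConcurrentDictionary<uint, string> Table =
        new ConcurrentDictionary<uint, string>();

    // Hypothetical shared field, playing the role of a static "current box" reference.
    public static string Slot;

    // Looks up a key and stores the result in the shared field.
    // On a miss, TryGetValue sets Slot to null.
    public static bool Lookup(uint key)
    {
        return Table.TryGetValue(key, out Slot);
    }
}
```

After `Table[1] = "box-1"`, a call to `Lookup(1)` leaves `Slot` pointing at "box-1", and a following `Lookup(2)` for a missing key leaves `Slot` null. With two threads, the second call can land between the first call and the first caller's use of `Slot`. Keeping the reference in a local variable avoids this kind of sharing.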

5.3. Reasons for the incorrect total device count

This is a consequence of 5.2: devices that fail to register are not counted, so the total is wrong.

6. Solutions

When creating an active device instance (first heartbeat registration) or updating the heartbeat time (subsequent heartbeats), we must not let the unordered offline cleanup thread run. Key point: we have to guarantee the atomicity of the heartbeat registration process. This is very similar to atomicity in relational database transactions. Atomicity is a powerful weapon against nondeterministic execution order. We can put a mutex around the heartbeat registration method so that the runtime arranges a more reasonable execution order.

7. Code implementation

The code is simple.

// Define a lock
    public static Mutex activeIpboxDicMutex = new Mutex();
    // Lock around device box registration; this eliminates all three exceptions
    IBoxActiveDicManager.activeIpboxDicMutex.WaitOne();
    HeartHandler.Register(tcpHeartPacket, sessionId);
    IBoxActiveDicManager.activeIpboxDicMutex.ReleaseMutex();
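A common refinement of this pattern is to release the mutex in a `finally` block, so that an exception thrown during registration does not leave it held. The sketch below illustrates that idea under assumptions: `GuardedRegistry`, `Register`, `Sweep`, and `Count` are hypothetical stand-ins for the heartbeat handler and the cleanup method, and the table is reduced to IP and last-seen time.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class GuardedRegistry
{
    private static readonly Mutex Gate = new Mutex();

    // A plain Dictionary is enough here because every access goes through Gate.
    private static readonly Dictionary<uint, DateTime> LastSeen =
        new Dictionary<uint, DateTime>();

    // Adds the device on its first heartbeat or refreshes its time otherwise.
    public static void Register(uint ip, DateTime now)
    {
        Gate.WaitOne();
        try { LastSeen[ip] = now; }
        finally { Gate.ReleaseMutex(); }
    }

    // Removes devices whose last heartbeat is older than timeoutSeconds
    // and returns how many were removed.
    public static int Sweep(int timeoutSeconds, DateTime now)
    {
        Gate.WaitOne();
        try
        {
            var deadline = now.AddSeconds(-timeoutSeconds);
            var dead = new List<uint>();
            foreach (var pair in LastSeen)
                if (pair.Value < deadline) dead.Add(pair.Key);
            foreach (var ip in dead) LastSeen.Remove(ip);
            return dead.Count;
        }
        finally { Gate.ReleaseMutex(); }
    }

    public static int Count()
    {
        Gate.WaitOne();
        try { return LastSeen.Count; }
        finally { Gate.ReleaseMutex(); }
    }
}
```

Within a single process, the C# `lock` statement provides the same mutual exclusion with less overhead; `Mutex` is kept here only to match the article.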

Here is a digression on transactions, which work in a very similar way: the main business logic goes in the middle, like the filling of a sandwich, so the analogy makes the pattern easy to understand and remember.

unitOfWork.BeginTransaction();

    // Add the new device
    unitOfWork.DeviceRepository.Add(device);

    // Commit the transaction
    unitOfWork.Commit();

Of course, you can also lock the device box offline cleanup thread.

IBoxActiveDicManager.activeIpboxDicMutex.WaitOne();
    IBoxActiveDicManager.DeleteDeadBoxFromActiveBox(_internal);
    IBoxActiveDicManager.activeIpboxDicMutex.ReleaseMutex();

Since locking the offline cleanup thread costs some performance, I also tested with that lock removed, and the three exceptions from section 4 still did not occur. At this point, all the problems have been solved.

8. Summary

  • With only a small number of simulated devices, this problem cannot be detected. This shows the importance of testing with massive numbers of devices: the three problems above will certainly appear in production, and they are all serious, even fatal. Good testing methods nip problems in the bud;

  • Under multithreaded high concurrency, all sorts of exceptions appear easily. We should approach them with respect, think them through, and solve them.


Copyright notice: This article is the blogger's original work and is licensed under CC 4.0 BY-SA. When reprinting, please include a link to the original source and this notice.

Link to this article: https://www.cnblogs.com/JerryMouseLi/p/12709048.html