Data sharing in multithreading and multiprocessing in Python

Time:2020-11-28

Previously, when I wrote multithreaded or multiprocess code, each sub thread or sub process usually just completed its own task, and there was little contact between them. If they needed to communicate, I used a queue or a database. But recently, while writing some multithreaded and multiprocess code, I found that there are some points to watch out for when they need to use shared variables.

Data sharing among multiple threads

Standard data types are shared between threads

Look at the code below

#coding:utf-8
import threading

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))


if __name__ == '__main__':
    d = 5
    name = "Yang Yanxing"
    for i in range(5):
        th = threading.Thread(target=test,args=(name,d))
        th.start()

Here I create a global int variable d, whose value is 5, and pass d as an argument when calling the test function in each of the 5 threads. Do these 5 threads see the same d? In the test function I print id(data) to check, and get the following results

In thread name is Yang Yanxing
data is 5 id(data) is 1763791776
In thread name is Yang Yanxing
data is 5 id(data) is 1763791776
In thread name is Yang Yanxing
data is 5 id(data) is 1763791776
In thread name is Yang Yanxing
data is 5 id(data) is 1763791776
In thread name is Yang Yanxing
data is 5 id(data) is 1763791776

From the results, we can see that the id of data in all five sub threads is 1763791776, which indicates that the variable d created in the main thread is the same object in every sub thread. An int is immutable, so no thread can change it in place, but for a mutable shared object, a change made in one sub thread is visible to all the others. Modifying shared data from several threads is therefore not thread-safe and needs to be protected with a lock.
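
As a minimal sketch of that last point, the snippet below lets several threads update one shared object under a lock. The Counter class and the run_threads function are hypothetical names made up for this illustration, not part of the code above.

```python
import threading


class Counter:
    """Hypothetical mutable object shared by every thread."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        # "self.value += 1" is a read-modify-write, so guard it with the lock.
        with self.lock:
            self.value += 1


def run_threads(n_threads=5, n_increments=10000):
    counter = Counter()  # one object, visible to all threads

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_threads())  # 5 threads x 10000 increments each
```

Using the lock as a context manager (with self.lock) releases it even if the guarded code raises, which a bare acquire()/release() pair does not guarantee.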

Custom type objects are shared between threads

What if we define our own class and pass an object of it to the child threads? What happens then?

#coding:utf-8
import threading

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data.get(),id(data)))


if __name__ == '__main__':
    d = Data(10)
    name = "Yang Yanxing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        th = threading.Thread(target=test,args=(name,d))
        th.start()

Here I define a simple class. The main thread creates an object d of this type and passes it as an argument to the child threads. The main thread and each child thread print the id of the object. Let’s see the result

in main thread id(data) is 2849240813864
In thread name is Yang Yanxing
data is 10 id(data) is 2849240813864
In thread name is Yang Yanxing
data is 10 id(data) is 2849240813864
In thread name is Yang Yanxing
data is 10 id(data) is 2849240813864
In thread name is Yang Yanxing
data is 10 id(data) is 2849240813864
In thread name is Yang Yanxing
data is 10 id(data) is 2849240813864

We can see that the ID of this object is the same in the main thread and the child thread, indicating that they are using the same object.

Whether it is a standard data type or a custom class, all threads share the same object. But is this also the case with multiple processes?

Shared data between multiple processes

Standard data types are shared between processes

Again, let’s start with how a variable of type int is shared between sub processes

#coding:utf-8
import threading
import multiprocessing

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))


if __name__ == '__main__':
    d = 10
    name = "Yang Yanxing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        pro = multiprocessing.Process(target=test,args=(name,d))
        pro.start()

The result is that

in main thread id(data) is 1763791936
In thread name is Yang Yanxing
data is 10 id(data) is 1763791936
In thread name is Yang Yanxing
data is 10 id(data) is 1763791936
In thread name is Yang Yanxing
data is 10 id(data) is 1763791936
In thread name is Yang Yanxing
data is 10 id(data) is 1763791936
In thread name is Yang Yanxing
data is 10 id(data) is 1763791936

You can see that they have the same id, which suggests the same variable. But when I changed d from an int to a string, the ids were no longer the same

if __name__ == '__main__':
    d = 'yangyanxing'
    name = "Yang Yanxing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        pro = multiprocessing.Process(target=test,args=(name,d))
        pro.start()

The result is

in main thread id(data) is 2629633397040
In thread name is Yang Yanxing
data is yangyanxing id(data) is 1390942032880
In thread name is Yang Yanxing
data is yangyanxing id(data) is 2198251377648
In thread name is Yang Yanxing
data is yangyanxing id(data) is 2708672287728
In thread name is Yang Yanxing
data is yangyanxing id(data) is 2376058999792
In thread name is Yang Yanxing
data is yangyanxing id(data) is 2261044040688

So I tried list, tuple, and dict as well, and the ids were different in each sub process. I then went back and tried list, tuple, and dict with multithreading, and there the ids were the same in every thread.

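There is an interesting detail here: for an int, the id is the same across processes when the value is 256 or less, and different when the value is greater than 256. CPython preallocates the integers from -5 to 256, which is the likely reason every process reports the same address for them. Each process still has its own memory, though, so equal ids in different processes do not mean that the object is shared.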
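
The small-integer cache can be observed inside a single process. This is a sketch that relies on a CPython implementation detail (integers from -5 to 256 are preallocated), so other interpreters may behave differently. The values are built with int() at run time so that the compiler cannot merge equal literals.

```python
# CPython keeps one preallocated object for each integer from -5 to 256.
small_a = int("256")
small_b = int("256")

# Larger integers are created fresh each time they are computed.
large_a = int("257")
large_b = int("257")

small_same = small_a is small_b  # both names refer to the cached object
large_same = large_a is large_b  # two separate objects with equal value

print(small_same, large_same)
```

Because of this, id() and "is" are not a reliable way to tell whether two processes share a value; they only compare addresses within one process.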

Custom type objects are shared between processes

#coding:utf-8
import threading
import multiprocessing

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data):
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data.get(),id(data)))


if __name__ == '__main__':
    d = Data(10)
    name = "Yang Yanxing"
    print("in main thread id(data) is {}".format(id(d)))
    for i in range(5):
        pro = multiprocessing.Process(target=test,args=(name,d))
        pro.start()

The result is that

in main thread id(data) is 1927286591728
In thread name is Yang Yanxing
data is 10 id(data) is 1561177927752
In thread name is Yang Yanxing
data is 10 id(data) is 2235260514376
In thread name is Yang Yanxing
data is 10 id(data) is 2350586073040
In thread name is Yang Yanxing
data is 10 id(data) is 2125002248088
In thread name is Yang Yanxing
data is 10 id(data) is 1512231669656

You can see that their IDs are different, that is, different objects.
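
One way to see why the objects differ: with the spawn start method, which is the default on Windows, the arguments given to Process are pickled in the parent and rebuilt in the child. The sketch below imitates that round trip inside one process. A plain dict stands in for the Data object here (an assumption made to keep the snippet self-contained).

```python
import pickle

original = {"data": 10}  # stand-in for the Data object

# Serialize and rebuild, roughly what happens to Process arguments
# when the child is started with the "spawn" method.
rebuilt = pickle.loads(pickle.dumps(original))

rebuilt["data"] += 1  # change the copy only

same_object = original is rebuilt
print(same_object)
print(original["data"], rebuilt["data"])
```

The rebuilt object starts with an equal value but is independent of the original, so changes made to it never reach the parent.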

How to share data among multiple processes

But what if we do want to share data between multiple processes, for example between the main process and its sub processes?

Before looking at this problem, let’s make some changes to the previous multithreaded code

#coding:utf-8
import threading
import multiprocessing

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.set(data.get()+1)
    lock.release()


if __name__ == '__main__':
    d = Data(0)
    thlist = []
    name = "yang"
    lock = threading.Lock()
    for i in range(5):
        th = threading.Thread(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.get())

The purpose of this code is to share an object of a custom type among five sub threads. Each sub thread adds 1 to the object's data value, and the main thread prints the final value at the end.
The output is as follows

in thread  name is yang
data is  id(data) is 1805246501272
in thread  name is yang
data is  id(data) is 1805246501272
in thread  name is yang
data is  id(data) is 1805246501272
in thread  name is yang
data is  id(data) is 1805246501272
in thread  name is yang
data is  id(data) is 1805246501272
5

We can see that the main thread prints 5 at the end, which meets our expectation. But what if we do the same with multiple processes? Because each sub process holds a different object, each sub process operates on its own copy of data, which should not affect the object in the main process. Let’s take a look at the result

#coding:utf-8
import threading
import multiprocessing

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.set(data.get()+1)
    lock.release()


if __name__ == '__main__':
    d = Data(0)
    thlist = []
    name = "yang"
    lock = multiprocessing.Lock()
    for i in range(5):
        th = multiprocessing.Process(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.get())

Its output results are as follows:

in thread  name is yang
data is  id(data) is 1997429477048
in thread  name is yang
data is  id(data) is 3044738469504
in thread  name is yang
data is  id(data) is 2715076202224
in thread  name is yang
data is  id(data) is 2482736991872
in thread  name is yang
data is  id(data) is 1861188783744
0

The final output is 0, which shows that the sub processes had no effect on the data object passed in by the main process. So what do we need to do so that a sub process can operate on an object of the main process? We can use BaseManager from multiprocessing.managers to achieve this

#coding:utf-8
import threading
import multiprocessing
from multiprocessing.managers import BaseManager

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data
        
BaseManager.register("mydata",Data)

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.set(data.get()+1)
    lock.release()



def getManager():
    m = BaseManager()
    m.start()
    return m


if __name__ == '__main__':
    manager = getManager()
    d = manager.mydata(0)
    thlist = []
    name = "yang"
    lock = multiprocessing.Lock()
    for i in range(5):
        th = multiprocessing.Process(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.get())

After importing BaseManager with from multiprocessing.managers import BaseManager and defining the data type, use BaseManager.register("mydata",Data) to register the type with BaseManager under the name mydata. You can then create the object through a BaseManager instance. Let’s take a look at the output

C:\Python35\python.exe F:/python/python3Test/multask.py
in thread  name is yang
data is  id(data) is 2222932504080
in thread  name is yang
data is  id(data) is 1897574510096
in thread  name is yang
data is  id(data) is 2053415775760
in thread  name is yang
data is  id(data) is 2766155820560
in thread  name is yang
data is  id(data) is 2501159890448
5

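We see that although a different object is used in each sub process, their values can be “shared”. What each process holds is a proxy, and its method calls are forwarded to the single real object living in the manager process.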
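
A variation on the same idea, as a sketch rather than the one right way to do it: registering the class on a BaseManager subclass keeps the registration local to that subclass, and using the manager as a context manager shuts the manager process down on exit. The names DataManager, increment, worker, and run are hypothetical and chosen only for this example.

```python
import multiprocessing
from multiprocessing.managers import BaseManager


class Data:
    def __init__(self, data=0):
        self.data = data

    def get(self):
        return self.data

    def increment(self):
        self.data += 1


class DataManager(BaseManager):
    """Hypothetical subclass, so BaseManager itself is left untouched."""


# Public methods of Data (get, increment) are exposed on the proxy.
DataManager.register("Data", Data)


def worker(shared, lock):
    # "shared" is a proxy; the call runs inside the manager process.
    with lock:
        shared.increment()


def run(n_processes=5):
    with DataManager() as manager:  # starts the manager process
        shared = manager.Data(0)
        lock = multiprocessing.Lock()
        procs = [
            multiprocessing.Process(target=worker, args=(shared, lock))
            for _ in range(n_processes)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return shared.get()  # read the value before the manager shuts down


if __name__ == "__main__":
    print(run())
```

Putting the update in a method of the class means each sub process makes one call to the manager instead of a get() followed by a set().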

Standard data types can also be shared through the Value object in the multiprocessing library, for example

#coding:utf-8
import threading
import multiprocessing
from multiprocessing.managers import BaseManager

class Data:
    def __init__(self,data=None):
        self.data = data

    def get(self):
        return self.data

    def set(self,data):
        self.data = data

BaseManager.register("mydata",Data)

def test(name,data,lock):
    lock.acquire()
    print("in thread {} name is {}".format(threading.current_thread(),name))
    print("data is {} id(data) is {}".format(data,id(data)))
    data.value +=1
    lock.release()


if __name__ == '__main__':
    d = multiprocessing.Value("l",10)  # "l" is the typecode for a signed long (c_long)
    print(d)
    thlist = []
    name = "yang"
    lock = multiprocessing.Lock()
    for i in range(5):
        th = multiprocessing.Process(target=test,args=(name,d,lock))
        th.start()
        thlist.append(th)
    for i in thlist:
        i.join()
    print(d.value)

Here d = multiprocessing.Value("l",10) initializes a numeric object, which is a synchronized wrapper for c_long. When initializing multiprocessing.Value, the first parameter is the type and the second parameter is the initial value. The type is either a ctypes type or a one-character typecode of the kind used by the array module.
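
A Value created with the default lock=True also carries its own lock, available through get_lock(), so a separate multiprocessing.Lock is optional. The following is a small sketch of that pattern; add_one and run are hypothetical helper names.

```python
import multiprocessing


def add_one(counter):
    # "counter.value += 1" is a read followed by a write, so hold the
    # lock that belongs to the Value while doing both.
    with counter.get_lock():
        counter.value += 1


def run(n_processes=5, start=10):
    counter = multiprocessing.Value("l", start)  # "l": signed long
    procs = [
        multiprocessing.Process(target=add_one, args=(counter,))
        for _ in range(n_processes)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run())  # start value plus one increment per process
```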

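You can also use types from the ctypes library to initialize strings. Note that c_char_p is a pointer, so the value is only reliable inside the process that created it; a pointer stored in shared memory is likely to be invalid in another process.

>>> import multiprocessing
>>> from ctypes import c_char_p
>>> s = multiprocessing.Value(c_char_p, b'\xd1\xee\xd1\xe5\xd0\xc7')
>>> print(s.value.decode('gbk'))
杨彦星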

You can also use the manager object to initialize list, dict, and so on

#coding:utf-8
import multiprocessing


def func(mydict, mylist):
    # The sub process changes the dict, and the main process sees the change
    mydict["index1"] = "aaaaaa"
    mydict["index2"] = "bbbbbb"
    # The sub process changes the list, and the main process sees the change
    mylist.append(11)
    mylist.append(22)
    mylist.append(33)


if __name__ == "__main__":
    # One manager process can serve both shared objects
    manager = multiprocessing.Manager()
    # The main process shares this dictionary with its sub processes
    mydict = manager.dict()
    # The main process shares this list with its sub processes
    mylist = manager.list(range(5))

    p = multiprocessing.Process(target=func, args=(mydict, mylist))
    p.start()
    p.join()

    print(mylist)
    print(mydict)
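
One thing to watch with managed containers: if an ordinary list or dict is stored inside a managed dict, changing it in place is not propagated, because reading the key returns a copy. The sketch below shows this in a single process and one common workaround, which is to assign the modified value back. The demo function and the "items" key are hypothetical.

```python
import multiprocessing


def demo():
    with multiprocessing.Manager() as manager:
        shared = manager.dict()
        shared["items"] = []  # an ordinary list stored in the managed dict

        # This appends to a local copy, so the manager never sees it.
        shared["items"].append(1)
        lost = shared["items"]

        # Workaround: modify a copy, then assign it back to the key.
        items = shared["items"]
        items.append(1)
        shared["items"] = items
        kept = shared["items"]
    return lost, kept


if __name__ == "__main__":
    print(demo())
```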

In fact, what we have been talking about here is only the sharing of data values. In multiple processes, each process holds a different object, so if you want to synchronize state you have to take an indirect route like the ones above. This is convenient enough for small projects of your own. For larger projects, I still recommend not sharing data this way: it greatly increases the coupling between parts of the program, and the logic becomes complicated and hard to follow. It is better to use queues or a database as the communication channel.
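
As a rough sketch of the queue approach, each sub process below sends its result to the main process through a multiprocessing.Queue instead of changing shared state. The worker and run names, and the squared-number payload, are made up for this illustration.

```python
import multiprocessing


def worker(index, results):
    # Send a message to the parent instead of mutating shared state.
    results.put((index, index * index))


def run(n_processes=5):
    results = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=worker, args=(i, results))
        for i in range(n_processes)
    ]
    for p in procs:
        p.start()
    # Read one item per worker before joining, so no worker is left
    # blocked while flushing its data to the queue.
    collected = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return collected


if __name__ == "__main__":
    print(run())
```

The main process ends up with a plain dict that it alone owns, so no lock is needed on the result.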

