namedtuple: an underrated Python data structure

Time:2020-11-30

Original address:https://miguendes.me/everythi…

By Miguel Brito

Translator: Dean Wu

This article covers the key uses of Python’s namedtuple. We will introduce the concept of namedtuple step by step, from the simple to the advanced. You’ll learn why and how to use namedtuples to write simpler code. By the end of this guide, you will love using them.

Learning objectives

At the end of this tutorial, you should be able to:

  • Understand why and when to use namedtuple
  • Convert regular tuples and dictionaries to namedtuples
  • Convert a namedtuple to a dictionary or a regular tuple
  • Sort a list of namedtuples
  • Understand the differences between namedtuples and data classes
  • Create namedtuples with optional fields
  • Serialize a namedtuple to JSON
  • Add a documentation string (docstring)

Why use namedtuple

namedtuple is a very interesting (and underrated) data structure. It is easy to find Python code that relies heavily on regular tuples and dictionaries to store data. I’m not saying that is bad, just that they are often abused. Let me explain.

Suppose you have a function that converts a string to a color. The color must be represented as a 4-component RGBA value.

def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return 50, 205, 50, alpha
    elif desc == "blue":
        return 0, 0, 255, alpha
    else:
        return 0, 0, 0, alpha

Then we can use it like this:

r, g, b, a = convert_string_to_color(desc="blue", alpha=1.0)

OK, it works. But we have a few problems here. The first is that nothing enforces the order of the returned values. In other words, nothing prevents another developer from calling convert_string_to_color like this:

g, b, r, a = convert_string_to_color(desc="blue", alpha=1.0)

In addition, we may not know that the function returns four values. We may call the function as follows:

r, g, b = convert_string_to_color(desc="blue", alpha=1.0)

Because the function returns four values but only three variables are there to receive them, the call fails with a ValueError.

That’s true. But, you might ask, why not use a dictionary?

Python dictionaries are a very general-purpose data structure and a simple way to store multiple values. However, they are not without shortcomings. Because they are so flexible, dictionaries are easy to abuse. Let’s look at the same example using dictionaries.

def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return {"r": 50, "g": 205, "b": 50, "alpha": alpha}
    elif desc == "blue":
        return {"r": 0, "g": 0, "b": 255, "alpha": alpha}
    else:
        return {"r": 0, "g": 0, "b": 0, "alpha": alpha}

Well, we can now use it like this, expecting only one value to be returned:

color = convert_string_to_color(desc="blue", alpha=1.0)

There is no need to remember the order, but this has at least two disadvantages. The first is that we have to keep track of the key names. If we change {"r": 0, "g": 0, "b": 0, "alpha": alpha} to {"red": 0, "green": 0, "blue": 0, "a": alpha}, any code that accesses the old fields gets a KeyError, because the keys r, g, b and alpha no longer exist.

The second problem with dictionaries is that they are not hashable. This means we can’t store them in a set or use them as keys in other dictionaries. Suppose we want to track how many colors a particular image has. If we use collections.Counter to count them, we get TypeError: unhashable type: 'dict'.

Moreover, dictionaries are mutable, so anyone can add any number of new keys at any time. Believe me, this leads to nasty bugs that are hard to spot.
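Both dictionary problems can be reproduced in a few lines. The snippet below is only a small sketch: the variable names (green, error_message, has_typo_key) and the misspelled key are made up for illustration, and the exact wording of the error message may differ between Python versions.

```python
from collections import Counter

green = {"r": 50, "g": 205, "b": 50, "alpha": 1.0}

# Dicts are unhashable, so Counter (which uses its items as keys) rejects them.
error_message = None
try:
    Counter([green, green])
except TypeError as exc:
    error_message = str(exc)

# Dicts are mutable: a misspelled key is silently added instead of failing.
green["aplha"] = 0.5
has_typo_key = "aplha" in green

print(error_message)
print(has_typo_key)
```

Nothing in the second half raises an error, which is exactly why this kind of mistake is hard to spot.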

Okay, good. So what now? What can I use instead?

namedtuple! Yes, that’s it!

Converting our function to use namedtuple

from collections import namedtuple
...
Color = namedtuple("Color", "r g b alpha")
...
def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return Color(r=50, g=205, b=50, alpha=alpha)
    elif desc == "blue":
        return Color(r=0, g=0, b=255, alpha=alpha)
    else:
        return Color(r=0, g=0, b=0, alpha=alpha)

As with the dict version, we can assign the result to a single variable and use it as needed. There is no need to remember the order. Moreover, if you use an IDE such as PyCharm or VS Code, you get autocompletion for the field names.

color = convert_string_to_color(desc="blue", alpha=1.0)
...
has_alpha = color.alpha > 0.0
...
is_black = color.r == 0 and color.g == 0 and color.b == 0

Most important of all, namedtuple is immutable. If another developer on the team thinks it’s a good idea to add new fields at run time, the program raises an error.

>>> blue = Color(r=0, g=0, b=255, alpha=1.0)

>>> blue.e = 0
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-8c7f9b29c633> in <module>
----> 1 blue.e = 0

AttributeError: 'Color' object has no attribute 'e'

Not only that, we can now use Counter to track how many colors a collection has.

>>> from collections import Counter
>>> Counter([blue, blue])
Counter({Color(r=0, g=0, b=255, alpha=1.0): 2})

How to convert a regular tuple or dictionary to a namedtuple

Now that we know why to use namedtuple, it’s time to learn how to convert regular tuples and dictionaries into namedtuples. Suppose that, for some reason, you have a dictionary instance that contains the RGBA values of a color. If you want to convert it to the Color namedtuple, you can do the following:

>>> c = {"r": 50, "g": 205, "b": 50, "alpha": 0}
>>> Color(**c)
Color(r=50, g=205, b=50, alpha=0)

We can use the ** operator to unpack the dict into the keyword arguments of the namedtuple.
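The dictionary case is covered above; for a regular tuple, a minimal sketch could look like the following. It assumes the same Color type as before, and the variable names rgba, green and also_green are only illustrative. Both the * unpacking and the _make class method are part of the standard namedtuple API.

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")

rgba = (50, 205, 50, 1.0)

# Option 1: unpack the tuple into positional arguments.
green = Color(*rgba)

# Option 2: use the _make class method, which accepts any iterable.
also_green = Color._make(rgba)

print(green)
print(green == also_green)
```

Either way, the values are matched to fields by position, so the tuple must have the same order as the field names.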

What if I want to create a namedtuple type from a dict?

No problem. Here’s how to do it:

>>> c = {"r": 50, "g": 205, "b": 50, "alpha": 0}
>>> Color = namedtuple("Color", c)
>>> Color(**c)
Color(r=50, g=205, b=50, alpha=0)

By passing the dict instance to the namedtuple factory function, the field names are created from its keys. Then, as in the example above, Color unpacks the dictionary c to create a new instance.

How to convert a namedtuple to a dictionary or regular tuple

We just learned how to convert a dict into a namedtuple. What about the other way around? How can we convert a namedtuple into a dictionary instance?

It turns out that namedtuple has a method called ._asdict(). Therefore, converting it is as simple as calling a method.

>>> blue = Color(r=0, g=0, b=255, alpha=1.0)
>>> blue._asdict()
{'r': 0, 'g': 0, 'b': 255, 'alpha': 1.0}

You may wonder why the method name starts with _. This is one of the places where namedtuple departs from Python’s usual conventions. Usually, a leading _ marks a private method or attribute. However, namedtuple adds it to its public methods to avoid name conflicts with field names. Besides _asdict, there are also _replace, _fields and _field_defaults. All of them are listed in the official documentation for collections.namedtuple.
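To make the other underscore-prefixed members more concrete, here is a short sketch. The variable names are illustrative, and _field_defaults assumes Python 3.7 or later.

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")
blue = Color(r=0, g=0, b=255, alpha=1.0)

# _fields lists the field names in order.
field_names = Color._fields

# _field_defaults maps fields to default values; it is empty here
# because this Color type declares no defaults.
field_defaults = Color._field_defaults

# _replace returns a new instance; the original is left untouched.
transparent_blue = blue._replace(alpha=0.0)

print(field_names)
print(transparent_blue)
```

Since namedtuples are immutable, _replace is the usual way to get a modified copy.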

To convert a namedtuple to a regular tuple, just pass it to the tuple constructor.

>>> tuple(Color(r=50, g=205, b=50, alpha=0.1))
(50, 205, 50, 0.1)

How to sort a list of namedtuples

Another common use case is to store multiple namedtuples in a list and sort them by some criterion. For example, suppose we have a list of colors that we need to sort by alpha intensity.

Fortunately, Python lets you do this in a very Pythonic way. We can use operator.attrgetter. According to the documentation, attrgetter “returns a callable object that fetches attr from its operand.” In short, we can use it to build the key function passed to sorted. For example:

from operator import attrgetter
...
colors = [
    Color(r=50, g=205, b=50, alpha=0.1),
    Color(r=50, g=205, b=50, alpha=0.5),
    Color(r=50, g=0, b=0, alpha=0.3)
]
...
>>> sorted(colors, key=attrgetter("alpha"))
[Color(r=50, g=205, b=50, alpha=0.1),
 Color(r=50, g=0, b=0, alpha=0.3),
 Color(r=50, g=205, b=50, alpha=0.5)]

Now, the list of colors is in ascending order of alpha intensity!
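As a small extension of the example above, the same key function also works for descending order and for sorting by more than one field. This is a sketch only; the names most_opaque_first and by_green_then_alpha are made up for illustration.

```python
from collections import namedtuple
from operator import attrgetter

Color = namedtuple("Color", "r g b alpha")

colors = [
    Color(r=50, g=205, b=50, alpha=0.1),
    Color(r=50, g=205, b=50, alpha=0.5),
    Color(r=50, g=0, b=0, alpha=0.3),
]

# Descending order: reuse the same key and pass reverse=True.
most_opaque_first = sorted(colors, key=attrgetter("alpha"), reverse=True)

# Several fields: attrgetter returns a tuple, so colors are compared
# by g first and by alpha when g is equal.
by_green_then_alpha = sorted(colors, key=attrgetter("g", "alpha"))

print([c.alpha for c in most_opaque_first])
print([c.alpha for c in by_green_then_alpha])
```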

How to serialize namedtuples to JSON

Sometimes you may need to save a namedtuple as JSON. Python dictionaries can be converted to JSON with the json module, so we can use the _asdict method to convert the tuple into a dictionary and then serialize it like any other dictionary. For example:

>>> blue = Color(r=0, g=0, b=255, alpha=1.0)
>>> import json
>>> json.dumps(blue._asdict())
'{"r": 0, "g": 0, "b": 255, "alpha": 1.0}'
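Going the other way is a natural follow-up. The sketch below (with illustrative variable names) loads the JSON text back into a Color. It also shows what happens if the namedtuple is passed to json.dumps directly: the json module treats it like any other tuple and writes a JSON array, so the field names are lost.

```python
import json
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")
blue = Color(r=0, g=0, b=255, alpha=1.0)

# Serialize through _asdict so the field names are kept as JSON keys.
payload = json.dumps(blue._asdict())

# Deserialize: the keys of the decoded dict match the field names.
restored = Color(**json.loads(payload))

# Without _asdict, the namedtuple is encoded as a plain JSON array.
as_array = json.dumps(blue)

print(restored)
print(as_array)
```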

How to add a docstring to a namedtuple

In Python, we can document methods, classes, and modules with plain strings. The string is then available through a special attribute named __doc__. With that said, how do we add a docstring to our Color namedtuple?

We can do this in two ways. The first (more cumbersome) one is to extend the tuple with a wrapper class and define the docstring in that wrapper. For example, consider the following code snippet:

_Color = namedtuple("Color", "r g b alpha")

class Color(_Color):
    """A namedtuple that represents a color.
    It has 4 fields:
    r - red
    g - green
    b - blue
    alpha - the alpha channel
    """

>>> print(Color.__doc__)
A namedtuple that represents a color.
    It has 4 fields:
    r - red
    g - green
    b - blue
    alpha - the alpha channel
>>> help(Color)
Help on class Color in module __main__:

class Color(Color)
 |  Color(r, g, b, alpha)
 |  
 |  A namedtuple that represents a color.
 |  It has 4 fields:
 |  r - red
 |  g - green
 |  b - blue
 |  alpha - the alpha channel
 |  
 |  Method resolution order:
 |      Color
 |      Color
 |      builtins.tuple
 |      builtins.object
 |  
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)

As shown above, by inheriting from the _Color tuple, we add a __doc__ attribute to the namedtuple.

The second way is to set the __doc__ attribute directly. This approach does not require extending the tuple.

>>> Color.__doc__ = """A namedtuple that represents a color.
    It has 4 fields:
    r - red
    g - green
    b - blue
    alpha - the alpha channel
    """

Note that these approaches only work on Python 3+.
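The same direct assignment also works for individual fields, which the standard library documentation describes as supported since Python 3.5. The snippet below is a sketch, and the docstring texts are invented for illustration.

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")

# Document the type itself.
Color.__doc__ = "A namedtuple that represents an RGBA color."

# Document individual fields (writable since Python 3.5).
Color.r.__doc__ = "Red channel"
Color.alpha.__doc__ = "Alpha channel; 0.0 means fully transparent"

print(Color.__doc__)
print(Color.alpha.__doc__)
```

These field docstrings then show up in the output of help(Color).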

What is the difference between namedtuples and data classes?

Features

Before Python 3.7, you could create a simple data container using any of the following methods:

  • namedtuple
  • Regular classes
  • Third-party libraries such as attrs

If you use regular classes, you have to implement several methods yourself. For example, a regular class needs an __init__ method to set the attributes when the class is instantiated. If you want the class to be hashable, you have to implement a __hash__ method. To compare different objects, you also need an __eq__ method. Finally, to simplify debugging, you need a __repr__ method.

Let’s use a regular class to implement our color use case.

class Color:
    """A regular class that represents a color."""

    def __init__(self, r, g, b, alpha=0.0):
        self.r = r
        self.g = g
        self.b = b
        self.alpha = alpha

    def __hash__(self):
        return hash((self.r, self.g, self.b, self.alpha))

    def __repr__(self):
        return "{0}({1}, {2}, {3}, {4})".format(
            self.__class__.__name__, self.r, self.g, self.b, self.alpha
        )

    def __eq__(self, other):
        if not isinstance(other, Color):
            return False
        return (
            self.r == other.r
            and self.g == other.g
            and self.b == other.b
            and self.alpha == other.alpha
        )

As you can see, you need to implement many methods, when all you want is a container that holds the data for you without distracting details. Another key difference, and a reason why people sometimes prefer classes, is that regular classes are mutable.

In fact, the PEP that introduced data classes (PEP 557) describes them as “mutable namedtuples with defaults.” https://docs.python.org/zh-cn…

Now, let’s see how to implement this with a data class.

from dataclasses import dataclass
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: float

Wow! It’s that simple. Because there is no __init__ to write, you just define the attributes after the docstring. In addition, each one must be annotated with a type hint.

In addition to being mutable, data classes support optional fields out of the box. Suppose our Color class doesn’t need an alpha field. Then we can make it optional.

from dataclasses import dataclass
from typing import Optional
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: Optional[float] = None

We can instantiate it like this:

>>> blue = Color(r=0, g=0, b=255)

Because data classes are mutable, we can change any field we need:

>>> blue = Color(r=0, g=0, b=255)
>>> blue.r = 1
>>> # We can even set new attributes
>>> blue.e = 10

By contrast, namedtuple has no optional fields by default. Adding them takes a bit of code and a small trick.

Tip: to have a __hash__ method generated for a data class, set unsafe_hash to True. Note that this does not make the instances immutable; passing frozen=True does.

@dataclass(unsafe_hash=True)
class Color:
    ...
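As an alternative to unsafe_hash that the original text does not show, a data class can be declared with frozen=True. With the default eq=True, this makes instances both immutable and hashable, which is closer to how a namedtuple behaves. The sketch below assumes a default of 1.0 for alpha purely for illustration.

```python
from collections import Counter
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Color:
    """An immutable, hashable data class that represents a color."""
    r: float
    g: float
    b: float
    alpha: float = 1.0  # illustrative default, not from the article


blue = Color(r=0, g=0, b=255)

# Frozen data classes with eq=True get a generated __hash__,
# so they can be counted or stored in sets.
counts = Counter([blue, blue])

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    blue.r = 1
except FrozenInstanceError:
    assignment_blocked = True
else:
    assignment_blocked = False

print(counts[blue], assignment_blocked)
```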

Another difference is that unpacking is a first-class feature of namedtuples. If you want the same behavior from a data class, you must implement it yourself.

from dataclasses import dataclass, astuple
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: float

    def __iter__(self):
        yield from astuple(self)

Performance comparison

Comparing features is not enough; namedtuples and data classes also differ in performance. Data classes are implemented in pure Python and store their fields in a per-instance dict, which keeps field access fast. namedtuples, on the other hand, are just an extension of the regular tuple. This means that their implementation is based on faster C code and they have a smaller memory footprint.

To see this, consider the following experiment on Python 3.8.5.

In [6]: import sys

In [7]: ColorTuple = namedtuple("Color", "r g b alpha")

In [8]: @dataclass
   ...: class ColorClass:
   ...:     """A regular class that represents a color."""
   ...:     r: float
   ...:     g: float
   ...:     b: float
   ...:     alpha: float
   ...: 

In [9]: color_tup = ColorTuple(r=50, g=205, b=50, alpha=1.0)

In [10]: color_cls = ColorClass(r=50, g=205, b=50, alpha=1.0)

In [11]: %timeit color_tup.r
36.8 ns ± 0.109 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [12]: %timeit color_cls.r
38.4 ns ± 0.112 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [15]: sys.getsizeof(color_tup)
Out[15]: 72

In [16]: sys.getsizeof(color_cls) + sys.getsizeof(vars(color_cls))
Out[16]: 152

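As shown above, field access takes roughly the same time for both in this run (36.8 ns for the namedtuple versus 38.4 ns for the data class), but the data class instance takes up about twice as much memory as the namedtuple (152 bytes versus 72 bytes).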
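The session above relies on IPython’s %timeit. For readers who want to repeat the measurement in a plain script, a rough equivalent using the standard timeit module might look like this. It is only a sketch: the variable names are illustrative, and the numbers it prints depend on the machine and the Python version, so they will not match the figures above exactly.

```python
import sys
import timeit
from collections import namedtuple
from dataclasses import dataclass

ColorTuple = namedtuple("ColorTuple", "r g b alpha")


@dataclass
class ColorClass:
    """A data class that represents a color."""
    r: float
    g: float
    b: float
    alpha: float


color_tup = ColorTuple(r=50, g=205, b=50, alpha=1.0)
color_cls = ColorClass(r=50, g=205, b=50, alpha=1.0)

# Total seconds for one million attribute reads on each object.
tuple_time = timeit.timeit("c.r", globals={"c": color_tup}, number=1_000_000)
class_time = timeit.timeit("c.r", globals={"c": color_cls}, number=1_000_000)

# For the data class, the per-instance dict is counted as well.
tuple_size = sys.getsizeof(color_tup)
class_size = sys.getsizeof(color_cls) + sys.getsizeof(vars(color_cls))

print(f"namedtuple: {tuple_time:.4f} s, {tuple_size} bytes")
print(f"data class: {class_time:.4f} s, {class_size} bytes")
```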

How to add type hints to namedtuple

Data classes use type hints by default. We can also add them to namedtuples. To annotate the Color tuple, we import the NamedTuple type from typing and inherit from it.

from typing import NamedTuple
...
class Color(NamedTuple):
    """A namedtuple that represents a color."""
    r: float
    g: float
    b: float
    alpha: float

Another detail that may go unnoticed is that this approach also lets us add a docstring. If we type help(Color), we will be able to see it.

Help on class Color in module __main__:

class Color(builtins.tuple)
 |  Color(r: float, g: float, b: float, alpha: float)
 |  
 |  A namedtuple that represents a color.
 |  
 |  Method resolution order:
 |      Color
 |      builtins.tuple
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __getnewargs__(self)
 |      Return self as a plain tuple.  Used by copy and pickle.
 |  
 |  __repr__(self)
 |      Return a nicely formatted representation string
 |  
 |  _asdict(self)
 |      Return a new dict which maps field names to their values.

How to add optional default values to namedtuple

In the previous section, we learned that data classes can have optional values. I also mentioned that mimicking the same behavior with namedtuple requires a little trick. It turns out that we can use inheritance, as shown in the following example.

from collections import namedtuple

class Color(namedtuple("Color", "r g b alpha")):
    __slots__ = ()
    def __new__(cls, r, g, b, alpha=None):
        return super().__new__(cls, r, g, b, alpha)

>>> c = Color(r=0, g=0, b=0)
>>> c
Color(r=0, g=0, b=0, alpha=None)
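Inheritance is not the only option. On Python 3.7 and later, the namedtuple factory accepts a defaults argument, and typing.NamedTuple allows default values in the class body. Neither variant appears in the original text, so treat the following as a sketch; the name TypedColor is made up for illustration.

```python
from collections import namedtuple
from typing import NamedTuple, Optional

# defaults are applied to the rightmost fields, so [None] covers alpha only.
Color = namedtuple("Color", "r g b alpha", defaults=[None])


class TypedColor(NamedTuple):
    """A typed namedtuple with an optional alpha field."""
    r: float
    g: float
    b: float
    alpha: Optional[float] = None


black = Color(r=0, g=0, b=0)
typed_black = TypedColor(r=0, g=0, b=0)

print(black)
print(typed_black)
```

Both forms produce real tuples, so unpacking, hashing and _asdict keep working as before.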

Conclusion

Tuples are a very powerful data structure, and namedtuples make our code cleaner and more reliable. Despite fierce competition from the new data classes, they still have plenty of use cases. In this tutorial, we learned several ways to use namedtuples.