miguendes's blog

Everything You Need to Know About Python's Namedtuples

Learn the most important stuff about named tuples in Python. Learn to convert a namedtuple to dict and json. Add optional fields, docstring, and more!

Here's the truth: namedtuples are underrated, I mean... VERY!

Many developers, unfortunately, overlook this important data structure. When used appropriately, a namedtuple() can make your code cleaner, faster and easier to read. And this is what we're going to see in this article.

We'll see the most important aspects of a named tuple in Python 3 and, starting from the very basics, we'll move up to more complex concepts. You’ll learn why you should use them and how you can use them to create cleaner, pythonic code.

By the end of this guide, you’ll have learned:

  • why you should use it and how it can improve code readability
  • how to convert a namedtuple to dict
  • how to create a namedtuple from dict
  • when to choose a namedtuple vs. dict
  • the best way to convert a namedtuple to JSON
  • how to add optional default values to a namedtuple
  • the unfamiliar way to add a docstring to namedtuples
  • the difference between named tuples and dataclasses
  • how to add a method to a namedtuple
  • how to add type hints to a namedtuple

Table of Contents

  1. What Is a Namedtuple And Why You Should Use It
  2. How to Create a namedtuple from Dict or Regular Tuple
  3. How to Convert a Named Tuple to Dict or Regular Tuple
  4. How to Sort a List of namedtuples
  5. How to Serialize namedtuples to JSON
  6. How to Add a docstring to a namedtuple
  7. How to Add a Method to a Namedtuple
  8. What Are the Differences Between namedtuples and Data Classes?
  9. How to Add Type Hints to a namedtuple
  10. How to Add Optional Default Values to a namedtuple
  11. Conclusion

What Is a Namedtuple And Why You Should Use It

namedtuple is a very interesting—and also underrated—data structure.

A named tuple is an extension of the regular built-in tuple (namedtuple is a tuple subclass). It provides the same features as the conventional tuple, but also allows you to access fields via attribute lookup using dot notation, that is, using their names instead of only indexes.
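To make that concrete, here is a minimal sketch showing that a named tuple supports attribute access, index access, and unpacking alike. The Point type and its fields are made up purely for illustration.

```python
from collections import namedtuple

# Hypothetical two-field type, used only for illustration.
Point = namedtuple("Point", "x y")
p = Point(x=1, y=2)

by_name = p.x + p.y              # attribute lookup with dot notation
by_index = p[0] + p[1]           # regular tuple indexing still works
x, y = p                         # so does unpacking
is_tuple = isinstance(p, tuple)  # a namedtuple is a tuple subclass
```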

It’s very common to find Python code that heavily relies on regular tuples, or sometimes dictionaries, to store data. And don’t get me wrong, both dictionaries and regular tuples have their value; the problem lies in misusing them. For example:

Suppose that you have a function that converts a string into a color. The color must be represented in a 4-dimensional space, the RGBA.

def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return 50, 205, 50, alpha
    elif desc == "blue":
        return 0, 0, 255, alpha
    else:
        return 0, 0, 0, alpha

Then, we can use it like this:

r, g, b, a = convert_string_to_color(desc="blue", alpha=1.0)

Ok, that works, but... we have a couple of problems here. The first one is that there's no way to ensure the order of the returned values. That is, there's nothing stopping another developer from calling convert_string_to_color like this:

g, b, r, a = convert_string_to_color(desc="blue", alpha=1.0)

Also, we may not know that the function returns 4 values, and end up calling the function like so:

r, g, b = convert_string_to_color(desc="blue", alpha=1.0)

Which, in turn, fails with ValueError since we cannot unpack the whole tuple.

That's true. But why don't you use a dictionary instead?

Python’s dictionaries are a very versatile data structure. They can serve as an easy and convenient way to store multiple values. However, a dict doesn’t come without shortcomings. Due to its flexibility, dictionaries are very easily abused. As an illustration, let us convert our example to use a dictionary instead of tuple.

def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return {"r": 50, "g": 205, "b": 50, "alpha": alpha}
    elif desc == "blue":
        return {"r": 0, "g": 0, "b": 255, "alpha": alpha}
    else:
        return {"r": 0, "g": 0, "b": 0, "alpha": alpha}

Ok, we now can use it like this, expecting just one value to be returned:

color = convert_string_to_color(desc="blue", alpha=1.0)

No need to remember the order, but it has at least two drawbacks. The first one is that we must keep track of the key names. If we change {"r": 0, "g": 0, "b": 0, "alpha": alpha} to {"red": 0, "green": 0, "blue": 0, "a": alpha}, when accessing a field, we’ll get a KeyError back, as the keys r, g, b, and alpha no longer exist.

The second issue with dicts is that they are not hashable. That means we cannot store them in a set or use them as keys in other dictionaries. Let’s imagine that we want to keep track of how many colors a particular image has. If we use collections.Counter to count, we’ll get TypeError: unhashable type: 'dict'.

Also, dictionaries are mutable objects, so we can add as many new keys as we want. Trust me, this is a recipe for nasty bugs that are really hard to track down.
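The hashability problem is easy to reproduce. Here is a small sketch using a made-up list of pixel colors. The exact wording of the error message can vary between Python versions, so the code only captures it rather than relying on it.

```python
from collections import Counter

# Hypothetical pixel data: each color is a plain dict.
pixels = [
    {"r": 0, "g": 0, "b": 255, "alpha": 1.0},
    {"r": 0, "g": 0, "b": 255, "alpha": 1.0},
]

try:
    Counter(pixels)
except TypeError as exc:
    # Counter needs hashable items; the message mentions "unhashable type".
    error_message = str(exc)
```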

Ok, fine, that makes sense. So, now what? What can I use instead?

namedtuples! Just... use them!

Converting our function to use namedtuples is as easy as this:

from collections import namedtuple
...
Color = namedtuple("Color", "r g b alpha")
...
def convert_string_to_color(desc: str, alpha: float = 0.0):
    if desc == "green":
        return Color(r=50, g=205, b=50, alpha=alpha)
    elif desc == "blue":
        return Color(r=0, g=0, b=255, alpha=alpha)
    else:
        return Color(r=0, g=0, b=0, alpha=alpha)

As in the dict case, we can assign it to a single variable and use it as we please. There’s no need to remember ordering. And if you’re using an IDE such as PyCharm or VSCode, you get auto-completion out of the box.

color = convert_string_to_color(desc="blue", alpha=1.0)
...
has_alpha = color.alpha > 0.0
...
is_black = color.r == 0 and color.g == 0 and color.b == 0

To top it all off, namedtuples are immutable objects. If another developer on the team thinks it’s a good idea to add a new field during runtime, the program will fail.

>>> blue = Color(r=0, g=0, b=255, alpha=1.0)

>>> blue.e = 0
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-8c7f9b29c633> in <module>
----> 1 blue.e = 0

AttributeError: 'Color' object has no attribute 'e'

Not only that, now we can use Counter to track how many colors a collection has.

>>> from collections import Counter
>>> Counter([blue, blue])
Counter({Color(r=0, g=0, b=255, alpha=1.0): 2})

How to Create a namedtuple from Dict or Regular Tuple

Now that we understand the motivations behind using namedtuple, it’s time to learn how to convert normal tuples and dictionaries into named tuples.

Let's say that you have a dictionary instance containing the RGBA values for a color. If you want to instantiate the Color namedtuple we just created, you can pass the dict as keyword arguments to the named tuple constructor:

>>> Color = namedtuple("Color", "r g b alpha")
>>> c = {"r": 50, "g": 205, "b": 50, "alpha": 0}
>>> Color(**c)
Color(r=50, g=205, b=50, alpha=0)

That’s it. We can just leverage the ** construct to unpack the dict as keyword arguments into a namedtuple.

What if I want to create a namedtuple class from the dict? I mean... not an instance but named tuple class?

No problem, if you pass a dict to the namedtuple factory function, it creates a named tuple class using the dictionary fields.

>>> c = {"r": 50, "g": 205, "b": 50, "alpha": 0}
>>> Color = namedtuple("Color", c)
>>> Color(**c)
Color(r=50, g=205, b=50, alpha=0)

Then, to create a new Color instance from a dict we can just unpack the dictionary as keyword arguments, like in the previous example.
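The same idea applies to a regular tuple. A short sketch, assuming the values are already in field order: either unpack the tuple positionally with *, or use the _make class method that every namedtuple provides.

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")

rgba = (50, 205, 50, 0)        # a regular tuple, in field order
from_star = Color(*rgba)       # positional unpacking
from_make = Color._make(rgba)  # class method that accepts any iterable
```

Both calls build the same Color instance; _make is handy when the values come from an iterator such as a CSV row.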

How to Convert a Named Tuple to Dict or Regular Tuple

We've just learned how to create a namedtuple from a dict. What about the inverse? How can we convert a namedtuple to a dictionary instance?

It turns out, namedtuple comes with a method called ._asdict(). So, converting it is as simple as calling the method.

>>> blue = Color(r=0, g=0, b=255, alpha=1.0)
>>> blue._asdict()
{'r': 0, 'g': 0, 'b': 255, 'alpha': 1.0}

You may be wondering why the method starts with a _. Unfortunately, this is one of Python's inconsistencies. Usually, a leading _ marks a private method or attribute. However, namedtuple prefixes its public methods and attributes with an underscore so that they cannot clash with your field names. Besides _asdict, there’s also _make, _replace, _fields, and _field_defaults. You can find all of them in the official documentation for collections.namedtuple.
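As a quick illustration of two of those helpers, here is a sketch using the Color tuple from earlier: _fields lists the field names, and _replace builds a modified copy.

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")
blue = Color(r=0, g=0, b=255, alpha=1.0)

field_names = Color._fields             # ('r', 'g', 'b', 'alpha')
translucent = blue._replace(alpha=0.5)  # returns a new instance
# `blue` itself is unchanged, because namedtuples are immutable.
```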

To convert a named tuple into a regular tuple, it's enough to pass it to the tuple constructor.

>>> tuple(Color(r=50, g=205, b=50, alpha=0.1))
(50, 205, 50, 0.1)

How to Sort a List of namedtuples

Another common use case is storing several namedtuple instances in a list and sorting them based on some criterion. For example, say that we have a list of colors and we need to sort them by alpha intensity.

Fortunately, Python offers a very pythonic way of doing that. We can use operator.attrgetter. According to the docs, attrgetter “returns a callable object that fetches attr from its operand”. In layman’s terms, we pass attrgetter the name of the field we want to sort by, and then pass the resulting callable to the sorted function as the key. Example:

from operator import attrgetter
...
colors = [
    Color(r=50, g=205, b=50, alpha=0.1),
    Color(r=50, g=205, b=50, alpha=0.5),
    Color(r=50, g=0, b=0, alpha=0.3)
]
...
>>> sorted(colors, key=attrgetter("alpha"))
[Color(r=50, g=205, b=50, alpha=0.1),
 Color(r=50, g=0, b=0, alpha=0.3),
 Color(r=50, g=205, b=50, alpha=0.5)]

Now, the list of colors is sorted in ascending order by alpha intensity!
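Two variations are worth a quick sketch, reusing the same list: sorting in descending order with reverse=True, and sorting by more than one field by passing several names to attrgetter.

```python
from collections import namedtuple
from operator import attrgetter

Color = namedtuple("Color", "r g b alpha")
colors = [
    Color(r=50, g=205, b=50, alpha=0.1),
    Color(r=50, g=205, b=50, alpha=0.5),
    Color(r=50, g=0, b=0, alpha=0.3),
]

# Most opaque first.
by_alpha_desc = sorted(colors, key=attrgetter("alpha"), reverse=True)

# With several names, attrgetter returns a tuple of values,
# so this sorts by green first and breaks ties with alpha.
by_green_then_alpha = sorted(colors, key=attrgetter("g", "alpha"))
```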

How to Serialize namedtuples to JSON

Sometimes you may need to save a namedtuple to JSON. As you probably know, Python’s dictionaries can be converted to JSON through the json module. As a result, if we convert our tuple to a dictionary with the _asdict method, then we’re all set. As an example, consider this scenario:

>>> blue = Color(r=0, g=0, b=255, alpha=1.0)
>>> import json
>>> json.dumps(blue._asdict())
'{"r": 0, "g": 0, "b": 255, "alpha": 1.0}'

As you can see, json.dumps converts a dict into a JSON string.
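It may help to see why going through _asdict matters, and how to get the tuple back. In this sketch, passing the namedtuple straight to json.dumps still works, but because it is a tuple it is written as a JSON array and the field names are lost.

```python
import json
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")
blue = Color(r=0, g=0, b=255, alpha=1.0)

as_object = json.dumps(blue._asdict())  # '{"r": 0, "g": 0, "b": 255, "alpha": 1.0}'
as_array = json.dumps(blue)             # '[0, 0, 255, 1.0]' (field names are lost)

# Reading it back: parse to a dict, then unpack into the constructor.
restored = Color(**json.loads(as_object))
```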

How to Add a docstring to a namedtuple

In Python, we can document methods, classes and modules using plain strings. This string is then made available as a special attribute named __doc__. That being said, how can we add documentation to our Color namedtuple?

There’s no right answer to this, but we can do it in two ways. The first one (and a bit more cumbersome) is to extend the tuple using a wrapper. By doing so, we can then define the docstring in this wrapper. As an example, consider the following snippet:

_Color = namedtuple("Color", "r g b alpha")

class Color(_Color):
    """A namedtuple that represents a color.
    It has 4 fields:
    r - red
    g - green
    b - blue
    alpha - the alpha channel
    """

>>> print(Color.__doc__)
A namedtuple that represents a color.
    It has 4 fields:
    r - red
    g - green
    b - blue
    alpha - the alpha channel
>>> help(Color)
Help on class Color in module __main__:

class Color(Color)
 |  Color(r, g, b, alpha)
 |  
 |  A namedtuple that represents a color.
 |  It has 4 fields:
 |  r - red
 |  g - green
 |  b - blue
 |  alpha - the alpha channel
 |  
 |  Method resolution order:
 |      Color
 |      Color
 |      builtins.tuple
 |      builtins.object
 |  
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)

As you can see, by inheriting from the _Color tuple, we added a __doc__ attribute to it.

The second way of adding docstring is just setting __doc__. You see? There’s no need to extend the tuple in the first place.

>>> Color.__doc__ = """A namedtuple that represents a color.
    It has 4 fields:
    r - red
    g - green
    b - blue
    alpha - the alpha channel
    """

Just bear in mind that these methods only work on Python 3+.
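The same direct assignment also works per field, which is the approach shown in the collections documentation. A minimal sketch (the docstring texts are just examples):

```python
from collections import namedtuple

Color = namedtuple("Color", "r g b alpha")

Color.__doc__ = "A namedtuple that represents a color."
Color.r.__doc__ = "Red channel"          # documents a single field
Color.alpha.__doc__ = "The alpha channel"
```

With this in place, help(Color) lists the custom text next to each field instead of the default "Alias for field number N".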

How to Add a Method to a Namedtuple

You can add a method to a named tuple class by using inheritance. Following the previous example, we can extend it not only to add a docstring but also to add custom methods.

>>> from collections import namedtuple

>>> _Color = namedtuple("Color", "r g b")

>>> class Color(_Color):
...     """A namedtuple that represents a color.
...     It has 3 fields:
...     r - red
...     g - green
...     b - blue
...     """
...     def to_hex(self) -> str:
...         return f"#{self.r:02x}{self.g:02x}{self.b:02x}"
...

>>> blue = Color(r=0, g=0, b=255)

>>> blue.to_hex()
'#0000ff'

What Are the Differences Between namedtuples and Data Classes?

Before Python 3.7, creating a simple container of data involved using either:

  • a namedtuple
  • a regular class
  • a third-party library, such as attrs.

If you wanted to go the class route, that meant you would have to implement a couple of methods. For instance, a regular class requires an __init__ method to set the attributes during instantiation. If you wanted the class to be hashable, that meant implementing a __hash__ method yourself. To compare different objects, you also want an __eq__ method. And finally, to make debugging easier, you need a __repr__ method. Let’s revisit our color use case, this time using a regular class.

class Color:
    """A regular class that represents a color."""

    def __init__(self, r, g, b, alpha=0.0):
        self.r = r
        self.g = g
        self.b = b
        self.alpha = alpha

    def __hash__(self):
        return hash((self.r, self.g, self.b, self.alpha))

    def __repr__(self):
        return "{0}({1}, {2}, {3}, {4})".format(
            self.__class__.__name__, self.r, self.g, self.b, self.alpha
        )

    def __eq__(self, other):
        if not isinstance(other, Color):
            return False
        return (
            self.r == other.r
            and self.g == other.g
            and self.b == other.b
            and self.alpha == other.alpha
        )

As you can see, there's a lot to implement when all you need is a container to hold the data, without bothering with distracting details. Also, a key reason why people preferred to implement a class is that its instances are mutable. In fact, the PEP that introduced Data Classes (PEP 557) describes them as "mutable namedtuples with defaults".

Now, let's see how this class is implemented as a Data Class.

from dataclasses import dataclass
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: float

Wow! Is that it?

Yes, that's it. As simple as that! A major difference is that, since the @dataclass decorator generates __init__ (along with __repr__ and __eq__) for you, you can just declare the attributes after the docstring. Also, they must be annotated with a type hint.

Besides being mutable, a Data Class can also have optional fields out of the box. Let’s say that our Color class does not require an alpha field. We can then make it Optional.

from dataclasses import dataclass
from typing import Optional
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: Optional[float] = None

And we can instantiate it like so:

>>> blue = Color(r=0, g=0, b=255)

Since they're mutable, we can change whatever field we want:

>>> blue = Color(r=0, g=0, b=255)
>>> blue.r = 1
>>> # or even add more fields on the fly
>>> blue.e = 10

Unfortunately, namedtuples created with the plain factory call don't have optional fields. Since Python 3.7 the factory accepts a defaults argument; before that, adding them required a bit of a hack and a little meta-programming.

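Caveat: a Data Class is not hashable by default, because with the default eq=True and frozen=False its __hash__ is set to None. To get a __hash__ method you can either make the class immutable with frozen=True, or keep it mutable and force a generated __hash__ by setting unsafe_hash to True: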

@dataclass(unsafe_hash=True)
class Color:
    ...
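If what you want is a Data Class that is both hashable and immutable, frozen=True is an alternative worth knowing. A minimal sketch follows; FrozenColor is a name made up here, not part of the examples above.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenColor:
    """Hypothetical immutable variant of the Color data class."""
    r: float
    g: float
    b: float
    alpha: float = 0.0

blue = FrozenColor(r=0, g=0, b=255)
unique = {blue, FrozenColor(r=0, g=0, b=255)}  # hashable, so it can live in a set

try:
    blue.r = 1  # assignment is rejected on a frozen data class
except FrozenInstanceError:
    was_rejected = True
```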

Another difference is that unpacking is a first-class citizen with namedtuples. If you want your Data Class to have the same behavior, you must implement it yourself.

from dataclasses import dataclass, astuple
...
@dataclass
class Color:
    """A regular class that represents a color."""
    r: float
    g: float
    b: float
    alpha: float

    def __iter__(self):
        yield from astuple(self)

Performance Comparison

Comparing only the features is not enough; named tuples and data classes differ in performance too. Data classes are implemented in pure Python and store their fields in a per-instance dict. Historically that made dot-notation access faster than on a namedtuple, although Python 3.8 sped up namedtuple field lookups considerably.

On the other hand, namedtuples are just an extension of a regular tuple. That means their implementation is based on faster C code and they have a smaller memory footprint.

To show that, consider this experiment on Python 3.8.5.

In [6]: import sys

In [7]: ColorTuple = namedtuple("Color", "r g b alpha")

In [8]: @dataclass
   ...: class ColorClass:
   ...:     """A regular class that represents a color."""
   ...:     r: float
   ...:     g: float
   ...:     b: float
   ...:     alpha: float
   ...: 

In [9]: color_tup = ColorTuple(r=50, g=205, b=50, alpha=1.0)

In [10]: color_cls = ColorClass(r=50, g=205, b=50, alpha=1.0)

In [11]: %timeit color_tup.r
36.8 ns ± 0.109 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [12]: %timeit color_cls.r
38.4 ns ± 0.112 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [15]: sys.getsizeof(color_tup)
Out[15]: 72

In [16]: sys.getsizeof(color_cls) + sys.getsizeof(vars(color_cls))
Out[16]: 152

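As you can see, field access time is nearly identical in this run (the namedtuple is marginally ahead at 36.8 ns vs. 38.4 ns). When it comes to memory usage, however, the data class instance takes up about twice as much space as the tuple (152 vs. 72 bytes).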
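The experiment above relies on IPython's %timeit. A rough equivalent using only the standard library is sketched below, in case you want to repeat it as a plain script. Absolute numbers depend on your machine and Python version, so no output is shown.

```python
import sys
import timeit
from collections import namedtuple
from dataclasses import dataclass

ColorTuple = namedtuple("ColorTuple", "r g b alpha")

@dataclass
class ColorClass:
    r: float
    g: float
    b: float
    alpha: float

color_tup = ColorTuple(r=50, g=205, b=50, alpha=1.0)
color_cls = ColorClass(r=50, g=205, b=50, alpha=1.0)

# Total seconds for one million attribute reads each.
tup_seconds = timeit.timeit("c.r", globals={"c": color_tup}, number=1_000_000)
cls_seconds = timeit.timeit("c.r", globals={"c": color_cls}, number=1_000_000)

# Same size measurement as in the session above.
tup_bytes = sys.getsizeof(color_tup)
cls_bytes = sys.getsizeof(color_cls) + sys.getsizeof(vars(color_cls))

print(f"namedtuple: {tup_seconds:.3f} s, {tup_bytes} bytes")
print(f"dataclass:  {cls_seconds:.3f} s, {cls_bytes} bytes")
```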

How to Add Type Hints to a namedtuple

As you can see, Data Classes use type hints by default. However, we can have them on namedtuples as well. By importing the NamedTuple class from the typing module and inheriting from it, we can have our Color tuple annotated.

from typing import NamedTuple
...
class Color(NamedTuple):
    """A namedtuple that represents a color."""
    r: float
    g: float
    b: float
    alpha: float

Another detail that might have gone unnoticed is that this way also allows us to have docstrings. If we type help(Color) we'll be able to see them.

Help on class Color in module __main__:

class Color(builtins.tuple)
 |  Color(r: float, g: float, b: float, alpha: float)
 |  
 |  A namedtuple that represents a color.
 |  
 |  Method resolution order:
 |      Color
 |      builtins.tuple
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __getnewargs__(self)
 |      Return self as a plain tuple.  Used by copy and pickle.
 |  
 |  __repr__(self)
 |      Return a nicely formatted representation string
 |  
 |  _asdict(self)
 |      Return a new dict which maps field names to their values.
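The class syntax also makes it straightforward to attach methods, with no intermediate _Color class needed. A small sketch, using int fields so the hex formatting works:

```python
from typing import NamedTuple

class Color(NamedTuple):
    """A namedtuple that represents a color."""
    r: int
    g: int
    b: int

    def to_hex(self) -> str:
        return f"#{self.r:02x}{self.g:02x}{self.b:02x}"

blue = Color(r=0, g=0, b=255)
hex_code = blue.to_hex()                 # '#0000ff'
still_a_tuple = isinstance(blue, tuple)  # typing.NamedTuple builds a tuple subclass
```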

How to Add Optional Default Values to a namedtuple

In an earlier section, we learned that Data Classes can have optional values. Also, I mentioned that mimicking the same behavior on a named tuple takes a bit more work. One way that works on any Python 3 version is to use inheritance and override __new__, as in the example below.

from collections import namedtuple

class Color(namedtuple("Color", "r g b alpha")):
    __slots__ = ()
    def __new__(cls, r, g, b, alpha=None):
        return super().__new__(cls, r, g, b, alpha)
>>> c = Color(r=0, g=0, b=0)
>>> c
Color(r=0, g=0, b=0, alpha=None)
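On Python 3.7 and later there are two shorter routes, sketched here as alternatives to the inheritance approach: the defaults argument of the namedtuple factory, and plain default values in the typing.NamedTuple class syntax. TypedColor is a name made up for this example.

```python
from collections import namedtuple
from typing import NamedTuple, Optional

# `defaults` applies to the rightmost fields, here only `alpha`.
Color = namedtuple("Color", "r g b alpha", defaults=[None])

class TypedColor(NamedTuple):
    """Hypothetical typed variant with the same default."""
    r: int
    g: int
    b: int
    alpha: Optional[float] = None  # defaulted fields must come last

c = Color(r=0, g=0, b=0)
t = TypedColor(r=0, g=0, b=0)
```

Both c.alpha and t.alpha are None here, and Color._field_defaults reports which fields carry a default.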

Conclusion

Named tuples are a very powerful data structure. They allow us to create pythonic code that's cleaner and more reliable. Despite the competition from the newer Data Classes, they still have plenty of firewood to burn. In this tutorial, we learned several ways of making use of namedtuples, and I hope you find them useful.

 