How to Shoot Yourself in the Foot With Python. Part 1.

Have you ever had a bug that took ages to fix and made no sense at all?

If the answer is yes, then keep reading. If you program in Python, chances are you will run into one of these surprising behaviors sooner or later.

In part 1 of this series, I’m going to show you 5 things that have great potential to drive you mad in Python. Some of them are very subtle and others are not obvious at all. By learning about them in advance, you can save hours of debugging time. So brace yourself and let’s go!

Table of Contents

  1. Implicit String Concatenation
  2. Walrus WAT!?
  3. Be Careful When Using += With Lists
  4. Mutable Defaults
  5. Chained Operations Gone Wrong
  6. Conclusion

Implicit String Concatenation

A dog falling into a pit

I confess that this one has cost me several hours of my life. Used deliberately, it is very handy, but when it happens by accident, it’s a headache.

In Python you can concatenate strings not only with the + operator but also implicitly, and the implicit form is behind a very common bug. Let's start with the explicit form.

In [2]: "french " + "bulldog"
Out[2]: 'french bulldog'

That's fine, <string> + <string> generates a new <string>. But what you may not know is that you can leave the + out between two string literals and Python will still concatenate them.

In [3]: "french " "bulldog"
Out[3]: 'french bulldog'

When is this a problem, then? It'll be an issue when you want a list of strings and forget a comma.

In [4]: dogs = ["poodle" "french bulldog", "pit bull", "american bully"]
In [6]: dogs
Out[6]: ['poodlefrench bulldog', 'pit bull', 'american bully']

Oh, it can get worse. Imagine you have a function that accepts two strings, but the second one is optional.

In [10]: from typing import Optional

In [11]: def print_pair(a: str, b: Optional[str] = None):
    ...:     print("a: ", a, "b: ", b)
In [13]: print_pair(
    ...: "First string"
    ...: "Second string"
    ...: )
a:  First stringSecond string b:  None

You see? The function runs without raising any error, which is exactly the problem! Situations like these can hide nasty bugs. The lesson here is clear: watch out for missing commas when passing string literals to functions or listing them in a collection.
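One cheap way to notice a missing comma is to check how many items the collection really has. The sketch below reuses the dog list from above; the variable names and the one-item-per-line layout are only illustrative choices, not the only way to guard against this.

```python
# Adjacent string literals are merged into a single string, so a missing
# comma silently produces one long item instead of two.
dogs_buggy = ["poodle" "french bulldog", "pit bull", "american bully"]
dogs_fixed = ["poodle", "french bulldog", "pit bull", "american bully"]

print(len(dogs_buggy))  # 3
print(len(dogs_fixed))  # 4

# One item per line with a trailing comma makes a missing comma
# much easier to spot when reading the code.
dogs_multiline = [
    "poodle",
    "french bulldog",
    "pit bull",
    "american bully",
]
print(dogs_multiline == dogs_fixed)  # True
```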

Walrus WAT!?

A dog playing goes #fail

In 2019, Python 3.8 introduced the walrus operator (:=). This new feature generated a lot of controversy. Some people loved it whereas others actually hated it. The goal of this post is not to debate that, so I’ll dive right into what makes walrus confusing.

Before Python 3.8, you could not assign a value to a variable and test whether it was “truthy” in the same expression. For example, consider reading data from a socket until an empty result is returned. This example is inspired by one described in the PEP.

Without walrus:

data = sock.recv(4096)
while data:
    clean_data = clean(data)
    print("Received data:", data)
    data = sock.recv(4096)

With walrus:

while data := sock.recv(4096):
    clean_data = clean(data)
    print("Received data:", data)

That is, you do both the assignment and the checking in the same line by using :=.
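If you want to try the loop without a real socket, here is a minimal runnable sketch. It assumes io.BytesIO as a stand-in for the socket (my substitution, not part of the original example): its read() returns an empty bytes object once the data is exhausted, which ends the loop.

```python
import io

# io.BytesIO stands in for the socket here; read() returns b"" at the end
# of the stream, and an empty bytes object is falsy.
stream = io.BytesIO(b"abcdefghij")

chunks = []
while data := stream.read(4):  # assign and test in one expression
    chunks.append(data)

print(chunks)  # [b'abcd', b'efgh', b'ij']
```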

In Python we can use tuples to assign values to more than one variable in the same line.

In [1]: a, b = 2, 3

In [2]: a
Out[2]: 2

In [3]: b
Out[3]: 3

Hum... we probably can do the same using walrus, right?

In [4]: (a, b := 16, 19)
Out[4]: (2, 16, 19)

WAT!

Yeah, a 3-tuple is returned!

̶T̶h̶e̶ ̶r̶e̶a̶s̶o̶n̶ ̶f̶o̶r̶ ̶t̶h̶a̶t̶ ̶i̶s̶ ̶t̶h̶a̶t̶ ̶t̶h̶e̶ ̶̶b̶̶ ̶t̶a̶k̶e̶s̶ ̶p̶r̶e̶c̶e̶d̶e̶n̶c̶e̶ ̶a̶n̶d̶ ̶g̶e̶t̶s̶ ̶a̶s̶s̶i̶g̶n̶e̶d̶ ̶t̶o̶ ̶i̶t̶ ̶t̶h̶e̶ ̶t̶u̶p̶l̶e̶ ̶̶1̶6̶,̶ ̶1̶9̶̶.̶ ̶I̶n̶ ̶o̶t̶h̶e̶r̶ ̶w̶o̶r̶d̶s̶,̶ ̶i̶t̶’̶s̶ ̶t̶h̶e̶ ̶s̶a̶m̶e̶ ̶a̶s̶ ̶̶(̶a̶,̶ ̶(̶b̶ ̶:̶=̶ ̶1̶6̶,̶ ̶1̶9̶)̶)̶̶.̶ ̶A̶n̶d̶ ̶s̶i̶n̶c̶e̶ ̶̶a̶̶ ̶h̶a̶d̶ ̶a̶l̶r̶e̶a̶d̶y̶ ̶b̶e̶e̶n̶ ̶b̶o̶u̶n̶d̶ ̶t̶o̶ ̶̶2̶̶,̶ ̶s̶o̶ ̶t̶h̶e̶ ̶r̶e̶t̶u̶r̶n̶ ̶i̶s̶ ̶t̶h̶e̶ ̶t̶u̶p̶l̶e̶ ̶̶(̶a̶,̶ ̶b̶ ̶:̶=̶ ̶1̶6̶,̶ ̶1̶9̶)̶ ̶=̶>̶ ̶(̶2̶,̶ ̶(̶1̶6̶,̶ ̶1̶9̶)̶)̶̶.̶

Thanks to @ForceBru who kindly corrected me, what gets assigned to b is not a tuple, but only the first element after :=. As a result, (a, b := 16, 19) is the same as (a, (b := 16), 19). And that explains why a 3-tuple is returned.

You can verify that by printing the AST.

import ast

print(ast.dump(ast.parse("(a, b:= 16, 19)")))

Which produces the following output (reformatted here for readability):

Module(
    body=[
        Expr(
            value=Tuple(
                elts=[
                    Name(id="a", ctx=Load()),
                    NamedExpr(
                        target=Name(id="b", ctx=Store()),
                        value=Constant(value=16, kind=None),
                    ),
                    Constant(value=19, kind=None),
                ],
                ctx=Load(),
            )
        )
    ],
    type_ignores=[],
)

As you can see, the thing is confusing!

What if a is not defined?

In [14]: (a, b := 16, 19)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-14-bd6265d8ca84> in <module>
----> 1 (a, b := 16, 19)

NameError: name 'a' is not defined

When the first variable is not defined, a NameError will be raised. Now you know how to avoid that!
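To make the grouping concrete, here is a small sketch (Python 3.8 or later) that contrasts the surprising form with two unambiguous alternatives. The names pair, c and d are only illustrative.

```python
a = 2

# Without inner parentheses, := binds only the first value after it.
result = (a, b := 16, 19)
print(result)  # (2, 16, 19)
print(b)       # 16

# Parenthesizing the right-hand side assigns the whole tuple to one name.
(pair := (16, 19))
print(pair)    # (16, 19)

# Plain tuple unpacking is still the way to bind several names at once.
c, d = 16, 19
print(c, d)    # 16 19
```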

Be Careful When Using += With Lists

A dog falling down the stairs

Lists in Python are incredibly nice. You can perform all sorts of stuff like:

  • concatenating multiple lists using the + operator
  • generating a repeated list using the * operator
  • concatenating and assigning lists using +=

Let’s look at an example of how the + operator works with list objects. I know you may be tired of such toy examples, but please, bear with me.

In [20]: lst = [3, 4, 5, 6, 7]

In [21]: lst_copy = lst

In [22]: lst = lst + [8, 9]

In [23]: lst
Out[23]: [3, 4, 5, 6, 7, 8, 9]

In [24]: lst_copy
Out[24]: [3, 4, 5, 6, 7]

Cool, we created a list called lst and then made lst_copy point to it. Next, lst + [8, 9] built a new list, which we assigned back to the name lst. As expected, lst now holds the extended list whereas lst_copy still refers to the original one, which remained the same.

In Python one can shorten expressions like a = a + 1 as a += 1. As I mentioned in the beginning, you can also use the += operator with lists. So, let’s give it a shot and re-write our example.

In [25]: lst = [3, 4, 5, 6, 7]

In [26]: lst_copy = lst

In [27]: lst += [8, 9]

In [28]: lst
Out[28]: [3, 4, 5, 6, 7, 8, 9]

In [29]: lst_copy
Out[29]: [3, 4, 5, 6, 7, 8, 9]

WAAATTT!? What happened here?

The reason for this behavior is that, like other Python operators, the implementation of += is defined by the class of the left operand. That is, to support +=, the list class defines the __iadd__(self, other) magic method. And it works the same way as list.extend: it modifies the list in place instead of creating a new one.

So why has lst_copy been modified?

Because it is not an actual copy of lst; both names point to the same list object in memory.

Screenshot_2020-10-18_08-11-01.png

In [28]: lst
Out[28]: [3, 4, 5, 6, 7, 8, 9]

In [29]: lst_copy
Out[29]: [3, 4, 5, 6, 7, 8, 9]

In [30]: lst = [3, 4, 5, 6, 7]

In [31]: lst_copy = lst

In [32]: lst.extend([8, 9])

In [33]: lst
Out[33]: [3, 4, 5, 6, 7, 8, 9]

In [34]: lst_copy
Out[34]: [3, 4, 5, 6, 7, 8, 9]

The key takeaway is, don't blindly assume operators will have the same semantics across different classes.
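A quick way to check whether two names share one list is the is operator. The following sketch contrasts an alias with a real (shallow) copy; the names alias and real_copy are just for illustration.

```python
lst = [3, 4, 5, 6, 7]
alias = lst              # a second name for the same list object
real_copy = lst.copy()   # a new list with the same items (shallow copy)

lst += [8, 9]            # list.__iadd__ mutates the object in place

print(alias is lst)      # True
print(alias)             # [3, 4, 5, 6, 7, 8, 9]
print(real_copy)         # [3, 4, 5, 6, 7]

lst = lst + [10]         # + builds a new list and rebinds the name lst
print(alias is lst)      # False
print(alias)             # [3, 4, 5, 6, 7, 8, 9]
```

If you need an independent list, copy it explicitly before modifying either one.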

Mutable Defaults

A dog jumping sofas #fail

I understand that this one might not be new to you. However, it’s unquestionably one of the most dangerous. The case I’m talking about is the use of mutable default arguments in functions. If you don’t know what this is all about, take a look at the following example.

In [35]: def add_fruit(fruit: str, basket: list = []) -> list:
    ...:     basket.append(fruit)
    ...:     return basket
    ...: 

In [36]: b = add_fruit("banana")

In [37]: b
Out[37]: ['banana']

In [38]: c = add_fruit("apple")

In [39]: c
Out[39]: ['banana', 'apple']

WAAATT!?

As you can see, we call the function twice without passing a list to it. Yet the final result is a list with two items. How did that happen?

The reason for this behavior is that default argument values are evaluated only once, when the interpreter executes the def statement, not on every call. The object created at that point is then bound to the function parameter.

In our example, Python allocated an empty list and bound it to the parameter basket. To make things simpler to follow, let’s look at a visual example made with Python Tutor.

mutable-1.png

mutable-2.png

As you can see, the argument basket is created once and the function points to it during its entire lifetime. The only exception is when you pass another list to it but that won’t change the default. Whenever you call the function again without passing a list to it, it will use the one created when the function was defined.
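You can also observe the shared default directly, without a visualizer. Functions store their default values in the __defaults__ attribute, so a short sketch like the one below shows the same list growing between calls.

```python
def add_fruit(fruit: str, basket: list = []) -> list:
    basket.append(fruit)
    return basket

# The default list is created once and stored on the function object.
print(add_fruit.__defaults__)  # ([],)

first = add_fruit("banana")
print(add_fruit.__defaults__)  # (['banana'],)

second = add_fruit("apple")
print(add_fruit.__defaults__)  # (['banana', 'apple'],)

# Both calls returned the very same list object.
print(first is second)  # True
```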

How can we avoid this, then?

To avoid this, you must set the argument to None and create a list if none is passed.

In [44]: def add_fruit(fruit: str, basket: Optional[list] = None):
    ...:     if basket is None:
    ...:         basket = []
    ...:     basket.append(fruit)
    ...:     return basket

In [45]: b = add_fruit("banana")

In [46]: b
Out[46]: ['banana']

In [47]: c = add_fruit("apple")

In [48]: c
Out[48]: ['apple']

Great! Now we create a list whenever no argument is passed to the function, which fixes the bug.

Chained Operations Gone Wrong

A dog playing on grass and failing

Chained operations are an exceptional feature. They make the code terse without sacrificing readability. I discuss them in more detail in another blog post, but to give you a bit of context, let’s see them in action.

>>> x = 10
>>> 20 == x == 0
False
>>> 25 > x <= 15
True

Let's pay close attention to the first example. If Python didn't have this feature, that expression could be rewritten as:

In [53]: 20 == x and x == 0
Out[53]: False

Now, what happens if we add parentheses to enforce some kind of precedence?

In [51]: 20 == x == 0
Out[51]: False

In [52]: (20 == x) == 0
Out[52]: True

Wait? What on earth has just happened?

When we added the parentheses, (20 == x) was evaluated to False. However, the problem is that False is then compared to 0. ̶S̶i̶n̶c̶e̶ ̶̶0̶̶ ̶i̶s̶ ̶c̶o̶n̶s̶i̶d̶e̶r̶e̶d̶ ̶a̶ ̶"̶F̶a̶l̶s̶y̶"̶ ̶v̶a̶l̶u̶e̶,̶ ̶t̶h̶e̶n̶ ̶t̶h̶e̶ ̶c̶o̶m̶p̶a̶r̶i̶s̶o̶n̶ ̶r̶e̶t̶u̶r̶n̶s̶ ̶'̶T̶r̶u̶e̶`̶.̶

As pointed out by @alexmojaki, False == 0 is True because bool is a subclass of int. By contrast, other "Falsy" values such as [] or "" are not equal to False.

In [54]: False == 0
Out[54]: True

In [55]: bool(0)
Out[55]: False

In [56]: False == ""
Out[56]: False

In [57]: False == []
Out[57]: False

The lesson here is, be careful when using parentheses in chained operations.
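To see how a chain differs from both the spelled-out and version and the parenthesized version, consider the sketch below. The middle() helper is hypothetical; it only records how many times the middle operand is evaluated.

```python
x = 10
calls = []

def middle() -> int:
    # Hypothetical helper: records each evaluation, then returns x.
    calls.append(1)
    return x

# In a chain, the middle operand is evaluated only once.
print(5 < middle() <= 15)               # True
print(len(calls))                       # 1

# Spelled out with `and`, it is evaluated twice.
calls.clear()
print(5 < middle() and middle() <= 15)  # True
print(len(calls))                       # 2

# Parentheses break the chain: the bool result is compared with 0.
print(20 == x == 0)      # False
print((20 == x) == 0)    # True, because False == 0
```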

Conclusion

That’s it for today, folks! I hope you’ve learned something new and useful.

Python has amazing features, but we must use some of them with caution. If we’re not mindful, we may lose tons of time debugging our code. By learning the common pitfalls, we are much better prepared to spot these bugs and avoid them.

If you liked this post, consider sharing it with your friends! Also, feel free to follow me at miguendes.me.

See you next time!

Alex Hall

Since 0 is considered a "Falsy" value, then the comparison returns 'True`.

No, that's the kind of thing JavaScript does. Most Falsy values (None, empty string/list/dict/etc) are not equal to False.

False == 0 and True == 1 because bool is a subclass of int. You can treat booleans just like those numbers, for example True + True == 2. This can be very useful.

Miguel Brito

Interesting, thanks for the correction. I'll update the post.

Regardless, I wish bool wasn't a subclass of int.

IMHO, True + True + True shouldn't be equal to 3, or False == 0. This would avoid bugs such as the ones when using chained comparisons.

Alex Hall

To me it seems obvious that the chained comparisons wouldn't work. I've never seen someone make that mistake before. Doing that breaks extremely easily without the fact about booleans. If a == b == c is True, then (a == b) == c is guaranteed to be False unless they are all equal to True (or 1).

I often use this property of booleans to write something like sum(predicate(x) for x in lst) to count the number of items in a list satisfying a condition, or conveniently making something plural with f"thing{(len(things) > 1) * 's'}". It's hacky but it's fun.

Nickolai Belakovski

Great article.

I think the example for the walrus operator could be better. Something like

# We want to increment by the number if it's nonzero, otherwise we want to increment by at least 1
x = get_some_number_maybe_0()
if x:
  y += x
else:
  y += 1

Which can be transformed with walrus operator to:

if x := get_some_number_maybe_0():
    y += x
else:
    y += 1

In the example you've shown, should_run isn't necessary; you can just base the if statement on the return value of the os.environ call.

I guess the above could also be done as y += max(1, get_some_number_maybe_0()). Dammit, I need to stop finding ways to optimize my examples :P

Edit: The PEP introducing the operator has some better, albeit more complicated examples python.org/dev/peps/pep-0572/#syntax-and-se..

Miguel Brito

Thanks for the feedback Nickolai Belakovski. I've updated the post to make it clearer.

Jordan Kalebu

enlightening article, Love it, Thanks for sharing

Paula Maranhão

I was Waaaaat in the whole post. hahahahah Thanks to Python Tutor I can understand the logic. Loved the GIFs, btw!

Sobolev Nikita

Thanks a lot! Great points.

Looks like a good linter like github.com/wemake-services/wemake-python-st.. will catch almost all of these problems.

Edidiong Asikpo

Very well written Miguel Brito. Thanks for sharing.