Hitchhiker's guide to the Python imports

January 13, 2018

Disclaimer: If you write Python on a daily basis you will find nothing new in this post. It’s for people who occasionally use Python like Ops guys and forget/misuse its import system. Nonetheless, the code is written with Python 3.6 type annotations to entertain an experienced Python reader. As usual, if you find any mistakes, please let me know!

Modules

Let’s start with a common Python stanza of

if __name__ == '__main__':
    invoke_the_real_code()

A lot of people, and I’m not an exception, write it as a ritual without trying to understand it. We somewhat know that this snippet makes a difference when you invoke your code from the CLI versus importing it. But let’s try to understand why we really need it.

For illustration, assume that we’re writing some pizza shop software. It’s on Github. Here is the pizza.py file.

# pizza.py file

import math

class Pizza:
    name: str = ''
    size: int = 0
    price: float = 0

    def __init__(self, name: str, size: int, price: float) -> None:
        self.name = name
        self.size = size
        self.price = price

    def area(self) -> float:
        return math.pi * math.pow(self.size / 2, 2)

    def awesomeness(self) -> int:
        if self.name == 'Carbonara':
            return 9000

        return self.size // int(self.price) * 100

print('pizza.py module name is %s' % __name__)
if __name__ == '__main__':
    print('Carbonara is the most awesome pizza.')

I’ve added printing of the magical __name__ variable to see how it may change.

OK, first, let’s run it as a script:

$ python3 pizza.py
pizza.py module name is __main__
Carbonara is the most awesome pizza.

Indeed, the __name__ global variable is set to __main__ when we invoke the file from the CLI.

But what if we import it from another file? Here is the menu.py source code:

# menu.py file

from typing import List
from pizza import Pizza

MENU: List[Pizza] = [
    Pizza('Margherita', 30, 10.0),
    Pizza('Carbonara', 45, 14.99),
    Pizza('Marinara', 35, 16.99),
]

if __name__ == '__main__':
    print(MENU)

Run menu.py

$ python3 menu.py
pizza.py module name is pizza
[<pizza.Pizza object at 0x7fbbc1045470>, <pizza.Pizza object at 0x7fbbc10454e0>, <pizza.Pizza object at 0x7fbbc1045b38>]

And now we see 2 things:

  1. The top-level print statement from pizza.py was executed on import
  2. __name__ in pizza.py is now set to the filename without the .py suffix.

So, the thing is, __name__ is a global variable that holds the name of the current Python module.

  • The module name is set by the interpreter in the __name__ variable
  • When a module is invoked from the CLI its name is set to __main__
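To see both cases from a single script, here is a minimal sketch. It assumes a throwaway module called namedemo (a made-up name, not part of the pizza project) written to a temporary directory, then imports it and also executes it the way the CLI would, using the standard runpy module.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module that just records its own __name__.
SOURCE = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp, "namedemo.py")
    module_file.write_text(SOURCE)

    # Case 1: import it like any other module.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        imported = importlib.import_module("namedemo")
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("namedemo", None)

    # Case 2: execute the same file the way `python3 namedemo.py` would.
    executed = runpy.run_path(str(module_file), run_name="__main__")

print(imported.seen_name)     # namedemo
print(executed["seen_name"])  # __main__
```

The same file sees two different names, which is exactly what the if __name__ == '__main__' check relies on.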

So what is a module, after all? It’s really simple: a module is a file containing Python code that you can execute with the interpreter (the python program) or import from other modules.

  • A Python module is just a file with Python code

Just like when it is run as a script, a module’s top-level statements are executed when it is imported. But be aware that they are executed only once per interpreter process: the first import runs the file and caches the module in sys.modules, and every later import, even from a different file, reuses that cached module.

  • When you import a module it’s executed, but only the first time
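The "only once" part is easy to check. The sketch below assumes a made-up module oncedemo whose only top-level statement is a print, imports it twice, and counts how many times the print actually ran.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# Hypothetical module with a visible top-level side effect.
SOURCE = "print('top-level code of oncedemo is running')\n"

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "oncedemo.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            first = importlib.import_module("oncedemo")   # runs the file
            second = importlib.import_module("oncedemo")  # reuses the cached module
        was_cached = "oncedemo" in sys.modules
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("oncedemo", None)

run_count = captured.getvalue().count("top-level code")
print(run_count)        # 1
print(first is second)  # True
```

Both imports return the very same module object, and the top-level print fires a single time.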

Because modules are just plain files, there is a simple way to import them. Just take the filename, remove the .py extension and put it in the import statement.

  • To import a module you use the filename without the .py extension

What is interesting is that __name__ is set to the filename regardless of how you import it – with import pizza as broccoli __name__ will still be pizza. So

  • When imported, the module name is set to the filename without the .py extension even if it’s renamed with import module as othername
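A quick way to convince yourself, using the standard library json module as a stand-in for pizza so the snippet runs anywhere:

```python
import sys

import json as broccoli  # the alias only changes the local variable name

print(broccoli.__name__)                # json
print(sys.modules["json"] is broccoli)  # True
```

The alias is just a name in the importing file. The module object itself still knows it as json.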

But what if the module that we import is not located in the same directory, how can we import it? The answer is in the module search path that we’ll eventually discover while discussing packages.

Packages

  • Package is a namespace for a collection of modules

The namespace part is important because by itself a package doesn’t provide any functionality – it only gives you a way to group a bunch of your modules.

There are 2 cases where you really want to put modules into a package. The first is to isolate the definitions of your modules from everyone else’s. Our pizza module has a Pizza class and a very generic name, so it might conflict with other people’s pizza modules (and we do have some pizza packages on PyPI).

The second case is if you want to distribute your code because

  • Package is the minimal unit of code distribution in Python

Everything that you see on PyPI and install via pip is a package, so in order to share your awesome stuff, you have to make a package out of it.

Alright, assume we’re convinced and want to convert our 2 modules into a nice package. To do this we need to create a directory with an empty __init__.py file and move our files into it:

pizzapy/
├── __init__.py
├── menu.py
└── pizza.py

And that’s it – now you have a pizzapy package!

  • To make a package create a directory with an __init__.py file

Remember that a package is a namespace for modules, so you usually don’t import the package itself, you import a module from the package.

>>> import pizzapy.menu
pizza.py module name is pizza
>>> pizzapy.menu.MENU
[<pizza.Pizza object at 0x7fa065291160>, <pizza.Pizza object at 0x7fa065291198>, <pizza.Pizza object at 0x7fa065291a20>]

If you do the import that way, it may seem too verbose because you need to use the fully qualified name. I guess that’s intentional behavior because one of the Python Zen items is “explicit is better than implicit”.

Anyway, you can always use a from package import module form to shorten names:

>>> from pizzapy import menu
pizza.py module name is pizza
>>> menu.MENU
[<pizza.Pizza object at 0x7fa065291160>, <pizza.Pizza object at 0x7fa065291198>, <pizza.Pizza object at 0x7fa065291a20>]
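Both import forms can be tried without touching your real project. The following is a minimal sketch that assumes a made-up package shopdemo with a single menu module, built in a temporary directory that is put on the search path only for the duration of the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "shopdemo")           # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")  # the empty file that marks the package
    (pkg / "menu.py").write_text("MENU = ['Margherita', 'Carbonara']\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import shopdemo.menu       # fully qualified form
        from shopdemo import menu  # shorter form, same module object

        qualified_name = shopdemo.menu.__name__
        same_module = menu is shopdemo.menu
        items = list(menu.MENU)
    finally:
        sys.path.remove(tmp)
        for name in ("shopdemo.menu", "shopdemo"):
            sys.modules.pop(name, None)

print(qualified_name)  # shopdemo.menu
print(same_module)     # True
```

Note that the module’s __name__ now includes the package, which is the namespace part in action.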

Package init

Remember how we put a __init__.py file in a directory and it magically became a package? That’s a great example of convention over configuration – we don’t need to describe any configuration or register anything. Any directory with __init__.py by convention is a Python package.

Besides making a package, __init__.py serves one more purpose – package initialization. That’s why it’s called init after all! Initialization is triggered on the package import, in other words importing a package invokes __init__.py.

  • When you import a package, the __init__.py module of the package is executed

In the __init__ module you can do anything you want, but most commonly it’s used for some package initialization or setting the special __all__ variable. The latter controls star import – from package import *.
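Here is a small sketch of how __all__ affects a star import. It assumes a made-up package alldemo whose __init__.py defines two names but lists only one of them in __all__. The star import is run through exec into a fresh dict so we can inspect exactly what it brought in.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "alldemo")  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(
        "__all__ = ['public']\n"
        "public = 'exported'\n"
        "hidden = 'not exported'\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    namespace = {}
    try:
        # Run the star import in an empty namespace to see what lands there.
        exec("from alldemo import *", namespace)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("alldemo", None)

print("public" in namespace)  # True
print("hidden" in namespace)  # False
```

The hidden name still exists in the package and can be imported explicitly. __all__ only limits what the star form copies.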

And because Python is awesome we can do pretty much anything in the __init__ module, even really strange things. Suppose we don’t like the explicitness of import and want to drag all of the modules’ symbols up to the package level, so we don’t have to remember the actual module names.

To do that we can import everything from menu and pizza modules in __init__.py like this

# pizzapy/__init__.py

from pizzapy.pizza import *
from pizzapy.menu import *

See:

>>> import pizzapy
pizza.py module name is pizzapy.pizza
pizza.py module name is pizza
>>> pizzapy.MENU
[<pizza.Pizza object at 0x7f1bf03b8828>, <pizza.Pizza object at 0x7f1bf03b8860>, <pizza.Pizza object at 0x7f1bf03b8908>]

No more pizzapy.menu.MENU or menu.MENU :-) That way it kinda works like packages in Go, but note that this is discouraged because you are abusing Python, and if you check in such code you’re gonna have a bad time at code review. I’m showing you this just for illustration, don’t blame me!

You could rewrite the import more succinctly like this

# pizzapy/__init__.py

from .pizza import *
from .menu import *

This is just another syntax for doing the same thing, and it is called a relative import. Let’s look at it closer.

Absolute and relative imports

Of the 2 code pieces above, only the second one (with the leading dot) is a so-called relative import. The first one spells out the full package name and is an absolute import. Since Python 3 all imports are absolute by default (as in PEP328), meaning that a plain import sys is always looked up as a top-level module on the module search path and never implicitly inside the current package. This is needed to avoid shadowing: in Python 2, if your package had its own sys.py module, doing import sys from a sibling module could pick it up instead of the standard library sys module.

  • Since Python 3 all imports are absolute by default – a plain import name never looks inside the current package implicitly

But if your package has a module called sys and you want to import it into another module of the same package you have to be explicit again: either write the absolute form from package.module import somesymbol or make a relative import with from .module import somesymbol. That funny single dot before the module name is read as “current package”.

  • To make a relative import prepend the module name with a dot; prepending the package name gives the equivalent absolute import
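The two spellings, from package.module import ... and from .module import ..., can be compared side by side. This sketch assumes a made-up package reldemo with a pizza module and two sibling modules that import from it, one using each form.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "reldemo")  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "pizza.py").write_text("NAME = 'Carbonara'\n")
    # Same import written with the full package name and with a leading dot.
    (pkg / "uses_absolute.py").write_text("from reldemo.pizza import NAME\n")
    (pkg / "uses_relative.py").write_text("from .pizza import NAME\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        absolute = importlib.import_module("reldemo.uses_absolute")
        relative = importlib.import_module("reldemo.uses_relative")
        names = (absolute.NAME, relative.NAME)
    finally:
        sys.path.remove(tmp)
        for name in [n for n in sys.modules if n == "reldemo" or n.startswith("reldemo.")]:
            del sys.modules[name]

print(names)  # ('Carbonara', 'Carbonara')
```

Both modules end up with the same symbol. The dotted form simply does not need to know what the package is called, which helps if the package is ever renamed.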

Executable package

In Python you can invoke a module with a python3 -m <module> construction.

$ python3 -m pizza
pizza.py module name is __main__
Carbonara is the most awesome pizza.

But packages can also be invoked this way:

$ python3 -m pizzapy
/usr/bin/python3: No module named pizzapy.__main__; 'pizzapy' is a package and cannot be directly executed

As you can see, it needs a __main__ module, so let’s implement it:

# pizzapy/__main__.py

from pizzapy.menu import MENU

print('Awesomeness of pizzas:')
for pizza in MENU:
    print(pizza.name, pizza.awesomeness())

And now it works:

$ python3 -m pizzapy
pizza.py module name is pizza
Awesomeness of pizzas:
Margherita 300
Carbonara 9000
Marinara 200

  • Adding __main__.py makes a package executable (invoke it with python3 -m package)
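The before and after can be reproduced in isolation. The sketch below assumes a made-up package maindemo in a temporary directory and runs python -m on it twice through subprocess, once without and once with a __main__.py. It relies on the default behaviour where -m puts the current directory on the search path.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "maindemo")  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")

    command = [sys.executable, "-m", "maindemo"]

    # Without __main__.py the package cannot be executed.
    before = subprocess.run(command, cwd=tmp, capture_output=True, text=True)

    # Add the entry point and try again.
    (pkg / "__main__.py").write_text("print('running maindemo as', __name__)\n")
    after = subprocess.run(command, cwd=tmp, capture_output=True, text=True)

print(before.returncode != 0)  # True
print(after.stdout.strip())    # running maindemo as __main__
```

Inside __main__.py the __name__ variable is __main__ again, just like in a script run from the CLI.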

Import sibling packages

And the last thing I want to cover is the import of sibling packages. Suppose we have a sibling package pizzashop:

.
├── pizzapy
│   ├── __init__.py
│   ├── __main__.py
│   ├── menu.py
│   └── pizza.py
└── pizzashop
    ├── __init__.py
    └── shop.py

# pizzashop/shop.py

import pizzapy.menu
print(pizzapy.menu.MENU)

Now, sitting in the top level directory, if we try to invoke shop.py like this

$ python3 pizzashop/shop.py
Traceback (most recent call last):
  File "pizzashop/shop.py", line 1, in <module>
    import pizzapy.menu
ModuleNotFoundError: No module named 'pizzapy'

we get an error saying that our pizzapy module is not found. But if we invoke it as a part of the package

$ python3 -m pizzashop.shop
pizza.py module name is pizza
[<pizza.Pizza object at 0x7f372b59ccc0>, <pizza.Pizza object at 0x7f372b59ccf8>, <pizza.Pizza object at 0x7f372b59cda0>]

it suddenly works. What the hell is going on here?

The explanation lies in the Python module search path, which is described very well in the documentation on modules.

Module search path is a list of directories (available at runtime as sys.path) that the interpreter uses to locate modules. It is initialized with the path to the Python standard modules (/usr/lib64/python3.6), site-packages where pip puts everything you install globally, and also a directory that depends on how you run a module. If you run a module as a file like python3 pizzashop/shop.py the path to the containing directory (pizzashop) is added to sys.path. Otherwise, including running with the -m option, the current directory (as in pwd) is added to the module search path. We can check it by printing sys.path in pizzashop/shop.py:

$ pwd
/home/avd/dev/python-imports

$ tree
.
├── pizzapy
│   ├── __init__.py
│   ├── __main__.py
│   ├── menu.py
│   └── pizza.py
└── pizzashop
    ├── __init__.py
    └── shop.py

$ python3 pizzashop/shop.py
['/home/avd/dev/python-imports/pizzashop',
 '/usr/lib64/python36.zip',
 '/usr/lib64/python3.6',
 '/usr/lib64/python3.6/lib-dynload',
 '/usr/local/lib64/python3.6/site-packages',
 '/usr/local/lib/python3.6/site-packages',
 '/usr/lib64/python3.6/site-packages',
 '/usr/lib/python3.6/site-packages']
Traceback (most recent call last):
  File "pizzashop/shop.py", line 5, in <module>
    import pizzapy.menu
ModuleNotFoundError: No module named 'pizzapy'

$ python3 -m pizzashop.shop
['',
 '/usr/lib64/python36.zip',
 '/usr/lib64/python3.6',
 '/usr/lib64/python3.6/lib-dynload',
 '/usr/local/lib64/python3.6/site-packages',
 '/usr/local/lib/python3.6/site-packages',
 '/usr/lib64/python3.6/site-packages',
 '/usr/lib/python3.6/site-packages']
pizza.py module name is pizza
[<pizza.Pizza object at 0x7f2f75747f28>, <pizza.Pizza object at 0x7f2f75747f60>, <pizza.Pizza object at 0x7f2f75747fd0>]

As you can see, in the first case we have the pizzashop dir in our path and so we cannot find the sibling pizzapy package, while in the second case the current dir (denoted as '') is in sys.path and it contains both packages.

  • Python has a module search path available at runtime as sys.path
  • If you run a module as a script file, the containing directory is added to sys.path, otherwise, the current directory is added to it
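The difference in the first sys.path entry can also be checked programmatically. This sketch assumes a made-up package pathdemo with a show module that prints its first search path entry, and runs it both ways through subprocess. The entry is normalised with os.path.realpath because, depending on the Python version, the -m case may show either '' or the full path of the current directory.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical script: print the first search path entry as a real path.
# An empty string entry means "current directory", so fall back to getcwd().
SHOW = (
    "import os, sys\n"
    "print(os.path.realpath(sys.path[0] or os.getcwd()))\n"
)

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "pathdemo")  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "show.py").write_text(SHOW)

    def first_entry(args):
        """Run the interpreter from the top-level directory and return its output."""
        result = subprocess.run(
            [sys.executable, *args], cwd=tmp, capture_output=True, text=True
        )
        return result.stdout.strip()

    as_script = first_entry([str(pkg / "show.py")])
    as_module = first_entry(["-m", "pathdemo.show"])
    script_dir = os.path.realpath(str(pkg))
    top_dir = os.path.realpath(tmp)

print(as_script == script_dir)  # True: the script's own directory comes first
print(as_module == top_dir)     # True: the current directory comes first
```

Run as a file, the script only sees its own directory. Run with -m from the top level, it sees the directory that holds all the sibling packages.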

This problem of importing a sibling package often arises when people put a bunch of test or example scripts in a directory or package next to the main package, and it is a recurring question on StackOverflow.

The good solution is to avoid the problem – put tests or examples in the package itself and use relative imports. The dirty solution is to modify sys.path at runtime (yay, dynamic!) by adding the parent directory of the needed package. People actually do this even though it’s an awful hack.
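For completeness, this is roughly what the dirty solution looks like. It is only a sketch: add_to_search_path is a helper name made up for this example, and siblingdemo is a made-up package in a temporary directory. In a real script the directory would typically be computed from __file__, for example Path(__file__).resolve().parent.parent.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def add_to_search_path(directory: Path) -> str:
    """Put a directory at the front of sys.path (hypothetical helper)."""
    resolved = str(directory.resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    sibling = Path(tmp, "siblingdemo")  # hypothetical sibling package
    sibling.mkdir()
    (sibling / "__init__.py").write_text("")
    (sibling / "menu.py").write_text("MENU = ['Marinara']\n")

    # The hack: add the directory that contains the sibling package.
    added = add_to_search_path(Path(tmp))
    importlib.invalidate_caches()
    try:
        menu = importlib.import_module("siblingdemo.menu")
        found = list(menu.MENU)
    finally:
        # Undo the change so the demo leaves no trace.
        sys.path.remove(added)
        for name in ("siblingdemo.menu", "siblingdemo"):
            sys.modules.pop(name, None)

print(found)  # ['Marinara']
```

It works, but every script that does this silently changes where all later imports are resolved, which is why it is best kept as a last resort.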

The End!

I hope that after reading this post you’ll have a better understanding of Python imports and can finally decompose that giant script you have in your toolbox without fear. In the end, everything in Python is really simple, and even when it is not sufficient for your case, you can always monkey patch anything at runtime.

And on that note, I would like to stop and thank you for your attention. Until next time!