Embedding Python: How To Confuse Python and Yourself

This is a cautionary tale about how embedded Python finds its runtime files under Windows. Don’t worry, though — everyone lives happily ever after.

The story begins with a client’s request to build an executable under Windows that embeds Python. This is not so hard; there is documentation for embedding Python.

I had two versions of Python installed on my Windows virtual machine because I was experimenting with different Pythons at the time. (Doesn’t everyone go through an experimental phase in their youth?) In C:\Python27 I had installed 64-bit Python 2.7.9 from Python.org, and in C:\Users\philip\Miniconda2 I had installed 64-bit Python 2.7.11 from Continuum. The 2.7.9 version was an older install. I was only interested in using the 2.7.11 version.

My executable’s C code told Python where to find its runtime files —

Py_SetPythonHome("C:/Users/philip/Miniconda2");

After compiling and linking, I ran my Python-embedding executable, which imported my hello_world.py file. It printed “Hello world!” as expected.

Here Come the Dragons

I thought everything was fine until I added these print statements to my Python code —

import sys
print sys.exec_prefix
print sys.version

The output was not what I expected —

C:/Users/philip/Miniconda2

2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)]

This is contradictory! The Miniconda Python was 2.7.11, yet sys.version identified itself as Python 2.7.9. How can the same module report two different Pythons simultaneously?

What Went Wrong

The sys module reported information about two Pythons because I was mixing parts from two different Python runtimes.

Python’s runtime consists of a dynamically loaded library (python27.dll), the standard library, which is largely written in Python itself, and an executable. The executable is usually python.exe, but in this case it was my C program that embedded Python. When my C program asked Windows to find python27.dll, Windows searched for it in these directories, in this order, as documented in the Windows DLL search strategy:

  1. The directory where the executable module for the current process is located.
  2. The current directory.
  3. The Windows system directory (usually C:\Windows\system32).
  4. The Windows directory (usually C:\Windows).
  5. The directories listed in the PATH environment variable.

My problem was that Windows found C:\Windows\system32\python27.dll first, and that was from my Python 2.7.9 installation. Meanwhile, my call to Py_SetPythonHome() had told Python to use the standard library from Miniconda Python. The value of sys.version comes from a string hardcoded in the runtime DLL, while sys.exec_prefix is derived from the value I passed in Py_SetPythonHome(). I was using the standard library from one Python installation, and the runtime DLL from another.
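The "first match wins" behaviour can be mimicked with a small sketch. It is only an illustration: first_match and build_search_dirs are hypothetical helpers, the directory order is simply the one listed above, and the real loader's behaviour also depends on Windows settings such as safe DLL search mode.

```python
import os


def first_match(dll_name, search_dirs):
    """Return the path of dll_name in the first directory that has it, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, dll_name)
        if os.path.isfile(candidate):
            return candidate
    return None


def build_search_dirs(exe_dir, cwd, system_dir, windows_dir, path_value):
    """Assemble the directories in the order listed above (hypothetical helper)."""
    path_dirs = [d for d in path_value.split(os.pathsep) if d]
    return [exe_dir, cwd, system_dir, windows_dir] + path_dirs
```

If python27.dll exists both in the system directory and in a directory on PATH, first_match returns the system directory's copy, because that directory comes earlier in the list. That is the situation described above.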

Consequences

I didn’t experiment with this for long, and I might never have noticed the problem if I hadn’t been lucky enough to double-check my Python setup with the sys module. The standard library probably doesn’t care which interpreter it runs under. Still, I can imagine a few cases where changes or bug fixes made to the Python part of the standard library in 2.7.10 and 2.7.11 rely on corresponding changes to the binary runtime, and that code might behave badly.

Both of the Pythons I was using were built with the same compiler, so theoretically binary extensions like numpy should run just fine under either Python. But I could certainly forgive numpy if it crashed as a result.

In short, this is neither a typical nor a supported use of Python, which puts it in the “Here there be dragons” category.

The Solution

The solution was very simple. I copied C:\Users\philip\Miniconda2\python27.dll into the same directory as my custom executable. Since that’s the first location Windows searches when loading a DLL, it isolates my code from other Python DLLs that might appear in (or disappear from) other locations in the file system. Problem solved!
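To confirm which DLL the process actually loaded, one option is to ask Windows directly from inside the embedded interpreter. The sketch below is not part of the original setup: loaded_python_dll is a hypothetical helper. It relies on sys.dllhandle, which exists only on Windows, and on the Win32 GetModuleFileNameW function, and it returns None on any other platform.

```python
import ctypes
import sys


def loaded_python_dll():
    """Return the full path of the Python DLL in this process, or None.

    None is returned on platforms without sys.dllhandle (anything but Windows)
    or if the Win32 call fails.
    """
    handle = getattr(sys, "dllhandle", None)
    if handle is None:
        return None
    get_name = ctypes.windll.kernel32.GetModuleFileNameW
    # HMODULE, LPWSTR, DWORD -> DWORD (number of characters copied, 0 on error)
    get_name.argtypes = [ctypes.c_void_p, ctypes.c_wchar_p, ctypes.c_ulong]
    get_name.restype = ctypes.c_ulong
    buf = ctypes.create_unicode_buffer(1024)
    copied = get_name(handle, buf, len(buf))
    return buf.value if copied else None
```

Printing the result of loaded_python_dll() next to sys.version and sys.exec_prefix shows at a glance whether the DLL and the standard library come from the same installation.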

CentOS Revisited – How Old is Too Old?

I recently started working with CentOS 5.11 as an operating system on which to build Python wheels for Linux. I wrote about why I used CentOS 5.11. The oversimplified reason is that it’s old. After using it for a while, I’ve started to wonder: how old is too old?

Why CentOS 5.11, Again?

To understand why the authors of PEP 513 recommend CentOS 5.11, we must briefly consider a subject that Python programmers can usually ignore—binary dependencies.

When one builds a binary on Linux (or any system, for that matter), it comes with dependencies on runtime libraries. Even a simple “hello world” program will depend on the C runtime library (glibc if you build with GCC). The benefit of building binaries on an older Linux is that the runtime dependencies—particularly glibc—are likely to be present on newer systems. The reverse is not true. If you link your binary to a brand new glibc, it won’t be able to run on older systems because they don’t have the glibc needed to load your binary.
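The asymmetry can be illustrated with a short sketch. It assumes a glibc-based system, where the C library exports gnu_get_libc_version; glibc_version and can_run are hypothetical helper names, and comparing only the (major, minor) pair is a simplification of how symbol versioning really works.

```python
import ctypes
import re


def glibc_version():
    """Return the running glibc version as (major, minor), or None if not glibc."""
    try:
        process = ctypes.CDLL(None)
        getter = process.gnu_get_libc_version
    except (OSError, AttributeError, TypeError):
        return None  # not glibc, or the platform cannot load the current process
    getter.restype = ctypes.c_char_p
    raw = getter()
    text = raw if isinstance(raw, str) else raw.decode("ascii")
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def can_run(built_against, runtime):
    """A binary built against one glibc needs that version or newer at run time."""
    return runtime >= built_against
```

For example, can_run((2, 5), (2, 19)) is True while can_run((2, 19), (2, 5)) is False, which is the "old build host, newer target" rule in miniature.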

CentOS 5.11 was released on the last day of September 2014 (according to DistroWatch.org), so it’s only 1½ years old. But it’s a derivative of Red Hat, which is a notoriously conservative distro. To give you some idea of how conservative it is, CentOS 5.11 provides Python 2.4.3, which was released in 2006, eight years before CentOS 5.11 and almost ten years before PEP 513 was released. CentOS 5.11 is a snapshot of what was state-of-the-art some years ago.

There’s nothing wrong with a Linux distro that chooses to be this conservative. If you want modern software, or bleeding edge, there are distros for that. Red Hat Enterprise Linux (and thus CentOS) is not one of them.

CentOS 5.11’s “old school” attitude makes it a very safe bet that the versions of its base libraries (like glibc) will appear on other Linux distros, and that’s why it’s a good choice for building Linux wheels.

Great! Are There Any Downsides?

Yup.

The point of using an older Linux is to ensure that runtime libraries are not too new to appear on other Linuxes. But what if the opposite happens?

What happens if a library on CentOS 5.11 is so old that some of the Linux world no longer supports it? That’s what happened when I tried (for one of my clients) to wrap a Fortran library with Python and distribute it as a wheel. I built the Fortran code with the default GFortran/GCC version, which was 4.1.2.

The resulting binary has a dependency on libgfortran.so.1. This library has become old enough that it’s not always easy to install. For instance, it’s not in the repositories of the very popular Ubuntu 14.04 LTS.
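A quick way to see whether a given machine can satisfy such a dependency is to try loading the library by its soname. This is only a rough sketch: can_load is a hypothetical helper, and loading a library this way runs its initializers, so it is best tried in a throwaway interpreter.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can find and load soname, else False."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True
```

On a system without the old runtime installed, can_load("libgfortran.so.1") would be expected to return False even when a newer libgfortran is present, because the soname encodes the incompatible version.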

That’s particularly surprising when you consider that Ubuntu 14.04 LTS was released about six months before CentOS 5.11. Despite this, the former had already dropped support for the default libgfortran of the latter.

This is a good example of how CentOS 5.11 helps to avoid dependency problems, but doesn’t entirely solve them. In short, caveat munitor (builder beware).

How I Resolved the libgfortran.so.1 Dependency

I was able to build a wheel for my client that solved the specific libgfortran.so.1 dependency problem described above. I set the binary’s rpath to include the binary’s directory ($ORIGIN) and shipped libgfortran.so.1 as part of the Python wheel in the same directory as the custom shared library. The relevant Makefile portion looks like this—

gfortran -shared               \
         -fPIC                 \
         -Wall                 \
         -Wl,-rpath,'$$ORIGIN' \
         -o my_library.so      \
         $(OBJS)

And running ldd on the resulting library shows that the binary uses the local libgfortran.so.1 as intended—

$ ldd my_library.so
   linux-vdso.so.1 => (0x00007ffd0d93d000)
   libfftw3.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3.so.3 (0x00007f1b97297000)
   libgfortran.so.1 => /home/philip/miniconda2/lib/python2.7/site-packages/my_library/bin/./libgfortran.so.1 (0x00007f1b97000000)
   libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1b96cfa000)
   libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1b96ae4000)
   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1b9671f000)
   /lib64/ld-linux-x86-64.so.2 (0x00007f1b9f8b9000)
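The same check can be automated, for instance in a test that runs after the wheel is installed. The sketch below only parses text in the format shown above. parse_ldd and resolved_in are hypothetical helpers, and obtaining the text (for example with subprocess.check_output(["ldd", path])) is left out.

```python
import os
import re

# Trailing load address that ldd prints, e.g. "(0x00007f1b97000000)".
_ADDRESS = re.compile(r"\s*\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(output):
    """Map each 'name => target' dependency to its path.

    The value is None when ldd shows no file path for the entry
    (for example "not found", or a virtual object such as linux-vdso).
    """
    deps = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no arrow
        name, _, target = line.partition("=>")
        target = _ADDRESS.sub("", target).strip()
        if not target or target == "not found":
            deps[name.strip()] = None
        else:
            deps[name.strip()] = target
    return deps


def resolved_in(deps, name, directory):
    """Return True if dependency name was resolved to a file inside directory."""
    path = deps.get(name)
    if path is None:
        return False
    return os.path.normpath(os.path.dirname(path)) == os.path.normpath(directory)
```

With the output above, resolved_in(deps, "libgfortran.so.1", ".../my_library/bin") is True for the bundled copy, while a system-wide libgfortran under /usr/lib would make it False.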

Matplotlib Advice

A little bonus tip: if you want to install matplotlib on CentOS 5.11, save some time and read this Stack Overflow comment by Byron Dover first.