Git Blame in your Python Tracebacks

Ofer Koren
3 min read · Feb 12, 2021


Photo by Samuel Regan-Asante on Unsplash

In this short article I will demonstrate a nifty trick for annotating your Python tracebacks with git blame information: who last modified each line, and when. This is quite useful when routing CI failures to their appropriate owners.

Here’s an example traceback, before any annotation:

The basic Python traceback

Ignore the code itself, and why it failed; we’re just borrowing code from the real-easypy Python package for the purpose of this demo.

So how do we annotate this traceback?

Helpful Modules: traceback & linecache

Normally (that is, in CPython), when Python renders a traceback it uses an internal implementation that we can’t really tinker with. So the first thing we must do is replace the default exception hook, sys.excepthook, with a pure-Python implementation. Luckily, such a function is readily available in the built-in traceback module:

Replacing the default Python exception hook
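As a minimal sketch using only the standard library, the swap is a single assignment. sys.excepthook is called with (exc_type, exc_value, exc_traceback), which matches the positional arguments of traceback.print_exception. The fail() function below is just a hypothetical stand-in for failing code.

```python
import sys
import traceback

# sys.excepthook is called as excepthook(exc_type, exc_value, exc_tb) for any
# uncaught exception. traceback.print_exception accepts the same three
# positional arguments and renders the traceback in pure Python.
sys.excepthook = traceback.print_exception


def fail():
    # Hypothetical stand-in for real failing code.
    return 1 / 0


# Simulate what the interpreter does for an uncaught exception,
# without actually terminating the process.
try:
    fail()
except ZeroDivisionError:
    sys.excepthook(*sys.exc_info())
```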

Now, it turns out that whenever the traceback module wants to show you a line of code, it uses the built-in linecache module, kinda like this:

Using linecache
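Here is a small sketch of that lookup, assuming only the standard library. It writes a throwaway file so it is self-contained, then fetches lines the way the traceback module roughly does for each frame.

```python
import linecache
import os
import tempfile

# Write a small throwaway source file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("x = 1\ny = 2\nprint(x + y)\n")
    path = f.name

# getline() is 1-indexed and returns the line with its trailing newline,
# or '' if the file or line can't be found. The file is read once and
# then served from memory (linecache.cache).
line = linecache.getline(path, 3)
print(repr(line))  # 'print(x + y)\n'

# Roughly what traceback does for each frame: look up the line and strip it.
print("    " + linecache.getline(path, 2).strip())  # "    y = 2"

os.remove(path)
```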

As the name suggests, the module caches source code lines in memory.
What if we change the lines in this cache?

Meddling with the cache
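A minimal sketch of such meddling follows. It assumes CPython’s internal layout for linecache.cache entries, a (size, mtime, lines, fullname) tuple. That layout is an implementation detail, not a documented API. The key point is to keep the original size and mtime so that linecache.checkcache() still trusts the modified entry.

```python
import linecache
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("a = 1\nb = a + 1\n")
    path = f.name

linecache.getlines(path)  # load the file into linecache.cache

# Each regular entry is a (size, mtime, lines, fullname) tuple. This layout is
# a CPython implementation detail, not a documented API.
size, mtime, lines, fullname = linecache.cache[path]
lines = list(lines)
lines[1] = lines[1].rstrip("\n") + "  # <-- I was here\n"

# Keep the original size and mtime: checkcache() compares them with os.stat()
# and evicts the entry if they differ, which would undo our change.
linecache.cache[path] = (size, mtime, lines, fullname)

linecache.checkcache(path)  # the file on disk is unchanged, so the entry survives
print(linecache.getline(path, 2), end="")  # b = a + 1  # <-- I was here
```

The file on disk is never touched. Only the in-memory copy that linecache hands out changes.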

Now let’s run this all together, and confirm the traceback is affected:

Tinkering with the traceback (note line #27)
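The sketch below puts both pieces together under the same assumptions as before: the pure-Python hook and a patched linecache entry that reuses the original size and mtime. The divide() module is a hypothetical throwaway file, compiled under its real filename so that tracebacks point at it. The traceback is captured in a string here instead of crashing the process.

```python
import contextlib
import io
import linecache
import sys
import tempfile
import traceback

# 1. Render uncaught exceptions with the pure-Python traceback module.
sys.excepthook = traceback.print_exception

# 2. A hypothetical throwaway module whose line 2 will fail.
src = "def divide(a, b):\n    return a / b\n"
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(src)
    path = f.name

# 3. Load the file into linecache and tag line 2 with a comment-style note,
#    keeping size and mtime so checkcache() leaves the entry alone.
linecache.getlines(path)
size, mtime, lines, fullname = linecache.cache[path]
lines = list(lines)
lines[1] = lines[1].rstrip("\n") + "  ### tinkered!\n"
linecache.cache[path] = (size, mtime, lines, fullname)

# 4. Run the code under its real filename, then let the hook render the
#    traceback into a buffer (print_exception writes to sys.stderr).
namespace = {}
exec(compile(src, path, "exec"), namespace)
buf = io.StringIO()
try:
    namespace["divide"](1, 0)
except ZeroDivisionError:
    with contextlib.redirect_stderr(buf):
        sys.excepthook(*sys.exc_info())
print(buf.getvalue())
```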

Assigning Blame

Let’s now do something useful with our newfound superpower.
Here’s how we can get git blame annotations and inject them into the linecache:

Inject annotations from `git blame` into linecache

The above function takes a path to a source file and runs git blame on it, extracting each line’s author email and commit time. It then goes through each line of the blame output, parses it with a regular expression, and formats a new, annotated line from the extracted information. The annotation is added as if it were a code comment, prefixed by a distinct ### separator.
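The original function’s code isn’t reproduced in this text, so what follows is only a sketch of one way to write it, with a few assumptions of our own. It parses `git blame --line-porcelain` output, a machine-readable format that is easier to parse reliably than the default human-readable one, instead of matching the default output with a single regular expression. The helper names git_blame and inject_blame are hypothetical. Like the earlier cache trick, it relies on the undocumented (size, mtime, lines, fullname) layout of linecache.cache entries.

```python
import datetime
import linecache
import os
import re
import subprocess

# Header line of each --line-porcelain record:
# "<sha> <original-lineno> <final-lineno>[ <group-size>]"
HEADER_RE = re.compile(r"^[0-9a-f]{40,64} \d+ (\d+)")


def git_blame(path):
    """Hypothetical helper: {lineno: (author_email, author_datetime)}."""
    out = subprocess.run(
        ["git", "blame", "--line-porcelain", "--", os.path.basename(path)],
        cwd=os.path.dirname(os.path.abspath(path)),
        capture_output=True, check=True,
    ).stdout.decode("utf-8", errors="replace")
    blame, lineno, email, when = {}, None, None, None
    # Split on "\n" only: str.splitlines() would also split on characters
    # such as "\x0c" that can legitimately appear inside a source line.
    for raw in out.split("\n"):
        if raw.startswith("\t"):  # the source line itself ends each record
            blame[lineno] = (email, when)
        elif raw.startswith("author-mail "):
            email = raw[len("author-mail "):].strip("<>")
        elif raw.startswith("author-time "):
            when = datetime.datetime.fromtimestamp(int(raw.split()[1]))
        else:
            m = HEADER_RE.match(raw)
            if m:
                lineno = int(m.group(1))
    return blame


def inject_blame(filename):
    """Hypothetical helper: append '### <email> <date>' to each cached line.

    Returns False if the file can't be read, git is missing, or the file
    isn't in a git repository.
    """
    lines = linecache.getlines(filename)
    if not lines:
        return False
    size, mtime, _, fullname = linecache.cache[filename]
    try:
        blame = git_blame(fullname)
    except (OSError, subprocess.CalledProcessError):
        return False
    annotated = []
    for lineno, line in enumerate(lines, start=1):
        code = line.rstrip("\n")
        if lineno in blame:
            email, when = blame[lineno]
            code = f"{code}  ### {email} {when:%Y-%m-%d}"
        annotated.append(code + "\n")
    # Reuse size and mtime so linecache.checkcache() keeps our entry.
    linecache.cache[filename] = (size, mtime, annotated, fullname)
    return True
```

For a module imported from a git checkout, calling inject_blame(module.__file__) should then make subsequent tracebacks through that module show the annotations, provided the filename matches the one recorded in the module’s code objects.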

As an example, let’s use this function to inject blame into one of our project’s modules. Here’s the result:

An annotated traceback (scroll to the right!)

A welcome surprise

Quite conveniently, it turns out many other tools use the linecache module to extract source code. Below you can see how PuDB now shows the annotations without any special code:

Next Steps — Integrating

Of course, this isn’t quite ready for production: we’ve injected annotations into only one of our modules. A more complete solution would traverse the project’s directory structure and populate the linecache for each Python module. Alternatively, we might decide to monkey-patch the linecache.getlines() function to perform the git blame extraction “lazily”, the first time each file is requested.
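The lazy variant could look roughly like the sketch below. It assumes the same undocumented cache-entry layout as before. annotate_lines is a hypothetical placeholder where the git blame logic would go; here it just appends a fixed marker. The wrapper remembers which list it stored for each file, so a file is annotated again only if linecache has evicted and reloaded it.

```python
import linecache

_original_getlines = linecache.getlines
_annotated = {}  # filename -> the annotated list we placed in linecache.cache


def annotate_lines(fullname, lines):
    """Hypothetical placeholder: plug the git blame logic in here."""
    return [line.rstrip("\n") + "  ### (blame goes here)\n" for line in lines]


def getlines_with_blame(filename, module_globals=None):
    lines = _original_getlines(filename, module_globals)
    entry = linecache.cache.get(filename)
    if not lines or entry is None or len(entry) != 4:
        return lines  # nothing (fully) cached, e.g. "<string>": leave it alone
    if _annotated.get(filename) is entry[2]:
        return lines  # already annotated and not evicted since
    size, mtime, _, fullname = entry
    new_lines = annotate_lines(fullname, lines)
    # Reuse size and mtime so checkcache() keeps the annotated entry.
    linecache.cache[filename] = (size, mtime, new_lines, fullname)
    _annotated[filename] = new_lines
    return new_lines


# linecache.getline() looks up getlines() as a module global at call time,
# so rebinding it also affects getline(), and with it the traceback module.
linecache.getlines = getlines_with_blame
```

The trade-off versus walking the whole project up front is that the cost of git blame is paid only for files that actually show up in a traceback.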

Additionally, our code may run in an environment where the git repository is no longer available. In that case we may want to generate the blame information when packaging the code and ship it alongside the source files. We would then modify the above example to read from these stored files instead of running git blame directly.
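One possible shape for this is sketched below, under assumptions of our own. The blame is stored as a JSON “sidecar” file next to each module. The `.blame.json` naming convention and the save_blame / inject_stored_blame helpers are hypothetical. At packaging time the annotations dict would be filled from git blame; at runtime, only the sidecar is read.

```python
import json
import linecache


def blame_sidecar_path(source_path):
    # Hypothetical naming convention: store blame right next to the module.
    return source_path + ".blame.json"


def save_blame(source_path, annotations):
    """Packaging time: write {lineno: annotation_text} next to the source."""
    with open(blame_sidecar_path(source_path), "w", encoding="utf-8") as f:
        json.dump({str(k): v for k, v in annotations.items()}, f)


def inject_stored_blame(filename):
    """Runtime: annotate cached lines from the sidecar instead of running git."""
    lines = linecache.getlines(filename)
    if not lines:
        return False
    size, mtime, _, fullname = linecache.cache[filename]
    try:
        with open(blame_sidecar_path(fullname), encoding="utf-8") as f:
            stored = json.load(f)
    except (OSError, ValueError):  # missing or malformed sidecar
        return False
    annotated = [
        line.rstrip("\n")
        + (f"  ### {stored[str(n)]}" if str(n) in stored else "")
        + "\n"
        for n, line in enumerate(lines, start=1)
    ]
    linecache.cache[filename] = (size, mtime, annotated, fullname)
    return True
```

The sidecar is only meaningful if it is generated from exactly the file that gets shipped; otherwise line numbers and annotations drift apart.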

Finally, since running git blame on a large project can take a while, we might also want to cache this data to speed up packaging.

Conclusion

I’ve shown here a useful technique for making tracebacks more informative, using git blame and the linecache module. In our R&D organization it has made troubleshooting bugs easier: a quick look up the traceback is often enough to spot a line that changed recently, getting us closer to the root cause.
And if the solution isn’t found, at least we’ll know whom to blame.
