Git Blame in your Python Tracebacks
In this short article I will demonstrate a nifty trick to get your Python tracebacks annotated with git blame information, specifically who last modified each line and when. This is quite useful for routing CI failures to the appropriate owner.
Here’s an example of such a traceback:
Ignore the code itself and why it failed; we’re just using code from the real-easypy Python package for the purpose of this demo.
So how do we annotate this traceback?
Helpful Modules: traceback & linecache
Normally (that is, in CPython), when Python renders a traceback it uses an internal implementation that we can’t really tinker with. So the first thing we must do is replace the default Python exception hook with a pure-Python implementation. Luckily, such a function is readily available in the built-in traceback module:
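A minimal sketch of that swap. It relies on traceback.print_exception accepting the same three positional arguments the interpreter hands to sys.excepthook; the helper name install_pure_python_hook is my own, for illustration only:

```python
import sys
import traceback


def install_pure_python_hook():
    """Route uncaught exceptions through the pure-Python traceback module."""
    # sys.excepthook is called as hook(exc_type, exc_value, tb), which is
    # also a valid way to call traceback.print_exception.
    sys.excepthook = traceback.print_exception


install_pure_python_hook()
```

From this point on, uncaught exceptions are rendered by Python code that we can influence.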
Now, it turns out that whenever the traceback module wants to show you a line of code, it uses the built-in linecache module, kinda like this:
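Roughly speaking, that is. The snippet below is an illustration using a throwaway file, not the actual code inside the traceback module:

```python
import linecache
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("x = 1\ny = x / 0\n")
    demo_path = f.name

# getline() takes a filename and a 1-based line number, and returns the
# line including its trailing newline.
second_line = linecache.getline(demo_path, 2)
print(repr(second_line))

# Tidy up the cache entry and the temporary file.
linecache.cache.pop(demo_path, None)
os.unlink(demo_path)
```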
As the name suggests, the module caches source code lines in memory.
What if we change the lines in this cache?
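Here’s a small sketch of the idea, again on a throwaway file. It assumes the cache entry layout used by CPython’s linecache, a (size, mtime, lines, fullname) tuple, which is an implementation detail rather than a documented contract:

```python
import linecache
import os
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("a = 1\nb = a / 0\n")
    demo_path = f.name

# Reading a line populates linecache.cache[demo_path] with a
# (size, mtime, lines, fullname) tuple.
linecache.getline(demo_path, 1)
size, mtime, lines, fullname = linecache.cache[demo_path]

# Swap in a modified copy of the lines; the file on disk is untouched.
edited = list(lines)
edited[1] = edited[1].rstrip("\n") + "  ### hello from the cache\n"
linecache.cache[demo_path] = (size, mtime, edited, fullname)

patched_line = linecache.getline(demo_path, 2)
print(repr(patched_line))

# Tidy up the cache entry and the temporary file.
linecache.cache.pop(demo_path, None)
os.unlink(demo_path)
```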
Now let’s run this all together, and confirm the traceback is affected:
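The following is a self-contained sketch of such a run, not the original demo: it writes a tiny failing function to a temporary file, edits the cached lines, and then checks that the rendered traceback carries the edit. The file contents and the "### edited in cache" marker are made up for the example:

```python
import linecache
import os
import tempfile
import traceback

source = "def fail():\n    return 1 / 0\n"
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(source)
    demo_path = f.name

# Prime the cache, then swap in an annotated copy of the lines.
linecache.getlines(demo_path)
size, mtime, lines, fullname = linecache.cache[demo_path]
annotated = [line.rstrip("\n") + "  ### edited in cache\n" for line in lines]
linecache.cache[demo_path] = (size, mtime, annotated, fullname)

# Compile with the real filename so the traceback looks it up in linecache.
namespace = {}
exec(compile(source, demo_path, "exec"), namespace)

try:
    namespace["fail"]()
except ZeroDivisionError as exc:
    rendered = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )

print(rendered)

# Tidy up only after rendering: the traceback module revalidates cache
# entries against the file on disk before using them.
linecache.cache.pop(demo_path, None)
os.unlink(demo_path)
```

Note that the file on disk never changes. The traceback module checks the cached entry against the file’s size and modification time, and since both still match, our edited lines survive.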
Assigning Blame
Let’s now do something useful with our newfound superpower.
Let’s see how we get git blame annotations and inject them into the linecache:
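What follows is a reconstruction under a few assumptions rather than a definitive implementation: it relies on the default output layout of git blame -e (hash, optional filename, email in angle brackets, date, time, timezone, line number), and the names annotate_blame_output and inject_blame are my own. File names containing spaces are not handled.

```python
import linecache
import os
import re
import subprocess

# Matches one line of `git blame -e` output in its default date format.
BLAME_LINE = re.compile(
    r"^\^?[0-9a-f]+\s+"           # commit hash ("^" marks a boundary commit)
    r"(?:\S+\s+)?"                # optional original filename (after renames)
    r"\(<(?P<email>[^>]*)>\s+"    # author email
    r"(?P<date>\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} [+-]\d{4}\s+"
    r"\d+\) ?"                    # line number within the file
    r"(?P<code>.*)$"              # the source line itself
)


def annotate_blame_output(blame_text):
    """Turn `git blame -e` output into source lines with a ### suffix."""
    annotated = []
    for raw in blame_text.splitlines():
        match = BLAME_LINE.match(raw)
        if match is None:
            # Unrecognized layout: keep the raw line so numbering stays aligned.
            annotated.append(raw + "\n")
            continue
        annotated.append(
            "{code}  ### {email} {date}\n".format(**match.groupdict())
        )
    return annotated


def inject_blame(path):
    """Run git blame on `path` and store the annotated lines in linecache."""
    blame_text = subprocess.run(
        ["git", "blame", "-e", "--", os.path.basename(path)],
        cwd=os.path.dirname(os.path.abspath(path)),
        capture_output=True, text=True, check=True,
    ).stdout
    lines = annotate_blame_output(blame_text)
    # An mtime of None tells linecache.checkcache() to leave the entry alone.
    # The key must match the filename recorded in the code objects.
    linecache.cache[path] = (sum(map(len, lines)), None, lines, path)
```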
The above function takes a path to a source file and runs git blame on it, extracting the author’s email and commit time. It then goes through each line, parsing it using a regular expression, and formats a new, annotated line using the extracted information. The annotation is added as if it were a code comment, prefixed by a distinct ### separator.
Let’s use this function to inject blame on one of the modules in our project, as an example. Here’s the result:
A welcome surprise
Quite conveniently, it turns out many other tools use the linecache module to extract source code. Below you can see how PuDB now shows the annotations without any special code:
Next Steps — Integrating
Of course, this isn’t quite ready for production. We’ve injected the annotations into only one of our modules. A more complete solution would traverse the project’s directory structure and populate the linecache for each Python module. Or we might decide to monkey-patch the linecache.getlines() function to perform this git blame extraction “lazily”.
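A rough sketch of the lazy variant, under the assumption that the blame logic is available as a callback. Both install_lazy_annotator and the annotate(filename, lines) callback are hypothetical names introduced for this illustration:

```python
import linecache


def install_lazy_annotator(annotate):
    """Wrap linecache.getlines so files are annotated on first access.

    `annotate(filename, lines)` is a hypothetical callback that returns the
    annotated lines. The returned function undoes the patch.
    """
    original_getlines = linecache.getlines
    annotated = {}

    def getlines(filename, module_globals=None):
        if filename not in annotated:
            lines = original_getlines(filename, module_globals)
            if not lines:
                return lines  # source unavailable; try again next time
            annotated[filename] = annotate(filename, lines)
        return annotated[filename]

    def restore():
        linecache.getlines = original_getlines

    # linecache.getline() looks up getlines in the module namespace at call
    # time, so replacing the attribute affects it as well.
    linecache.getlines = getlines
    return restore
```

The trade-off in this sketch is that annotated files are remembered for the life of the process, so later edits to a file on disk will not be picked up.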
Additionally, our code may run in an environment where the git repository is no longer available. In that case we may want to generate the blame information when packaging the code, storing it alongside the source files. We will then modify the above example to read from the stored files instead of running git blame directly.
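One way this could look, as a sketch only: the annotated lines are written to a sidecar file at packaging time and loaded into the linecache at runtime. The .blame suffix and both function names are assumptions made up for this example, not an established convention:

```python
import linecache
import os

BLAME_SUFFIX = ".blame"  # hypothetical naming convention for sidecar files


def store_annotated(source_path, annotated_lines):
    """Packaging time: save the annotated lines next to the source file."""
    with open(source_path + BLAME_SUFFIX, "w", encoding="utf-8") as out:
        out.writelines(annotated_lines)


def load_annotated(source_path):
    """Runtime: inject stored annotations, if any, into the linecache.

    Returns True when a sidecar file was found and loaded.
    """
    sidecar = source_path + BLAME_SUFFIX
    if not os.path.exists(sidecar):
        return False
    with open(sidecar, encoding="utf-8") as src:
        lines = src.readlines()
    # An mtime of None tells linecache.checkcache() to leave the entry alone.
    linecache.cache[source_path] = (
        sum(map(len, lines)), None, lines, source_path,
    )
    return True
```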
Finally, since running git blame on a large project could take a while, we might also want to cache this data to speed up packaging.
Conclusion
I’ve shown here a useful technique for making tracebacks more informative, using git blame and the linecache module. At our R&D organization it has made it easier to troubleshoot bugs: a quick look up the traceback and we can find a line that changed recently, getting us closer to the root cause.
And if the solution isn’t found — at least we’ll know whom to blame.