Pruning Deadwood

Nov 12, 2020

Mark Connell Engineering Director, Applications

Software developers generally lean on principles and techniques when creating or modifying software. Some of these are design principles, like the SOLID principles. Others are processes or techniques, like test-driven development or pair programming. Ultimately, we follow these principles and methods to create better, more maintainable software with as few defects as possible, making our code more reliable and easier to grok and change in the future.

One principle I strongly believe in is to try and leave things in a better condition than you found them. Martin Fowler describes this as ‘Opportunistic Refactoring’. The general principle is that as we’re writing software, if there is an opportunity to improve the code that’s related or near to our changes, we should try to carry out that improvement for the future benefit of ourselves and other developers.

One recurring theme I’ve seen when working in codebases with a reasonable history, and a number of contributors, is deadwood: code that exists in the codebase but, upon closer inspection, isn’t actually doing anything and can be safely pruned. This is code that might have been added accidentally in the past, simply an oversight of the original author. Or it could be code that should have been removed in a refactor, but got missed because the developer either didn’t know it existed, forgot about it, wasn’t sure if it could be deleted, or perhaps cynically, didn’t feel it was worth their effort to clean up whilst making a change.

If we can clean up our unused code, we stand to benefit over both the short and long term. For today, we’re reducing the cognitive load required when trying to scan and understand our code. We’re no longer getting side-tracked with questions or uncertainties about code on the periphery of where we’re working. And we potentially make our test suite a little bit faster, not having to run defunct tests. Longer term, we have a cleaner codebase that will be easier to work on, for ourselves or future developers.

Additionally, the longer dead code resides, the more resistance or hesitancy there might be in deleting it. I’ve removed code that had sat unused for 10 years in a codebase. No one had needed to alter the code in that time, and glancing at the code, you could perhaps assume that it must be used somewhere as it looked to be doing something important. Only after taking the time to understand that code would you realise it was in fact defunct. You might argue it wasn’t doing anyone any harm sitting there unused all that time. However, code implements the intent of a system. Unused code can mislead us about that intent, skewing our perception of the expected behaviours and influencing how we might implement something based on existing code and patterns present in the source.

If we don’t need it, let’s not keep it lying around.

So... how do we spot deadwood?

Unfortunately, this can be easier said than done at times. We can try and automate some of the task by using static analysis tools like Ryan Davis’ debride if we’re in Ruby. Generally though, for dynamically typed software, I think there’s value in having an awareness of how to identify deadwood whilst we’re in and about our code.

There are probably numerous ways of identifying dead code, but the following is my current thinking and approach to uncovering bits we can remove.

1. Known dependency changes

The easiest piece of dead code to spot is one where you’ve switched out a library, and some clear references to the old library are still kicking around. For example, say you switched your billing gateway provider from X to Y a few years ago. Every time you encounter a reference to X, you could probably assume that it’s no longer used. That reference also provides a nice starting breadcrumb for tracking down methods, classes and files that are now defunct.

2. Historic knowledge

The second is one that comes with knowledge of the codebase over time. If you’re in and about code, you develop a sense of what it is, and the expected behaviours of a system. You may begin to observe methods that you’ve never seen used, or had to interact with. This is when it’s time to do a little sleuthing. If you follow the code and stumble upon something that can be removed, great. If you follow the code and it’s valid and shouldn’t be removed, you’ve just discovered something new about the system you’re working in that you didn’t know about before.

3. Take the time to investigate and develop domain knowledge

The hardest to spot, and the one that requires the most diligence, is verifying code that is on the periphery of where you’re working, perhaps in a context of the system you’re not completely familiar with. It can be easy when adding functionality to step into an existing class, add behaviour, and jump back out. After all, you’ve just added code to the class, so everything else that’s already there must be used, right?

Taking the time to learn and understand surrounding code feeds into that bucket of knowledge for a codebase, which in turn provides greater context for judging what code might be dead.

Ok. I think I’ve found something that can be removed. What can I do to verify?

For this specific example, my assumption is that we’ve discovered a potentially unused method within a Ruby class, and that our source code is versioned in a Git repository. We’re also glossing over meta-programming for this example. If you do leverage meta-programming, the following process still holds true, but will require more diligence and following the history more closely to find places where the code may have been invoked.
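To illustrate why meta-programming calls for that extra diligence, here is a minimal, hypothetical sketch. The class body and the revoke_token helper are invented for this example and don't come from a real codebase. The method is genuinely called, yet its full name never appears at the call site:

```ruby
# Hypothetical example: the class body and helper names are invented
# purely to illustrate the problem.
class User
  attr_reader :revoked

  def initialize
    @revoked = []
  end

  def revoke_access_token!
    @revoked << :access
  end

  def revoke_refresh_token!
    @revoked << :refresh
  end

  # The method name is assembled at runtime, so a plain-text search for
  # "revoke_access_token!" finds the definition above but not this caller.
  def revoke_token(kind)
    public_send("revoke_#{kind}_token!")
  end
end

user = User.new
user.revoke_token(:access) # ends up calling revoke_access_token!
```

In a codebase that does this, searching for distinctive fragments of the name (such as `_token!`) as well as the full name can help surface callers like this one.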

1. Find current references

If we grep the source, do we find any occurrences of the method being used? If we only see references to the method signature and references within our tests, we might be onto a winner.

If we have a number of references, we have to follow that rabbit hole to see if any code that uses our method in question is itself dead.
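As a rough sketch of that first pass, assuming a conventional layout where tests live under spec/ or test/, something like the following could sort plain-text matches into the definition, test references and everything else. The find_references name and the Reference struct are my own invention for illustration; a plain grep does the same job, and like grep this won't see dynamically built method names.

```ruby
require "find"

# Hypothetical helper types and names, for illustration only.
Reference = Struct.new(:path, :line_number, :kind)

# Scans every .rb file under `root` for plain-text mentions of `method_name`
# and labels each hit as :definition, :test or :caller.
def find_references(root, method_name)
  definition = /\bdef\s+(self\.)?#{Regexp.escape(method_name)}/
  references = []

  Find.find(root) do |path|
    next unless File.file?(path) && path.end_with?(".rb")

    # Assumption: anything under a spec/ or test/ directory is test code.
    in_tests = path.delete_prefix(root.to_s).match?(%r{(^|/)(spec|test)/})

    File.foreach(path).with_index(1) do |line, number|
      next unless line.include?(method_name)

      kind = if line.match?(definition)
               :definition
             elsif in_tests
               :test
             else
               :caller
             end
      references << Reference.new(path, number, kind)
    end
  end

  references
end

# Example usage from the root of a project:
#   find_references(".", "revoke_access_token!").each do |ref|
#     puts "#{ref.kind}: #{ref.path}:#{ref.line_number}"
#   end
```

If only :definition and :test entries come back, we might be onto a winner. Any :caller entry is a rabbit hole to follow.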

Great, we think this method should be removed. But before we nuke the method from high orbit, let’s do a little bit of diligence to make sure we’re not going to upset anyone or anything.

2. Find the point of creation, and the last reference of usage

Commit history can be a really powerful tool. If you can find when a piece of code came into existence, it will usually lead you to the context of why it was added, and where it was likely to have been used. Likewise, if we can find commits that show the usage of the method was removed, it gives us some additional confidence that we’re not breaking something unintentionally.

My go-to for a Git repository is to view a patch log for a given file. For example, say we were investigating the origins of a method revoke_access_token! in a User model for a Rails application. We would do something like:

git log -S"revoke_access_token!" -p app/models/user.rb

We’re doing a few things here. We’re asking for the history of changes to the user.rb file, and -p denotes that we want to display patches, or deltas of the changes that happened in each commit. The argument -S"revoke_access_token!" filters our log to only include commits that change the number of occurrences of the provided string, in other words commits where the string was added or removed.

This may fall down if the name was refactored at some point, but the same approach would still work for following the historic naming also. What we want to answer is ‘when did this come into existence?’, and ‘when did we stop using this?’

To answer the first question ‘when did this come into existence?’, our current command should work nicely. The oldest commit returned should give us a point of existence, and hopefully some additional context on the original intent of the method, and possibly even the exact places it was used.

To answer the second question ‘when did we stop using this?’, we may need to be looser with our searching. If we’re looking at a public method, we can assume that usage of the method would happen outside of the class itself. If we drop the path argument to git log, we can scan the full history for uses of our method with:

git log -S"revoke_access_token!" -p

If it’s defunct, the expectation would be to see the top (or topmost) commits returned from the log with deletions of the method’s usage. If we see additions, and the code they added still exists in the current source, then further investigation should be carried out to determine if it is actually deadwood.
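For a long history, eyeballing every patch can get tedious. As a hedged sketch, the text output of the command above could be fed through a small tally like the one below, which counts added and removed lines mentioning the method for each commit, newest first. The tally_by_commit name and CommitTally struct are hypothetical, and it assumes git's default log format, where each commit begins with a line of the form "commit <sha>".

```ruby
# Hypothetical helper: summarises `git log -S"..." -p` text output.
CommitTally = Struct.new(:sha, :added, :removed)

# Returns one CommitTally per commit, in the order the log lists them
# (newest first by default), counting patch lines that mention `needle`.
def tally_by_commit(log_text, needle)
  tallies = []

  log_text.each_line do |line|
    if (match = line.match(/\Acommit (\h+)/))
      tallies << CommitTally.new(match[1], 0, 0)
    elsif tallies.empty? || line.start_with?("+++", "---") || !line.include?(needle)
      # Skip file headers and anything that doesn't mention the method.
      # Simplification: a removed line whose own content begins with "--"
      # would also be skipped here.
      next
    elsif line.start_with?("+")
      tallies.last.added += 1
    elsif line.start_with?("-")
      tallies.last.removed += 1
    end
  end

  tallies
end

# Example usage, run from inside the repository:
#   log = IO.popen(["git", "log", "-Srevoke_access_token!", "-p"], &:read)
#   tally_by_commit(log, "revoke_access_token!").each do |tally|
#     puts "#{tally.sha[0, 7]}  +#{tally.added}  -#{tally.removed}"
#   end
```

If the first few entries show only removals, that matches what we'd expect for a defunct method. An entry near the top with additions is the cue to go and check whether that code still exists.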

3. Delete the code, and verify the world continues to work

Assuming we’ve found when the method was added, and we’ve found the last place its usage was removed in history, we should commit our change to remove our dead method (and its defunct tests), and verify everything is continuing as normal, be it through automated tests or manual smoke testing. There should be no surprises from making this change, so we’re just doing this to be good citizens and performing diligence on our changes.

4. Show your workings in a pull request

We’ve stepped through time to give us the confidence to safely remove our code. When raising a pull request for removing some unused code, it can be very helpful for the reviewer to be able to peruse the abridged history themselves to help them review and confirm your intended changes.

To quote a line often attributed to Mark Twain:

I didn’t have time to write a short letter, so I wrote a long one instead.

So, well done if you managed to read this far. Hopefully some of the information and thoughts here might inspire you to try and prune some deadwood the next time you’re in and about some code!