15 February 2023
Healthy and Sustainable Code Review
When we begin a new project or join an established project, one of the first questions we ask is: how do we ensure and maintain the quality of the codebase?
I’ll leave aside the exact definition of “quality” (there’s a “Zen and the Art of something maintenance” joke to be made here), but usually the answer is a constellation of processes and practices.
At Launch Scout, our practices include user story mapping (ensure we’re building the right thing), test driven development (ensure we’re building it right), and pair programming (help each other make it right).
We also, in many projects, practice code review.
I’m a pretty big proponent of code review (my Lone Star Elixir 2020 talk on “What I learned about code review from teaching 8th grade poetry” is how I originally met the crew at Launch Scout, though I wouldn’t work with them until two years later), and wanted to share what I’ve learned over my career thus far.
What is code review?
In short, code review is the act of reviewing code, with the stated goal almost always being to catch defects in the code. Modern code review usually entails examining changes, rather than the codebase at large, with the help of tools.
We can do formal code review, wherein one or more team members sit down with the explicit goal of examining a proposed change to the codebase, determining if that change passes muster, and giving corrective feedback if it doesn’t.
Using the review mechanism of pull requests on GitHub is an example of “formal” code review – you end up with a pull request that’s either approved or not.
We can also do informal code review, which can be as simple as “Hey, can you look at this for me real fast?” – feedback is ad hoc and usually verbal (or over a Slack message).
Why review code?
The explicitly stated goal of code review is to catch defects. Anecdotally, I find that code review catches relatively few actual defects – most of those are caught before the review phase.
The literature bears this out:
…although the top motivation driving code reviews is finding defects, the practice and the actual outcomes are less about finding errors than expected: Defect related comments comprise a small proportion and mainly cover small logical low-level issues. On the other hand, code review additionally provides a wide spectrum of benefits to software teams, such as knowledge transfer, team awareness, and improved solutions to problems. Moreover, we found that context and change understanding is the key of any review. (source)
Enough defects are still caught that even if that were the only benefit it’d still be worth reviewing code. I agree with the authors above that the sharing of knowledge and crafting of better solutions are both much more common and also very worthwhile. Additionally, the threat (promise?) that code will be reviewed applies positive social pressure toward writing better code.
Anything that keeps our mental model of the codebase more aligned with the actual codebase is a good thing.
Additionally, having a review phase in the process helps to shift quality from an individual effort to a team effort. Everyone owns the codebase, and everyone is responsible for its health.
…accountability is a team concept: no one person is ever solely responsible for writing or inspecting code. “You don’t get punished for making errors,” says Marjorie Seiter, a senior member of the technical staff. “If I make a mistake, and others reviewed my work, then I’m not alone. I’m not being blamed for this.” (“They Write the Right Stuff”)

By checking each other’s work, we protect the individual. Defects that slip through are usually (and should be treated as) the result of process failures, rather than individual failings.
The healthiest teams I’ve been on have shared ownership of the codebase and processes, and fix problems as they come in. The least healthy teams I’ve been on resort to blame and finger-pointing when defects are discovered (sometimes to such a degree that folks would choose to not review code out of fear of being stuck holding the bag if something went awry).
Healthy teams recognize that issues are inevitable, and have a whole slew of complementary practices designed to catch (and remedy) defects as early as possible. Code review is one component, but not sufficient by itself.
Almost every person on almost every team wants to do good and important work. We should name our processes and periodically reflect on them to see whether they help us in our goal of delivering value for stakeholders. When you have a team of skilled practitioners who are committed to doing good work and who share a common definition of “good work”, you can iterate ruthlessly on process toward empowering the team.
In order for us to improve the codebase, we must have a shared definition of “better” and a shared understanding of where the problems lie, ideally backed by data. Code review helps us keep our mental models of the codebase accurate, which in turn helps us to reason more accurately about the overall state of the codebase. It’s also a way of helping each other to do better work, particularly when it lets us take implicit patterns and make them explicit.
How do we review code?
The exact tool you use to review code will vary by team – for most of our projects, we use GitHub pull requests. In most cases you’ll look at the set of changes, rather than inspecting the entire codebase, and you’ll often have tools to assist you (including checks as part of your CI/CD pipeline).
When I am teaching developers how to review code, we focus on three key questions, in descending order of priority.
First, is the code correct? Can I as a reviewer see the relevant acceptance criteria, and are there tests that align with those acceptance criteria that exercise the code? Do the tests exercise the code thoroughly enough that I can feel confident in functionality?
If the code is not correct, it will not be valuable to the business. If there are logical errors or mis-implemented solutions, the code may in fact be harmful to the business.
Reviewers should not be shy about asking questions – “What happens if it’s a leap year?” or “Do we need to validate that this field is shaped like a social security number?”
Reviewers might also leave matter-of-fact comments – “This year field is simply a positive integer. Do we need to handle cases where the user inputs a year that’s either in the very distant past or the far future?”
Second, is the code readable? Could a reasonably skilled junior developer trace the execution and make sense of both individual pieces and the thing as a whole? Are there explanatory comments that offer context, particularly in the event of surprising edge cases that stem from business requirements? Is the code where it should be?
We cannot change or build upon the code later unless we can understand it. With most readability issues, I encourage asking questions rather than leaving “please change this” feedback.
Finally, are there other notes? Do we have performance concerns? Are there other implementations worth considering? Nit-picky comments?
In terms of what order to read the code, I encourage reviewers to find the entry point. For a web application, for example, most work is kicked off via an HTTP request handled by a controller (possibly originating from the browser). So we start with the router, then the controller, and then we trace it through the execution path. With an asynchronous job, we might look at where the job is enqueued, then trace our way through the job itself and any code it calls. Or perhaps we start with a GraphQL query or mutation.
Finding the entry point lets our review follow something like a narrative, rather than simply going through the files in alphabetical order.
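To make that reading order concrete, here’s a minimal sketch of an Express-style web app (all names here are hypothetical, and your framework will differ): a reviewer would start at the route, move into the handler, and then trace the business logic it calls.

```typescript
import express from "express";

// Step 3: the business logic a reviewer eventually traces into
// (stubbed here; in a real app this would live in its own module).
async function createInvoice(params: { customerId: string; total: number }) {
  return { id: 1, ...params };
}

const app = express();
app.use(express.json());

// Step 1: the entry point – which request kicks off this work?
app.post("/invoices", async (req, res) => {
  // Step 2: the controller – how is input handled and handed off?
  const invoice = await createInvoice(req.body);
  res.status(201).json(invoice);
});

app.listen(3000);
```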
Not every pull request lets us review in this fashion. Sometimes we see a change to the business logic of an existing feature, at which point we’d first look at the acceptance criteria, then the new tests (any change to the code should be motivated by a test, which in turn would be motivated by a requirement), then the code that fulfills the test.
In all cases, though, we want to find some point of entry in order to review code. Without being able to trace the execution path, we cannot easily build a mental model of the change in question.
The best reviewers I’ve worked with primarily leave feedback in the form of questions, rather than statements. When they do leave statements, it’s a chance to teach others more about the system, rather than an attempt to shame the author.
When the amount of feedback grows beyond what fits comfortably in the normal code review process, that’s usually a great opportunity to pair-program with the author and collaborate on the work in question. We might also consider this more synchronous approach when we have questions about possible implementations, or when we see an opportunity to try something new or different.
Who should review code?
The short answer here is “every developer on staff.”
Most changes are fairly routine and relatively low-risk from an application security and stability perspective. However, these changes are still changes to how the software crystallizes aspects of the business.
By reviewing code (and pair-programming, and participating in user story mapping, and doing any of a number of other collaborative activities) we ensure a more accurate shared understanding between developers, and course-correct divergences between the codebase and our mental models of the codebase. We also have one last chance to collaborate on implementation details before work is merged.
When determining who should review a particular set of changes, we have a few interesting options. As a senior developer, I’ll usually ask more junior developers to review routine changes. This helps me to make sure the code I’m writing is readable and approachable. It may be tempting to leave errors in to see if they catch them (I’ve worked with folks in the past who do this), but what I’d rather see is evidence that they’ve exercised the code in question.
When I’m looking at complex changes (particularly those that interact with systems my team doesn’t control, such as a third-party service for which we’ve built an API integration), I’ll usually ask the mid-level or senior developers for a review, and will sometimes request that they co-review with a junior developer. In these cases, I make sure to provide sequence diagrams and other artifacts that clearly outline how the system works and why it was designed that way, rather than leaving them to infer those things from the code.
If we’ve been pair-programming, we might ask someone not involved in the pair to review the code.
If the code is wide-reaching, we might wish to seek out feedback from folks on other teams as well, particularly if this work impacts their work in some way.
Ultimately, every developer on staff should be doing code review and should be supported as they learn to better review code.
What factors make code review more effective?
In code review literature, we can say a piece of feedback given in code review is “effective” if it effects (brings about) a change in the code. If you leave a comment on my code pointing out a readability issue and I go and fix the issue, that comment was effective. Conversely, if your comment produces no change in the code, then it was (by this definition) ineffective.
I’d argue that feedback is also effective if it brings about better understanding. If I leave a comment asking whether an intermediary record is useful when we can get the same data through a series of joins, and you explain that the business needs us to be able to reason about those relationships beyond what’s currently possible in the codebase, then the comment and discussion still changed my understanding even if they didn’t change the code.
To that effect, there are a few things we can do to make feedback more effective:
First, we can leave fewer comments. When there are lots of comments, it’s hard to sift through the noise. Ideally our test suite, static analysis tools, and CI/CD pipelines will help us to verify if the code is in a reviewable state and will find or fix the easy stuff.
Second, we can ask for review from people who’ve already reviewed the files in question. One of the strongest predictors of a useful comment is whether the reviewer has previously reviewed – or, better, changed – the files under review. (source)
Third, we can break our changes into much smaller diffs. The more lines and files a change touches, the less likely reviewers are to catch defects. Conventional wisdom suggests keeping changes under 200 lines of code, and never above 400. (source)
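Tooling can reinforce this. As a hedged sketch, a team using Danger (danger.systems) in CI might warn on oversized pull requests like so – the threshold and wording here are assumptions to tune for your team:

```typescript
// dangerfile.ts – runs as a CI step via Danger JS.
import { danger, warn } from "danger";

// Total lines touched by this pull request.
const linesChanged = danger.github.pr.additions + danger.github.pr.deletions;

if (linesChanged > 400) {
  warn(
    `This PR changes ${linesChanged} lines; reviews are markedly less ` +
      `effective above ~400 lines. Consider splitting it into smaller chunks.`
  );
}
```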
One interesting side effect of segmenting work out into smaller chunks is that overall velocity usually improves even when review is required for all changes. Developers can do their segment of work and ask for review, and then review other changes while waiting. Teams that do this well usually end up with most changes waiting a few business hours (at most) for approval, and see their overall velocity climb.
Finally, we can create a review checklist, particularly for cases where we have a complex environment (e.g. micro-services). We’ve had some success using pull request templates to automatically include a checklist of standard review tasks. This helps to keep reviewers from having to rely on memory and instead places some of the load on a standard process.
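For example, a team might keep something like this in .github/pull_request_template.md – the specific items below are illustrative, and yours should reflect your own environment:

```markdown
## Review checklist

- [ ] Acceptance criteria are linked, and the tests exercise them
- [ ] I pulled the branch and exercised the change locally
- [ ] Migrations are reversible (or a rollback plan is noted)
- [ ] Affected services and their consumers have been identified
- [ ] Feature flags or configuration changes are documented
```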
Code review is a distinct step in the software development lifecycle from end-user acceptance. In the acceptance step, we ask the stakeholder to execute the feature in question and give feedback on functionality, rather than implementation. The reviewer should still do their best to evaluate code against the written acceptance criteria, but cannot account for requirements that either were unwritten or that have changed since the original writing.
How much code review is enough? How much is too much?
At minimum, developers should be self-reviewing their own changes prior to merging them into `main`.
Beyond that, it’s up to the team to determine the acceptable level of risk of defects getting into production. Ideally the team has tooling in place to enable/disable features (with feature flags or a similar pattern) and they have the ability to quickly roll back production releases. Teams should also have some internal data around defect rates.
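As a sketch of that enable/disable pattern (the flag name here is hypothetical, and many teams back this with a flag service or database rather than environment variables, so flags can change without a deploy):

```typescript
// Minimal feature-flag check, assuming flags arrive via environment
// variables like FEATURE_NEW_SEARCH_FORM=true.
function isEnabled(flag: string): boolean {
  return process.env[`FEATURE_${flag.toUpperCase()}`] === "true";
}

if (isEnabled("new_search_form")) {
  console.log("Serving the new search form.");
} else {
  console.log("Falling back to the existing behavior.");
}
```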
My personal preference on most teams is that most if not all changes get reviewed before merging.
Teams that do well with the DORA metrics and who otherwise have good quality practices in play can and should calibrate review frequency to meet their needs (see “Review Roulette” below).
All high risk or far-reaching changes (significant new architecture, changes to encryption, authentication, or authorization) should get an extra set of eyes on them. A team that otherwise does an excellent job of catching defects prior to the review phase, particularly if that team has excellent collaborative practices, can probably reduce review frequency.
What anti-patterns should we avoid?
The most common code review mistake we see teams make is relying only on the senior-most developers to review code. This usually results in a logjam. The longer a branch remains unmerged, the more likely it is that a breaking change will be introduced somewhere upstream.
Most changes in most codebases can be reviewed well enough by any developer on staff. There’s nothing wrong with calling in the senior engineers when needed – collaboration makes us better! But most changes are fairly routine.
Team leads can and should encourage (or even require) team members to review code, even if they do not feel fully qualified to do so.
The second most common mistake is overly large changes. Review effectiveness drops after about 400 lines of change, which is a good incentive to break things up into much smaller chunks. It’s harder to find issues and harder to give meaningful feedback when the thing under review is very large – the only way to manage it is to break it into smaller pieces.
Smaller change-sets also encourage better and more useful comments.
Team leads should work with developers to segment work into smaller chunks when possible. This usually involves decision-making in the planning, rather than implementation, phase of a feature.
The next mistake we see is not having consistent expectations and definitions of “quality” throughout the team. Instead, it’s up to individual discretion, which in turn can lead to a fair amount of nit-picking relatively trivial choices.
Teams can remedy this by establishing guidelines, style guides, and the like. Simply writing a style guide (for example) is insufficient – team leads must make sure those resources are being used, and that these resources evolve to reflect new learning.
The last mistake we see is when reviewers do not consider the human relationships between team members. Code review (like any collaborative process) can only happen in an environment of trust and respect. This does not mean we should shy away from giving feedback, but it does mean we should consider how we give feedback.
What contextual variations might we consider?
Every team and every project has its own unique dynamics and needs. Your review practices will need to adjust to meet the needs of your projects.
Blocking Reviews
Many of the projects we work on have code review sign-off as a blocking step, meaning that no branch can be merged until it has first been reviewed, nor can developers push directly to `main`. Teams can enforce these choices in the repository settings of most version control platforms.
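On GitHub, for instance, the same setting can also be applied through the API. Here’s a hedged sketch using Octokit (@octokit/rest); OWNER, REPO, and the token are placeholders, and most teams simply use the repository settings UI instead:

```typescript
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Require at least one approving review before anything merges to main,
// and apply the rule to admins as well.
await octokit.rest.repos.updateBranchProtection({
  owner: "OWNER", // placeholder
  repo: "REPO", // placeholder
  branch: "main",
  required_pull_request_reviews: { required_approving_review_count: 1 },
  required_status_checks: null, // or list your required CI checks
  enforce_admins: true,
  restrictions: null, // no extra push restrictions
});
```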
The upside is that teams must go through the review process. The downside is that unless time is made throughout the day for this review, outstanding branches will pile up and teams will end up with a logjam of code waiting to be merged. The bigger this backlog, the more likely that a breaking change will be introduced.
If your team has blocking reviews, you should also strongly consider working towards very small (less than 250 lines of code, and ideally less than 100 lines of code) pull requests, as well as a cultural practice of reviewing as quickly as possible.
Review Roulette
If the defect rate in a team’s codebase is acceptably low (and tracked with actual data from the production system), they might consider applying a 1 in n chance of a review, rather than requiring every proposed change be reviewed.
(Team members can, as always, ask for review even if their number doesn’t come up.)
Teams that do this still get the psychological effect of “I should do good work so the people who review my code don’t catch any issues” but without the extra overhead of lots of review.
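As a sketch, the roll itself can be a small CI step. Everything here is an assumption: PR_NUMBER stands in for your CI’s pull request identifier, and the 1-in-4 odds are a team-chosen number based on your measured defect rate. Keying the roll to the PR number keeps the outcome stable across CI re-runs:

```typescript
// Review roulette: each change gets a 1-in-REVIEW_ODDS chance of a full
// review. Using the PR number (instead of Math.random) makes the draw
// deterministic, so re-running CI doesn't change the outcome.
const REVIEW_ODDS = 4;
const prNumber = Number(process.env.PR_NUMBER ?? 0);

if (prNumber % REVIEW_ODDS === 0) {
  console.log("This change drew a review; request one before merging.");
} else {
  console.log("No review drawn. Merge when CI is green (or ask anyway).");
}
```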
Optional or Delayed Reviews
When teams already have good practices in place for ensuring quality (pair-programming, test driven development, stakeholder acceptance, very few bugs or exceptions in production and good monitoring and reporting), they might find that they don’t always need a blocking review phase.
Not every change merits the same level of scrutiny. An update to styling, or the addition of an extra field in a search form, might not need to be reviewed. On the other hand, anything with application or data security almost certainly needs an extra set of eyes. Teams might allow space for developers to choose whether a proposed change even needs to be reviewed.
When allowing an optional review phase, teams should establish criteria for “you definitely need a reviewer when…” to remove some of the ambiguity. These criteria should be short and easily digestible, and should not attempt to enumerate all scenarios.
Alternatively, teams might still require that all changes be reviewed, but might allow space to decide whether a change must be reviewed before merging. These teams can do delayed reviews. This means a change is put up as a pull request and review is requested, but if the developer has sign-off from the stakeholder and is justifiably confident that the code works as intended, they can merge it in. Then another developer can review and leave feedback or fix issues later.
Teams that practice delayed reviews can still choose to solicit review before merging, particularly with complex, risky, or foundational changes.
You might consider optional or delayed reviews if your codebase is pretty healthy and you have a small or very asynchronous team and want to maintain velocity.
Review Checklists
Some teams have particularly complex applications, or perhaps have not yet (or cannot) automate testing or other checks. Or the team may have a bunch of relatively junior developers. Or the project may just be complex enough that it’s difficult to keep it all in one’s mind.
In those cases, a team may consider creating a review checklist of common tasks, and might even consider including it in a pull request template, as sketched earlier. By using a checklist, we can make it harder for little things to slip through the cracks.
Pair-Reviewing
We pair-program when possible so that we make better decisions and do better work. We also use pair-programming to introduce developers to new or particularly complex parts of the codebase.
Similarly, we can pair-review proposed changes. This lets two or more developers collaborate on the review and share knowledge.
Pair-reviewing can be a useful tool when the proposed change is very complex or high-risk, or when one or both reviewers are not yet comfortable with code review, or any other scenario when it makes sense to have an extra set of eyes.
Anecdotally, two developers pair-reviewing and talking through their review will leave better comments and catch more defects than those same two developers working independently would.
What if I’m on a team by myself?
Solo developers can and should still review their own work.
When I am a solo developer on a project, my process is usually to put up a pull request for some proposed change, give myself some time to work on an administrative task or to plan the next segment of work, and then go back and review my code as though it were someone else’s. This includes making notes of what I find using the review tools, and closing those out as I tackle them.
This is similar to how I edit my own writing, for example – I’ll finish a draft, take a break, and then revisit with an editorial eye once I’ve had a little bit of time to gain some distance.
The above practice is not limited to solo developers – pre-reviewing one’s own work before asking others for review is a great opportunity to improve that work.
Takeaways
- Code review is one practice within the broader collection of software quality processes, and it’s just one tool you can use as you build high-quality software.
- Code review will sometimes catch defects, but is primarily useful as a means of sharing knowledge and ensuring consistent mental models across the team.
- The team should have consistent expectations (among members) of their work.
- Smaller diffs are preferable. Reviews with a few high-impact comments are preferable to reviews with many comments of varying impact.
- Code review is a skill, and takes practice.
- As with any other process, the team should be empowered to modify it to fit the needs of the team and business within their particular context.
- When reviewing, find the entry point. Prefer asking questions when possible, but calibrate this to what works best for you and your team.
The best teams I’ve been on had a healthy, sustainable, and robust code review practice, and were able to use it (alongside complementary practices) to substantially increase velocity while decreasing defects in code.
As with any process, it will take time to build comfort and to tune it to being the most effective for your team. As with any process, context matters a lot, and it’s more important that you find something you can do consistently and effectively than it is that you follow any particular doctrine.
If you’d like to discuss software quality practices (code review or otherwise) and how you might use them to improve the quality of work your team does, we’d love to chat with you.