Code Review for Research Code


This blog is a written version of a talk I have given a few times now on how to conduct a code review for research code. I am writing it up to provide a reference for those who have attended the talk and for those who are interested in learning more about code review for research code.

This guide is intended to be a high-level overview of what a code review is, why you should do it, and how to do it. It does require some knowledge of GitHub, but I will try to explain things as I go along. If you have any questions, please feel free to ask me.

What is a code review?

A code review is a process where someone other than the author of the code looks over the code to check for errors, bugs, and other issues. This is a standard practice in software development, but it is much less common in research code. This is strange when we consider that many of us engaged in research these days are actually writing complex code that is used to generate important results. It is essential that this code is correct and that it is doing what we think it is doing.

At a basic level, a code review is someone looking at the code you have written and checking it. However, there are many different ways of reviewing code, and what is key, in my experience, is making sure that the reviewer and the reviewee are on the same page about what is expected from the review. This is why I always start a code review by asking the author what they want from the review. This can be as simple as “I want you to check that this code is correct” or “I want you to check that this code is readable” or “I want you to check that the code implements the study protocol correctly”. This sets the tone for the review and helps the reviewer to know what to look for.

How to conduct a code review

This is a high-level overview of how to conduct a code review for research. It assumes that both the “reviewer” and the “reviewee” are signed up with GitHub or similar, and that the code is hosted on GitHub. If you are not familiar with GitHub, I would recommend looking at some of the many tutorials available online.

Step 1: Set up a code review

The first step is to set up a code review. This can be done in GitHub by creating a “pull request”. A pull request is a way of suggesting changes to a codebase. It is a way of saying “I have made some changes here and I would like them to become part of the codebase”. For this to work, it is essential that the person writing the code is working on a separate branch from the main one. This is good practice anyway, as it means that you and your team can work on different parts of the code at the same time without interfering with each other, and it makes it easier to roll back changes if something goes wrong. GitHub has good documentation on what branches are and how to use them, and this page is a good place to start.

To create a pull request, go to the repository where the code is stored and click on the “New pull request” button. This will take you to a page where you can compare the changes you have made to the codebase. You can then add a title and a description of the changes you have made. I strongly recommend that you use the description to explain what you have done, and if needed, why you have done it.

Figure 1: Example of a pull request on GitHub

Unlike in software development, where the purpose of the review is usually self-explanatory, the purpose of a review of research code is often unclear. This is why I always recommend that you add something to describe what you want from the review. I might say “I have made some changes to the code to define the study population as per the first section of the study protocol (linked here). I would like you to check that the code is correct and that it implements the protocol correctly”. This sets the tone for the review and helps the reviewer to know what to look for. It is also good to make clear whether you expect the reviewer to run the code, which usually means running it against data. If you do, make sure that the reviewer has access to the data, or provide a dummy dataset. This is important as it is very difficult to review code without running it.
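
If the real data cannot be shared, a small dummy dataset can be generated in code and shared with the pull request. The following is only a rough sketch of the idea: the column names (patient_id, age, sex, index_date), the value ranges and the function names are all made up for illustration and would need to match whatever your own analysis code expects.

```python
import csv
import random
from datetime import date, timedelta


def make_dummy_patients(n_rows, seed=0):
    """Return n_rows of made-up patient records (no real data involved)."""
    # A fixed seed means the author and the reviewer get identical rows.
    rng = random.Random(seed)
    first_day = date(2020, 1, 1)
    rows = []
    for patient_id in range(1, n_rows + 1):
        index_date = first_day + timedelta(days=rng.randint(0, 364))
        rows.append(
            {
                "patient_id": patient_id,            # hypothetical column
                "age": rng.randint(18, 90),          # hypothetical column
                "sex": rng.choice(["F", "M"]),       # hypothetical column
                "index_date": index_date.isoformat(),  # hypothetical column
            }
        )
    return rows


def write_dummy_csv(rows, path):
    """Write the dummy records to a CSV file that the reviewer can load."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Build 100 dummy patients in memory.
dummy_rows = make_dummy_patients(n_rows=100)
```

Calling write_dummy_csv(dummy_rows, "dummy_patients.csv") would then produce a file that the reviewer can run the code against. Because the seed is fixed, any numbers the reviewer quotes back to you should match what you see yourself.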

I would also recommend that if you anticipate your reviewer is going to need access to any external material, for example, the study protocol, that you link to it in the description. This will save time and make sure they are using the most up to date version.

You can also assign reviewers to the pull request. This is a good idea, as it means that the reviewers will get a notification that there is a pull request that needs reviewing.

Step 2: Review the code

For the reviewer, the next step is to review the code. This can be done in a number of ways, but I would recommend that you start by browsing through the files that have been changed using the GitHub interface. This is usually done by clicking on the “Files changed” tab. This will show you all the changes that have been made to the codebase. You can then go through these changes and check that they are correct. I usually mark them off as I go along, so that I can keep track of what I have reviewed and what I haven’t.

Figure 2: What the reviewer sees when they are first reviewing the code

If you click on the line number on the left-hand side of the screen, you can add comments to the code. You can also drag downwards to highlight a section of code and add a comment to that section. This is a good way of giving feedback on a whole section of code. This makes it much easier to have a conversation about the code as both you and the author can see what you are talking about.

Figure 3: A specific comment on a line of code

GitHub has also introduced a feature where you can make small edits to the code directly in the review. This is a good way of handling small changes that you can just fix yourself. For example, you might find a grammatical error in the readme file, and it is quicker to correct it than to write a comment asking the author to do it.

Once you have reviewed the code files, I would recommend that you add a general overall comment. You will be offered an opportunity to do this when you submit your review. This is a good place to summarise your thoughts on the code and to suggest any overall changes that need to be made. For example, you might say “I have reviewed the code and I think it is correct. However, I think that the code could be made more readable by adding comments to explain what each section of the code is doing”. This is a general comment that applies to the whole codebase. You will then be prompted to submit your review as a comment, an approval, or a request for changes. I suggest that you just use the latter two options. If you are happy with the code, you can approve it. If you think there are changes that need to be made, you can request changes. This will send the code back to the author for them to make the changes.

Figure 4: An overall comment on the code

Step 3: Make changes

For the author, the next step is to make the changes that the reviewer has requested. Once you have made the changes, you can commit them to the branch you are working on. This will automatically update the pull request. You can then ask the reviewer to review the changes again. This might not be necessary if the changes are small, but I think it is good practice to ask the reviewer to check that you have made the changes they requested.

Step 4: Merge the code

Once the reviewer is happy with the code, they can approve the pull request. This will allow the author to merge the code into the main branch. This is usually done by clicking on the “Merge pull request” button. This will merge the code into the main branch and close the pull request. The code is now part of the main codebase and is available for everyone to use.

What makes a good review as a reviewee?

Many of these suggestions are things that can make you write better research code in general, but they are particularly important when you are asking someone to review your code. I use the “if I got hit by a bus” test. If I got hit by a bus tomorrow, would someone else be able to pick up my code and understand what I was doing? If the answer is no, then I need to make my code more readable. Here are some suggestions for what makes a codebase easier to review:

  1. Files should be well named and in a logical order. This makes it easier for the reviewer to find the code they are looking for.
  2. Highlight particular areas of importance, or areas that need review.
  3. Link out to any external material that is important for the review, such as the protocol.
  4. Comment your code, for example with inline comments or docstrings.
  5. Name your variables and functions clearly and sensibly. They should be descriptive of what they are doing. No-one wants to review a function called function1 that takes variable1 as an argument. It is better to have a function called calculate_mean_episodes that takes episode_data as an argument.
  6. Consider refactoring your code so that it is easier to read. Here we might want to remove things like list comprehensions or nested loops that might make your code shorter but harder to read. This is part of what is called “Cognitive Refactoring”, as described in the excellent book The Programmer’s Brain by Felienne Hermans.
  7. Try to avoid results that are only printed to the console. Output all results to files that can be examined.
  8. Clearly mark sensitivity analyses or other parts of the code that are not part of the main analysis.
  9. Add a readme file that explains, at a minimum, how to run the code.
  10. Document your environment and dependencies. For example, a requirements.txt file is a good way of doing this in Python.
  11. Consider using style guides such as PEP8 for Python or Google’s R Style Guide for R. This will make your code more readable and easier to review. If everyone in your research group uses the same style guide, it will make it easier for you to review each other’s code.
  12. Related to the point above, consider making use of tools that format your code for you in a standard way. For example, in Python, you can use Black or Ruff to format your code. This will make your code more readable and easier to review. Smaller libraries like isort can also help to keep your imports in order.
  13. Consider adding your code to your branch in small, well-defined chunks as commits. You can add multiple commits to a branch, and this makes your code easier to review as your reviewer can choose to click through the commits sequentially and really understand the changes you have made. This blog has some good advice on what makes a good commit, and in particular how to write a good commit message.
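
Several of the suggestions above (a docstring, descriptive names, an explicit loop rather than a dense one-liner, and results written to a file rather than the console) can be sketched together in a few lines of Python. The names calculate_mean_episodes and episode_data come from the list above, but everything else here is an assumption made for illustration: the shape of the data (pairs of patient ID and episode ID), the write_result helper and the layout of the output file.

```python
import csv
from collections import defaultdict


def calculate_mean_episodes(episode_data):
    """Return the mean number of episodes per patient.

    episode_data is assumed to be a list of (patient_id, episode_id)
    pairs, with one pair per episode.
    """
    # Step 1: count how many episodes each patient has.
    episodes_per_patient = defaultdict(int)
    for patient_id, _episode_id in episode_data:
        episodes_per_patient[patient_id] += 1

    # Step 2: average the counts over patients.
    if not episodes_per_patient:
        raise ValueError("episode_data is empty")
    total_episodes = sum(episodes_per_patient.values())
    return total_episodes / len(episodes_per_patient)


def write_result(name, value, path):
    """Write a single named result to a CSV file instead of printing it."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["result", "value"])
        writer.writerow([name, value])


# Two patients: p1 has two episodes and p2 has one, so the mean is 1.5.
episode_data = [("p1", "e1"), ("p1", "e2"), ("p2", "e3")]
mean_episodes = calculate_mean_episodes(episode_data)
```

A call such as write_result("mean_episodes", mean_episodes, "results.csv") then leaves a file that the reviewer can open and compare against their own run, rather than a number that scrolled past in a terminal.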

And finally, the most important thing is to ask for review frequently. This makes it much easier to review your code as you go along, rather than having to review a large codebase all at once. Trust me, I have been asked to review code that is thousands of lines long and it is not fun, and what’s more, it is not effective. I would much prefer to review code in small chunks as it is written.

What makes a good review as a reviewer?

  1. Make sure you understand what the author wants from the review. This will help you to know what to look for.
  2. Be clear about what you have done: what you have reviewed and what you have not. Have you run the code against data?
  3. Be kind. Your job is to help the author make their code better, not to make them feel bad about their code. This means giving praise as well as criticism and trying to make the criticism constructive. Feedback that is “this is terrible” is not helpful. Feedback that is “I think this would benefit from separating this function into two functions” is helpful.
  4. Be specific. If you think something is wrong, try to explain why you think it is wrong. If you attach your comments to the specific lines of code, that is even better.
  5. If the code is overall very bad, consider suggesting that the code review should be done in person. This can be a good way of explaining what you mean and helping the author to understand what they need to do to improve their code, rather than calling out every single mistake in the code.

Rewarding code review

We must acknowledge the reality of many academic departments, where researchers work mostly on their own on single-topic research projects whilst sitting in a larger research group. This can make it difficult to get code reviewed, as your colleague might not know the ins and outs of your project, and reviewing has the potential to be time-consuming. I have thought a lot about this and I think that there are ways that we can make code review more rewarding for both the reviewer and the reviewee. The first thing is academic credit. If I have had more than a cursory glance at your code, I would absolutely expect to be a named author on the paper - stick me near the middle to end of the author list, but I should be there. I have contributed to the project.

The second thing I have seen work really well is a buddy system: if I review your code, you review mine. This works well because you can learn a lot from reviewing other people’s code. You can see how they structure their code, what they do well and what they do badly. This can help you to improve your own code. Furthermore, you get to know what other people are working on, which can be really helpful in a research group.

Buy-in from senior leadership

Finally, I think that it is important to get buy-in from senior leadership in your group. If your PI is not on board with code review, it is going to be very difficult to get people to do it. I would recommend that you have a conversation with your PI about the importance of code review and how it can help to improve the quality of the code that is produced by the group, and improve everyone’s coding skills. Frank discussions about authorship and how code review can be rewarded are also important.

Conclusion

I hope that this blog has given you an overview of how to conduct a code review for research code. I think that code review is an important part of the research process and that it can help to improve the quality of the code that we produce. I would encourage you to start reviewing code in your research group and to ask for reviews of your own code. It can be a bit daunting at first, but I think that it is worth it in the long run. If you have any questions, please feel free to ask me. I am always happy to help.
