Open Science

Throughout my career, I have advocated for code sharing in health data research and taught researchers how to use tools like GitHub effectively, how to conduct code reviews, and why open science practices matter. I have worked with many amazing colleagues on articles and papers on this topic, and have been able to teach these practices to researchers in a variety of settings. I also saw the huge benefits of these practices first-hand when I was working at OpenSAFELY, University of Oxford, where we were able to quickly re-run studies and share our code with other researchers.

By promoting these practices, I aim to help researchers enhance their software skills and contribute to a more transparent and collaborative research environment.

With significant progress already made, such as the BMJ's recent changes to its code-sharing guidelines, I am now expanding my focus to advocate for other essential software development practices, including unit testing, writing maintainable code, and creating synthetic data that can be shared freely. By continuing to push for these standards, I hope to further support researchers in producing robust, reproducible, and high-quality research.

Here I share papers and blog posts I have written, along with interesting and useful resources by others that people working in this space might want to read.

How Should Meaningful Evidence Be Generated From Datasets?

Paper

This paper, co-authored with the brilliant Dr Christopher Rentsch, explores how transparency at every stage of research (data collection, protocol preregistration, and code sharing) can improve the credibility and relevance of findings. We challenge the assumption that large datasets alone guarantee high-quality evidence, and we provide actionable strategies to strengthen transparency in epidemiological studies, even when datasets are less than ideal.

Code Review for Research Code

Blog Post

An overview of how to conduct a code review of research code.

Sharing is Caring: Recommendations for Sharing Code

Paper

I was lucky enough to be involved in this study, which quantified the extent to which programming code is publicly shared in pharmacoepidemiology and developed a set of recommendations on this topic. This work was primarily undertaken by Dr Anna Schulze and Dr John Tazare, two brilliant researchers at the London School of Hygiene and Tropical Medicine.

OpenSAFELY: Designing for Reproducibility

Paper

This article discusses the design of the OpenSAFELY platform, a secure analytics platform for electronic health records in the NHS, and the importance of reproducibility in health data research. It was written by my fantastic, now former, colleagues at OpenSAFELY, University of Oxford, and covers how the platform was specifically designed with reproducibility in mind during the early part of the COVID-19 pandemic. Working in the Tech team from the very beginning of this project was a fantastic experience, and I am very proud of the work we did there.

A PhD in generating synthetic health data

Blog Post

This is an introduction to my PhD project and what I hope to achieve with it: developing methods for generating realistic synthetic health data. The project is generously sponsored by SurrealDB, a multi-model database written entirely in Rust. I am using SurrealDB for a number of reasons, including its support for complex queries, vector search, and embedding functions, which are useful for generating synthetic data.

Data Flows in the NHS and Research

Resource

This is an excellent paper for understanding how data flows in the NHS and in research. It is by my good friend and former colleague Dr Jess Morley. Jess is a true genius when it comes to understanding data and the complexity of using AI in healthcare. We worked together at OpenSAFELY for a few years before she completed her PhD (in record time!) and moved on to a postdoc position at Yale University. This article and the accompanying website show how complicated these data flows are, with their various EHR providers, data controllers, and users. It is a must-read for anyone interested in health data research in the UK.

Re-running your study

Blog Post

A blog post written when I was working at OpenSAFELY, University of Oxford. This blog post discusses the importance of automated pipelines in research, and how they can help you re-run your study quickly and easily, as we did at OpenSAFELY.

Sharing study materials in health and medical research

Paper

An article co-authored with the brilliant Dr Nick DeVito, one of my former colleagues at Oxford. Published in BMJ Evidence-Based Medicine, it discusses the importance of sharing study materials in health and medical research and how to do this in a way that is useful to others. It also examines current barriers to sharing and argues that sharing code is useful even when the underlying data cannot be shared.

Software development skills for health data researchers

Paper

This is an article that I wrote with colleagues in BMJ Health & Care Informatics covering the basics of modern software development and how they can be applied to health data research. In particular, the paper discusses the use of version control, unit testing, cataloguing your environment, and documenting your code. It aims to provide a jumping-off point for researchers interested in the tools and techniques used in software engineering and in applying them, even in a small way, to their own research.
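As a small illustration of what unit testing can look like in research code, here is a minimal sketch that is not taken from the paper. It assumes a hypothetical helper, `age_band`, of the kind often written when preparing health data, and pairs it with tests written in the pytest style using plain `assert` statements.

```python
# Hypothetical helper (illustrative only): assign an age to a 10-year band.
def age_band(age: int) -> str:
    """Return a 10-year age band label such as '40-49'."""
    if age < 0:
        raise ValueError("age cannot be negative")
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


# Unit tests: pytest collects functions named test_*, but they also run as plain Python.
def test_age_band_boundaries():
    # Check values on either side of a band boundary, where off-by-one bugs hide.
    assert age_band(0) == "0-9"
    assert age_band(39) == "30-39"
    assert age_band(40) == "40-49"


def test_age_band_rejects_negative():
    # Invalid input should fail loudly rather than produce a misleading band.
    try:
        age_band(-1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for negative age")


if __name__ == "__main__":
    test_age_band_boundaries()
    test_age_band_rejects_negative()
    print("all tests passed")
```

Running the file directly executes both tests. Testing boundary values is a cheap way to catch the kind of subtle coding error that can otherwise change a study's results.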

Citing and Crediting Codelists

Blog Post

A blog post written with Dr Jess Morley when we were both working at OpenSAFELY in Oxford. It explains what a codelist is and offers some ideas on how to improve the discoverability and provenance of codelists to encourage their reuse. We also discuss how credit could be given to the creators of codelists, and how codelists could be cited in research papers. The post is intended to start a discussion in the research community about how we can improve the use of codelists in research.

Barely sufficient practices in scientific computing

Paper

An article in which we propose a minimal subset of common software engineering principles that enable the FAIRness of computational research and can serve as a baseline for software engineering in any research discipline.

Bringing NHS data analysis into the 21st century

Paper

This is an article I co-authored that discusses the barriers the NHS faces in developing and leveraging its data, and how some of these barriers can be overcome. Specifically, we set out the need for a 21st-century NHS analyst workforce supported by clear career trajectories and training opportunities, and for a culture of ‘build it once and share it to everyone’ built around modern, open analytic methods.

Why Researchers Should Share Their Code

Paper

This short article, which I wrote with my then colleagues Dr Nick DeVito and Professor Ben Goldacre, was a response to the retraction of a clinical trial report after a serious programming error was discovered. It discusses why sharing code matters in health research and how it can help prevent errors and improve transparency.