PhD
Hey there! 😊 Let’s dive into what my PhD is all about and why it’s super exciting!
I’m working on creating synthetic data for electronic health record research. Now, you might be wondering, ‘What on earth is electronic health record research?’ Well, it is research that studies the data captured during routine healthcare, like when you see your GP or attend hospital. This data is not collected primarily for research purposes, but it can still tell us a lot about health trends, causes, treatments, and outcomes.
If you have seen my other pages or heard me speak, you will know that I am a big believer in open science. I think that it is important that we are transparent about our research. In the world of electronic health record research, we write a lot of code to do our research and then write papers about it. But sadly, the code that actually produces the results often stays hidden. To me, that’s a missed opportunity because seeing the nitty-gritty details of how research is done at the code level is so valuable! It means people can check our work, see how we did it, and reuse our code for their own research, so we aren’t all reinventing the wheel.
So, why are we not sharing the code already? Well, some of us are, but the catch is that the code often isn’t very useful without the data it’s supposed to analyse. But here’s where it gets tricky: we can’t just release people’s medical data. I mean, who would want their GP records shared publicly? Definitely not me!
This is where synthetic data comes into play. My goal is to create high-quality synthetic data that mirrors the structure of real data but is completely made up. This way, we can run our code on synthetic data and share everything openly with the study—no privacy issues involved!
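To make that a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not the actual tooling for this project: the field names, the placeholder codes and the `make_synthetic_records` function are all made up for the example.

```python
import random
from datetime import date, timedelta

# Hypothetical schema: these field names are illustrative only and are
# not taken from any real electronic health record system.
SCHEMA = ("patient_id", "date_of_birth", "sex", "consultation_date", "code")

# Placeholder clinical codes, invented for this example.
MADE_UP_CODES = ["CODE_A", "CODE_B", "CODE_C"]


def random_date(rng, start, end):
    """Pick a uniformly random date between start and end (inclusive)."""
    return start + timedelta(days=rng.randint(0, (end - start).days))


def make_synthetic_records(n, seed=0):
    """Return n made-up records that follow SCHEMA.

    Every value is drawn from a seeded random number generator,
    so the output is reproducible and describes no real person.
    """
    rng = random.Random(seed)
    records = []
    for i in range(n):
        records.append({
            "patient_id": f"SYN{i:06d}",
            "date_of_birth": random_date(rng, date(1930, 1, 1), date(2010, 12, 31)),
            "sex": rng.choice(["F", "M"]),
            "consultation_date": random_date(rng, date(2015, 1, 1), date(2023, 12, 31)),
            "code": rng.choice(MADE_UP_CODES),
        })
    return records


if __name__ == "__main__":
    for record in make_synthetic_records(3):
        print(record)
```

Analysis code written against these columns could be run and shared alongside the made-up records. This toy version only copies the shape of the data; making synthetic data that also behaves like the real thing is the hard part.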
I’m just starting out on this adventure, but there’s so much to look forward to. With advances in technology and the power of Rust (my favorite programming language), it should be possible to create super-complex synthetic datasets. My plan is to build a set of tools using Rust, Python and SurrealDB, a multi-model database, to generate synthetic data that can be used by researchers worldwide. I am delighted to be sponsored by SurrealDB for my PhD and I can’t wait to see where this journey takes me!
Blog Posts and Resources
SNOMED and friends
Blog Post: This blog post provides an introduction to SNOMED codes and how they are used in routine care in the UK.
An Introduction to Electronic Health Records
Blog Post: A quick primer on what an electronic health record is.
A PhD in generating synthetic health data
Blog Post: This is an introduction to my PhD project and what I am hoping to achieve with it, which is to develop methods for generating realistic synthetic health data. This project is generously sponsored by SurrealDB, a multi-model database entirely written in Rust. I am using SurrealDB for a number of reasons, including its support for complex queries, vector search and embedding functions, all of which are useful for generating synthetic data.
Data Flows in the NHS and Research
Resource: This is an excellent paper for understanding how data flows in the NHS and research. It is by my good friend and former colleague, Dr Jess Morley. Jess is a true genius when it comes to understanding data and the complexity of using AI in healthcare. We worked together at OpenSAFELY for a few years before she got her PhD (in record time!) and moved on to a postdoc position at Yale University. This article and the accompanying website show how complicated the data flows are, with the various EHR providers, data controllers, and users. It is a must-read for anyone interested in health data research in the UK.
Citing and Crediting Codelists
Blog Post: A blog post written with Dr Jess Morley, when we were both working at OpenSAFELY in Oxford. It discusses what a codelist is and offers some ideas on how to improve discoverability and provenance to encourage reuse. We discuss how credit could be given to the creators of codelists, and how they could be cited in research papers. This blog post is intended to start a discussion in the research community about how we can improve the use of codelists in research.