SNOMED and friends

This blog post is a short introduction to SNOMED CT from my perspective as someone who has interacted with SNOMED in a variety of ways. I am going to try to reflect several points of view from my time as a clinician, as an epidemiologist using electronic health records for research, and as a software engineer trying to create realistic synthetic data.

What is SNOMED CT?

SNOMED CT stands for Systematized Nomenclature of Medicine Clinical Terms. It is designed to be a comprehensive, multilingual set of clinical healthcare terminology that can be used to record and exchange clinical health information.

A term is a description of something such as a disease or a symptom, and each term is linked to a numeric code between 6 and 18 digits long. The code is never padded, so a code shorter than 18 digits has no leading zeros.

SNOMED code    SNOMED term
49727002       Cough (finding)
103001002      Feeling feverish (finding)
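The codes are not arbitrary numbers. Per the SNOMED CT identifier specification (not something covered in this post), the last digit is a Verhoeff check digit and the two digits before it are a partition identifier saying what kind of component the identifier belongs to. The sketch below is a minimal validator under those assumptions; the function names are my own, and only the common partition values are listed.

```python
import re

# Verhoeff multiplication table (the dihedral group D5).
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]

# Verhoeff permutation table: row k is the base permutation applied k times.
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]

# Partition identifier (the two digits before the check digit) -> component type.
_PARTITIONS = {
    "00": "concept",
    "01": "description",
    "02": "relationship",
    "10": "concept (extension)",
    "11": "description (extension)",
    "12": "relationship (extension)",
}


def verhoeff_valid(digits: str) -> bool:
    """True if the last digit is the correct Verhoeff check digit for the rest."""
    check = 0
    for i, ch in enumerate(reversed(digits)):
        check = _D[check][_P[i % 8][int(ch)]]
    return check == 0


def is_valid_sctid(code: str) -> bool:
    """6 to 18 ASCII digits, no leading zero, and a valid check digit."""
    if re.fullmatch(r"[1-9][0-9]{5,17}", code) is None:
        return False
    return verhoeff_valid(code)


def component_type(code: str) -> str:
    """Read the partition identifier (assumes the code is already valid)."""
    return _PARTITIONS.get(code[-3:-1], "unknown")


for code in ["49727002", "103001002", "49727003", "049727002"]:
    print(code, is_valid_sctid(code))
# 49727002 True
# 103001002 True
# 49727003 False
# 049727002 False
```

A check like this is cheap to run over an extract and catches codes that were mangled along the way, for example by a spreadsheet that stripped or added digits.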

There are SNOMED codes for all sorts of things, and the system is designed to capture fine-grained clinical data precisely and consistently. There are lots of reasons this is useful, but if we think about a database, it is much easier to navigate, use and sort if the recorded data are clean and uniform. For example, SNOMED allows clinicians to choose “asthma” and record it as a SNOMED code, rather than writing asthma as free text, or ASTHMA, or Atshma, or Has Asthma, or any of the myriad other ways we might be tempted to write it down or misspell it. This means that when someone tries to find all the asthmatic patients in a clinic, they can search for asthma and get a list. Or at least that is the theory. We will come on to why it is a little more complicated than this later on.

SNOMED is maintained and distributed by SNOMED International, a non-profit organisation. In the UK, we use a slightly modified version of SNOMED to add some UK context.

Understanding what SNOMED CT is lays the groundwork, but to truly grasp its power, we need to explore how it’s organised.

How is SNOMED structured?

SNOMED is organised hierarchically. It is often described as a tree, although strictly speaking it is a polyhierarchy, because a code can have more than one parent. A SNOMED code can be both a child of other codes and a parent to one or many other codes. For example, if we look up “cough”, we can see that it is a child of “Respiratory function finding (finding)” and a parent to 38 “types” of cough, such as “dry cough” or “cough with fever”.
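Because a code can sit under more than one parent, collecting “a code and all its children” is a graph traversal rather than a simple tree walk. The sketch below uses a handful of hand-written labels instead of real codes, and the second parent given to “Cough with fever” is an assumption made purely to show the shape of the problem.

```python
from collections import defaultdict, deque

# Toy is-a edges as (child, parent). Labels, not real codes; the second parent
# of "Cough with fever" is assumed only to show a code with two parents.
IS_A = [
    ("Cough", "Respiratory function finding"),
    ("Dry cough", "Cough"),
    ("Nocturnal cough", "Cough"),
    ("Cough with fever", "Cough"),
    ("Cough with fever", "Fever"),
]


def build_children(edges):
    """Invert child -> parent edges into a parent -> {children} lookup."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    return children


def descendants(root, children):
    """Every code below root, found breadth-first; shared children appear once."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


children = build_children(IS_A)
print(sorted(descendants("Cough", children)))
# ['Cough with fever', 'Dry cough', 'Nocturnal cough']
```

The `seen` set matters: without it, a code reachable through two parents would be visited twice, and a careless recursive version can count it twice too.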

Each code also has multiple descriptions, one of which is the preferred term. For example, “Cough (finding)” is the fully specified name and “Cough” is the preferred term, but “Observation of cough” also applies. This does bring up some difficulty, because sometimes these descriptions mean slightly different things. I chose cough as a good example of this. To some people “cough” and “Observation of cough” will be the same thing, but to others “cough” might mean the patient reports a cough, whereas “Observation of cough” means “I, the doctor, saw the patient cough”. This may seem trivial but can be important when we are thinking about using this data for research.
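One way to see why this matters for research: every description resolves to the same concept code, so if a record holds only that code, the distinction the clinician had in mind is gone. The rows below are a simplified, hand-written illustration based on the cough example; real release files carry more columns.

```python
# Simplified description rows for one concept: (concept_id, description_type, term).
DESCRIPTIONS = [
    ("49727002", "fully specified name", "Cough (finding)"),
    ("49727002", "synonym", "Cough"),
    ("49727002", "synonym", "Observation of cough"),
]


def build_term_index(rows):
    """Case-insensitive lookup from any description to its concept code."""
    return {term.casefold(): concept_id for concept_id, _kind, term in rows}


index = build_term_index(DESCRIPTIONS)
print(index["cough"], index["observation of cough"])  # 49727002 49727002
```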

What is a SNOMED relationship?

In SNOMED CT, relationships are a crucial component that define how different concepts are connected to each other. These relationships provide the semantic structure that makes SNOMED CT more than just a list of medical terms. There are three main types of relationships in SNOMED:

  1. Is-a Relationships: These form the backbone of SNOMED CT’s hierarchical structure. They define subtype relationships between concepts. For example, “Bacterial pneumonia” has an is-a relationship with “Pneumonia”, indicating that bacterial pneumonia is a type of pneumonia.

  2. Attribute Relationships: These define characteristics of a concept. For instance, the “Finding site” relationship might link “Pneumonia” to “Lung structure”, indicating where pneumonia occurs.

  3. Historical Relationships: These track changes in concepts over time, such as when a concept is replaced or split into multiple new concepts.
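Historical links matter in practice because old records keep codes that have since been inactivated. In the current release files these links are held in historical association reference sets. The sketch below is a minimal resolver over invented labels: it follows the links from an inactive code until it reaches active codes, and returns several codes when a concept was split.

```python
# Hypothetical history links: inactive code -> the code(s) it now points to.
# Labels are invented; "split_code" shows a concept split into two.
HISTORY = {
    "old_code": ["interim_code"],
    "interim_code": ["new_code"],
    "split_code": ["part_a", "part_b"],
}
ACTIVE = {"new_code", "part_a", "part_b"}


def resolve(code, history, active):
    """Follow history links until reaching active codes; guards against cycles."""
    found, seen, stack = set(), set(), [code]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in active:
            found.add(current)
        else:
            stack.extend(history.get(current, ()))
    return found


print(resolve("old_code", HISTORY, ACTIVE))             # {'new_code'}
print(sorted(resolve("split_code", HISTORY, ACTIVE)))   # ['part_a', 'part_b']
```

An empty result (an inactive code with no onward link) is worth flagging for manual review rather than silently dropping.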

Relationships in SNOMED CT allow for complex queries and inference. For example, you could use these relationships to find all respiratory conditions whose finding site is the lung, or all disorders whose causative agent is a bacterium. This network of relationships is what gives SNOMED CT its power as a comprehensive clinical terminology, enabling precise and flexible representation of clinical information.
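A query like “respiratory conditions that affect the lungs” combines the two kinds of relationship: walk down the is-a hierarchy, then filter on an attribute. The sketch below uses invented labels; in the release files the relationship types are identified by concept IDs (116680003 for |Is a| and 363698007 for |Finding site|).

```python
from collections import defaultdict

# Toy (source, type, destination) triples; labels stand in for concept IDs.
RELATIONSHIPS = [
    ("Pneumonia", "is_a", "Respiratory disorder"),
    ("Bacterial pneumonia", "is_a", "Pneumonia"),
    ("Laryngitis", "is_a", "Respiratory disorder"),
    ("Pneumonia", "finding_site", "Lung structure"),
    ("Bacterial pneumonia", "finding_site", "Lung structure"),
    ("Laryngitis", "finding_site", "Larynx structure"),
]


def descendants(root, triples):
    """All concepts below root, following is_a edges downwards."""
    children = defaultdict(set)
    for source, rel_type, destination in triples:
        if rel_type == "is_a":
            children[destination].add(source)
    seen, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def having(concepts, rel_type, value, triples):
    """Keep only the concepts that have a (rel_type, value) relationship."""
    matches = {s for s, t, d in triples if t == rel_type and d == value}
    return concepts & matches


lung_conditions = having(
    descendants("Respiratory disorder", RELATIONSHIPS),
    "finding_site", "Lung structure", RELATIONSHIPS,
)
print(sorted(lung_conditions))  # ['Bacterial pneumonia', 'Pneumonia']
```

This is simplified: a fuller query would also accept finding sites that are themselves descendants of the lung structure, such as a specific lobe.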

Understanding these relationships is crucial when working with SNOMED CT data, whether for clinical use, research, or developing health information systems. They allow for more nuanced and accurate data retrieval and analysis, which is particularly valuable in epidemiological research.

This sounds great, but it does require the clinician to have some understanding of how the underlying data dictionary works. I can speak from experience: even though I was working in epidemiology and training to be a GP at the same time, I didn’t always use SNOMED in the way it was intended to be used. Left and right provide a good example of this. If you have a patient with a condition that affects one side of the body, such as a hip fracture, you might intuitively expect to be able to search for “left hip fracture” and find a code. If we do this, we find that the only option is “Closed reduction and dynamic hip screw of fracture of left femur”, which describes the surgical procedure the patient had. This might not be what you want.

What we are supposed to do is find “Hip fracture” and then also find “left” or “left hip” to construct a relationship (this is known as post-coordination). This rarely happens in practice, however. Like most people, I did not understand the full power of SNOMED here, and I would most likely have chosen “hip fracture” and then typed “left” in the free text next to it. As I mentioned in another blog post, researchers don’t usually get the free-text information, so we end up in a situation where we have good recording of all the hip fractures at a national level but no breakdown by side. This is a trivial example, but it is an important illustration of some of the problems with SNOMED and how it is integrated into clinical systems. We are further incentivised not to use the qualifier value “left” because diagnostic codes get added to the patient’s past medical history, and having “left” there on its own does not make much sense at a summary level.
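To make the consequence concrete, here is a toy model of the situation described above. The `Entry` structure and the `research_extract` function are hypothetical: the extract is assumed to keep the coded concept and any structured qualifier but to drop the free text, which is what leaves most hip fractures with an unknown side.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One hypothetical record entry in a clinical system."""
    concept: str                                    # the coded concept, as a label
    qualifiers: dict = field(default_factory=dict)  # structured qualifiers, e.g. laterality
    free_text: str = ""                             # typed note next to the code


def research_extract(entries):
    """Assumed extract: keeps the concept and structured qualifiers, drops free text."""
    return [(e.concept, e.qualifiers.get("laterality", "unknown")) for e in entries]


entries = [
    Entry("Hip fracture", free_text="left"),
    Entry("Hip fracture", free_text="right"),
    Entry("Hip fracture", qualifiers={"laterality": "left"}),
]
counts = Counter(research_extract(entries))
print(counts[("Hip fracture", "unknown")], counts[("Hip fracture", "left")])  # 2 1
```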

Despite its logical structure and relationships, SNOMED CT isn’t without its quirks. Let’s take a lighthearted look at some of the more unusual aspects of this system.

Weirdness of SNOMED

A fairly common side hobby of epidemiologists is collecting weird codes that they come across in SNOMED. There are strange “families” of codes that defy explanation. For example, there is an extensive part of the tree covering the various injuries you might sustain from falling aircraft. I am sad to report that the extensive dog breed tree has been marked as inactive by NHS Digital. It was a fantastically weird part of SNOMED that included everything from miniature huskies to greyhounds, and everything in between. I note, however, that they have kept grey wolf just for good measure.

Figure 1: A few of the old dog breeds in SNOMED

Whilst we are on the subject of dogs, there is a large selection of codes for dog bites (dog bite of foot, dog bite of nose, etc.). This brings us back to the oddness around sides and body parts mentioned above: you might expect that “dog bite” plus a body part would be the more correct way of recording this.

It seems that NHS Digital at least has started to clear up some of these extensive code families in recent years. I understand the reasons, although I will miss the strangeness of “Atomic power plant malfunction in watercraft, water skier injured (event)” or the 10 different types of squirrel there used to be. Times are changing.

Let’s talk next about codelists!

The second-order consequences of SNOMED and friends

The richness of SNOMED and the other coding systems allows us to record fine-grained clinical data in a structured way. This is great for clinical care, as it allows for more accurate and detailed patient records. However, it also has some unintended consequences in research. Epidemiologists and other researchers often use electronic health records (EHRs) to study disease patterns and outcomes. These records are a rich source of data, but they are not always easy to work with. One of the biggest parts of a research project using electronic health records is creating a series of codelists that define the conditions, treatments and outcomes of interest. These codelists are used to extract data from the EHRs for analysis.

This is easiest explained with an example. Let’s say you are interested in studying the outcomes of patients with chronic kidney disease (CKD). You would need to create a codelist of SNOMED codes that represent CKD. It is not enough to search for “chronic kidney disease” and use the first code that comes up. You need to carefully review the hierarchy of codes and select the most appropriate ones. There are often multiple codes that could represent the same condition, and a patient can have multiple codes for the same condition. This can make it challenging to create codelists that accurately capture the patients you are interested in.

If we want to find everyone with CKD, we probably need several strategies. We might search for “Chronic kidney disease” and all its children, but we might also need to search for “End stage renal disease”, “Renal failure” and “Renal insufficiency” and all their children. This is because different clinicians might use different terms to describe the same condition. We will need to decide whether to include trauma to the kidney resulting in CKD, and whether to exclude patients with CKD who have had a kidney transplant.
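The code-level part of this strategy can be written down as “the union of several subtrees, minus some excluded subtrees”. The hierarchy below is a small invented stand-in, not the real CKD part of SNOMED, and which subtrees to exclude is exactly the kind of decision described above rather than something the code can settle.

```python
# Toy parent -> children lookup; labels are illustrative, not a real CKD hierarchy.
CHILDREN = {
    "Chronic kidney disease": {"CKD stage 3", "CKD stage 4"},
    "Renal failure": {"End stage renal disease", "Acute renal failure"},
    "Acute renal failure": {"Acute renal failure due to dehydration"},
}


def subtree(root, children):
    """The root plus everything below it."""
    seen, stack = {root}, [root]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def build_codelist(include_roots, exclude_roots, children):
    """Union of the included subtrees minus the excluded subtrees."""
    included = set().union(*(subtree(r, children) for r in include_roots))
    excluded = set().union(*(subtree(r, children) for r in exclude_roots))
    return included - excluded


codelist = build_codelist(
    include_roots=["Chronic kidney disease", "Renal failure"],
    exclude_roots=["Acute renal failure"],  # a judgement call, not a given
    children=CHILDREN,
)
print(sorted(codelist))
```

Patient-level exclusions, such as a kidney transplant, would be a separate step applied to patients rather than to codes.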

Now we need to consider the other ways that CKD might be recorded, such as a referral to a CKD clinic or a prescription for a CKD medication. Are these relevant here to represent a patient with CKD? After all, a patient might have had an unnecessary referral to a CKD clinic, or a prescription for a CKD medication that was actually used for something else.

Finally, we might want to consider kidney function tests such as eGFR (estimated glomerular filtration rate, a measure of kidney function), and we will need to decide what level of eGFR we consider to be CKD. Does it need to be below a certain level repeatedly, or is one test enough?
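The two options in that question can be written as two small rules. The threshold of 60 mL/min/1.73 m² and the 90-day gap are assumptions modelled on the KDIGO definition of CKD (reduced eGFR for more than three months); a real study would choose and justify its own values. The repeated-test rule is also simplified: it does not check that readings in between stayed low.

```python
from datetime import date


def any_low(results, threshold=60.0):
    """Single-test rule: any eGFR result below the threshold."""
    return any(value < threshold for _day, value in results)


def persistently_low(results, threshold=60.0, min_gap_days=90):
    """Repeated-test rule: earliest and latest low results at least min_gap_days apart."""
    low_days = sorted(day for day, value in results if value < threshold)
    return len(low_days) >= 2 and (low_days[-1] - low_days[0]).days >= min_gap_days


patient_a = [(date(2022, 6, 1), 80.0), (date(2023, 1, 10), 55.0), (date(2023, 5, 2), 52.0)]
patient_b = [(date(2023, 1, 10), 55.0)]

print(any_low(patient_a), persistently_low(patient_a))  # True True
print(any_low(patient_b), persistently_low(patient_b))  # True False
```

The same patient can be in or out of the CKD group depending on which rule you pick, which is why this decision belongs in the study protocol.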

Figure 2: An example CKD codelist from OpenCodelists

A lot of time and effort goes into creating and validating these codelists. I have spent many hours reviewing SNOMED hierarchies and discussing codes with clinicians to ensure that the codelists I create are accurate and comprehensive. I have also often wondered whether anyone actually has in their record the code we are discussing at such great length.

How does SNOMED map to other ontologies?

SNOMED can be mapped to ICD10 quite easily. The SNOMED CT browser gives us options in the classification map, for example telling us that cough can be mapped to ICD10 code R05.X Cough. When we map from SNOMED to pretty much any other system, we usually lose information. There are hundreds of thousands of SNOMED codes, including some pretty esoteric rare conditions, which might just map to a very common condition in ICD10. For example, cough, nocturnal cough and postural cough all map to R05.X Cough.

Mapping the other way can also cause issues. As SNOMED is much more specific, there might not be a SNOMED code that corresponds exactly to a broad ICD10 term. This is rarely a problem but can happen as knowledge about diseases gets updated.
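The information loss is easy to see if the map is treated as a many-to-one dictionary and then inverted. The sketch below uses the three cough examples from the text, keyed by term for readability; the published maps carry extra fields (for example, map groups and priorities) that this ignores.

```python
from collections import defaultdict

# The three cough examples from the text, keyed by SNOMED term for readability.
SNOMED_TO_ICD10 = {
    "Cough": "R05.X",
    "Nocturnal cough": "R05.X",
    "Postural cough": "R05.X",
}


def invert(mapping):
    """ICD-10 code -> set of SNOMED terms that map onto it."""
    reverse = defaultdict(set)
    for snomed_term, icd10_code in mapping.items():
        reverse[icd10_code].add(snomed_term)
    return dict(reverse)


reverse = invert(SNOMED_TO_ICD10)
print(sorted(reverse["R05.X"]))  # ['Cough', 'Nocturnal cough', 'Postural cough']
```

Going forwards, three distinct SNOMED terms collapse into one ICD-10 code; going backwards, that one code gives a set of candidates with no way to tell which was originally recorded.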

Understanding SNOMED’s relationship to other systems is crucial, but equally important is knowing how to work with SNOMED itself. Let’s look at some tools that can help navigate this complex system.

Tools for using SNOMED

NHS Digital releases SNOMED intermittently via the attractively named TRUD website. It comes as tab-separated flat files, and it is not the easiest thing to work with. Working out which file does what is half the problem. The readme that comes with the releases assumes a lot of knowledge and does not cover simple things, such as which file contains the whole list of codes and terms and which contains the relationships. You have to sort of figure it out.
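For anyone writing their own loader, here is a minimal sketch of reading two of those files with Python’s standard `csv` module. It assumes the RF2 Snapshot format (one row per component, carrying its latest state) with the standard column names such as `active`, `conceptId`, `term`, `sourceId`, `destinationId` and `typeId`. Filtering on `active == "1"` is only safe for Snapshot files, because Full files also contain superseded versions of each row.

```python
import csv
from collections import defaultdict

IS_A = "116680003"  # concept ID of the |Is a| relationship type


def _rows(f):
    # RF2 files are tab-separated with a header row. Terms can contain double
    # quotes, so quote handling is switched off rather than left to csv.
    return csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)


def load_terms(f):
    """conceptId -> list of active terms, from a Snapshot description file."""
    terms = defaultdict(list)
    for row in _rows(f):
        if row["active"] == "1":
            terms[row["conceptId"]].append(row["term"])
    return dict(terms)


def load_children(f):
    """parent conceptId -> set of child conceptIds, from a Snapshot relationship file."""
    children = defaultdict(set)
    for row in _rows(f):
        if row["active"] == "1" and row["typeId"] == IS_A:
            children[row["destinationId"]].add(row["sourceId"])
    return dict(children)


# Usage (the path is a placeholder; real release filenames are longer):
# with open("sct2_Description_Snapshot.txt", encoding="utf-8", newline="") as f:
#     terms = load_terms(f)
```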

The SNOMED CT browser is a somewhat slow but reasonably clear website that you can use for individual searches. It does not scale particularly well, however, and to my knowledge there is no publicly accessible API.

There are some custom tools out there, including OpenCodelists, which I used to work on back in the day. There is a range of open-source GitHub projects, in various languages and states of abandonment, that might be useful. For my PhD project, I am writing my own scripts because I want to make sure that the data are loaded correctly.

Conclusion

As we’ve explored throughout this post, SNOMED CT is a powerful and intricate system that plays a crucial role in modern healthcare information management. From its comprehensive terminology to its complex hierarchical structure and relationships, SNOMED CT offers a standardised language for clinical data that supports both patient care and medical research.

However, like any complex system, SNOMED CT comes with its own set of challenges:

  1. Its vast scope and detail can be overwhelming for clinicians to navigate in day-to-day practice. Most clinicians don’t truly understand the full power of SNOMED and so don’t use it to its full potential.
  2. The nuances of its structure and relationships require careful consideration when used in research, particularly in creating accurate codelists.
  3. While it offers great specificity, this can sometimes lead to quirky or overly detailed codes that may seem unnecessary.
  4. Mapping SNOMED CT to other coding systems isn’t always straightforward and can result in loss of information.

Despite these challenges, the benefits of SNOMED CT in enabling precise, consistent clinical documentation and facilitating data analysis are undeniable. As healthcare continues to digitise and the demand for interoperable health data grows, understanding and effectively using SNOMED CT becomes increasingly important for clinicians, researchers, and health informaticians alike.

Whether you’re a healthcare professional grappling with electronic health records, a researcher diving into epidemiological studies, or simply someone curious about how medical information is organised, I hope this exploration of SNOMED CT has provided valuable insights into this complex but fascinating system.
