Saturday, August 27, 2011

Why Do We Need Ontologies in Healthcare Applications

There is an ongoing thread in the HL7 mailing list about "what can OWL do?" in the wake of Grahame Grieve's recent post titled: "HL7 needs a fresh look because V3 has failed".

This post is my answer to "what can OWL do?".

Ontology vs. Information Model

Ontologies are our conceptualization (understanding) of the world while information models (of data structures) describe and constrain how the data is stored and transmitted in messages. Thomas Gruber popularized the notion of ontology in the nineties when he wrote in a paper titled "A Translation Approach to Portable Ontology Specifications":

A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly.

An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an ontology is a systematic account of Existence. For knowledge-based systems, what "exists" is exactly that which can be represented.

When people ask me to explain how ontologies are relevant to healthcare, I often use this quote from a report titled "Semantic Interoperability Deployment and Research Roadmap" by Alan Rector, an authority in the field of biomedical ontologies:

Ontologies are about the things being represented – patients, their diseases. They are about what is always true, whether or not it is known to the clinician. For example, all patients have a body temperature (possibly ambient if they are dead); however, the body temperature may not be known or recorded. It makes no sense to talk about a patient with a "missing" body temperature.

Data structures are about the artefacts in which information is recorded. Not every data structure about a patient need include a field for body temperature, and even if it does, that field may be missing for any given patient. It makes perfect sense to speak about a patient record with missing data for body temperature.

Hence, at the practical level, ontologies can help us verify the soundness of statements in messages based on our conceptualization of the world. Information models in healthcare often take the form of an XML schema, a Schematron schema, or a relational database schema. One distinguishing characteristic of ontologies is that they rest on an Open World Assumption (OWA), often summed up by the AAA slogan: Anyone can say Anything about Any topic. Statements that are not included in an ontology are considered unknown as opposed to false. In contrast, information models of data structures such as XML messages and relational databases are based on a Closed World Assumption (CWA), which holds that any statement that is not known to the message or database to be true is false (this is closely related to "negation as failure" or NAF).

The OWA principle recognizes that our understanding of the world is incomplete, evolving, and that new knowledge can be discovered and added at any time. To return to Alan Rector's example, one cannot assume that because there is no mention of a patient's body temperature in an electronic health record message, that the patient does not have a body temperature. Another distinguishing characteristic of ontologies is the Nonunique Naming Assumption as opposed to the Unique Name Assumption (UNA) in CWA-based systems. People do use different labels to represent the same concept. This discussion of OWA vs. CWA is not just academic. The reality is that data about a patient can exist in multiple systems, organizations, jurisdictions, and even countries using different vocabularies and XML data structures. Concepts such as longitudinal or lifelong health record and medication reconciliation will soon reveal the limits of healthcare systems based on a CWA.
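
To make the distinction concrete, here is a minimal Python sketch of the same query answered under each assumption. The fact set, the property names, and the two `ask_*` functions are hypothetical illustrations, not part of any standard or reasoner.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical facts extracted from one EHR message: (subject, property, value).
facts = {
    ("patient-1", "hasHeartRate", "72"),
}


def ask_closed_world(subject: str, prop: str) -> Truth:
    """CWA: anything not stated in the record is taken to be false."""
    stated = any(s == subject and p == prop for s, p, _ in facts)
    return Truth.TRUE if stated else Truth.FALSE


def ask_open_world(subject: str, prop: str) -> Truth:
    """OWA: anything not stated is simply unknown, never false by default."""
    stated = any(s == subject and p == prop for s, p, _ in facts)
    return Truth.TRUE if stated else Truth.UNKNOWN


cwa_answer = ask_closed_world("patient-1", "hasBodyTemperature")
owa_answer = ask_open_world("patient-1", "hasBodyTemperature")
print(cwa_answer.value, owa_answer.value)
```

The message never mentions a body temperature, so the closed-world query concludes the patient has none, while the open-world query only reports that the value is not known here.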

OWL2, a W3C Recommendation, is an expressive ontology language that provides reasoning and inferencing capabilities to software applications. Logical axioms specify restrictions through property domains and ranges. OWL2 also supports negation and disjunction. OWL2 reasoning capabilities can be enhanced with a rule language such as the Semantic Web Rule Language (SWRL). Given the complexity and scale of medical knowledge today, the use of ontology-based reasoning will become essential in applications such as medical terminologies, clinical knowledge management for automated decision support, and even automatically verifying the accuracy of messages exchanged between healthcare applications.
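
As a rough illustration of what domain and range axioms buy us, the Python sketch below applies the domain/range entailment rule by hand. Under open-world semantics these axioms act as inference rules rather than validation checks: a reasoner uses them to conclude class membership. The property, class, and individual names are hypothetical, and a real OWL2 reasoner does far more than this.

```python
# Hypothetical vocabulary: property -> (domain class, range class).
property_axioms = {
    "hasDiagnosis": ("Patient", "Disease"),
}

# Asserted triples; note that nobody stated the type of either individual.
asserted = [("john", "hasDiagnosis", "diabetes_case_1")]


def infer_types(triples, axioms):
    """Apply domain/range style entailment to derive class memberships."""
    inferred = set()
    for subject, prop, obj in triples:
        if prop in axioms:
            domain_cls, range_cls = axioms[prop]
            inferred.add((subject, "rdf:type", domain_cls))
            inferred.add((obj, "rdf:type", range_cls))
    return inferred


types = infer_types(asserted, property_axioms)
for triple in sorted(types):
    print(triple)
```

From a single asserted statement, the sketch derives that john is a Patient and that diabetes_case_1 is a Disease, even though neither type was ever stated.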

Unfortunately, ontologies are not widely used in software engineering today. They are not well understood by the majority of developers. Undergraduate computer science programs don't usually teach ontologies. There is an urgent need to educate a new generation of ontology-savvy healthcare application developers.

Model Consistency

For obvious reasons, healthcare applications require a high degree of model quality and consistency. This is not always possible or easy to do with traditional approaches such as object-oriented design (the HL7 RIM is based on the UML) and data structures such as XML and relational database schemas.

A clear and clean separation of concerns is needed between the semantic model (the ontology) and the information model (the model of how the data is structured in an XML message or the health application's data store). The ontology can be used to verify that the content of a message is accurate in regard to our conceptualization of the world, while the information model is used to validate the data structure in the data stores and XML messages exchanged with other applications.

The HL7 RIM is definitely not an ontology and has been plagued by consistency issues. Furthermore, a consequence of the RIM model refinement process that is used to derive XML message exchange schemas from the RIM is that data structure concerns have leaked into what was touted as the semantic model. This lack of separation of concerns has led to an unwieldy information model and very complex XML message structures (in the CDA and other V3 messages) that are difficult to learn and implement in software applications. The GreenCDA is a possible answer to the message structure simplification challenge (see my previous post on the Greening of the HL7 CDA). However, it is not enough to solve the semantic interoperability challenge.

Ontologies and Medical Terminologies

In a paper titled "Why Do It the Hard Way? The Case for an Expressive Description Logic for SNOMED", Alan Rector and Sebastian Brandt argued in favour of using the OWL ontology language for SNOMED, which is currently based on a Description Logic semantics known as EL++. The availability of computing power (particularly the elasticity and massive scalability of the cloud), reasoners, and tools has now made such a migration possible.

In a recent paper published in the Journal of the American Medical Informatics Association (JAMIA) and titled "Getting the foot out of the pelvis: modeling problems affecting use of SNOMED CT hierarchies in practical applications" (subscription required), Alan Rector, Sam Brandt, and Thomas Schneider used an OWL representation of SNOMED CT to unearth errors in SNOMED CT hierarchies for such common conditions as myocardial infarction, diabetes, and hypertension. This has significant practical implications for the use and interpretation of SNOMED codes in electronic health records (EHRs), post-coordination, and queries in software applications.

ICD-11 is being developed using OWL to allow consistency checking and linking to other biomedical terminologies and ontologies.

In addition to OWL, the Simple Knowledge Organization System (SKOS) specification can also be used to represent thesauri, classification schemes, taxonomies, controlled vocabularies, and other concept schemes.

Overlap between the HL7 RIM and SNOMED CT

HL7 V3 messages like the CDA typically carry codes from SNOMED CT and other terminologies such as CPT, ICD9, and LOINC. However, in certain cases such as family history, an observation can be expressed either through a single SNOMED CT code or by using the RIM. To ensure model consistency, HL7 has released an implementation guide on using SNOMED CT in HL7 Version 3 documents such as the HL7 CDA (I refer to this implementation guide as Terminfo). In addition, HITSP C80 specifies vocabularies and terminologies to be used in various sections of a C32 document.

However, these guidelines have been difficult to enforce in practice due to the lack of automated validation tools. In a paper recently published in the Journal of Biomedical Semantics titled "Semantic validation of the use of SNOMED CT in HL7 clinical documents", Stijn Heymans, Matthew McKennirey, and Joshua Phillips described an approach using OWL ontologies to automatically validate Terminfo guidelines. The approach consisted of using the OWL representation of SNOMED CT, lifting (with XSLT) CDA XML instances into OWL individuals based on a CDA OWL ontology, and expressing Terminfo guidelines as OWL integrity constraints. The latter were validated with the Pellet Integrity Constraint validator or Pellet-ICV.
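
The general shape of that pipeline (lift XML into statements, then check a constraint against a terminology hierarchy) can be sketched in a few lines of Python. This is not the authors' implementation: it uses neither XSLT, OWL, nor Pellet-ICV, and the XML layout, the codes, and the constraint are all hypothetical stand-ins chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified document; real CDA markup is far richer.
DOCUMENT = """
<document>
  <observation id="obs1" section="familyHistory" code="FH-DIABETES"/>
  <observation id="obs2" section="familyHistory" code="FRACTURE"/>
</document>
"""

# Hypothetical is-a hierarchy standing in for an OWL terminology.
IS_A = {
    "FH-DIABETES": "FamilyHistoryFinding",
    "FamilyHistoryFinding": "ClinicalFinding",
    "FRACTURE": "ClinicalFinding",
}


def lift(xml_text):
    """Turn each XML observation into (subject, property, value) triples."""
    triples = []
    for obs in ET.fromstring(xml_text).iter("observation"):
        triples.append((obs.get("id"), "inSection", obs.get("section")))
        triples.append((obs.get("id"), "hasCode", obs.get("code")))
    return triples


def is_subsumed_by(code, ancestor):
    """Walk up the is-a chain to test whether code is a kind of ancestor."""
    while code is not None:
        if code == ancestor:
            return True
        code = IS_A.get(code)
    return False


def violations(triples):
    """Constraint: codes in the familyHistory section must be FamilyHistoryFinding."""
    sections = {s: o for s, p, o in triples if p == "inSection"}
    return [
        s for s, p, o in triples
        if p == "hasCode"
        and sections.get(s) == "familyHistory"
        and not is_subsumed_by(o, "FamilyHistoryFinding")
    ]


bad = violations(lift(DOCUMENT))
print(bad)
```

Note that the constraint check is deliberately closed-world: it flags what the document actually contains, which is the role integrity constraints play alongside an open-world ontology.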

Clarifying the relationship ("interface") between Ontologies, Coding Systems, and Information Models

I mentioned the need for a clean separation of concerns between the ontology and the information model. So what is the relationship between ontologies, coding systems (like SNOMED CT), and information models? I have long been intrigued by that question. In a paper titled "Binding Ontologies & Coding systems to Electronic Health Records and Messages", Alan Rector, Rahil Qamar, and Tom Marley write:

We contend that codes are also data structures – or more precisely symbols to be used in data structures – and that the model of codes is also at the level of the information model.

Although coding systems are derived from ontologies, we also need a separation of concerns between the coding system and the ontology. Remember that ontologies are based on an "Open World Assumption" (which means essentially that Anyone can say Anything about Any topic or the AAA slogan). Coding systems in contrast contain an enumerated list of codes to choose from.

In the same paper, the authors propose a code binding interface based on OWL DL between the model of meaning (i.e., the ontology), the model of codes (i.e., the terminology) and the information model.

To summarize our findings so far:

  1. We first create an ontology to describe our conceptualization (or understanding) of the world.
  2. We derive an enumerated list of codes called a code system (itself a data structure) from the ontology.
  3. We use the codes in EHR application databases and messages, which are data structures.
  4. We can validate the binding between the ontology, the information model, and the code system (using the approach proposed by Alan Rector, Rahil Qamar, and Tom Marley).
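
The four steps above can be walked through in a small Python sketch. It is only a toy: the concept names, the generated codes, and the `binding_is_valid` check are hypothetical, and it does not reproduce the OWL DL binding interface proposed in the paper.

```python
# Step 1: a hypothetical ontology fragment as child -> parent (is-a) links.
ontology = {
    "Type1Diabetes": "DiabetesMellitus",
    "Type2Diabetes": "DiabetesMellitus",
    "DiabetesMellitus": "Disease",
    "Hypertension": "Disease",
}

# Step 2: derive an enumerated code system from the ontology's concepts.
concepts = sorted(set(ontology) | set(ontology.values()))
code_system = {f"C{index:03d}": concept for index, concept in enumerate(concepts, start=1)}

# Step 3: a record (a data structure) that stores codes, not concepts.
concept_to_code = {concept: code for code, concept in code_system.items()}
record = {"patient": "patient-1", "diagnosis_code": concept_to_code["Type2Diabetes"]}


def ancestors(concept):
    """Yield the concept and all of its is-a ancestors."""
    while concept is not None:
        yield concept
        concept = ontology.get(concept)


def binding_is_valid(rec, field, required_ancestor):
    """Step 4: the code in the field must be enumerated and denote a suitable concept."""
    concept = code_system.get(rec[field])
    return concept is not None and required_ancestor in ancestors(concept)


print(binding_is_valid(record, "diagnosis_code", "DiabetesMellitus"))
```

The record only ever holds a code; the ontology is consulted separately to decide whether that code is an acceptable value for the field, which is the separation of concerns argued for above.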

Ontology Alignment

An ontology represents a specific world view that reflects the perspective of its origin (application, domain, people, or organization). Alignment consists in mapping concepts across ontologies. For example, in translational medicine, there could be a need to map an ontology used in biomedical research to an ontology used for clinical purposes. Several techniques can be used to achieve Ontology Alignment between two ontologies including:

  • Mapping each ontology to a third shared ontology called a foundational ontology
  • Mapping the two ontologies directly.

OWL facilitates ontology alignment through constructs such as owl:sameAs, owl:equivalentClass, and owl:equivalentProperty. These OWL constructs can be enhanced with a rule-based mapping using SWRL or RIF constructs. XSLT and SPARQL can also be useful in Ontology Alignment.
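
Equivalence statements of this kind are symmetric and transitive, so a set of pairwise mappings partitions the terms into groups that denote the same concept. The Python sketch below computes those groups with a union-find structure. The prefixed term names are hypothetical, and the code only imitates the effect of equivalence axioms; it is not an OWL reasoner.

```python
# Hypothetical equivalence statements between ontologies,
# in the spirit of owl:equivalentClass.
equivalences = [
    ("research:MyocardialInfarction", "clinical:HeartAttack"),
    ("clinical:HeartAttack", "registry:MI"),
    ("research:Hypertension", "clinical:HighBloodPressure"),
]

parent = {}


def find(term):
    """Return the representative of the equivalence class containing term."""
    parent.setdefault(term, term)
    while parent[term] != term:
        parent[term] = parent[parent[term]]  # path halving
        term = parent[term]
    return term


def union(left, right):
    """Merge the equivalence classes of two terms."""
    parent[find(left)] = find(right)


for left, right in equivalences:
    union(left, right)


def same_concept(a, b):
    """Equivalence is symmetric and transitive, so compare representatives."""
    return find(a) == find(b)


print(same_concept("research:MyocardialInfarction", "registry:MI"))
print(same_concept("research:MyocardialInfarction", "clinical:HighBloodPressure"))
```

The first query succeeds even though no mapping links the two terms directly, because both are mapped to the same clinical term.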

Clinical Knowledge Management (CKM)

As a knowledge representation formalism, ontologies are well suited for modeling the medical knowledge contained in Clinical Practice Guidelines (CPGs) and Care Pathways (CPs). This enables automated reasoning and the execution of those guidelines based on patient data at the point of care.

Several ontology-based approaches to modelling CPGs and CPs have been proposed in the past including PROforma, HELEN, EON, GLIF, and SAGE. However, the lack of tooling has been a major impediment to a wide adoption of those standards. OWL has the advantage of being a widely implemented W3C Recommendation with available open source as well as commercial tools.

Ontologies and Enterprise Master Data Management (MDM)

As healthcare enterprises become larger and more integrated (through the ACO model for example), there will be a need to consistently define and manage core business entities such as "patient", "provider", "payor", "care delivery", and "claim" across systems and business processes (e.g. research, clinical, reporting, and financial). The goal of Master Data Management (MDM) is to address those challenges.

One area of particular interest to MDM is the naming, meaning, equivalency, and relationships between those core business entities. Ontology constructs such as owl:sameAs, owl:equivalentClass, and owl:equivalentProperty can help establish common semantics across the enterprise when the same business entity is called by different names in different systems and business processes.

Linked Open Data (LOD)

Ontologies can help in building silo-busting applications that need to link data items (datum) to other data items (as opposed to web page to web page) over the web in order to perform entity correlation (or entity resolution). A datum can be a row in a relational database and technologies exist to provide an RDF view over a relational database table (see the R2RML: RDB to RDF Mapping Language). The RDF view itself can be defined in terms of an OWL ontology or RDFS vocabulary. Hence, LOD can integrate data across health applications and organizations by providing a semantic layer on top of existing applications.
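
The idea of an RDF view over a relational table can be illustrated with Python's built-in sqlite3 module. R2RML itself is a declarative mapping language, so this sketch only imitates its core idea (a subject URI template plus a column-to-predicate map). The table, the columns, and the example.org URIs are hypothetical.

```python
import sqlite3

# Hypothetical base URI; any dereferenceable HTTP namespace would play this role.
BASE = "http://example.org/"

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER)")
connection.execute("INSERT INTO patient VALUES (1, 'Jane Doe', 1970)")

# Hypothetical mapping from column names to predicate URIs.
column_to_predicate = {
    "name": BASE + "vocab/name",
    "birth_year": BASE + "vocab/birthYear",
}


def rows_as_triples(conn):
    """Expose every row as triples whose subject is a URI built from the key."""
    conn.row_factory = sqlite3.Row
    triples = []
    for row in conn.execute("SELECT id, name, birth_year FROM patient"):
        subject = f"{BASE}patient/{row['id']}"
        for column, predicate in column_to_predicate.items():
            triples.append((subject, predicate, row[column]))
    return triples


triples = rows_as_triples(connection)
for triple in triples:
    print(triple)
```

Each row becomes a resource with its own URI, which is what allows data items in one system to be linked to data items held elsewhere.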

The Linked Data design pattern is based on an open world assumption, uses dereferenceable HTTP URIs for identifying and accessing data items, RDF for describing metadata about those items, and semantic links to describe the relationships between those items. Other standards used in LOD applications include RDFS (for describing RDF vocabularies) and SPARQL (for querying RDF graphs). A practical application of LOD in healthcare is the Clinical Quality Linked Data project on

Metadata and the PCAST Report

The Office of the National Coordinator for Health Information Technology (ONC) recently released an Advance Notice of Proposed Rulemaking (ANPRM) on Metadata Standards to Support Nationwide Electronic Health Information Exchange. The ANPRM was driven by the PCAST Report released in December 2010.

Specifically, the ANPRM called for public comments on patient identity, provenance, and privacy. There are existing ontologies related to identity, provenance, and privacy that can be at least partially reused (ontology reuse is a recommended best practice to avoid the difficulties of ontology alignment). An example is the Provenance Vocabulary Core Ontology. Modeling metadata in healthcare using ontologies will enable reasoning, data integration through Linked Open Data mechanisms, and federated SPARQL queries. Please note that metadata expressed in XML syntax can be lifted into RDF (using techniques like XSLT or XQuery) to provide the same benefits.