That's important because decisions on data extraction and storage models can have a significant impact on how institutions can exploit advanced IT tools such as AI and analytics, said Dr. Paul Chang of the University of Chicago, who spoke Thursday in a webinar held by the Society for Imaging Informatics in Medicine (SIIM). Part 1 of our two-part series on Dr. Chang's talk is available here.
Getting the data
Four methods are used for getting data from your electronic medical record (EMR) and other clinical systems:
- Native clinical system business intelligence offerings
- Data repository/data warehouse
- Edge appliance/state aggregator
- Service-oriented architecture (SOA)/application programming interfaces (API)/microservices
The easiest approach would be to talk to your existing vendor about adding on business intelligence and analytics software packages, Chang said. These applications are available, are usually affordable, and don't require significant disruption to existing IT systems. There's also usually no need for data semantic normalization as the systems themselves basically provide the clinical context, he noted.
However, this model does have some significant disadvantages, including dependence on the vendor's ability to be forward-thinking about supporting big data or AI, according to Chang. In addition, these applications can typically only provide retrospective scorecard reports and usually can't support real-time applications such as dashboards and complex event processing. They also have very limited interoperability and can be difficult to correlate with multiple data sources.
Data repository/data warehouse
Others have pursued a data repository/data warehouse model, which involves a centralized data repository derived from all disparate enterprise data sources, or systems. This approach requires incoming data to be "normalized," or converted to a standard terminology, and "sanitized," Chang said.
The data repository/data warehouse model can provide one-stop shopping, which is also great for research. Traditionally, this approach involved normalizing data into structured query language (SQL) relational databases in a process also known as "schema on write." This concept is increasingly being augmented, though, by not-only SQL (NoSQL) approaches that utilize weak, sparse, or no schema -- i.e., "schema on read," he said.
Other advantages of the data repository/data warehouse model include the ability to support correlation across multiple data sources -- a very important benefit for training or validating machine-learning systems, he said. It's also not dependent on vendor offerings. However, this approach was very popular about 10 years ago and is starting to show its age, Chang noted. It has some key disadvantages for advanced IT, especially AI applications. This model requires support and investment from IT and usually can't provide real-time data. What's more, it's also dependent on methods being available for extracting data from native systems and requires semantic data normalization, according to Chang.
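Once records from different systems share a key, correlating them in a warehouse is a single query. The sketch below is a minimal illustration using Python's standard-library sqlite3 module. The table names, the use of the accession number as the linking key, and all sample values are hypothetical. A real warehouse would run on a production database rather than an in-memory one.

```python
import sqlite3

# Hypothetical warehouse tables, each loaded from a different source system.
# Using the accession number as the shared key is an assumption; which
# identifiers link records across systems is itself a governance decision.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ris_exams (accession TEXT PRIMARY KEY, modality TEXT, completed_at TEXT);
    CREATE TABLE pacs_studies (accession TEXT PRIMARY KEY, image_count INTEGER);
    CREATE TABLE emr_reports (accession TEXT PRIMARY KEY, finalized_at TEXT);
    """
)
conn.executemany(
    "INSERT INTO ris_exams VALUES (?, ?, ?)",
    [("A100", "CT", "2019-03-01T08:00"), ("A101", "MR", "2019-03-01T09:30")],
)
conn.executemany("INSERT INTO pacs_studies VALUES (?, ?)", [("A100", 412), ("A101", 188)])
conn.executemany("INSERT INTO emr_reports VALUES (?, ?)", [("A100", "2019-03-01T08:45")])

# Correlate all three sources in one query. The LEFT JOIN keeps exams that
# have no finalized report yet; their finalized_at comes back as None.
rows = conn.execute(
    """
    SELECT r.accession, r.modality, p.image_count, e.finalized_at
    FROM ris_exams r
    JOIN pacs_studies p ON p.accession = r.accession
    LEFT JOIN emr_reports e ON e.accession = r.accession
    ORDER BY r.accession
    """
).fetchall()
for row in rows:
    print(row)
```

Exam A101 still appears even though no report has arrived. That kind of gap only becomes visible when the sources are correlated.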
Data extraction -- i.e., extract, transform, load (ETL) -- from native production systems such as the PACS, RIS, and EMR is critically important to ensure that the data contained in the data repository/data warehouse are being used and interpreted correctly, he noted. ETL can be performed in many ways, including the traditional SQL schema-on-write techniques. However, newer "data lake" and schema-on-read approaches are available that are probably better for AI and big data than traditional approaches, he said.
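A toy extract, transform, load pipeline makes the three steps concrete. This is a sketch under stated assumptions. The CSV export, its column names and date format, and the local-to-DICOM modality mapping are all hypothetical. Production ETL also has to handle incremental loads, errors, and auditing, none of which is shown.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical export from a production RIS. The column names and the
# MM/DD/YYYY HH:MM date format are assumptions for illustration.
RAW_EXPORT = """accession,modality,exam_time
A100,ct,03/01/2019 08:00
A101, MRI ,03/01/2019 09:30
"""

# Hypothetical mapping from local modality spellings to DICOM modality codes.
MODALITY_MAP = {"ct": "CT", "mri": "MR", "mr": "MR"}


def extract(text):
    """Extract: read raw rows from the source system's export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, map local codes, convert dates to ISO 8601."""
    out = []
    for row in rows:
        modality = MODALITY_MAP[row["modality"].strip().lower()]
        when = datetime.strptime(row["exam_time"].strip(), "%m/%d/%Y %H:%M")
        out.append((row["accession"].strip(), modality, when.isoformat()))
    return out


def load(conn, records):
    """Load: write the cleaned records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS exams "
        "(accession TEXT PRIMARY KEY, modality TEXT, exam_time TEXT)"
    )
    conn.executemany("INSERT INTO exams VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_EXPORT)))
print(conn.execute("SELECT * FROM exams ORDER BY accession").fetchall())
```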
Interestingly, EMR vendors over the last few years have actually encouraged the use of public microservices and application programming interfaces (APIs) to allow the extraction of information in a safe and semantically usable way, Chang said.
"That trend is increasing and has made ETL, or data extraction, a lot easier," he said.
Is FHIR the savior?
However, although the HL7 Fast Healthcare Interoperability Resources (FHIR) standard for such APIs is very useful and helpful, FHIR is not going to be the savior it's often hyped to be, Chang noted.
"Don't get me wrong -- it's great that we're doing this, and it's great that the EMR vendors are supporting this along with microservices and API," he said. "But this just addresses one small part of the problem, the data extraction -- the ETL. To me, HL7 FHIR is a welcome, modern capability that makes the ETL easier. But ... that's just one part of the issue."
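To show what FHIR does make easier, here is a minimal Python sketch that flattens Observation resources from a FHIR search-result Bundle. The JSON is a trimmed, hand-written example in the general shape of an R4 search response, not output from any particular vendor. The function name is hypothetical, and the code parses a string rather than calling a live server.

```python
import json

# Trimmed, hand-written example of a FHIR R4 search-result Bundle, such as a
# server might return for GET [base]/Observation?patient=... Real responses
# carry many more elements (meta, links, subject, and so on).
SEARCH_RESPONSE = """
{
  "resourceType": "Bundle",
  "type": "searchset",
  "entry": [
    {"resource": {
      "resourceType": "Observation",
      "status": "final",
      "code": {"coding": [{"system": "http://loinc.org", "code": "2160-0",
                           "display": "Creatinine [Mass/volume] in Serum or Plasma"}]},
      "effectiveDateTime": "2019-03-01T07:15:00Z",
      "valueQuantity": {"value": 1.1, "unit": "mg/dL"}
    }}
  ]
}
"""


def observations(bundle_json):
    """Flatten the Observation resources in a FHIR Bundle into simple tuples."""
    bundle = json.loads(bundle_json)
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Observation":
            continue  # a Bundle may contain other resource types
        coding = resource["code"]["coding"][0]  # assumes at least one coding
        quantity = resource.get("valueQuantity", {})
        rows.append((coding["system"], coding["code"], quantity.get("value"), quantity.get("unit")))
    return rows


print(observations(SEARCH_RESPONSE))
```

The function returns whatever codes the source system used. Deciding what those codes mean alongside data from other systems is a separate problem that FHIR does not solve.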
Data received from a production system must be augmented so that any other external system can safely ingest the data and utilize the information in a semantically correct way, according to Chang. This is performed via the critical process of semantic normalization of data, which takes a lot of work and internal discipline and governance, he said.
"That's one of the major reasons why many healthcare institutions are slow to be able to scalably consume advanced IT initiatives," Chang said. "Data semantic normalization is probably one of the biggest problems and challenges institutions have. It requires careful business modeling, robust data stewardship, and governance ... If you remember one thing, ... it's that governance and architecture matter."
These needs are driving the increasing use of NoSQL approaches, such as the Apache Hadoop software framework, along with flexible data formats such as XML and JSON, he said.
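Semantic normalization is largely a governance exercise, but its mechanical core can be sketched as a concept map maintained by data stewards. In the hypothetical Python example below, every system name, local code, and concept label is invented. Two different local codes resolve to one shared concept. An unknown code goes to a human review queue instead of being guessed.

```python
from dataclasses import dataclass

# Hypothetical concept map maintained by data stewards:
# (source system, local code) -> enterprise concept. All values are invented.
CONCEPT_MAP = {
    ("RIS", "CTHEADWO"): "CT_HEAD_WITHOUT_CONTRAST",
    ("EMR", "IMG1234"): "CT_HEAD_WITHOUT_CONTRAST",
    ("RIS", "MRBRAINW"): "MR_BRAIN_WITH_CONTRAST",
}


@dataclass
class NormalizedRecord:
    source: str
    local_code: str
    concept: str


def normalize(records):
    """Map local codes to shared concepts and queue unmapped codes for review."""
    normalized, review_queue = [], []
    for source, code in records:
        concept = CONCEPT_MAP.get((source, code))
        if concept is None:
            # Governance step: a human steward decides what this code means.
            review_queue.append((source, code))
        else:
            normalized.append(NormalizedRecord(source, code, concept))
    return normalized, review_queue


incoming = [("RIS", "CTHEADWO"), ("EMR", "IMG1234"), ("PACS", "HEADCT")]
done, pending = normalize(incoming)
print([r.concept for r in done])  # both local codes map to the same concept
print(pending)                    # the unmapped PACS code awaits review
```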
The third method for getting data from the EMR and other clinical systems involves the use of an edge appliance/state aggregator. This appliance connects with production systems, such as the PACS, RIS, and EMR, and can normalize information from those systems into JSON and XML objects. It then uses web services, such as XML-based Simple Object Access Protocol (SOAP) services and REST interfaces, to deliver that information to applications such as integrated and interactive real-time radiology dashboards, according to Chang.
"This approach is very, very prevalent in other industries, and you're beginning to see this in healthcare," he said.
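The core job of a state aggregator, keeping one current picture of each exam as events arrive from several systems, fits in a few lines of Python. This is an illustration only. The event names and fields are assumptions, and events are assumed to arrive in order. A real appliance would also persist its state and push updates to consumers over web services.

```python
import json

# Hypothetical normalized events an edge appliance might receive from
# production systems; event names and fields are assumptions.
EVENTS = [
    {"source": "RIS", "accession": "A100", "event": "scheduled"},
    {"source": "PACS", "accession": "A100", "event": "images_available"},
    {"source": "RIS", "accession": "A101", "event": "scheduled"},
    {"source": "EMR", "accession": "A100", "event": "report_final"},
]


class StateAggregator:
    """Keeps the latest known state of each exam across source systems."""

    def __init__(self):
        self.state = {}

    def ingest(self, event):
        exam = self.state.setdefault(event["accession"], {"history": []})
        exam["status"] = event["event"]          # assumes in-order delivery
        exam["history"].append(event["source"])  # which systems reported

    def dashboard_json(self):
        """Serialize current state, e.g. for a dashboard polling over REST."""
        summary = {acc: s["status"] for acc, s in sorted(self.state.items())}
        return json.dumps(summary)


agg = StateAggregator()
for e in EVENTS:
    agg.ingest(e)
print(agg.dashboard_json())
```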
Edge appliances also have the advantage of being able to support real-time use cases and advanced prospective workflow orchestration, including AI applications. However, they require a sophisticated IT architecture and support and investment from the IT department. They are also dependent on the edge appliance vendor for both ETL and the specific use case, he said.
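Prospective workflow orchestration means acting on an event before a person has to go looking for it. As one hypothetical example, the Python sketch below lets an AI triage result change where an exam lands on a reading worklist. The priority levels, accession numbers, and the on_ai_result hook are assumptions made for illustration. The queue is a standard heapq priority queue.

```python
import heapq
import itertools

# Hypothetical priority levels; a lower number is read sooner.
PRIORITY = {"stat": 0, "ai_flagged": 1, "routine": 2}


class Worklist:
    """A reading worklist ordered by priority, then by arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker preserves arrival order

    def add(self, accession, level):
        heapq.heappush(self._heap, (PRIORITY[level], next(self._arrival), accession))

    def next_exam(self):
        return heapq.heappop(self._heap)[2]


def on_ai_result(worklist, accession, suspected_critical):
    """Hypothetical orchestration hook: an AI result decides where the exam lands."""
    worklist.add(accession, "ai_flagged" if suspected_critical else "routine")


wl = Worklist()
wl.add("A200", "routine")
on_ai_result(wl, "A201", suspected_critical=True)
wl.add("A202", "stat")
print([wl.next_exam() for _ in range(3)])
```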
The fourth method, which utilizes service-oriented architecture (SOA) and modern variations such as APIs or microservices, is probably the most successful interoperability approach used by other business verticals throughout the world, according to Chang. In this model, governance and architecture are built within a hospital's IT architecture to provide ETL, data normalization via XML/JSON objects, and web services without the intermediate step of an appliance.
"This is similar to the edge appliance/state aggregator approach, but it gives you more flexibility because you're in control of what goes on in that box," he said.
In Chang's view, SOA represents an evolution from a traditional DICOM/HL7-based architecture to the equivalent of a biological spinal cord.
"We can now create arbitrary, appropriately idiosyncratic business logic by taking information, persisting it, and combining it with other methods such as REST and doing whatever we want," Chang said. "This is how the rest of the world works. Unfortunately, in healthcare, we have not adopted this approach."
SOA is a component-based architecture that builds composite applications derived from data that are extracted and semantically normalized from production systems to provide loosely coupled services that are universally exposable, self-describing, and consumable, according to Chang. These services are orchestrated to create optimized user experiences.
Chang noted that SOA is not web services; it's an architecture with disciplined governance, security, semantics, and quality of service. Web services, on the other hand, are implementation technologies.
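Loose coupling is an architectural property that does not depend on the transport. In the hypothetical Python sketch below, a composite application depends only on two service contracts expressed as typing.Protocol classes, so any implementation with the right methods can be swapped in. The in-memory services stand in for what would normally sit behind REST or SOAP endpoints. All names and values are illustrative.

```python
from typing import Optional, Protocol


class ExamService(Protocol):
    """Contract: anything that can report an exam's status."""

    def status(self, accession: str) -> str: ...


class LabService(Protocol):
    """Contract: anything that can return a patient's latest lab value."""

    def latest_creatinine(self, patient_id: str) -> Optional[float]: ...


class InMemoryExams:
    """Stand-in for a remote exam-status service (hypothetical)."""

    def __init__(self, data: dict):
        self.data = data

    def status(self, accession: str) -> str:
        return self.data[accession]


class InMemoryLabs:
    """Stand-in for a remote lab-results service (hypothetical)."""

    def __init__(self, data: dict):
        self.data = data

    def latest_creatinine(self, patient_id: str) -> Optional[float]:
        return self.data.get(patient_id)


class ExamSummaryApp:
    """Composite application that orchestrates two independent services."""

    def __init__(self, exams: ExamService, labs: LabService):
        self.exams = exams
        self.labs = labs

    def summary(self, accession: str, patient_id: str) -> dict:
        creatinine = self.labs.latest_creatinine(patient_id)
        return {
            "exam_status": self.exams.status(accession),
            "creatinine_mg_dl": creatinine,
            "lab_missing": creatinine is None,
        }


app = ExamSummaryApp(InMemoryExams({"A300": "scheduled"}), InMemoryLabs({"P1": 1.1}))
print(app.summary("A300", "P1"))
```

Protocol uses structural typing, so neither service has to inherit from, or even know about, the contracts. This mirrors the idea of services that can be replaced independently.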
Advantages of the SOA/API/microservices model include support for real-time dashboards and complex event processing, as well as advanced prospective workflow orchestration. It also provides vendor independence from the use case. Disadvantages are significant, however, and include dependence on ETL availability -- although FHIR should help in that regard, he said. In addition, this approach requires significant IT support and resources, including a true architectural perspective and governance. It would also frequently require local software development resources and, most importantly, a cultural change and buy-in.
"That's very hard to do in hospital IT systems," Chang said.
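Complex event processing, listed above as a strength of this model, correlates events from different sources against a rule. The Python sketch below is a simplified, batch-style illustration. The 60-minute threshold, event names, and sample stream are hypothetical. A real CEP engine would evaluate such rules continuously as events stream in.

```python
from datetime import datetime, timedelta

# Hypothetical rule: flag any exam completed more than 60 minutes ago
# that still has no final report. Threshold and event names are assumptions.
THRESHOLD = timedelta(minutes=60)


def overdue_exams(events, now):
    """Correlate 'completed' and 'report_final' events to find overdue exams."""
    completed, reported = {}, set()
    for ev in events:
        if ev["event"] == "completed":
            completed[ev["accession"]] = ev["time"]
        elif ev["event"] == "report_final":
            reported.add(ev["accession"])
    return sorted(
        acc for acc, t in completed.items()
        if acc not in reported and now - t > THRESHOLD
    )


t0 = datetime(2019, 3, 1, 8, 0)
stream = [
    {"accession": "A100", "event": "completed", "time": t0},
    {"accession": "A101", "event": "completed", "time": t0 + timedelta(minutes=30)},
    {"accession": "A100", "event": "report_final", "time": t0 + timedelta(minutes=45)},
    {"accession": "A102", "event": "completed", "time": t0 + timedelta(minutes=50)},
]
print(overdue_exams(stream, now=t0 + timedelta(minutes=100)))
```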
Schema on read
Regardless of what method is used to extract and normalize the data from production systems, the resulting semantically normalized data objects need to be stored somehow. This is a critical decision, as the choice of adopting a schema-on-write or schema-on-read approach will significantly impact how agile and scalable an institution can be in supporting modern AI and big data applications, Chang said.
With a schema-on-write method on traditional SQL relational databases, a preconceived information data model -- based on a priori knowledge of how the data will be used -- is "hard-wired" onto the data as they are stored.
"In other words, we impose the data schema on the data as we get it and write it into our database," he said.
Traditional SQL databases used in healthcare IT can be great for structured data, are very fast and efficient, and are great for well-established, reasonably static business processes such as billing and transaction processing, according to Chang. They're not so good, however, for unstructured data, such as radiology reports, images, and clinical notes.
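A small sqlite3 sketch shows what hard-wiring the schema at write time means in practice. The table, fields, and sample report are hypothetical. The loader writes only the columns the schema already defines, so an unanticipated field is lost. A record that violates a constraint is rejected at write time.

```python
import sqlite3

# Schema on write: the table structure is fixed before any data arrive.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (accession TEXT PRIMARY KEY, modality TEXT NOT NULL, impression TEXT)"
)

incoming = {
    "accession": "A100",
    "modality": "CT",
    "impression": "No acute intracranial abnormality.",
    "ai_nodule_score": 0.82,  # a field nobody anticipated when the schema was designed
}

# This loader writes only columns that exist in the schema, so the extra
# field is discarded unless someone changes the schema first.
columns = [row[1] for row in conn.execute("PRAGMA table_info(reports)")]
placeholders = ", ".join("?" for _ in columns)
conn.execute(
    f"INSERT INTO reports ({', '.join(columns)}) VALUES ({placeholders})",
    [incoming.get(c) for c in columns],
)
print(conn.execute("SELECT * FROM reports").fetchall())

# A record that does not fit the schema is rejected at write time.
try:
    conn.execute("INSERT INTO reports (accession) VALUES (?)", ("A101",))
except sqlite3.IntegrityError as err:
    print("rejected at write time:", err)
```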
"The whole purpose of big data and AI and machine learning and business analytics is to find correlations and insights that you didn't anticipate, so you cannot hobble the data you want to use to speed and train these systems by locking them into a preconceived schema," he said. "That's the problem; it's not nimble."
Other industries have dealt with that type of challenge by adopting NoSQL approaches that don't force data into a complex schema. This more agile approach -- schema on read -- applies sparse or no schema at the time of storage. Users can then apply the schema that they want to when they access the data.
"I'm going to take all of the data and I'm going to apply the schema when I read it -- in other words, when I train my deep-learning system, when I do my analytics, my logistic regression for data analysis," he said. "This is a fundamentally different paradigm -- and why other business verticals are embracing schema on read rather than schema on write and why other business verticals are able in a more facile way [to] leverage and consume advanced IT."
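For contrast, a minimal schema-on-read sketch stores each record as raw JSON and lets each consumer impose its own view when reading. The store here is just a Python list of JSON strings, and the field names are hypothetical. Tools in the Hadoop ecosystem apply the same idea at far larger scale.

```python
import json

# Schema on read: store every record as-is (here, JSON strings) with no
# upfront schema. Field names are hypothetical.
RAW_STORE = [
    json.dumps({"accession": "A100", "modality": "CT", "impression": "No acute abnormality."}),
    json.dumps({"accession": "A101", "modality": "CT", "ai_nodule_score": 0.82}),
    json.dumps({"accession": "A102", "modality": "MR", "note": "free-text clinical note"}),
]


def read_with_schema(store, fields, where=None):
    """Apply a schema (chosen fields plus an optional filter) only at read time."""
    for line in store:
        record = json.loads(line)
        if where is None or where(record):
            yield {f: record.get(f) for f in fields}


# Two consumers impose two different schemas on the same stored data.
report_view = list(read_with_schema(RAW_STORE, ["accession", "impression"]))
training_view = list(read_with_schema(
    RAW_STORE,
    ["accession", "ai_nodule_score"],
    where=lambda r: "ai_nodule_score" in r,
))
print(report_view)
print(training_view)
```

Nothing was lost at storage time. The field that the schema-on-write table had no column for is still available to any reader that asks for it.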
Chang concluded his talk by emphasizing the need for a strategic perspective and a governance model to support new advanced IT initiatives such as AI and business intelligence.
"[When] I travel around the world and I look at hospital IT, I see the same databases, I see the same hardware, I see hard-working smart people, I see developers," he said. "What I see lacking compared to other business verticals is a true strategic architectural perspective. I see too much schema on write rather than schema on read. I see a lack of true formal governance that includes buy-in by users."
Copyright © 2019 AuntMinnie.com