Modern data science can unlock new innovation in healthcare, bioinformatics, genetic research, and other related fields. New personalized medicine programs, for instance, can identify previously unrecognized disease risk factors by applying analytics to vast amounts of genomic and clinical data. Hospitals can comb through electronic medical record (EMR) and operational data to pinpoint sources of infection. Public health agencies can use longitudinal population data to inform policy more accurately.
These are just a few examples. But all depend on one basic premise: many more researchers and analysts having access to much more data. And many healthcare organizations are still a long way from reaching that goal.
In most organizations, there is no single place where data resides. Rather, data is diffused across file servers and databases in various locations and multiple formats (Word documents, PDFs, spreadsheets, relational databases, EMRs) that can’t be easily consolidated.
Investigators are intimately familiar with the barriers this presents to effective research, and they are trying to deal with it in various ways:
- Open-source, web-based services that let researchers store some data centrally, but limit the formats and volume of data that can be collected.
- Enterprise data warehouse suites that can centralize information, but take months of planning and massive capital investment to get up and running.
- “Do-it-yourself” Hadoop software projects that leave administrators with the equivalent of a 1,000-piece LEGO kit with no instructions.
Even if researchers can overcome the technical barriers to creating a big data warehouse, they are still held back by data privacy and compliance concerns. No one wants to be responsible for protected health information (PHI) finding its way to a publicly accessible server and exposing the institution to a HIPAA violation.
If healthcare researchers are going to uncover the wealth of new insights hiding in their data, they need better ways to consolidate silos of information while ensuring that data privacy, security, and governance remain intact. Fortunately, modern cloud-based data warehouses can accomplish both. By harnessing the combined power of big data and the cloud, researchers and analysts can gain insights faster and increase the value of their data without compromising their institutions or patients.
Growing Appeal of the Cloud
Not long ago, storing sensitive healthcare information in the cloud was a nonstarter: compliance officers were simply not comfortable moving data beyond their control. Yet by 2014, 83 percent of healthcare IT organizations were using cloud services, according to HIMSS Analytics. Why the shift? There are two reasons.
First, the advantages of the cloud have become too compelling to ignore. Forward-looking researchers and academic centers see how much faster their research could be moving if data could be shared more easily. Cloud-based data warehouses that can be accessed by approved investigators anywhere, anytime, offer the easiest way to do it.
From a business perspective, healthcare organizations see the same cloud benefits as every other industry: faster deployments, consumption-based pricing, and pay-as-you-grow scalability that makes better economic sense than building out internal capacity themselves. As Forrester notes, “On-premises solutions require investments akin to home ownership; when something breaks, it’s up to you to fix it. Cloud and SaaS are more akin to renting, and you’re only paying for the space you use; repairs, ongoing maintenance, unexpected expenses are the responsibility of the landlord.”
The other big change has been in the liability issues surrounding cloud data hosting. Previously, healthcare organizations bore full responsibility for anything that happened to their data in the cloud. Today, organizations can enter into business associate agreements (BAAs) with healthcare-focused cloud service providers. These providers are certified by organizations such as HITRUST and share both liability and responsibility for data protection and compliance in their clouds.
Capitalizing on Big Data in the Cloud
So what do organizations gain when they use cloud-based big data warehouses? First, they can consolidate all their data more easily and automate data collection.
Modern big data solutions can ingest data from many different sources, in many different formats, quickly and easily. That includes complex data types: unstructured and semi-structured data, huge genomics files, imaging studies, and EMR data. Healthcare-focused data warehouses are designed with pre-built libraries that accommodate all of these, extracting the information, indexing it, and transforming it so it is readily usable.
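As a rough illustration of the extract, transform, and index step, the Python sketch below maps records from two hypothetical sources (a flat CSV export and a nested JSON feed) onto one common schema and then indexes them by diagnosis. The field names, the schema, and the functions `from_csv`, `from_json`, and `build_index` are all invented for this example; real platforms handle far more formats (imaging, genomics files, free text) than this sketch does.

```python
import csv
import io
import json


def from_csv(text):
    """Map a CSV export (hypothetical columns id, sex, age, dx) to the common schema."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"patient_id": row["id"], "sex": row["sex"],
         "age": int(row["age"]), "diagnosis": row["dx"]}
        for row in reader
    ]


def from_json(text):
    """Map a JSON array of nested objects (hypothetical layout) to the common schema."""
    return [
        {"patient_id": item["patient"]["id"], "sex": item["patient"]["sex"],
         "age": int(item["patient"]["age"]), "diagnosis": item["condition"]}
        for item in json.loads(text)
    ]


def build_index(records, field):
    """Group normalized records by the value of one field."""
    index = {}
    for rec in records:
        index.setdefault(rec[field], []).append(rec)
    return index


if __name__ == "__main__":
    csv_src = "id,sex,age,dx\np1,F,34,melanoma\np2,M,51,asthma\n"
    json_src = '[{"patient": {"id": "p3", "sex": "F", "age": 29}, "condition": "melanoma"}]'
    records = from_csv(csv_src) + from_json(json_src)
    by_dx = build_index(records, "diagnosis")
    print(sorted(r["patient_id"] for r in by_dx["melanoma"]))
```

The design choice worth noting is that each source gets its own small mapping function, while everything downstream (indexing, cataloging, querying) sees only the common schema.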
Modern cloud-based data warehouses also accelerate searches. They automatically catalog and apply metadata to information as it’s collected to describe exactly what it contains, at a granular level. This means that researchers can search catalogs of metadata, rather than raw data files, and find what they’re looking for much faster.
They can see exactly what data is there, and short-circuit the all-too-common process of waiting weeks or months for approval to access a data store, only to find that the information it contains isn’t what they need. Instead, they query the catalog (for example, “show me women with melanoma aged 19 to 45”) and see how many records exist. They can determine immediately whether it is worth the time to request formal approval to access the data, whether there is enough for the study, or whether they need to change their criteria.
Additionally, all PHI within that data remains protected. Researchers are not seeing the actual data, just a catalog generated from metadata. They can identify the data set they need in minutes, in a self-service manner, without compromising data security or compliance.
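The count-only catalog query described above can be sketched in Python as follows. The sketch assumes the catalog keeps just three coarse, non-identifying attributes per record and that a query returns only a number, never the matching entries. `CatalogEntry`, `build_catalog`, and `count_matches` are hypothetical names, and a production metadata catalog would describe far more than three fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Non-identifying descriptors for one record: no names, MRNs, or dates."""
    sex: str
    age: int
    diagnosis: str


def build_catalog(records):
    """Reduce full records to catalog entries, discarding every other field."""
    return [CatalogEntry(r["sex"], r["age"], r["diagnosis"]) for r in records]


def count_matches(catalog, sex, diagnosis, min_age, max_age):
    """Return how many entries match the criteria, not the entries themselves."""
    return sum(
        1
        for e in catalog
        if e.sex == sex and e.diagnosis == diagnosis and min_age <= e.age <= max_age
    )


if __name__ == "__main__":
    records = [
        {"name": "A. Smith", "mrn": "001", "sex": "F", "age": 34, "diagnosis": "melanoma"},
        {"name": "B. Jones", "mrn": "002", "sex": "F", "age": 52, "diagnosis": "melanoma"},
        {"name": "C. Lee", "mrn": "003", "sex": "M", "age": 40, "diagnosis": "melanoma"},
        {"name": "D. Kim", "mrn": "004", "sex": "F", "age": 19, "diagnosis": "melanoma"},
    ]
    catalog = build_catalog(records)
    # "Women with melanoma aged 19 to 45"
    print(count_matches(catalog, "F", "melanoma", 19, 45))
```

Because names and MRNs never enter the catalog, a researcher browsing it can size a cohort without any PHI being exposed.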
Modern cloud-based big data warehouses are also designed for privacy and governance. This is crucial because much healthcare research today is translational, bridging the traditional separation between researchers and clinicians, and that overlap makes data privacy controls essential.
Cloud-based big data solutions can employ several mechanisms to make privacy and governance simpler. First, they can automate de-identification of PHI in line with the HIPAA Safe Harbor method and the HITRUST framework. This is a major change from how de-identification is typically done today, with a person at a workstation processing records by hand.
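To make "automated de-identification" concrete, here is a minimal Python sketch that applies a small subset of the Safe Harbor rules: it drops direct identifiers, keeps only the year of any date, truncates ZIP codes to three digits (or "000" for sparsely populated prefixes), and reports ages over 89 as "90+". The field names, `DIRECT_IDENTIFIERS`, and `deidentify` are hypothetical, and the restricted ZIP prefixes are passed in as a parameter because that list has to come from current Census data. This is not a complete Safe Harbor implementation; for instance, it does not aggregate birth years for patients over 89 or scan free text.

```python
from datetime import date

# Hypothetical names of direct-identifier fields in this sketch's records.
DIRECT_IDENTIFIERS = frozenset({"name", "mrn", "ssn", "phone", "email", "street_address"})


def deidentify(record, restricted_zip3=frozenset()):
    """Return a copy of `record` with a subset of Safe Harbor rules applied.

    restricted_zip3: 3-digit ZIP prefixes whose area has 20,000 or fewer
    people; these are replaced with "000". Empty by default here and must
    be supplied from current Census data.
    """
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if isinstance(value, date):
            out[key] = value.year  # keep only the year (datetime is a date subclass)
        elif key == "zip":
            prefix = str(value)[:3]
            out[key] = "000" if prefix in restricted_zip3 else prefix
        elif key == "age" and isinstance(value, int):
            out[key] = "90+" if value > 89 else value
        else:
            out[key] = value
    return out


if __name__ == "__main__":
    rec = {"name": "A. Smith", "mrn": "001", "zip": "02139",
           "admit_date": date(2023, 5, 17), "age": 93, "diagnosis": "melanoma"}
    print(deidentify(rec))
```

The function returns a new dictionary rather than editing the source record, so the identified original can stay in its protected store while the de-identified copy is released for research.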
Second, modern solutions employ sophisticated policy frameworks that let organizations tightly control who can see what, in which context, even within a single data asset. For example, a clinician at an academic center may be able to see a patient’s full record. A researcher or analyst with the center may be able to access the same record, but will see only de-identified information with no PHI. Modern systems can do this automatically, generating specific data sets appropriate for each requestor in accordance with organizational policy. And these capabilities can be fully audited for compliance.
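A policy-driven view of a single record, with every request audited, might look roughly like the Python sketch below. The roles, the `POLICY` table, `PHI_FIELDS`, and `view_record` are hypothetical. For brevity the researcher's view here simply omits PHI fields; a real system would generate the de-identified view with a full de-identification step and write the audit trail to tamper-evident storage rather than an in-memory list.

```python
from datetime import datetime, timezone

# Hypothetical set of fields treated as PHI in this sketch.
PHI_FIELDS = frozenset({"name", "mrn", "birth_date", "zip"})

# Hypothetical policy table: role -> may this role see PHI?
POLICY = {"clinician": True, "researcher": False, "analyst": False}

# In-memory audit trail, for illustration only.
AUDIT_LOG = []


def view_record(record, user, role):
    """Return the view of `record` permitted for `role`, auditing every request."""
    known = role in POLICY
    if known and POLICY[role]:
        view = dict(record)  # full record, PHI included
    elif known:
        view = {k: v for k, v in record.items() if k not in PHI_FIELDS}
    else:
        view = None  # roles without a policy get nothing
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "granted": known,
        "fields": sorted(view) if view is not None else [],
    })
    if view is None:
        raise PermissionError(f"no policy defined for role {role!r}")
    return view
```

Logging denied requests as well as granted ones is a deliberate choice: compliance reviews usually care as much about who tried to access data as about who succeeded.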
This article was originally published on www.datanami.com, where it can be viewed in full.