By Dr. Paul Terry
It may seem counter-intuitive, but in a data-driven world, the biggest barrier to success is not grappling with technology. It’s accommodating people.
Here’s what I mean: big data technology is now mature enough to collect huge amounts of information. Structured and unstructured data, pictures, code fragments, virtual machine images: you can load anything and everything into a modern Hadoop cluster and draw insights from it that spur innovation and better experiences in virtually every industry.
But oh, those human beings. On the one hand, you have the people clamoring to use all that data you’re collecting—scientists, marketing teams, analytics teams, statisticians, business partners. To realize the full value of your data, you have to be able to share it.
But on the other hand, that data is associated with real people who have real questions: Why should they trust you with their information? How do they know you’ll use it responsibly? How can they be sure that you’re meeting the privacy and governance policies you’ve committed to uphold?
When you’re collecting all of this data, you have an ethical, moral, and sometimes legal responsibility to store and use it appropriately. If you can’t justify people’s trust in your ability to do that, you shouldn’t collect it in the first place. But if you’ve tried to take Hadoop from trials into production, you know that this is a thorny problem. The answer isn’t to bolt on another security product or to offload the problem onto application developers. It’s to re-imagine the way you treat user identity and access in the context of your data.
It sounds intimidating, but there’s a highly effective model we can use, ported from the world of network security: the concept of “Zero Trust.” By extending Zero Trust principles to the data layer, you can address the needs of everyone who has a stake in your data and begin to unlock its full value.
Introducing Zero Trust Data
To understand what Zero Trust Data entails, start by looking at today’s Zero Trust Networks. What are they doing exactly?
Zero Trust Networks never trust, always verify. There’s no “trusted” zone in which, once you’re admitted to a network segment, you have free rein over everything in it. Instead, every connection is checked for appropriate authorization, every time. Security is embedded in the network itself, not bolted on after the fact. And access is granted on a need-to-know basis: access to any resource, from any host, must be explicitly authorized.
Zero Trust Data takes these principles a step further, applying them to the data itself. In a Zero Trust Data model:
- Access is denied by default. Every piece of data you collect is automatically encoded with your privacy and governance policies—who can access it, where, when, and how—as it’s collected.
- Data requests without proper credentials yield no information. By default, every request for data is tested against the specific attributes of both the requesting user and the data. If the policy’s conditions aren’t met, there’s no access.
- Data privacy is enforced independent of the network. The system doesn’t have to check with an external resource or rely on the requesting application to enforce data privacy. Privacy and governance functions live in the data layer itself.
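To make the three principles above concrete, here is a minimal sketch of a toy data layer, assuming a simplified policy that only checks the requester’s roles and stated purpose (the “where” and “when” conditions are omitted for brevity). The names `User`, `Policy`, `Record`, and `ZeroTrustStore` are hypothetical and do not come from any product. The policy is attached to each record at ingest, an unlabeled record falls back to a deny-all default, a request without credentials returns nothing, and the check runs inside the store on every query rather than in the calling application.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class User:
    """Hypothetical requester: identity plus the attributes the policy checks."""
    user_id: str
    roles: FrozenSet[str]
    purposes: FrozenSet[str]


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy stamped onto each record when it is collected."""
    allowed_roles: FrozenSet[str]
    allowed_purposes: FrozenSet[str]


@dataclass
class Record:
    payload: Dict[str, Any]
    policy: Policy


class ZeroTrustStore:
    """Toy data layer that enforces policy itself instead of trusting callers."""

    def __init__(self, default_policy: Policy) -> None:
        self._records: List[Record] = []
        self._default_policy = default_policy

    def ingest(self, payload: Dict[str, Any], policy: Optional[Policy] = None) -> None:
        # The policy travels with the data from the moment it is collected;
        # records ingested without one get the store's default policy.
        self._records.append(Record(dict(payload), policy or self._default_policy))

    def query(self, user: Optional[User], purpose: str) -> List[Dict[str, Any]]:
        # No credentials: return no information at all.
        if user is None:
            return []
        results = []
        for rec in self._records:
            p = rec.policy
            # Evaluated on every request, against attributes of both the user and the data.
            if (user.roles & p.allowed_roles
                    and purpose in user.purposes
                    and purpose in p.allowed_purposes):
                results.append(dict(rec.payload))
        return results


# Deny by default: empty sets mean no role or purpose is ever allowed.
DENY_ALL = Policy(frozenset(), frozenset())
store = ZeroTrustStore(default_policy=DENY_ALL)
store.ingest({"patient": "A", "dx": "asthma"},
             Policy(frozenset({"physician"}), frozenset({"treatment"})))
store.ingest({"patient": "B", "dx": "flu"})  # no explicit policy, so nobody can read it

doc = User("u1", frozenset({"physician"}), frozenset({"treatment"}))
print(store.query(doc, "treatment"))   # [{'patient': 'A', 'dx': 'asthma'}]
print(store.query(None, "treatment"))  # []
```

Putting the check inside `query` is the point of the design: an application built on top of this store cannot forget to enforce the policy, because it never sees records it isn’t entitled to.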
Unshackling Big Data
Applying these Zero Trust principles to big data yields big advantages. When you use this approach, you’re no longer relying solely on network security solutions or application developers to keep data private. Instead, privacy and governance are baked into the data system and controlled by the people ultimately accountable for them—your data stewards.
In turn, your applications can become “thinner” and less expensive to develop, since they no longer have to account for changing organizational dynamics and complex privacy policies. Privacy, governance, and consent are all enforced automatically at the data layer.
An effective Zero Trust Data system can also repurpose data on the fly, presenting the same data to different users through different lenses depending on their level of access. For example, an authorized physician might see a patient’s full medical record, while an affiliated researcher can view only de-identified and/or aggregated data. So even as you collect petabytes of information, you can deliver on-demand, self-service access at scale, while automatically hiding protected information on a user-by-user or case-by-case basis.
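The “different lenses” idea can be sketched as one set of stored records with several role-specific views computed at read time. This is a minimal illustration under stated assumptions: the role names (`physician`, `researcher`, and a hypothetical `statistician` who sees only aggregates), the field names, and the `DIRECT_IDENTIFIERS` set are all made up for the example, and simply dropping fields and coarsening a birth date is far weaker than what formal de-identification standards require.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical fields treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "mrn", "address"}


def de_identify(record: Dict[str, Any]) -> Dict[str, Any]:
    """Drop direct identifiers and coarsen birth date to year (illustrative only)."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in out:
        # Assumes ISO 'YYYY-MM-DD' strings; keep only the year.
        out["birth_year"] = out.pop("birth_date")[:4]
    return out


def view_for(role: str, records: List[Dict[str, Any]]) -> Any:
    """Return a role-specific lens over the same underlying records."""
    if role == "physician":
        return [dict(r) for r in records]          # full records
    if role == "researcher":
        return [de_identify(r) for r in records]   # de-identified rows
    if role == "statistician":
        # Aggregates only: counts per diagnosis, no row-level data.
        return dict(Counter(r["diagnosis"] for r in records))
    return []                                      # unknown role: deny by default


records = [
    {"name": "Ana Diaz", "mrn": "123", "birth_date": "1980-04-02", "diagnosis": "asthma"},
    {"name": "Bo Chen", "mrn": "456", "birth_date": "1975-11-30", "diagnosis": "asthma"},
]
print(view_for("statistician", records))  # {'asthma': 2}
```

Because the views are computed from a single copy of the data, there are no separately maintained “research extracts” to drift out of sync with the source or with the policy.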
When you’re choosing a big data solution, look for one that has a flexible, comprehensive metadata strategy, a way to encode user attributes into the data store, and the ability to evaluate metadata together with user attributes for policy-based enforcement. You can build this yourself, but many vendors are also working on the problem. Look for ones who understand the opportunities offered by metadata frameworks and attribute-based access control. Then you can unleash the full power and potential of your data while addressing that most human of concerns: protecting individuals’ trust.
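Evaluating data metadata together with user attributes is the core of attribute-based access control. The sketch below is a deliberately small, hypothetical rule engine: the attribute names (`sensitivity`, `region`, `consented_purposes`, `role`, `purpose`, `employee`) and the two rules are assumptions chosen for illustration. Standardized policy languages such as XACML are considerably richer than this.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attrs = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    """A named permit rule over (user attributes, data metadata). Hypothetical."""
    name: str
    applies: Callable[[Attrs, Attrs], bool]


def is_permitted(user: Attrs, meta: Attrs, rules: List[Rule]) -> bool:
    # This sketch has only permit rules: access is granted if any rule
    # matches and denied otherwise, so unlabeled data is never readable.
    return any(rule.applies(user, meta) for rule in rules)


RULES = [
    Rule("public data is open to staff",
         lambda u, m: m.get("sensitivity") == "public" and u.get("employee") is True),
    Rule("PHI needs a physician in the same region, for a consented purpose",
         lambda u, m: m.get("sensitivity") == "phi"
         and u.get("role") == "physician"
         and u.get("region") == m.get("region")
         and u.get("purpose") in m.get("consented_purposes", ())),
]

dr = {"role": "physician", "region": "west", "purpose": "treatment", "employee": True}
phi_meta = {"sensitivity": "phi", "region": "west", "consented_purposes": ("treatment",)}

print(is_permitted(dr, phi_meta, RULES))                            # True
print(is_permitted({**dr, "purpose": "marketing"}, phi_meta, RULES))  # False
```

Keeping the rules as data, separate from application code, is what lets data stewards change a policy (for example, when a patient withdraws consent for a purpose) without redeploying every application that reads the data.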
About the Author:
As President and CEO of PHEMI Systems, Paul provides the vision and technical leadership at PHEMI. He also currently serves on the Board of Directors of Providence Health Care, advising its subcommittees on innovation, quality, EMR, and next-generation data strategies in healthcare. Paul also serves on the Boards of Directors of Life Sciences BC and Molecular You, and is an advisor to the BC provincial government on next-generation data strategies.