PHEMI Health DataLab
A cloud-based system for
privacy, security & governance

Single Pane of Glass

PHEMI provides one central system to manage and process data stored in many locations (cloud or hybrid cloud), while controlling access and governing that data all in one place.

One access control policy
One place to manage all data
Centrally manage data stored in the cloud or hybrid cloud (multiple locations)
Abstract away platform and service integration complexities

Privacy-by-Design

Conventional data protection products simply lock down your data. PHEMI goes further.

Unlike most data management systems, PHEMI Health DataLab is built with Privacy-by-Design principles, not as an add-on. This means privacy and data governance are built-in from the ground up, providing you with distinct advantages:

Lets analysts work with data without breaching privacy guidelines
Includes a comprehensive, extensible library of de-identification algorithms to hide, mask, truncate, group, and anonymize data.
Creates dataset-specific or system-wide pseudonyms enabling linking and sharing of data without risking data leakage.
Collects audit logs concerning not only what changes were made to the PHEMI system, but also data access patterns.
Automatically generates human- and machine-readable de-identification reports to meet your enterprise governance, risk, and compliance guidelines.
Rather than a policy per data access point, PHEMI gives you the advantage of one central policy for all access patterns, whether Spark, ODBC, REST, export, or others.
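
To make the de-identification and pseudonym ideas above concrete, here is a minimal Python sketch. It is an illustration only and not PHEMI's implementation: the function names, the keys, and the sample record are all hypothetical. It assumes a keyed hash (HMAC-SHA256) is an acceptable way to produce pseudonyms, so a dataset-specific key yields tokens that link records within one dataset while a system-wide key links across datasets.

```python
import hashlib
import hmac


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide all but the last `visible` characters."""
    if visible <= 0:
        return fill * len(value)
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


def truncate(value: str, keep: int) -> str:
    """Keep only the first `keep` characters (e.g. a postal-code prefix)."""
    return value[:keep]


def group_age(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def pseudonym(identifier: str, key: bytes, length: int = 16) -> str:
    """Keyed, deterministic pseudonym: same identifier and key give the same token."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Hypothetical keys: one shared system-wide, one scoped to a single dataset.
SYSTEM_KEY = b"example-system-wide-key"
DATASET_KEY = b"example-key-for-dataset-A"

# Made-up sample record, for illustration only.
record = {"patient_id": "MRN-0012345", "postal_code": "V6B 1A1", "age": 47}

deidentified = {
    "patient_pseudonym": pseudonym(record["patient_id"], DATASET_KEY),
    "patient_id_masked": mask(record["patient_id"]),
    "postal_prefix": truncate(record["postal_code"], 3),
    "age_band": group_age(record["age"]),
}
```

Because the pseudonym is deterministic for a given key, two tables de-identified with the same key can still be joined on the token, while a table de-identified with a different key cannot.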

Metadata Governs Access Control

Data access is driven by PHEMI’s Attribute-Based Access Control policy engine. Metadata attributes of the data, in combination with attributes of the user and their environment, are processed by the policy to dynamically determine access to data, providing more contextual, scalable, and simpler access controls than traditional role-based access.

Metadata curation begins on ingest to immediately control data access, with data tagged with 42 out-of-the-box attributes such as public, confidential, secret, and top secret. Metadata, and in particular security labels, follow the data as it is transformed and combined with other datasets. Once data enters the pipeline and new datasets are created, the data is curated with more granular metadata at the column, row, and even cell level to iteratively provide finer-grained access.
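
One way to picture security labels following the data is the sketch below. It is not PHEMI's actual rule set: the `Dataset` and `derive` names are hypothetical, and the rule that a derived dataset inherits the most restrictive label of its inputs is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """The four example labels named in the text, ordered by restrictiveness."""
    PUBLIC = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


@dataclass(frozen=True)
class Dataset:
    name: str
    label: Sensitivity
    derived_from: tuple = ()


def derive(name: str, *sources: Dataset) -> Dataset:
    """Assumed rule: a derived dataset inherits the most restrictive input label."""
    if not sources:
        raise ValueError("a derived dataset needs at least one source")
    label = max(source.label for source in sources)
    return Dataset(name, label, tuple(source.name for source in sources))


# Hypothetical datasets for illustration.
census = Dataset("census_extract", Sensitivity.PUBLIC)
labs = Dataset("lab_results", Sensitivity.SECRET)
joined = derive("labs_by_region", census, labs)
```

Here `joined` carries the SECRET label even though one of its inputs was public, so combining data never silently lowers its protection.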

PHEMI’s Architectural Components

The PHEMI Health DataLab is a vital security asset that is available in the cloud, combining best-of-breed open source and custom software modules, all fully integrated using Privacy-by-Design principles.

Spark

Accumulo

Spark SQL

Azure Table Storage

Apache Airflow

Apache NiFi

Secure Landing Zone
Discovery Zone
Consumption Zone

Secure Landing Zone

INGEST

Ingest from Anywhere

PHEMI is deployment agnostic: data can be ingested from the cloud or on-premises and stored in the cloud or hybrid cloud.

Data is ingested using hundreds of pre-built NiFi connectors, processors and format converters.

Any type of data can be ingested from disparate sources. Types of medical data include ECG, x-rays, genomic, HL7, DICOM and VCF. Transformed data sets can be exported to chosen destinations.

In NiFi, data is annotated with PHEMI metadata, including business metadata, system metadata, and metrics metadata.

STORE

Original data is stored unchanged

The original file is stored in its original format in a secure, immutable database. It remains the single source of truth.

Auditability

Immutable original data, in combination with our data provenance and lineage, gives you full visibility and auditability into what data is contained in a dataset and how it ended up there. In other words, you can see which datasets were derived, how they were transformed, and whether they were joined with other datasets. At any point in time, you can refer back to the original source of truth for tasks such as data retention, compliance, and data anomaly detection. All access to data is fully audited and governed by PHEMI’s Attribute-Based Access Controls that are applied at the lowest data level.
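
The idea of tracing any dataset back to its immutable originals can be sketched as follows. This is a simplified illustration, not PHEMI's design: the `ProvenanceLog` class, its methods, and the sample dataset names are hypothetical, and it assumes originals are fingerprinted with SHA-256 and that lineage has no cycles.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LineageEntry:
    output: str
    operation: str
    inputs: tuple


@dataclass
class ProvenanceLog:
    originals: dict = field(default_factory=dict)  # name -> SHA-256 of raw bytes
    entries: list = field(default_factory=list)

    def register_original(self, name: str, raw: bytes) -> str:
        """Store a fingerprint of the original; refuse to overwrite it."""
        if name in self.originals:
            raise ValueError(f"{name!r} is already stored and cannot be overwritten")
        digest = hashlib.sha256(raw).hexdigest()
        self.originals[name] = digest
        return digest

    def verify(self, name: str, raw: bytes) -> bool:
        """Check that bytes still match the fingerprint taken at ingest."""
        return self.originals.get(name) == hashlib.sha256(raw).hexdigest()

    def record(self, output: str, operation: str, *inputs: str) -> None:
        self.entries.append(LineageEntry(output, operation, tuple(inputs)))

    def trace(self, dataset: str) -> set:
        """Return the original sources a dataset ultimately derives from."""
        if dataset in self.originals:
            return {dataset}
        sources = set()
        for entry in self.entries:
            if entry.output == dataset:
                for parent in entry.inputs:
                    sources |= self.trace(parent)
        return sources


# Hypothetical pipeline for illustration.
log = ProvenanceLog()
log.register_original("admissions.hl7", b"raw admissions feed")
log.register_original("labs.csv", b"raw lab results")
log.record("labs_clean", "drop_invalid_rows", "labs.csv")
log.record("cohort", "join", "admissions.hl7", "labs_clean")
cohort_sources = log.trace("cohort")
```

In this sketch, `cohort_sources` contains both original files, and `verify` can confirm that an original has not changed since ingest.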

Discovery Zone

DISCOVER

Schema Creation

PHEMI creates a lightweight, flexible, scalable schema upfront. De-identification is done once to create multiple forms of anonymized data for different levels of access. No more errors or processing of data at read time. Data is delivered to the user in the form they are authorized to see as quickly as the data can reach them.

TRANSFORM

Indexing

With PHEMI, customized indexes can be created using a single or composite index type. This greatly improves query performance, allowing you to link, query, build cohorts and analyze data at speed and scale.
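
As a rough illustration of why a composite index speeds up cohort queries, the sketch below builds an in-memory index over one or more columns. It says nothing about how PHEMI implements indexing; `build_index` and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_index(rows, columns):
    """Map each tuple of column values to the positions of matching rows."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        key = tuple(row[column] for column in columns)
        index[key].append(position)
    return index


# Made-up rows for illustration.
rows = [
    {"site": "A", "year": 2021, "diagnosis": "asthma"},
    {"site": "B", "year": 2021, "diagnosis": "asthma"},
    {"site": "A", "year": 2021, "diagnosis": "diabetes"},
    {"site": "A", "year": 2022, "diagnosis": "asthma"},
]

single = build_index(rows, ["site"])             # single-column index
composite = build_index(rows, ["site", "year"])  # composite index

# Cohort lookup without scanning every row.
cohort = [rows[i] for i in composite.get(("A", 2021), [])]
```

The composite lookup goes straight to the matching row positions, so query cost depends on the size of the cohort rather than the size of the table.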

Any Storage Format

PHEMI can store data in any format. It supports formats over and above tabular and columnar formats, including genomic, time-series, graph and more. Index any column and build data models for fast lookup within your policy-based privacy, security and governance requirements, so people can only see the sensitive data you allow them to see.

Versioning

PHEMI brings versioning to big data, giving you a time machine for your data. View your data as of any point in time to facilitate reproducible results from your AI/ML workloads, while also ensuring full visibility and auditability of your data as it is continuously updated and transformed.
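
The "time machine" idea can be sketched as an append-only store that keeps every version of a value and answers queries as of a given time. This is a minimal illustration under stated assumptions, not PHEMI's storage layer: `VersionedStore`, its methods, and the integer timestamps are hypothetical.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Append-only key/value store that keeps every version of a value."""

    def __init__(self):
        self._timestamps = defaultdict(list)  # key -> sorted write times
        self._values = defaultdict(list)      # key -> values, same order

    def put(self, key, value, timestamp):
        times = self._timestamps[key]
        if times and timestamp <= times[-1]:
            raise ValueError("timestamps must increase for each key")
        times.append(timestamp)
        self._values[key].append(value)

    def get(self, key, as_of=None):
        """Return the latest value, or the value visible at time `as_of`."""
        times = self._timestamps.get(key, [])
        if not times:
            return None
        if as_of is None:
            return self._values[key][-1]
        position = bisect.bisect_right(times, as_of)
        return self._values[key][position - 1] if position else None


# Hypothetical usage: a measurement updated over time.
store = VersionedStore()
store.put("patient-1/weight_kg", 80, timestamp=1)
store.put("patient-1/weight_kg", 78, timestamp=5)

latest = store.get("patient-1/weight_kg")
as_of_3 = store.get("patient-1/weight_kg", as_of=3)
```

A model trained against `as_of=3` can be reproduced later by reading with the same timestamp, even after newer values have been written.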

Consumption Zone

ANALYZE

Attribute-Based Access Controls (ABAC)

PHEMI classifies all data for sensitivity, including files, tables, columns, individual rows and cells. Using a military-grade Attribute-Based Access Control (ABAC) system, access controls are kept data- and metadata-centric. Metadata attributes of the data, combined with attributes of the user and their environment, are processed by the policy engine to dynamically determine access to data. This provides more contextual, scalable, and simpler access controls than traditional role-based access.
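
A cell-level ABAC decision of the kind described above might look like the sketch below. It is illustrative only: the `User`, `Environment`, `can_read`, and `filter_row` names are hypothetical, and the two policy rules (clearance must cover the label, and anything above public requires a trusted network) are assumptions, not PHEMI's policy.

```python
from dataclasses import dataclass

LEVELS = {"public": 0, "confidential": 1, "secret": 2, "top secret": 3}


@dataclass(frozen=True)
class User:
    clearance: str  # user attribute


@dataclass(frozen=True)
class Environment:
    on_trusted_network: bool  # environment attribute


def can_read(label: str, user: User, env: Environment) -> bool:
    """Combine data, user, and environment attributes into one decision."""
    if LEVELS[label] > LEVELS[user.clearance]:
        return False
    # Assumed environmental rule: non-public data needs a trusted network.
    if LEVELS[label] > LEVELS["public"] and not env.on_trusted_network:
        return False
    return True


def filter_row(row, labels, user, env, redacted="[REDACTED]"):
    """Apply the policy to every cell, redacting what the user may not see."""
    return {
        column: (value if can_read(labels[column], user, env) else redacted)
        for column, value in row.items()
    }


# Made-up row and per-column labels for illustration.
row = {"region": "BC", "diagnosis": "asthma", "name": "Jane Doe"}
labels = {"region": "public", "diagnosis": "confidential", "name": "secret"}
analyst = User(clearance="confidential")

in_office = filter_row(row, labels, analyst, Environment(on_trusted_network=True))
remote = filter_row(row, labels, analyst, Environment(on_trusted_network=False))
```

The same analyst sees the diagnosis in the office but not remotely, which is the kind of contextual decision a fixed role alone cannot express.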

De-identification

Personally Identifiable Information (PII) is rendered opaque through masking, rounding, tokenization, or encryption, while still preserving the value inherent in the data. PHEMI enables data scientists to dynamically create de-identified datasets, allowing analysts to derive insights without compromising the privacy of the individual.

Automation

With PHEMI, automated ingest pipelines pull from all data silos, and metadata curation and access controls are added to this process. Data is automatically ingested and pre-curated to provide context. Data is locked down with access controls on assets, so that only the people allowed under the access policy can reach it. Third-party data prep tools can integrate via Spark, Hive, or the HDFS storage driver.

PUBLISH

Analytics

PHEMI can run advanced analytics on any type of data. It provides a self-serve environment and enables access to data through a variety of standard protocols and your tools of choice: ODBC-compliant tools such as Tableau or Power BI; Spark-based data science tools such as Apache Zeppelin, Cloudera Data Science Workbench, Spark MLlib, or H2O Sparkling Water; the PHEMI REST API for a custom application; or data exported to downstream targets.

More Information

For more information about what PHEMI can do for you, check out the data sheets and additional information on the website.

Data Sheet: PHEMI Data Privacy Manager

White Paper: Zero Trust Data

Solution Brief: Customer Journey

Infographic: Data Protection & Sharing Data Responsibly
