Big Data Benefits to Researchers


This fall, Roy Wilds, Chief Data Scientist at PHEMI, and Rebecca Laborde, Principal Field Scientist at Oracle, shared their first-hand experience with big data in a webinar. Their perspective, along with those of other industry leaders in the field, illuminates the many ways healthcare researchers can benefit from big data.

Big data gives researchers the opportunity to:

Collect all their diverse data together, regardless of size.

“There’s no such thing as small data anymore… Everything has grown in size and in complexity,” Rebecca notes. Healthcare data is especially diverse, ranging from highly structured lab reports to completely unstructured data such as X-ray images or free-text notes. Many of these data sources are huge: depending on the application, a full patient profile can contain petabytes of information. Storing data at this scale calls for big data technology.

Stay flexible.

The scope and the size of data aren’t the only things that are growing. Rebecca notes that when she worked in flow cytometry, the methods were rapidly changing as well. New sources, new methods, and new technologies all demand a flexible work environment that can grow and evolve with changing requirements.


Share their data.

One thing Roy noticed in the field is that researchers, clinicians, and analysts often use, or need, the very same datasets or data sources. A repository that provides access from a single place, while enforcing responsible sharing, saves the time and cost of moving data around and reduces the risk incurred when you duplicate data.

Protect their data.

On the subject of access controls and governance, the healthcare industry has very specific requirements. The need to share data to extract maximum value can conflict with the need to protect patient data and meet industry regulations, including HIPAA. New big data technologies include mechanisms for de-identifying data and enforcing authorized access, so that data can be shared safely and responsibly.
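
To make the idea concrete, here is a minimal Python sketch of de-identification combined with a simple role check. Everything in it is an assumption for illustration: the field names, the `clinician` role, and the `deidentify` and `view_for` helpers are hypothetical, and real de-identification under HIPAA involves far more than dropping a few fields.

```python
import hashlib
import hmac

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record without direct identifiers.

    The patient ID is replaced by a keyed hash (a pseudonym), so records
    for the same patient can still be linked without exposing the real ID.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(
        secret_key, str(record["patient_id"]).encode("utf-8"), hashlib.sha256
    )
    cleaned["patient_id"] = digest.hexdigest()
    return cleaned


def view_for(role: str, record: dict, secret_key: bytes) -> dict:
    """Assumed policy: clinicians see the full record, everyone else
    sees only the de-identified version."""
    if role == "clinician":
        return dict(record)
    return deidentify(record, secret_key)
```

In this sketch, a researcher calling `view_for("researcher", record, key)` gets the lab values and a stable pseudonym, while the name, address, and phone number never leave the repository.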

Trace their data.

Data lineage is extremely important in healthcare, especially for researchers who were not involved in generating the initial raw data, and when it comes to data provenance, metadata is where it’s at. Metadata can keep track of who has processed raw data and how the data has been changed. You also want metadata describing the lineage of your data when it is audited, or when it comes time to write grant proposals or other submissions. Choose a big data solution that captures the entire history of what’s been done to the data, is easily traceable, and is easy to reference.
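
As a rough illustration of lineage metadata, the Python sketch below records who did what to a dataset each time it is transformed. The `TrackedDataset` and `LineageEntry` names are hypothetical and do not come from any particular product; a real platform would also store this history durably rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEntry:
    """One step in a dataset's history: who did what, and when."""
    actor: str
    operation: str
    timestamp: str


@dataclass
class TrackedDataset:
    """A dataset that carries its own processing history (hypothetical type)."""
    name: str
    rows: list
    lineage: list = field(default_factory=list)

    def apply(self, actor, operation, transform):
        """Return a new dataset with the transform applied.

        The original dataset is left untouched, and the new one carries
        the full prior history plus an entry for this step.
        """
        entry = LineageEntry(
            actor, operation, datetime.now(timezone.utc).isoformat()
        )
        return TrackedDataset(
            self.name, transform(self.rows), self.lineage + [entry]
        )


# Example: an analyst removes missing lab values from raw data.
raw = TrackedDataset("lab_results", [5.0, None, 7.5])
clean = raw.apply(
    "analyst_a",
    "drop missing values",
    lambda rows: [r for r in rows if r is not None],
)
for step in clean.lineage:
    print(step.actor, step.operation, step.timestamp)
```

Because each step returns a new dataset instead of modifying the old one, the raw data stays available for an audit alongside the record of every change.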

Explore their data.

Discovery and insight occur at the intersection of research and data science. You want to be able to explore both your structured and your unstructured data, so that you learn the most from the information you have. To do this, you need big data and data science tools that can operate on complex data.

These are very exciting times for all of us who work with data: there’s a universe of possibilities at our fingertips. If you’re a healthcare researcher or data analyst, make sure you’re aware of advances in big data that can allow you to unlock new potential and value in your field.

For more first-hand insights from Rebecca and Roy, watch the webinar!
