I am a hospital librarian in a community hospital. I have been reading with interest about new developments in big data research and precision medicine. Though I don’t know of any such projects happening here at my hospital, I want to be ready when library skills are needed. Given the potential described and the results I read about, these projects seem almost miraculous. Can it really be true?
Looking to be a Data Diva
Thanks so much for writing. You have certainly found a topic that is at the forefront of healthcare today. The messages from NLM are full of information about data-driven projects, especially as NLM Director Patti Brennan has been appointed the Interim NIH Associate Director for Data Science. I think you are wise to look for ways to educate yourself in this area and to prepare for the future.
There are many things about data happening around us in the library world.
These opportunities should provide you with a good start to learning more about data. I’m sure that as you meet others with the same interest, you will network with them for other sources of information.
The other part of your comments regarding the miraculous nature of what can be achieved with data has encouraged me to stop and think about your statement. Truly, the potentials in the use of big data for healing and discovery are tremendous, and can be summarized in four major points.
However, I fear there is equal opportunity for confusion and loss. As you can imagine, the amount of data produced in healthcare is huge. It includes data from the electronic health record, imaging, patient-generated data, sensor data, etc. Privacy will always be a concern in the use and manipulation of data. Additionally, many of the current EHR programs are fragmented and lack interoperability. And of course, as with any other system involving humans, there are always safety concerns. These range from errors as “simple” as mislabeling samples or selecting an incorrect medication to a multitude of other, more complex errors.1
Another issue is the sharing and availability of the data. The International Committee of Medical Journal Editors (ICMJE) is very concerned with this issue, and in January 2016 published a proposal to help create an environment in which the sharing of de-identified individual participant data becomes the norm. This sharing is happening in some settings; however, there are considerable challenges, and mechanisms to mandate this sharing are not currently available. Plans continue to be developed, and clinical trials that begin enrolling participants on or after January 1, 2019 must include a data sharing plan in the trial’s registration.2
The final issue to be discussed here is curation and storage of the data. A 2013 article in The Atlantic discusses a study which found that as much as 80% of the raw scientific data from the 1990s is gone forever because nobody knows where to find it.3 Sadly, the data is lost because the authors have changed their contact information and can no longer be reached, or because the data was stored using outdated technology. This loss undermines the ability to validate conclusions by reproducing a study. It also makes it impossible to conduct broad, long-term studies. Academic institutions are working hard to create institutional repositories, and journals are concerned with the adequate collection of data, but the problem still exists.
I hope I have not discouraged you, Diva. The move to working with large data sets is a wonderful development. My goal is to point out the need for careful and thoughtful attention to data creation, collection, curation, and storage. This requires the involvement of several disciplines across the spectrum, all working toward solid and safe systems. I hope that you are able to move into some of this data work, as you seem so interested in the potential for discovery.
Best wishes for your data discovery journey!