The Inaugural PNR Journal Club: Data Curation, Data Management, and Librarians

Posted by Carolyn Martin on April 25th, 2016 Posted in: Technology, Training & Education

This is a guest post by Andrea Harrow, MLS, Good Samaritan Hospital, Los Angeles, CA, about the PNR Journal Club, held over a period of eight weeks from February 8 through March 21, 2016.

I responded to an invitation sent out on a listserv for involvement in a new journal club, an opportunity for librarians to discuss issues in data curation and data management. Twelve librarians from across the US also signed up to give it a try. This journal club, hosted by the National Network of Libraries of Medicine, Pacific Northwest Region, met online within Moodle and live via four AdobeConnect sessions. The club followed the MLA’s Discussion Group Program structure and the new PubMed Commons Journal Clubs commenting format.

Before we met live via AdobeConnect, club members introduced themselves on the Moodle site and shared what they hoped to get out of the discussions. We were all interested in the game-changing promise of using big data in health sciences, but, as librarians, were not quite sure how we fit into the big data picture.

The article selection process took place online, using a discussion board in Moodle. We searched out relevant articles, posted these to the forum, then voted on which ones we wanted to discuss. A collection of a few “inspiration” articles steered us toward certain themes. Club members self-nominated to lead the meeting discussion and take notes. Using a discussion board setting was a great way to get thematic thought processes flowing. Some of the major themes we discussed on the forum were:

Is evidence-based, guideline-based medicine the enemy of personalized, precision, genomic medicine? – do they have conflicting priorities, or can they complement each other (1,2)?
Size and significance of data, and the biases that arise when data is inadequately translated or translated out of context (3).
What is big data? How do big data systems function? Background practical info and overview of the features of clinical big data, algorithms, statistical methods, and software toolkits for data manipulation and analysis, sharing big data, challenges and limitations (4,5,6,7).

The “Using Data to Improve Clinical Patient Outcomes” forum, held on March 7, happened to fall on the same day as one of our meetings. Some members attended the forum online or in person. We had selected an article by one of the panelists, Dr. Christopher Longhurst, to discuss in the meeting following the forum.

A summary of our discussions follows.

Week 1:

Cohen B, et al. Challenges Associated With Using Large Data Sets for Quality Assessment and Research in Clinical Settings. Policy Polit Nurs Pract. 2015;16(3-4):117-24. PMID:26351216

Facilitator: Laura Zeigen. Recorder: Erin Foster.

The article addresses the challenges of interfacing biomedical big data to answer clinical and epidemiological research questions, and how these challenges were overcome in the construction of a “data-mart” of medical, financial (i.e., billing), and demographic patient information in an academically affiliated health-care network. It presents seven challenges identified by the National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative, along with recommendations for overcoming them based on the project experience. One of the BD2K-identified obstacles of biomedical big data is the organization, management, and processing of data. The varied ways in which clinical data is collected make it difficult to validate the data for consistency and to discover it for re-use.

Another obstacle identified in the paper was the challenge of training researchers to use data effectively. The article primarily discusses this in the context of hiring a data manager and the difficulty of identifying a person who has the necessary combination of skills for the job. For members, this raised questions about what kinds of skills librarians should consider developing in order to be valued, useful members of these teams.

Discussion revolved around several themes: challenges librarians face in identifying/occupying data science roles, the potential of promoting existing librarian skills and cultivating new skill sets, and the importance of networking and professional development. Journal club members cited librarian training in organizing and tagging/indexing data as well as enhancing data by connecting it to the published literature. Several members called attention to the lack of published librarian contributions to data science, especially since this article was authored by a multidisciplinary team.

Week 2:

Marshall DA, et al. Transforming Healthcare Delivery: Integrating Dynamic Simulation Modelling and Big Data in Health Economics and Outcomes Research. Pharmacoeconomics. 2016 Feb; 34(2):115-26. PMID: 26497003.

Facilitator: Ann Gleason. Recorder: Suzanne Fricke.

This article presents the idea of Dynamic Simulation Modeling (DSM), a computerized mathematical modeling approach used for several years in other disciplines, to develop meaningful insights from big data in healthcare environments. Applying DSM to health care is complex because of de-identification requirements and the need to harmonize health care data sources (EMR, insurance, wearables, social media and large inter-organizational datasets). We are optimistic that there will be an ongoing need for librarian skills in organizing data, taxonomy development, naming conventions, metadata creation, and retrospective adaptations to changing terminology.

We noted the increasing need for skills in data science, machine learning and programming. Data cleaning and de-identification remain highly labor intensive in the health care setting. Although statistical software (SAS, R, MATLAB, etc.) may be provided to users by libraries, we had difficulty envisioning what these mathematical models might look like and what novel software they might require to run. We discussed the need for tools such as data animation models to show students the steps and skill sets required for healthcare big data.

Big data increases the need for consumer resources centered on informed consent, particularly in relation to universal consent. The Henrietta Lacks case and PatientsLikeMe were mentioned as differing consumer perspectives on the unanticipated long-term use of patient data on the one hand, and the symbiotic relationship between patients and researchers looking for free access to medical advice, treatment and data on the other.

Week 3:

Panahiazar M, et al. Empowering Personalized Medicine with Big Data and Semantic Web Technology: Promises, Challenges, and Use Cases. Proc IEEE Int Conf Big Data. 2014 Oct; 2014:790-795. PMID: 25705726.

Facilitators: Erin Foster and Carol Perryman. Recorder: Lynly Beard.

The themes of this article were raised in past club discussions: 1) examining clinical co-morbidities and genomics to arrive at personalized patient treatment and 2) handling the volume of data with new technologies, to make it “smart data” and use it for analysis purposes. Creating data sets allows for uniform analysis and contextualization, and moves data along toward the goal of personalized medicine.

Three interconnected use cases revolved around “heart failure”. With these, the authors introduced new technologies used to transform big data into smart data. Case #1 used Hadoop, a tool that can handle large volumes of data by breaking them down into smaller batches processed at several nodes, as well as the Pig programming tool. Case #2 used the UMLS MetaMap tool, which, when used in conjunction with Hadoop and Pig, reduced processing time for 10 million search queries from 40 days to 2 days. Case #3 then used Kino to add metadata.
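For readers who, like many of us, are new to this style of processing, the idea of breaking data into batches that are handled separately and then combined can be sketched in a few lines of Python. This is only an illustration of the general split, process, and combine pattern; it does not use Hadoop, Pig, or MetaMap, and it is not the authors' code. The function names and the sample queries are made up for the example, and local worker threads stand in for the separate nodes of a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(records, batch_size):
    """Break a large list of records into smaller batches."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def map_batch(batch):
    """Process one batch on its own: count the words in each query."""
    counts = Counter()
    for query in batch:
        counts.update(query.lower().split())
    return counts


def reduce_counts(partial_counts):
    """Combine the per-batch results into one overall tally."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def count_terms(records, batch_size=2, workers=2):
    """Split, process batches in parallel, then combine."""
    batches = split_into_batches(records, batch_size)
    # Worker threads are a stand-in (assumption) for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_batch, batches))
    return reduce_counts(partial_counts)


# Hypothetical search queries, invented for illustration only.
queries = [
    "heart failure",
    "heart failure treatment",
    "diabetes",
    "heart disease",
    "failure",
]

totals = count_terms(queries)
print(totals.most_common(2))
```

Because each batch is processed independently, adding more workers (or, in a real cluster, more nodes) lets more batches run at the same time, which is the kind of gain behind the reported drop from 40 days to 2 days.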

These tools were new to many in the group, and there was a desire for further information. Because MetaMap is an NLM tool, participants were curious about training opportunities. We again discussed using a critical evaluation framework for all articles. This preprint article would have benefitted from expanding the use cases and putting them in context.

Week 4:

Longhurst CA, et al. A ‘green button’ for using aggregate patient data at the point of care. Health Aff (Millwood). 2014;33(7):1229-35. PMID: 25006150.

Facilitator: Andrea Ball. Recorder: Laura Zeigen.

Evidence-based medicine (EBM) has traditionally been based on randomized controlled trial (RCT) data. However, RCTs are expensive, time consuming, and not easily generalizable. Longhurst et al. introduce the concept of a “continuous learning health care system” (a term from the Institute of Medicine) through utilization of the electronic health record (EHR). The “green button,” placed within a patient EHR, would help clinicians find similar patients and provide support for patient care decisions in the absence of evidence. Precursors to the “green button” idea are the “blue button” from the VA system (for beneficiaries to obtain health care information in a consolidated way) and “infobuttons” (Cimino, 2003).
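To make the “find similar patients” idea more concrete, here is a minimal sketch in Python. It is not the method described by Longhurst et al.; the matching rule (same diagnosis, age within five years), the field names, and the patient records are all hypothetical and invented purely for illustration. A real system would draw on far richer EHR data and would need the privacy and consent safeguards discussed in the article.

```python
from collections import defaultdict

# Entirely synthetic records, invented for illustration only.
cohort = [
    {"id": "p1", "age": 67, "diagnosis": "heart failure", "treatment": "A", "improved": True},
    {"id": "p2", "age": 64, "diagnosis": "heart failure", "treatment": "A", "improved": False},
    {"id": "p3", "age": 70, "diagnosis": "heart failure", "treatment": "B", "improved": True},
    {"id": "p4", "age": 45, "diagnosis": "heart failure", "treatment": "B", "improved": True},
    {"id": "p5", "age": 66, "diagnosis": "diabetes", "treatment": "A", "improved": True},
]


def find_similar(records, index_patient, age_window=5):
    """Return records with the same diagnosis and a nearby age (hypothetical rule)."""
    return [
        r for r in records
        if r["id"] != index_patient["id"]
        and r["diagnosis"] == index_patient["diagnosis"]
        and abs(r["age"] - index_patient["age"]) <= age_window
    ]


def summarize_by_treatment(similar):
    """For each treatment, report (number improved, number treated)."""
    tally = defaultdict(lambda: [0, 0])
    for r in similar:
        tally[r["treatment"]][1] += 1
        if r["improved"]:
            tally[r["treatment"]][0] += 1
    return {treatment: tuple(counts) for treatment, counts in tally.items()}


# The patient currently in front of the clinician (also synthetic).
index_patient = {"id": "new", "age": 66, "diagnosis": "heart failure"}

similar = find_similar(cohort, index_patient)
summary = summarize_by_treatment(similar)
print(summary)
```

Even in this toy form, the sketch shows why consistent vocabularies and metadata matter: the match only works if “heart failure” is recorded the same way in every record.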

Challenges with implementing the green button include putting the necessary policies and incentives in place, HIPAA/privacy, IRBs, informed consent, and visualizing the results in a meaningful way. The authors suggest health care systems initially approach integration of the button as a quality improvement process.

Patient preferences are the “third leg of the stool” for EBM. If patient preferences or values were included in the patient record, similar patients with similar care preferences could be identified. This is the first article we have read that discusses the obligation placed on the patient to contribute patient-generated data. Might there be levels of participation to which a patient could agree? Issues of informed consent, privacy and related ethical concerns could be part of what librarians help bring to the table. Another potential role for librarians, here, is mapping MeSH terms to SNOMED and other vocabularies.

References:

1. http://blogs.cdc.gov/genomics/2014/02/13/is-evidence-based/
2. http://informaticsprofessor.blogspot.com/2015/05/is-medicine-precise-enough-to-achieve.html
3. http://informaticsprofessor.blogspot.com/2016/01/biomedical-data-science-needs-measures.html
4. http://www.ncbi.nlm.nih.gov/pubmed/25600256
5. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4214055/
6. http://www.icmje.org/news-and-editorials/M15-2928-PAP.pdf
7. http://www.niso.org/apps/group_public/download.php/15375/PrimerRDM-2015-0727.pdf
Cimino JJ, Li J. Sharing infobuttons to resolve clinicians’ information needs. AMIA Annu Symp Proc. 2003;2003:815.

Special thanks to the members of this journal club (Andrea Ball, Lynly Beard, Erin Foster, Suzanne Fricke, Ann Gleason, Mary Anne Hansen, Andrea Harrow, Ayaba Logan, Carol Perryman, and Laura Zeigen). Emily Glenn, Community Health Outreach Coordinator, NN/LM PNR, was the club moderator.

ABOUT Carolyn Martin
Carolyn Martin is the Outreach and Education Coordinator for the NNLM Region 5. She works with various libraries and community organizations to increase health literacy in their communities.
