The MARquee September 24th, 2021




Big Data for Hospital Librarians – Are We There, Yet?

Posted on May 1st, 2018. Posted in: Data Science

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants to consider the following questions as they progressed through the course: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by Elanor Pickens, Medical Librarian, Portsmouth Regional Hospital, Portsmouth, NH

“Big data” is a term that has recently been appearing in the literature of academic librarianship. However, as a hospital librarian, may I consider this justification enough to explore its applicability to my current position? The definition of big data is multi-faceted and covered elsewhere, so I refer readers to Gandomi and Haider (2015). Here I simply seek to understand my potential role in this field.

Martin (2016) proposes a framework of five basic categories within which librarians may find at least one opportunity to support data science activities. One role the hospital librarian might take on falls within the “Literacy” domain, through teaching. This may include remaining informed about the various programming languages and software (R, Python, SAS, etc.), and their strengths and weaknesses, so that we may assist in the research process by educating researchers about data management options. Additionally, we might lead researchers to resources that can help them visualize their data, both to enhance their own comprehension of relationships and to enable them to better present their findings to others. If we also understand the form of data that they are collecting (structured, semi-structured, or unstructured), we may be able to help them discover studies that have used similar data so that they can anticipate any barriers to organization and analysis that they might encounter.

One concern researchers may have about big data analysis is that the research question needs to be relatively clear before any data are examined (Brennan, 2015). Slight revisions may be necessary, and new questions may arise that merit further examination, but the original question should not change direction significantly. There are too many data and methods of analysis to proceed in big data research without a clear understanding of where it leads. Librarians are proficient at refining questions in order to get at the core of a research query as part of their reference interview skillset. We also have at least some basic knowledge of the different types of data analysis methods that are described in the literature, although much of this exposure does not include research conducted with big data. Iwashyna and Liu (2014) state that, in contrast to the formulation of one hypothesis and advance selection of the data that will need to be carefully collected to support or refute this hypothesis (traditional epidemiology), the multitude and variety of big data are able to be custom fit to epidemiological studies to identify patterns. Similarly, the authors discuss that a number of analytic methods can be adapted and combined when using big data, whereas traditional epidemiology is usually restricted to one analytic method per study. For hospital librarians, an awareness of research question modifications and an understanding of methods of data analysis may not necessarily yield additional support to experienced researchers, but they may still help guide those individuals who are subject specialists in their field but new to the research process.

Groeneveld and Rumsfeld (2016) note that big data certainly have predictive power in clinical decision-making; however, big data cannot determine which associations are due to random events, nor can they identify causal associations. In addition, the authors point to a lack of large-scale analytical methodology for scientific comparison studies comparable to the level currently available in big data analysis. It is necessary for researchers, especially those seeking publication, to consider the reproducibility of their studies when they are using such highly adaptive and dynamic models of analysis. In this case, there may be little more for hospital librarians to do than to continue to assist researchers in discovering the current best practices for big data publication with respect to transparency.

For the solo hospital librarian, who is often juggling multiple tasks, big data may simply not be a sphere in which to operate. I feel that despite having gained a useful surface understanding of big data concepts, I am still unable to determine how I may be able to apply this knowledge in my current position, or whether I would even have the time to do so. And without a clearly defined path, management support for professional development opportunities is essentially non-existent (Burton & Lyon, 2017). One area that has piqued my curiosity is how big data may inform my organization’s operations through the revision of protocols, thereby improving clinical practice. I have never really had much opportunity before to consider how data collected by our EHR truly inform patient care, and especially how they might impact revenue. For example, patients who come in for vaccinations may also receive additional preventive care (Kaelber, 2016). But in order to actually delve into the big data arena, it may be up to each individual librarian either to maintain a basic awareness or to seek out opportunities that may or may not be supported at the organizational level.


Brennan, P. (2015). Big Data in Nursing Research. NINR Big Data Boot Camp Part 4: Big Data in Nursing Research. https://www.youtube.com/watch?v=KOFLQ5z05f8

Burton, M., & Lyon, L. (2017). Data science in libraries. Bulletin of the Association for Information Science and Technology, 43(4), 33-35.

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137-144.

Groeneveld, P. W., & Rumsfeld, J. S. (2016). Can big data fulfill its promise? Circulation: Cardiovascular Quality and Outcomes, 9(6), 679-682. PMCID: PMC5396388.

Iwashyna, T. J., & Liu, V. (2014). What’s so different about big data? A primer for clinicians trained to think epidemiologically. Annals of the American Thoracic Society, 11(7), 1130-1135. PMCID: PMC4214055.

Kaelber, D. (2016). Using Clinical Data to Improve Clinical Patient Outcomes. NNLM Forum (online). http://www.kaltura.com/tiny/7e5k7

Martin, E. R. (2016). The Role of Librarians in Data Science: A Call to Action. Journal of eScience Librarianship, 4(2), 7. https://escholarship.umassmed.edu/jeslib/vol4/iss2/7/

About Hannah Sinemus
Hannah Sinemus is the Web Experience Coordinator for the Middle Atlantic Region (MAR). Although she updates the MAR web pages, blog, newsletter and social media, Hannah is not the sole author of this content. If you have questions about a MARquee or MAReport posting, please contact the Middle Atlantic Region directly at nnlmmar@pitt.edu.

This project is funded by the National Library of Medicine, National Institutes of Health, Department of Health and Human Services, under Cooperative Agreement Number UG4LM012342 with the University of Pittsburgh, Health Sciences Library System.

NNLM and NETWORK OF THE NATIONAL LIBRARY OF MEDICINE are service marks of the US Department of Health and Human Services.