The MARquee December 10th, 2022

Understanding How Librarians Can Support Data Science and Big Data

Posted on September 24th, 2018. Posted in: Data Science

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by Cathryn Miller, Social Sciences Librarian, Duquesne University, Pittsburgh, PA

Supporting data science and big data means supporting a new form of research. Researchers engaging in data science often find or collect big data (large volumes of data), wrangle (prepare) the data, analyze it, and create reports (Federer, 2018). A common technique used in data science is machine learning, in which machines (computers) learn how to cluster, make recommendations, predict outcomes, etc., based on what they learn from the data. In a healthcare setting, big data and data science can transform the clinical decision-making process.
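For readers who have not seen this workflow in code, the steps above (collect, wrangle, analyze, report) can be sketched in a few lines of Python. This is only an illustrative sketch: the records are invented, the function names are hypothetical, and the tiny two-group clustering routine stands in for the far more capable machine learning tools that researchers would actually use.

```python
from statistics import mean

# Hypothetical raw records, invented for illustration: a patient id and a
# lab value that arrives as text and is sometimes missing or malformed.
RAW_RECORDS = [
    ("p1", "5.1"), ("p2", "4.9"), ("p3", ""), ("p4", "9.8"),
    ("p5", "10.2"), ("p6", "n/a"), ("p7", "5.3"), ("p8", "9.9"),
]


def wrangle(records):
    """Prepare the data: keep only rows whose value parses as a number."""
    clean = []
    for patient_id, text in records:
        try:
            clean.append((patient_id, float(text)))
        except ValueError:
            continue  # skip missing or malformed values
    return clean


def cluster_two_groups(values, rounds=10):
    """Split numbers into two groups with a very small 1-D k-means (k=2)."""
    low, high = min(values), max(values)  # starting guesses for the centers
    group_low, group_high = list(values), []
    for _ in range(rounds):
        # Assign each value to whichever center it is closer to.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not group_high:
            break  # all values sit together; nothing more to split
        # Move each center to the average of its group.
        low, high = mean(group_low), mean(group_high)
    return group_low, group_high


def report(group_low, group_high):
    """Summarize the analysis in one sentence."""
    return (f"{len(group_low)} values in the lower group, "
            f"{len(group_high)} values in the higher group")


if __name__ == "__main__":
    values = [value for _, value in wrangle(RAW_RECORDS)]
    print(report(*cluster_two_groups(values)))
```

The point of the sketch is that each stage is a separate, understandable step, which is also where documentation and data management questions tend to arise.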

How can librarians support researchers engaging in data science? By no means do I think that librarians must learn advanced statistics or computer programming to support data science and big data. We can support data science and big data by extending our strengths in providing access to information and in providing instruction. In addition, librarians may want to consider learning about research data management, “the active management and appraisal of data over the lifecycle of scholarly and scientific interest” (Jones, Guy, & Pickton, 2013).

Providing Access to Information: Focusing collection development efforts on data science methodology could be very helpful, especially for researchers who are venturing into data science for the first time. Topics for books and ebooks might include machine learning, research data management, data visualization, text mining, algorithms, the R programming language, Python, data wrangling, etc. Curating those resources on a LibGuide or website, along with links to websites that help people learn about data science and obtain support (e.g., stackoverflow.com), might be especially useful.

Organizing Workshops: Librarians can facilitate learning by organizing workshops. Librarians have created and shared workshop materials on a variety of data science topics; for example, Kristin Briney at the University of Wisconsin-Milwaukee made her principles of data visualization workshop available for reuse (Briney, 2017). There are also many workshops about research data management that librarians can use, such as the Research Data Management Essentials workshop created by Alisa Surkis and Kevin Read at New York University (Read & Surkis, 2018).

Services Supporting Research Data Management: Librarians’ specialized knowledge in finding, storing, and preserving information could be particularly helpful for data scientists. Consulting with researchers to help them create data management plans and think about how their data are documented, organized, protected, stored, and shared is a task that fits librarian skill sets.

Librarians don’t have to become experts in data science and big data to help those collecting and analyzing big data. By providing access to information and organizing workshops, librarians can support data scientists. Librarian support is key to helping researchers thrive, regardless of whether their data are big or small, and regardless of the methodologies they use.


Briney, K. (2017). Data Visualization Camp Instructional Materials (2017). UWM Libraries Instructional Materials. 4.

Federer, L. (2018). Data Science 101.

Jones, S., Guy, M., & Pickton, M. (2013). Research data management for librarians [training booklet]. Digital Creation Centre.

Read, K., & Surkis, A. (2018). Research Data Management Teaching Toolkit. Retrieved from: https://figshare.com/articles/Research_Data_Management_Teaching_Toolkit/5042998

Hannah Sinemus is the Web Experience Coordinator for the Middle Atlantic Region (MAR). Although she updates the MAR web pages, blog, newsletter and social media, Hannah is not the sole author of this content. If you have questions about a MARquee or MAReport posting, please contact the Middle Atlantic Region directly at nnlmmar@pitt.edu.

This project is funded by the National Library of Medicine, National Institutes of Health, Department of Health and Human Services, under Cooperative Agreement Number UG4LM012342 with the University of Pittsburgh, Health Sciences Library System.
