In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.
Written by: Rose Fredrick, Digital Repository Librarian, Health Sciences Library, Creighton University
Big data differs in nature from traditional research data. It is more immediate and ephemeral, which creates large, eclectic datasets that are not easily categorized or managed with traditional data science tools. It is changing the way research is done, and the health sciences in particular are discovering new possibilities for studies by aggregating multiple sources of patient data, such as wearable health trackers and electronic health records. These transformative studies also give health sciences librarians an opportunity to support data scientists by building upon existing research data management services. The librarian’s role in research data management is well established, and this creates a natural launching point for librarians to expand into big data research services.
Many libraries already provide a full array of data services, such as advising on data management plans, metadata and organization, public access mandates, data security, and the preservation and archival of data sets. Although big data has different needs when it comes to storage and analysis, many of the same services apply. Librarians have expertise in the ethical implications of data privacy, publisher and funder requirements, and in curating, organizing and preserving data. All of these skills and services can benefit big data researchers, but librarians do need to be aware of the challenges of big data.
While the knowledge base of librarianship and research data management can clearly be used to advantage for big data services, there can be barriers to librarians implementing these new services. Perhaps the biggest barrier is training. Depending on the services being offered, librarians will at a minimum need to become familiar with the nature of big data and how it shapes the research process, the correct terminology, and the resources available to researchers. Furthermore, to offer the most robust services, librarians may need data science training or advanced technical training to assist with data processing. Not all institutions are prepared to train librarians so extensively, nor will all of them experience enough demand to require a full-time data science librarian.
Librarians can, however, offer more basic services without intensive data science and technical training. A first step could be to become familiar with the terminology, issues, and processes of using big data and to be ready to refer researchers with questions to useful resources. Another option that requires a bit more investment is to offer instruction on crafting data management plans, understanding funder and publisher requirements for data, or choosing a data preservation platform. Librarians with more time could offer researchers one-on-one advisory sessions on the data management plans for their research projects. Librarians without a data science background could also take advantage of training geared towards them, like the Data and Visualization Institute for Librarians or the Data Sciences in Libraries Project.
Additionally, as a digital repository librarian, I wanted to determine whether my library would be able to offer services for archiving big data. Currently, our institutional repository would not be able to house such large sets of data, so while we can advise researchers on preparing for preservation and selecting a platform, we will not be able to archive the data sets in-house. In the future, it may be possible to collaborate with our information technology department and create an archival system using Apache Hadoop. Some libraries with enough technical resources may already be able to take that step. In the meantime, I think libraries can offer counseling on choosing from the available platforms and perhaps offer data preparation advice based on their experience archiving smaller sets of research data. In summary, health sciences librarians have relevant expertise and services to offer to big data research, and they should consider what combination of services will be the best fit for their institutions.