This is the second blog post in a series authored by several individuals who received professional development scholarships for completing the Biomedical and Health Research Data Management Training for Librarians. In this installment, a scholarship recipient, Alyssa Grimshaw, describes her professional development opportunity to attend a Research Data Alliance plenary meeting. For more information about upcoming research data management classes, webinars, and events, please visit the NNLM Data Driven Discovery website and the NNLM NER website.
Alyssa Grimshaw, Access Services/Clinical Librarian – Cushing/Whitney Medical Library, Yale University
I had the pleasure of being part of the first cohort of the “Biomedical and Health Research Data Management Training for Librarians” offered by the National Library of Medicine and the National Network of Libraries of Medicine Training Office. To further our knowledge about research data, the cohort was given the opportunity to attend additional trainings.
With this professional development award, I was able to attend the 13th plenary meeting of the Research Data Alliance in Philadelphia, PA on April 2-4, 2019. The theme of the plenary was “With Data Comes Responsibility”. The Research Data Alliance sessions are considered working sessions, so there is much more hands-on interaction than at typical conferences with lecture-style talks. The Research Data Alliance is an international group, and it was interesting to see how other countries handle their data and the policies they have put in place. The theme came through in several discussions, with a strong message urging countries to recognize the importance of the data they produce and to treat those data as an asset rather than a burden.
The most interesting data concept that I learned about during the sessions was synthetic data. Synthetic data are datasets that are generated programmatically, and the idea has been around since 1992. Synthetic data did not originate in the medical field but could change the way medical professionals use and share data. The advantage of synthetic datasets is that they are generated from original research data with noise added, which randomizes patient information and protects privacy in medical data. This allows researchers to be more comfortable sharing research that involves small populations without having to be concerned about patient information being identifiable. Synthetic data can also reduce costs by making biomedical data available at scale, and can support real-world applications and AI development. One example that was shared was a health care research project in which researchers used the technology to generate slightly different views of the original radiology images. Something I would never have thought was possible!
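For readers who like to see an idea in code, here is a very small Python sketch of the "original data plus noise" approach described above. It is my own illustration, not a method presented at the plenary: the function name `make_synthetic`, the `noise_scale` parameter, and the sample patient fields are all hypothetical, and the simple Gaussian noise used here does not by itself guarantee that patients cannot be re-identified.

```python
import random
import statistics


def make_synthetic(records, noise_scale=0.05, n=None, seed=0):
    """Build synthetic rows by resampling original rows and adding noise.

    records: list of dicts with numeric values (hypothetical patient data).
    noise_scale: noise standard deviation as a fraction of each field's spread.
    n: number of synthetic rows to produce (defaults to len(records)).
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    n = len(records) if n is None else n
    fields = list(records[0].keys())

    # Scale the noise for each field by that field's standard deviation.
    spread = {f: statistics.pstdev(r[f] for r in records) for f in fields}

    synthetic = []
    for _ in range(n):
        base = rng.choice(records)  # pick an original row at random
        row = {f: base[f] + rng.gauss(0, noise_scale * spread[f]) for f in fields}
        synthetic.append(row)
    return synthetic


# Made-up example values, for illustration only.
original = [
    {"age": 34.0, "systolic_bp": 118.0},
    {"age": 51.0, "systolic_bp": 135.0},
    {"age": 67.0, "systolic_bp": 142.0},
]

for row in make_synthetic(original, n=5):
    print(row)
```

Each synthetic row resembles an original record without copying it exactly. Production tools for synthetic health data are considerably more sophisticated than this, but the basic intuition is similar.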
I think a valuable lesson learned at this conference was that not all data are created equal. There are vast amounts of low-quality data and significantly fewer good-quality datasets. I think that libraries are in a perfect place in institutions to help educate health care professionals on how to assess the quality of datasets, which will result in better quality research for the entire medical community. This conference was vital to my better understanding not only of research data management, but also of how data scientists view and use data. I encourage any librarian who would like to become data-savvy to attend the NLM/NNLM RDM workshops and courses.