Shirley Zhao, MSLIS, MS
Spencer S. Eccles Health Sciences Library
The University of Utah
Shirley received funding through the NNLM MCR Professional Development Award to attend the Data Science and Visualization Institute for Librarians.
One of my goals is to support researchers with their data and scholarly communication needs. My knowledge of data science and data visualization has mostly been acquired piecemeal through webinars, online courses, and self-study over several years. In my current role as a librarian faculty member, I have been invited to guest lecture in a number of undergraduate and graduate courses on data visualization best practices, using the ggplot package to generate plots, and writing in LaTeX. However, I still lack the depth of knowledge (e.g., in statistical analysis, data wrangling, and using APIs) and the practical experience to be an invaluable member of a research team. The Data Science and Visualization Institute for Librarians (DSVIL) looked like the perfect program for building additional knowledge and skills.
Hosted by North Carolina State University Libraries, DSVIL is a weeklong immersive course to develop knowledge, skills, and confidence to work with researchers in data-heavy areas. Topics included data description, sharing, reuse, cleaning, exploration, analysis, and visualization; version control; bibliometric network analysis; web scraping; and mapping and geospatial visualization. Instructors were experts in their fields and affiliated with local institutions.
Participants came from all over the country, with fantastic representation from health sciences libraries. We used the hashtag #DSVIL on Twitter to post resources, insights, and photos. Every morning, we spent a half hour over breakfast reflecting on the previous day’s learning. Sessions ranged from 1 hour to 3.5 hours depending on the topic, but in a number of sessions we ran out of time before we could go into more depth.
I appreciated that instructors included a hands-on segment and made their materials available online. The tools we covered were GitHub for version control; Sci2 and Gephi for bibliometric network analysis; OpenRefine for data cleaning; Web Scraper for getting data off webpages; ATLAS.ti and NVivo for qualitative analysis; QGIS and CARTO for geospatial visualization; and Tableau and Plotly for general data visualization. We even covered using Excel for data cleaning and visualization because it’s a tool most people already know and have readily available.
In addition, I found the sessions on the security, legal, and ethical aspects of using data incredibly valuable. For example, check out this website to figure out how easily identifiable you are; it illustrates how difficult it can be to de-identify a dataset while still being able to do meaningful analysis. At the end of the week, some participants and invited speakers gave lightning talks on the projects their institutions are working on. It was inspiring to learn about the many different data support efforts happening around the country.
Overall, I had a fantastic experience doing some concentrated exploration of data science and data visualization. I look forward to putting some of these tools into practice. Plus, belonging to a cohort will be extremely valuable moving forward as we support each other in our endeavors. I initially dismissed the idea of applying because the tuition alone was $2,500. But it is possible to make it happen! In the end, I was able to attend DSVIL with support from my library, MLA’s Continuing Education Grant, NNLM MCR’s Professional Development Award, and extended family in the area who generously opened their home to me. Thank you!