Now that we’ve gone over the basics of research data management, here is a roundup of tools you can use for working with data, or teach to researchers to help them better manage their data. Some of these might be familiar to you, but we hope it is helpful to have them listed in one place, and that you might discover something new.
Tools for working with data encompass a wide variety of skills and applications. Some are very user friendly, and some have a steeper learning curve. The great thing is that there are lots of ways to learn these tools, from online training to in-person workshops when gatherings are able to resume.
OpenRefine is a “free, open source, powerful tool for working with messy data”. This browser-based tool is great for cleaning up spreadsheets and organizing messy data sets. OpenRefine allows you to explore your data sets by easily sorting them, as well as cleaning, transforming, reconciling, and matching data. It makes it easy to check that all your terms are matched and spelled correctly, to delete empty rows quickly, and to merge different types of cells together. OpenRefine offers free training videos and resources through its website and has a robust support community.
Jupyter Notebook is an “open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.” Jupyter’s strength lies in its ability to integrate many different types of data in one place, and to run live code and other tools all within one “notebook”. As with OpenRefine, Jupyter has a robust community of users and support documentation.
Tableau is a data visualization tool that allows you to create dynamic, interactive data analyses. The tool is easy to learn but offers many levels of data analysis, with versions of the software aimed at everyone from basic users up to industry. Tableau was originally free, but now offers different subscription levels for its tools. It is still included here, even though it is no longer free, because it is commonly used and very powerful.
Coding in different programming languages has become a vital part of the data analysis process, and many librarians are learning to code to work with their own data, or to help teach researchers how to use these tools. Two of the most common languages currently being used by researchers are R and Python.
Information about R and Python can be found on each language’s project website. In addition to the resources on the developers’ websites, many places offer free lessons on these coding languages and on how to use them to automate tasks or perform research analysis.
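To give a sense of what automating a task can look like, here is a minimal Python sketch of the kind of spreadsheet cleanup described above: trimming stray spaces, dropping empty rows, and standardizing term spellings. It uses only Python’s standard library. The sample data, the term mapping, and the function names are all made up for illustration, so treat it as a starting point under those assumptions rather than a recommended workflow.

```python
import csv
import io

# Hypothetical mapping of spelling variants to one preferred term.
PREFERRED_TERMS = {
    "e. coli": "Escherichia coli",
    "e.coli": "Escherichia coli",
}

# Made-up sample data: stray spaces, an empty row, inconsistent terms.
SAMPLE = "sample_id,organism\n 001 , E. coli \n,\n002,e.coli\n"


def clean_rows(rows):
    """Trim whitespace, drop empty rows, and standardize known term variants."""
    cleaned = []
    for row in rows:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip rows where every cell is blank
        # Replace known variants (matched case-insensitively); keep other cells as-is.
        cells = [PREFERRED_TERMS.get(cell.lower(), cell) for cell in cells]
        cleaned.append(cells)
    return cleaned


def clean_csv_text(text):
    """Parse CSV text and return its cleaned rows as lists of strings."""
    return clean_rows(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    for row in clean_csv_text(SAMPLE):
        print(row)
```

For a real file, the same clean_rows function could be given a csv.reader opened on the file instead of the sample text. A tool like OpenRefine does much more than this, but a short script can be handy when the same cleanup needs to be repeated on many files.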
Once you have learned how to code and have used it in your research analysis, a code sharing site like CodeOcean is a great place to share code so others can use it. CodeOcean says “Our cloud-based platform lowers the barriers for researchers to follow best practices of reproducibility. Researchers’ work is stored in compute capsules, preserving work for reuse today, tomorrow, or next year.” Code can be run live online, and you can use the platform to share your code with journal publishers and other resources.
In addition to the free online resources for all of the tools mentioned in this post, The Carpentries offer workshops on many of these programs and coding languages. Typically the Carpentries workshops are held in person, but many are now starting to be offered online as well. If you are in the New England region, check out NESCLiC (https://nesclic.github.io/), The New England Software Carpentry Library Consortium, for local events and resources.
We hope this RDM Snippets series has been useful and informative! Stay tuned for more informative blog posts and other content on this page in the new year.