This month on RDM Snippets, let’s talk about storage and preservation. It may seem like these are the same thing, but while there is some overlap, there are also important differences between the two.
Where you store your data can and should change throughout the project lifecycle. While you are preparing your project, data and documentation may be stored on your laptop, office computer, or a shared drive in the cloud if you are collaborating with others. As your project progresses through the data collection phases, you again may be using a laptop or other devices, storing data in the cloud, or uploading it to a server.
Data analysis may be conducted off a hard drive or uploaded to a high performance computer for processing. Data may also be analyzed by a researcher hand-coding results or writing code to process the observations. Here too, researchers may be working in the cloud to collaborate on analysis.
At the end of the project, data and associated files, including a README and/or data dictionary, should be packaged together and stored on a hard drive or other backup location. As we discussed last month, save the data in the most open file format available for your data type.
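As a rough illustration of the packaging step, here is a minimal Python sketch that bundles a project folder into a single zip archive and refuses to run if the documentation is missing. The function name `package_project` and the default file name `README.txt` are assumptions made for this example, not part of any standard. Adjust the required files to match your own project.

```python
import zipfile
from pathlib import Path


def package_project(project_dir, archive_path, required=("README.txt",)):
    """Zip every file under project_dir into archive_path.

    Raises FileNotFoundError if any file named in `required`
    (a hypothetical default of README.txt) is missing.
    Returns the relative paths that were written to the archive.
    """
    project_dir = Path(project_dir)
    archive_path = Path(archive_path)

    # Check for the documentation before packaging anything.
    missing = [name for name in required if not (project_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing required files: {missing}")

    # Collect files first, skipping the archive itself if it lives in the folder.
    files = sorted(
        p for p in project_dir.rglob("*")
        if p.is_file() and p.resolve() != archive_path.resolve()
    )

    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store paths relative to the project folder, with forward slashes.
            zf.write(path, arcname=path.relative_to(project_dir).as_posix())

    return [p.relative_to(project_dir).as_posix() for p in files]
```

Calling `package_project("my_project", "my_project.zip", required=("README.txt", "data_dictionary.csv"))` would then produce one archive that can be copied to each backup location.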
Since data may be moving between locations during the lifetime of a project, take care to make sure that everyone involved knows where the data will be at each stage.
It’s also crucial to check funder and institutional requirements for data storage, as well as any privacy and encryption requirements. Human subjects data or biomedical research data that fall under HIPAA regulations are subject to encryption requirements both while the data is stored and while it is transmitted.
Having multiple copies of your data as backups is also essential for good data management. A good rule of thumb is the 3-2-1 rule. As explained well by the Penn Libraries Data Management Libguide:
A common best practice for backing up and storing your data is the 3-2-1 Rule which says you should keep:
3 copies of your data on
2 types of storage media and
1 copy should be offsite
Having 1 copy offsite protects your data from local risks like theft, lab fires, flooding, or natural disasters.
Using 2 storage media improves the likelihood that at least one version will be readable in the future should one media type become obsolete or degrade unexpectedly.
Having 3 copies helps ensure that your data will exist somewhere without being overly redundant.
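To make the rule concrete, here is a small Python sketch that checks a list of backup copies against the three conditions. The `BackupCopy` class, the `check_3_2_1` function, and the example locations are all hypothetical names invented for this illustration. The sketch treats the numbers as minimums, which is one reasonable reading of the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str    # where the copy lives, e.g. "office desktop"
    media_type: str  # kind of storage, e.g. "internal disk", "cloud"
    offsite: bool    # True if the copy is away from the primary site


def check_3_2_1(copies):
    """Return a list of problems; an empty list means the plan passes."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    media = {c.media_type for c in copies}
    if len(media) < 2:
        problems.append(f"only {len(media)} media type(s), need at least 2")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


# Example plan (made-up locations for illustration only).
plan = [
    BackupCopy("office desktop", "internal disk", offsite=False),
    BackupCopy("lab external drive", "external drive", offsite=False),
    BackupCopy("institutional cloud storage", "cloud", offsite=True),
]
print(check_3_2_1(plan) or "Plan satisfies the 3-2-1 rule")
```

A check like this only confirms that the plan is sound on paper. It does not verify that each copy is actually readable, so it is still worth opening your backups from time to time.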
Storage does not equal preservation. Just because data is stored somewhere does not mean that the storage is safe for the long term. One of the most common ways to preserve data is to put it in a repository. This will be the topic of next month’s RDM Snippets post.