Posted by NNLM Region 7 on August 11th, 2020
Posted in: Data
Tags: data science, data_science
In this installment of RDM Snippets, let’s look at stable file formats. These formats help ensure that your data is preserved for as long as possible and that your files remain open and accessible to people using a variety of software and operating systems. Once you’ve finished working on your project and are preparing your data for long-term storage, you’ll want to convert your files to one of these formats for future accessibility.
Technology changes quickly, and while some programs such as Word or Excel are currently standard, it’s possible they will someday be replaced. Many people also use alternatives such as the open-source LibreOffice suite or the web-based Google Docs, so saving your work in a format that can be used with multiple programs is a great idea.
Most software, and many lab instruments, have their own proprietary file formats for saving outputs. Those formats can often be opened only with the original software or instrument. The good news is that they can usually be converted to a generic, open format suited to the type of file you have.
Spreadsheets can be saved as comma-separated values (.csv), and word processing documents can be saved as plain text (.txt) files. This will ensure that the file can be opened by the widest possible range of software.
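To illustrate why CSV is so portable, here is a minimal Python sketch that uses only the standard library. The function names and the sample rows are made up for illustration; in practice you would usually create the CSV with "Save As" or "Export" in your spreadsheet program.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize a list of rows (lists of values) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def csv_text_to_rows(text):
    """Parse CSV text back into a list of rows (every value comes back as a string)."""
    return list(csv.reader(io.StringIO(text)))


# Hypothetical sample data standing in for a small spreadsheet.
sample = [
    ["sample_id", "site", "reading"],
    ["A1", "Lab 2, Bench 4", "0.82"],
    ["A2", "Lab 3", "1.07"],
]

text = rows_to_csv_text(sample)
print(text)
```

The result is ordinary text that any editor can open. Keep in mind that CSV stores only cell values: formulas, formatting, and multiple sheets are not carried over, so each sheet needs its own file.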
|Examples of Proprietary Formats|Open Format Equivalents|
|---|---|
|Excel (.xls, .xlsx)|Comma-Separated Values (.csv)|
|Word (.doc, .docx)|Plain Text (.txt)|
|PowerPoint (.ppt, .pptx)|PDF/A (.pdf)|
|Photoshop (.psd)|TIFF (.tif, .tiff) or PNG (.png)|
|QuickTime (.mov)|MPEG-4 (.mp4)|
|MPEG-4 Protected Audio (.m4p)|MP3 (.mp3)|
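If you have a large project folder, a short script can help you spot which files still need converting. The following is a rough sketch rather than a finished tool: the function name `find_files_to_convert` is hypothetical, and the mapping simply restates the table above. It only lists files; it does not convert anything.

```python
from pathlib import Path

# Mapping restated from the table above:
# proprietary extension -> suggested open format.
OPEN_EQUIVALENTS = {
    ".xls": ".csv",
    ".xlsx": ".csv",
    ".doc": ".txt",
    ".docx": ".txt",
    ".ppt": ".pdf (PDF/A)",
    ".pptx": ".pdf (PDF/A)",
    ".psd": ".tif or .png",
    ".mov": ".mp4",
    ".m4p": ".mp3",
}


def find_files_to_convert(folder):
    """Return (path, suggestion) pairs for files with a proprietary extension."""
    matches = []
    # rglob("*") walks the folder and all of its subfolders.
    for path in sorted(Path(folder).rglob("*")):
        suggestion = OPEN_EQUIVALENTS.get(path.suffix.lower())
        if path.is_file() and suggestion:
            matches.append((path, suggestion))
    return matches
```

Calling `find_files_to_convert("my_project")` (with your own folder name in place of `my_project`) returns a list you can loop over and print, giving you a checklist of files to re-save before archiving.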
Some of these open formats also help preserve documents for the long term. PDF is an open format that many programs can read, and it also fixes the content and layout of a document so that it is not easily edited. The PDF/A variant is designed specifically for archiving. Whether this is the right choice depends on how the document is likely to be used in the future.
JPG files use lossy compression, so image quality degrades a little each time a file is edited and re-saved. Saving image files in PNG format, or in TIFF format with no compression or lossless compression, avoids this loss. If you have high-quality image outputs from microscopes or other equipment, this is important to consider.
Now that you’ve got your data saved in an open format, next month we’ll talk about best practices for long-term storage and preservation.