What Is The Best File Format For Long-Term Preservation?

By Emily Shaw


As the digital age accelerates, the world is producing more and more data. The International Data Corporation (IDC), a provider of market intelligence for the information technology industry, forecast that global data would grow from 33 zettabytes in 2018 to 175 zettabytes by 2025. As each zettabyte is one billion terabytes, this is more than a fivefold increase in seven years.

Thankfully, data storage technology is improving and it is cheaper to store large amounts of data these days. However, long-term data preservation remains a significant challenge. More specifically, choosing the right file format is crucial to ensure that digital content remains accessible in the future.




Factors in choosing the right file format

When it comes to long-term storage, there are many factors to consider, such as which file format to use, the type of storage media, and what metadata to include (metadata is data that describes other data).

In this article, we will focus specifically on the choice of file format and discuss the pros and cons of each format for long-term preservation. It is important to understand that no single file format will meet all long-term storage needs. The ideal format depends on the specific problem at hand: for example, the type of content, the amount of data that needs to be stored, and the anticipated lifespan of the content.

 

[Image: Preserving files. Photo credit: Unsplash/Benjamin Lehman]

 

The most popular choice: the PDF

One of the most popular file formats for long-term preservation is the PDF (Portable Document Format). PDFs are designed to be platform-independent, meaning a document can be viewed on any device or operating system that has a PDF viewer, regardless of the software used to create it. Unsurprisingly, PDFs are widely used for storing and sharing documents. PDFs can also be an efficient way to store data, as the format supports compression (meaning less storage space is needed to store the data).

PDF pros and cons

One of the main advantages of PDF files is that they are self-contained. This means that all the content, including images, text, and formatting, is stored within the PDF file itself. This makes it less likely that content will be lost or become separated from the document over time. Of course, this assumes the file is properly backed up and stored.

One of the main drawbacks of PDFs is that some PDF versions are not future-proof. This is because the PDF format is regularly updated, for example to improve security and viewability on new device types (e.g. mobile phones). This means that some PDFs may become inaccessible in the future if the software needed to view and create them becomes obsolete.

In practice, PDFs are so widely used that this is an unlikely scenario. Furthermore, the PDF/A (archival) format is specifically designed for the long-term preservation of electronic documents and should mitigate the issue described above.
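
To make the version and PDF/A point concrete, here is a minimal Python sketch that reports which PDF version a file declares and whether it claims to be PDF/A. It relies on two facts about the format: a PDF starts with a "%PDF-" header followed by a version number, and PDF/A files identify themselves through a "pdfaid:part" entry in their XMP metadata. The function name, the 1024-byte header window, and the sample byte strings are assumptions made for illustration. This only reads what a file claims about itself; it does not validate PDF/A conformance, which requires a dedicated validator.

```python
import re

# Matches the PDF/A part number in XMP metadata, written either as an
# element (<pdfaid:part>1</pdfaid:part>) or an attribute (pdfaid:part="1").
PDFA_PART = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
# Matches the version declared in the file header, e.g. %PDF-1.7.
PDF_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def describe_pdf(data: bytes) -> dict:
    """Report the declared PDF version and any claimed PDF/A part.

    Heuristic only: it scans raw bytes and does not parse the file.
    """
    # Assumption: the header sits within the first 1024 bytes.
    header = PDF_HEADER.search(data[:1024])
    part = PDFA_PART.search(data)
    return {
        "is_pdf": header is not None,
        "version": header.group(1).decode() if header else None,
        "claims_pdfa_part": int(part.group(1)) if part else None,
    }


# Hypothetical byte snippets standing in for real files on disk.
plain = b"%PDF-1.7\n(document body)"
archival = (
    b"%PDF-1.4\n"
    b"<pdfaid:part>1</pdfaid:part><pdfaid:conformance>B</pdfaid:conformance>"
)

print(describe_pdf(plain))
print(describe_pdf(archival))
```

For a real file, you would pass in the result of open(path, "rb").read() instead of the sample bytes. A check like this could be used to flag documents in an archive that do not claim to be PDF/A and may need converting.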

 

Useful for storing images: the TIFF

A TIFF (Tagged Image File Format) file is often used to store images for the long term. TIFFs are typically used to store high-quality graphic designs and photographs in a way that does not lose image quality.

TIFFs are designed to be platform-independent and viewable on any device. TIFFs are also self-contained, meaning that all the image data is stored within the file itself. TIFFs thus have many similarities with the PDF files described above.

TIFF pros and cons

One of the advantages of TIFF files is that they are typically stored in a lossless form. As described by Adobe (which acquired the creator of the TIFF format), this means that the image quality is not degraded when the file is compressed or decompressed. This makes TIFF an ideal format for storing high-quality images such as photographs. TIFFs also support metadata, which can be used to store information about the image such as a short description, or the date it was created.
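
To illustrate what "lossless" means in practice, the following Python sketch compresses and decompresses some pixel data and checks that nothing has changed. It uses Python's built-in zlib module, which implements Deflate, a lossless compression method of the kind TIFF files can use. It does not read or write an actual TIFF file, and the 64x64 grayscale gradient is made-up data for the purpose of the example.

```python
import zlib

# Hypothetical 64x64 8-bit grayscale gradient standing in for real pixel data.
WIDTH, HEIGHT = 64, 64
pixels = bytes((x + y) % 256 for y in range(HEIGHT) for x in range(WIDTH))

# Compress at the highest level, then restore the data.
compressed = zlib.compress(pixels, 9)
restored = zlib.decompress(compressed)

print(f"original:   {len(pixels)} bytes")
print(f"compressed: {len(compressed)} bytes")
# With lossless compression, every byte comes back exactly as it went in.
print(f"identical after round trip: {restored == pixels}")
```

A lossy format such as JPEG would not pass the final check, because it discards some image detail to achieve smaller files.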

Unfortunately, the main drawback of TIFFs for long-term preservation is their large file size. By design, TIFFs are not as space-efficient as lossy image formats such as JPEG. They require a lot of space, and this is a significant consideration for organizations that need to store large amounts of image data.
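
A rough estimate shows how quickly this adds up. The Python sketch below calculates the size of raw, uncompressed pixel data for a hypothetical 6000 x 4000 pixel (24-megapixel) color photograph with 8 bits per color channel. The image dimensions and the collection size of 10,000 images are assumptions chosen for illustration, and the estimate ignores file headers, metadata, and any compression a real TIFF might apply.

```python
def uncompressed_size_bytes(width: int, height: int,
                            channels: int = 3, bits_per_channel: int = 8) -> int:
    """Size of raw pixel data: one value per channel for every pixel."""
    return width * height * channels * bits_per_channel // 8


# Hypothetical 24-megapixel RGB photograph.
size = uncompressed_size_bytes(6000, 4000)
print(f"{size:,} bytes = {size / 10**6:.0f} MB per image")

# Hypothetical collection of 10,000 such images.
collection = 10_000 * size
print(f"10,000 images = {collection / 10**12:.2f} TB")
```

Under these assumptions, each image needs 72 MB and the collection needs 0.72 TB before any backups or redundant copies are made.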

 

Future-proof storage: the ODF

An ODF (OpenDocument Format) file is an alternative to Microsoft Office file types. OpenDocument file types include ODT (OpenDocument Text, comparable to Word document files), ODS (OpenDocument Spreadsheet, comparable to Excel spreadsheet files), and ODP (OpenDocument Presentation, comparable to PowerPoint files). ODF is an open standard, meaning that it is not controlled by a single company or organization.

ODF pros and cons 

One of the main advantages of ODF files is that they have been designed to be future-proof. Specifically, ODF files are XML-based (a typical ODF file is a ZIP archive containing XML files), which means that their contents can be read and modified by any software that supports XML. As XML is the backbone for a lot of software applications, it is highly unlikely that ODF files will become inaccessible in the future.
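
As a rough illustration of this openness, the Python sketch below reads the text of an ODT document using only general-purpose ZIP and XML tools, with no office software involved. It relies on the usual ODF packaging, in which the document body is stored in a file named content.xml inside a ZIP archive. To keep the example self-contained, it first builds a stripped-down package in memory. That package is an assumption made for illustration and is not a complete, valid ODT file (a real one also contains a manifest, styles, and other parts), and the paragraph text is invented.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces defined by the OpenDocument standard.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Invented document body, used only for this example.
CONTENT_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <office:body>
    <office:text>
      <text:p>Board minutes, 2023</text:p>
      <text:p>Retain for 25 years.</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""


def build_minimal_odt() -> bytes:
    """Build a stripped-down ODT-like package in memory (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", CONTENT_XML)
    return buffer.getvalue()


def extract_paragraphs(odt_bytes: bytes) -> list:
    """Return the text of each paragraph found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


paragraphs = extract_paragraphs(build_minimal_odt())
print(paragraphs)
```

The same extract_paragraphs function could be given the bytes of a real .odt file. The point is that even if every current office suite disappeared, the text could still be recovered with basic tools.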

The main drawback of an ODF file is formatting issues. Unlike a PDF file (which maintains its formatting), an ODF file may be displayed differently depending on the software or device used to open it. Specifically, problems may include font substitution, page layout problems, or loss of special effects or features. This is especially an issue for files with complex formatting. Lastly, ODF files can also be larger than equivalent Microsoft Office or PDF files.

 

In conclusion: choosing the right file format is key

As described above, the right file format can help ensure your data remains accessible in the future. Some of the best options for long-term preservation include the PDF (particularly PDF/A), TIFF, and ODF file types. They are widely used for long-term storage as a result.

If you choose another file type, it is a good idea to consider compatibility and accessibility when making the decision. This will help you choose a file format that ensures the longevity of your digital data.

 




Emily Shaw is the founder of DocFly. As a software developer, she built the service from scratch and is responsible for its operations and continued growth. Previously, she studied engineering at the University of Hong Kong and mathematics at the University of Manchester.