Make Your Own File
Introduction
When writing a program, handling input is crucial, and files are a common form of it. There are many file formats to choose from, such as JSON, CSV, and XML. These might seem like great options because they are flexible enough to work for many types of data. Falling back on well-known file formats, however, is often not the best way to encode your data.
Using existing file formats may help during prototyping, but some of them come at a cost in performance and complexity if used in a shipped build. Custom file formats can allow for easier access to data as well as better control over the format. Having several formats also helps with organization, as the file extension describes what type of data the file stores.
Readability
An important consideration is whether a file needs to be readable by a human. Making a file readable requires encoding the data as text, which has costs. One cost is that translating text into other forms of data at runtime takes computation. The size of the file may also be less predictable, and often larger, because data of a fixed size may have a variable-length text representation. This can also make the file harder to process in parallel, as another thread may not be able to start analyzing the file halfway through with full context.
[Figure: The text representation of numbers often varies in length]
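To make the size difference concrete, here is a minimal Python sketch. The sample values, the choice of 16-bit unsigned integers, and the record_offset helper are assumptions made for illustration only.

```python
import struct

# Sample values chosen for illustration.
values = [7, 42, 1000, 65535]

# Text: the number of bytes depends on the value itself.
text_sizes = [len(str(v).encode("ascii")) for v in values]

# Binary: packed as a little-endian unsigned 16-bit integer, every value
# occupies exactly two bytes regardless of its magnitude.
binary_sizes = [len(struct.pack("<H", v)) for v in values]

print(text_sizes)    # [1, 2, 4, 5]
print(binary_sizes)  # [2, 2, 2, 2]


def record_offset(index, record_size=2, header_size=0):
    """Byte offset of a fixed-size record (hypothetical helper)."""
    return header_size + index * record_size
```

Because every record has the same size, the offset of any record is simple arithmetic, so a second thread could in principle start reading at any record boundary without scanning the bytes before it.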
One example of a custom file format I’ve worked with is the one that stores the tilemaps in my game. The program I use to assemble the tilemap has a way to script custom export options. I use this to store the raw tilemap data, which ensures that each piece of tile data is a constant size and, because the file does not need to be human-readable, as small as possible. Although the data is not represented in plain text, debugging is still possible by viewing the file in hexadecimal. I did this extensively when I was first learning how to make the custom exporter for my files.
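As a rough illustration of what inspecting such a file can look like, the Python sketch below packs a handful of tile IDs at a fixed size and prints them in hexadecimal. The tile values and the 16-bit little-endian encoding are assumptions for the example, not the format used in the game.

```python
import struct

# Hypothetical tile IDs, each assumed to fit in an unsigned 16-bit integer.
tiles = [1, 2, 3, 258, 0, 65535]

# Pack every tile as a little-endian unsigned 16-bit integer (2 bytes each).
data = struct.pack(f"<{len(tiles)}H", *tiles)

# bytes.hex() with a separator (Python 3.8+) gives a quick hex view,
# similar to what a hex editor would show.
print(data.hex(" "))  # 01 00 02 00 03 00 02 01 00 00 ff ff
```

Since each tile always takes two bytes, the pairs in the dump line up with the tiles one to one, which makes a wrong value easy to spot.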
Parsing
The flexibility of a general-purpose file format is wasted when it is used to store simple data like an array. Any additional features or capabilities that a file format provides usually make extracting the data from the file more difficult. This means that when you do not use the full capabilities of a file format, you take on complexity with little benefit. By restricting yourself to only the common file formats, you are more likely to use the wrong format for a given set of data.
Readability also makes files harder to parse than if the data were stored directly. Human-readable files often include delimiters such as commas and braces, which need to be analyzed in order to extract the data. For a while, I saved my tilemap data as a CSV file, which is meant to be human-readable. This added some overhead and complexity to parsing the file compared to what I do currently, which just involves first reading some information from the file header.
[Figure: Non-data is often used for organizing data, even if it is very simple]
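The difference in parsing effort can be sketched in Python. Both functions below read the same small tilemap; the sample data, the binary layout (width and height followed by the tile IDs, all as 16-bit little-endian integers), and the function names are assumptions for illustration.

```python
import csv
import io
import struct

# Hypothetical 2-row, 3-column tilemap as CSV text.
csv_text = "1,2,3\n258,0,65535\n"


def parse_csv(text):
    # Every cell has to be split on delimiters and converted from text.
    reader = csv.reader(io.StringIO(text))
    return [[int(cell) for cell in row] for row in reader]


# The same data in an assumed binary layout: width, height, then
# width * height tile IDs, all little-endian unsigned 16-bit integers.
binary = struct.pack("<2H6H", 3, 2, 1, 2, 3, 258, 0, 65535)


def parse_binary(data):
    # The header states how much data follows, so there are no
    # delimiters to scan for.
    width, height = struct.unpack_from("<2H", data, 0)
    flat = struct.unpack_from(f"<{width * height}H", data, 4)
    return [list(flat[r * width:(r + 1) * width]) for r in range(height)]
```

Both calls return the same rows, but the binary version never inspects individual characters: it reads two numbers and then copies a block of known size.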
Control
Custom file formats allow for complete control over the nuances of the format, which outside formats may or may not document. It also means that you should document the file format yourself and include the version of the format in the file header. Making your own importers and exporters for each file format may add some additional work, but it allows you to ensure a level of quality instead of relying on a third-party library.
I don’t imagine that the CSV file format will change much, so that isn’t a cause for concern in my case. However, the default CSV export option in the tilemap program I use may not contain all the information necessary for my tilemaps. By managing my own format, I can alter it as much as I want so that whatever information I need from the tilemap is included.
[Figure: How many binary files are organized, including mine]
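A minimal Python sketch of a versioned header followed by data might look like this. The magic bytes, field order, field sizes, and function names are invented for the example and are not the layout used in the game.

```python
import struct

MAGIC = b"TMAP"  # hypothetical 4-byte file identifier
VERSION = 1      # hypothetical format version

# Assumed header layout: magic, version, width, height (little-endian).
HEADER = struct.Struct("<4sHHH")


def write_tilemap(width, height, tiles):
    if len(tiles) != width * height:
        raise ValueError("tile count does not match dimensions")
    body = struct.pack(f"<{len(tiles)}H", *tiles)
    return HEADER.pack(MAGIC, VERSION, width, height) + body


def read_tilemap(data):
    magic, version, width, height = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a tilemap file")
    if version != VERSION:
        # Rejecting unknown versions keeps old readers from
        # misinterpreting a newer layout.
        raise ValueError(f"unsupported format version {version}")
    tiles = struct.unpack_from(f"<{width * height}H", data, HEADER.size)
    return width, height, list(tiles)
```

Storing the version in the header lets the reader fail loudly, or branch to an older code path, when the layout changes later.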
Conclusion
Custom file formats can bring benefits in complexity, speed, and control. Using existing formats can introduce complications and lead to sub-par solutions for easy problems. I have implemented one custom file format in my game so far, and plan to implement several more. See if you can find parts of your programs where you think a custom file format would be better suited than the current solution.