Data Structure Reuse

Introduction

While building a compiler, I found hash maps to be used many times throughout the codebase with a variety of key and value types. At first, whenever a new hash map needed to be implemented I would copy some of the code from another hash map’s implementation and fix it to work with my new key and value types. I intentionally held off on using C++’s “template” feature to facilitate generic hash map types for easy data structure reuse. Recently, however, I decided to implement these templates and would like to go through the benefits and tradeoffs of reusing data structures that led me to make the decisions I did (not about C++’s template feature specifically). 

Benefits

One apparent benefit of using a template is that it takes significantly less time to implement a hash map with different key and value types. Instead of copying one implementation of a hash map and changing some of the types and variable names, only a simple declaration to specify the type of the key and value is needed. Another benefit is that data structure reuse naturally leads to more code reuse in general. If a change is made to one of the hash map’s functions (e.g. a bug fix, or optimization), tracking down every function which may need that same change is no longer necessary. This both saves time and reduces the chances of a known bug going unaddressed accidentally.

Tradeoffs

Writing a generic implementation of a data structure may be more time consuming than a specific implementation as thinking abstractly sometimes lowers productivity. This means that writing a generic data structure is often slower if it is only used once. Another tradeoff is the performance impact that using the same data structure in several scenarios can have. For instance, every type of hash map in my compiler could be implemented the same way, but every hash map has different data and data access patterns associated with it. Therefore, each hash map would call for a bespoke implementation in order for them to be maximally performant, which would make a generic hash map inappropriate. 

Conclusion

Initially I resisted the use of templates for data structure reuse because I didn’t want the performance of my compiler to be negatively impacted. After several different hash map types being required, I have realized that data structure reuse does have some merit. While performance should always be a concern, addressing it immediately can potentially be a waste of time. Making a plan to optimize something later if it is a problem can be better for overall productivity. Because I might eventually want to phase out the use of the generic hash map for performance reasons, I have set up a macro to disable the template in order to facilitate that transition more easily. This analysis has helped me evaluate if generic data structures are worth it, and I hope it has helped you decide to either start or stop using generic data structures in parts of your code.

Next
Next

Combining Data Structures