Wednesday, November 24, 2010

HashMap of a HashMap of a HashMap Problem -- Part 1 - Fundamentals

Years ago, when I was working on my first project as a fresh developer out of school, my lead developer and one of the other developers got into a shouting match about the use of HashMaps. The lead developer’s argument was basically “be sure you are using the appropriate data structure.” The developer’s argument was “of course it is right, HashMaps are fast, and I want fast access.” At the time, I thought the lead developer was making a mountain out of a molehill. It seemed obvious that the HashMap was appropriate in this situation. There was a reason for his reaction, though, which I only learned later when I became lead developer of the same project. During the course of the argument, he said something to the effect of “I’ve seen HashMaps abused before, I just want you to be sure you are using them because you should, not because they are convenient.” If he had given context to his argument, he would have helped his position greatly. After some experience I’ve come to share his opinion, but only after I saw the code he was referring to when he said he had seen them abused. When I saw this code, I immediately understood his angst, and it has shaped the way I develop today. I am a better developer because I understand the problem and avoid it. What is this problem? It is what I call the HashMap of a HashMap of a HashMap problem.

I use the HashMap class to express this problem because I first encountered it while working on a Java project. It applies equally to any language and any collection type: Dictionary, List, SortedSet, etc. I will use HashMap and Dictionary interchangeably within this series because they are essentially the same data structure. The crux of the problem is that collections can be used as a way to circumvent the object-oriented nature of languages like C# or Java. I say circumvent because they do not break the type system, but they can be used to create pseudo classes that someone has to know meta-data about in order to use.
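
To make the "pseudo class" idea concrete, here is a minimal sketch. Everything in it (the PseudoClassExample and Person types, the "name" and "age" keys) is hypothetical and invented for illustration; it is not the code from the project described above. It contrasts a Dictionary standing in for a class with an actual class carrying the same data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical real class: the compiler knows the members and their types.
public class Person
{
    public string Name { get; }
    public int Age { get; }

    public Person(string name, int age)
    {
        Name = name;
        Age = age;
    }
}

public static class PseudoClassExample
{
    // Hypothetical pseudo class: nothing in the type says which keys
    // exist or what type each value has.
    public static Dictionary<string, object> BuildPersonMap()
    {
        return new Dictionary<string, object>
        {
            { "name", "Robert" },
            { "age", 42 }
        };
    }

    public static void Main()
    {
        // The caller has to know, from somewhere other than the type,
        // that "name" holds a string and "age" holds an int.
        Dictionary<string, object> personMap = BuildPersonMap();
        string nameFromMap = (string)personMap["name"];
        int ageFromMap = (int)personMap["age"];
        Console.WriteLine(nameFromMap + " is " + ageFromMap);

        // With a class, the same knowledge lives in the type itself.
        Person person = new Person("Robert", 42);
        Console.WriteLine(person.Name + " is " + person.Age);
    }
}
```

Both versions compile and print the same line, which is the point: the type system is not broken, but with the Dictionary a misspelled key or a wrong cast only shows up at run time.
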
In this first part I’m going to talk about the nature of using collections and how even the simple use of collections requires us to be careful. It is not an argument against using collections; it is the groundwork that will help illuminate the issue we are dealing with.

Let me start by saying I hate seeing a return type of Dictionary<TKey, TValue>, especially if it is propagated up through many methods. It’s not that this is bad code; it is that my experience has shown me the information provided with the method is often not as robust as I’d like. I hate seeing it because it often means I have to trace the method calls down to the source to understand what values are in the collection. Let’s examine what I mean at a very basic level. Let’s say I have a property in C# defined like this:

public Dictionary<string, string> NameMap { get; }

Can you tell me what that property contains? Of course you cannot. You can guess that one of the strings is a name of some sort and that the other might be a name too, or it might be something else completely. What the author of the code has done is create an implicit pair type and put it into a collection for quick access. With a Dictionary<string, string> data structure you are reliant on the meta-data to understand what it contains. Hopefully, the writer documented the key and value well; otherwise you’re going to have to do some archaeology to understand what the collection contains. This structure is not bad as long as we write the appropriate documentation and use good names for our properties and methods. In this situation good commenting and a small rename can go a long way. If I change NameMap to NameToNicknameMap, its purpose becomes much clearer and I can guess the key is the name and the value is a nickname. Adding comments can make this clearer still. Let’s change the comment as well and we get the following code:

/// <summary>
/// Gets a map of a person's name (the key) to their nickname
/// (the value) so we can look up their nickname quickly
/// </summary>
public Dictionary<string, string> NameToNicknameMap { get; }

We can now clearly understand that we are mapping from a name string to a nickname string. Therefore, our first lesson is that good naming and good commenting can solve the implicit typing problems of simple collections. In the real world, I often find that even this level of detail is lacking. Developers often have an attitude that it is obvious what a collection contains, even though we ourselves will have forgotten by the time we return to the code at some future point. It is important, even in this simple case, that we make clear what the values we are handling are. Naming and comments go a long way toward improving the maintainability of the code.
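
As a rough sketch of how the documented property reads at the call site, the example below wraps it in a small class. The NicknameDirectory class, the NicknameFor method, and the sample names are assumptions made up for this illustration; only the NameToNicknameMap property and its comment come from the code above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container class, invented for this illustration.
public class NicknameDirectory
{
    /// <summary>
    /// Gets a map of a person's name (the key) to their nickname
    /// (the value) so we can look up their nickname quickly
    /// </summary>
    public Dictionary<string, string> NameToNicknameMap { get; } =
        new Dictionary<string, string>();

    /// <summary>
    /// Returns the nickname recorded for the given name, or the name
    /// itself when no nickname has been recorded.
    /// </summary>
    public string NicknameFor(string name)
    {
        string nickname;
        return NameToNicknameMap.TryGetValue(name, out nickname)
            ? nickname
            : name;
    }
}

public static class Program
{
    public static void Main()
    {
        var directory = new NicknameDirectory();

        // The property name and its comment tell us which string goes where.
        directory.NameToNicknameMap["Robert"] = "Bob";

        Console.WriteLine(directory.NicknameFor("Robert")); // prints "Bob"
        Console.WriteLine(directory.NicknameFor("Alice"));  // prints "Alice"
    }
}
```

Nothing about the Dictionary itself has changed; the compiler would still accept the key and value swapped. The name and the comment are what tell the next developer which way around the mapping goes.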

The important takeaway from this part of our series is that even simple collections require clarifying documentation. Without it they can be confusing, forcing developers to guess at their meaning or dig through other code to understand them.

In the next part we will talk about adding complexity to the information we want available in our data structure and the implications on clarity that occur when we add complexity.
