Wednesday, December 1, 2010

HashMap of a HashMap of a HashMap Problem -- Part 3 - Seeing the Problem

Previously we looked at the need for meta-data and good naming when using collections to form associations.  We also looked at how collections are used to circumvent typing.  Now we are really going to see how this circumvention can get you into trouble and what my lead developer was actually talking about in his argument with the other developer.
Up to this point we’ve created properties for our mappings without a class to contain them, this has been very purposeful.  Let’s ignore that we added the name to birthday mapping and assume we still just have name to nicknames that we want to map.  Let’s say these names are only first names.  Let’s say we want to map last names to a group of first names that match to a list of nicknames.  That’s probably confusing to read as we have now chained three things together and that makes it hard to follow.  A property for this sort of value structure, using only collections, would look like this:
public Dictionary<string, Dictionary<string, List<string>>> NameMap { get; }

I’ve gone back to the simple naming convention and lack of documentation on purpose to illustrate how I’ve seen similar relationships in code.  If you were given only this property with no documentation and a name like NameMap what would you make of it?  Would you have any idea where to begin?  Quite simply, no.  You’d have to start digging through the code to determine what all the values are for.  Let’s be “kind” and add documentation and improved naming.

/// <summary>
/// Gets a map of last names to a map of first names to nicknames for
/// the person with that first name. The dictionary maps
/// last names -> first names -> nicknames
/// </summary>
public Dictionary<string, Dictionary<string, List<string>>> LastNameToFirstNameToNicknameMap { get; }

If you can understand what that comment intends to communicate then congratulations.  I’m going to be honest and tell you I spent five minutes trying to make a coherent sentence to describe the structure and believe I have failed miserably.  The only way I could even remotely express what was being contained in the structure was to use the arrow notation at the end of the comment.  That only gives me the very basic information.  The fact that I was having trouble explaining the data structure should set off red flags that it is not a good data structure.  Also, notice the ridiculous name I had to give the property in order to clarify its meaning.  There are eight separate words in the name.  It explains the object but I tend to get lost in the middle or reading it.  Its way too long, even though it spells out our relationship.  Again, this is another red flag.
We are now beginning to see the HashMap of a HashMap of a HashMap problem.  We only have two levels of nesting with our HashMaps and we already are beginning to see that it is becoming unmanageable. 
Let’s take it one step further and see how this gets even more complex.  Say we want to add one more level of lookups to our data structure and map nationality to our previous structure.  For simplicities sake I’m just going to consider everyone with the same last name shares a nationality and I am going to use a string for nationality.  Obviously, this is not how the real world works and we’d probably create an enumeration for nationality.  This example is contrived, but we’ll live with it because I’m attempting to illustrate a problem that occurs when you have nested data relationships.  This is the important factor for our contrived example.  If we want a property that returns our new relationships we would then have:
public Dictionary<string, Dictionary<string, Dictionary<string, List<string>>>> NameMap { get; }

We now have a HashMap of a HashMap of a HashMap with a contained list thrown in for good measure.  Good luck ever understanding what is going on here.  The scary thing is I’ve come across such a data structure more than once in my career.  It scares me that someone could look at a structure like this and not think that there was something wrong.  I’m not going to even attempt to add the meta-data and transform the name.  I encourage you to try to do so in a cogent manner.  It is near impossible to write an explanation of what is being mapped here.  Part of our lesson here is if you cannot explain the basic structure of a property in a sentence, it is probably too complex.
The structure also leads to complex access code with nested if-blocks just to get a value.  Let’s assume we want to see if a nickname belongs to someone with a given nationality, last name, and first name.  The code to do this would look something like:
public bool HasNickname(string nationality, string lastName, string firstName, string nickname)
    Dictionary<string, Dictionary<string, List<string>>> lastNameMap;
    Dictionary<string, List<string>> firstNameMap;
    List<string> nicknames;

    if (NameMap.TryGetValue(nationality, out lastNameMap))
        if (lastNameMap.TryGetValue(lastName, out firstNameMap))
            if (firstNameMap.TryGetValue(firstName, out nicknames))
                return nicknames.Contains(nickname);
    return false;

There are 3 nested ifs and three objects being created as outs that are collections themselves.  The long and short of this being that even a simple boolean check becomes insanely complex.  I even tried using LINQ to write a simplified version of this and I got myself lost in it.  Point being is the end user has to write complex data accesses to use our structure. 
The overarching point I’m trying to make is that if you have more than one level of nesting of collections you should re-evaluate how you are writing your code.  You are relying on weak references to build your structure.  Your code will be very hard to maintain.  It is hard enough to try to explain what the structure is encapsulating.  It is even harder to understand it without a priori knowledge.  My general rule of thumb is that anything past our original nesting of a list in HashMap (Dictionary) is too much and should be rewritten.  You are putting a large, unneeded burden on code that uses your data structure. 
Next time we will see what we should do instead  to fix/avoid the problem.