Tuesday, December 7, 2010

HashMap of a HashMap of a HashMap Problem -- Part 4 - A Way To Correct The Problem

Previously we have seen how collections can be used to circumvent the type system.  They can be thought of creating classes without actually creating classes.  This can be useful when you have simple relationships you want to store, but confusing as you add more complicated relationships.  They lead to maintainability nightmares where even the original author does not understand what is occurring.  The code is mucked up with nested data structures both in the definition of the relationships and the use of them.  Now we’ll see how to correct the problem.
The way to correct the problem is not revolutionary.  In fact, it is not something that I used to think I had to explain.  It was only after I saw multiple developers create the HashMap of a HashMap of a HashMap problem and explained what they should have done that I realized not everyone saw it as intuitive.  The solution is simply applying good object oriented principles.  Let us return to our Person class we created in Part 2 (for now we will keep the birthday out of the relationship):
public class Person
{
    public string Name { get; }
    public List<string> Nicknames { get; }
}

In Part 3 we added complexity that the name was actually just a first name and that we wanted to tie last name to the first name and the list of nicknames. Let us add this information to our Person class:

public class Person
{
    public string FirstName { get; }
    public string LastName { get; }
    public List<string> Nicknames { get; }
}

We have now defined a class that encapsulated the information contained in our LastNameToFirstNameToNicknameMap in Part 3 into an object.  If we have a list of Persons we have the same information as we did in our complex nested properties.  However, we’ve lost some of the ability to quickly retrieve values based upon name values.  We have to look through the list of persons for a match.  There are a couple ways we can approach the problem at this point, depending upon how we want to expose and deal with our data.  The following is just one way that this can be handled.  Let’s call a collection of people who share a last name a Family.  A family class would on a basic level look like this:

public class Family
{
    public string LastName { get; }
    public List<Person> People { get; }
}

 I’m purposefully omitting the constructor and methods needed for management of the family for simplicity.  Let us also think about the other layer of data we wanted to have which was nationalities and add that data to our objects as well.  Our Family and Person class would look as follows:

public class Person
{
    public string FirstName { get; }
    public string LastName { get; }
    public List<string> Nicknames { get; }
    public string Nationality { get; }
}

public class Family
{
    public string Nationality { get; }
    public string LastName { get; }
    public List<Person> People { get; }
}

You’ll notice that Nationality and Nickname is duplicated between the two objects.  This is purposeful on my part.  This could be handled in other ways, like as a Person could have a reference to his family.  This makes sense in our contrived example, but in practice, I find the way this is structured to be more practical.  In reality, I often want to create the type of object a Person is our example of without knowing what family it will belong to at creation time.  However, I know the properties like LastName and Nationality at creation time.  What I mean is that I want to construct families from people not create families and then add people to it.  It is a subtle difference, but is the general case where I’ve seen the HashMap of a HashMap of a HashMap problem occur.  You of course should write the data structure that is most appropriate for your situation.
Ultimately, I need some sort of top-level object that will provide be access to the values.  I could just use a list of Families and have some outside object manage the adding and removal of items, but that breaks the object-oriented approach we are going for.  What we really want to do is encapsulate the management logic into a class and provide convenience methods for access and manipulation.  Let’s call our top level object Families.  I am going to construct a class with some basic features for creating our constructs based upon a person.  Let me preface this by saying it is meant to illustrate the structure, not be a fully functioning example:
public class Families
{

    /// <summary>
    /// Private class to encapsulate the logic connecting a last
    /// name to a family
    /// </summary>
    private class LastNameFamilyMap{
        private Dictionary<string, Family> _familyMap;

        public LastNameFamilyMap()
        {
            _familyMap = new Dictionary<string, Family>();
        }

        public void AddFamily(Family family)
        {
            _familyMap.Add(family.LastName, family);

        }

        public void RemoveFamily(Family family)
        {
            _familyMap.Remove(family.LastName);
        }

        public Family GetFamily(string lastName)
        {
            Family family;
            if (_familyMap.TryGetValue(lastName, out family))
            {
                return family;
            }
            return null;
        }
    }

    /// <summary>
    /// Private class used to encapsulate the nationality
    /// to families relatonship.
    /// </summary>
    private class NationalityFamilyMap
    {
        private Dictionary<string, List<Family>> _nationalitiesToFamilies;
        private Dictionary<string, LastNameFamilyMap> _familyMappings;

        public NationalityFamilyMap()
        {
            _nationalitiesToFamilies = new Dictionary<string, List<Family>>();
        }

        public void AddFamily(Family family)
        {
            List<Family> families;
            if(!_nationalitiesToFamilies.TryGetValue(family.Nationality, out families)){
                families = new List<Family>();
                _nationalitiesToFamilies.Add(family.Nationality, families);
            }
            families.Add(family);

            LastNameFamilyMap lastNameMap;
            if (!_familyMappings.TryGetValue(family.Nationality, out lastNameMap))
            {
                lastNameMap = new LastNameFamilyMap();
                _familyMappings.Add(family.Nationality, lastNameMap);
            }
            lastNameMap.AddFamily(family);

        }

        public void RemoveFamily(Family family)
        {
            List<Family> families;
            if (_nationalitiesToFamilies.TryGetValue(family.Nationality, out families))
            {
                families.Remove(family);
            }

            LastNameFamilyMap lastNameMap;
            if (_familyMappings.TryGetValue(family.Nationality, out lastNameMap))
            {
                lastNameMap.RemoveFamily(family);
            }
        }

        public IEnumerable<Family> GetFamilies(string nationality)
        {
            List<Family> families;
            if(_nationalitiesToFamilies.TryGetValue(nationality, out families)){
                return families;
            }
            return new List<Family>();
        }

        public Family GetFamily(string nationality, string lastName)
        {
             LastNameFamilyMap lastNameMap;
            if (_familyMappings.TryGetValue(nationality, out lastNameMap))
            {
                return lastNameMap.GetFamily(lastName);
            }
            return null;
        }

        public Family GetFamily(Person person)
        {
            return GetFamily(person.Nationality, person.LastName);
        }
    }

    private NationalityFamilyMap _nationalityFamilyMap;


    private List<Family> _families;
    private List<Person> _people;


    public Families()
    {
        _nationalityFamilyMap = new NationalityFamilyMap();
        _families = new List<Family>();
        _people = new List<Person>();
    }

    public IEnumerable<Family> Families { get { return _families; } }
    public IEnumerable<Person> People { get { return _people; } }


    public void Add(Person person)
    {
        Family family = _nationalityFamilyMap.GetFamily(person);
        if (family == null)
        {
            family = new Family(person);
            _nationalityFamilyMap.AddFamily(family);
            _families.Add(family);
        }
        else
        {
            family.People.Add(person);
        }
        _people.Add(person);
    }

}

This may seem over whelming at first.  I also apologize for the dearth of comments.  This is just one way you can implement the management of the relationship we had in only collections at the end of Part 3.  There is a lot more complexity in the code here than the basic collection structure we had, but it is all for good reason.  What we are doing here is encapsulating the management into the object itself.  We are using two private classes to abstract away the underlying collections.  
The private classes may seem complex, but they are in reality quite simple.  Their purpose it maintain specific relationships between objects and enable us to easily retrieve those relationships. Lets look at the LastNameFamilyMap alone.
    /// <summary>
    /// Private class to encapsulate the logic connecting a last
    /// name to a family
    /// </summary>
    private class LastNameFamilyMap{
        private Dictionary<string, Family> _familyMap;

        public LastNameFamilyMap()
        {
            _familyMap = new Dictionary<string, Family>();
        }

        public void AddFamily(Family family)
        {
            _familyMap.Add(family.LastName, family);

        }

        public void RemoveFamily(Family family)
        {
            _familyMap.Remove(family.LastName);
        }

        public Family GetFamily(string lastName)
        {
            Family family;
            if (_familyMap.TryGetValue(lastName, out family))
            {
                return family;
            }
            return null;
        }
    }

We have specific methods to add, remove and add families.  This gives us explicit control over changes to our structure.  This is something we don’t get with straight collections.  We’ve encapsulated the underlying dictionary into a structure so that if we want to keep track of additional information from our Family object we are able to easily.  Our management points for changing this are only in  the AddFamily and RemoveFamily methods of the class.  This means we can easily provide a reverse lookup Dictionary, which would be useful if LastName wasn’t part of the Family object (which in the real world might be the case).  We also are sure that nobody can change our underlying collection without accessing it through our methods. 
This is basic data encapsulation but it’s something that is not often done.  I was working on a relationship like this just the other day.  I had to allow the storage of a new type of relationship.  If I had just had a HashMap of a HashMap of a HashMap I would have never been able to enforce the tracking constraints that I did.  Let say I wanted to make it so only families with a last name that began with A were being tracked.  This is an easy check to put into our Add method.  Or I can make it so families with more than three members cannot be removed.  The data encapsulation puts our logic into this one location and ensures that the rules are always followed.
 Looking at our private classes we could have substituted the LastNameFamilyMap  reference in NationalityFamilyMap with just a straight Dictionary but then we’d be back to where we started with a Dictionary containing a Dictionary.  In essence, this is what that relationship is, but we are using another object to indirect this relationship.  All the complexity of dealing with the dictionary moves inside the object making additions and removals from it much simpler.
The same is the case for our NationalityFamilyMap.  Here we actually have two Dictionaries.  One allows us to easily keep track of Families based upon nationality while the other provides a quick look-up for a Family based upon Nationality and LastName of a Person. 
Because of the sub-classes manages the adding a person to our collection is quite simple.  We can easily look-up if that person’s family exists through our sub-class.  If it does not exist, we can easily create a new one and add it to our internal collections and objects.  The onus for the complicated logic needed to create add a new Person, including creating a new Family is on us as a developer of the management class, not the user of the data structure.  Not included in the code are the accessors that the definition of this structure might require.  We might want to get all the families of a nationality.  This would involve simply adding a method to our Families class that looks like this:

public IEnumerable<Family> GetFamilesOfNationality(string nationality)
{
    return _nationalityFamilyMap.GetFamilies(nationality);
}

This is much simpler than what we would have had to write if we still had our collection only implementation from Part 3.  The function that would then have looked like:
 public IEnumerable<Dictionary<string, List<string>>> GetFamilesOfNationality(string nationality)
{
    Dictionary<string, Dictionary<string, List<string>>> lastNameMap;


    if (NameMap.TryGetValue(nationality, out lastNameMap))
    {
        return lastNameMap.Values;
    }
    return  new List<Dictionary<string, List<string>>>();
}

Underneath the hood, the code would look very similar but as far as readability goes, the version with the Family class is much easier to read.  The layering of objects also allows us to keep congruent sets of information to speed up retrieval when we want something other than a direct lookup.  We keep a running list of families at very little cost.  If we wanted to keep them sorted in some manner (add order or on some value) this would be easy as well.
Part of what makes this method of encapsulating values valuable is controlling access.  We limit the access points through our add and remove methods.  We therefore are able to add new backing collections that aid in our access easily.  Since we have limited access points, object management only needs to be added in those specific points.  The added complication in our object ultimately makes for a more elegant code that uses our management object.  It is also more maintainable as the encapsulation pinpoints responsibilities within objects.
We could keep examining the structure in detail and analyze all of its facets, but I think this is sufficient for now.  I would encourage you to try writing accessors to get values out of each structure and see which one you like using better.  When writing accessors for the class structure feel free to add new collections to improve retrieval time and see how the code easily falls into place. 
What I hope you have gotten out of the last series of posts is an understanding of the HashMap of HashMap of HashMap problem.  I hope you see that in code it is an unwieldy structure to use.  It is a way of circumventing object-oriented design.  I hope I have given you at least one path on how you can transform that structure into classes.  My hope is that in the future when you see nested collections you will question if that is really how you want the data to be contained.  It is more work up front to create the classes to contain the data but in the end, you will save yourself headaches if you do write specialized classes to manage complex relationships.  

3 comments: