Sunday, November 28, 2010

HashMap of a HashMap of a HashMap Problem -- Part 2 - Adding Complexity

In our previous part we started discussing how even basic collection usages require careful use of meta-data.  Our example was simple but now we are going to begin to add some complexity to it.  This complexity will begin to lead us down the path to the problem.
 Let us make some modifications to our example to add a little complexity.  In all likely-hood, a person will have more than one nickname.  We want to be able to retrieve all their nicknames based upon their name.  Therefore, we modify our property to be as follows:
/// <summary>
/// Gets a map of a user's name (the key) to their nicknames
/// (the value) so we can look up their nicknames quickly
/// </summary>
public Dictionary<string, List<string>> NameToNicknamesMap { get; }

This still is not horrible but it is more complex.  We now have a nested collection that causes us some grief in writing code to access the values.  This is not a big deal it just makes our code less clean.  We still can easily understand it if the property is well documented.
 Let’s step back for a second and make an observation of this structure.  Why would this structure exist?  It exists because we want to make some sort of connection between name and nicknames.  Doesn’t that sound an awful lot like what we should use a class for?  Let’s write a class that provides the same connection:
public class Person
{
    public string Name { get; }
    public List<string> Nicknames { get; }
}

We then would also change our property to:
public List<Person> People { get; }

There is one thing that we are missing in this implementation.  That is we’ve lost the ability to quickly lookup the nicknames based upon the name.  We will ignore this for right now and revisit it later on once we develop our structure further.
Our class has created as strong relationship between our name and nickname.  The relationship is explicit and there is no reliance on meta-data to understand the relationship.  Using the old structure if we wanted to also map name to another value, like birthday, we’d have to create another property like this:
/// <summary>
/// Gets a map of a user's name (the key) to their birthday
/// (the value) so we can look up their birthday quickly
/// </summary>
public Dictionary<string, DateTime> NameToBirthdayMap { get; }

In creating this property, we have to ensure that we provide the same detail in the meta-data and naming to keep straight what we are mapping.  If we want to create the same relationship using our class we just add a new property for the birthday and we have created the same relationship.  Our Person class now looks like this:
public class Person
{
    public string Name { get; }
    public List<string> Nicknames { get; }
    public DateTime Birthday { get; }
}

We now have strong links between the name, nicknames, and birthday.  This is simple object-oriented encapsulation and nothing revolutionary.  We also are using an example that leads us naturally to think of an object.  A person is an object type that is naturally created when we want to represent people in a system.  We know we want to encapsulate data into a person object early in design because we know that a person has many characteristics we want to put together.  We are using a person only as a concrete example.  The type of situation we are discussing here is one where originally, there is a single relationship and over time, similar relationships are added.  If our original relationship is done with a map then our future relationships will most likely be added with a map as well.  This is natural because programmers tend to follow the pattern of existing code.  Unless someone takes a step back at some point you easily could wind up with 5, 10, 20 maps mapping name to some other values.  The class containing these mappings feeds on itself because as the pattern becomes more pervasive there is less incentive to change.  This probably won’t happen with person data which can clearly be thought of in object terms.  It can easily happen with abstract concepts within a system that are hard to translate into a real world object because people probably aren’t thinking of the relationships between values as object defining relationships. 
Let’s take a step back and circle back to my previous point about circumventing the object orientedness of the language.  When you use collections to make connections between name and nicknames or name and birthday you are creating a weak relationship.  The relationship only exists as far as that Dictionary or HashMap exists.  You haven’t broken the typing system and syntactically you are still correct, but you aren’t following good object oriented design practices.  When you create the class that encapsulates the data you create strong relationships that are self evident to someone reading and using your code. 
Again, this is not the full problem, but another stepping-stone onto seeing the problem and the remedy for it.  It still can be manageable to handle data with only the collections if there is a small number of relationships.  It is also easy to refactor if we begin to recognize we should encapsulate these relationships together.  Next time we’ll start to see when writing code in this manner quickly becomes a problem as we add more levels.

Wednesday, November 24, 2010

HashMap of a HashMap of a HashMap Problem -- Part 1 - Fundamentals

Years ago, when I was working on my first project as a fresh developer out of school my lead developer and one of the other developers got into a shouting match about the use of HashMaps. The lead developer’s argument was basically “be sure you are using the appropriate data structure.”  The developer’s argument was “of course it is right, HashMaps are fast, and I want fast access.”  At the time, I thought the lead developer was making a mountain out of a molehill.  It seemed obvious that the HashMap was appropriate in this situation.  There was a reason for his reaction, which I only learned later when I became lead developer of the same project.  During the course of the argument, he said something to the effect “I’ve seen HashMaps abused before, I just want you to be sure you are using them because you should, not because they are convenient.”  If he would have given context to his argument he would have helped his position greatly.  After some experience I’ve come to share his opinion.  However,  it was only after I saw the code he was referring to when he said he’s seen them abused.  When I saw this code, I immediately understood his angst and it is something that has shaped the way I develop today.  I am a better developer today because I understand the problem and avoid it.  What is this problem?  It is what I call the HashMap of a HashMap of a HashMap problem.

I use the HashMap class to express this problem because I first encountered working on a Java project.  It applies equally to any language and any collection types Dictionary, List, SortedSet, etc.   I will use HashMap and Dictionary interchangeably within this series because they are essentially the same data structure. The crux of the problem is that collections can be used as a way to circumvent the object orientedness of languages like C# or Java.  I say circumvent because they do not break the typing system, but they can be used to create pseudo classes that someone has to know meta-data about in order to use. 
In this first part I’m going to talk about the nature of using collections and how even the simple use of collections requires us to be careful.  It is not an argument against using collections but the groundwork that will help illuminate the issue we are dealing with.

Let me start by saying I hate seeing a return type of a Dictionary<TKey,TElement>, especially if it propagated up through many methods.  It’s not that this is bad code it is because of my experience has shown me that the information provided with the method is often not as robust as I’d like. I hate seeing  it because it often means I have to trace the method calls down to the source to understand what values are in the collection.  Let’s examine what I mean at a very basic level.  Let’s say I have a property in C# defined like this:


public Dictionary<string, string> NameMap { get; }

Can you tell me what that property contains?  Of course you cannot.  You can make some guesses that one of the strings is a name of some sort and another might be a name or it might be something else completely.  What the person in the code has done is created an implicit pair type and put it into a collection for quick access.  With a Dictionary<string, string> data structure you are reliant on the meta-data to understand what it contains.  Hopefully, the writer commented what the key and value were well, otherwise you’re going to have to use some archaeology to understand what the collection contains.  This structure  is not bad as long as we write the appropriate documentation and use good names for our properties and methods.  In this situation good commenting and a little rename can go a long way.  If I change the NameMap to be NameToNicknameMap its purpose becomes much clearer and I can guess the key is the name and the value is a nickname.  Adding comments can make this clear still.  Let’s change the comment as well and we get the following code:

/// <summary>
/// Gets a map of a person's name (the key) to their nickname
/// (the value) so we can look up their nickname quickly
/// </summary>
public Dictionary<string, string> NameToNicknameMap { get; }

We now can clearly understand that we are mapping from a name string to a nickname string. Therefore, our first lesson from this is good naming and good commenting can solve simple collection implicit typing problems.  In the real world, I often find that even this level of detail is lacking.  Developers often have an attitude that it is obvious what a collection contains, even though we’ll forget ourselves by the time we return to the code at some future point.  It is important even in this simple case that we make it clear what the values we are handling are.  Naming and comments go a long time to improve the maintainability of the code. 

The important take away from this part of our series is that even simple collections require clarifying documentation.  Without clarifying documentation they can be confusing, cause developers to guess at their meaning, or dig through other code to understand them.

In the next part we will talk about adding complexity to the information we want available in our data structure and the implications on clarity that occur when we add complexity.

Tuesday, November 16, 2010

WPF Themes - Ensuring Hot Keys Display properly

In going through the themes in the WPF Themes project on CodePlex I've encountered many inconsistencies.  Sometimes these inconsistencies are annoying and other times they require much more complicated fixes.  All these issues are instructive because they teach us things to be aware of when building our own themes or even our own templates.


I've written about some of the more complex issues I've encountered in the past.  Today I'm just going to talk about one of the more simple problems.  Some themes in the WPF Themes do not properly display the access keys.  The underscore used to indicate the key was being displayed instead of removed.  This is a simple problem to fix.  ContentPresenters have a property RecognizesAccessKey.  If this property is not set, it defaults to false and the access keys will not work.  Writing RecognizesAccessKey="True" will make the access keys work properly.


I would have preferred the default for this property to true.  In the general case I want the access keys to be active.  An even better solution would be if there was a global way to specify the default for it.  Using access keys is usually what you want to be application wide and it'd eliminate little presentation bugs to if this was possible. Its easy to forget to set this property on a ContentPresenter which can manifest in hard to find locations.  Its an easy problem to fix, but annoying when you discover it.

Tuesday, November 9, 2010

Neat Way to Write a Read-Only Extendable List

The other day one of my coworkers wanted to to do the following:

  • Have a base class that has a property that returns a list
  • The list should be immutable
  • The list will contain a list of items to exclude elsewhere in the code
  • Sub-classes should be able to add items to the list but not remove them

One way to implement this would be something like the following:


public class Base
{
    private  IList<string> _excludeList = (new List<string>(){ "a", "b", "c" }).AsReadOnly();
    public virtual IList<string> ExcludeList
    {
        get { return _excludeList; }
    }
}


public class Subclass : Base
{


    public override IList<string> ExcludeList
    {
        get
        {
            List<string> excludeList = new List<string>(base.ExcludeList);
            excludeList.Add("d");
            return excludeList.AsReadOnly();
        }
    }
}


My inclination was that this might be inefficient because of the  repeated creation of the list.  Let's examine a couple pieces of this code and see if we can clean it up.  The first is we use a list because the framing of the question lead us there.  We just want to be able to iterate over the list, how we accomplish that is unimportant, therefore List is not the type we want.  There really is no reason that this needs to be something other than an IEnumerable.  If we return an IEnumerable rather than a list we can use the yield keyword to do some fancy rewriting.  We also are no longer creating new collections every time we call the subclass version:



public class Base
{
     private IList<string> _excludeList = (new List<string>() { "a", "b", "c" }).AsReadOnly();
     public virtual IEnumerable<string> ExcludeList
     {
         get { return _excludeList; }
     }
}


public class Subclass : Base
{


    public override IEnumerable<string> ExcludeList
    {
        get
        {
            foreach (string s in base.ExcludeList)
            {
                yield return s;
            }


            yield return "d";
        }
    }
}


We now are avoiding creation of new lists and the way it works is fairly clean.  The unfortunate part of this is that its not very efficient in the large case.  In talking about this I originally thought this would be a more efficient way of doing meeting our requirements because there would be fewer lists being created.  My quick performance testing showed quite the opposite.  In the small cases the difference in performance is negligible; I could not get a consistent result of one being quicker than the other.  In the case where I make the initial list "a" - "z" and the added items "aa" - "mm" the performance difference was significant.  The rewritten version took about twice as long on average.  I did not test a small initial list and a long additional list nor did I test a long initial list and a short additional list.  The test case I ran proved to me that my theory about it being more efficient was wrong so I saw no reason to continue.


Since the performance of this method turns out to be poor this remains nothing more than an interesting way to use the yield operator.



Friday, November 5, 2010

The Art of Commenting Code

When I first started programming I was horrible with comments.  I honestly didn't write them because I didn't see the point.  The reason I did not do it was because I was never forced to maintain any of my code.  In school we often wrote little programs that were a couple hundred lines long that was discarded after each assignment.  Despite my teacher's best efforts it never struck me as to why comments are necessary.  I would comment my code when forced to by my teacher.  This lead to inane commenting like the following:

//Assign 5 to foo
int foo = 5;

If anything my commenting here has hurt the readability of the code because it clutters it with needless information.  If you know the language, you know what an assignment statement looks like, making the comment unnecessary.  It was only once I started working on an J2EE application with approximately 200,000 lines of code that I understood why you comment.  After much thinking about it I've come to the conclusion that you comment to justify the code.  What I mean is if someone reading your code was to ask "Why does this code exist?" the comment should provide the answer.  My comments tend to be long winded compared to most other developers I've encountered, because of this approach.  I often make statements about the origin or the history of a piece of code to give context.  I also believe in making the comment conversational or tell a story.  This makes it both easier to read, more fun to read, and simplifies the communication.  I'll write something like the following:

//The item has to be added to the collection before the call to the next function
//because the control throws an exception if the item is not in the collection
//yet.  It'd be better if we could do it later on but we are limited by the existing
//structure.  If we refactor the control to use WPF we should revisit this.

This of course is completely contrived and doesn't relate to any real code.  I am trying to demonstrate what I believe is a good commenting style. I am doing the following things:
  • I justify the positioning of the code and why it exists there.  
    • This lets a future developer (or which might be me) know why the code is set-up as it is and that the order has a logic to it and what that is
    • It also prevents a developer who is refactoring the code from unwittingly creating an error.  I hate when I move something that appears to have no effect on the logic but then a strange edge case appears that causes an error.  Even worse is when you talk to the original developer and they knew about the error potential but did not document it
  • I also acknowledge any short comings.  
    • This is partly me protecting myself from future developers being to angry at me.  I cannot count the number of times I've read someone's code and just questioned their sanity for coding something in a certain manner.  There might have been logical justification for doing it that way and if I understood that I probably would accept it.  I still may dislike the code but at least I now have context
    • This also provides information to someone who want to improve the code in the future.  At some point in the future someone will be changing that code.  There may be a way to perform the operation in a better manner.  If that person may be less experienced with the code base or less experienced in general they may just assume that this is a good piece of code.
    • It raises a do not copy and paste this flag for people.  Nothing is worse than seeing a poor piece of code copy and pasted.  Inexperienced developers make this mistake often.  They know a piece of code works but they don't understand it.  They therefore copy, paste it, and never examine it closely.  A comment that acknowledges short comings encourages a deeper level of thinking.
  • Provide an upgrade path
    • If there is some point in the future that the code be changed to remove the short comings and I know what that point is I mention it.  I don't necessarily think that those short comings will be removed right away when that upgrade point change occurs but it provides a reference point to make the change.
Its also important to remember that comments are how future developers will be able to understand what the code does and why it does it.  Reading comments is one way you can perform archeology on code to understand it.  You therefore should leave behind artifacts for the future archeologists.  Code  doesn't exist in a vacuum.  We use many other tools like requirements and defect tracking systems to help us manage development.  We stage implementation in to releases.  We have meetings to discuss functionality.  We write using architectures and designs.  All of these are important to the code, but rarely are they referenced in a meaningful way.  Providing well placed linage to the outside materials can provide a wealth of information with the minimal actual writing.


It is impossible to link every piece of code to meaningful external references, but sometimes providing that link is vital.  The easiest link to provide is to requirements management systems and defect tracking systems that give unique identifiers that can be easily referenced.  If a change for a requirement or a defect is fairly esoteric and isolated to a small code section  I will often include the ID of the requirement or the ID in the comments for the code.  Sure, a tool like subversion can be used to gain the same information, but that requires an additional step that most people will not undertake.  It also gives me traceability back to the defect tracking system, which occasionally comes in handy on its own.  Defects and requirements are not the only links that can be put in comments.  If you keep meeting minutes or a record of conversations you can provide a link by putting in the date of the conversation that was the basis for the change.  Architecture and design documents can be referenced in similar manners.  Instead of trying to reexplain in detail why something works the way it does, reference the existing documentation.  If a change significantly changes behavior starting at a certain version number, it doesn't hurt to mention that with that version number the functionality has changed.


You can go overboard with this to the point that it becomes annoying.  Just like the first example of commenting an assignment statement, you don't want to make it so the comments lose meaning.  For example I don't link to a defect number if the defect occurs because of a null reference that is avoided by a simple null check.  Generally, I add the link and level of detail if

  •  The change required significant archeology on my part
  •  If a fix for a defect was non-intuitive
  •  It seems likely that similar changes will need to be made in the future
  • The way something was functioned was specifically prescribed by a meeting or a document
These are not hard and fast rules to follow, but more guidelines.  I do not use the previous list as a checklist.  In fact I had to reverse engineer when I actually do add links to outside information.  It all comes down to a feel for the code and when its useful.  You do not want the linkage to become pedantic, but want to provide others and yourself tools for the future.  After all, you are as likely to be helping yourself as you are to helping others. Numerous times I have had to return to a piece of code I wrote and had to perform archeology on it because I couldn't remember writing it.  We always feel we will remember the code we write.  For the most part that just can't be true because of the sheer volume of code we write.  It will save you time an energy in the long run to write good comments.  Nobody really debates the merits of comments, so I won't continue down that line of thought. I do think the quality of comments is extremely important.  I hope I've given you some thoughts on how to write good comments.



Monday, November 1, 2010

WPF Gird Causing Memory Leak

There is a fundamental problem in WPF that can cause a memory leak in applications using it.  The problem is that a ColumnCollection in a Grid can become pinned in memory. It is not in all cases but in a specific case in the application I work on.  I am not 100% about what in the structure caused it but there are 3 separate potential contributing culprits that I've identified.  Without extensive testing of the different scenarios I won't be able to identify exactly what the cause was.  The three potential contributing culprits are:

1.     User control inside a DataTemplate with a Grid.  It does not matter how deep in the control the grid was in the control all Grids were getting caught
2.     Binding column widths from a parent grid to an inner grid through a border control.  This was done to emulate a grid like behavior but allow for multiple control types within the grid
3.     Binding and unbinding the collection view source every time the source of the observable collection changed. 
In trying to fix this problem it seemed as if it was related to the DataTemplate but it is unclear exactly what caused the issue. 
The actual leak came from the pinning of the Grid.ColumnCollection.  The pinning of that caused all the associated control to be pinned as well.  The Grid.RowCollection did not seem to have the same issue.  This was still an issue even if there were no columns defined.  It also occurred if a Grid was used inside a control from the same assembly.  If the Grid was anywhere inside a user defined control in the same assembly, at any level, the issue would occur, even if it was in another user control.  Crossing the assembly boundary appears to make this not occur. 
To fix the issue I replaced the Grids in the affected controls.  The grids were replaced with a combination of StackPanels and DockPanels which can be used to replicate the grid.  In this case they were actually more appropriate than using a Grid anyway. 
The lesson from this appears to be that the use of a Grid over other LayoutPanel types should be carefully considered.  If memory leaks start to appear with the Grid it needs to be changed to another Layout type.