Array, Collection, IEnumerable<T> Usage Guidelines

Please read this first: About Dave’s ‘unofficial’ Framework Design Guidelines.

 

þ DO treat a null reference (Nothing in Visual Basic and nullptr in C++/CLI) and an empty collection or IEnumerable<T> as equivalent.

For example, the following constructor treats both a null an empty IEnumerable<string> instance similar; they both result in an empty StringCollection.

public class StringCollection : Collection<string>
{
    public StringCollection(IEnumerable<string> items)
    {
        if (items != null) // Correct
        {
            foreach (string item in items)
            {
                Add(item);
            }
        }
    }
}

ý DO NOT return a null reference (Nothing in Visual Basic and nullptr in C++/CLI) from an array, collection or IEnumerable<T> property or method; return an empty instance instead.

Similar to string properties, consumers of your class expect to be to enumerate over the returned enumerable instance without also needing to check for null.

For example, in the following code, performing a foreach over the returned Collection<T> from Directory.GetFiles should never cause a NullReferenceException to be thrown.

void PrintDirectory(string path)
{
    Collection<string> fileNames = Directory.GetFiles(path);

    foreach (string fileName in fileNames)
    {
        Console.WriteLine(fileName);
    }
}

þ DO throw ArgumentException if an element in collection or IEnumerable<T> argument contains a null reference (Nothing in Visual Basic and nullptr in C++/CLI) and null is not a valid value for an element. Do not filter or skip over these values.

For example, the following method correctly throws ArgumentException when the passed types IEnumerable<Type> instance contains a null element.

public void RegisterTypes(params Type[] types)
{
    if (types == null)
        throw new ArgumentNullException("types");

    if (types.Length == 0)
        throw new ArgumentException("'types' cannot be empty.");

    foreach (string item in items)
    {
        if (item == null)
            throw new ArgumentException("'items' cannot contain a null element", "items"); // Correct

        [...]
    }
}

Note that ArgumentException is thrown above for the null element instead of ArgumentNullException. Throwing the later would be misleading and confusing to the user; the problem isn't that items is null, but rather one of its elements.

 

þ CONSIDER providing an params array based overload in addition to an IEnumerable<T> based overload so that user can pass elements without having to explicitly create a temporary array.

For example, the following class has two constructors, one taking an array, and one taking an IEnumerable<T>.

public class TypeCollection : Collection<string>
{
    public TypeCollection(params Type[] types)      // Correct 
        : this((IEnumerable<Type>)types)
    {
    }

    public TypeCollection(IEnumerable<Type> types)
    {
        [...]
    }
}

þ CONSIDER providing an IEnumerable<T> overload in addition to an array based overload. This allows users to pass arguments typed as other enumerable sources other than arrays, such as List<T>.

Instances typed as IEnumerable<T> are common when calling LINQ-based extension methods such as Enumerable.Select and Enumerable.Where. Adding an overload that takes IEnumerable<T> saves the user from having to manually convert an instance returned from one of these methods to an array just to pass it to the method or constructor.

 

ý AVOID looping over IEnumerable<T> arguments multiple times.  IEnumerable<T> instances returned from LINQ-based and yield generated methods are not cached underneath, which could cause work done in the enumerator to be repeated multiple times if an argument is iterated more than once. This could lead to unexpected behavior and performance problems.

For example, the following shows the pitfalls of looping over IEnumerable<T> instance multiple times.

public void DeleteFiles(IEnumerable<string> fileNames)
{
    if (fileNames == null)
        throw new ArgumentNullException("fileNames");

    if (fileNames.Count() == 0)             // 1st Loop
        throw new ArgumentException("'fileNames' cannot be empty.");

    foreach (string fileName in fileName)   // 2nd Loop
    {
        [...]
    }
}

This could cause a performance problem given the method as follows:

void DeleteDocumentsWithBackups(string path)
{
    string[] fileNames = Directory.GetFiles(path, ".doc");
// Get every document that already has a backup IEnumerable<string> filesToDelete = fileNames.Select(fileName => File.Exists(fileName + ".bak")); DeleteFiles(filesToDelete); }

The lambda expression within the Enumerable.Select extension method will be executed twice for every file name found in the directory; once for Count and once for the foreach statement.

We hit a similar problem while developing the Managed Extensibility Framework. There was a particular method that returned an IEnumerable<T> (generated using the C# yield keyword), that underneath walked an assembly looking for types marked with a specific attribute. Trying to diagnose a performance problem, one of the developers on our team found that most of our time was spent in Reflection and it was quickly narrowed down to a particular method that was walking over the result from this method multiple times. Suffice to say, both methods were changed; the first to cut the loops down to one and the second to cache the results of the search underneath.

 

þ DO make a copy of an array or IEnumerable<T> argument if outside modification is not expected. As arrays and IEnumerable<T> instances can be modified after they have been passed to a method, make a clone of the argument if storing it and you want to prevent outside manipulation.

The following example uses the LINQ method, Enumerable.ToArray, to make a copy of the types argument.

public class TypeCollection : ICollection<Type>
{
    private readonly Type[] _types;

    public TypeCollection(params Type[] types)
        : this((IEnumerable<Type>)types)
    {
    }

    public TypeCollection(IEnumerable<Type> types)
    {
        if (types != null)
        {
            // Make a copy
            _types = types.ToArray();       // Correct
        }
    }
}

Note, even though the types argument might be an array (such as when the first constructor overload is called), Enumerable.ToArray will still make a copy of it and will not return the original array.

 

Related Framework Design Guidelines Section: 8.3 Collections

Published Saturday, December 13, 2008 7:00 AM by David Kean

Comments

Saturday, December 13, 2008 8:43 AM by Steven

# re: Array, Collection, IEnumerable<T> Usage Guidelines

Great guidelines Dave! I wholeheartedly agree with you on this.

I think the "Do treat a null reference and an empty collection as equivalent" guideline is an interesting one. I developed a precondition framework (www.codeplex.com/conditions) and it excepts statements as "items.Requires().IsNotEmpty().DoesNotContain(null);". The framework throws an ArgumentException or ArgumentNullException when the given collection is empty (or null). I decided to treat null reference and empty collections equal. However, I found out (after doing a little 'usability study' among other developers) that everybody had a different opinion about this. Some developers though that the statement "items.Requires().IsEmpty()" should succeed when 'items' was null, because they reasoned that null was different from an empty collection. I'm very interested to your opinion in this.

Saturday, December 13, 2008 8:44 AM by Jeff Moser

# re: Array, Collection, IEnumerable<T> Usage Guidelines

Good advice! Keep them coming.

I've enjoyed FDG for the past few years and recently read the second edition. I think it'd be interesting to have you or a group of people at Microsoft maintain a list of "beta" guidelines that people could vote on. At the very least, it'd be a single archive. If it had annotations, it'd be even better.

The list could be forward looking (e.g. "DO use code contracts" for .NET 4.0) as well as additional guidelines like the ones presented here.

Keep up the good work!

Saturday, December 13, 2008 11:51 AM by Oran

# re: Array, Collection, IEnumerable<T> Usage Guidelines

I really wish more APIs took IEnumerable<T>.  I agree with all of these except for "DO throw ArgumentException if an element in collection or IEnumerable<T> argument contains a null reference (Nothing in Visual Basic and nullptr in C++/CLI) and null is not a valid value for an element. Do not filter or skip over these values."

The best example of why it's useful to filter and skip over null values is the constructor for XElement.

The documentation topic "Valid Content of XElement and XDocument Objects" msdn.microsoft.com/.../bb943882.aspx says:

"If content is null, nothing is added. When passing a collection items in the collection can be null. A null item in the collection has no effect on the tree."

So why is this hugely useful?  As it says at the beginning of the topic:

"Queries often evaluate to IEnumerable<(Of <(T>)>) of XElement or IEnumerable<(Of <(T>)>) of XAttribute. You can pass collections of XElement or XAttribute objects to the XElement constructor. Therefore, it is convenient to pass the results of a query as content into methods and constructors that you use to populate XML trees."

This lets you use functional construction without having to worry and check whether the results of queries are null or not, and lets you nest the functional construction as arbitrarily deep as you want.  I really like the declarative programming style this enables.

Friday, December 19, 2008 1:04 PM by David Nelson

# re: Array, Collection, IEnumerable<T> Usage Guidelines

@Oran,

I don't see why its a good thing that nulls are ignored in XElement. Even if you are passing a collection that is the result of another query, there is no reason for that query to contain nulls. Under what circumstances would you reasonably end up with a collection of items with a null in the middle? Usually this indicates a bug; something was supposed to be there but was not included correctly. That's why it is a good idea for APIs to throw instead of ignoring nulls: it helps you find the bugs in your code.

Sunday, December 21, 2008 2:47 PM by #.think.in

# #.think.in infoDose #12 (15th Dec - 19th Dec)

#.think.in infoDose #12 (15th Dec - 19th Dec)

Sunday, December 21, 2008 4:56 PM by Code Monkey Labs

# Weekly Web Nuggets #43

Pick of the Week: Hardware is Cheap, Programmers are Expensive General Array, Collection, & IEnumerable<T> Usage Guidelines : Dave Kean has some good guidelines for using the different collection classes. The Azure Service Bus : Good content

Thursday, December 25, 2008 12:48 AM by Oran

# re: Array, Collection, IEnumerable<T> Usage Guidelines

@David Nelson,

One example that comes to mind is when using FirstOrDefault() on a query result that may sometimes return no elements.  This will return null for any reference type.

Remember that the XElement constructor signature takes a params object[], enabling you to write:

new XElement("name", q1.FirstOrDefault(), q2, q3.FirstOrDefault(), q4, q5)

Everything after "name" gets turned into an object[], but the contents of that array are not the typical "well-formed" collection you describe.

Now if you imagine developers using this same style to pre-package ad-hoc arrays of elements and passing those as parameters into XElement, you can hopefully see why it's useful to filter nulls.

Functional construction isn't a traditional C# programming style, but I find it quite useful and attractive.

Friday, December 26, 2008 11:15 AM by David Nelson

# re: Array, Collection, IEnumerable<T> Usage Guidelines

I agree that functional construction can be very useful, but functional does not necessarily equate to skipping nulls.

Needing to skip nulls in your example is really a result of having to use the params pattern in the first place, which isn't really the best way to do this. Imagine instead if the XElement constructor took an IEnumerable instead of a params object[]. Then you could build up your list of elements using Enumerable extension methods, including Take(1) instead of FirstOrDefault. The resulting element list would more accurately describe what you are trying to accomplish (in your example it looks like there are 5 elements, when in fact there could be anywhere from 3 to 5); and XElement wouldn't have to do any special processing for nulls.

Friday, December 26, 2008 6:53 PM by Oran

# re: Array, Collection, IEnumerable<T> Usage Guidelines

Hmm... sounds like you're arguing backward from your foregone conclusion that nulls ought not to exist in IEnumerables, even if that means writing unnecessary imperative shim code every time you work with XElements.

I don't want to new up some IEnumerable every time I want to throw more than one node into an XElement.

As far as accurately describing what I'm trying to accomplish, I would argue that XElement's declarative style does a better job of clearly showing intent than cluttering the picture with imperative shim code.

I know which kind of code I'd rather write, and the designers of XElement felt the same way.  But different strokes for different folks.

Minor nit-picks:

Take(1) is equivalent to First().

In my example there could really be anywhere from 0 to infinity elements.  Not that there's anything wrong with that. ;-)

The XElement constructor DOES take an IEnumerable.  As well as other things.  You are free to clutter up your code with as many unnecessary shim objects as you want.

I don't see how switching from a params object[] to an IEnumerable changes the nature of the null problem.  Using params object[] doesn't cause the presence or absence of nulls any more than using an IEnumerable.

Monday, January 05, 2009 3:38 PM by David Nelson

# re: Array, Collection, IEnumerable<T> Usage Guidelines

Take(1) is NOT equivalent to First(). The former returns an empty sequence when called on an empty sequence, while the latter throws an exception. That is why I explicitly used Take(1) in my comment. The point is to be able to (declaratively) compose a sequence which contains no null elements without adding imperative "shim code" everywhere.

"I don't want to new up some IEnumerable every time I want to throw more than one node into an XElement."

You are already newing an array every time, you just aren't doing it explicitly because C# does it for you. There is no reason why something similar couldn't be done with IEnumerable, and in fact the idea is being considered by the C# team, although there is no guarantee if or when they will do it.

The point is that using empty sequences rather than null elements allows null elements to be treated as bugs, which makes tracking bugs much easier. Therefore a solution based on IEnumerable and Take() is logically preferable to a parameter array solution; the only reason the parameter array is preferable right now is because of the lack of language support for the IEnumerable solution.

Saturday, January 24, 2009 6:40 PM by Oran

# re: Array, Collection, IEnumerable<T> Usage Guidelines

Whoops, you're right, Take(1) can return an empty sequence.

Another scenario that motivates accepting nulls is when you are using variables that can sometimes be null.  Consider a modification of my previous example:

new XElement("name", q1, e1, q2.FirstOrDefault(), e2, q3)

Imagine that e1 and e2 are each of type XElement and they may sometimes be null.  q1, q2, and q3 are queries.  How do you propose rewriting this so that XElement's constructor doesn't receive any nulls?  The solution to this is guaranteed to contain unnecessary shim code.  Even with a params IEnumerable feature in a future version of C#, I believe XElement's null-filtering approach is still preferable in this case.

Note that I don't take issue with any of Dave's other guidelines such as whether methods or properties should return nulls in IEnumerables.  This is because I believe Postel's Law applies here.  "Be conservative in what you do; be liberal in what you accept from others."

Still, I acknolwedge there may be cases where null elements should be treated as bugs.  If null elements truly are bugs, you can write:

if (list.Contains(null))

throw new ArgumentException("'list' cannot contain a null element", "list");

If you instead want to filter out nulls, you can easily replace that with:

list = list.Where(i => i != null);

In the first "nulls are bugs" case, every call site is forced to write extra code to filter nulls.  In the second case, no extra code is required and filtering occurs in exactly one spot.

Sunday, February 22, 2009 7:33 PM by Code Monkey Labs

# Weekly Web Nuggets #43

Pick of the Week: Hardware is Cheap, Programmers are Expensive General Array, Collection, & IEnumerable<T> Usage Guidelines : Dave Kean has some good guidelines for using the different collection classes. The Azure Service Bus : Good content

Tuesday, April 28, 2009 12:16 PM by David Kean

# re: Array, Collection, IEnumerable<T> Usage Guidelines

or use

return Enumerable.Empty<T>();