Steve Michelotti: Generics, Reference Types, Value Types, and Interview Questions

Generics are nothing new and have been part of .NET for over 5 years. Reference types and value types are core concepts in the CLR type system and these concepts have been the same since .NET was released – they are also nothing new. However, when I ask about these concepts during interviews, I often get a wide range in quality of the answers to these questions. If you get asked about these topics in an interview, be prepared to give great answers! You don't have to give a textbook perfect memorized definition – but make sure you show that you fully understand and can apply the concepts to real-world development.

Question: why are generics such a big deal? I typically get answers discussing greater type safety and better performance. OK, the type safety one is pretty easy and straightforward. We had the ArrayList in .NET 1:

ArrayList list = new ArrayList();
list.Add(1);
list.Add(2);
list.Add("hello"); // compiles fine: the compiler can't catch this mistake

When generics were introduced, we could declare a strongly-typed list of ints to avoid this:

List<int> list = new List<int>();
list.Add(1);
list.Add(2);
list.Add("hello"); // compiler error!

So drilling into the second answer a little: why is performance better with generics? Often I hear, "you can avoid boxing and unboxing." OK great! So what specifically is boxing and unboxing? (This is where people often start to struggle.) Boxing is the act of converting a value type to a reference type. If we look at line #2 of the first code sample above, we're putting an Int32 (a value type) into an ArrayList, which stores everything as System.Object (a reference type), so the runtime has to wrap the int in a new heap object. Unboxing is the reverse: converting that reference type back to a value type (e.g., taking an item out of the ArrayList and having to cast: int num = (int)list[0];). So with boxing/unboxing, don't focus on casting or sub-classes (they're not directly relevant!); focus on converting between reference types and value types.
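To make the boxing/unboxing round trip concrete, here is a minimal sketch. The class and method names (BoxingDemo, SumBoxed, SumGeneric) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Sums an ArrayList: every int was boxed on Add and must be unboxed here.
    public static int SumBoxed(ArrayList items)
    {
        int total = 0;
        foreach (object item in items)
        {
            total += (int)item; // unboxing: copies the int out of its heap box
        }
        return total;
    }

    // Sums a List<int>: the ints are stored directly, so no boxing and no cast.
    public static int SumGeneric(List<int> items)
    {
        int total = 0;
        foreach (int item in items)
        {
            total += item;
        }
        return total;
    }

    public static void Main()
    {
        int number = 42;
        object boxed = number;     // boxing: a new heap object holding a copy of 42
        number = 100;              // changing the original does not touch the box
        int unboxed = (int)boxed;  // unboxing: copies the value back out
        Console.WriteLine(unboxed); // prints 42

        ArrayList oldList = new ArrayList();
        oldList.Add(1); // boxed
        oldList.Add(2); // boxed
        List<int> newList = new List<int> { 1, 2 }; // stored as plain ints

        Console.WriteLine(SumBoxed(oldList) == SumGeneric(newList)); // prints True
    }
}
```

Notice that the box holds a copy of the value, which is why changing number afterwards has no effect on it.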

Question: what's the difference between a reference type and a value type? Just focus on the basics: instances of reference types live on the managed heap (which means they are garbage collected), while a value type is stored inline wherever its variable lives: on the stack for a local variable, or inside the containing object on the heap when it is a field of a class or an element of an array. Reference types can be null; value types cannot be null (unless wrapped in Nullable<T>, e.g., int?). A reference type variable does not contain the value itself; it holds a reference (a pointer) to it. A value type variable contains the value itself.
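The practical consequence shows up on assignment. In this small sketch, PointStruct, PointClass, and CopyDemo are hypothetical names used only for the example:

```csharp
using System;

public struct PointStruct { public int X; } // value type
public class PointClass { public int X; }   // reference type

public static class CopyDemo
{
    // Assigning a struct copies the value, so the original is unaffected.
    public static int CopyStructThenMutate()
    {
        PointStruct a = new PointStruct();
        a.X = 1;
        PointStruct b = a; // b is an independent copy
        b.X = 2;
        return a.X;
    }

    // Assigning a class instance copies the reference, so both names
    // point at the same object on the heap.
    public static int CopyClassThenMutate()
    {
        PointClass a = new PointClass();
        a.X = 1;
        PointClass b = a; // b refers to the same object as a
        b.X = 2;
        return a.X;
    }

    public static void Main()
    {
        Console.WriteLine(CopyStructThenMutate()); // prints 1
        Console.WriteLine(CopyClassThenMutate());  // prints 2
    }
}
```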

Question: how do you know if a type is a reference type or a value type? Is a DateTime a reference type or a value type? A String? If you're creating your own Person data structure, how can you control whether it's a value type or a reference type? The short answer: a "class" is a reference type and a "struct" is a value type. If you "View Definition" in Visual Studio on a DateTime, for example, you'll see:

[Screenshot: the definition of DateTime, declared as a struct]

DateTime is a struct so it's a value type. System.String is a class so it's a reference type (plus, the fact that it can be null is also a tip-off that it's a reference type). So if you create your own data structure as a class, it will be a reference type; as a struct, it will be a value type.
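If you'd rather ask the runtime than look at the definition, Type.IsValueType gives the same answer. A quick sketch, where PersonStruct, PersonClass, and TypeKindDemo are hypothetical names for the example:

```csharp
using System;

public struct PersonStruct { public string Name; } // declared as a struct
public class PersonClass { public string Name; }   // declared as a class

public static class TypeKindDemo
{
    // Reports whether the given type is a value type or a reference type.
    public static string Describe(Type type)
    {
        return type.IsValueType ? "value type" : "reference type";
    }

    public static void Main()
    {
        Console.WriteLine("DateTime: " + Describe(typeof(DateTime)));         // value type
        Console.WriteLine("String: " + Describe(typeof(string)));             // reference type
        Console.WriteLine("Int32: " + Describe(typeof(int)));                 // value type
        Console.WriteLine("PersonStruct: " + Describe(typeof(PersonStruct))); // value type
        Console.WriteLine("PersonClass: " + Describe(typeof(PersonClass)));   // reference type
    }
}
```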

For further reading on reference types, value types, and boxing/unboxing, have a look at this article by Jeffrey Richter written in December 2000. These are core concepts in the .NET type system that are still just as relevant today as they were 10 years ago.

Let's circle back to our original generics performance question. So far we have 3 assertions:

Generics result in better performance because you can avoid boxing/unboxing
Boxing/unboxing is the act of converting between reference types and value types
A "struct" is a value type (e.g., DateTime); a "class" is a reference type (e.g., String)

Question: given all three of these assertions, is performance really better with a generic List<string> (string being a reference type) versus a non-generic list of strings? Answer: NO! Generics still provide plenty of benefit in these situations in terms of type safety and reducing noise in our code by avoiding casts (or code-gen'd collections if we want strongly-typed objects), but a performance benefit is not on the list. However, if we're talking about List<int> or List<DateTime> (value types), then there is a significant performance improvement. Not only is run-time performance significantly better because we avoid the expensive boxing/unboxing operations, but we also avoid making the GC do extra work collecting boxed objects that were just heap-based wrappers around what were originally value types.
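If you want to see the difference for yourself, a rough sketch like the one below compares an ArrayList of ints against a List<int>. It is not a rigorous benchmark: the element count is arbitrary, the class and method names are hypothetical, and the timings and gen-0 collection counts will vary by machine and runtime, so no expected numbers are shown.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public static class BoxingCostDemo
{
    // Fills an ArrayList with ints (boxing each one), then sums them (unboxing each one).
    public static long FillAndSumBoxed(int count)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < count; i++)
        {
            list.Add(i); // allocates a box on the heap for every int
        }
        long total = 0;
        foreach (object item in list)
        {
            total += (int)item;
        }
        return total;
    }

    // Same work with List<int>: no boxes are allocated.
    public static long FillAndSumGeneric(int count)
    {
        List<int> list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
        }
        long total = 0;
        foreach (int item in list)
        {
            total += item;
        }
        return total;
    }

    public static void Main()
    {
        const int count = 5000000; // arbitrary size chosen for illustration

        int gen0Before = GC.CollectionCount(0);
        Stopwatch watch = Stopwatch.StartNew();
        FillAndSumBoxed(count);
        watch.Stop();
        Console.WriteLine("ArrayList: {0} ms, {1} gen-0 collections",
            watch.ElapsedMilliseconds, GC.CollectionCount(0) - gen0Before);

        gen0Before = GC.CollectionCount(0);
        watch = Stopwatch.StartNew();
        FillAndSumGeneric(count);
        watch.Stop();
        Console.WriteLine("List<int>: {0} ms, {1} gen-0 collections",
            watch.ElapsedMilliseconds, GC.CollectionCount(0) - gen0Before);
    }
}
```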

In fact, let's say you use 5 different generic lists in your application: List<int>, List<DateTime>, List<string>, List<Foo>, List<Bar> (where Foo and Bar are both reference types). What happens behind the scenes is that the JIT will actually produce 3 versions of the generic list code. For each value type it will generate a fully specialized version (so we'll have 1 for List<int> and 1 for List<DateTime>). Then it will generate a single version whose code gets re-used for all reference types behind the scenes (so it will be used for List<string>, List<Foo>, List<Bar>). You still get the type-safety features, along with avoiding all of the ugly casting in your code.

New .NET technologies come and go and as professional developers we are constantly working to learn and stay up-to-date on these new technologies. However, we also have to make sure we stay grounded in the fundamentals that .NET is based on. If you ever end up in an interview with me, I trust that you'll ace these questions. :) And, by the way, my company is hiring so if you're interested, please contact me!