
Tuesday, May 19, 2009

Garbage Collector Internal Mechanism

In .NET programming, developers rarely manage garbage collection themselves; they rely on the collector built into the runtime. The most common direct interaction with it is an explicit call to the GC.Collect() method. The presence of the garbage collector largely frees the programmer from worrying about dangling data.

Let us start with the cause that led to garbage collection in .NET. Necessity, as we know, is the mother of invention. So what was the necessity behind garbage collection?

Why Garbage Collector?

Those who have written code in an unmanaged environment had to release memory explicitly: whenever they allocated memory, they had to free it again in their own code. Failing to do so resulted in unexpected behavior (bugs) or memory leaks (memory that a program has allocated and never de-allocated; this is what we call "garbage data"). Bugs can also arise when you try to access a part of memory that your program has already freed.

What is Garbage Collector?

The garbage collector runs when the runtime decides a collection is needed, for example when allocations cross a threshold or memory runs low, and checks for dynamically allocated memory that is no longer referenced. If it finds data that is no longer referenced by any variable or reference, it reclaims the memory so that it can be reused, and it can hand memory it no longer needs back to the operating system for use by other programs. The garbage collector is tuned to pick the best time to free memory based on the allocations being made.

How Garbage Collector Works

Every application has a set of roots. Roots identify storage locations that either refer to objects on the managed heap or are set to null.

For example:
  • All the global and static object pointers in an application.
  • Any local variable/parameter object pointers on a thread's stack.
  • Any CPU registers containing pointers to objects in the managed heap.
  • Pointers to objects from the freachable queue.
The list of active roots is maintained by the just-in-time (JIT) compiler and the common language runtime (CLR), and is made accessible to the garbage collector's algorithm.

Garbage collection in .NET uses a tracing collector; specifically, the CLR implements a mark/compact collector. It works in two phases:

In the first phase, when the garbage collector starts running, it assumes that all objects in the heap are garbage. It then walks the roots and builds a graph of every object reachable from them, following each reference it finds. An object that is already in the graph is not walked again, so circular references do not cause an endless loop. Every object reached this way is MARKED as live. All objects that are not in the graph are dead and are therefore considered garbage.

In the second phase, the GC moves the live objects down the heap so that they sit next to each other, and updates every pointer to them so that it again points to the correct location. This phase is called COMPACTING. After all live objects have been MARKED and COMPACTED, the allocation pointer sits just after the last object in the managed heap, so that when the program needs new memory, it is allocated from this position.
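
To make the two phases concrete, here is a minimal sketch in Python of a mark/compact pass over a simulated heap. It is a toy model, not the CLR's implementation: the Obj and Heap classes, the object sizes, and the next_free field are all names invented for this illustration, and addresses are plain integers.

```python
# Toy mark/compact collector over a simulated heap (not the CLR's real code).

class Obj:
    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.refs = []        # outgoing references to other Obj instances
        self.address = None   # assigned by the heap
        self.marked = False


class Heap:
    def __init__(self):
        self.objects = []     # objects in address order
        self.next_free = 0    # where the next allocation will be placed

    def allocate(self, name, size):
        obj = Obj(name, size)
        obj.address = self.next_free
        self.next_free += size
        self.objects.append(obj)
        return obj

    def mark(self, roots):
        # Phase 1: assume everything is garbage, then mark what the roots reach.
        for obj in self.objects:
            obj.marked = False
        stack = [r for r in roots if r is not None]
        while stack:
            obj = stack.pop()
            if obj.marked:
                continue      # already visited, which keeps cycles from looping
            obj.marked = True
            stack.extend(obj.refs)

    def compact(self):
        # Phase 2: slide live objects down the heap and update their addresses.
        live = [o for o in self.objects if o.marked]
        address = 0
        for obj in live:
            obj.address = address
            address += obj.size
        self.objects = live
        self.next_free = address

    def collect(self, roots):
        self.mark(roots)
        self.compact()


heap = Heap()
a = heap.allocate("A", 16)
b = heap.allocate("B", 32)
c = heap.allocate("C", 8)
d = heap.allocate("D", 24)
a.refs.append(c)
c.refs.append(a)   # circular reference between A and C
b.refs.append(d)   # B refers to D, but no root reaches either of them

heap.collect(roots=[a])
survivors = [(o.name, o.address) for o in heap.objects]
print(survivors, heap.next_free)
```

In this run A is the only root, so A and C survive and slide down to addresses 0 and 16, B and D are dropped, and the next allocation would start at address 24. A real collector also has to patch every pointer that refers to a moved object; the toy sidesteps that because each object carries its own address.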

Finalization:

The garbage collector offers an additional feature that you may want to take advantage of: finalization. Finalization allows a resource to clean up gracefully after itself when it is being collected. By default the garbage collector frees only the memory used by managed objects, not unmanaged resources. If, for example, a program has opened a file, the program itself must release it explicitly. For this purpose, a class overrides the Finalize() method. Finalize() in .NET is not like a destructor in C++. Keep in mind that a destructor written in a .NET language is translated to Finalize() when the code is compiled to IL, so it runs whenever the garbage collector gets to it rather than at a known point in the program. Thus the deterministic behavior of a C++ destructor is not achievable this way.

In such a scenario, what will the GC do? Once the garbage collector has identified a finalizable object as garbage, it schedules the object's Finalize() method to run and, because the object has to survive until then, promotes it to the next generation. I will discuss generations and promotions in another article; for now a brief outline is enough. A generation is a mechanism implemented by the garbage collector in order to improve performance. The idea is that newly created objects are part of a young generation, and objects created early in the application's lifecycle are in an old generation. Separating objects into generations allows the garbage collector to collect specific generations instead of collecting all objects in the managed heap.
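
The idea of collecting only the young generation can be sketched with a small Python model. Everything here is an assumption made for illustration: the Tracked class, the collect function, and the choice of three generations (0 to 2) are not taken from the runtime's code, and liveness is supplied by hand as a set of names rather than traced from roots.

```python
MAX_GENERATION = 2  # assumption for this sketch


class Tracked:
    def __init__(self, name):
        self.name = name
        self.generation = 0   # new objects start in the youngest generation


def collect(objects, live_names, generation):
    """Collect generations 0..generation and return the surviving objects."""
    survivors = []
    for obj in objects:
        if obj.generation > generation:
            survivors.append(obj)    # older generation: not examined this time
        elif obj.name in live_names:
            obj.generation = min(obj.generation + 1, MAX_GENERATION)
            survivors.append(obj)    # still referenced: survives and is promoted
        # otherwise: garbage in a collected generation, so it is dropped
    return survivors


objects = [Tracked("settings"), Tracked("temp1")]
objects = collect(objects, live_names={"settings"}, generation=0)
objects.append(Tracked("temp2"))

# "settings" is no longer referenced, but a generation-0 collection skips it.
objects = collect(objects, live_names=set(), generation=0)
after_gen0 = [(o.name, o.generation) for o in objects]

# Only a collection that includes generation 1 reclaims it.
objects = collect(objects, live_names=set(), generation=1)
after_gen1 = [(o.name, o.generation) for o in objects]
print(after_gen0, after_gen1)
```

The promoted object stays on the heap after it becomes garbage until a collection that covers its generation runs. That is the cost a finalizable object pays when it is promoted.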

When designing objects it is best not to use the Finalize method. There are several reasons for this:
  • Finalizable objects get promoted to older generations, which increases memory pressure and prevents the object's memory from being collected when the garbage collector determines the object is garbage. In addition, all objects referred to directly or indirectly by this object get promoted as well.
  • Finalizable objects take longer to allocate: besides allocating memory for the object itself, the runtime must also add an entry to the finalization queue, which the GC later consults to see whether a dead object needs finalization.
  • Forcing the garbage collector to execute a Finalize method can significantly hurt performance. Remember, each object is finalized. So if I have an array of 10,000 objects, each object must have its Finalize method called.
  • Finalizable objects may refer to other (non-finalizable) objects, prolonging their lifetime unnecessarily. In fact, you might want to consider breaking a type into two different types: a lightweight type with a Finalize method that doesn't refer to any other objects, and a separate type without a Finalize method that does refer to other objects.
  • You have no control over when the Finalize method will execute. The object may hold on to resources until the next time the garbage collector runs.
  • When an application terminates, some objects are still reachable and will not have their Finalize method called. This can happen if background threads are using the objects or if objects are created during application shutdown or AppDomain unloading. In addition, by default, Finalize methods are not called for unreachable objects when an application exits so that the application may terminate quickly. Of course, all operating system resources will be reclaimed, but any objects in the managed heap are not able to clean up gracefully. Early versions of the framework let you change this default behavior by calling the System.GC type's RequestFinalizeOnShutdown method; check that your version of the runtime still provides it before relying on it. Even where it is available, use it with care, since calling it means that your type is controlling a policy for the entire application.
  • The runtime doesn't make any guarantees as to the order in which Finalize methods are called. For example, let's say there is an object that contains a pointer to an inner object. The garbage collector has detected that both objects are garbage. Furthermore, say that the inner object's Finalize method gets called first. Now, the outer object's Finalize method is allowed to access the inner object and call methods on it, but the inner object has been finalized and the results may be unpredictable. For this reason, it is strongly recommended that Finalize methods not access any inner, member objects.
If you have no other option and must implement a Finalize method in your class, make sure that it executes quickly, does not acquire any more resources, and does nothing that could block it, such as waiting on thread synchronization. Also, do not let an exception escape the Finalize method. Early versions of the runtime (1.x) assumed the method had returned normally and continued finalizing other objects; from version 2.0 onward, an unhandled exception in a finalizer terminates the process.
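
The advice in the list above about splitting a type in two can be illustrated with a short Python sketch. It models a single collection in which nothing is rooted and asks which objects are kept alive only because a finalizable object still refers to them. The function name and the object names (Report, Handle, Buffer, Cache) are hypothetical and exist only for this example.

```python
def kept_alive_by_finalizers(objects, refs, finalizable):
    """Return the names that survive a collection in which nothing is rooted.

    objects:     iterable of object names on the heap
    refs:        dict mapping a name to the names it refers to
    finalizable: set of names whose type has a finalizer
    """
    kept = set()
    stack = [name for name in objects if name in finalizable]
    while stack:
        name = stack.pop()
        if name in kept:
            continue
        kept.add(name)                     # survives until it has been finalized
        stack.extend(refs.get(name, []))   # and so does everything it refers to
    return kept


# One big finalizable type that refers to its working data.
monolithic = kept_alive_by_finalizers(
    objects=["Report", "Buffer", "Cache"],
    refs={"Report": ["Buffer", "Cache"]},
    finalizable={"Report"},
)

# The same data, but only a small Handle type is finalizable.
split = kept_alive_by_finalizers(
    objects=["Report", "Handle", "Buffer", "Cache"],
    refs={"Report": ["Handle", "Buffer", "Cache"]},
    finalizable={"Handle"},
)
print(sorted(monolithic), sorted(split))
```

With the monolithic design all three objects outlive the collection. With the split design only the lightweight Handle does, and the rest can be reclaimed straight away.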

Important Questions:

What is Finalization Queue?
When an application creates a new object, the new operator allocates the memory from the heap. If the object's type contains a Finalize method, then a pointer to the object is placed on the finalization queue. The finalization queue is an internal data structure controlled by the garbage collector. Each entry in the queue points to an object that should have its Finalize method called before the object's memory can be reclaimed.


What is FReachable Queue?
When a GC occurs and some objects are determined to be garbage, the garbage collector scans the finalization queue looking for pointers to these objects. When a pointer is found, the pointer is removed from the finalization queue and appended to the freachable queue (pronounced "F-reachable"). The freachable queue is another internal data structure controlled by the garbage collector. Each pointer in the freachable queue identifies an object that is ready to have its Finalize method called.

Who is responsible for calling Finalize Method?
There is a special runtime thread dedicated to calling Finalize methods. When the freachable queue is empty (which is usually the case), this thread sleeps. But when entries appear, this thread wakes, removes each entry from the queue, and calls each object's Finalize method. Because of this, you should not execute any code in a Finalize method that makes any assumption about the thread that's executing the code. For example, avoid accessing thread local storage in the Finalize method.


What is Resurrection?
When an application is no longer accessing a live object, the garbage collector considers the object to be dead. However, if the object requires finalization, the object is considered live again until it is actually finalized, and then it is permanently dead. In other words, an object requiring finalization dies, lives, and then dies again. This is a very interesting phenomenon called resurrection. Resurrection, as its name implies, allows an object to come back from the dead.
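
The finalization queue, the freachable queue, the finalizer thread, and resurrection can be tied together in one toy model. The Python sketch below is an illustration under stated assumptions, not the runtime's code: ToyRuntime, ManagedObject, and run_finalizer_thread are invented names, the "thread" is an ordinary method call, and roots are passed in by hand.

```python
from collections import deque


class ManagedObject:
    def __init__(self, name, finalizable=False):
        self.name = name
        self.finalizable = finalizable
        self.finalized = False

    def finalize(self):
        # Stand-in for a Finalize method that releases an unmanaged resource.
        self.finalized = True


class ToyRuntime:
    def __init__(self):
        self.heap = []
        self.finalization_queue = []     # finalizable objects not yet found dead
        self.freachable_queue = deque()  # dead objects waiting for finalize()

    def new(self, name, finalizable=False):
        obj = ManagedObject(name, finalizable)
        self.heap.append(obj)
        if finalizable:
            self.finalization_queue.append(obj)
        return obj

    def collect(self, roots):
        # The freachable queue acts as a root, so its entries stay alive.
        live = list(roots) + list(self.freachable_queue)
        survivors = []
        for obj in self.heap:
            if obj in live:
                survivors.append(obj)
            elif obj in self.finalization_queue:
                # Garbage that needs finalization: move its pointer to the
                # freachable queue and keep the object for now (resurrection).
                self.finalization_queue.remove(obj)
                self.freachable_queue.append(obj)
                survivors.append(obj)
            # Anything else is garbage and its memory is reclaimed.
        self.heap = survivors

    def run_finalizer_thread(self):
        # Stand-in for the dedicated thread that drains the freachable queue.
        while self.freachable_queue:
            self.freachable_queue.popleft().finalize()


rt = ToyRuntime()
plain = rt.new("plain")
handle = rt.new("file-handle", finalizable=True)

rt.collect(roots=[])                      # first collection: nothing is rooted
after_first = [o.name for o in rt.heap]   # only the finalizable object survives
rt.run_finalizer_thread()                 # its finalizer runs, the queue empties
rt.collect(roots=[])                      # second collection reclaims it
after_second = [o.name for o in rt.heap]
print(after_first, handle.finalized, after_second)
```

The plain object is reclaimed by the first collection, while the finalizable one dies, lives on in the freachable queue, and is reclaimed only by the second collection after its finalizer has run.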