Welcome to Sign in | Join | Help

Microsoft BRE: Fact Identity in the Microsoft Rules Engine, or how the author chased a non-existant bug

I recently had reason to revisit the exact mechanisms Microsoft use when you assert facts to the Microsoft Rules Engine.   I was discussing stuff on-line with a fellow rules enthusiast when a terrible thought occurred to me.   Can the MS BRE always uniquely identify each different fact, or is there a chance that sometimes it might confuse two facts with each other?

For a couple of days, I was convinced that I had stumbled on a significant bug.   Indeed, at one point, I thought there was such a serious problem that I would have to recommend to my company that we desist from any further use of Microsoft’s rules engine.   Melodramatic, huh!    Fortunately, after a bit more research, I discovered that I was quite wrong.   The MS BRE does not suffer from a terminal flaw and can be trusted to always distinguish correctly between all your facts…except when…well, I’ll get onto that later.   It’s worth recording my suspicions, mistakes and eventual enlightenment.   There is something useful to learn, here, about the inner workings of the engine, and also about Microsoft’s implementation of the Hashtable class in .NET.   

The first point to make is that Microsoft uses the term ‘fact’ rather loosely in MS BRE.   A fact is a data item ‘asserted’ to the working memory of a rules engine.   The working memory, itself, is simply some mechanism for storing and retrieving facts, and is typically implemented as a set of node-specific collections (a rule set is represented as a node graph at runtime).   Microsoft makes extensive use of the non-generic Hashtable class in the MS BRE to implement node memories.   In most similar engines, a fact is always a simple data tuple – i.e., a collection of attribute values.   The attributes are usually named, though sometimes un-named and ordered.   It is also almost always the case that each tuple contains some kind to type identifier.    Many engines provide a metadata definition language which rule developers use to define tuple templates.   These definitions are called ‘deftemplates’ in several engines.

Microsoft does not support the direct equivalent of deftemplates, and ‘facts’ do not always convert neatly to single tuples.   The MS BRE philosophy is to apply set-based rule processing directly to data stored in .NET objects.   You can therefore assert any POCO (Plain Old CLR Object) to the engine as a fact.   In this case, a POCO is roughly equivalent to a tuple because it provides named properties (MS BRE does not access fields directly).   Of course, it also provides normal methods as well, and rules can exploit this by calling methods in order to obtain data values, evaluate predicates or invoke custom actions.

Microsoft provides four ‘Typed Fact’ classes for wrapping specific types of .Net object and asserting them to the engine.   These all implement the ITypedFact interface and are as follows:
TypedXmlDocument: wraps an XML document or node
TypedDataTable: wraps an ADO.NET DataTable
TypedDataRow: wraps an ADO.NET DataRow

DataConnection: wraps an ADO.NET DataSet and either SQL Server or OLE DB .NET data adapters, connections and transactions.   A DataConnection object is used to access a specific table in an external data source via the wrapped data connection.

A TypedDataRow is the closest equivalent to a data tuple.   The other Typed Fact classes are used to instantiate what Microsoft calls ‘facts’, but are, in reality, sources of multiple facts.    Once asserted, the engine uses information embedded in the rules in the currently executing policy to internally select and assert the required facts from these fact sources.   For example, it uses XPaths to select node sets from XML documents, and then uses additional XPaths to access individual ‘fields’ within these nodes.

The important thing to note here is that you only ever assert .NET objects to the engine.   In some cases the engine automatically asserts additional .NET objects internally.   MS BRE must be able to distinguish between all these objects unambiguously.   It must be able to tell when two objects represent two different facts and when they reference the same fact.   The only way this can be done generically across all asserted facts is to use some notion of object identity.

Unfortunately, object identity is a rather slippery subject in .NET.   There is a good reason for this.   In un-managed code, a compiler will normally distinguish between two different objects by using their references.   Each object reference maps directly to a memory address at which the object state is located.   The exact layout of the state varies from compiler to compiler and may depend of whether the object has a vTable or equivalent.   The important point, however, is that once created, an object does not generally move.   Hence, references stay valid over time.

Not so in the CLR (the .NET Common Language Runtime).   The runtime environment provides automatic garbage collection and memory management.   One consequence of this is that objects move.   They cannot be guaranteed to remain at the same memory location for any significant period of time.   Hence, you cannot depend on an underlying memory reference to provide object identity except in a very limited sense.   If you want to test two objects to see if they are identical, you can test their underlying references at a point in time (or, at least, an approximation to a point in time).   If the references are equal, the objects are identical.   Microsoft supports this via the static ReferenceEquals() method of SystemObject.

In itself, this is simply not adequate to meet MS BREs needs.   MS BRE stores fact objects in its working memory over a period of time, and needs to be able to distinguish between existing and new facts unambiguously.   Each time you assert a fact to the engine, MS BRE needs to know if this is a new fact of which it has prior knowledge, or if you are reasserting an existing fact.   Hence the engine needs some reliable form of identity that persists over time.    We might consider using the values of a field in each fact as unique identifiers.    A POCO, though, can contain any data, and there is no guarantee that two entirely different POCOs might not contain identical data.   In any case, you would have to impose some kind of contract on your POCOs…meaning that they are not really POCOs anymore from the engine’s perspective.

Most .NET developers are aware of the GetHashCode() method.    GetHashCode() is inherited from System.Object, and therefore is a member of all .NET objects.   It is also callable on value types.    Microsoft provides two default implementations of this method for reference and value types.   For value types (structs, in C#), the hash code is provided by calling GetHashCode() of the first field in the value type.   Hence, not all fields in a struct are equal.   The first one has a special role.   For reference types, the default implementation in .NET 2.0 is a randomly-generated 32 bit signed integer.

The hash code is often confused for an object identifier.   In many cases, it is almost good enough to be used as such, but not quite.   Hash codes, according to Microsoft’s guidance, should be associated with the ‘value’ of an object.   This is the value which is tested for equality when you use the Equals() method.   If two objects evaluate as equal, they should both return the same hash code.   Of course, the ‘value’ of an object may change over time.   For example, a string object’s value is the text associated with that object, and the hash code is derived algorithmically from that text.   If you change the value of a string, the hash code generally changes as well.   The default implementation of GetHashCode() generates a number randomly which is then associated with the object by the runtime and returned on future calls.   Hence, it is almost like an object reference.   Almost, but not quite.   The fundamental problem is that there is a small chance that two entirely different objects could, by chance, have identical hash codes.   The default implementation of GetHashCode() does not guarantee uniqueness of hash codes.

One of the reasons I suspected there might be bug in MS BRE is that the implementation of GethashCode() in System.Object was different in .NET 1.x.   In the earlier version of the framework, Microsoft returned the index of the object’s sync block as a 32-bit positive integer.   A sync block is a structure used to manage thread synchronisation, and the index value is incremented each time a sync block is created.   If you called GetHashCode(), and there was no sync block, the object would create one internally in order to obtain a hash code.   Because the indexes were incremented, and because (I believe) index numbers were recycled only when objects were garbage collected, there was no chance that two hash codes would collide.     However, this was never part of the hash code contract, and Microsoft, from day one, reserved the right to change this behaviour.

No doubt you can begin to see why I entertained suspicions about MS BRE.   MS BRE was originally written for .NET 1.x, and apart from a few bug fixes and minor enhancements, it has hardly changed in the .NET 2.0 version.   When I used Reflector to inspect how Microsoft handles fact identity, I realised that the approach is based on calls to GetHashCode().   It therefore occurred to me that Microsoft might have incorrectly depended on the coincidental nature of the default implementation of GetHashCode in .NET 1.x, and not realised that this behaviour had changed in .Net 2.0.   I suspected that, because hash codes can collide in .NET 2.0, facts might sometimes be confused.   The engine might think you had reasserted an existing fact when you had actually asserted a new fact.

I needn’t have worried.   The issue is robustly dealt with within MS BRE by using the Hashtable class.   I had a reasonable idea of how this class works, but there was one detail of which I was not aware, although in hindsight it appears blindingly obvious that this is the only sensible way in which Microsoft could implement this class.   I’m sure many developers are aware of this feature, although I haven’t yet come across any detailed discussion or documentation that describes the exact behaviour.

When you assert a fact to the engine, its type is tested, and it is handed off to a ‘class node’.   This is a type-specific node object in the Rete.   A Rete is a node graph that provides an efficient and executable run-time representation of a rule set.    Class nodes first test to see if a fact is already stored in working memory.   A class node implements its working memory as a simple hashtable, and it uses the object itself as the key value.   It tests the hashtable to see if the object already exists in the collection.   If the hashtable determines that the fact is not in the memory, it wraps it in an instance of a class called ‘WorkingMemoryElement’ (WME), and then adds the WME to the hashtable.   Incidentally, the WorkingMemoryElement class overrides GetHashCode() in order to delegate to the GetHashCode() method of the fact it wraps.    If the hashtable already contains the fact, the engine assumes that you are re-asserting an existing fact.   It retracts the existing fact, and then asserts the new version of the fact.

Now, as I say, it is blindingly obvious with the benefit of hindsight that this must be implemented as a robust mechanism.   After all, imagine what life would be like if, when you exploit a .NET 2.0 hashtable that uses objects as keys, it sometimes fails to distinguish between the objects it stores.   This would be terrible, and we would all have heard about it.   It would break huge amounts of code causing strange, arbitrary and infrequent exceptions.   The trouble is that I knew just enough to know that, when you use an object as a key, the hashtable calls GetHashCode() internally.   You can see why I was worried.

Hashtables compute an internal hash code by calling GetHashCode() and ensuring that the negation bit is set to 0.   This is necessary because the negation bit (the first high-order bit in the 32-bit integer) is used for collision detection purposes.   Hashtables detect hash code collisions and use an approach called ‘rehashing’ to deal with this.

When the MS BRE tests for an existing fact, the hashtable calls GetHashCode() on the fact and computes the internal hash code.   It then looks this hashcode up in the collection to see if it is already used for a given fact.   If it isn’t, the hashtable knows unambiguously that this is a new fact.   However, if it does find the hash value, it cannot be certain if it already contains that fact, or if a collision has occurred.   This brings us to the feature of which I was unaware.   Very simply, if it finds an existing item with the hash code, the hashtable performs an additional test.   It tests the equality of the fact object, using the instance Equals() method, against the object it already contains (the one with the same hash value).   For performance purposes, it only performs this additional test when needed.

The default implementation of the Equals() method, inherited by all reference types from System.Object, internally does a ReferenceEquals() check to discover, unambiguously, if the two objects are actually identical.   So, if your facts use the default implementation of Equals(), the hashtable will always correctly distinguish one object from another.   The ‘TypedFact’  wrapper classes all use the default implementation, and your POCO facts will also use the default implementation unless you have overridden it, or are using a class that overrides this method.

In conclusion, therefore, we can see that MS BRE safely handles fact identity as long as you never assert .NET objects that have a poor or incorrect implementation of Equals().   For a long time, I have wondered if I ought to recommend overriding GetHashCode() and Equals() on fact classes.   My conclusion is that I should definitely recommend that, unless you have a very good reason to do so, you should NOT override these methods.   For safety, use the default implementations, or existing overrides on classes you trust (like ‘System.String’, for example).    If you do have a reason to provide overrides, make sure you fully understand the characteristics of the code that you implement and ensure that you code conforms rigorously to the ‘rules’ specified by Microsoft.   If you don’t, you may well live to regret it.   And, unlike me, don’t worry…Microsoft developers really do have some idea (generally) of what they are doing J

Published 04 April 2007 23:24 by CharlesYoung

Comments

No Comments

Anonymous comments are disabled