About me

Michael L Perry

Improving Enterprises

Principal Consultant

@michaellperry


Historical Modeling makes offline data analysis easy

Because I built my Windows Phone application using Historical Modeling, I have been able to easily export user data to an analytical database. This is giving me insight into the way the application is used, and will lead to an improved user experience.

The HoneyDo List historical model

The HoneyDo List Windows Phone application was built using Correspondence, a collaboration framework based on the principle of Historical Modeling. Historical Modeling captures a history of facts, each one a decision made by a user. HoneyDo List defines facts like:

  • List – the user has created a task list
  • List__name – the user has updated the name of the task list
  • Task – the user has added a task to a list
  • TaskComplete – the user has completed a task
  • TaskCompleteUndo – the user has unchecked a task

Although the names of some of these facts are nouns, it is important to remember that these are not entities or objects. They are facts that capture the creation of these objects. By convention, we just drop the verb “Create” because it would get very wordy and confusing.

The relationships among these facts form a tree.

[Image: the HoneyDo List model diagram]

Each of the arrows points to a predecessor of a fact. Think of this like a foreign key relationship. The red arrows are special. They indicate that the successor is published to its predecessor. Anybody subscribing to a List will see all of the related facts. This is what makes Correspondence a good collaboration framework.
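To make the predecessor idea concrete, here is a plain C# sketch of the fact tree. These classes are illustrative stand-ins, not the actual Correspondence-generated fact classes; in a real model the framework generates them from a model definition.

```csharp
using System;

// Illustrative only -- not the actual Correspondence fact classes.
// Each fact is immutable; its constructor parameters are its predecessors.
class List
{
}

class List__name
{
    public List List { get; private set; }
    public string Value { get; private set; }
    public List__name(List list, string value) { List = list; Value = value; }
}

class Task
{
    public List List { get; private set; }
    public Task(List list) { List = list; }
}

class TaskComplete
{
    public Task Task { get; private set; }
    public TaskComplete(Task task) { Task = task; }
}

class Program
{
    static void Main()
    {
        // The arrows in the diagram: each successor points back to its predecessor.
        var list = new List();
        var name = new List__name(list, "Groceries");
        var task = new Task(list);
        var done = new TaskComplete(task);
        Console.WriteLine(done.Task.List == list);  // prints "True" -- the chain leads back to the root List
    }
}
```

In this sketch, a fact can only reference predecessors through its constructor, so the history can never contain a cycle.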

The analytical database

My task was to analyze how the app was being used, so that I could decide what features should be in the next version. The historical model captures a history of facts, and that history is a detailed account of application usage. All I needed to do was mine this data.

Correspondence was designed to make it easy to synchronize data across different data stores. Each client points to a distribution node, which publishes facts so that the clients can subscribe to the ones they care about. It has built-in support for several storage mechanisms, including isolated storage on the phone and SQL Server on the desktop. My first step in constructing an analytical database was to plug in the SQL Server storage mechanism and point it at the distribution node.

The SQL database that Correspondence creates is not suitable for data mining. It stores all facts in a single table, regardless of type. All of the fields are serialized to a binary column within that table. It is impossible to query such a structure for any useful analytics.

[Image: the generic Correspondence fact table]

So the next step was to design a separate database with an application-specific schema. This database has separate tables for lists and tasks.

[Image: the analytical database schema, with List and Task tables]
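The post does not show the schema itself, but based on the loader code below, a minimal version might look like this. The column names and types are my inference, not the actual schema.

```sql
-- Illustrative schema only; column names inferred from the loader code.
CREATE TABLE List (
    ListId   BIGINT NOT NULL PRIMARY KEY,  -- the fact ID, not an identity column
    ListName NVARCHAR(100) NULL            -- filled in when a List__name fact arrives
);

CREATE TABLE Task (
    TaskId    BIGINT NOT NULL PRIMARY KEY,
    ListId    BIGINT NOT NULL REFERENCES List (ListId),
    Completed BIT NOT NULL DEFAULT 0       -- toggled by TaskComplete / TaskCompleteUndo
);
```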

The data loader

The final step was to populate the analytical database from the historical database. This is where things really got fun.

Take a look at each of the facts in the historical model. Each one is a record of a decision that the user made. We can turn each fact into an insert, update, or delete operation. This is what the user would have done had they been working directly against the analytical database. We just write a method that maps each fact to the appropriate SQL.

    // Map each incoming fact to the corresponding SQL operation.
    public void Transform(CorrespondenceFact nextFact, FactID nextFactId, Func<CorrespondenceFact, FactID> idOfFact)
    {
        using (var session = new Session(_connectionString))
        {
            session.BeginTransaction();
            if (nextFact is List)
            {
                Console.WriteLine(String.Format("Insert list {0}", nextFactId.key));
                InsertList(session, nextFactId.key);
            }
            if (nextFact is List__name)
            {
                List__name listName = nextFact as List__name;
                Console.WriteLine(String.Format("Update list name {0} {1}", idOfFact(listName.List).key, listName.Value));
                UpdateList(session, idOfFact(listName.List).key, listName.Value);
            }
            //...
            // Record the checkpoint in the same transaction as the change itself.
            UpdateTimestamp(session, nextFactId);
            session.Commit();
        }
        LastFactId = nextFactId;
    }

    private static void InsertList(Session session, long listId)
    {
        session.Command.CommandType = CommandType.Text;
        session.Command.CommandText = "INSERT INTO List (ListId) VALUES (@ListId)";
        session.ClearParameters();
        session.AddParameter("@ListId", listId);
        session.Command.ExecuteNonQuery();
    }

    private static void UpdateList(Session session, long listId, string listName)
    {
        session.Command.CommandType = CommandType.Text;
        session.Command.CommandText = "UPDATE List SET ListName = @ListName WHERE ListId = @ListId";
        session.ClearParameters();
        session.AddParameter("@ListId", listId);
        session.AddParameter("@ListName", listName);
        session.Command.ExecuteNonQuery();
    }

The analytical database keeps the ID of the last fact that it processed. It updates this ID with each change. Both of these database operations are performed within the same transaction, so there is no danger of missing or duplicating a fact. It’s like using a transactional queue without the overhead of the distributed transaction coordinator.
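The body of UpdateTimestamp is not shown. Assuming a single-row checkpoint table, it would run something along these lines on the same open transaction:

```sql
-- Hypothetical checkpoint table: one row holding the last processed fact ID.
CREATE TABLE Timestamp (
    LastFactId BIGINT NOT NULL
);

-- Executed by UpdateTimestamp inside the same transaction as the
-- insert or update for the fact itself:
UPDATE Timestamp SET LastFactId = @LastFactId;
```

If the transaction rolls back, the checkpoint rolls back with it, so the fact is simply processed again on the next run.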

The analytical database uses fact IDs as primary keys, rather than defining its own auto-increment keys. For the foreign key relationships to be valid, the parent record must be inserted before the child. Fortunately, Correspondence ensures that this is always the case: predecessors are always handled before their successors.

Future analysis

Now that I have HoneyDo List data loaded into an analytical database, I can see exactly how people are using the system. I have a few features planned, and I will get those deployed to the Marketplace very soon. When the new version is out there, I’m sure that I will come up with more analytics to capture. For example, how many people are sharing lists with their family? How frequently does the average user add a task, or mark it completed? As I come up with new analytics, I will need to modify the database. After each modification, I’ll simply drop all of the data and replay it from the historical model.
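The replay itself is simple: clear the analytical tables, then run every fact through the same transform from the beginning of history. Here is a self-contained, in-memory sketch of that loop; the types are hypothetical stand-ins, not the Correspondence API.

```csharp
using System;
using System.Collections.Generic;

// Illustrative replay: rebuild the analytical state by running every
// historical fact through the same transform, in order.
class Fact
{
    public long Id;
    public string Kind;     // "List", "List__name", ...
    public long TargetId;   // the predecessor fact this one refers to
    public string Value;    // payload, e.g. the new list name
}

class Replayer
{
    // ListId -> ListName: a stand-in for the analytical List table.
    public Dictionary<long, string> Lists = new Dictionary<long, string>();
    public long LastFactId;

    public void Transform(Fact fact)
    {
        switch (fact.Kind)
        {
            case "List":       Lists[fact.Id] = null; break;              // insert
            case "List__name": Lists[fact.TargetId] = fact.Value; break;  // update
        }
        LastFactId = fact.Id;  // checkpoint, as in the real loader
    }

    public void Replay(IEnumerable<Fact> history)
    {
        Lists.Clear();                 // "drop all of the data"
        foreach (var fact in history)  // predecessors arrive before successors
            Transform(fact);
    }
}

class Program
{
    static void Main()
    {
        var replayer = new Replayer();
        replayer.Replay(new[]
        {
            new Fact { Id = 1, Kind = "List" },
            new Fact { Id = 2, Kind = "List__name", TargetId = 1, Value = "Groceries" }
        });
        Console.WriteLine(replayer.Lists[1]);  // prints "Groceries"
    }
}
```

Because the history, not the analytical schema, is the source of truth, a schema change costs nothing more than a rerun of this loop.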

Historical modeling has made it easy for me to separate the analytics from the collaborative application. The app doesn’t have to operate against the analytical database. But I can see the data as if it did, because I can tap into the history and replay it offline.