Big data – Partitioned views using Entity Framework

Working with large data sets creates a unique challenge for software developers.  Balancing simplicity, maintainability and performance is near impossible, but it’s nevertheless something developers should strive for.

In regards to maintainability and simplicity, Entity Framework has been a big winner for many organizations.  Entity framework is a time saver on many fronts, on top of it’s object-relational mapping capabilities it also allows developers to easily create queries, database tables and stored procedures.

On the performance side many problems associated with massive data tables can be greatly alleviated by using a technique, available from many database vendors, known as partitioned views. Partitioned views allow large data tables to be split into ranges of data and stored as multiple member tables.  By doing so queries can be optimized to seek data of a specific range yielding significant performance gains.

Partitioned views can quite easily be imported within an Entity Framework model, however there will be some limitations.  Inserting and updating records of a view using Entity framework can be problematic, it requires trickery on the .Net side and specific conventions on the SQL server side.  There’s also the issue of maintaining the partitioned view.  How are new member tables added, how is the view updated to include new member tables?

In this article I will walk through the creation of a code-first entity framework library that will dynamically create partitioned views,  and enable view data modifications.

Partitioned views

Before I dive into Entity Framework magic. I’ll quickly outline the steps required to create partitioned views in SQL Server.

First, a “key” identifying  records of a given data range must be determined.  If data is primarily queried by year, a year column would be used for the data range key.

Once a logical data range key is determined it’s time to create member tables.  Three years worth of data would result in three separate members tables. In order to make the SQL server query optimizer aware of the data range key, a check constraint is added on the column.  The create table scripts look as follows:


CREATE TABLE [dbo].[PokerHand2011](
 [Id] [bigint] IDENTITY(1,1) NOT NULL,
 [Year] [int] NOT NULL CHECK(Year=2011), 
 [Action] [nvarchar](max) NULL,
 [Amount] [decimal](18, 2) NOT NULL,
 [PlayerName] [nvarchar](max) NULL,
 [PokerSiteHandId] [nvarchar](max) NULL, 
 CONSTRAINT PK_PokerHand2011 PRIMARY KEY ( [Id], [Year])
)

GO

CREATE TABLE [dbo].[PokerHand2012](
 [Id] [bigint] IDENTITY(1,1) NOT NULL,
 [Year] [int] NOT NULL CHECK(Year=2012), 
 [Action] [nvarchar](max) NULL,
 [Amount] [decimal](18, 2) NOT NULL,
 [PlayerName] [nvarchar](max) NULL,
 [PokerSiteHandId] [nvarchar](max) NULL, 
 CONSTRAINT PK_PokerHand2012 PRIMARY KEY ( [Id], [Year])
)

GO

CREATE TABLE [dbo].[PokerHand2013](
 [Id] [bigint] IDENTITY(1,1) NOT NULL,
 [Year] [int] NOT NULL CHECK(Year=2013), 
 [Action] [nvarchar](max) NULL,
 [Amount] [decimal](18, 2) NOT NULL,
 [PlayerName] [nvarchar](max) NULL,
 [PokerSiteHandId] [nvarchar](max) NULL, 
 CONSTRAINT PK_PokerHand2013 PRIMARY KEY ( [Id], [Year])
)

Once all the member tables are created the partitioned view is created  as a union on all members tables.


CREATE view PokerHand
AS

select * from PokerHand2011 union all

select * from PokerHand2012 union all

select * from PokerHand2013
GO

Pretty simple stuff. With the partitioned view created data can now easily queried over the entire set of data or a specific range of data all from the same view. The query execution plans below highlight this behavior.

Full set

Specific Range

Code-first entity framework

OK, now that the SQL academics are out of the way it’s time to write some .Net code.

In this section I’m going step through the creation of a self-contained library that creates partitioned views based on plain old C# objects (POCO) and exposes  the partitioned view through the entity framework with read, insert, update and delete capabilities.

Configuration of the partitioned view

In order to create a partitioned view for a POCO, the library will need to know which properties of the C# object compose to the primary key, and which properties compose the data range key.  I use the following configuration class for this:


public class PartitionedViewConfiguration<T>
{
public Expression<Func<T, Object>> PrimaryKeyExpression { get; set; }
public Expression<Func<T, Object>> DataRangeKeyExpression { get; set; }
}

Which can be used as follows:


config = new PartitionedViewConfiguration<PokerHand>
{
DataRangeKeyExpression = ph => new { ph.Year, ph.Month },
PrimaryKeyExpression = ph => new { ph.Id, ph.Year, ph.Month }
};

Creating data member tables

Creating tables using code-first entity framework is very easy.  You register a type with a DbContext, once the DbContext is initialized for the first time, entity framework creates a table corresponding to that type.  For the purpose of this library, I don’t have types for the member tables I want to create.  I can however create them dynamically as needed by inheriting the partitioned view type!


private Type CreatePartitionTableType(Type partitionedViewType, string suffix)
{
var asm = new AssemblyName(String.Concat(partitionedViewType.Name, "PartitionedViewMemberTables"));
var asmBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(asm, AssemblyBuilderAccess.Run);
var moduleBuilder = asmBuilder.DefineDynamicModule("MemberTables");
var typeName = String.Concat(partitionedViewType.Name, suffix);
var typeBuilder = moduleBuilder.DefineType(typeName);
typeBuilder.SetParent(partitionedViewType);
return typeBuilder.CreateType();
}

I can now register the newly created type with a DbContext as follows.


public class MemberTableDbContext<T>:DbContext where T:class
{
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
modelBuilder.RegisterEntityType(DataType);
modelBuilder.Types().Configure(c => c.HasKey(PrimaryKeyPropertyNames));
base.OnModelCreating(modelBuilder);
}
public Type DataType
{
get { return typeof(T); }
}

//...

}

In the code above, “T” is the type that was created dynamically.  I create one DbContext instance per member table type for a reason.  Notice the DbContext configures the primary key, but it does not create the check constraint required for the data range key.  There isn’t a great way of creating these constraints using entity framework.  Therefore, I need to create these constraints using TSQL once the member table is available.  But when is a table created using entity framework? By default, code-first entity framework calls the DbContext initializer once per app domain.  Creating one DbContext per member table is what allows me to know when member tables are created.  To put it all together,  I create my member table data type dynamically, instantiate the DbContext, and create the check constraints as follows:


public virtual void AddConstraintCheckIfEqual(string tableName, string columnName, object value)
{
var cmd = String.Format("alter table {0} add check({1}={2})",
tableName,columnName, SqlSafe(value));
emptyContext.Database.ExecuteSqlCommand(cmd);
}

Creating partitioned views

The partitioned view can easily be exposed to the entity framework by registering the partitioned view type with a DbContext.  However, entity framework cannot create views, the library will need to create the view upon initialization of the DbContext and once all member tables for the view have been created.  The code looks as follows:

public class PartitionedViewAdapter<T>:DbContext where T : class
{
 public IDbSet<T> View { get; set; }
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
InitializeMemberTables();
CreateView();
modelBuilder.Entity<T>().ToTable(ViewName).HasKey(Config.PrimaryKeyExpression);
}
private void CreateView()
{
var keys = GetDataRangeKeys();
var memberTableNames = keys.Select(dataRangeKey => PartitionTablePrefix + dataRangeKey);
DatabaseAdapter.CreateOrAlterPartitionedView(ViewName, memberTableNames);
}
//...
}

DatabaseAdapter code


public virtual void CreateOrAlterPartitionedView(string viewName,IEnumerable<string> memberTableNames)
{
var createOrAlter = ObjectExists(viewName) ? "alter" : "create";

var selects = String.Join(" union all ",
memberTableNames.Select(tableName => "select * from " + tableName));
emptyContext.Database.ExecuteSqlCommand(createOrAlter + " view " + viewName + " as " + selects);
}

Modifying view data

The last problem to solve with this library is to support insert updates and deletes on the partitioned view.  One approach is to configure entity framework to use stored procedures for entity modifications.  This is accomplished by using the MapToStoredProcedures method of the EntityTypeConfiguration object.  The stored procedures can use case statements to defer the operation to the appropriate member table.  Another option is to override the SaveChanges on the partitioned view DbContext and delegate all entity modifications to the member table DbContext instances.  The code looks as follows:


public override int SaveChanges()
 {
 var objectsWritten = 0;
 foreach (var o in ChangeTracker.Entries<T>().Where(e=>e.State!=EntityState.Unchanged))
 {
 var dataRangeKey = GetDataRangeKey(o.Entity);
 var memberTable = memberTables.Single(mt => mt.DataRangeKey == dataRangeKey);
 var copy = CloneTo(o.Entity, memberTable.DataType);
 memberTable.DbContext.Entry(copy).State = o.State;
 objectsWritten += memberTable.DbContext.SaveChanges(); 
 Copy(memberTable.DbContext.Entry(copy).Entity as T, o.Entity);

if (o.State == EntityState.Deleted)
 o.State = EntityState.Detached;
 else
 o.State = EntityState.Unchanged;
 }
 return objectsWritten;
 }

The CloneTo method creates an instance of the member table data type and copies the values from the partitioned view entity to the member table entity.  The Copy method copies the values of the member table data type to partitioned view entity.  The other thing to notice is that I changed the EntityState of the changed objects so that operations are not repeated on subsequent calls to SaveChanges.

Parting notes

This wraps up my article on partitioned views and entity framework. Working with partitions views traditionally adds significant overhead to development efforts.  As displayed in this article, code-first entity framework can be of great help in the management of partitioned view object creations.  All that is needed to really make this library complete is a BulkInsert, an exercise for the future.

The code used for this article can be found on github.

Disclaimer, some of the methods (CloneTo, Copy, CreatePartitionTableType) are there for academic reasons.  I would override these with more robust 3rd party alternatives in a commercial setting.