Patterns on Bulk Data Migration

No code in this post, well, maybe a bit.

Over the past few years I’ve worked on a number of data migrations using a variety of tools. I’ve run everything from off-the-shelf solutions to home-grown ones, and everything in between.

All of them have their hits and misses. This isn’t meant to harp on one over another; I’m still refining my approach to getting data into Dynamics as efficiently as possible.

My current investigation into how to optimize this process has led me to consider which pieces matter most for a solution to work:

  1. Make it As Fast As Possible.
  2. Handle Concurrent Connections (i.e., multi-thread and parallelize it until the cows come home).
  3. Make it Accessible in the environment the user operates in (read: if you’re building something for developers, deliver it in a form they will actually consume).
  4. Do 80% of the heavy lifting, leaving the door open for the developer to handle the other 20%.
  5. Handle Connections to the Source (could be anything) and the Target (generally Dynamics).
  6. Make it Scale.
  7. Make it Debuggable (i.e., where do I go when something goes wrong… because something will go wrong).
  8. Relate that Data (can we bulk it up?).

All of these “tasks” intertwine with one another: you can’t have scale or parallel connections without a solid connection point. I’ll dig into some of these items in more depth in the coming months, but the one I’ll focus on today is relating the data.

Relating Data

Unless you’re migrating the most horrible database in the world (or a pile of flat files), you’re going to have related data, old data that needs to be transformed into a new shape, and dirty data that needs to be cleaned. Of that last 20% I mentioned, about 60% will be spent dealing with these issues that just wear you down.

When you’re relating data as it flows into Dynamics, you need proper EntityReferences to link up each record before submitting it. The SDK isn’t mature enough for you to submit a batch of create requests and have it fill in the EntityReferences for you on the fly (if it did, WOW, would that simplify life).

But it doesn’t, so you need to persist the references you create in case you need them again (think lookup/domain data). Right now, as part of the framework I’m working on, the goal is to figure out how to migrate an entity and chain that entity’s creation to another (i.e., don’t start migrating Entity A if Entity B isn’t done yet).
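To make the “persist it in case you need it again” idea concrete, here is a minimal C# sketch of an in-memory reference cache. It is an illustration under assumptions, not the framework described here: `EntityRef` is a hypothetical stand-in for the SDK’s `EntityReference` (which carries a logical name and a record `Guid`), and `ReferenceCache`, `Remember` and `TryResolve` are illustrative names that are not part of any SDK.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for the SDK's EntityReference (logical name + record id),
// so this sketch compiles without referencing the Dynamics SDK.
public sealed record EntityRef(string LogicalName, Guid Id);

// Thread-safe map from (target entity, source-system key) to the reference of the
// record already created in the target, so later records can resolve their lookups
// without querying Dynamics again.
public sealed class ReferenceCache
{
    private readonly ConcurrentDictionary<(string Entity, string SourceKey), EntityRef> _refs = new();

    public int Count => _refs.Count;

    // Call after a create succeeds, passing the source key the record came from.
    public void Remember(string sourceKey, EntityRef created) =>
        _refs[(created.LogicalName, sourceKey)] = created;

    // Returns false if nothing has been remembered for that entity and source key,
    // e.g. because the parent record has not been migrated yet.
    public bool TryResolve(string logicalName, string sourceKey, out EntityRef reference) =>
        _refs.TryGetValue((logicalName, sourceKey), out reference);
}
```

A `ConcurrentDictionary` is used because several packages may be writing and reading references at the same time; a real migration would likely also want to reload this map from the target when a run is restarted.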

If you think about it, you’d want a package: a unit in which you specify which elements get created separately, but which can still run as a whole (i.e., threaded to run concurrently with other packages) to save time (related to #1, #2 & #5).

So you could have something like this.

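// A package migrates its entities one after another; separate packages can run in parallel.
ParallelPackage pack = new ParallelPackage();
pack.Name = "PACKAGE NAME";
// Packages that must complete before this one starts.
pack.Dependencies.Add("PACK DEPENDENCY 1");
pack.Dependencies.Add("PACK DEPENDENCY 2");
// Entities are migrated serially, in the order they are added.
pack.Entities.Add(new Entity1());
pack.Entities.Add(new Entity2());
pack.Entities.Add(new Entity3());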

So what does this give you?

Well, what this says is that Entity1, Entity2 & Entity3 depend on one another, so we’ll execute them serially within their package. But imagine having a bunch of these packages that can now run in parallel WITHOUT having to wait until everything is done.
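The execution model described above, serial inside a package and parallel across packages, can be sketched in C# with `Task.WhenAll`. This is a minimal sketch under assumptions: `MigrationPackage`, `RunSeriallyAsync` and `PackageRunner` are hypothetical names, each entity is modeled as a `Func<Task>` step rather than an entity class, and dependencies between packages are ignored here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical package: a name plus an ordered list of entity-migration steps.
// Each step stands in for "migrate one entity" (Entity1, Entity2, ...).
public sealed class MigrationPackage
{
    public string Name { get; }
    public List<Func<Task>> Entities { get; } = new();

    public MigrationPackage(string name) => Name = name;

    // Steps inside one package run one after another, in insertion order,
    // because a later entity may need references created by an earlier one.
    public async Task RunSeriallyAsync()
    {
        foreach (var migrateEntity in Entities)
            await migrateEntity();
    }
}

public static class PackageRunner
{
    // Every package is started on the thread pool at once; only the steps
    // within a single package are serialized.
    public static Task RunAllAsync(IEnumerable<MigrationPackage> packages) =>
        Task.WhenAll(packages.Select(p => Task.Run(() => p.RunSeriallyAsync())));
}
```

`Task.Run` is there so that packages whose steps make blocking SDK calls each get their own thread-pool thread; calling `RunSeriallyAsync` directly would run a fully synchronous package to completion on the caller’s thread before the next package even started.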

Taking this a step further, what if part of the package let you chain the dependencies together, so that each package executes as soon as the ones it depends on are done?
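One way to express that chaining, offered as a sketch and not as the author’s approach, is to start each package as soon as the tasks for the packages named in its `Dependencies` have completed. `ChainedPackage`, `DependencyRunner` and the `Migrate` delegate are hypothetical names; the sketch assumes package names are unique and that every dependency refers to a package in the same run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical package shape: a name, the names of packages that must finish
// first, and a delegate that migrates this package's entities.
public sealed class ChainedPackage
{
    public string Name { get; init; } = "";
    public List<string> Dependencies { get; } = new();
    public Func<Task> Migrate { get; init; } = () => Task.CompletedTask;
}

public static class DependencyRunner
{
    // Starts each package once all of its dependencies have completed, so
    // unrelated chains run in parallel instead of waiting on one global barrier.
    public static Task RunAsync(IReadOnlyCollection<ChainedPackage> packages)
    {
        var byName = packages.ToDictionary(pkg => pkg.Name);  // throws on duplicate names
        var started = new Dictionary<string, Task>();
        var visiting = new HashSet<string>();

        Task Start(ChainedPackage p)
        {
            if (started.TryGetValue(p.Name, out var existing)) return existing;
            if (!visiting.Add(p.Name))
                throw new InvalidOperationException($"Dependency cycle at '{p.Name}'.");

            // Build (or reuse) the tasks for this package's prerequisites first.
            var prerequisites = p.Dependencies
                .Select(d => byName.TryGetValue(d, out var dep)
                    ? Start(dep)
                    : throw new InvalidOperationException($"Unknown dependency '{d}'."))
                .ToArray();

            var task = RunAfterAsync(prerequisites, p.Migrate);
            visiting.Remove(p.Name);
            started[p.Name] = task;
            return task;
        }

        return Task.WhenAll(packages.Select(pkg => Start(pkg)).ToArray());
    }

    private static async Task RunAfterAsync(Task[] prerequisites, Func<Task> migrate)
    {
        await Task.WhenAll(prerequisites);  // a failed dependency faults this package too
        await Task.Run(migrate);            // then migrate this package on the thread pool
    }
}
```

Because each package awaits its prerequisites, a failed dependency faults its dependents instead of letting them run against missing references, and a cycle or unknown dependency name surfaces as an `InvalidOperationException` while the task graph is being built.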

I originally wrote the ParallelPackage snippet above that way, but having written this post, I now think there is a better way to chain these together.

Stay Tuned for More Developments.