
Data Optimization Using Data Request Objects Implementing IEquatable December 23, 2013

Posted by codinglifestyle in Architecture, ASP.NET, C#, CodeProject.

Anyone who has written enterprise software knows that an ideal design and a linear code path are rare. Even if you are the architect of your application, that doesn’t save you from the hoops you must jump through to connect and interact with other systems in your enterprise’s ecosphere. So it is in my system with a seemingly simple task: loading addresses. While there is only one address control and one presenter, once we get to the data layer there are many different code paths depending on the type of address, the selected company, sales areas, and backend systems. For a large order with hundreds of quotes there are theoretically several hundred address controls to be populated efficiently throughout the ordering process.

Before we get going on optimization let’s start with some basics. We have data consumers who want data. These consumers may represent my address control, a table, or any number of components that need data. In addition, there may be hundreds or thousands of them. We can’t allow each instance to simply call our data layer individually or we’ll cripple the system. These data requests need to be managed and optimized.

We are going to encapsulate our data request in an object. If your data layer function signature takes 3 parameters, simply move these to your new data request object; later we’ll rewrite your data function to take a List<DataRequest> instead of 3 parameters. Of course you may have many parameters or complex objects you need to pass, all the better to encapsulate them! So now we have an object which contains all the information we need to ultimately call the data layer.
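As a minimal sketch of that refactoring, assume a legacy data function that took three string parameters. The names below (`DataRequest`, `AddressDataLayer`, and the three properties) are hypothetical stand-ins, not the real system's types, and the query itself is a placeholder:

```csharp
using System.Collections.Generic;

// Hypothetical request object: the three former parameters become properties.
public class DataRequest
{
    public string AddressType { get; set; }
    public string SoldToId { get; set; }
    public string SalesArea { get; set; }

    // Filled in by the data layer once the request has been satisfied.
    public List<string> Results { get; set; }
}

public static class AddressDataLayer
{
    // Before: List<string> LoadAddresses(string addressType, string soldToId, string salesArea)
    // After: one call receives every registered request at once.
    public static void LoadAddresses(List<DataRequest> requests)
    {
        foreach (DataRequest request in requests)
        {
            // Placeholder for the real query; at this stage it is still one lookup per request.
            request.Results = new List<string>
            {
                request.AddressType + "/" + request.SoldToId + "/" + request.SalesArea
            };
        }
    }
}
```

Nothing is optimized yet; the point is only that the data layer now sees the whole batch and is free to reorganize it.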

When you have hundreds or thousands of data requests there is a very good chance that many of those requests are for the same data. That’s what we’re after here: minimizing the number of calls for actual data. Of course, legacy data functions may have been written too narrowly in scope. Some queries, for example, may be filtered on a function parameter, which might then require multiple calls to get the complete data required across all data requests. This is the kind of analysis you will need to perform on your own, perhaps to bring back the larger data set, cache it, and return pieces of it to individual data requests. One of the great advantages of encapsulating your data requests is being able to analyse them and better satisfy them by rewriting your data layer functions.
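To illustrate the "bring back the larger data set, cache it, and return pieces of it" idea, here is a rough sketch. It assumes a query that used to be filtered by country and has been widened to return every address for a customer; `Address`, `AddressCache`, and the `loadAll` delegate are invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Address
{
    public string SoldToId { get; set; }
    public string Country { get; set; }
}

// Hypothetical cache: one wide query per SoldToId, narrowed in memory per request.
public class AddressCache
{
    private readonly Dictionary<string, List<Address>> _bySoldTo =
        new Dictionary<string, List<Address>>();
    private readonly Func<string, List<Address>> _loadAll;   // stand-in for the widened data function

    public int QueryCount { get; private set; }

    public AddressCache(Func<string, List<Address>> loadAll)
    {
        _loadAll = loadAll;
    }

    public List<Address> Get(string soldToId, string country)
    {
        List<Address> all;
        if (!_bySoldTo.TryGetValue(soldToId, out all))
        {
            all = _loadAll(soldToId);   // the only trip to the backend for this customer
            _bySoldTo[soldToId] = all;
            QueryCount++;
        }
        // Each request receives only the slice it asked for.
        return all.Where(a => a.Country == country).ToList();
    }
}
```

Whether widening a query pays off depends on how large the full data set is compared with the sum of the narrow ones, which is exactly the analysis described above.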

Next we must design our controls and other data consumers to be patient. Instead of making a call to get some data which is immediately fulfilled they will instead register a data request. This will give the hundreds or thousands of other data consumers a chance to register their data requests.
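One way to picture a patient consumer is a presenter that hands its request to a shared queue and keeps the reference for later. This is only a sketch; `RequestQueue` and `AddressPresenter` are invented names and the real registration mechanism may look quite different:

```csharp
using System.Collections.Generic;

public class DataRequest
{
    public string AddressType { get; set; }
    public List<string> Results { get; set; }   // null until the data layer has run
}

// Hypothetical queue that collects requests while the registration window is open.
public class RequestQueue
{
    private readonly List<DataRequest> _pending = new List<DataRequest>();

    public void Register(DataRequest request)
    {
        _pending.Add(request);
    }

    // Closing the window hands back everything registered so far and empties the queue.
    public List<DataRequest> Close()
    {
        List<DataRequest> batch = new List<DataRequest>(_pending);
        _pending.Clear();
        return batch;
    }
}

public class AddressPresenter
{
    private readonly DataRequest _request;

    public AddressPresenter(RequestQueue queue, string addressType)
    {
        _request = new DataRequest { AddressType = addressType };
        queue.Register(_request);   // ask now, read _request.Results later
    }

    public bool HasData
    {
        get { return _request.Results != null; }
    }
}
```

The presenter never calls the data layer itself. It only finds out that its data has arrived because the object it registered has been filled in.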

Once the registration window is closed we can trigger our service or presenter to make the necessary data layer call. As alluded to above, we will pass the complete list of data requests to the data layer. We will then have the opportunity to optimize the data requests to minimize the number of actual data calls made and to make them in bulk. This is the part where you might start worrying how to tackle this gargantuan task. What if I told you I could optimize your data requests in just a few lines of code?

//Create data sets of like requests  (minimized data requests)
//
Dictionary<DataLoadAddressesEntity, List<DataLoadAddressesEntity>> requestsEx = new Dictionary<DataLoadAddressesEntity, List<DataLoadAddressesEntity>>();
foreach (DataLoadAddressesEntity request in requests)
{
    if (!requestsEx.ContainsKey(request))
        requestsEx.Add(request, new List<DataLoadAddressesEntity>());
    else
        requestsEx[request].Add(request);
}

That wasn’t so hard, was it? Now I have a dictionary whose keys are the minimal set of data calls truly necessary. I call these prime data requests. The value stored under each prime data request is the list of other requests equal to it. So once the prime data request is satisfied we merely need to copy the results across the values in the data set:

//////////////////////////////////////////////////////
//Copy prime request results reference across data set
//
requestsEx[requestPrime].ForEach(r => r.Results = requestPrime.Results);

You might notice that I’ve included a Results property in my data request object. The great thing about encapsulating our request in an object is how handy it is to add more properties to keep everything together. Keep in mind that we are merely copying a reference to the prime request’s results across all like data requests. Therefore, changing one affects all the others, which makes sense but must be understood if it is not to become dangerous. Some developers can go many years without really considering what reference types are, so make sure to mentor your team on the basics of value vs reference types. Coming from C++ and the wondrous pointer I take full advantage of references, as you will see in my final summary below.
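Putting the grouping and the copy step together, the whole optimization might look roughly like the following. This is a sketch under assumptions: the request has a single hypothetical criterion (`SoldToId`), and the `fetch` delegate stands in for whatever real data call satisfies a prime request:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical request with one criterion, just enough to show the grouping.
public class DataRequest : IEquatable<DataRequest>
{
    public string SoldToId { get; set; }
    public List<string> Results { get; set; }

    public bool Equals(DataRequest other)
    {
        return other != null && string.Equals(SoldToId, other.SoldToId);
    }

    public override bool Equals(object obj) { return Equals(obj as DataRequest); }

    public override int GetHashCode() { return SoldToId == null ? 0 : SoldToId.GetHashCode(); }
}

public static class RequestOptimizer
{
    // Satisfies every request and returns the number of real data calls made.
    public static int Satisfy(List<DataRequest> requests, Func<DataRequest, List<string>> fetch)
    {
        var groups = new Dictionary<DataRequest, List<DataRequest>>();
        foreach (DataRequest request in requests)
        {
            List<DataRequest> duplicates;
            if (groups.TryGetValue(request, out duplicates))
                duplicates.Add(request);                        // secondary request
            else
                groups.Add(request, new List<DataRequest>());   // prime request
        }

        foreach (KeyValuePair<DataRequest, List<DataRequest>> entry in groups)
        {
            DataRequest prime = entry.Key;
            prime.Results = fetch(prime);                        // one real call per group
            entry.Value.ForEach(r => r.Results = prime.Results); // same list instance, not a copy
        }
        return groups.Count;
    }
}
```

Because every request in a group ends up pointing at the same list, adding an item through one request makes it visible through all the others. If consumers need to modify their results independently, they would have to copy the list first.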

So you must be wondering what voodoo magic I’m using to optimize the data set so easily. Did you read the title? To know if one data request is equal to another it is up to us to implement IEquatable<DataRequest> and override GetHashCode. This is the voodoo that allows us to use Dictionary.ContainsKey(dataRequest), singling out a prime data request from the secondary data requests. So, how do we decide if one request is equal to another?

With so many permutations and variables in the data layer where does one start? There is no easy answer for this one. It is time for some analysis to boil down what exactly makes one data request different from another. This is the hardest part of the exercise. I started with a spreadsheet, looked at all the variables each code path required, and developed a matrix. I was able to eliminate many of the variables which were the same no matter what type of request it was (CompanyID for example). What appeared an arduous task boiled down to just a few criteria to differentiate requests from one another. Of course, it took hours of eliminating unused variables, proving assumptions that other variables were always equal, and cleaning up the code in order to see the light through the reeds.

Once your analysis is done you now know how to tell if one data request is equal to another so we don’t waste resources making the same call twice. Implementing IEquatable<DataRequest> will have you implementing Equals in your data request object where the comparing type is another data request:

public bool Equals(DataLoadAddressesEntity other)

For each criterion from your analysis, let’s assume we have a property in your data request object. For each criterion, a comparison of this.Property != other.Property means you return false. If all of the other data request’s criteria are the same, you are both after the same data. So if you fall through all the criteria comparisons, return true and you now have one less data call to make.

You must repeat the same logic, in principle, for the GetHashCode override. Instead of comparing the search criteria, this time you are combining the criteria’s hash codes. So much like above, if you have 2 data requests which need the same data you must also have 2 hash codes which are equal; the dictionary only calls Equals on requests whose hash codes match. In this way you can use the dictionary, as above, to optimize the data requests.

Although the criteria that pertain to your data requests will differ, I will show mine here as I love seeing examples:

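#region IEquatable<PartnerFunctionSearchEntity> Members
public bool Equals(PartnerFunctionSearchEntity other)
{
    if (other == null) return false;  //a null request is never equal

    if (!this.AddressType.Equals(other.AddressType))     return false;
    //static string.Equals is null-safe, unlike this.SoldToId.Equals(...)
    if (!string.Equals(this.SoldToId, other.SoldToId))   return false;
    if (!string.Equals(this.SalesArea, other.SalesArea)) return false;

    //DictionaryEqual is a custom extension method, not part of the .NET framework
    return SearchCriteria.DictionaryEqual(other.SearchCriteria);
}

public override int GetHashCode()
{
    unchecked  //overflow is ok, just wrap
    {
        int hash        = 17;
        const int prime = 31;  //Prime numbers

        hash = hash * prime + AddressType.ToString().GetHashCode();
        if (!string.IsNullOrEmpty(SalesArea))
            hash = hash * prime + SalesArea.GetHashCode();
        if (!string.IsNullOrEmpty(SoldToId))
            hash = hash * prime + SoldToId.GetHashCode();

        //Sum the pairs so the result does not depend on dictionary enumeration order
        int criteriaHash = 0;
        foreach (KeyValuePair<EAddressSearchCriteria, string> keyvalue in SearchCriteria)
            criteriaHash += keyvalue.Key.GetHashCode() ^ (keyvalue.Value ?? string.Empty).GetHashCode();

        return hash * prime + criteriaHash;
    }
}
#endregion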
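The `DictionaryEqual` call in `Equals` above is not a method the .NET base class library provides, and its implementation is not shown here. One plausible version, offered purely as an assumption about its semantics (same number of entries, same keys, equal values, insertion order ignored), could be written as an extension method like this:

```csharp
using System.Collections.Generic;

public static class DictionaryExtensions
{
    // Assumed semantics: equal when both hold the same keys mapped to equal values.
    public static bool DictionaryEqual<TKey, TValue>(
        this IDictionary<TKey, TValue> first, IDictionary<TKey, TValue> second)
    {
        if (object.ReferenceEquals(first, second)) return true;   // same instance, or both null
        if (first == null || second == null) return false;
        if (first.Count != second.Count) return false;

        EqualityComparer<TValue> comparer = EqualityComparer<TValue>.Default;
        foreach (KeyValuePair<TKey, TValue> pair in first)
        {
            TValue otherValue;
            if (!second.TryGetValue(pair.Key, out otherValue)) return false;   // key missing
            if (!comparer.Equals(pair.Value, otherValue)) return false;        // value differs
        }
        return true;
    }
}
```

Whatever definition of equality is used for the search criteria, the hash code has to agree with it: two criteria dictionaries that compare equal here must contribute the same amount to `GetHashCode`, regardless of the order in which their entries were added.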

You may be wondering where the best place is to put the various parts of this solution. I would suggest a service layer which sits between the data consumers and the data layer. In my case, with many instances of an address control, I placed it in the control’s presenter. As there is a 1:1 relationship between control and presenter, the latter contains a member variable which is the data request. On registration it contains only the criteria necessary to get the data. I am using the per-request cache (HttpContext.Current.Items) to store my List<DataRequest>, where all registered data requests accumulate.
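Storing the queue in the per-request cache could be sketched as below. `HttpContext.Current.Items` is an `IDictionary`, so the helper takes any `IDictionary` and can be exercised outside a web request; `RequestRegistry` and the key name are hypothetical:

```csharp
using System.Collections;
using System.Collections.Generic;

public class DataRequest
{
    public List<string> Results { get; set; }
}

public static class RequestRegistry
{
    private const string ItemsKey = "RegisteredDataRequests";   // hypothetical key name

    // In a web request, items would be HttpContext.Current.Items.
    public static List<DataRequest> GetQueue(IDictionary items)
    {
        List<DataRequest> queue = items[ItemsKey] as List<DataRequest>;
        if (queue == null)
        {
            // First registration in this request: create the shared list.
            queue = new List<DataRequest>();
            items[ItemsKey] = queue;
        }
        return queue;
    }

    public static void Register(IDictionary items, DataRequest request)
    {
        GetQueue(items).Add(request);
    }
}
```

Because the items collection lives only as long as the current HTTP request, the queue is discarded automatically afterwards and requests from different users never mix.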

Remember, my presenter only holds a reference to its _Request member variable… the same reference which is in the data request queue and the same reference to which the results will be assigned.

Once registration closes, the data layer call is triggered with the list of data requests. The optimization happens here, nearest the source, so as not to be repeated. Once the requests are optimized and the actual data calls are made, the _Request.Results still held in the presenter’s member variable will be populated and ready to set to the view for display.
