
Data Optimization Using Data Request Objects Implementing IEquatable December 23, 2013

Posted by codinglifestyle in Architecture, ASP.NET, C#, CodeProject.

Anyone who has written enterprise software knows that an ideal design and a linear code path are rare. Even if you are the architect of your application, that doesn’t save you from the hoops you must jump through to connect and interact with other systems in your enterprise’s ecosystem. So it is in my system with a seemingly simple task: loading addresses. While there is only one address control and one presenter, when we get to the data layer there are many different code paths depending on the type of address, the selected company, sales areas, and backend systems. For a large order with hundreds of quotes there are theoretically several hundred address controls to be populated efficiently throughout the ordering process.

Before we get going on optimization let’s start with some basics. We have data consumers who want data. These consumers may represent my address control, a table, or any number of components that need data. In addition, there may be hundreds or thousands of them. We can’t allow each instance to simply call our data layer individually or we’ll cripple the system. These data requests need to be managed and optimized.

We are going to encapsulate our data request in an object. If your data layer function signature takes 3 parameters, simply move these to your new data request object; later we’ll rewrite your data function to take a List<DataRequest> instead of 3 parameters. Of course you may have many parameters or complex objects you need to pass; all the better to encapsulate them! So now we have an object which contains all the information we need to ultimately call the data layer.
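To make this concrete, here is a minimal sketch assuming a hypothetical legacy function LoadAddresses(soldToId, salesArea, addressType). The names AddressRequest, AddressType and AddressDataLayer are illustrative only. The three parameters become properties of the request, alongside a Results property, and the data function now takes the whole list:

```csharp
using System.Collections.Generic;

// Hypothetical address type used as one of the request criteria.
public enum AddressType { SoldTo, ShipTo, BillTo }

// Hypothetical data request object: the parameters of a legacy
// LoadAddresses(soldToId, salesArea, addressType) call, plus a slot for results.
public class AddressRequest
{
    public string SoldToId { get; set; }
    public string SalesArea { get; set; }
    public AddressType AddressType { get; set; }

    // Filled in by the data layer once the request has been satisfied.
    public List<string> Results { get; set; }
}

public static class AddressDataLayer
{
    // Before: List<string> LoadAddresses(string soldToId, string salesArea, AddressType type)
    // After:  one call receives every registered request and fills in each request's Results.
    public static void LoadAddresses(List<AddressRequest> requests)
    {
        foreach (AddressRequest request in requests)
        {
            // Stand-in for the real query: a single descriptive string per request.
            request.Results = new List<string>
            {
                request.SoldToId + "/" + request.SalesArea + "/" + request.AddressType
            };
        }
    }
}
```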

When you have hundreds or thousands of data requests there is a very good chance that many of those requests are for the same data. What we’re after here is minimizing the number of calls for actual data. Of course, due to how legacy data functions may be written, they may be too narrow in scope. Some queries, for example, may be filtered based on a function parameter, which might then require multiple calls to get the complete data required across all data requests. This is the kind of analysis you will need to perform on your own, perhaps to bring back the larger data set, cache it, and return pieces of it to individual data requests. One of the great advantages of encapsulating your data requests is being able to analyse them and better satisfy them by rewriting your data layer functions.

Next we must design our controls and other data consumers to be patient. Instead of making a call to get some data which is immediately fulfilled they will instead register a data request. This will give the hundreds or thousands of other data consumers a chance to register their data requests.
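A minimal sketch of the registration window, assuming a hypothetical RequestQueue that consumers call instead of the data layer. Later in this post the list lives in the per-request cache; here a plain object stands in for that, and an Action<List<QueuedRequest>> callback stands in for the data layer (both assumptions for illustration):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical request type; only the criteria and a Results slot matter here.
public class QueuedRequest
{
    public string Key { get; set; }
    public string Results { get; set; }
}

// Collects requests during the registration window and hands them to the
// data layer as one batch when the window closes.
public class RequestQueue
{
    private readonly List<QueuedRequest> _pending = new List<QueuedRequest>();
    private readonly Action<List<QueuedRequest>> _loadBatch;
    private bool _closed;

    public RequestQueue(Action<List<QueuedRequest>> loadBatch)
    {
        _loadBatch = loadBatch;
    }

    // Called by each consumer (e.g. a presenter) instead of calling the data layer directly.
    public void Register(QueuedRequest request)
    {
        if (_closed)
            throw new InvalidOperationException("Registration window is closed.");
        _pending.Add(request);
    }

    // Closes the window and makes a single batched call to the data layer.
    public void Close()
    {
        _closed = true;
        _loadBatch(_pending);
    }
}
```

Each consumer keeps its own reference to the request it registered, so once Close() runs it simply reads request.Results.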

Once the registration window is closed we can trigger our service or presenter to make the necessary data layer call. As alluded to above, we will pass the complete list of data requests to the data layer. We then have the opportunity to optimize the data requests to minimize the number of actual data calls made and to make them in bulk. This is the part where you might start worrying how to tackle this gargantuan task. What if I told you I could optimize your data requests in just a few lines of code?

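//Create data sets of like requests  (minimized data requests)
//
Dictionary<DataLoadAddressesEntity, List<DataLoadAddressesEntity>> requestsEx = new Dictionary<DataLoadAddressesEntity, List<DataLoadAddressesEntity>>();
foreach (DataLoadAddressesEntity request in requests)
{
    if (!requestsEx.ContainsKey(request))
        requestsEx.Add(request, new List<DataLoadAddressesEntity>());  //first of its kind: becomes the prime request (key)
    else
        requestsEx[request].Add(request);  //equal to an existing key: queued behind the prime request
}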

That wasn’t so hard, was it? Now I have a dictionary whose keys represent the minimum number of data calls truly necessary. I call these prime data requests. Each prime data request’s value in the dictionary is the list of secondary data requests equal to it. So once the prime data request is satisfied we merely need to copy the results across the values in the data set:

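//////////////////////////////////////////////////////
//Copy prime request results reference across data set
//(once each prime request's Results have been loaded)
//
foreach (DataLoadAddressesEntity requestPrime in requestsEx.Keys)
    requestsEx[requestPrime].ForEach(r => r.Results = requestPrime.Results);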

You might notice that I’ve included a Results property in my data request object. The great thing about encapsulating our request in an object is how handy it is to add more properties to keep everything together. Keep in mind that we are merely copying a reference to the prime request’s results across all like data requests. Therefore, changing one affects all the others, which makes sense but must be understood to avoid surprises. Some developers can go many years without really considering what reference types are, so make sure to mentor your team on the basics of value vs. reference types. Coming from C++ and the wondrous pointer, I take full advantage of references, as you will see in my final summary below.
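A small sketch of that danger, using a hypothetical ResultsHolder type in place of a real data request: two holders share one list, so sorting through one is visible through the other, and a consumer that needs to modify its results can take a private copy first:

```csharp
using System.Collections.Generic;

// Hypothetical consumer-side view of a request's results.
public class ResultsHolder
{
    public List<string> Results { get; set; }
}

public static class SharedResultsDemo
{
    // Shares one results list between two holders, the way the prime request's
    // Results reference is copied across its equal requests.
    public static ResultsHolder[] Share(List<string> primeResults)
    {
        var first = new ResultsHolder { Results = primeResults };
        var second = new ResultsHolder { Results = primeResults };
        return new[] { first, second };
    }

    // A consumer that wants to modify its results should work on its own copy
    // so the other requests holding the same reference are not affected.
    public static List<string> PrivateCopy(ResultsHolder holder)
    {
        return new List<string>(holder.Results);
    }
}
```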

So you must be wondering what voodoo magic I’m using to optimize the data set so easily. Did you read the title? To know if one data request is equal to another, it is up to us to implement IEquatable<DataRequest> and override GetHashCode (and, for consistency, Equals(object)). This is the voodoo that allows us to use Dictionary.ContainsKey(datarequest) to single out a prime data request from the secondary data requests. So, how do we decide if one request is equal to another?

With so many permutations and variables in the data layer, where does one start? There is no easy answer for this one. It is time for some analysis to boil down what exactly makes one data request different from another. This is the hardest part of the exercise. I started with a spreadsheet, looked at all the variables each code path required, and developed a matrix. I was able to eliminate many of the variables which were the same no matter what type of request it was (CompanyID, for example). What appeared to be an arduous task boiled down to just a few criteria to differentiate requests from one another. Of course, it took hours of eliminating unused variables, proving assumptions that other variables were always equal, and cleaning up the code in order to see the light through the reeds.

Once your analysis is done you know how to tell if one data request is equal to another, so we don’t waste resources making the same call twice. Implementing IEquatable<DataRequest> means implementing Equals in your data request object, where the type being compared is another data request:

public bool Equals(DataLoadAddressesEntity other)

For each criterion from your analysis, let’s assume we have a property in your data request object. For each criterion, if this.Property != other.Property, return false. If all of the other data request’s criteria are the same, you are both after the same data. So if you fall through all the criteria comparisons, return true: you now have one less data call to make.

You must repeat the same logic, in principle, for the GetHashCode override. Instead of comparing the search criteria, this time you are combining the criteria’s hash codes. Just as above, if you have 2 data requests which need the same data, you must also have 2 hash codes which are equal (the reverse need not hold: unequal requests may occasionally share a hash code). This is what lets the dictionary, as above, optimize the data requests.

Although the criteria that pertain to your data requests will differ, I will show mine here as I love seeing examples:

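#region IEquatable Members
public bool Equals(PartnerFunctionSearchEntity other)
{
    if (ReferenceEquals(other, null)) return false;
    if (ReferenceEquals(this, other)) return true;

    //string.Equals(a, b) is null-safe, unlike this.SalesArea.Equals(...)
    if (!this.AddressType.Equals(other.AddressType))       return false;
    if (!string.Equals(this.SoldToId, other.SoldToId))     return false;
    if (!string.Equals(this.SalesArea, other.SalesArea))   return false;

    //DictionaryEqual is my own extension method (not shown)
    return SearchCriteria.DictionaryEqual(other.SearchCriteria);
}

//Keep Equals(object) consistent with the strongly typed Equals
public override bool Equals(object obj)
{
    return Equals(obj as PartnerFunctionSearchEntity);
}

public override int GetHashCode()
{
    unchecked  //overflow is ok, just wrap
    {
        int hash        = 17;
        const int prime = 31;  //Prime numbers

        hash = hash * prime + AddressType.ToString().GetHashCode();
        if (!string.IsNullOrEmpty(SalesArea))
            hash = hash * prime + SalesArea.GetHashCode();
        if (!string.IsNullOrEmpty(SoldToId))
            hash = hash * prime + SoldToId.GetHashCode();

        //Sum the entries so the result does not depend on enumeration order:
        //equal dictionaries must give equal hash codes however they were built
        int criteriaHash = 0;
        foreach (KeyValuePair<EAddressSearchCriteria, string> keyvalue in SearchCriteria)
            criteriaHash += keyvalue.Key.GetHashCode() ^ (keyvalue.Value == null ? 0 : keyvalue.Value.GetHashCode());
        hash = hash * prime + criteriaHash;

        return hash;
    }
}
#endregion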
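Putting the pieces together, here is a self-contained sketch of the whole flow: group equal requests in the dictionary, make one data call per prime request, then copy the prime’s Results reference across its duplicates. LookupRequest, its single SoldToId criterion, and LookupService are hypothetical stand-ins for a real data request and data layer, and TryGetValue replaces the ContainsKey/indexer pair so each request costs one dictionary lookup:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical request whose equality is defined by a single criterion (SoldToId).
public sealed class LookupRequest : IEquatable<LookupRequest>
{
    public string SoldToId { get; set; }
    public List<string> Results { get; set; }

    public bool Equals(LookupRequest other)
    {
        return other != null && string.Equals(SoldToId, other.SoldToId);
    }

    public override bool Equals(object obj) { return Equals(obj as LookupRequest); }

    // Equal requests (same SoldToId) produce equal hash codes.
    public override int GetHashCode()
    {
        return SoldToId == null ? 0 : SoldToId.GetHashCode();
    }
}

public static class LookupService
{
    // Groups equal requests, makes one (simulated) data call per prime request,
    // then shares the prime's Results reference with its duplicates.
    // Returns the number of data calls made.
    public static int Load(List<LookupRequest> requests)
    {
        var requestsEx = new Dictionary<LookupRequest, List<LookupRequest>>();
        foreach (LookupRequest request in requests)
        {
            List<LookupRequest> duplicates;
            if (requestsEx.TryGetValue(request, out duplicates))
                duplicates.Add(request);                              // secondary request
            else
                requestsEx.Add(request, new List<LookupRequest>());  // prime request
        }

        int calls = 0;
        foreach (KeyValuePair<LookupRequest, List<LookupRequest>> entry in requestsEx)
        {
            LookupRequest requestPrime = entry.Key;
            calls++;
            // Stand-in for the real data layer call.
            requestPrime.Results = new List<string> { "address for " + requestPrime.SoldToId };
            entry.Value.ForEach(r => r.Results = requestPrime.Results);
        }
        return calls;
    }
}
```

With three requests where two share a SoldToId, Load makes two data calls, and the two equal requests end up holding the very same Results list.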

You may be wondering where the best place is to put the various parts of this solution. I would suggest a service layer which sits between the data consumers and the data layer. In my case, with many instances of an address control, I placed it in the control’s presenter. As there is a 1:1 relationship between control and presenter, the latter contains a member variable which is the data request. On registration it contains only the criteria necessary to get the data. I am using the per-request cache (HttpContext.Current.Items) to store my List<DataRequest>, where all registered data requests accumulate.

Remember, my presenter’s _Request member variable is only a reference: the same reference which is in the data request queue and the same reference to which the results will be assigned.

Once registration closes, the data layer call is triggered with the list of data requests. The optimization happens here, nearest the source, so as not to be repeated. Once the requests are optimized and the actual data calls are made, the _Request.Results still held in the presenter’s member variable will be populated and ready to set to the view for display.


Software Architect Conference 2012 November 19, 2012

Posted by codinglifestyle in Architecture, ASP.NET, CodeProject, Parallelism.

I was fortunate enough to have the opportunity to attend the Software Architect Conference this year in London. This is the same group which puts on DevWeek. It was short and sweet, just 2 days without the additional sessions before and after. Often with the daily grind you simply don’t have the time or inclination to challenge yourself with the sort of material presented at these conferences. This is what makes them unique: for a few precious days you are free of distractions to consider how and why we do what we do. I certainly found it useful, and some of the speakers were truly impressive. While the technology we use continues to change at the speed of light, the great thing about software architecture is that many of the basic principles of building a stable, well-engineered system haven’t changed since medieval times.

Keynote

  • Theme: 21st century architects should aspire to be like medieval “master builders”
    • 7 years as an apprentice, many more years to become a master; administers the project and deals with the client, but is still a master mason
    • Keep coding – credibility with team, mitigates ivory tower
  • 20th century software architects
    • Stepped away from the code
    • UML
    • Analysis paralysis
    • Ivory Tower syndrome
  • Architecture traps
    • Enterprise Architecture Group – not sustainable, disconnected
    • CV driven development – ego and fun over needs and requirements
    • Going “Post-technical” – no longer involved in programming
  • Software Architecture summed up
    • Create a shared vision – get everyone to move in the same direction
  • Architectural lessons learnt were lost in Agile – the baby thrown out with the bath water
    • It is a myth that there is a conflict between good software architecture and agile
  • What we do
    • Requirements and constraints
    • Evaluate and vet technology
    • Design software
    • Architectural evaluation
    • Code!
    • Maintainability
    • Technical ownership
    • Mentoring
  • True team leadership is collaborative / mentoring
  • Big picture: just enough architecture to provide the vision needed to move forward

Architectural Styles

  • An architectural definition answers 3 questions
    • What are the structural elements of the system?
    • How are they related to each other?
    • What are the underlying principles and rationale to the previous 2 questions?
  • Procedural
    • Decompose a program into smaller pieces to help achieve modifiability.
    • Single threaded sequential execution
  • RPC Model
    • Still procedural: single thread of control
  • Threads
    • Decouples activities from main process but still procedural
    • Shared data must be immutable or copied
    • Some people, when confronted with a problem, think, “I know, I’ll use threads,” and then two they hav erpoblesms.
  • Event based, Implicit Invocation
    • The components are modules whose interfaces provide both a collection of procedures and a set of events
    • Extensible / free plumbing
    • Inversion of control (not dependency inversion)
  • Messaging
    • Asynchronous way to interact reliably
    • Instead of threads and shared memory use process independent code and message passing
  • Layers
    • Regardless of interactions and coupling between different parts of a system, there is a need to develop and evolve them independently
    • Each layer having a separate and distinct responsibility following a reasoned and clear separation of concerns
    • Often “partitioned” but not true layers due to cross references which sneak in
  • Alternate Layers – spherical
    • Core – domain model
    • Inner crust – services wrapped around core
    • Outer crust – wrapped external dependencies
  • Micro-kernel / Plug-in
    • Small hub with everything plugged in
    • Separates a minimal functional core from extended functionality and customer-specific parts
  • Shared repository
    • DB and the like
    • Procedures secondary, data is king!
    • Maintain all data in a central repository shared by all functional components of the data-driven application and let the availability, quality, and state of that data trigger and coordinate the control flow of the application logic.
  • Pipes & Filters
    • Divide the application’s task into several self-contained data processing steps and connect these steps to a data processing pipeline via intermediate data buffers.
    • Process & queue → process & queue → process & queue
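A rough sketch of Pipes & Filters in C#, assuming BlockingCollection<T> queues as the intermediate buffers and one task per filter; the two filters (trim, then upper-case) are arbitrary placeholders:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Each filter is a self-contained step that reads from an input buffer
// and writes to an output buffer (the "pipe").
public static class Pipeline
{
    // Runs one filter on its own task; always closes its output so the next stage can finish.
    static Task Filter(BlockingCollection<string> input,
                       BlockingCollection<string> output,
                       Func<string, string> step)
    {
        return Task.Run(() =>
        {
            try
            {
                foreach (string item in input.GetConsumingEnumerable())
                    output.Add(step(item));
            }
            finally
            {
                output.CompleteAdding();
            }
        });
    }

    // Process & queue -> process & queue: trim, then upper-case each line.
    public static List<string> Run(IEnumerable<string> lines)
    {
        var source = new BlockingCollection<string>();
        var trimmed = new BlockingCollection<string>();
        var upper = new BlockingCollection<string>();

        Task t1 = Filter(source, trimmed, s => s.Trim());
        Task t2 = Filter(trimmed, upper, s => s.ToUpperInvariant());

        foreach (string line in lines)
            source.Add(line);
        source.CompleteAdding();

        List<string> results = upper.GetConsumingEnumerable().ToList();
        Task.WaitAll(t1, t2);   // surfaces any filter exception as an AggregateException
        return results;
    }
}
```

Because each filter has a single consumer reading a FIFO queue, the output keeps the input order.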

The Architecture of an Asynchronous Application

  • Heavy focus on messaging throughout talk
  • About Messaging
    • Guaranteed delivery at a cost
    • Reliable and scalable
    • Subscription models
      • 1 : n
      • Round robin
      • Publish / Subscribe
  • Messaging Terms
    • Idempotency – will doing something twice change data / state?
    • Poison message – a message that keeps being redelivered (perhaps because an exception is thrown before an ack is returned to the queue)
  • Messaging platforms
    • MSMQ – MS specific (personally found it easy enough to use)
    • IBM MQ
    • NServiceBus
    • RabbitMQ – multiplatform, multi-language bindings. Mentioned in numerous talks and the focus of this one.
    • SignalR – an interesting client-side messaging platform; could be a more powerful model than using web services on the client
      • install-package SignalR with NuGet
      • Picks best available connection method
      • Push from server to client
      • Broadcast to all or to a specific client
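To make the two messaging terms concrete, here is an in-memory sketch that is not tied to any of the platforms above: a hypothetical IdempotentConsumer ignores a redelivered message whose ID it has already processed, and sets a message aside as poison once it has failed MaxAttempts times instead of letting it be redelivered forever:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message: an ID for de-duplication plus a payload.
public class Message
{
    public Guid Id { get; set; }
    public int Amount { get; set; }
}

// In-memory sketch (no real broker API): idempotency via de-duplication on
// message ID, and poison-message handling via a retry limit.
public class IdempotentConsumer
{
    private readonly HashSet<Guid> _processed = new HashSet<Guid>();
    private readonly Dictionary<Guid, int> _attempts = new Dictionary<Guid, int>();

    public List<Message> PoisonQueue { get; } = new List<Message>();
    public int MaxAttempts { get; set; } = 3;

    // Returns true when the message may be acknowledged, false to have it redelivered.
    public bool Deliver(Message message, Action<Message> handler)
    {
        if (_processed.Contains(message.Id))
            return true;   // duplicate delivery: already applied, just ack again

        try
        {
            handler(message);
            _processed.Add(message.Id);
            return true;
        }
        catch (Exception)
        {
            int attempts;
            _attempts.TryGetValue(message.Id, out attempts);
            _attempts[message.Id] = ++attempts;
            if (attempts >= MaxAttempts)
            {
                PoisonQueue.Add(message);   // break the endless redelivery loop
                return true;
            }
            return false;
        }
    }
}
```

In a real system the record of processed IDs would need to be stored durably, ideally in the same transaction as the state change; otherwise a crash between the two reintroduces duplicates.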

Async with C# 5

  • This talk is largely about Tasks and iterates through several examples of an application trying various asynchronous styles. The point is to try to get a minimal syntax such that an asynchronous application can be written in the same number of lines as a procedural program.
  • Context – must know the identity of which thread is executing. Critical in UIs and error handling
    • SynchronizationContext class can revert thread context to calling thread (as can several other methods such as Invoke)
  • Tasks – a piece of asynchronous functionality
    • Uses continuations to handle results
  • Async keyword – marks a function to allow use of the await keyword. Must return void, Task, or Task<T>.

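    private async void CalculatePi()
    {
      // Start the task; CalculatePiAsync is assumed to do its work asynchronously.
      Task<double> task = CalculatePiAsync();

      // Asynchronously wait for the result without blocking the UI thread.
      double result = await task;

      // Resumes on the captured UI context, so updating the control is safe.
      textBox1.Text += result;
    }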

  • Put a try/catch around the await and any exception from the task is rethrown at the await, in the correct context.
  • Automatic use of thread pool which measures throughput to scale number of running threads up or down, as appropriate
  • Progress / Cancellation Features
    • IProgress<T>
  • Can launch a collection of tasks and then combine them with operations such as
    • var task = Task.WhenAny(tasks);
    • which returns when the first task completes. Or use Task.WhenAll to wait for all tasks.
  • WCF can generate task-based async methods when adding a Service Reference (Advanced settings).
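A short sketch of the combinators mentioned above, plus cancellation, assuming a hypothetical WorkAsync that simply waits and returns its id:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncDemo
{
    // Hypothetical unit of work: waits delayMs, then returns its id.
    // Task.Delay throws if the token is cancelled, so cancellation flows to the caller.
    static async Task<int> WorkAsync(int id, int delayMs, CancellationToken token)
    {
        await Task.Delay(delayMs, token);
        return id;
    }

    // Task.WhenAll: resumes once every task has finished; results keep the input order.
    public static async Task<int[]> AllAsync(CancellationToken token)
    {
        var tasks = new List<Task<int>> { WorkAsync(1, 30, token), WorkAsync(2, 10, token) };
        return await Task.WhenAll(tasks);
    }

    // Task.WhenAny: resumes as soon as the first task completes.
    public static async Task<int> FirstAsync(CancellationToken token)
    {
        Task<int>[] tasks = { WorkAsync(1, 500, token), WorkAsync(2, 10, token) };
        Task<int> first = await Task.WhenAny(tasks);
        return await first;   // unwraps the result, or rethrows the task's exception
    }
}
```

Blocking on .Result, as a console test harness might, is only safe where there is no UI SynchronizationContext; in a WinForms or WPF handler, await the task instead or you risk a deadlock.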

Inside Requirements

  • Kevlin Henney, editor of 97 Things Every Programmer Should Know and co-author of Pattern-Oriented Software Architecture
  • While listening to requirements we often stop listening while jumping ahead to solutions
  • Killer question when cutting through nefarious design agendas: “What problem does this solve?”
  • Patterns often misapplied – using a hammer to drive a screw leading to a pattern zoo
  • Composing a solution to a problem rather than analysis to understand the problem
  • Many to many relationships don’t need to be normalized (they model the real world)
  • Describing is not the same as prescribing
  • A model is an abstraction of a point of view for a purpose
    • Good – omits irrelevant detail
    • Bad – omits necessary detail
  • RM-ODP: a reference model that uses viewpoints, each a way of looking at a system / environment

    • Enterprise – What does it do for the business?
    • Information – What does it need to know?
    • Computational – Decomposition into parts and responsibilities
    • Engineering – Relationship of parts
    • Technology – How will we build it?
  • Use Case
    • Use the inverted pyramid style to place the most important detail at the top. Move the post-condition next to the pre-condition. Put the sequence, detailing how you get from pre-condition to post-condition, at the bottom since it is only of interest to implementers.
      • Intent
      • Pre-condition
      • Post-condition
      • Sequence – lots of juicy detail but actually least important from an architecture point of view
  • User Story
    • Traditional Connextra form
      • As a <role>,
      • I want <goal/desire>
      • So that <benefit>
        • As an Account Holder
        • I want to withdraw cash from an ATM
        • So that I can get money when the bank is closed
    • Dan North scenario form
      • Given <a context>
      • When <a particular event occurs>
      • Then <an outcome is expected>
        • Scenario 1: Account has sufficient funds
        • Given the account balance is $100
        • And the card is valid
        • And the machine contains enough money
        • When the Account Holder requests $20
        • Then the ATM should dispense $20
        • And the account balance should be $80
        • And the card should be returned
  • Problems with the Use Case / User Story approach
    • Observations are always made through a filter or world-view
    • Until told what to observe you don’t know what you’ll get. In that case, is it even relevant?
    • Use Case Diagrams neglect to notice they are fundamentally text/stories
  • Context Diagrams – shows the world and relationships around the system (UML actors)
    • Litmus test: what industry does the diagram apply to?
    • Not a technical decomposition
    • You’re an engineer planning to build a bridge across a river. So you visit the site. Standing on one bank of the river, you look at the surrounding land, and at the river traffic. You feel how exposed the place is, and how hard the wind is blowing and how fast the river is running. You look at the bank and wonder what faults a geological survey will show up in the rocky terrain. You picture to yourself the bridge that you are going to build. (Software Requirements & Specifications: “The Problem Context”)

    • An analyst trying to understand a software development problem must go through the same process as the bridge engineer. He starts by examining the various problem domains in the application domain. These domains form the context into which the planned Machine must fit. Then he imagines how the Machine will fit into this context. And then he constructs a context diagram showing his vision of the problem context with the Machine installed in it.
  • Problem Frame approach – describe a problem in diagrams
  • Grady Booch
    • User-centric – visualization and manipulation of objects in a domain
    • Data-centric – integrity of persisted objects
    • Computation-centric – focus on transforming objects
  • In summary: move from ignorance / assumptions → knowledge gathered from multiple points of view
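The Dan North scenario form maps naturally onto the arrange/act/assert shape of an automated test. A minimal sketch, with Account, Card and Atm as hypothetical domain types just rich enough to express Scenario 1:

```csharp
using System;

// Hypothetical domain types, just enough to express the scenario.
public class Account
{
    public decimal Balance { get; set; }
}

public class Card
{
    public bool IsValid { get; set; }
    public bool Returned { get; set; }
}

public class Atm
{
    public decimal Cash { get; set; }

    // Dispenses the requested amount when the card is valid and both the
    // account and the machine have enough money; always returns the card.
    public decimal Withdraw(Account account, Card card, decimal amount)
    {
        decimal dispensed = 0m;
        if (card.IsValid && account.Balance >= amount && Cash >= amount)
        {
            account.Balance -= amount;
            Cash -= amount;
            dispensed = amount;
        }
        card.Returned = true;
        return dispensed;
    }
}

public static class AtmScenarios
{
    // Scenario 1: Account has sufficient funds.
    public static bool AccountHasSufficientFunds()
    {
        // Given the account balance is $100, the card is valid,
        // and the machine contains enough money
        var account = new Account { Balance = 100m };
        var card = new Card { IsValid = true };
        var atm = new Atm { Cash = 1000m };

        // When the Account Holder requests $20
        decimal dispensed = atm.Withdraw(account, card, 20m);

        // Then the ATM should dispense $20, the balance should be $80,
        // and the card should be returned
        return dispensed == 20m && account.Balance == 80m && card.Returned;
    }
}
```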

A Team, A System, Some Legacy… and you

  • Legacy System – so valuable it can’t be turned off (and it’s paid for!)
  • Be aware a legacy system often comes with a legacy team ingrained in their own methods
  • Being late to the party
    • Software architecture often seems valuable only once things have gone wrong.
    • Architects often join existing projects to help improve difficult situations
    • Often a real sense of urgency to “improve”
    • Avoid distancing self to ivory tower and likewise avoid digging in thus losing big picture focus
  • Software architecture techniques offer a huge value for older or troubled projects. Especially techniques to understand where you are and with whom
  • Stage 1: Understand
    • Right perspective
      • See gathering requirements for perspectives of end user, business management, IT Managers, development, and support
    • Automated analysis tools
      • NDepend, Lattix, Structure101, Sonar
      • Dependency analysis
      • Metrics
    • Monitor / Measure
      • Leverage existing production metrics
        • IIS
        • Oracle Enterprise Manager
      • Implementation metrics
      • Stakeholder opinions
    • Architectural Assessment
      • Systems Quality Assessment
        • Context and stakeholder requirements
        • Functional and deployment views
        • Monitor and measure
        • Automated analysis
        • Assessment Patterns
          • ATAM – architectural trade off analysis method
          • LAAAM – Lightweight architectural assessment method – more practical
          • TARA – tiny architectural review approach (recommended)
    • Minimal Modelling
      • Define notation / terminology
      • Break up system to different viewpoints
        • Functional
        • Data
        • Code
        • Runtime
        • Deployment – systems / services
        • Ops – run, controlled, roll-back
      • Focus on essentials for target audience
    • Deliverable:
      • System context and requirements
      • Functionality and deployment views
      • Improve Analysis
      • Requirements Assessment
      • Identify and report
      • Conclusion for sponsor
      • Deliver findings and recommendations
  • Stage 2: Improve
    • The team must be involved, otherwise risk rockets, affecting morale, confidence, and competence
    • Choices based on risk
      • Assess -> Prioritize -> Analyse -> Mitigate
    • Engage in Production
      • Why
        • Reality check
      • How
        • Monitoring, stats, and incident management
      • Who
        • Business management, IT management, support
    • Tame the Support Burden
      • Drain on development
      • Support team can offset this
      • Avoid “over the wall” mentality
    • Continuous Integration and Deployment
      • Start simple
      • Increased efficiency and reliability
    • Automated Testing
      • Unit test + coverage, regression tests
      • Costly
    • Safe step evolution
      • Control risk
      • Wrap with tests
      • Partition
      • Simplify
      • Improve
      • Generalize
      • Repeat
    • Stay coding – but if a pure architect stay off the critical path
      • Beware ROI of your coding skills vs. architect’s skills
      • Refactor, write unit tests, address warnings
  • Define the future
    • Good for the team
    • Clear, credible system architecture for the medium term (1-2 years)
    • Beware: timing and predictions

Technical Debt

  • As an evolving program is continually changed, its complexity (reflecting deteriorating structure) increases unless work is done to maintain or reduce it
  • Technical Debt is a metaphor developed by Ward Cunningham to help us think about the above statement and choices we make about the work required to maintain a system
  • Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort that we have to do in future development because of the quick and dirty design choice. We can choose to continue paying the interest, or we can pay down the principal by refactoring the quick and dirty design into a better design
  • Sometimes, upon reflection, it is better to pay interest. But are we trapped paying so much interest we can never get ahead?
  • What is the language of debt?
    • Amortise, repayment, balance, write off, restructure, asset, interest, default, credit rating, liability, principal, load, runaway, loan, consolidation, spiralling, value
  • Shipping first time code is like going into debt. A little debt can speed delivery so long as it is paid back promptly with a rewrite
  • The danger is ignoring or not paying back the debt (compound interest!)
  • Rebuttal: A mess is not a technical debt. A mess is just a mess.
  • Counter response: The useful distinction isn’t between debt or non-debt, but between prudent and reckless debt.
  • There is also a difference between deliberate debt and inadvertent debt.

  • There is little excuse for introducing reckless debt
  • Awareness of technical debt is the responsibility of all roles
  • Consideration of debt must involve practice and process
  • Management of technical debt must account for business value

  • Perfection isn’t possible, but understanding the ideal is useful

Books, People, and Topics of Note                                       

  • Simon Brown – www.codingarchitecture.com
  • Allen Holub – www.holub.com
  • Kevlin Henney – Pattern Oriented Software Architecture
  • Grady Booch – architecture vs. design
  • Linda Rising
  • George Fairbanks – Just Enough Software Architecture
  • Roy Osherove – Notes to a Software Team Leader
  • Top 10 Traits of a Rockstar Software Developer
  • Becoming a Technical Leader – Gerald Weinberg
  • 101 Things I Learned in Architecture School
  • Architecting Enterprise Solutions
  • Software Architecture – Perspectives of an Emerging Discipline
  • Software Requirements and Specification – Michael Jackson
  • Problem Frames – Michael Jackson
  • 12 Essential Skills For SW Arch
  • Refactoring to Patterns
  • Managing Software Debt
  • Modernizing Legacy Systems
  • Working Effectively with Legacy Code
  • Growing Object-Oriented Software, Guided by Tests
  • Knockout.js – MVVM JavaScript library. Binds JSON data to HTML in a simple way, presumably without the manual jQuery work of redrawing your control (e.g. an autocomplete textbox)
  • Backbone.js – model / view extension with events
  • Parasoft Jtest smoke test
  • Selenium automation UI test
  • RabbitMQ – multiplatform messaging queue
  • Lightstreamer / SignalR – web sockets for the client (stop gap for HTML5?)