In a previous blog article, I mentioned how I was getting back to my programming roots and reading The Principles of Product Development FLOW: Second Generation Lean Product Development by Donald G. Reinertsen. My plan is to review each chapter. I have already posted my reviews of Chapter 1 and Chapter 2. Here is my review of Chapter 3, "Managing Queues."
As its name suggests, the third chapter is all about queues. A queue is work waiting to be done. It is characterized by the rate at which new work arrives, the time it takes to complete that work, and the process for deciding what gets worked on next. Rather than go into the complex math of queueing theory, I will use the examples cited in the chapter, which most people can relate to.
Rate at which new work arrives
At rush hour in major cities, metering lights operate at freeway on-ramps. These hold arriving cars in a queue and allow them to enter the freeway at a regular cadence. This prevents cars from being injected onto the freeway at variable rates, which avoids the congestion caused by the turbulence of cars having to slow down to let an excess number of other cars merge. [Page 67]
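To make the cadence point concrete, here is a minimal Python sketch of my own (not from the book). It treats the merge point as a single server that handles one car per time unit and compares eight cars arriving in two bursts of four against the same eight cars metered one per time unit. The function name and the numbers are hypothetical.

```python
def waiting_times(arrivals, service_time):
    """Return how long each arrival waits in a single-server FIFO queue.

    arrivals must be sorted in time order; every item takes service_time.
    """
    waits = []
    server_free_at = 0
    for t in arrivals:
        start = max(t, server_free_at)  # wait if the server is still busy
        waits.append(start - t)
        server_free_at = start + service_time
    return waits


# Same average rate (8 cars in 8 time units), different cadence.
bursty = [0, 0, 0, 0, 4, 4, 4, 4]
metered = [0, 1, 2, 3, 4, 5, 6, 7]

print(sum(waiting_times(bursty, 1)) / len(bursty))    # 1.5
print(sum(waiting_times(metered, 1)) / len(metered))  # 0.0
```

Both streams have the same average arrival rate; only the cadence differs, and the bursty stream is the one that builds a queue.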
Amount of time it takes for work to be completed
Most airlines use a single queue with multiple agents for airport check-in lines. Some still use a separate line per agent. In the latter case, an unusual transaction or an unexpected circumstance delays that agent's queue while all of the other agents are unaffected. How many times have you thought to yourself, "I got in the wrong line"? With one queue and multiple agents, anomalies affect all of those waiting equally instead of just those who got in "the wrong line." [Page 65]
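A small simulation shows the difference. This is a simplified sketch, not the book's model, assuming everyone is already in line at time 0, two agents, and one hypothetical 10-minute "unusual transaction" followed by five 1-minute ones. In the separate-lines case, customers are assigned to lines in turn.

```python
import heapq


def separate_lines(service_times, agents):
    """Customer i joins line i % agents and waits only for that line's agent."""
    free_at = [0] * agents
    waits = []
    for i, s in enumerate(service_times):
        line = i % agents
        waits.append(free_at[line])
        free_at[line] += s
    return waits


def single_queue(service_times, agents):
    """One shared FIFO line; the next customer goes to the first agent free."""
    free_at = [0] * agents  # min-heap (all zeros is already a valid heap)
    waits = []
    for s in service_times:
        start = heapq.heappop(free_at)
        waits.append(start)
        heapq.heappush(free_at, start + s)
    return waits


service = [10, 1, 1, 1, 1, 1]  # one unusual 10-minute transaction
for name, fn in [("separate lines", separate_lines), ("single queue", single_queue)]:
    waits = fn(service, agents=2)
    print(name, "total wait:", sum(waits), "worst wait:", max(waits))
```

With separate lines, the two customers stuck behind the slow transaction wait 10 and 11 minutes while the other line is nearly empty (total wait 24). With one shared queue, the delay is spread across everyone behind it, the total drops to 10, and the worst wait drops to 4.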
Process for determining what gets worked on
"Think of a hospital emergency room. When there is a 6-hour wait, critical patients must be moved to the head of the line. When waiting times are very short, patients can be processed in a FIFO [first-in, first-out] order. Ideally, [the] goal is to make the queue size so small that [the hospital] does not need queuing discipline." [Page 70]
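The emergency-room policy can be sketched as a queue whose discipline depends on its length. This is a minimal illustration of my own, not the book's; the function name, the threshold, and the numeric severity scale (1 = critical) are hypothetical.

```python
def next_patient(waiting, threshold):
    """Remove and return the next patient to be seen.

    waiting: list of (arrival_order, severity) tuples, where severity 1 is
    the most critical. threshold: queue length above which triage applies.
    """
    if len(waiting) > threshold:
        # Long wait: most critical first, ties broken by arrival order.
        chosen = min(waiting, key=lambda p: (p[1], p[0]))
    else:
        # Short wait: plain first-in, first-out.
        chosen = min(waiting, key=lambda p: p[0])
    waiting.remove(chosen)
    return chosen


queue = [(1, 3), (2, 2), (3, 1)]
print(next_patient(list(queue), threshold=5))  # (1, 3): FIFO
print(next_patient(list(queue), threshold=2))  # (3, 1): critical first
```

When the queue stays at or below the threshold, the priority branch never runs, which mirrors the ideal in the quote: a queue small enough that no queuing discipline is needed.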
So how does this relate to software development projects?
For a manufacturing process, work queues can be estimated more reliably. Each widget being produced is the same. Yes, there is variability in that problems can occur or suppliers can be late, but in general it takes 20 times as long to make 200 widgets as it does to make 10. There may be a constant setup and teardown time that is the same in both cases, but by and large, the process is predictable.
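Because every widget costs the same, the estimate reduces to a linear formula: a fixed setup and teardown time plus a fixed time per unit. A minimal sketch, with a hypothetical function name and hypothetical numbers (10 minutes per widget, 5 minutes of setup and teardown):

```python
def batch_time(units, per_unit, setup):
    """Total time for a batch: fixed setup/teardown plus a fixed cost per unit."""
    return setup + units * per_unit


small = batch_time(10, per_unit=10, setup=5)    # 105 minutes
large = batch_time(200, per_unit=10, setup=5)   # 2005 minutes
print(round(large / small, 1))                  # 19.1
```

With no setup the ratio would be exactly 20; the fixed setup pulls it slightly below, which is the small correction the paragraph mentions. Either way, the answer is predictable before the work starts.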
In the case of software development, each programming task is unique. The programmer is solving a programming problem that he or she has never solved before. One software task may be similar to tasks previously performed but is rarely identical. This makes it challenging to estimate the time to complete the work when considering a software development project. The good news is that the agile development process breaks a project down into sprints, where a sprint is a small collection of functionality to be delivered in a short amount of time. So for a programmer's to-do list queue, though the time to complete work is uncertain, the arrival rate of new work is known. The key to success for software projects is to determine which features to include in the sprints based on the cost of delay associated with all possible features. This is often a challenge because for software, unlike inventory for manufacturing that can be observed and counted, the queue of work in progress for a developer is traditionally invisible.
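One way to act on cost of delay is weighted shortest job first (WSJF), a sequencing rule Reinertsen discusses elsewhere in FLOW: divide each feature's cost of delay by its duration and do the highest ratio first. The sketch below is my own illustration, not the book's; the feature names, the numbers, and the greedy "fill the sprint until it is full" step are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    cost_of_delay: float  # hypothetical: value lost per week the feature is late
    effort: float         # hypothetical: estimated developer-days


def plan_sprint(backlog, capacity):
    """Rank by cost of delay per unit of effort (WSJF), then greedily take
    features in that order while they still fit in the sprint's capacity."""
    ranked = sorted(backlog, key=lambda f: f.cost_of_delay / f.effort, reverse=True)
    sprint, used = [], 0.0
    for f in ranked:
        if used + f.effort <= capacity:
            sprint.append(f.name)
            used += f.effort
    return sprint


backlog = [
    Feature("export to PDF", cost_of_delay=8, effort=4),
    Feature("faster startup", cost_of_delay=9, effort=3),
    Feature("new theme", cost_of_delay=2, effort=2),
    Feature("crash reporter", cost_of_delay=6, effort=6),
]
print(plan_sprint(backlog, capacity=9))
# ['faster startup', 'export to PDF', 'new theme']
```

The estimates are still uncertain, as the paragraph says, but ranking by cost of delay makes the trade-off explicit instead of leaving it to whichever request is loudest.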
Customers report problems with Autodesk software on a regular basis, and many are dismayed that their issues are not corrected immediately. For example, "Shouldn't my issue go right to the front of a developer's queue?" is a common expectation during our beta programs. The key to successful projects is to manage the queue so that each correction occurs at the proper time. Although a beta typically takes place months before a product ships, there is often not enough time to fix every reported defect. Each defect is evaluated for severity, frequency, and risk:
- The severity reflects how bad it is (e.g., typo, annoyance, feature does not work as promised, crash, loss of data).
- The frequency reflects how likely it is to happen (e.g., everyone will encounter this every time, this only happens with a certain type of data or on a certain hardware configuration).
- The risk associated with a defect reflects the complexity and amount of code that has to change to correct it.
Anytime a developer fixes a defect, he or she runs the risk of breaking something else. So as a beta progresses, the bar for what gets fixed rather than deferred is raised higher and higher. All of this is decided by product core teams (i.e., project management, QA, software development, and marketing), who hold daily bug scrub meetings using a queue of bugs in a defect tracking system. So if a reported defect does not get fixed, it was probably too risky to fix for the initial release. Because it is in the tracking system, the customer may see a correction in a later service pack.
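The rising bar can be sketched as a simple triage rule. The 1-to-5 scales, the defect IDs, and the rule itself (the benefit of a fix, measured as severity times frequency, must outweigh the risk of the change by a factor called the bar) are hypothetical illustrations, not how our bug scrub meetings actually score defects.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    key: str
    severity: int   # hypothetical scale: 1 = typo ... 5 = loss of data
    frequency: int  # hypothetical scale: 1 = rare configuration ... 5 = everyone, every time
    risk: int       # hypothetical scale: 1 = one-line change ... 5 = large, tangled change


def triage(defects, bar):
    """Split defects into (fix, defer). A fix must satisfy
    severity * frequency >= bar * risk; raising the bar defers more."""
    fix, defer = [], []
    for d in defects:
        if d.severity * d.frequency >= bar * d.risk:
            fix.append(d.key)
        else:
            defer.append(d.key)
    return fix, defer


defects = [
    Defect("BUG-1", severity=5, frequency=2, risk=1),
    Defect("BUG-2", severity=2, frequency=5, risk=4),
    Defect("BUG-3", severity=1, frequency=3, risk=1),
]
print(triage(defects, bar=1))  # early in the beta: everything gets fixed
print(triage(defects, bar=4))  # near release: only BUG-1 gets fixed
```

Note that BUG-2 is as impactful as BUG-1 but needs a risky change, so it is the first to slip to a later service pack once the bar goes up.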
Queues are alive in the lab.