
Uncertainty will always be part of the taking-charge process.

                      Harold S. Geneen (1910 - 1997)
                      CEO of ITT, 1959-1977


The Conservation of Uncertainty
Phillip G. Armour
Communications of the ACM, 2007

Combination LOC                                                           

Dang, we just can't seem to leave LOC alone.  After publishing "Beware of Counting LOC" I got a number of responses--some misunderstanding the point I was trying to make--that said we really shouldn't be talking about Lines of Code (LOC) in this day and age, for goodness' sake!

I mean, LOC is so passé, so outdated, so... so 1980s COBOL!

It is interesting that, despite its manifest shortcomings, we still use LOC (or, as I prefer, Line of Code Equivalents: LOCE) for most software estimation processes and tools.  While many of these proprietary estimation tools do allow you to size a system in "other units," they also require, or have built in, a "scaling factor" that defines how many LOC there are per whatever other unit you are using.  Multiply the two numbers and you are still de facto counting in LOC.
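
To make this concrete, here is a minimal sketch of the conversion such tools perform.  The unit names and scaling values are invented purely for illustration; real tools calibrate these factors, and none of the numbers below come from an actual tool:

    # A minimal, hypothetical sketch of sizing-unit conversion.
    # The scaling factors below are invented for illustration only.
    SCALING_FACTORS = {
        "loc": 1,              # LOC is the baseline unit: scaling factor of 1
        "function_point": 55,  # illustrative: treat 1 function point as ~55 LOC
        "requirement": 500,    # illustrative: an "average" Requirement as ~500 LOC
    }

    def size_in_loce(count, unit):
        """Return the de facto size in LOC Equivalents: count * scaling factor."""
        return count * SCALING_FACTORS[unit]

    # Sizing "in function points" is still sizing in LOC underneath:
    print(size_in_loce(200, "function_point"))  # 200 * 55 = 11000 LOCE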

Scaling = Unit Complexity (wrt LOC)                                   

This unit "scaling factor" can be considered the "relative complexity," or perhaps the amount of knowledge contained in the unit, with respect to the LOC unit, which by default or definition is rated as 1.

LOC is quite a well-defined unit, but others are not

For all its many failings, LOC is quite a well-defined unit.  Viewing the artifact as a knowledge container, every LOC within a system contains about the same amount of knowledge as every other LOC.  The amount of knowledge we need to acquire to create one effective LOC is not several orders of magnitude higher or lower than for any other LOC.  The reason for this is that LOC is a pretty small unit--there is not much knowledge in one LOC--so we end up dividing the total knowledge of a system by quite a large number.  This shows what LOC really is: a normalizing factor for system size and complexity (the system size and the system complexity are the same number when measured in LOC, since the "scaling factor" for LOC is axiomatically 1).

Other units are not so well-defined.  Sizing a system by the number of "Requirements," for instance, forces us to ask the question: what is a "Requirement"?  A Requirement could be really big or really small, it could be very complex or very simple, and it could require a lot of work to instantiate or be very easy to build.  This means the range of the scaling factor for this unit is potentially very wide, perhaps from 10 to 10,000,000 (from the simplest/smallest to the most complex/largest Requirement).  One way of looking at this (though not necessarily the best way) is to consider how many LOC it would take to operationalize the Requirement.  The reason there is often not a straight line from Requirement to LOC is that we may choose to operationalize the Requirement using something other than 3GL lines of control-oriented code, which is what the LOC unit is baselined against.

But simply because we might not actually write LOC to make a Requirement function does NOT mean that the equivalence is invalid.  It simply means that the Requirement, at its level of scaling, contains about the same amount of knowledge (or, more correctly, requires the acquisition of the same amount of knowledge) as a certain number of LOC with a defined scaling of 1.  For the system as a whole, that number is given by the Count of Requirements * the Scaling Factor for Requirements.
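
Continuing the hypothetical sketch above (the scaling value of 500 is still an invented illustration), 40 Requirements would carry about the same knowledge as 20,000 LOC with a scaling of 1:

    # 40 Requirements at an invented average scaling of 500 LOC each:
    print(size_in_loce(40, "requirement"))  # 40 * 500 = 20000 LOCE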

What we can't count, why we can't count                         

The problem with LOC as a unit is that, in early system development, we may have no idea how many LOC there might be.  This is the primary argument against using the LOC unit, and it is entirely valid.  However, if we use a unit we can count, such as Requirements, we find that the definition of the unit is uncertain.

It seems that if the count is well-defined (Requirements) then the definition is not, but if the definition is well-defined (LOC) then the count is not.  We can't win.

The Conservation of Uncertainty                               

This is what I call The Conservation of Uncertainty, which states that:

[Uncertainty_in_Count] * [Uncertainty_in_Definition] = Constant

That is, if we switch units in sizing a system, we will simply move the source of the uncertainty in how "big and complex" the system is from the count of the unit to the definition of the unit, or vice versa.  We cannot change the uncertainty in sizing a system by switching the unit in which we size the system.
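
A small numeric illustration may help; the uncertainty values below are invented, and only the shape of the trade-off matters:

    # Invented numbers illustrating the Conservation of Uncertainty.
    # Sizing in LOC: the definition is tight, but the count is loose.
    loc_count_uncertainty = 4.0       # early on, the LOC count could be off 4x
    loc_definition_uncertainty = 1.1  # but one LOC is much like another

    # Sizing in Requirements: the count is tight, but the definition is loose.
    req_count_uncertainty = 1.1       # Requirements can be counted early
    req_definition_uncertainty = 4.0  # but how big is "a Requirement"?

    # The product--the intrinsic uncertainty in the system--is unchanged:
    print(loc_count_uncertainty * loc_definition_uncertainty)  # 4.4
    print(req_count_uncertainty * req_definition_uncertainty)  # 4.4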

Intrinsic Uncertainty                                               

There is a simple reason for this: there is intrinsic uncertainty in how "big and complex" a system is (read: how much knowledge will be in the delivered system and how much knowledge we will have to acquire to build it).  This uncertainty simply exists in the situation and cannot be wished away.  More pointedly, the uncertainty cannot be reduced simply by the activity of counting, in any unit.

The product of the uncertainty in the count and the uncertainty in the unit definition is a constant that is always equal to the uncertainty in the system at the time of counting.  Switching units simply moves the uncertainty; it does not reduce it.

Uncertainty and Entropy                                       

Uncertainty is rather like thermodynamic entropy: it can only be reduced through the application of energy.  Specifically, the intrinsic uncertainty of a system development can only be reduced through the activity of systems development.

It's time to back away from the sizing unit battle and look at the real challenges of project estimation.
