Posts Tagged ‘Software Development Process’
Build Automation: Setting the Stage
Written by Kendall Miller on May 22, 2008 – 12:35 am
Editor’s Note: This is the first article in a three-article series, with a new article posted every few days.
If you haven’t experienced the difference an automated software build system can make to your entire approach to development, this article series will show you why it’s worth your time and how to get it done. Before we launch into the nuts and bolts of setting up a build automation system, let’s step back and establish some common ground.
What’s A Build?
A build is the process that takes your source code and translates it into an installable product. There are some definitions that merely look at the first part (building executable files), but I prefer to look at things from a results standpoint: A process should achieve an external result, and the external result of building software is that you have a package that can be distributed and installed by users.
The critical goal is to ensure traceability from product back to the source code that created it:
- A given version of your product must represent a unique build so you know that there’s just one “1.1.1452” version of your product in existence.
- Each binary file (.dll, .exe, .jar, etc.) needs to have a unique version number to ensure that there is just one “1.1.1452” version of “MyCoolApp.exe” so you can look up the source code by that version number.
- The source code for each binary must be labeled with the version number so you know what source code made that version.
The same rules apply to non-compiled code as well; you just tend to treat it at a higher level (e.g. a whole set of PHP files as a group instead of each individual file).
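One way to picture the traceability rules above is a small registry that refuses to hand out the same version twice and remembers which source label produced each version. This is only a sketch: the `BuildRegistry` class, its method names, and the label format are made up for illustration, and a real setup would lean on the source code control system's own labels rather than an in-memory dictionary.

```python
class DuplicateVersionError(Exception):
    """Raised when a version number is registered a second time."""


class BuildRegistry:
    """Maps each product version to the source label it was built from."""

    def __init__(self):
        self._labels = {}  # version string -> source control label (hypothetical format)

    def register(self, version, source_label):
        # Refuse to reuse a version: there must be exactly one "1.1.1452".
        if version in self._labels:
            raise DuplicateVersionError(f"version {version} already exists")
        self._labels[version] = source_label

    def source_for(self, version):
        # Trace a version found in the field back to its labeled source.
        return self._labels[version]


registry = BuildRegistry()
registry.register("1.1.1452", "build-1.1.1452")
print(registry.source_for("1.1.1452"))
```

Registering "1.1.1452" a second time raises `DuplicateVersionError`, which is the whole point: a version number can only ever mean one thing.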
To achieve these goals, I’ve always used a few simple rules:
- Every exchange loops through the source code control system: From computer to computer or process to process, do it by checking the output into the source code control system and getting it from there on the other side. This ensures you have a way of seeing the output of each stage.
- Only builds leave development: When you are going to bridge from your raw development environment to any other environment – test, certification, whatever – it’s done through a full build that has its own unique tracking number. Even if you just made another build 10 minutes ago.
These rules eliminate the possibility of transient work products (e.g. binaries) getting anywhere without the tracking to back up where they came from. They also ensure that any developer that has pack rat tendencies (and most do) will have to push things from their box to the source code control system, which should be on a nice safe server that’s backed up.
Sidebar: Seriously. Your source code control system is virtually irreplaceable. It should live on server-grade hardware fed nice clean power with a UPS and regular backups. The system you select should have a strong track record of never corrupting data and you should be comfortable that your backups of it are top flight. I recommend a product that stores into a commercial-grade database because the data is just that important.
What’s In Your Build Process?
At a high level the process to achieve this traceability is going to look something like this:
- Get the code for each project that needs to be compiled.
- Update the version information so you get a unique version of the compiled files.
- Compile them.
- Label the source code you compiled with the version number.
- Package the binary files with everything else needed for the product into a distribution format.
- Store that distribution in a central location with a version number or name indicating what version it is.
That feels very simple and straightforward, doesn’t it – just six steps. When you look closer, you’ll notice there are a lot of loops: You have to get, label, and compile the source code for every project that needs to be built. Often, these projects have to be built in a specific order to work correctly. It may not even be obvious until the code is smoke tested if they were built out of order and won’t run together as a group. You also need to do this with absolute confidence in the integrity of the process so when you find a problem on a computer and it appears to be running version “1.1.1452” you have confidence in exactly what that means, all the way back to the source code.
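The build-order problem is easier to reason about once the dependencies are written down. As a rough sketch, assuming three made-up projects and Python 3.9 or later for the standard library's `graphlib` module, a topological sort gives an order in which every project is built after the projects it depends on:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical projects: each name maps to the set of projects it depends on.
dependencies = {
    "MyCoolApp": {"Core", "Reports"},
    "Reports": {"Core"},
    "Core": set(),
}


def build_order(deps):
    """Return the projects ordered so that dependencies are built first."""
    # static_order() raises graphlib.CycleError if projects depend on each other in a loop.
    return list(TopologicalSorter(deps).static_order())


print(build_order(dependencies))
```

Deriving the order from declared dependencies means a new project only has to state what it needs; nobody has to remember where it goes in a hand-maintained list.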
Pretty much every development environment includes some form of build automation. In the old days it was “Make”. In Visual Studio it’s now MSBuild. For the most part, these tools are competent at performing the basic steps necessary to take source code and produce binaries, but they aren’t generally going to handle the other elements like labeling source code, checking in outputs, and copying the final distribution to a central location. If they can be extended to do that, it’s usually fairly high effort, and can easily get in the way of the routine work your developers need to do local builds on their development systems.
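A common way around this is a thin wrapper that leaves compiling to the existing tool and adds the surrounding steps itself. The sketch below is an assumption about how such a wrapper might be shaped, not a description of any particular product: the step names are invented, and each step is passed in as a callable so the real work (getting source, calling the compiler, labeling) stays with whatever tools you already use.

```python
def run_build(projects, version, steps, log=print):
    """Run the six build steps for one version.

    `steps` is a dict of callables supplied by the caller. Any step that
    raises an exception stops the build at that point.
    """
    for project in projects:  # assumed to be in dependency order already
        steps["get_source"](project)
        steps["stamp_version"](project, version)
        steps["compile"](project)  # hand off to the existing build tool here
        steps["label_source"](project, version)
    package = steps["package"](projects, version)
    steps["store"](package, version)
    log(f"build {version} complete")
    return package


# Demonstration with stand-in steps that only record what was asked of them.
calls = []


def recorder(name, result=None):
    def step(*args):
        calls.append((name, *args))
        return result
    return step


step_names = ("get_source", "stamp_version", "compile", "label_source", "store")
demo_steps = {name: recorder(name) for name in step_names}
demo_steps["package"] = recorder("package", result="MyCoolApp-1.1.1452.zip")

run_build(["Core", "MyCoolApp"], "1.1.1452", demo_steps)
```

Because developers keep using the normal tool for local builds, the wrapper only runs on the build system and stays out of their way.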
But Wait, There’s More
This is a very simplistic view of what a build looks like because it leaves out a critical step: The smoke test. It really can’t be called a build if it can’t be installed and at least fire up without falling over and dying. It’d also be nice to pull together release notes including the defects that were fixed or new features added in this build. Finally, let’s notify the team that a new build exists so they can pick up where the build leaves off.
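The simplest possible smoke test is "launch it and see whether it exits cleanly." Here is a minimal sketch of that idea using Python's standard `subprocess` module. The two commands are stand-ins that run the Python interpreter itself; a real build would launch the freshly installed product instead, and the 30-second timeout is an arbitrary assumption.

```python
import subprocess
import sys


def smoke_test(command, timeout_seconds=30):
    """Return True if the command starts and exits cleanly within the timeout."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_seconds)
    except (OSError, subprocess.TimeoutExpired):
        # Could not launch at all, or hung: either way the build is not usable.
        return False
    return result.returncode == 0


# Stand-ins for launching the installed product.
healthy = [sys.executable, "-c", "print('started')"]
broken = [sys.executable, "-c", "raise SystemExit(1)"]
print(smoke_test(healthy), smoke_test(broken))
```

A check this shallow won't find subtle defects, but it does catch the build that won't start at all before the rest of the team picks it up.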
You Don’t Need an Automated Build
You can do all of this by hand indefinitely. After all, if you document the process it should be possible for a professional to correctly execute the build by hand every time, following each step.
There are three key problems with this approach:
- Humans are fallible: A well trained professional doing an intricate task will still make a mistake around two percent of the time. That’s one in 50 opportunities: They’ll put the wrong version number on something, label the same folder twice and one not at all, not clean out the working directory first, something.
- The potential for mistakes degrades value: Because a main point of the build process is to have confidence that you can absolutely go from distribution package back to every element of source code it contains, even the possibility that there was an error in how the build process was executed will make you doubt its integrity and therefore you won’t achieve the value you wanted.
- It’s wasteful: Each build occupies a well trained professional’s time. If you need to do a new build at 2:00AM, you need a well trained professional to execute a possibly lengthy process accurately. This costs you resources and even worse it’s not a job any developer likes, so it costs morale.
Over time, the fact that each build is a risk and a waste will tend to unconsciously affect the decision making of the development team, making them more likely to defer a fix or change they might be able to code and unit test on their own computer but don’t feel is worth the overhead of the build.
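To put a number on the fallibility point, here is a back-of-the-envelope calculation. It takes the two percent figure above as the chance of a mistake on each manual step and assumes the steps fail independently of each other, which is a simplification; the step counts are just examples.

```python
def chance_of_flawed_build(manual_steps, error_rate=0.02):
    """Probability that at least one of the manual steps goes wrong."""
    return 1 - (1 - error_rate) ** manual_steps


for steps in (6, 18, 50):
    chance = chance_of_flawed_build(steps)
    print(f"{steps} manual steps: {chance:.0%} chance of at least one mistake")
```

Under those assumptions, even the six-step build for a single project goes wrong about 11 percent of the time, and a build with 50 manual opportunities for error is more likely than not to contain at least one.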
Traditional Resistance
There are a number of reasons that are typically put forward against having a central, automated build process. The most common ones I hear are:
- It will slow down testing and certification: Since each build that is going to be tested outside of a developer’s machine has to come from the build system, that means that even a small error found in certification will require the entire build be run before it can be tested. Why not let a developer just recompile the offending file and slip it onto the cert system to verify it?
- It takes extra resources: Having an engineer set up and maintain the build process takes time away from development, which means my customers will get fewer features, etc.
- It slows down change: Every time we want to add a new binary file or a dependency we will have to update the build system and possibly the build process and retest it. This will get in the way of an individual developer being able to get things done as fast as possible.
- Single point of failure: What if the build computer fails? If it’s the only place to do a build, we’re stopped.
These objections generally spring from a couple of underlying problems within the development team: developers who lack confidence, and fear of change.
Developers Playing Hide the Ball
If there is a developer on your team that isn’t up to the rest of your team’s level and they’re trying to hide it, this is virtually guaranteed to bring it to everyone’s attention. They won’t be able to just slip a new file into the build or slip a fix into test without it being clear what happened.
If the time it takes to perform a build – whatever that is – is an impediment to certifying your software because you need to fix problems faster than that time, you have a more fundamental issue: Your developers are not thinking through their code before it’s included in the build. Fundamentally, it’s called Certification, not Debugging for a reason: Developers should be genuinely surprised that their code doesn’t work as expected when it leaves their hands.
If this is the case, then when a problem makes it to test it shouldn’t matter if the build takes 30 minutes or even two hours. Any development process that needs to go from the developer’s fingertips to certification in less than that time has more fundamental quality control and process issues.
If you have developers concerned that this slows down their ability to add new projects or dependencies because they have to think through how to update the build system this is really a good thing: These decisions matter by the time you want to ship a product to customers, so the earlier you can address them the lower the probability you’ll discover in certification that redistributing a particular dependency is hard or being done wrong.
Fear Of The Unknown
Most developers are not IT administrators, and all developers are humans. Human beings fundamentally don’t like change. They will actively fight change, often with very good prose. Giving up control, moving from a local compile and binaries that work on their box to a central system that is opaque, is uncomfortable. The very same developer who is perfectly willing to switch to Visual Studio 2008 the second it is posted to MSDN and who downloads the latest nightly build of NHibernate will come up with all sorts of creative reasons against a central, automated build because of their fear of change.
If you are following reasonable source code control rules, you really don’t need to worry about backing up an individual developer’s system: There shouldn’t be much that’s on it uniquely if it were to be lost, preferably at most a day’s work (which is within the time frame of a backup/restore loss anyway). The build system is special: As part of making it the central authority of building your distribution, it really is inconvenient to have to recreate it from scratch through reinstalling all of the software components, etc. It is likely to be slightly different than your developers’ computers (server grade hardware vs. desktops) so your normal developer image won’t work on it. Back it up as part of your normal production server backup scheme, and invest in redundant disks so it’s unlikely you’ll need those backups. This will tend to give you better build performance anyway, so it’s a double benefit.
Coming Next: Benefits of Automation and Centralization
Check back for the second article in this series focusing on the benefits for your team of automating the build process and centralizing it, including the roles and capabilities of an automated build system. From there the series will continue with how to create an automated build incrementally and make it a natural evolutionary process of your team.
Tags: Build Automation, Software Development Process
Posted in Process, Software Development
Defects: The Resolution Perspective
Written by Kendall Miller on May 19, 2008 – 12:47 am
Regardless of how trivial the defect is there are very real costs and risks to resolving it. Let’s say it’s as simple as a misspelling on a text label, so it’s both really easy to fix and really easy to ensure you fixed it. You still have to contend with:
- Every Build is a Risk: Every time you package up a set of files as a build, there’s a risk of error. If your build isn’t entirely automated – and entirely means from source code through install – you run the risk of something being done wrong. More likely, the risk may be something external: Unknowingly including a newer version of a referenced library or introducing a dependency on a newer version. Either way, you need to do significant regression testing to mitigate that risk.
- Deployment Risk: The update will need to get from your development environment to your users. Whether it’s a Software as a Service (SaaS) product that just needs to hit some web servers or packaged software deployed to thousands, your update will need to be installed for people to get any advantage of it. In most cases this will mean a special upgrade installation, notification to existing customers to come and get the upgrade, and additional support for your users.
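The "newer version of a referenced library" risk is one a build can check for itself if it records which library versions went into each build. The following is a sketch under that assumption; the manifest format and the library names are invented for illustration.

```python
def dependency_changes(previous, current):
    """Compare two {library: version} manifests and report what changed."""
    changes = {}
    for name in sorted(set(previous) | set(current)):
        before, after = previous.get(name), current.get(name)
        if before != after:
            changes[name] = (before, after)  # None means absent from that build
    return changes


# Hypothetical manifests recorded by two consecutive builds.
last_build = {"LoggingLib": "2.1.0", "GridControl": "4.0.3"}
this_build = {"LoggingLib": "2.1.0", "GridControl": "4.1.0", "ZipLib": "1.0.0"}

for name, (before, after) in dependency_changes(last_build, this_build).items():
    print(f"{name}: {before} -> {after}")
```

An empty result doesn't remove the need for regression testing, but a non-empty one tells you where to point it.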
The truth is that most defects aren’t as clear cut as a spelling error, so you will also have to contend with the possibility that no matter how well intentioned, your fix is going to cause new problems for your customers. It could be that there are advantageous side effects of the current (defective) implementation or that your fix doesn’t work on the Elbonian version of Windows XP which you didn’t discover because you did only a focused test of the fix on your key target platforms. In more elaborate cases, it could be that the loophole represented by the defect is viewed by some of your users as a feature, so fixing it makes your product less valuable. This is more likely when doing defect patching because you typically don’t have the benefit of a beta cycle and end-user involvement in considering all of the aspects of the fix.
The Last Change is to Blame
If you have the opportunity, try this experiment some time: Announce a new version that never really happened. Perhaps you just relabel a prior version with a new number or something else to create the placebo effect of an update. What you will discover is:
- Surprise Fixes: Some group of your users will thank you for the new version. It’s so much faster than the old one! Oh, and you fixed a problem they’ve had for months.
- Surprise Defects: Unfortunately much more common than surprise fixes are the number of people that will report a problem that must have been caused by your update because it happened just after they installed it. It could be as wide ranging as their hard drive died or Word lost its dictionary. But they’re sure it’s your fault.
- Reinstall Rash: Some contingent of users will have problems installing the upgrade. The problem will vary depending on how you deploy your fix, but they’ll manage to get a computer or two out of sorts over it. Don’t think this is a Windows problem either, just look at the volume on support forums for WordPress right after a minor update.
In this case, there isn’t much you can do to minimize the problems because… you didn’t create them (after all, it’s the same software – that was the point of the test). With the possible exception of finding better ways of deploying fixes, there just isn’t a lot you can do. This is the minimum end-user overhead to every upgrade you make, and they’re going to make it your overhead. The big investment you can make to minimize this is:
- Cultivate Your Brand: If customers love you, they won’t make the leap from coincidence (two things happened at the same time) to causality. The more they love you, the more they’ll be sure they are at fault.
- Make Upgrades Easy: You really want to invest in ways that make updates easy. Look at Firefox and Windows Updates for examples of really great ways to get updates out the door. It’s easy and surprisingly trouble free, much more so than relying on users to manually know whether to uninstall the old version first, whether an update applies to them, etc.
What Are You Committed To?
It may seem cold and uncaring, but many defects just aren’t worth fixing because the downside potential of deploying the update overwhelms the likely benefit. Particularly when you are well into the next development cycle and can instead resolve the issue in the next feature release it often makes better business sense and customer satisfaction sense to leave the defect unaddressed and fold it into the future release.
If you’ve decided that the defect should wait, discuss this with the development team and your internal management and get consensus. This isn’t an easy conversation, but it’s made easier if you can show just how much effort, cost and opportunity loss there is in shipping an update for just this issue. Make sure that you leave the door open to reconsider, particularly if another issue shows up: Most of the overhead of deploying an update is essentially constant regardless of the number of issues resolved. This future potential will often push people to focus their thinking on whether this issue alone is worth all of the cost instead of talking in vague terms of commitment to quality and customer service.
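The "constant overhead" argument is easy to show with arithmetic. In this sketch the 40 hours of fixed overhead and the per-issue fix times are made-up numbers; substitute your own.

```python
def cost_per_issue(fixed_overhead_hours, fix_hours):
    """Average cost of each resolved issue when one update carries them all."""
    return (fixed_overhead_hours + sum(fix_hours)) / len(fix_hours)


overhead = 40  # hypothetical hours to build, test, and deploy any update

print(cost_per_issue(overhead, [2]))        # one issue shipped alone
print(cost_per_issue(overhead, [2, 3, 4]))  # three issues sharing one update
```

With these numbers the lone two-hour fix really costs 42 hours, while bundling three fixes brings the average to a little over 16 hours each. That is the figure to bring to the conversation.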
Sidebar: Eliminate Build Overhead
While the overhead of creating a build and validating it is essentially a constant, that constant can be made significantly smaller with the right investments throughout the development process. The key is to automate as much of the process as possible. This broadly fits into the school of Continuous Integration or Continuous Builds, primarily because if you can’t automate the process you have no hope of doing it continuously.
- Automate Source Code to Install: Look at the process that takes raw source code and produces an install (be it a Windows Installer package, zip file, or RPM) and get humans entirely out of the loop. This can be done entirely with free software and by extending the tools you’re using already.
- Elevate Unit Testing to System Testing: Are you writing unit test libraries? If you’re using NUnit (or JUnit, or whatever) then look for ways to expose these to the build process and let the build run them every time. It doesn’t have to be this fancy – there just has to be a way of invoking tests during the build process, so this could be done with your own custom command line tool that exercises the system.
Automating the build process decreases your overhead costs during the primary development lifecycle and during maintenance. The overhead of the build is a tax on everything you produce that adds no value to your users, so focus on reducing it as much as you can. The great news is this game can be won an inch at a time: Incremental investments across your team can steadily improve your efficiency. There aren’t many other things you can do in the development process that pay off quickly and don’t require a major upfront investment.
As you gain experience with having an automated build and verification process you will find the entire team is more willing to tolerate risks because they know they have a large safety net in the automated verification process.
What Conversations Are You Having?
It’s easy to get pulled down into conversations that confuse the effort to fix the defect with the value of fixing it, or ignore the practical issues of deploying the fix or impact to other work that you can’t do because you’re pursuing the defect. Your development team will instinctively want to fix the defect – it will feel like an affront to their honor. Have the right conversations to bring everyone around to consensus on whether this one is worth it or not to what the team is trying to achieve.
As a manager and leader, your job is to generate buy-in for the decisions of the team and of the company. In the end, the worst mistake is pushing the development team where they don’t want to go. If they are determined to fix it, think hard about what the cost of letting them go ahead is. Perhaps the team can fix the defect but you don’t deploy it, deferring that cost until there’s enough value accrued to make it worth it. If they don’t think it’s worth it, perhaps it’s time for a field trip to commune with the users to understand the impact of the problem more clearly. The worst outcome would be if the team loses the passion to put in the time on all the details that have to be right to produce an outstanding product. Whatever the problem is, it isn’t going to be worth that cost.
Tags: Defects, Mindset, Problem Management, product feedback, Risks, Software Development Process
Posted in Software Development
Defects: The Diagnostic Perspective
Written by Kendall Miller on May 15, 2008 – 12:47 am
Rarely will users identify the true underlying defect with the software: Most users know there’s a problem but can’t precisely define the true defect. Additionally, if the software was at least moderately tested before release then most defects that are visible to the end user are really multiple defects:
1. The problem the user reported.
2. The way the software handled the problem when it occurred.
3. The software design that allowed the user to get into trouble in the first place.
Typically, a user experiences a problem once the software has gone well off-track. The underlying problem began earlier than reported where it first jumped off the tracks (#1). It then snowballs until the user gets an odd message or experience sufficiently bizarre that they’re willing to report it (#2). It’s unlikely the software handles it in a pleasing and gentle way because if it did, that would mean you anticipated the problem and if that’s true, you would likely have found it in testing. You’ll want to make the software handle the problem more gracefully if something like it shows up in the future.
Finally, what was it in the overall architecture or design of the system that allowed the problem to get as far as it did without getting caught or corrected earlier (#3)? Perhaps there’s an underlying assumption that hardware is reliable or a file can’t be partially written to disk that needs to be reconsidered. This is the preventative medicine to catch all of the problems that are like the original problem. Once you understand the basic design assumption that led to the problematic design in the first place, your team can usually see other decisions that sprang from the same thinking and look to address those before a user experiences a problem.
Side Note: Your users are already experiencing that problem too; they just haven’t reported it yet. How good are your feedback mechanisms?
It may not seem like there’s much of a risk or impact in attempting to diagnose the defect, but:
- Diagnosis is unbounded: In most cases, determining the fundamental cause of a defect is the most time consuming part. It also defies estimation. You can time box diagnosis time to limit your exposure, but that’s not the same as being able to provide an estimate. Each defect represents the potential to throw time down a hole.
- Workflow Impact: Your team is virtually guaranteed to be off busy on some new development or other project. They will likely have to shelve source code in a temporary state and shift back to a prior set of code to even diagnose the issue. Whatever the individual(s) involved were working on will need to stay on hold or be reassigned, complicating management and team productivity.
If your team doesn’t believe they can easily find the defect before they start looking into it, or they don’t believe it’s worth the effort, the defect is going to cost you more than time; it’ll cost you with the team. If it turns out the defect is easy to find, or while finding it the team discovers another issue they feel is more important then you’ll get lucky and face no longer term damage. More likely, the team members will add this to whatever other water they feel they’ve had to carry for you and the company. Before overruling your team, spend the time to either convince them that it’s worth it – or be convinced that it isn’t. If that doesn’t resolve the impasse, propose a time boxed approach. If you start with just a one day commitment to look into something this typically reduces resistance because you’ve eliminated the first problem (an unbounded time frame) and significantly reduced the workflow impact.
As your team looks into the problem, they will naturally come to believe that it’s very important (because people perceive effort to equal value, so the more effort it takes the higher value it must have to justify that effort). This means they’ll tend to lose the perspective necessary to objectively evaluate the risks and rewards of resolving the defect and deploying the fix. To help make these future conversations easier, don’t let the developers involved in the problem go too far off the reservation before revisiting the decision on whether it’s worth continuing to dig into the defect and what the potential upside to fixing it is.
Coming Next: The Resolution Perspective
Come back in a few days for the final post in this series, talking about the impact, difficulty and risks of resolving defects and deploying updates.
Tags: Defects, Mindset, Problem Management, product feedback, Risks, Software Development Process
Posted in Software Development
Defects: It Depends on Your Perspective.
Written by Kendall Miller on May 12, 2008 – 12:50 am
Your product is out in the wild, and even better – it’s in use by real users. You’ve got feedback and support structures in place and they are producing results. Now you need to take that feedback and incorporate it back into the product. To do this, you’re going to have to navigate the social dynamics of your organization around defects.
Any product, software or otherwise, has defects. Your shop may have a nice term to paper over it – incident, problem, ticket, errata, trouble report…. But let’s not paper over it – a defect is a defect. Developers also like to split hairs between feature requests and defects, but from a user standpoint it’s all defects. Everyone involved in product development will have their own way to prioritize defects, and to get the best results from your team’s time you need to be able to figure out which ones to address and how fast – and do so in a way that generates buy-in from your development team, management, and customers.
There are an endless number of ways to look at prioritization, but however you do it the discussions should include several perspectives:
- Impact, Difficulty, and Risk of defect to the end user.
- Impact, Difficulty, and Risk of even diagnosing the defect.
- Impact, Difficulty, and Risk of defect correction and deployment.
The End User Perspective
Your customers in general want and expect a defect-free product. Even if your average customer understands that all software has defects, intellectual understanding won’t overcome the emotional impact of running into a problem. Your users will generally start from the perspective that their problem is a defect in your software, it shouldn’t have been there in the first place, and you need to fix it immediately. Today would be nice.
It is very difficult to understand the value a customer places on fixing a particular issue from within the development team. Developers tend to grade defects based on the effort it takes to fix them and whether they produce an outright failure of the software. For example, few developers will get worked up over fixing a cosmetic defect such as a misspelling or alignment problem: If it’s that simple to fix, how valuable can it be? The exception to the natural cognitive bias that a problem must be hard to be worthy is problems that can crash the application or cause it to corrupt data. Few developers won’t see this as a deadly sin that must be resolved regardless of cost or risk.
Customers have a different perspective. They see just the surface veneer of your product and assume that it can never corrupt data or crash. Outright crashes have gotten rare enough that most users will refer to an error message as a crash. Because they have no idea what’s happening inside the black box that is your product, they will judge it solely on what they can see: Does it act like other applications, do the things they can see look clean and well crafted? Much like making a spelling error on the title page of your term paper can cause the entire work to be devalued, a small cosmetic error on the user interface can cause customers to doubt the correctness of your entire application.
Many end users will tend to discount defects that require long steps to produce or go away by restarting the application as long as they can convince themselves that they are at fault. This happens more often than you might expect – most users believe they don’t understand the rules behind the application and instead are using a rote procedure to accomplish their tasks. When something goes wrong, they will generally go back and try it again – possibly many times – before coming to the conclusion that there just might be a problem with the software and not with them. If the defect only presents occasionally they will usually write it off as their fault. Lest you think this behavior is limited to nontechnical users, this happened to NASA and resulted in a several-day delay of the first Space Shuttle flight.
With rare exception, if a defect isn’t judged as essential to fix by your customers, it’s probably not worth addressing prior to the next routine release. Every change to the software has consequences and takes effort that could go into something more important – to your team and your customers.
Coming Next: The Diagnostic Perspective
Come back in a few days for the next post in this series, talking about the impact, difficulty and risks of diagnosing and resolving defects.
Tags: Defects, Mindset, Problem Management, product feedback, Risks, Software Development Process
Posted in Software Development
Effort doesn’t equal Value
Written by Kendall Miller on February 2, 2008 – 1:20 pm
- Effort ≠ Value
Think about it for a few minutes and it seems patently obvious: Just because something’s difficult doesn’t mean it has great value. For example, if I want to mail 50 letters to clients and I put an individual stamp on each one instead of using an automatic postage machine I’ve achieved the same value: I can now send these letters to each of my customers. They’ll get there just as fast, the postage is just as valid. Therefore, if it takes me 15 minutes to put the stamps on one by one vs. about 1 minute to run it through a machine I’ve spent 15 times as much effort to achieve the same value.
It works in reverse as well: Just because something has great value doesn’t mean it’s intrinsically difficult. It may be exceedingly valuable to me to get a message to a client that lives on the other side of the country, and yet it’s really easy to do: In just a few seconds with my cell phone I can reach out any time of the day, from virtually anywhere. Low effort, high value.
Obvious, and yet we ignore the implications of this every day. We naturally assume that anything worthwhile takes effort, and that anything that takes a lot of effort was worthwhile.
Good examples of low effort, high value
Cosmetic defects are a classic example of this. It isn’t unusual at all to go through a new software application and find a substantial number of cosmetic defects: Alignment issues, inconsistencies in language (is it login, logon, or user id? Do you click or press that button?), spelling or language errors and a range of items that aren’t application behavioral issues (like tab order). Developers tend to instinctively minimize these issues: They’re trivial to resolve and they don’t prevent the application from working. They aren’t anywhere near whatever hideously complicated part of the system the developer is really worried about, and they’ll take no time to get right later. They can’t be that important, so development teams tend to not talk about them or work them. Even the term “cosmetic defect” is often used as a label for trivial or low value: “that’s no big deal, it’s just a cosmetic issue. Now let’s talk about that rare crash on every other leap year if you attempt to delete a customer with no records!”. This perspective isn’t even particularly unreasonable if you’re looking at the development process from a risk management perspective: You know the issues can be cleaned up quickly and without a lot of technical risk, and if you clean them all up now you’ll still have to do a recheck of the system before release because new ones will show up.
Now look at it from the standpoint of an end user of the system. The system is a black box: They don’t see the really artful code that figures out automatically whether they entered a name as Last, First or First Last, or how you managed to make a really fast look-ahead search system despite the large number of records you have to work with. Instead, they’ll see what’s right in front of them: The user experience of the application itself. If they start it up and notice immediately that things aren’t lined up vertically and horizontally or that there are spelling errors, it will give rise to the classic line of reasoning: if you didn’t get this right, what hope is there that the black box is right? The more you protest that this is easy to fix the worse it gets: If you couldn’t get the simple, easy-to-fix stuff right then there’s no way the detailed 12 step process for determining how much to bill a client is going to be right, and the user is going to have to check it all before you regain their trust.
The good news is that this direction is the easiest to avoid as a manager or team member of a software development project. Once you’ve had the above experience once or twice, you will start to get wise and do a cosmetic issue pass at strategic points in the timeline – usually just a few builds before the application is going to be seen by people outside of the team. You’ll be surprised at both how many items show up each time, and how easily they clean up. Then, while you’re sweating during the big demo about whether you’re going to get a runtime error, you’ll at least have the comfort that what the audience sees while waiting for the next page represents your team’s good work in a way that communicates to the average user, who can’t see behind the curtain.
Good examples of high effort, low value
This trap is more dangerous and harder to avoid. At many points through the development process you’ll have opportunities to choose architectures, designs, algorithms and other items that will either increase or decrease the effort it’ll take to complete the project. You might choose not to use that built-in dialog to open files and instead make your own dialog because of one annoying behavior you really want to avoid. Or you might decide that you want to make a better column sizing routine for the grids you display so that you can avoid either trying to cram too much on a small screen or having acres of empty space on a large one. None of these are necessarily bad ideas on their own, and that’s part of the trap: Most development processes by design tend to focus team attention on the things that are hard, high risk, or just time consuming because these have the biggest ROI for project management activities. This reinforces our built-in instinct to presume that the harder the work, the greater the value.
What this ignores is that the value is essentially constant regardless of effort: Any particular feature or capability has a set value in the eyes of the user. Our goal is to realize that value with the minimum effort or, failing that, with what appears (prior to construction) to be the minimum effort that carries an acceptable risk. In its most direct form, this means that the user places the same value on a five thousand line algorithm to determine optimal column width as on a method built into a control that gets it right, as long as the outcome meets their expectation.
<tip>Corollary: Be sure what’s important to you is also important to your users before investing a lot of time. Perhaps they don’t care if there’s a bunch of empty space on their 24″ widescreen monitor as much as you do. Get evidence commensurate to the effort you think it’ll take to resolve the issue.</tip>
Understand the trap
This issue tends to manifest itself in some classic ways. One is when a developer argues passionately in favor of a complicated algorithm even in the face of peer review that casts substantial doubt on its necessity. Typically the developer caught one small aspect of the problem and has ruthlessly optimized for it, and uses that one point as the proof of why simpler approaches don’t work (“If users are constantly switching back and forth between these two displays it’s 30% faster to do it this way than what’s built in”). These items also tend to be defect prone and difficult to explain to others.
Complicating this trap are a few factors:
- There are hard problems to solve: You can’t assume every hard problem is really an overcomplicated solution. Most applications will have at least two places where there is some real trickery and engineering to get the right result each and every time. If there weren’t, your users probably wouldn’t want the application in the first place.
- There are low value problems to solve: There are hard problems that have to be solved, and some of these are even relatively low value to the customer but are still a requirement. Consider this example: The customer places relatively low value on your application not crashing when they run it. Don’t get me wrong – if it starts crashing they will be very upset, but they simply assume that it won’t crash. All joking about Microsoft aside, any application you write is virtually guaranteed to be more crash happy than Microsoft Office is. So you’re going to end up investing a lot in something users don’t really place a lot of incremental value on.
- Developers are Optimists: Developers like hard problems (after all – hard problems are valuable problems according to our instincts) and want to solve them. They will underestimate the effort going in and overstate the value of the journey. If it’s a new problem, it’s unlikely that their estimate is particularly great even if they aren’t focused on why a particular complicated approach is necessary.
Striking the Balance
How to avoid this trap? First, for these problems to get out of hand, one or two developers usually have to be able to wander away from the herd for long enough to cook up a complicated idea, justify the effort to themselves, and then get far enough into the swamp to be in real trouble. Depending on your specific software development approach, find ways to catch the telltale signs before developers sink enough effort into the solution to get permanently attached to it.
Second, be fully prepared to throw out an already developed solution regardless of how much code or effort it represents. In other words, the decision on whether or not to back up and take another approach should be largely blind to how many lines are being thrown out as long as they are all part of the same solution. Even at this point the instinctive desire to equate effort and value will creep into the entire team’s thinking: people will look at the large block of code and assume that it must be necessary, and that we’re just missing the subtlety of why it is the light and the way. This is what source code control is for (you do use source code control, don’t you?). It enables you to reject, without fear, a bird in hand for a simpler bird that may converge faster, have fewer issues, and ultimately be a more cost effective way of providing the value your customers are expecting. Remember that even if you’ve taken a complicated implementation through initial unit testing, there is still a substantial investment that will be made in that code over time.
A production implementation is worth many theories
This article has been talking about high effort ways of achieving value in software. That argument shouldn’t be generalized to applications simply because they are large or complicated, or even to any particular solution that is large and complicated, provided it got there incrementally over time. It’s often tempting to look at a few hundred line block of code that does just one thing and think that in this age of objects, partial template classes, interfaces, and reflection there just has to be a cute, simple implementation that’s less than half the size and complexity of the current solution. There are three key issues with this thinking:
- Large, stable code already achieved value: If the block of code is substantially stable and accepted by the customers then it has achieved its value, and any effort spent on refactoring it that doesn’t also deliver more value to customers isn’t improving the value of the application.
- Refactoring introduces defects: It’s virtually guaranteed that in the process of refactoring the existing routine sufficiently to make a good dent in its complexity you’re going to introduce some new defects just due to conceptual or implementation oversight during the process. It’s generally not considered a great justification to management that you introduced defects that then require expense to clean up in an effort to avoid possible future expense maintaining code.
- Code gets larger because it handles very subtle points: If the code organically grew over time to be a complicated routine, it probably did so because it was progressively asked to handle a number of interesting boundary cases that experience with the application proved necessary. In the minds of the users, these subtle behaviors can be some of the greatest value they place on your application – and yet they’ll never mention the feature when talking with you about it. Therefore, when you refactor it you have to preserve every little behavior, and that is often infeasible (with the exception of true defects in the design of the original code – well isolated routines that can be replaced with provably equivalent code).
If it’s already in production and doesn’t have a critical flaw that the business needs addressed, leave it alone.
Clearly communicate the end-user value within your team
The best technique to avoid this problem is to make sure your development team has a tradition of discussing the end-user value of the work being done. This tradition means that anyone can ask what the end-user value of any piece of work is – that’s a free question, never met with ridicule. It may take some practice to make sure the question is asked correctly, i.e. in a way that gets the entire team to back up and be clear about why the work is important. Within that questioning, it’s important to make sure the discussion is based on what’s important to the users of the system, not the developers or other non-users. There are times to focus on maintenance or other non-user issues, but with rare exception they aren’t the reason you’re writing the system.
With some practice, this will become a strong self-regulation mechanism for the team, ensuring that your discussions about design and approach are grounded in the needs of your customer. It creates a good mental yardstick for how much time to invest in a solution before going back to the customer to re-verify the requirement.
What’s your experience?
Have a good story to share? Have a critique? Post your comments or drop me a line to continue the conversation.
Tags: Cosmetic Defect, Effort, Requirements, Software Development Process, Yak Shaving