Archive for Software Development Process
Philly Code Camp Presentations
Thanks to everyone for coming to my sessions and the organizers for making the event run so well. The facility was great and it’s really quite remarkable that the community can have such a strong one-day training event without having to charge participants. Microsoft was there to help out, but it was clearly a training event for the community not a Microsoft press event, exactly as it should be.
We had fun talking with the other vendors at the show, most notably Component One and the folks from CapTech. This is the first time we’ve had a vendor booth at an event like this, so in many ways it was a dry run to get the hang of what it’s like. We gave away a full copy of Gibraltar Analyst as well as a year’s subscription to Hub at the conference, and got to talk with a lot of people about both Gibraltar and VistaDB. We got some great real world examples of where VistaDB fits, which is a big help as we work on the marketing for that going forward.
I presented two sessions -
A Year in the Life of an ISV
If you’re thinking about what it’d be like to ditch your corporate development job or consultant gig and strike out to create & market your own product (or you’re a consultancy looking to create a product to diversify), this presentation shows what to expect on the path from shipping your first version to business success.
This one’s always a little risky at a code camp because, well… there’s no code. But, with the incredible diversity of tracks that were available at Philly Code Camp (13 tracks, over 60 sessions…) I think it’s also good to be able to “take a break”. Next time I might go for the last session of the day to maximize the value of that.
Designing APIs for Others
I covered a range of real world lessons about commercial API development emphasizing the differences between in-house & internal development and great, reusable commercial libraries.
I got some great feedback on this talk, particularly on an example that broke my own rule about samples: I tried to over-simplify it and instead created a “not best practice” sample. I’ll fix that for next time!
If you saw either presentation, please be sure to fill out the conference evaluation and I’d love to hear your feedback – drop it in the comments below or send it to me directly at kendall.miller@gibraltarsoftware.com.
If you’d be interested in having me come talk at your code camp, .NET Users Group, or event – please reach out and let me know. I’m always looking for new & better ways to engage with the community.
Walking the Walk – Gibraltar Moves You Down the Path
If you’ve read more than one or two articles from Reliable Systems you probably have gotten the sense that we worry a lot about how to make things just work. It’s that quality of anything where you get what you expect and what you need every time. It can be in an experience (like a fun drive down a country road) or a product. As a company if you can do this over and over you create a brand people develop a strong emotional connection to: Apple, John Deere, Starbucks…
When you want to create a product that just works, you need to get all of the details right – from packaging through to maintenance and upkeep. It’s not one thing that’s important, it’s all the things. We are often engaged by senior management within a client when things aren’t working, and there are conflicting opinions on why. Usually along the path technology is being blamed: Not enough, not the latest thing, not someone’s favorite thing, not working. As we dig into the situation, rarely is the technology the dominant factor: More often, it’s how the technology is being integrated with the people and processes that all have to work together.
One of the first things we have to do in these engagements is to establish the real facts on the ground: What exactly are the problems in the system, who’s doing what with it, how many times. It comes down to establishing metrics to make sure time and attention are paid to the parts that make the biggest difference in the outcome. Armed with these facts in a form the business can consume it’s possible to create plans of action that deliver virtually regardless of budget.
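As a rough illustration of turning raw observations into these kinds of facts, here is a minimal Python sketch. The `incidents` records and the `rank_problems` helper are made up for the example; they are not taken from a real engagement or from any particular tool.

```python
from collections import Counter

# Hypothetical incident records: (problem category, user who hit it).
incidents = [
    ("timeout", "alice"),
    ("bad data entry", "bob"),
    ("timeout", "carol"),
    ("timeout", "alice"),
    ("printer offline", "bob"),
]

def rank_problems(records):
    """Return (category, count, share of total), most frequent first."""
    counts = Counter(category for category, _user in records)
    total = sum(counts.values())
    return [(category, n, n / total) for category, n in counts.most_common()]

for category, count, share in rank_problems(incidents):
    print(f"{category}: {count} ({share:.0%})")
```

Even a tally this simple shows where attention should go first, which is the point of establishing metrics before arguing about causes.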
So let’s make this easier
The biggest trick is then getting the facts you need on an ongoing basis, easily, and in a form that the business can consume. For over a decade we’ve been building instrumentation right into the systems we’ve worked on. We’ve created a variety of toolkits to make this easier over the years, refining them as technology and our experience have changed.
About 18 months ago we decided it was time to really invest down this path. We believe in routinely capturing key computer metrics along with whatever logging the application can do on its own. We won’t do a project without using a great logging system that includes a strategy for managing runtime exceptions. Now that we’re collecting all this data, we need to have a way of managing the raw data and turning it into valuable business data.
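To make "a strategy for managing runtime exceptions" a little more concrete, here is a minimal sketch in Python using the standard `logging` module and `sys.excepthook`. It is only an analogy: our own work targets .NET, and the logger name `app` and the functions `log_unhandled` and `install` are assumptions invented for this example.

```python
import logging
import sys

logger = logging.getLogger("app")  # hypothetical logger name

def log_unhandled(exc_type, exc_value, exc_traceback):
    """Record an unhandled exception in the log before the process exits."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Let Ctrl+C behave the way it normally does.
        sys.__excepthook__(exc_type, exc_value, exc_traceback)
        return
    logger.critical(
        "Unhandled exception",
        exc_info=(exc_type, exc_value, exc_traceback),
    )

def install():
    """Route uncaught exceptions through the logger."""
    logging.basicConfig(level=logging.INFO)
    sys.excepthook = log_unhandled
```

Calling `install()` once at startup means a crash leaves a record with a full traceback in the same place as the rest of the application's log data, rather than only on a console nobody is watching.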
The challenge is that businesses don’t get up in the morning and say “what our customers want us to do is have great internal tools”, so you’re nearly always doing this on the cheap: Borrowing time from development projects internally to cobble together various free or cheap solutions. Frankly, we got tired of having to create new solutions with each client out of the margins of each project. So, we pooled our best thinking from all of the work we’ve done (including a previous product that we did license to our clients over the past decade called CLAS) and started creating Gibraltar.
Rock Solid from Initial Release
With Gibraltar we wanted much more than a log system. Of course, it had to be a log system too – and a really easy to use one that could work with each of our client applications. More than that, it had to:
- Automatically capture all of the performance metrics we wanted.
- Integrate with existing logging available on the platform, including whatever a client might already be doing (like custom in-house options)
- Be absolutely, positively, for sure safe to run in production no matter what. That means it can’t ever use too much disk space or disk throughput or block the application.
- Not use more than 5% of the performance of the app
- Include all of the tools necessary to get data from where it was collected to the people that could get value out of it
- Include the ability to look at the detailed session data up to high level analysis: What’s the error rate? What’s it correlate to? Are we doing better or worse in this version?
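The "safe to run in production" requirement is the hardest one on this list, so here is a small sketch of one way to approach the "never block the application" part: a fixed-size buffer that drops messages when it is full instead of making the caller wait. This is a Python illustration under our own assumptions, not Gibraltar's actual implementation; the `BoundedLogBuffer` class and its method names are hypothetical.

```python
import queue

class BoundedLogBuffer:
    """Holds log messages for a background writer without blocking the caller."""

    def __init__(self, max_messages=1000):
        self._queue = queue.Queue(maxsize=max_messages)
        # Messages discarded because the buffer was full. The count is
        # approximate if many threads write at once.
        self.dropped = 0

    def write(self, message):
        """Queue a message; if the buffer is full, drop it instead of waiting."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self):
        """Remove and return everything currently buffered, oldest first."""
        messages = []
        while True:
            try:
                messages.append(self._queue.get_nowait())
            except queue.Empty:
                return messages
```

The trade-off is deliberate: under extreme load you lose some log data, but the application itself keeps running, and the `dropped` counter tells you afterwards that it happened.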
From this initial sketch of everything we wanted, we’ve spent 18 months including four beta periods (from 2-4 months each) to refine the vision with real customers and real scenarios. It was essential to us that this not be just a tool for techies but be ready for use by people with a wide range of skills. It had to be pretty and just do what you wanted, when you wanted it to.
We’ve added a lot of capabilities along the way: It can generate print-ready reports about application reliability that can communicate with senior management, and you can define all kinds of custom metrics to easily track how your application is used and by whom. We ran a number of betas to be sure that we had hit every goal listed above. We’re happy to report that Gibraltar is in use within large deployments of custom applications, commercial applications, and small deployments right down to our corporate web site.
This tool isn’t for everyone – Our clients are nearly all Windows shops, and if they do any custom development it’s almost invariably in .NET, so that’s what we’ve targeted. But, if you’re interested in easily getting real data on not just infrastructure (how well the application is running) but whether or not it just works, have we got an easy path for you. You can see a quick demo video of how it works technically at Gibraltar Software.
You also don’t have to take my word for it at all, you can hear what one of our beta users did with it, which is really a more compelling story than what we might say.
I think you’ll find that our work sweating a lot of little details shows, from the exact design of the API and making sure the documentation was complete to rewriting our own licensing system to be very IT Admin friendly. If we didn’t get a detail right, we want to know. And the great news is that we’ve just begun: We’re obsessed with the little things, and you can bet we’ll keep listening and watching to make it better. Of course, this is made a lot easier because we’re using Gibraltar to monitor itself, and a select group of our users is sending that information back to us so we can make sure it just works in the field for real people.
It’s easy to start your journey
If you do development for Microsoft .NET, I’d encourage you to go over and download our commercial release of Gibraltar. You’ll get great documentation, a free agent you can use like a flight recorder “black box” in every application you create, and a trial for a tool that will make you seem wise beyond your years. And if you pay us the ultimate honor and purchase a permanent license, I can assure you that you won’t find anyone more committed to your satisfaction than we are.
What Happens when Engineers don’t Rule
I’m an engineer at heart. I worry about all the little details of how something works technically. When I can, I go for the overengineered solution every time. We recently needed to get a Microphone Pre-amp to USB device. Instead of getting the plastic MAudio unit that probably works just great I got the USBPre at twice the price. Why? Just look at that case, it’s awesome:

With a nice metal case like that, industrial strength construction – it’ll last forever! Of course, this thing will never leave my desk, so the ability to be run over by a truck is more or less academic.
So with my natural preference for hard core engineering I’d like to report that the best software comes from a group of driven software engineers. Technically, that may be true – a big group of engineers can make a very technically sophisticated product. But, really great products? Well, that requires a lot more than just technical excellence.
I think this is the backstory behind Vista’s successes and failures. We’ve been using Vista as our corporate OS since January of 2008, not long after it was widely available. It’s worked very well for us – even better since SP1. But again, we’re engineers: half of our systems are 64 bit, and we use high end hardware so we were very good candidates.
A Whole Lotta Polish
Last weekend I installed Windows 7. Now, even though I generally love new toys I haven’t been chomping at the bit to try out Windows 7 because Vista is working great for me, and we’ve had a lot of deadlines I didn’t want to risk. But, with the release of build 7100 last week, I couldn’t resist.
What’s the big difference between Windows 7 and Windows Vista? Polish. A whole lotta non-engineering polish. I was using the media center capabilities last night and noticing all of the little things that are completely irrelevant from an engineering / functional standpoint. These same things make all the difference in how you perceive the quality of the product and, more importantly the quality of the experience in using the product.
Is Build 7100 without issues? No – there are some optimization issues that I’ve run into, but they’re likely known already within Microsoft and they have months to refine them. The big picture is that the risky, time consuming design details are all there. I haven’t even turned off UAC yet, and I couldn’t live with that under Vista for more than two hours.
Now, it may be that if you’re creating the next version of SQL Server that this fundamentally human element of intuitive adjustment and polish isn’t as necessary. SQL Server could be all about hard core specifications, tests, and optimization. That’s reasonable when the human to product interface is either through a standard you can’t affect (e.g. T-SQL) or is confined to highly technical specialists.
Goes to Eleven
When you’re creating an application, you aren’t going to find the polish by reading a functional specification. You also aren’t going to get it just by using any particular development methodology – Agile, Waterfall, whatever. What you have to be willing to do is go beyond the written functional and system specification and look carefully at each aspect of the human – computer interface in your product.
This dedication requires a few things:
- Access to a User Experience (UX) / Human Computer Interface (HCI) specialist. These folks are experts not for facts and figures or things you can read in a book but their experience and practiced eye that lets them pick out the key details that make all the difference.
- Dedication to making it better: At each turn, and in very difficult moments, you’re going to have to repeatedly look at what you have and what you’ve done and say OK, how do we make this better. Take the case that we can leap beyond this, what would that look like?
Done right, this experience can be tortuous to engineers because it’s about iterating through hard to quantify, experimentally determined states without objective metrics to guide your process. You will see the results of your work – but as the sound of distant thunder as your users either rave more and more for what you’ve done or just accept meekly what you give them. Engineers are used to tweaking a knob and seeing the needle move in a quick, quantifiable way.
If you want to get a sense of what happens when people think deeply about how to create software that interacts well with people, read the Microsoft document on how to write an error dialog for Vista. This is 28 pages on how to do a good error message and why. Warnings? Another 12 pages. Even if you’re a hard core engineer, parts of the Vista User Experience Guidelines are a great read to understand why it takes many iterations and at least equal measures of instinct and intellect.
Fighting the Good Fight
The challenge with pushing for breakthroughs in the user experience with your product is that it doesn’t fit well into traditional engineering problem solving techniques. That may be why some of the most successful organizations at it have a strong command & control personality (like Apple) that emphasizes an individual making an intuitive judgment to decide what’s best. Trying to apply traditional engineering approaches will generally stifle and drive away the very talent that excels at solving these problems. Just ask Google. Their well respected expert on design and usability quit this year, saying:
I’m thankful for the opportunity I had to work at Google. I learned more than I thought I would…. But I won’t miss a design philosophy that lives or dies strictly by the sword of data.
The full text is an interesting read. Probably the most poignant example was testing what shade of blue should be used in a specific scenario. This is a good example of where you should trust your judgment but not try to explain it. It’s a fundamentally human, intuitive leap and you might be able to rationalize it, but that doesn’t mean you can really explain it.
The best part is that if Microsoft is finally getting the message that it isn’t enough to just compete on business and engineering requirements, but that you instead have to battle for the hearts and minds of the people that use products, it’s only good for everyone. Just like Linux has pushed Microsoft to be faster at evolving Windows (and creating more low cost licensing options), this may push players that are known for great design to have to up their game as well. I can’t wait.
Now where was I…
As you can tell from the timeline, it’s been a while since I’ve posted anything. It isn’t because I’ve had nothing to say – instead, I’ve been completely consumed by leading the team creating Gibraltar, a new application monitoring product for .NET teams that we’re launching. You can download the latest version at www.GibraltarSoftware.com. We just published the last beta version of the product before the commercial release, which is scheduled for June 1, 2009.
Bringing a new product to market is really hard. I’m sure you’ve heard that before – but however hard you think it is, it’s harder than that. While we’re not quite across the finish line, there are a few things that have become readily apparent:
- Commercial-grade quality takes a lot to achieve. At each turn where you might normally say “well, users just shouldn’t do that” you can’t. Things you otherwise solved through training you can’t. It’s the difference in construction of a commercial Amp and the receiver you bought at Best Buy.
- Users won’t read anything. We did a beta release where we posted in five places instructions for how to upgrade from the prior beta which required an extra step or things wouldn’t work. We got deluged with calls about it not working from virtually every beta user; no one read any of the notes they saw, even in bold text in a yellow box in the middle of the screen.
- Marketing involvement early and often: The feedback from our first beta version was brutal; it told us that we were going in entirely the wrong direction because the users we were building the app for weren’t going to buy anything regardless of how singing & dancing it was. We had to step back and go a whole different direction. That would have been far more painful if we hadn’t been early in the process.
From here on out, I’ll be contributing to a separate blog with articles that are focused on .NET software development and being part of a small Independent Software Vendor (ISV). This site will focus more on its original goal: IT and business strategies for reliable systems.
Ignore what you know – Demand Results
Many if not most software project leaders came up through the development ranks. It’s generally thought of as a distinct advantage – you know the technologies you’re using, you can form your own well reasoned opinions about how hard something is, what is possible, and how long it should take. For a long time, I felt that the best way to get results from development teams was to use my experience and knowledge to be very understanding of the challenges they faced and give them whatever time they asked for. However, in the last few years I’ve run into several situations where I just couldn’t get them the extra time or relief from the most problematic requirements. I predicted doom to the projects in question but instead I observed some of the best outcomes I’d ever experienced.
While the projects were successful, it bothered me that the secret sauce seemed to be a rigid adherence to schedule and delivery more than any other consideration. This was exactly the reverse of how I wanted projects to succeed: I wanted them to succeed because I was treating the developers how they always wanted to be, not like a stereotype from Office Space. How could it be that better results came from ignorance of the technical details involved?
Developers Will Use All Available Time
Upon reflection, the first thing that struck me was how much an immobile deadline focused discussions and decision making. If you give a team more time, they will expand their process to consume it. Time will get consumed by:
- Elaborate Decision Making: When you have little time, you make a choice and go with it until it appears it just can’t work. When you have a lot of time, you sit back and look for the very best option. That then requires defining what the best is – is it fastest, or smallest, or most scalable, or whatever.
- Development Approach: Under pressure you’ll tend to go with the proven guaranteed approach. If you have the luxury of time you’re more likely to engage in yak shaving like investigating a new tool or approach, or writing several prototypes first before you develop the real solution. You might even just throw caution to the wind by skipping a formal design figuring you’ll have the time to just code and test your way to a solution.
The more time a development team has, the harder it is to argue against spending it on up front luxuries. It also can be harder to argue for long term best practices because the team has the time now to develop a solution any way they want.
Unknowns Create Boomerang Estimates
Even very experienced developers are generally terrible at estimating the duration of developing a solution. This has been demonstrated over and over by many other parties. The key behavior that we’ve observed is the phenomenon that from when you approach a specific development problem (like displaying a graph on a web page) until you know exactly how you’re going to solve it (and have a reason for confidence in that approach) you will tend to estimate high because in effect the only reasonable estimate is infinity.
Put another way, as long as you don’t know how you will solve a problem you don’t know for sure that it is solvable which means it will take an infinite amount of time to solve it. Fortunately, developers are almost universally optimists so they believe they can solve anything eventually – so they’ll pull out a standard answer like three weeks or months or whatever feels like a big chunk of time to figure out the problem but not so big that it kills the project. The reality is that until you know how you’re going to solve it, it feels like it could take forever.
Once a solution has presented itself the development team will often find that all it will take is some cleanup and polish to be done – a very small amount of time. What will push the team to find the answer? We’re back to the problem of elaborate decision making when you have the luxury of time. Finding solutions tends to not be a linear problem that will be solved with incremental development energy. Instead, it tends to be solved by getting people together and brainstorming possible solutions until you find a few candidates and can work out what it’ll take to prove them out. Under pressure, people tend to focus their creative energy and be more willing to compromise. That flexibility will tend to get rid of pet requirements and developer gold-plating and focus on the most critical aspects of the problem.
What’s the alternate approach?
The key is to not let your knowledge and experience as a developer lead you to buy into the stories the team creates around what’s reasonable to get done and how long it will take. Instead, you have to stick with the project’s goals first then the facts of the project. The project’s goals form the objective reality of what has to be accomplished for the project to survive: Deliver this functionality by that date, keep these people informed, solve these problems without causing those problems.
When the team runs into a wall and needs more time, instead of buying into the story of needing a lot of time, set a specific and tight goal that keeps a solid amount of time pressure on the team to solve the issue and prevent the problems above from showing up. Ideally, find a way to give out one or two day chunks to answer incremental questions if necessary to emphasize that time is precious and has to be invested carefully. This is where you can leverage your experience in a way that a non-developer can’t: The team knows they can’t snow you with tech details, and you can define a specific, measurable result that can be achieved in a short period of time that they can’t argue with. Despite this, you are bound to have to assert a few times that the time limit is the limit – solve the problem in that time. It’s very hard because you’ve been on the other side of that conversation and it can feel like you’re the Pointy Haired Boss, but it’s fundamentally your job on the project.
What will nearly always happen is the team will surprise itself – a solution will be presented within the team that they can live with and can be done in the time they have. It may be incomplete or have some risky shortcomings, and you’ll want to ask how long it’d take to address those. You probably shouldn’t address them in the first round, but the team will feel better that you’ve thought things through and will buy into the outcome more if you ask. You’ll also want to make a record of it so that the team can in the future recognize what was a predicted shortcoming vs. an accidental defect.
Do you want it solved right?
This is a question that often gets voiced within a team as a rebuttal to external time pressures and is very dangerous. The challenge is that most non-technical people don’t get the number of ways that a problem can be solved: instead, each problem appears to have a single solution. Take away your technical knowledge and imagine you’re the paying customer: What’s the alternative – were you going to solve it wrong? If that’s the case, what else have you done that’s garbage? If you took your car to a repair person and they said it’d be $500 to fix it, then when you came back they said well, if you want it fixed right it’ll actually be $1200, wouldn’t you wonder what the hell the $500 fix was?
Usually this statement is uttered in desperation when a team believes they just need more time to figure out a problem. Nobody wants a problem solved wrong. Skip the hyperbole and get down to action: break down the problem into small chunks of time that can be invested for a specific measurable result, and make sure the team gets that overage time is the most precious commodity.
Side Note: This is an advantage of SCRUM in practice. If you’re following an Agile Development practice, particularly SCRUM, this fits right in: Focus on making each sprint deliver the user stories it was supposed to even if you have to leave some special cases for a later sprint. The daily stand up meetings are a great place for the different team members to apply team pressure against over engineering and doomsday estimates.
Cleaning Up and Closing Out
At some point you need to close out your release and ship it. For each of the areas where you’ve had to make compromises and taken shortcuts you have to choose to either:
- Ship as Final: Decide the implementation is close enough to the intent of the end-user functional requirements that it can be the final implementation (at least until new information contradicts this decision)
- Ship as Temporary: Decide that something is better than nothing and ship the feature with limitations.
- Cut the Feature: Hold back the feature until it can be reconsidered or reimplemented.
You’re nearly always better off shipping the feature, often as a final feature pending more information because it’s very hard to gauge the true impact of each limitation. This is particularly true of user-facing features and environments where it’s possible to evolve the software rapidly. Inevitably once it’s in the hands of your users you’ll discover aspects of it that you didn’t think of that will require rework and you may discover that the killer feature you were sure would be the hit of the release is hardly used. In either of these cases if you’ve invested a great deal of time in making it foolproof the team will tend to resist changing it. It’s a natural product of the presumed relationship between effort and value. If necessary, you might put in some temporary safeties to detect and catch the limitations you’re worried about.
The major exceptions to this approach are areas that are too dangerous to deploy if less than fully trustworthy. For example, if your team is developing a data storage system, software deployment system, or other critical infrastructure your choices likely resolve down to making it as right as possible or holding the feature until it can be reworked.
If it turns out that the solutions that are viable within the schedule have significant limitations, you should make sure these caveats are known to the business – provided you can express them in business terms. For example, knowing that an algorithm won’t work if your userbase doubles is probably not a significant caveat, unless you know the business plans to double in a relatively short period of time. Every system has limits, and every software change has risks. Business representatives don’t like to hear the same items covering the same ground repeated every time you discuss software, and it tends to make them not hear the new and important information as well as sound like you’re attempting to transfer accountability from your team to them.