Walking the Walk – Gibraltar Moves You Down the Path

Written by Kendall Miller on June 19, 2009 – 3:29 am

If you’ve read more than one or two articles from Reliable Systems you probably have gotten the sense that we worry a lot about how to make things just work.   It’s that quality of anything where you get what you expect and what you need every time.  It can be in an experience (like a fun drive down a country road) or a product.  As a company if you can do this over and over you create a brand people develop a strong emotional connection to:  Apple, John Deere, Starbucks…

When you want to create a product that just works, you need to get all of the details right – from packaging through to maintenance and upkeep.  It’s not one thing that’s important, it’s all the things.  We are often engaged by senior management within a client when things aren’t working and there are conflicting opinions on why.  Usually, somewhere along the path, technology is being blamed: not enough, not the latest thing, not someone’s favorite thing, not working.  As we dig into the situation, rarely is the technology the dominant factor.  More often, it’s how the technology is being integrated with the people and processes that all have to work together.

One of the first things we have to do in these engagements is to establish the real facts on the ground:  What exactly are the problems in the system, who’s doing what with it, and how often?  It comes down to establishing metrics to make sure time and attention are paid to the parts that make the biggest difference in the outcome.  Armed with these facts in a form the business can consume, it’s possible to create plans of action that deliver virtually regardless of budget.

So let’s make this easier

The biggest trick is then getting the facts you need on an ongoing basis, easily, and in a form that the business can consume.  For over a decade we’ve been building instrumentation right into the systems we’ve worked on.  We’ve created a variety of toolkits to make this easier over the years, refining them as technology and our experience have changed.

About 18 months ago we decided it was time to really invest down this path.  We believe in routinely capturing key computer metrics along with whatever logging the application can do on its own.  We won’t do a project without using a great logging system that includes a strategy for managing runtime exceptions.   Now that we’re collecting all this data, we need to have a way of managing the raw data and turning it into valuable business data.

The challenge is that businesses don’t get up in the morning and say “what our customers want us to do is have great internal tools”, so you’re nearly always doing this on the cheap:  Borrowing time from development projects internally to cobble together various free or cheap solutions.  Frankly, we got tired of having to create new solutions with each client out of the margins of each project.  So, we pooled our best thinking from all of the work we’ve done (including a previous product that we did license to our clients over the past decade called CLAS) and started creating Gibraltar.

Rock Solid from Initial Release

With Gibraltar we wanted much more than a log system.  Of course, it had to be a log system too – and a really easy to use one that could work with each of our client applications.  More than that, it had to:

  • Automatically capture all of the performance metrics we wanted.
  • Integrate with existing logging available on the platform, including whatever a client might already be doing (like custom in-house options)
  • Be absolutely, positively, for sure safe to run in production no matter what.   That means it can’t ever use too much disk space or disk throughput or block the application.
  • Not use more than 5% of the performance of the app
  • Include all of the tools necessary to get data from where it was collected to the people that could get value out of it
  • Include the ability to look at the detailed session data up to high level analysis:  What’s the error rate?  What’s it correlate to?  Are we doing better or worse in this version?
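The "safe to run in production" requirement above can be illustrated with a small sketch.  This is not Gibraltar’s API or implementation (Gibraltar targets .NET); it is a hypothetical Python example, and the class name `BoundedLogBuffer`, its `capacity` parameter, and its methods are all assumptions made for illustration.  The idea it shows is one common way to keep logging from blocking the application or growing without limit: a fixed-size buffer that drops (and counts) messages when it is full rather than making the caller wait.

```python
import queue
import threading


class BoundedLogBuffer:
    """Hypothetical sketch: a log buffer that never blocks the caller
    and never holds more than `capacity` messages."""

    def __init__(self, capacity=1000):
        # A bounded queue caps memory use; capacity is an illustrative default.
        self._queue = queue.Queue(maxsize=capacity)
        self._lock = threading.Lock()
        self.dropped = 0  # how many messages were discarded because the buffer was full

    def log(self, message):
        """Try to enqueue a message without waiting.

        Returns True if the message was buffered, False if it was dropped.
        """
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False

    def drain(self, sink):
        """Pass every currently buffered message to `sink` and return the count."""
        count = 0
        while True:
            try:
                message = self._queue.get_nowait()
            except queue.Empty:
                return count
            sink(message)
            count += 1


if __name__ == "__main__":
    buffer = BoundedLogBuffer(capacity=3)
    for i in range(5):
        buffer.log(f"message {i}")
    written = []
    buffer.drain(written.append)
    print(len(written), "written;", buffer.dropped, "dropped")
```

In a real system `drain` would typically run on a background thread that writes to disk, and the drop count would itself be reported so that lost messages are visible rather than silent.  Dropping data is a deliberate trade-off in this sketch: it favors the application’s responsiveness over log completeness.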

From this initial sketch of everything we wanted, we’ve spent 18 months, including four beta periods (of 2-4 months each), refining the vision with real customers and real scenarios.  It was essential to us that this not be just a tool for techies but be ready for use by people with a wide range of skills.  It had to be pretty and just do what you wanted, when you wanted it to.

We’ve added a lot of capabilities along the way:  It can generate print-ready reports about application reliability that can communicate with senior management, and you can define all kinds of custom metrics to easily track how your application is used and by whom.  We ran a number of betas to be sure that we had hit every goal listed above.  We’re happy to report that Gibraltar is in use within large deployments of custom applications, commercial applications, and small deployments right down to our corporate web site.

This tool isn’t for everyone – Our clients are nearly all Windows shops, and if they do any custom development it’s almost invariably in .NET, so that’s what we’ve targeted.   But, if you’re interested in easily getting real data on not just infrastructure (how well the application is running) but whether or not it just works, have we got an easy path for you.  You can see a quick demo video of how it works technically at Gibraltar Software.

You also don’t have to take my word for it at all: you can hear what one of our beta users did with it, which is really a more compelling story than anything we might say.

I think you’ll find that our work sweating a lot of little details shows, from the exact design of the API and making sure the documentation was complete to rewriting our own licensing system to be very IT Admin friendly.  If we didn’t get a detail right, we want to know.  And the great news is that we’ve just begun:  We’re obsessed with the little things, and you can bet we’ll keep listening and watching to make it better.  Of course, this is made a lot easier because we’re using Gibraltar to monitor itself, and a select group of our users is sending that information back to us so we can make sure it just works in the field for real people.

It’s easy to start your journey

If you do development for Microsoft .NET, I’d encourage you to go over and download our commercial release of Gibraltar.  You’ll get great documentation, a free agent you can use like a flight recorder “black box” in every application you create, and a trial for a tool that will make you seem wise beyond your years.  And if you pay us the ultimate honor and purchase a permanent license, I can assure you that you won’t find anyone more committed to your satisfaction than we are.


Posted in Infrastructure, Monitoring, Software Development | No Comments »

What Happens when Engineers don’t Rule

Written by Kendall Miller on May 5, 2009 – 12:42 am

I’m an engineer at heart. I worry about all the little details of how something works technically. When I can, I go for the overengineered solution every time. We recently needed to get a Microphone Pre-amp to USB device. Instead of getting the plastic MAudio unit that probably works just great I got the USBPre at twice the price. Why? Just look at that case, it’s awesome:
[Image: the USBPre]
With a nice metal case like that, industrial strength construction – it’ll last forever! Of course, this thing will never leave my desk, so the ability to be run over by a truck is more or less academic.

So with my natural preference for hard core engineering I’d like to report that the best software comes from a group of driven software engineers. Technically, that may be true – a big group of engineers can make a very technically sophisticated product. But, really great products? Well, that requires a lot more than just technical excellence.

I think this is the backstory behind Vista’s successes and failures. We’ve been using Vista as our corporate OS since January of 2008, not long after it was widely available. It’s worked very well for us – even better since SP1. But again, we’re engineers: half of our systems are 64 bit, and we use high end hardware, so we were very good candidates.

A Whole Lotta Polish

Last weekend I installed Windows 7. Now, even though I generally love new toys I haven’t been chomping at the bit to try out Windows 7 because Vista is working great for me, and we’ve had a lot of deadlines I didn’t want to risk. But, with the release of build 7100 last week, I couldn’t resist.

What’s the big difference between Windows 7 and Windows Vista? Polish. A whole lotta non-engineering polish. I was using the media center capabilities last night and noticing all of the little things that are completely irrelevant from an engineering / functional standpoint. These same things make all the difference in how you perceive the quality of the product and, more importantly, the quality of the experience in using the product.

Is Build 7100 without issues? No – there are some optimization issues that I’ve run into, but they’re likely known already within Microsoft and they have months to refine them. The big picture is that the risky, time consuming design details are all there. I haven’t even turned off UAC yet, and I couldn’t live with that under Vista for more than two hours.

Now, it may be that if you’re creating the next version of SQL Server that this fundamentally human element of intuitive adjustment and polish isn’t as necessary. SQL Server could be all about hard core specifications, tests, and optimization. That’s reasonable when the human to product interface is either through a standard you can’t affect (e.g. T-SQL) or is confined to highly technical specialists.

Goes to Eleven

When you’re creating an application, you aren’t going to find the polish by reading a functional specification. You also aren’t going to get it just by using any particular development methodology – Agile, Waterfall, whatever. What you have to be willing to do is go beyond the written functional and system specification and look carefully at each aspect of the human – computer interface in your product.

This dedication requires a few things:

  1. Access to a User Experience (UX) / Human Computer Interface (HCI) specialist. These folks are experts not because of facts and figures or things you can read in a book but because of their experience and a practiced eye that lets them pick out the key details that make all the difference.
  2. Dedication to making it better: At each turn, and in very difficult moments, you’re going to have to repeatedly look at what you have and what you’ve done and say OK, how do we make this better. Take the case that we can leap beyond this, what would that look like?

Done right, this experience can be torturous to engineers because it’s about iterating through hard to quantify, experimentally determined states without objective metrics to guide your process. You will see the results of your work – but as the sound of distant thunder as your users either rave more and more about what you’ve done or just accept meekly what you give them. Engineers are used to tweaking a knob and seeing the needle move in a quick, quantifiable way.

If you want to get a sense of what happens when people think deeply about how to create software that interacts well with people, read the Microsoft document on how to write an error dialog for Vista. This is 28 pages on how to do a good error message and why. Warnings? Another 12 pages. Even if you’re a hard core engineer, the Vista User Experience Guidelines are a great read for understanding why it takes many iterations and at least equal measures of instinct and intellect.

Fighting the Good Fight

The challenge with pushing for breakthroughs in the user experience with your product is that it doesn’t fit well into traditional engineering problem solving techniques. That may be why some of the most successful organizations at it have a strong command & control personality (like Apple) that emphasizes an individual making an intuitive judgment to decide what’s best. Trying to apply traditional engineering approaches will generally stifle and drive away the very talent that excels at solving these problems. Just ask Google. Their well respected expert on design and usability quit this year, saying:

I’m thankful for the opportunity I had to work at Google. I learned more than I thought I would…. But I won’t miss a design philosophy that lives or dies strictly by the sword of data.

The full text is an interesting read. Probably the most poignant example was testing what shade of blue should be used in a specific scenario. This is a good example of a case where you should trust your judgment but not try to explain it. It’s a fundamentally human, intuitive leap and you might be able to rationalize it, but that doesn’t mean you can really explain it.

The best part is that if Microsoft is finally getting the message that it isn’t enough to just compete on business and engineering requirements, and that you instead have to battle for the hearts and minds of the people who use your products, it’s only good for everyone. Just like Linux has pushed Microsoft to be faster at evolving Windows (and creating more low cost licensing options), this may push players that are known for great design to have to up their game as well. I can’t wait.


Posted in Software Development | 4 Comments »

Now where was I…

Written by Kendall Miller on April 17, 2009 – 4:09 pm

As you can tell from the timeline, it’s been a while since I’ve posted anything.  It isn’t because I’ve had nothing to say – instead, I’ve been completely consumed by leading the team creating Gibraltar, a new application monitoring product for .NET teams that we’re launching.  You can download the latest version at www.GibraltarSoftware.com.  We just published the last beta version of the product before the commercial release, which is scheduled for June 1, 2009.

Bringing a new product to market is really hard.  I’m sure you’ve heard that before – but however hard you think it is, it’s harder than that.  While we’re not quite across the finish line, there are a few things that have become readily apparent:

  • Commercial-grade quality takes a lot to achieve.  At each turn where you might normally say “well, users just shouldn’t do that” you can’t.  Things you otherwise solved through training you can’t.  It’s the difference in construction of a commercial Amp and the receiver you bought at Best Buy.  
  • Users won’t read anything.  We did a beta release where we posted in five places instructions for how to upgrade from the prior beta which required an extra step or things wouldn’t work.  We got deluged with calls about it not working from virtually every beta user; no one read any of the notes they saw, even in bold text in a yellow box in the middle of the screen.
  • Marketing involvement early and often:  The feedback from our first beta version was brutal; it told us that we were going in entirely the wrong direction because the users we were building the app for weren’t going to buy anything regardless of how singing & dancing it was.  We had to step back and go a whole different direction.  That would have been far more painful if we hadn’t been early in the process.

From here on out, I’ll be contributing articles to a separate blog focused on .NET software development and being part of a small Independent Software Vendor (ISV).  This site will focus more on its original goal:  IT and business strategies for reliable systems.


Posted in Software Development | No Comments »

Ignore what you know – Demand Results

Written by Kendall Miller on November 30, 2008 – 8:53 pm

Many if not most software project leaders came up through the development ranks.  It’s generally thought of as a distinct advantage – you know the technologies you’re using, you can form your own well reasoned opinions about how hard something is, what is possible, and how long it should take.  For a long time, I felt that the best way to get results from development teams was to use my experience and knowledge to be very understanding of the challenges they faced and give them whatever time they asked for.  However, in the last few years I’ve run into several situations where I just couldn’t get them the extra time or relief from the most problematic requirements.  I predicted doom to the projects in question but instead I observed some of the best outcomes I’d ever experienced.

While the projects were successful, it bothered me that the secret sauce seemed to be a rigid adherence to schedule and delivery more than any other consideration.  This was exactly the reverse of how I wanted projects to succeed:  I wanted them to succeed because I was treating the developers the way they always wanted to be treated, not like a stereotype from Office Space.  How could it be that better results came from ignorance of the technical details involved?

Developers Will Use All Available Time

Upon reflection, the first thing that struck me was how much an immobile deadline focused discussions and decision making. If you give a team more time, they will expand their process to consume it.  Time will get consumed by:

  • Elaborate Decision Making: When you have little time, you make a choice and go with it until it appears it just can’t work.  When you have a lot of time, you sit back and look for the very best option.  That then requires defining what the best is – is it fastest, or smallest, or most scalable, or whatever.
  • Development Approach: Under pressure you’ll tend to go with the proven guaranteed approach.  If you have the luxury of time you’re more likely to engage in yak shaving like investigating a new tool or approach, or writing several prototypes first before you develop the real solution.  You might even just throw caution to the wind by skipping a formal design figuring you’ll have the time to just code and test your way to a solution.

The more time a development team has, the harder it is to argue against spending it on up front luxuries.  It also can be harder to argue for long term best practices because the team has the time now to develop a solution any way they want.

Unknowns Create Boomerang Estimates

Even very experienced developers are generally terrible at estimating how long it will take to develop a solution.  This has been demonstrated over and over by many other parties.  The key behavior we’ve observed is that from the moment you approach a specific development problem (like displaying a graph on a web page) until you know exactly how you’re going to solve it (and have a reason for confidence in that approach), you will tend to estimate high because in effect the only reasonable estimate is infinity.

Put another way, as long as you don’t know how you will solve a problem, you don’t know for sure that it is solvable, which means it could take an infinite amount of time to solve it.  Fortunately, developers are almost universally optimists, so they believe they can solve anything eventually – so they’ll pull out a standard answer like three weeks or months or whatever feels like a big chunk of time to figure out the problem but not so big that it kills the project.   The reality is that until you know how you’re going to solve it, it feels like it could take forever.

Once a solution has presented itself, the development team will often find that all it will take is some cleanup and polish to be done: a very small amount of time.  What will push the team to find the answer?  We’re back to the problem of elaborate decision making when you have the luxury of time.  Finding solutions tends to not be a linear problem that will be solved with incremental development energy.  Instead, it tends to be solved by getting people together and brainstorming possible solutions until you find a few candidates and can work out what it’ll take to prove them out.  Under pressure, people tend to focus their creative energy and be more willing to compromise.   That flexibility will tend to get rid of pet requirements and developer gold-plating and focus on the most critical aspects of the problem.

What’s the alternate approach?

The key is to not let your knowledge and experience as a developer lead you to buy into the stories the team creates around what’s reasonable to get done and how long it will take.  Instead, you have to stick with the project’s goals first then the facts of the project.  The project’s goals form the objective reality of what has to be accomplished for the project to survive:  Deliver this functionality by that date, keep these people informed, solve these problems without causing those problems.

When the team runs into a wall and needs more time, instead of buying into the story of needing a lot of time, set a specific and tight goal that keeps a solid amount of time pressure on the team to solve the issue and prevent the problems above from showing up.  Ideally, find a way to give out one or two day chunks to answer incremental questions if necessary to emphasize that time is precious and has to be invested carefully.  This is where you can leverage your experience in a way that a non-developer can’t:  The team knows they can’t snow you with tech details, and you can define a specific, measurable result that can be achieved in a short period of time that they can’t argue with.  Despite this, you are bound to have to assert a few times that the time limit is the limit – solve the problem in that time.  It’s very hard because you’ve been on the other side of that conversation and it can feel like you’re the Pointy Haired Boss, but it’s fundamentally your job on the project.

What will nearly always happen is the team will surprise itself – a solution will be presented within the team that they can live with and can be done in the time they have.  It may be incomplete or have some risky shortcomings, and you’ll want to ask how long it’d take to address those.  You probably shouldn’t address them in the first round, but the team will feel better that you’ve thought things through and will buy into the outcome more if you ask.  You’ll also want to make a record of it so that the team can in the future recognize what was a predicted shortcoming vs. an accidental defect.

Do you want it solved right?

This is a question that often gets voiced within a team as a rebuttal to external time pressures and is very dangerous.  The challenge is that most non-technical people don’t get the number of ways that a problem can be solved: instead, each problem appears to have a single solution.  Take away your technical knowledge and imagine you’re the paying customer:  What’s the alternative – were you going to solve it wrong? If that’s the case, what else have you done that’s garbage?  If you took your car to a repair person and they said it’d be $500 to fix it, then when you came back they said well, if you want it fixed right it’ll actually be $1200, wouldn’t you wonder what the hell the $500 fix was?

Usually this statement is uttered in desperation when a team believes they just need more time to figure out a problem.  Nobody wants a problem solved wrong.  Skip the hyperbole and get down to action:  break down the problem into small chunks of time that can be invested for a specific measurable result, and make sure the team understands that extra time is the most precious commodity.

Side Note: This is an advantage of SCRUM in practice.  If you’re following an Agile Development practice, particularly SCRUM, this fits right in:  Focus on making each sprint deliver the user stories it was supposed to even if you have to leave some special cases for a later sprint.  The daily stand up meetings are a great place for the different team members to apply team pressure against over engineering and doomsday estimates.

Cleaning Up and Closing Out

At some point you need to close out your release and ship it. For each of the areas where you’ve had to make compromises and take shortcuts you have to choose to either:

  • Ship as Final: Decide the implementation is close enough to the intent of the end-user functional requirements that it can be the final implementation (at least until new information contradicts this decision)
  • Ship as Temporary: Decide that something is better than nothing and ship the feature with limitations.
  • Cut the Feature: Hold back the feature until it can be reconsidered or reimplemented.

You’re nearly always better off shipping the feature, often as a final feature pending more information because it’s very hard to gauge the true impact of each limitation.  This is particularly true of user-facing features and environments where it’s possible to evolve the software rapidly.  Inevitably once it’s in the hands of your users you’ll discover aspects of it that you didn’t think of that will require rework and you may discover that the killer feature you were sure would be the hit of the release is hardly used.  In either of these cases if you’ve invested a great deal of time in making it foolproof the team will tend to resist changing it.  It’s a natural product of the presumed relationship between effort and value.  If necessary, you might put in some temporary safeties to detect and catch the limitations you’re worried about.

The major exceptions to this approach are areas that are too dangerous to deploy if less than fully trustworthy.  For example, if your team is developing a data storage system, software deployment system, or other critical infrastructure your choices likely resolve down to making it as right as possible or holding the feature until it can be reworked.

If it turns out that the solutions that are viable within the schedule have significant limitations, you should make sure these caveats are known to the business – provided you can express them in business terms.  For example, knowing that an algorithm won’t work if your userbase doubles is probably not a significant caveat, unless you know the business plans to double in a relatively short period of time.  Every system has limits, and every software change has risks.  Business representatives don’t like to hear the same items covering the same ground every time you discuss software.  Repeating them tends to make them miss the new and important information, and it can sound like you’re attempting to transfer accountability from your team to them.


Posted in Management, Process, Software Development | 4 Comments »

Code Monkey Challenge

Written by Kendall Miller on August 29, 2008 – 8:35 pm

We spend most of our time deeply engaged in our clients’ projects.  We work hard to assimilate quickly into each client’s culture and challenges to make sure we deliver the most value we can.  This doesn’t leave us with a lot of opportunities to do team building within our own company.  We’re committed to making sure employees of eSymmetrix feel a part of something a lot more than just our clients – they are part of the eSymmetrix way of getting things done, which they can be proud of.

In the past several months the three partners of our firm have been scattered to handle all of the challenges of our growing business, but with the completion of a major engagement with a client we had the opportunity to spend some time back together to focus on more long range concerns.  We had intended to spend the time working on product and marketing strategy for our upcoming commercial release of Gibraltar, but instead on a whim decided to do something we called the Code Monkey Challenge.

The goal was to make something complete and useful within a very short period of time.  Complete for us meant we were going to create a software component, test it, package it, and publish it to a new web site with descriptive content all in 48 hours.  We didn’t even have a hosting company set up beforehand.  To be useful it had to be usable by people outside of our company as delivered, and fulfill a real need, at least for a set of users.

We started at 9:00 AM with the goal of publishing the final version to a new web site by 6:00 PM the following day.  Our initial approach was to have one hour sprints of work divided between each team member, then meet and review and set up the next sprint.  In the end we used three hour sprints, delivering to each other through our source code system at the end of each sprint.  We made our goal – you can see the results at www.gibraltarsoftware.com.  Most importantly, we achieved our result of bringing together our team members.

Update:  The small Logger we wrote as part of the Code Monkey Challenge has since been rewritten and incorporated into the Gibraltar Agent.

We eliminated any distraction we could so that we could apply as much of the 48 hours as possible to the project.  We worked well into the evenings and had others provide some support to make sure we had food, coffee, and no outside distractions.

The intense time pressure gave us a good tool to eliminate most conflict: there simply wasn’t time for much philosophy on how to approach the different technical aspects.  The lack of time removed ambiguity that otherwise would be the source of conflict.  There wasn’t much time for yak shaving either – we had to produce results at the end of each sprint.  Still the first few sprints were spent mostly in exploration and validation of the approaches with our first rough cut of each element done by the end of the first day.  The second day was then about refining and documenting.

While we did skip over some aspects that we would normally do on a larger scale project, we did include most elements of our preferred development process.  We did coordinated sprints of the team separated by an after action review to adjust our plan.  We distributed code and results between the team through our source code control system, we wrote automated unit tests to verify key aspects of the system, and did peer design reviews.  If anything, the lack of time made us less likely to bypass most parts of the process because we knew we wouldn’t have time to dig our way out of any problems we created.

The real goal of the exercise was to bring us back together as a leadership team and establish more of the shared experiences that define interpersonal relationships.  It’s helpful to bridge the gaps that easily develop between architects and developers, developers and designers, engineering and marketing.  We all had to come together to achieve our result because it was clear that we’d all fail if we couldn’t.  Most business projects are sufficiently fuzzy that it isn’t nearly as clear what the cost of not working together is, and politics can overwhelm cooperation.  It’s an experience I’d recommend in any software team, particularly if you’re in a company that can mix skills and responsibilities.

The next time your shop is done with a project, or when you need a break, consider doing your own Code Monkey Challenge.  It only takes two days.  Here are the rules:

  • Produce a Real Product: Scope out a product that is really useful to a specific audience.  Sure, you’re giving it away for free, but it still needs to stand up to scrutiny.
  • Marketing Material Too: What good is a product if no one knows about it or understands how or why to use it?  Even though you’re giving it away you want people to be able to understand and take advantage of your effort.
  • Help and Usage Information: Like a real product, it has to be usable without you sitting over the customer’s shoulders.  Depending on what it is, you need to tell them how to install it, program with it, and provide guidance on recommended usage scenarios.
  • In Little Time: Extra time is the enemy – it will reduce the pressure to work together and encourage unnecessary bells and whistles while removing the catalyst that gets everyone to work together.
  • Publish to the World: If you’re  a large company, perhaps it’s just the company.  Wherever it is, it has to feel like real stakes to the team:  Your results will be on display.

To set it up, be sure that you can eliminate anything that would distract the team – they can work into the evenings, have food brought in, whatever.  The closer it is to a total immersion experience the better.  It improves the sense of camaraderie developed within the team and ensures the most creative energy is directed to the project.

If you try it out, or have another software team building story, please drop me a line and tell me how it went, or leave a comment below.


Posted in Management, Software Development | No Comments »