<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Reliable Systems &#187; Infrastructure</title>
	<atom:link href="http://reliable.esymmetrix.com/category/infrastructure/feed" rel="self" type="application/rss+xml" />
	<link>http://reliable.esymmetrix.com</link>
	<description>People, Processes, Hardware and Software that deliver results every time, every where.</description>
	<lastBuildDate>Thu, 16 Jul 2009 15:47:03 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Walking the Walk &#8211; Gibraltar Moves You Down the Path</title>
		<link>http://reliable.esymmetrix.com/development/walking-the-walk-gibraltar-moves-you-down-the-path</link>
		<comments>http://reliable.esymmetrix.com/development/walking-the-walk-gibraltar-moves-you-down-the-path#comments</comments>
		<pubDate>Fri, 19 Jun 2009 07:29:10 +0000</pubDate>
		<dc:creator>Kendall Miller</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Gibraltar]]></category>
		<category><![CDATA[IT Management]]></category>
		<category><![CDATA[product feedback]]></category>
		<category><![CDATA[Software Development Process]]></category>

		<guid isPermaLink="false">http://reliable.esymmetrix.com/?p=229</guid>
		<description><![CDATA[If you do development for Microsoft .NET, I'd encourage you to go over and download our commercial release of Gibraltar.  You'll get great documentation, a free agent you can use like a flight recorder "black box" in every application you create, and a trial for a tool that will make you seem wise beyond your years.  ]]></description>
			<content:encoded><![CDATA[<p>
If you&#8217;ve read more than one or two articles from Reliable Systems you probably have gotten the sense that we worry a lot about how to <strong>make things just work</strong>.   It&#8217;s that quality of anything where you get what you expect and what you need every time.  It can be in an experience (like a fun drive down a country road) or a product.  As a company if you can do this over and over you create a brand people develop a strong emotional connection to:  Apple, John Deere, Starbucks&#8230;</p>
<p>When you want to create a product that just works, you need to get all of the details right &#8211; from packaging through to maintenance and upkeep.  It&#8217;s not one thing that&#8217;s important, it&#8217;s all the things.  We are often engaged by senior management within a client when things aren&#8217;t working, and there are conflicting opinions on why.  Usually along the path technology is being blamed: Not enough, not the latest thing, not someone&#8217;s favorite thing, <em>not working</em>.  As we dig into the situation, rarely is the technology the dominant factor:  More often, it&#8217;s how the technology is being integrated with the people and processes that all have to work together.</p>
<p>One of the first things we have to do in these engagements is to establish the real facts on the ground:  What exactly are the problems in the system, who&#8217;s doing what with it, how many times.  It comes down to establishing metrics to make sure time and attention are paid to the parts that make the biggest difference in the outcome.  Armed with these facts in a form the business can consume it&#8217;s possible to create plans of action that deliver virtually regardless of budget.</p>
<h2>So let&#8217;s make this easier</h2>
<p>The biggest trick is then getting the facts you need on an ongoing basis, easily, and in a form that the business can consume.  For over a decade we&#8217;ve been building instrumentation right into the systems we&#8217;ve worked on.  We&#8217;ve created a variety of toolkits to make this easier over the years, refining them as technology and our experience have changed.</p>
<p>About 18 months ago we decided it was time to really invest down this path.  We believe in routinely <a title="Reliable Systems: Key Infrastructure Information to Capture" href="http://reliable.esymmetrix.com/infrastructure/monitoring/key-infrastructure-information-to-capture">capturing key computer metrics</a> along with whatever logging the application can do on its own.  We won&#8217;t do a project without using a great logging system that includes a strategy for managing runtime exceptions.   Now that we&#8217;re collecting all this data, we need to have a way of managing the raw data and turning it into valuable business data.</p>
<p>The challenge is that businesses don&#8217;t get up in the morning and say &#8220;what our customers want us to do is have great internal tools&#8221;, so you&#8217;re nearly always doing this on the cheap:  Borrowing time from development projects internally to cobble together various free or cheap solutions.  Frankly, we got tired of having to create new solutions with each client out of the margins of each project.  So, we pooled our best thinking from all of the work we&#8217;ve done (including a previous product that we did license to our clients over the past decade called CLAS) and started creating Gibraltar.</p>
<h2>Rock Solid from Initial Release</h2>
<p>With Gibraltar we wanted much more than a log system.  Of course, it had to be a log system too &#8211; and a really easy to use one that could work with each of our client applications.  More than that, it had to:</p>
<ul>
<li>Automatically capture all of the performance metrics we wanted.</li>
<li>Integrate with existing logging available on the platform, including whatever a client might already be doing (like custom in-house options).</li>
<li>Be absolutely, positively, for sure safe to run in production no matter what.   That means it can&#8217;t ever use too much disk space or disk throughput or block the application.</li>
<li>Not use more than 5% of the application&#8217;s performance.</li>
<li>Include all of the tools necessary to get data from where it was collected to the people that could get value out of it.</li>
<li>Include the ability to look at the detailed session data up to high level analysis:  What&#8217;s the error rate?  What&#8217;s it correlate to?  Are we doing better or worse in this version?</li>
</ul>
<p>From this initial sketch of everything we wanted, we&#8217;ve spent 18 months, including four beta periods (of 2-4 months each), refining the vision with real customers and real scenarios.  It was essential to us that this not be just a tool for techies but be ready for use by people with a wide range of skills.  It had to be pretty and just do what you wanted, when you wanted it to.</p>
<p>We&#8217;ve added a lot of capabilities along the way:  It can generate print-ready reports about application reliability that communicate with senior management, and you can define all kinds of custom metrics to easily track how your application is used and by whom.  We ran a number of betas to be sure that we had hit every goal listed above.  We&#8217;re happy to report that Gibraltar is in use within large deployments of custom applications, commercial applications, and small deployments right down to our corporate web site.</p>
<p>This tool isn&#8217;t for everyone &#8211; Our clients are nearly all Windows shops, and if they do any custom development it&#8217;s almost invariably in .NET, so that&#8217;s what we&#8217;ve targeted.   But, if you&#8217;re interested in easily getting real data on not just infrastructure (how well the application is running) but whether or not it <em>just works</em>, have we got an easy path for you.  You can see a quick demo video of how it works technically at <a title="See Gibraltar .NET Logging and Metrics Integration" href="http://www.gibraltarsoftware.com/See/Default.aspx" target="_self">Gibraltar Software</a>.</p>
<p>You also don&#8217;t have to take my word for it at all, you can <a title="VistaDB:  Gibraltar opens beta to new logging and reporting tool" href="http://www.vistadb.net/blog/post/2009/04/20/Gibraltar-opens-beta-to-new-logging-and-reporting-tool.aspx" target="_self">hear what one of our beta users did with it</a>, which is really a more compelling story than what we might say.</p>
<p>I think you&#8217;ll appreciate our work sweating a lot of little details, from <a title="Intellisense Driven API Design" href="http://rocksolid.gibraltarsoftware.com/development/dotnet/intellisense-driven-api-design" target="_self">the exact design of the API</a> and making sure the <a title="Good Help is Hard to Find" href="http://rocksolid.gibraltarsoftware.com/development/good-help-is-hard-to-find" target="_self">documentation was complete</a> to rewriting our own licensing system to be very IT Admin friendly.  If we didn&#8217;t get a detail right, <a title="Gibraltar Software: Contact Us" href="http://www.gibraltarsoftware.com/About/Contact.aspx" target="_blank">we want to know</a>.  And the great news is that we&#8217;ve just begun:  We&#8217;re obsessed with the little things, and you can bet we&#8217;ll keep listening and watching to make it better.  Of course, this is made a lot easier because we&#8217;re using Gibraltar to monitor itself, and a select group of our users is sending that information back to us so we can make sure it <strong>just works in the field for real people</strong>.</p>
<h2>It&#8217;s easy to start your journey</h2>
<p>If you do development for Microsoft .NET, I&#8217;d encourage you to go over and download our commercial release of Gibraltar.  You&#8217;ll get great documentation, a free agent you can use like a flight recorder &#8220;black box&#8221; in every application you create, and a trial for a tool that will make you seem wise beyond your years.  And if you pay us the ultimate honor and purchase a permanent license, I can assure you that you won&#8217;t find anyone more committed to your satisfaction than we are.<br />
</p>
]]></content:encoded>
			<wfw:commentRss>http://reliable.esymmetrix.com/development/walking-the-walk-gibraltar-moves-you-down-the-path/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Watch the Gazelles Turn</title>
		<link>http://reliable.esymmetrix.com/infrastructure/watch-the-gazelles-turn</link>
		<comments>http://reliable.esymmetrix.com/infrastructure/watch-the-gazelles-turn#comments</comments>
		<pubDate>Sat, 13 Jun 2009 02:31:38 +0000</pubDate>
		<dc:creator>Kendall Miller</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[IT Management]]></category>
		<category><![CDATA[Technology Selection]]></category>

		<guid isPermaLink="false">http://reliable.esymmetrix.com/?p=224</guid>
		<description><![CDATA[It is very tempting to be one of the herd of gazelles in technology.  Every time there&#8217;s  a sense of a shift in the wind, everyone starts to run in a new direction.  For the past year I&#8217;ve been reading about how it&#8217;s all going to be laptop computers from here on out.  In fact, [...]]]></description>
			<content:encoded><![CDATA[<p>It is very tempting to be one of the herd of gazelles in technology.  Every time there&#8217;s a sense of a shift in the wind, everyone starts to run in a new direction.  For the past year I&#8217;ve been reading about how it&#8217;s all going to be laptop computers from here on out.  In fact, not even full-fledged laptops, but netbooks &#8211; computers with small screens and small keyboards whose main distinguishing characteristic is that they&#8217;re less of a computer than anything else around.</p>
<p>If all this sounds a little off kilter from reality, perhaps a few hard numbers would help:</p>
<p>Quoting <a title="Computer World:  Do business desktop PC's have a future?" href="http://www.computerworld.com/action/article.do?command=viewArticleBasic&amp;taxonomyName=windows_and_linux_pcs&amp;articleId=9134320&amp;taxonomyId=64&amp;intsrc=kc_top" target="_blank">Computer World</a>, who asked &#8220;Do Business Desktop PCs have a future?&#8221;:</p>
<blockquote><p>While desktop PCs account for the bulk of personal computers sold to  enterprises, the gap in laptop sales to enterprises is closing. Of 168 million  PCs sold worldwide to professional organizations in 2008, about 95 million were  desktops and 73 million were laptops. That&#8217;s compared to 94.6 million desktops  and 47.3 million laptops that shipped in 2006.</p></blockquote>
<p>Now, as with any statistics there are two ways to look at these numbers:</p>
<ol>
<li>Laptops have grown tremendously in their total percentage of the market, and that growth rate has them on track to take over the world.</li>
<li>The majority of the growth in computer sales is coming in the form of laptops.</li>
</ol>
<p><img class="alignright size-full wp-image-227" title="gazelle" src="http://reliable.esymmetrix.com/wp-content/uploads/gazelle.jpg" alt="gazelle" width="170" height="115" />The gazelles are taking the first road.  And why not?  <strong>People love to assume the disruptive is true</strong>, it&#8217;s a lot more interesting.  Before you charge down that road, consider what seems likely.  There are a few problems with the first conclusion:</p>
<ul>
<li><strong>Two data points don&#8217;t make a pattern: </strong> If you follow the trend back farther, sales of desktop PCs have held up consistently, but laptop sales go up and down.  This would seem to indicate that the most likely interpretations of the data are that either the overall market is expanding (for example by people having two systems) or that this is a momentary, periodic surge in laptop purchases.</li>
<li><strong>Past large growth rarely projects forward: </strong> Just because there was a large growth in one year (either in absolute or percentage terms) doesn&#8217;t mean it will repeat at all.  It&#8217;s just as likely that next year&#8217;s pattern will be flat or even retreat.</li>
</ul>
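<p>Both readings of the quoted figures can be checked with a little arithmetic.  The sketch below uses only the numbers from the Computer World quote (treating &#8220;about 95 million&#8221; as exactly 95); the function names are our own, chosen for illustration.</p>

```python
# Units in millions of PCs sold to professional organizations,
# taken from the Computer World quote above.
sales = {
    2006: {"desktops": 94.6, "laptops": 47.3},
    2008: {"desktops": 95.0, "laptops": 73.0},  # "about 95 million" desktops
}

def laptop_share(year):
    """Laptops as a fraction of all PCs sold in the given year."""
    units = sales[year]
    return units["laptops"] / (units["desktops"] + units["laptops"])

def laptop_share_of_growth(start, end):
    """Fraction of total unit growth between two years that came from laptops."""
    laptop_growth = sales[end]["laptops"] - sales[start]["laptops"]
    desktop_growth = sales[end]["desktops"] - sales[start]["desktops"]
    return laptop_growth / (laptop_growth + desktop_growth)

print("Laptop share 2006: {:.1%}".format(laptop_share(2006)))
print("Laptop share 2008: {:.1%}".format(laptop_share(2008)))
print("Laptop share of growth: {:.1%}".format(laptop_share_of_growth(2006, 2008)))
```

<p>On these figures laptops went from roughly a third of sales to a bit over 43%, and nearly all of the added volume was laptops, while desktop volume barely moved.  That is consistent with an expanding market rather than laptops displacing desktops.</p>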
<p>So before you see the first twitch and assume it signals a migration of the whole herd, step back and think through the underlying facts.  Is this really the first sign of a monumental shift?  Or just another twitch of the needle?  Then look at your own situation.</p>
<p>Now, we have a few laptops, but we have more hard core desktops &#8211; the laptops are used for on the road presentations or working at Starbucks for fun.  Of course, we&#8217;re developers so we&#8217;re in the category of users that are always excluded from the norm.  But what&#8217;s not to love about a desktop?  For the same money they will always be faster and more capable than a laptop because they don&#8217;t have the burden of being small or extra power efficient.  Even if you buy into the idea that everything will be run through the web so computers are just glorified terminals&#8230;  Something still has to compose all of those web pages and make it all come together, and web apps can burn a surprising amount of processor and RAM locally.</p>
<p>In the end, I think we&#8217;re seeing a lot of folks buying second computers or getting additional laptops for other uses that complement their primary work computer experience.  Additionally, there are folks in emerging markets that need what laptops offer (self-contained, reliable power) more than performance but this reflects an increase in the overall market, not a shift in the existing market.</p>
]]></content:encoded>
			<wfw:commentRss>http://reliable.esymmetrix.com/infrastructure/watch-the-gazelles-turn/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Careful with that thing &#8211; it&#8217;s running Vista!</title>
		<link>http://reliable.esymmetrix.com/infrastructure/careful-with-that-thing-its-running-vista</link>
		<comments>http://reliable.esymmetrix.com/infrastructure/careful-with-that-thing-its-running-vista#comments</comments>
		<pubDate>Wed, 29 Apr 2009 06:33:36 +0000</pubDate>
		<dc:creator>Kendall Miller</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[product feedback]]></category>
		<category><![CDATA[Technology Selection]]></category>

		<guid isPermaLink="false">http://reliable.esymmetrix.com/?p=140</guid>
		<description><![CDATA[Everyone likes to be on the winning team.  We love to root for our favorite sports team, we like the car we own and the brand behind it.  So it&#8217;s no surprise that when Apple ran their I&#8217;m a Mac ads that Windows fans were in an uproar.  Now with the Laptop Hunter series the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-142" title="apple_or_microsoft" src="http://reliable.esymmetrix.com/wp-content/uploads/apple_or_microsoft.jpg" alt="apple_or_microsoft" width="180" height="119" />Everyone likes to be on the winning team.  We love to root for our favorite sports team, we like the car we own and the brand behind it.  So it&#8217;s no surprise that Windows fans were in an uproar when Apple ran their I&#8217;m a Mac ads.  Now with the Laptop Hunter series the shoe&#8217;s on the other foot.   Microsoft is making a big show that Apple computers are overly expensive just for the Apple brand.  Apple fans claim that to match a Mac, a PC has to be equipped with tons of antivirus software, a full time tech support guy, and a witch doctor on standby to keep it working.</p>
<h2>Seriously people</h2>
<p><strong>First, Apple makes some of the finest hardware you can possibly buy.</strong>  If you compare it nose to nose with hardware that&#8217;s actually built to the same standards then it really doesn&#8217;t represent a significant price premium.  Compare a Mac Pro with an equivalent Dell workstation &#8211; the cost is within 5%.  It&#8217;s amazing that Apple can afford the extra engineering of an OS with that little premium.</p>
<p><strong>Second, Vista works great. </strong>  It&#8217;s running on many more systems than Mac OS is, and with volume comes a range of new problems.  The total amount of money I&#8217;ve spent on desktop antivirus software in 10 years of administering PCs?  $0.  The total number of virus problems I&#8217;ve had? 0.  My parents managed to get into trouble with one virus and Windows XP &#8211; but installing the (free) Microsoft Defender cleared it right up never to return.</p>
<p>As with nearly all marketing, this is a battle of perception:  Apple has done a great job of defending their brand at every turn.  This is part of their corporate ethos.  Along with a few other tenets it ensures they are a much loved but niche player:</p>
<ul>
<li>Only do something you can do uniquely well.</li>
<li>Don&#8217;t extend into markets that might ask you to compromise your values.</li>
<li>Cultivate the mystique:  Don&#8217;t show what&#8217;s behind the curtains.</li>
</ul>
<p>Microsoft on the other hand has tenets that ensure they&#8217;ll be a volume player, but an unloved one:</p>
<ul>
<li>Play to win the most market share in any market you can.</li>
<li>Build your ecosystem by making it easy for others to add value to it.</li>
<li>Cultivate the engineers:  Provide overwhelming amounts of documentation and approaches.</li>
</ul>
<p>The fact is, many people don&#8217;t need a top-end piece of hardware like a Mac.  Many people want a computer that&#8217;s just a tool, not a piece of art.  To them, the nearly infinite diversity and low cost of entry of the PC are essential.</p>
<h2>I&#8217;m a People person.  I&#8217;m Good with People!</h2>
<p>The computing needs of the average corporation and the average individual are very far apart.  To companies, <strong>computers are tools</strong> &#8211; like the desk, phone, and copier.  Very important, very powerful &#8211; tools.  They aren&#8217;t there to make you feel great or enable you to create a cool video of your vacation in France.  My partner really summed it up one day when he commented that the Mac was a really <em>personal computer </em>- it worked hard to create a personal connection.  </p>
<p>Corporations, on the other hand, want slow-paced evolution, massive support for legacy applications and hardware (these guys are still running dot matrix printers off parallel ports) and controlled costs.  They also philosophically want to have all the keys to the computers &#8211; just like they do for the buildings and offices they own.  PCs are just end points on the large mesh that is the corporate IT network.  It&#8217;s very impersonal.</p>
<p>Microsoft makes a great deal of money providing businesses with the tools they <em>need </em>to have the computers work for them, and Apple makes a great deal of money creating computers that people <em>love</em>.  Either of these goals would be compromised by trying to do both.</p>
<h2>Vista Goggles</h2>
<p>Folks that have been in the Windows ecosystem a long time probably recognize that you could take the first year of press about Vista and substitute &#8220;Windows 2000&#8221; and find the same article written 8 years earlier.  Vista is a surprisingly large and tricky step forwards on a number of fronts, whereas Windows XP was a visual redress of Windows 2000.</p>
<p>Almost like an SAT test:</p>
<blockquote><p>Windows 7 is to Windows Vista as</p>
<p>Windows XP is to Windows 2000.  </p></blockquote>
<p>Like Windows XP, the story on Windows 7 is making virtually no architecture changes and instead just tuning for the long haul.  That&#8217;s a great thing, because there&#8217;s a lot that works very well with Vista, and now it&#8217;ll work even better with Windows 7.</p>
<p>The humorous thing is to read now about how people are thinking about moving to Vista once 7 ships because, well, they don&#8217;t want to move to an OS that was just released.  It&#8217;s as if Vista has been aging like a fine cheese on the shelf so the very same binary code that once was toxic is now just what the doctor ordered.  To a slight degree this is true:  Vista SP1 did address some issues that affected some people, and more importantly hardware now is dramatically faster than it was two years ago (as it will be two years from now&#8230;) so what once was aggressive is now commonplace.  The same was true of Windows 2000 when it shipped.  Requiring 64MB of RAM?  That&#8217;s just crazy talk!  Only certain video cards worked reasonably with it, and video drivers took a while to stabilize.  That sounds very familiar&#8230;</p>
<p><strong>In the end, it really comes back to Perception. </strong> Probably the biggest mistake Microsoft made was not pushing the OEMs that make the computers to build machines that could responsibly run the new operating system, and being clear that meant hardware 3D video cards and plenty of memory.  And oh yeah, stop putting aftermarket firewalls, antivirus, Google Desktop, and all kinds of other things on them that are ill optimized.  At my last company we got in the habit of routinely wiping each new Dell that came in and reinstalling the OS from the Dell restore CD &#8211; because that got rid of all the noise.  It was surprising how much better that worked.  Is that Microsoft&#8217;s fault?  Not directly, but they certainly could have found a way to encourage the ecosystem to forgo some profit for usability.  But that&#8217;s just not in their corporate DNA.</p>
<p>With any luck, the big story for Windows 7 will be that Microsoft pushes back against their channel, even being willing to risk it by leaving Windows XP out there for folks that don&#8217;t want to play by the Windows 7 rules.  It&#8217;s hard to put up barriers when you&#8217;re a legal monopoly, so find ways to use incentives to do it right instead of punishment for doing it wrong.  And keep up the ads, because perception does matter in the long run.</p>
<p>Who knows, it may push Apple to get better too.  Just once when my iPod updates itself to enhance stability and performance I&#8217;d love to know what exactly was unstable or slow&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://reliable.esymmetrix.com/infrastructure/careful-with-that-thing-its-running-vista/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Remote Access for Everyone</title>
		<link>http://reliable.esymmetrix.com/infrastructure/remote-access-for-everyone</link>
		<comments>http://reliable.esymmetrix.com/infrastructure/remote-access-for-everyone#comments</comments>
		<pubDate>Tue, 17 Jun 2008 23:48:00 +0000</pubDate>
		<dc:creator>Kendall Miller</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Mobile Users]]></category>
		<category><![CDATA[VPN]]></category>

		<guid isPermaLink="false">http://kendall.srellim.org/?p=47</guid>
		<description><![CDATA[Back in the day, corporate remote access meant modem pools that people dialed into from wherever they were.  Even then it was like watching a feature film on your iPod; you got a sense of the action but it was ultimately as frustrating as it was useful.
Things change.  Over the past eight years broadband [...]]]></description>
			<content:encoded><![CDATA[<p>Back in the day, corporate remote access meant modem pools that people dialed into from wherever they were.  Even then it was like watching a feature film on your iPod; you got a sense of the action but it was ultimately as frustrating as it was useful.</p>
<p>Things change.  Over the past eight years broadband in some form has become available in most cities across the nation.  This bandwidth has made dedicated remote access a thing of the past.  Now you can provide remote access to your employees over your Internet connection.  Traditionally, IPSec has been the technology of choice to provide a virtual private networking solution for your employees but over the past two years there&#8217;s been a new game in town &#8211; SSL VPNs.</p>
<p>If you are using IPSec for your mobile users, you owe it to them and to yourself to check out one of the SSL VPN options at your disposal.  We&#8217;ve used IPSec VPNs for network to network access reliably, but they&#8217;ve always been tough to support for mobile users.  Offhand, there isn&#8217;t any specific reason this should be true, but it is.  For mobile users, we seem to consistently run into a few problems:</p>
<ul>
<li><strong>Installation: </strong> The success rate for an average user being able to install an IPSec client and get the VPN tunnels to work, even with phone support, was around 15%.  Most of the time the user had to bring in the computer or we had to send a tech on site.</li>
<li><strong>Compatibility: </strong> Different physical network technologies &#8211; notably DSL &#8211; run into performance problems with IPSec in many configurations, requiring adjustments on the client, routers, or other things that you just can&#8217;t expect end users to understand.</li>
<li><strong>Portability: </strong> IPSec is very easy to block on a network.  In fact, it took some time for most network routers to be compatible with IPSec.  Now try to get it to work at 8 PM over a wireless network in a hotel in Buffalo.</li>
</ul>
<p>In contrast, a few years ago at the urging of <a title="Watchguard" href="http://www.watchguard.com/" target="_blank">Watchguard</a> (our resident firewall vendor) we tried out their SSL VPN product, which was basically a version of the <a title="Citrix Web Site" href="http://www.citrix.com/lang/English/home.asp" target="_blank">Citrix</a> Access Gateway SSL VPN solution running on a Watchguard hardware appliance.  Out of the box it worked &#8211; every time, and even faster than IPSec.  We had resisted the option because we preferred standards-based solutions, and this sounded like yet another proprietary security technology.  We used a demonstration appliance for a month but the feedback from our users was so strong we purchased a unit after a few weeks.  Upon reflection, there really is a good bit of sense to why it works so well:</p>
<ul>
<li><strong>SSL is Simple, IPSec is complicated: </strong>SSL runs over a single TCP/IP socket and is relatively straightforward, self-configuring, and invisible to intervening appliances.</li>
<li><strong>SSL is essential, IPSec is a threat: </strong> No one can afford to block SSL on their network without basically admitting to not having a network at all.  It&#8217;s very expensive to proxy by decrypting and re-encrypting, so few companies do it.  On the other hand, many networks view with suspicion the goal of establishing an encrypted connection out of their network, so blocking IPSec may sound like a good idea.</li>
</ul>
<p>With the SSL VPN solution we had about an 85% end-user self install rate without support, and a 100% rate of not requiring a tech to go on site.  Even better, the reviews from end users were that it was fast to connect, easy to use, and performed well.  Because it was so easy to get set up, many more users started connecting from home in the evenings or in bad weather to get work done.  The net cost? While your firewall probably offers an IPSec client for free, you can expect to pay a few thousand for a dedicated SSL VPN appliance and depending on licensing $50-$200 per concurrent additional user after the first five or so.  For a company with say 100 users that might have at most 20 concurrent users the cost is on the order of $4,000 to $6,000.</p>
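<p>That estimate is easy to rerun for your own numbers.  Here is a minimal sketch of the arithmetic; the $3,000 appliance price is a hypothetical stand-in for &#8220;a few thousand&#8221;, and the function name and its parameters are ours, not any vendor&#8217;s price list.</p>

```python
def ssl_vpn_cost(appliance_price, per_user_price, concurrent_users, included_users=5):
    """Appliance price plus licenses for concurrent users beyond those included."""
    extra_users = max(concurrent_users - included_users, 0)
    return appliance_price + extra_users * per_user_price

# Assumed for illustration: a $3,000 appliance and 20 concurrent users,
# with the first five users included in the appliance price.
low = ssl_vpn_cost(3000, 50, 20)
high = ssl_vpn_cost(3000, 200, 20)
print("Estimated cost: ${:,} to ${:,}".format(low, high))
```

<p>With those assumptions the range comes out at $3,750 to $6,000, in line with the ballpark above.  The number that matters is peak concurrent users, not total headcount, so it pays to measure that before you ask for a quote.</p>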
<h2>Making the Business Case</h2>
<p>Jumping from &#8220;free&#8221; to $6,000 may seem questionable until you look at it from the value standpoint:  A service that was expensive to set up and of questionable reliability became cheap to set up and rock solid.   In other words, this is the real cost to provide this service.  An unreliable solution isn&#8217;t a business solution.  If it&#8217;s more than your business is willing to pay, wait a little while &#8211; the cost has come down by half in the last two years, and some vendors (like Watchguard in their Fireware Pro product) are offering it alongside their free IPSec VPN option.</p>
]]></content:encoded>
			<wfw:commentRss>http://reliable.esymmetrix.com/infrastructure/remote-access-for-everyone/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>So Why are You Still Hosting?</title>
		<link>http://reliable.esymmetrix.com/infrastructure/so-why-are-you-still-hosting</link>
		<comments>http://reliable.esymmetrix.com/infrastructure/so-why-are-you-still-hosting#comments</comments>
		<pubDate>Fri, 13 Jun 2008 05:18:44 +0000</pubDate>
		<dc:creator>Kendall Miller</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[IT Management]]></category>
		<category><![CDATA[IT Operations]]></category>

		<guid isPermaLink="false">http://kendall.srellim.org/?p=46</guid>
		<description><![CDATA[Right now, the power is out at my home. That doesn&#8217;t happen often &#8211; in fact, it&#8217;s been almost two years since we lost power long enough for my UPS to shut down my home network. Normally this would be a small inconvenience, but I still host a few things for my wife out of [...]]]></description>
			<content:encoded><![CDATA[<p>Right now, the power is out at my home. That doesn&#8217;t happen often &#8211; in fact, it&#8217;s been almost two years since we lost power long enough for my UPS to shut down my home network. Normally this would be a small inconvenience, but I still host a few things for my wife out of my house which are now down. The largest of these is a fairly popular forum for an author she likes, but there are other sites as well.</p>
<p>Why am I still hosting these at home? Really there&#8217;s no reason &#8211; I&#8217;ve shifted hosting for my personal services out to other providers, and our company services are also hosted by hosting companies.  I just haven&#8217;t moved her stuff out of my house.</p>
<p>We talk with a lot of small and medium-sized businesses that are still hosting all of their own services internally for pretty much the same reasons &#8211; they originally had them in-house when they were much smaller and the market was different, and haven&#8217;t considered what it would mean to have those computers live somewhere else.  <strong>It&#8217;s time for a change.</strong></p>
<h2>Why It&#8217;s Time to Use the Cloud</h2>
<p>You should look at all of your important business services &#8211; things that your business can&#8217;t operate without &#8211; and work out a plan to no longer host those items in your facility.   As a first step, just consider what it means to <strong>provide the same applications and services, but have the computers not live within your company.</strong> The main goals for moving these services out are:</p>
<ol>
<li><strong>Business Agility: </strong>When you use a hosting company it&#8217;s easier to change capacity as your needs change, even to bring services up temporarily as a trial run and then shut them down if they don&#8217;t pan out.   This makes it easy to experiment with new software technology without the traditional problems of hosting getting in the way.</li>
<li><strong>Low Cost Reliability: </strong>If you want those services available, the cost to outfit a room to provide redundant cooling and power for a single rack of equipment is easily $50,000.   To host one rack of equipment in a basic Tier-2 data center can cost around $1,500 to $3,000 a month, which includes power and Internet.  At that rate, how quickly will you get an ROI on your facility investment?</li>
<li><strong>Improved Focus:</strong> Getting this equipment out of your shop improves your focus on the things you really need to be spending time on:  Projects for the business and end-user support.  <strong>The rest of it is overhead.</strong></li>
<li><strong>Access from Anywhere:</strong> When you set up your services so they can live in the cloud and be used from your office, it&#8217;s easy to make those same services available to employees from home and from laptops.  Not as second-class citizens but with all of the rights and privileges of being in the office.  This helps you leverage employee talent wherever it is.  It&#8217;s also easier to set up rock-solid extranet access for customers and suppliers.</li>
</ol>
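<p>The ROI question in the second point can be roughed out in a few lines. This is a sketch under stated assumptions, not a full cost model: the $1,000 a month for running your own room (power, Internet, maintenance) is a made-up placeholder, and the function name is hypothetical.</p>

```python
def months_to_break_even(build_out_cost, colo_monthly, in_house_monthly):
    """Months of avoided data-center fees needed to pay back an in-house room.

    Returns None when running the room in-house costs as much as (or more
    than) the data center each month, because the build-out never pays back.
    """
    monthly_saving = colo_monthly - in_house_monthly
    if monthly_saving > 0:
        return build_out_cost / monthly_saving
    return None


BUILD_OUT = 50_000          # from the article: redundant cooling and power for one rack
IN_HOUSE_MONTHLY = 1_000    # assumption: your own power, Internet, and upkeep

for colo_fee in (1_500, 3_000):
    months = months_to_break_even(BUILD_OUT, colo_fee, IN_HOUSE_MONTHLY)
    print(f"Colo at ${colo_fee:,}/month: break even after {months:.0f} months")
```

<p>With those placeholder numbers the room pays for itself in 25 to 100 months, which is long enough that the equipment inside it will likely be replaced first.</p>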
<p>When you start looking at each thing you provide as a service, you might also find that some of them &#8211; like Microsoft Exchange &#8211; really aren&#8217;t worth hosting yourself at all, even in a data center, and it&#8217;d ultimately be in your best interest to outsource them entirely to a hosted provider.   There are a number that can do this very effectively.  While the price may seem high compared to what it cost you to purchase your initial Exchange licenses, when you look at the real cash costs of running Exchange over two to three years, hosted providers are very cost effective.</p>
<p>Once you&#8217;ve taken the step of outsourcing an existing service entirely, you might even consider a Software as a Service offering for some of your core services (such as a hosted CRM). This is the most aggressive mode of outsourcing and does create a set of unique risks and opportunities.</p>
<h2>But I can&#8217;t See It</h2>
<p>We hear two common objections from IT administrators about moving services out of their shop, even if it&#8217;s just relocating servers into a data center.  The first is that it will make it hard for them to get upgrades when necessary because the business won&#8217;t be able to see &amp; feel the new equipment.  <strong>Out of sight, out of mind</strong> as the saying goes.  The second is that the IT administrators want to be able to do a <strong>laying on of hands</strong> on the equipment to maintain it.  There&#8217;s a comfort factor in knowing you can walk into a room and flip the power switch or move a drive or just bask in the warm glow of blinking lights.</p>
<p><strong>Here&#8217;s the good news: </strong>Both of these reasons are not only suspect in their own right, but are preventing your shop from getting to the next level in IT&#8217;s relationship with the business.</p>
<p>First, even though vendors do a good job of making server hardware look serious and fun, in the end <strong>it&#8217;s just a business appliance</strong>:  It either is good enough to deliver for the business or it isn&#8217;t.    With rare exception, there is no extra business value for it to look good, new, or cool.    If you find that you need to show the business physical servers to explain your costs, you&#8217;re missing out on the critical opportunity to establish <strong>a real partnership between business and IT</strong>.    You need to be sure you&#8217;re spending when it&#8217;s time to spend and saving when it&#8217;s time to save, and have discussions in the language the business would use for any other service it would acquire.</p>
<p>Second, if your IT administration patterns and practices require routinely touching your physical infrastructure then you need to re-examine them.    It generally means you either have equipment that is no longer up to the task or that you&#8217;re not doing enough automation of IT tasks.    If you have trouble-prone hardware, it&#8217;s time to either <strong>fix the fundamental issue or ditch the hardware</strong>.    Ironically, this type of problem is often easier in a hosted environment because it generally isn&#8217;t your problem: it&#8217;s the hosting company&#8217;s.</p>
<p>Automation is essential because humans are the most error-prone part of any standard process.   Your routine IT administration time shouldn&#8217;t be going to consistent tasks &#8211; they should be automated, leaving your time for user support and other business value-add services.    That&#8217;s right &#8211; even in your shop with your existing staff you can find more time to spend on projects instead of support events by automating recurring tasks.</p>
<h2>Some Things Still Stay</h2>
<p>There are some things that should be on site for performance reasons.   Regardless of how big your Internet connection is, <strong>you&#8217;re going to want basic file and printer sharing services to be local.</strong> Depending on the size of your site, you&#8217;ll probably also want a directory server for whatever your directory system is (e.g. Microsoft Active Directory).   Even here the central services help:  If you have a reasonable Internet connection, you can <strong>have your local file server back itself up to the data center</strong> by using one of a few distributed backup systems (such as <a title="Microsoft Data Protection Manager" href="http://www.microsoft.com/systemcenter/dpm/default.mspx">Microsoft&#8217;s Data Protection Manager</a> or a third-party option like NSI Software&#8217;s Double-Take).   This eliminates the time and attention that local disk backups require.</p>
<h2>Perhaps not Now, but Soon &#8211; and For the Rest of Your Life</h2>
<p>It may not be appropriate to move a number of your services outside yet: if you have only one business site, light external access by employees, and aren&#8217;t expecting that to change, then you can host most things yourself.    A number of the considerations still apply &#8211; but you might just use an external facility for your public web presence and for backing up your essential data for business continuity.</p>
<p>Even if you don&#8217;t do much now, you should <strong>find some opportunity to put a service outside</strong> so you and your company gain experience working with external hosting providers and stay current on their capabilities and costs.  That way, as new business requirements evolve, you&#8217;re ready to take care of them.  You&#8217;ll be in a better position to advise your company on when to move things out of the shop, and as you do you&#8217;ll discover that instead of focusing your time and talent inward on the routine operations of infrastructure, you&#8217;ll have time for those projects that really make a difference to your business.</p>
<h2>How Has the Cloud Delivered For You?</h2>
<p>Have a story about what has and hasn&#8217;t worked with hosting?  <a title="EMail Kendall Miller" href="mailto:kendall.miller@esymmetrix.com">Drop me a line</a> or post a comment to share it.</p>
]]></content:encoded>
			<wfw:commentRss>http://reliable.esymmetrix.com/infrastructure/so-why-are-you-still-hosting/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Aviate, Navigate, Communicate</title>
		<link>http://reliable.esymmetrix.com/infrastructure/monitoring/aviate-navigate-communicate</link>
		<comments>http://reliable.esymmetrix.com/infrastructure/monitoring/aviate-navigate-communicate#comments</comments>
		<pubDate>Thu, 27 Mar 2008 05:29:46 +0000</pubDate>
		<dc:creator>Kendall Miller</dc:creator>
				<category><![CDATA[Management]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Cognitive Bias]]></category>

		<guid isPermaLink="false">http://kendall.srellim.org/infrastructure/monitoring/aviate-navigate-communicate</guid>
		<description><![CDATA[If you&#8217;re involved in IT operations or even in business long enough, you&#8217;re going to experience some emergencies. During these emergencies, you&#8217;re going to have to balance several conflicting things that will demand your attention simultaneously:

Cause of the problem: What is really happening? What device is at the root of the problem (network switch died [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re involved in IT operations or even in business long enough, you&#8217;re going to experience some emergencies. During these emergencies, you&#8217;re going to have to balance several conflicting things that will demand your attention simultaneously:</p>
<ol>
<li><strong>Cause of the problem: </strong>What is really happening? What device is at the root of the problem? (network switch died because an admin configured a loop in the fabric and misconfigured the port)</li>
<li><strong>Scope of the problem: </strong>Just how bad is it? Problems usually show up in one place (users can&#8217;t access Exchange) but those symptoms often represent a larger problem (network switch died)</li>
<li><strong>Communicate with users: </strong>First, people will be coming in the door to report the problem (do you know that Exchange is down?) and will be expecting updates on what&#8217;s going on and when it&#8217;ll be resolved (I really need to tell my friend about a party tonight, when will email be back up?)</li>
</ol>
<p>Even in a shop with healthy staffing, this can be a lot to handle at once, particularly because your impulse is going to be to move between the root cause and communication. The first because it&#8217;s the real high value item &#8211; fix the problem. The last because whenever someone walks in, you&#8217;ll want to tell them what&#8217;s going on. The higher up the chain of command, the better you&#8217;ll want it to sound.</p>
<p>Whenever I&#8217;m wondering how to look at an IT Operations problem from a different perspective to gain insight, aviation is the first place I go. Think about the modern air transport system in the United States not from your usual perspective (a passenger on a plane) but from the standpoint of the people that live within it and operate it. For example, the life of a flight deck crew isn&#8217;t that different from system support in the sense that you have long periods of routine punctuated by periods of high stress activity. A classic rule taught to pilots when they&#8217;re first being trained is Aviate, Navigate, and Communicate &#8211; in that order.</p>
<ol>
<li><strong><a title="First, fly the plane article" href="http://reliable.esymmetrix.com/infrastructure/first-fly-the-plane" target="_self">First, fly the plane</a></strong>. (Be in the <em>middle</em> of the air, not the bottom)</li>
<li><strong>Figure out where you are. </strong>(Over the White House)</li>
<li><strong>Then communicate. </strong>(Sorry Tower, would you like us to land?)</li>
</ol>
<p>To make things easier on commercial planes, you have a pilot and co-pilot that divide these responsibilities by having clear designation of one being the Pilot Flying and the other (called the Pilot Not Flying or Pilot Monitoring) responsible for navigation and communication. This is practiced carefully during training with different parts of each emergency checklist assigned to either the Pilot Flying or Pilot Monitoring.</p>
<p>Now apply this back to a system problem:</p>
<ol>
<li><strong>Create Clear Roles: </strong>Have your team know who is going to take on the role of Admin Flying and Admin Monitoring. This shouldn&#8217;t always be the same &#8211; it may be based simply on rotation (who is &#8220;up&#8221;) or who gets the trouble ticket or whatever within your shop. Team members should declare their roles at the start of an incident so everyone knows who is doing what.</li>
<li><strong>Perform in Order: </strong>If you have an Admin Monitoring, it&#8217;s their role to intercept external communication while the Admin Flying is working on the problem.</li>
<li><strong>Make a Checklist: </strong>An emergency isn&#8217;t the time to be winging it. During quiet moments, talk as a team about what you would do in a hypothetical situation and work to distill out a basic checklist of things you&#8217;re going to run through. Focus on having it be the shortest list that verifies the largest set of items. When a problem shows up, use the checklist.</li>
</ol>
<h2>Problem Checklists</h2>
<p>There are a few great advantages to using a checklist for problems:</p>
<ul>
<li><strong>Reduce Solution Focus: </strong>When diagnosing a problem, the general process is to propose a theory then test it to either prove or disprove it. This creates cycles where you come up with a theory you have to believe in, and then your job is to prove yourself wrong. It turns out that people naturally tend to favor information that proves them right and discount information that&#8217;s inconsistent with that diagnosis. Checklists for diagnostics can ensure that a significant breadth of information is available at the start of this process to enable the best theories to be created quickly.</li>
<li><strong>Creates a Pace:</strong> It&#8217;s easy to get caught up in an emergency and start working at a pace that really isn&#8217;t necessary, but degrades your accuracy and effectiveness. Checklists stop the emotional cycle that reinforces the early stages of emergencies and instead create a steadily paced environment of gathering and verifying facts.</li>
<li><strong>Establish a Baseline for Improvement:</strong> One of the most important parts of any emergency, and the least frequently used effectively, is an <strong>after action review</strong>. After you&#8217;re back up and everyone has calmed down, you want to learn as much as you can from what happened. The existence of a checklist creates a baseline for systematic (as opposed to random or by chance) improvement to your team&#8217;s ability to handle future problems. This is true even if the checklist wasn&#8217;t used; the fact it wasn&#8217;t used is itself an indictment of either the checklist or the team&#8217;s training.</li>
</ul>
<p>While initially it may feel corny or even overly dramatic or bureaucratic to create checklists, there is real evidence to back up using them in environments where the downside cost (crash and death) is very steep, and if pressed to admit it most engineers will confess they have a mental checklist they use for standard problems.</p>
<h3>Plans are Useless, Planning is Priceless.</h3>
<p>Just by creating the checklists (even if they were never used) your team can get a lot of value:</p>
<ul>
<li><strong>Cooperative learning:</strong> This is a great tool for the team to learn from each other. Each admin will share their best tips and tricks from their mental checklist and be surprised that they don&#8217;t line up. Where they don&#8217;t, the discussion on which approach is better and why is <strong>gold</strong>. It&#8217;s hard to get the same result with a contrived exercise, so use this opportunity to build the checklist and maintain it as a team.</li>
<li><strong>Clarifies Automation: </strong>Creating the checklist will naturally precipitate ideas for how to automatically check, and possibly resolve, steps in the checklist itself. For example, if a step in the checklist is to verify Internet connectivity, <em>how </em>are you going to accomplish that? Instead of having an ad-hoc mechanism, can an automated mechanism be put in place so that you now can quickly check that data point without variation?</li>
<li><strong>Encourages Collaboration: </strong>If the team collaborates to create the checklist, when a problem occurs they will be more likely to collaborate on resolving the problem because they already have had the experience of working together as a team. This will tend to replace individual ego with group esprit de corps.</li>
</ul>
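<p>As one way to picture the automation point above, here is a minimal sketch of a checklist whose steps are small functions that always run the same way. The host names, ports, and function names are placeholders for illustration, not a recommendation of what your checklist should contain.</p>

```python
import socket


def can_reach(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checklist(steps):
    """Run every (name, check) step in order and return a list of (name, passed)."""
    results = []
    for name, check in steps:
        try:
            passed = bool(check())
        except Exception:
            # A check that blows up counts as a failed step, not a crashed checklist.
            passed = False
        results.append((name, passed))
    return results


if __name__ == "__main__":
    # Hypothetical hosts and ports; substitute the ones that matter in your shop.
    checklist = [
        ("Internet reachable", lambda: can_reach("example.com", 443)),
        ("Mail server reachable", lambda: can_reach("mail.example.internal", 25)),
    ]
    for name, passed in run_checklist(checklist):
        print(f"[{'OK' if passed else 'FAIL'}] {name}")
```

<p>Every step runs even when an earlier one fails, so the Admin Flying starts with the full breadth of facts rather than stopping at the first theory.</p>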
<h2>An Exercise Left to the Interested Student</h2>
<p>A friend of mine also pointed out the principle that if you have a checklist that always ends in the same action, why not automate the action in response to the checklist? In other words, if you can automate the detection steps that lead up to the action, then find a way to automate the resolution. You will often find you get here in inches: You progressively improve your monitoring so that you can find problems faster. Once this is reliable, you start just hooking up alarms to the monitoring so you don&#8217;t wait for a call from a real user or a higher level system. Once that&#8217;s working well enough, you get tired of performing the resolution manually so you write a script that takes a few arguments to perform the resolution. Now, just connect them together.</p>
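<p>The last step, connecting detection to resolution, might look something like the sketch below. It assumes your checks and your resolution script are already callable as functions; the service flag and restart function here are stand-ins invented for the example.</p>

```python
def respond(checks, resolve, alert):
    """Run detection checks; if any fail, alert and then run the scripted resolution.

    checks  -- dict mapping a check name to a zero-argument function returning True when healthy
    resolve -- function taking the list of failed check names and attempting the fix
    alert   -- function taking a message string (page, email, log line, ...)
    Returns the list of check names that failed.
    """
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        alert("Failed checks: " + ", ".join(failed))
        resolve(failed)
    return failed


if __name__ == "__main__":
    # Stand-in state and actions for illustration only; a real version would
    # query your monitoring system and call your own resolution script.
    service = {"running": False}

    def restart_service(failed):
        service["running"] = True

    checks = {"service running": lambda: service["running"]}
    print(respond(checks, restart_service, print))  # first pass: alerts, then restarts
    print(respond(checks, restart_service, print))  # second pass: healthy, nothing to do
```

<p>Keeping the alert in the loop even after the fix is automated means the team still sees how often the resolution fires, which is useful input for the after action review.</p>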
<h2>Move Forward One Step Today</h2>
<p>The best part about this is that you can get there in small steps that even the busiest team can fit into their schedule, with confidence that they will pay back in time saved in the future. With practice, it will become second nature and make it easier for your team to accommodate new processes and service requirements. In the end, isn&#8217;t that what you need to ensure your team is viewed as a vital part of your organization?</p>
]]></content:encoded>
			<wfw:commentRss>http://reliable.esymmetrix.com/infrastructure/monitoring/aviate-navigate-communicate/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Technology is not Scalable</title>
		<link>http://reliable.esymmetrix.com/development/technology-is-not-scalable</link>
		<comments>http://reliable.esymmetrix.com/development/technology-is-not-scalable#comments</comments>
		<pubDate>Tue, 25 Mar 2008 04:22:16 +0000</pubDate>
		<dc:creator>Kendall Miller</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[ASP.NET]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Technology Selection]]></category>

		<guid isPermaLink="false">http://kendall.srellim.org/development/technology-is-not-scalable</guid>
		<description><![CDATA[I was watching Start-Up Junkies the other day which is following a group of people attempting to get Earth Class Mail off the ground. On one recent episode, the main focus was converting their system from being PHP-based to Microsoft .NET.
As you watch the episode, you hear two different reasons given for this transition. The [...]]]></description>
			<content:encoded><![CDATA[<p>I was watching <a href="http://www.mojohd.com/mojoseries/startupjunkies/" title="Start-Up Junkies">Start-Up Junkies</a> the other day which is following a group of people attempting to get <a href="http://www.earthclassmail.com/" title="Earth Class Mail">Earth Class Mail</a> off the ground. On one recent episode, the main focus was converting their system from being PHP-based to Microsoft .NET.</p>
<p>As you watch the episode, you hear two different reasons given for this transition. The CEO advances the issue that they started with PHP because they needed to get something done quick, but now they need to switch to ASP.NET to make it scale. They always knew they&#8217;d have to do it, but now they&#8217;re against the wall because they have been invited to demo at a Microsoft conference. The lead engineering staff and operations manager advance a different point: We are about to be demoing in Europe in Microsoft&#8217;s booth at a major convention that&#8217;s key to our growth, we need to be using Microsoft&#8217;s technology for this to happen.</p>
<p>Of these two reasons, which do you think is the better reason for their technology choice?</p>
<ol>
<li>Convert to ASP.NET to scale up because PHP can&#8217;t scale.</li>
<li>Convert to ASP.NET because we&#8217;re getting marketing and sales assistance from Microsoft, which we&#8217;ll only get on their platform.</li>
</ol>
<p>If you picked #1, two things are likely true: First, you haven&#8217;t worked with enough technology to know that the choice between PHP and ASP.NET for scalability is pretty far down on the list of &#8220;things that control how much we can scale&#8221;. Second, you&#8217;re probably not balancing business and technology interests effectively.</p>
<p>Go and check out the technology portfolio used at the most scalable web sites in the world &#8211; say the top ten super scalable systems, systems that are going to be at least two orders of magnitude greater than anything you&#8217;re likely to create (and this isn&#8217;t a negative thing; it&#8217;s a liberating thing). Notice anything in particular? You don&#8217;t tend to see either Microsoft ASP.NET or J2EE in the web infrastructure. In fact, you tend to see a lot of&#8230; PHP.</p>
<p>There are a few key reasons that the super scalable sites like these solutions:</p>
<ol>
<li><strong>Open source means they have the source: </strong>You can bet that these sites aren&#8217;t able to use anything off the shelf. Their needs so outstrip the normal system that it isn&#8217;t reasonable that any off-the-shelf framework is going to fit their needs entirely. In fact, if it did that framework is likely seriously over-engineered for the majority of its user base.</li>
<li><strong>Licensing costs add up: </strong>If you&#8217;re a small shop, licensing costs are highly unlikely to be a significant percentage of your total cost of goods for a technology product; bandwidth, hosting, and above all people are the big numbers. If you&#8217;re Google, you don&#8217;t want to pay even $10 in licensing per server.  This is similar to a large manufacturer worrying about saving $0.05 on a bolt; small incremental costs still add up.</li>
<li><strong>Scalability is their first concern: </strong>More important than ease of development, cool debuggers, third party component libraries, or anything else. If it can&#8217;t meet their scale, it isn&#8217;t even a potential solution. Perhaps more importantly, they have to have both the experience and human belief that it will scale. If you&#8217;re one of these sites, there is no way a vendor has tested their solution at your scale &#8211; you&#8217;ll be the first. If you&#8217;re going to be the first, you want to have a simple solution that you can adjust and correct.</li>
</ol>
<p>If you&#8217;re honest, your decision matrix isn&#8217;t the same as this. It&#8217;s highly unlikely you&#8217;ll create the next MySpace, even if you are successful. While the principles of scalability are constant, the importance of scalability vs. other constraints changes. More likely, you need to base your technology choices on a mix of:</p>
<ol>
<li><strong>What resources do you have?</strong> If you already have a staff of people experienced at technology X, they are likely to produce more results in any moderate interval of time (say one to three months) with this technology than any new one. If you have a large body of existing code in technology X, this is a big accelerator to your project.</li>
<li><strong>What resources can you get? </strong>When picking a technology, <a href="http://www.codinghorror.com/blog/archives/000706.html" title="Coding Horror:  Buy the community, not the product">buy the community, not the product</a>. If you can take a number of pieces off the shelf, particularly for things you aren&#8217;t attempting to innovate (such as security, content management, grid controls, reporting..) it will accelerate your product curve. Conversely, if you can&#8217;t get great people that want to work with a technology, it really doesn&#8217;t matter how great the technology is.</li>
<li><strong>What religion is your market?</strong> Many markets have a non-rational product selection bias. For example, if you want to sell your product primarily to Macintosh users, you probably shouldn&#8217;t use ASP.NET. It isn&#8217;t that it should make a difference to how the product works for them, but as a group Macintosh users tend to put &#8220;Not Microsoft&#8221; on their evaluation lists. The same goes for Linux users. Conversely, there are several products that are defined as &#8220;just like X, but in ASP.NET!&#8221; If your market typically has technology selection criteria that aren&#8217;t based on business or practical fundamentals, it&#8217;s best to respect them; otherwise you&#8217;ll have to focus additional energy during your sales and marketing efforts to overcome what your market will perceive as a natural disadvantage.  The coolest technology, developed quickly and cheaply, is no good if your target customer won&#8217;t even invite you to the dance.</li>
</ol>
<p>Back to Earth Class Mail. Could they scale using PHP? Absolutely, others have.  Should they switch to ASP.NET? Probably &#8211; they wanted to leverage the marketing advantage of Microsoft.  I suspect if IBM was the big animal in the space they wanted and a deal could have been made it would have been WebSphere instead of ASP.NET.  Each of these technologies can scale, or not scale, depending on how they are used.</p>
]]></content:encoded>
			<wfw:commentRss>http://reliable.esymmetrix.com/development/technology-is-not-scalable/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>No drop of rain believes it&#8217;s responsible for the flood</title>
		<link>http://reliable.esymmetrix.com/development/no-drop-of-rain-believes-it-is-responsible-for-the-flood</link>
		<comments>http://reliable.esymmetrix.com/development/no-drop-of-rain-believes-it-is-responsible-for-the-flood#comments</comments>
		<pubDate>Fri, 21 Mar 2008 01:33:13 +0000</pubDate>
		<dc:creator>Kendall Miller</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Accountability]]></category>
		<category><![CDATA[Problem Management]]></category>
		<category><![CDATA[Responsibility]]></category>

		<guid isPermaLink="false">http://kendall.srellim.org/development/no-drop-of-rain-believes-it-is-responsible-for-the-flood</guid>
		<description><![CDATA[I grew up as the third son in our family. When my oldest brother was a newly minted driver, like every new driver he was a little rough. And like any younger siblings, my other brother and I were kind and gentle in our commentary about it. This led him to declaring his first driving [...]]]></description>
			<content:encoded><![CDATA[<p>I grew up as the third son in our family. When my oldest brother was a newly minted driver, like every new driver he was a little rough. And like any younger siblings, my other brother and I were kind and gentle in our commentary about it. This led him to declare his first driving rule: No comments on his driving when he was driving. He was in command, and that was it.</p>
<p>One day soon thereafter, he was backing the car out of the garage with my other brother and me in it. Now, he didn&#8217;t normally park on the right side of the garage &#8211; that was where my dad&#8217;s car went. But by whatever fluke, there the big Ford station wagon was &#8211; on the right side of the garage. When backing out, you have to start turning right away because the driveway isn&#8217;t straight. In fact, you have to start turning left, meaning the front of the car goes to the right. Normally this was not a problem since there was plenty of clearance. But when starting on the right side of the garage, on the right is&#8230; the garage. As he started backing out, my brother and I quietly sat there, watched the side of the garage come right up and <strong>*bang* *crunch*</strong> we hit it. In the &#8220;after action review&#8221; that followed, my brother exclaimed “<em>Why didn&#8217;t you tell me I was going to hit the garage?</em>” You can imagine our response – “<em>Because you had said never comment on your driving</em>.”</p>
<p>At the time, I was smug in my righteousness. We had done exactly what he&#8217;d asked, we weren&#8217;t the driver and the driver is responsible for the car, so it was a big ‘not-my-problem’.</p>
<p><strong>We were dead wrong.</strong> We were in a position to have prevented the problem, and we should have spoken up. Ever since, we&#8217;ve had a rule in our family when riding in a car: <strong>speak up</strong> if you see a problem, without fear that the driver will be upset. The potential consequences of not calling a problem to the driver’s attention are too great.</p>
<h2>How Do You Play the Blame Game?</h2>
<p>The same story often plays out in the aftermath of a technology problem. Hang around a software development team long enough and you&#8217;re bound to hear a developer complain “<em>Why didn&#8217;t QA find that defect? They should have found it before it shipped.”</em> The difference between an experienced, healthy team and an amateur team is whether the developer is just venting or actually believes they are justified.</p>
<p>We often have a strong desire to try to reduce accountability for avoiding issues to a single party:</p>
<ul>
<li>QA is responsible for finding all defects in the software before it is released.</li>
<li>IT Operations is responsible for keeping all of the servers running.</li>
<li>The receptionist is responsible for ensuring we don&#8217;t run out of coffee.</li>
</ul>
<p>Before looking at the contentious examples, look at just the last one. Say that you noticed that you pulled the next-to-last box of K-cups out of the supply cabinet. You&#8217;re not out of coffee yet &#8211; there are 24 individual servings in the box you pulled, and one more box on the shelf. In most small companies, that&#8217;s at least a day&#8217;s worth of coffee. Would you tell the receptionist that you need more coffee? Or just assume that it will be taken care of? Say you then run out of coffee two days later, and everyone has to run out to Starbucks to feed their habit. Would you feel at all responsible for not speaking up when the problem was still avoidable?</p>
<p>You probably would have spoken up &#8211; the receptionist is a nice person, it&#8217;s an easy enough thing to do and you like your coffee.</p>
<p>Now look at the other two scenarios. The only real difference between them and running out of coffee is that these two will tend to be political and possibly even contractual. While you&#8217;d likely also speak up if you saw a defect in your company&#8217;s product before it was released or if you saw that a server was just about out of disk space, you wouldn&#8217;t want to accept any accountability after the fact if things went bad.</p>
<p>Here&#8217;s the elephant in the room: Your customers don&#8217;t care who was accountable for avoiding a problem. They care that the problem happened. They pay you for something that works (and has to work according to their definition of what works means). Anything else is just internal noise. If you want to drive your business forward &#8211; and really, if you don&#8217;t, you need to look to work somewhere else &#8211; this needs to be your motivator.</p>
<h2>Formal vs. Practical Accountability</h2>
<p>What if, instead of looking at issues as someone else&#8217;s problem, you followed these two principles?</p>
<ul>
<li>If you are in a position to prevent a problem, you are accountable for preventing it.</li>
<li>If you are responsible for ensuring a problem doesn&#8217;t happen, you need to stay in a position to prevent it.</li>
</ul>
<p>This means that many different people and groups may each be 100% accountable for a problem, because the most useful way to look at accountability is based on the ability and responsibility for preventing the problem. Why the most <em>useful</em>? Because the problem <strong>happened</strong>, that&#8217;s a matter of <strong>fact</strong>. While recriminations, blame, and shame may be cathartic or fun, they aren&#8217;t <em>useful </em>because they don&#8217;t further the goals of the team or the company. Put simply, your customers don&#8217;t care who&#8217;s at fault within your organization, just that you get the seriousness of the problem and you&#8217;re making it right. When debriefing your team, the ideal outcome is that everyone in the room sees how they could have prevented the problem, and takes on that they should have prevented the problem. From that, you then work into who was in the best place to prevent it &#8211; who could have seen it first, and addressed it while it was cheapest to address. You want to have everyone walk out with a balanced perspective of how they could have prevented it and how to identify when you&#8217;re in the best spot to prevent it.</p>
<p>A natural concern with this approach, particularly if it&#8217;s new to your organization, is that after action reviews are often a game of musical chairs &#8211; while there&#8217;s a superficial impression of honesty and openness, the true goal is to not be left without a chair when the music stops. Far from a well-calculated political move, this is really an emotional and ego-driven outcome. No one likes admitting they are wrong, and with practice people get very skilled at justifying their emotional responses with pseudo-intellectual reasoning – it is called rationalization.</p>
<p>The next time you&#8217;re in this situation, try being the first party to speak up about what you could have done to avoid the problem, and make sure you communicate sincere regret that you didn&#8217;t catch it. If you are completely open in this &#8211; sticking just to what you could have done, without any backhandedness (that&#8217;s right, you can&#8217;t say &#8220;I couldn&#8217;t cover up his incompetence&#8221;; that doesn&#8217;t count) &#8211; you&#8217;ll be amazed at how quickly the mood in the room changes. Very quickly others will jump in with what they could have done. You&#8217;ve created an environment where people can speak the real fears that are on their mind without posturing.</p>
<p>Once you&#8217;ve established this environment, you need to be active in maintaining it. If someone jumps into the attack, speak up and redirect the conversation. This is true particularly if the attack isn&#8217;t directed at you. Keep listening to make sure that the conversation stays in even tones and that each party is either talking about what their area could have done or constructively helping the overall conversation.</p>
<p>Eventually, there will be a fundamentally sticky conversation about which party was in the best position to avoid the problem. At this point it&#8217;s going to come down to culture &#8211; if your culture is one that learns from mistakes, it will be a clear and short conversation. Depending on how strong the duck-and-cover instinct is in your shop, it can be very painful. In the end &#8211; speak up if your team is the one that should have the spotlight. Fear of accountability is often overstated. In practice, managers know that in the end they need people that will be accountable for what happened, and the experience can still be positive in the long term. Great managers actively hunt out people that are quick to learn from their mistakes and own them.</p>
]]></content:encoded>
			<wfw:commentRss>http://reliable.esymmetrix.com/development/no-drop-of-rain-believes-it-is-responsible-for-the-flood/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>First, Fly the Plane</title>
		<link>http://reliable.esymmetrix.com/infrastructure/first-fly-the-plane</link>
		<comments>http://reliable.esymmetrix.com/infrastructure/first-fly-the-plane#comments</comments>
		<pubDate>Mon, 17 Mar 2008 01:45:27 +0000</pubDate>
		<dc:creator>Kendall Miller</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[CFIT]]></category>
		<category><![CDATA[IT Operations]]></category>
		<category><![CDATA[Troubleshooting]]></category>
		<category><![CDATA[two person rule]]></category>

		<guid isPermaLink="false">http://kendall.srellim.org/infrastructure/first-fly-the-plane</guid>
		<description><![CDATA[I used to work with a former Navy A-6 pilot and instructor.  One of his standard techniques for helping pilots deal with emergencies was to train them to take an immediate action when they noticed the problem &#8211; an action that had no consequence but would fill the need to do something.  What he trained [...]]]></description>
			<content:encoded><![CDATA[<p>I used to work with a former Navy A-6 pilot and instructor.  One of his standard techniques for helping pilots deal with emergencies was to train them to take an immediate action when they noticed the problem &#8211; an action that had no consequence but would fill the need to do <em>something</em>.  What he trained them to do was reset the built-in timer clock as soon as they noticed the problem.  Ostensibly, this was to help them know later how long the problem had been going on, but its true purpose was to give them a single, standard action to fill the human need to do something, so that they could then take time to reflect on the problem.  Step two on the checklist was <strong>fly the plane</strong>. There have been several <a title="Wikipedia - Controlled Flight into Terrain" href="http://en.wikipedia.org/wiki/CFIT" target="_blank">CFIT</a> accidents where pilots were too busy troubleshooting a problem to avoid the ground.  The pilots forgot their first responsibility: make sure you put flying the plane in front of any other activity.</p>
<p>When doing IT Operations, there&#8217;s a lot you can learn from aviation.  I&#8217;ve seen several situations where technicians have caused much larger problems while troubleshooting small ones.  This comes from the same mindset that caused air crashes:  you become so focused on the immediate problem that you are no longer aware of your environment. The longer you work at a problem, the more likely this will happen.</p>
<p>A few team techniques you can use to help avoid this:</p>
<ul>
<li><strong><a title="Two Person Rule" href="http://reliable.esymmetrix.com/infrastructure/two-person-rule" target="_self">The Two Person Rule</a>: </strong>Have two technicians involved in the problem with one taking the immediate actions and the other taking a longer view.</li>
<li><strong>Separate Diagnostics from Remediation: </strong>Break your approach into non-invasive diagnostic activities before remediation attempts. This gives you a discrete point, before you start putting things at risk, to recheck your assumptions about dependencies and risks to other systems.</li>
<li><strong>Peer Review: </strong>Before approaching a problem, discuss your approach with two other people on your team (at the same time). If that approach isn&#8217;t successful or you need to deviate from it, reconvene the group to discuss again.</li>
</ul>
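<p>As a rough illustration of separating diagnostics from remediation, here is a minimal Python sketch. The function name <code>run_playbook</code> and its three parameters are hypothetical, not part of any real tool: every read-only check runs first, and nothing invasive happens until someone (ideally the second technician) approves based on the findings.</p>

```python
def run_playbook(diagnostics, remediation, approve):
    """Run all read-only checks, then stop for approval before changing anything.

    diagnostics: list of zero-argument callables that only observe the system
    remediation: zero-argument callable that changes the system
    approve:     callable that receives the findings and returns True or False
    All three are hypothetical stand-ins supplied by the caller.
    """
    # Step 1: gather evidence without touching the running service.
    findings = [check() for check in diagnostics]

    # Step 2: the discrete checkpoint for rechecking assumptions and risks.
    if not approve(findings):
        return findings, False

    # Step 3: only now take the invasive action.
    remediation()
    return findings, True
```

<p>The approval callback is where the peer review conversation would happen; if the approach changes, you would run the playbook again rather than improvise.</p>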
<p>In many ways this is an extension of <a title="Don't Taunt the Bear" href="http://reliable.esymmetrix.com/infrastructure/dont-taunt-the-bear" target="_self">Don&#8217;t Taunt the Bear</a>.  When working on a problem during business hours (or, if you like, non-maintenance hours) before taking anything off line, even for a moment, ask yourself:  Do I need to take this action right now?  How sure am I that it won&#8217;t have any unexpected consequences?  Is the risk I&#8217;m wrong worth the benefit of doing this right now?<br />
All of this may sound like it&#8217;s going to add time to problem resolution, and it might &#8211; however, remember that your first responsibility is to keep services flowing to your users. Most users will be unsympathetic if they lose access to their home directories because you were troubleshooting a problem with the printer in accounting and took down the same services that share files.</p>
]]></content:encoded>
			<wfw:commentRss>http://reliable.esymmetrix.com/infrastructure/first-fly-the-plane/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t Taunt the Bear</title>
		<link>http://reliable.esymmetrix.com/infrastructure/dont-taunt-the-bear</link>
		<comments>http://reliable.esymmetrix.com/infrastructure/dont-taunt-the-bear#comments</comments>
		<pubDate>Mon, 10 Mar 2008 05:45:32 +0000</pubDate>
		<dc:creator>Kendall Miller</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[cluster]]></category>
		<category><![CDATA[lockout]]></category>
		<category><![CDATA[redundancy]]></category>
		<category><![CDATA[SAN]]></category>

		<guid isPermaLink="false">http://kendall.srellim.org/infrastructure/dont-taunt-the-bear</guid>
		<description><![CDATA[When I first started at John Deere, I was working in a division that deployed systems to dealerships. Up until that point, they hadn&#8217;t done anything with hardware RAID. Dealerships are extremely cost-conscious, and while I was a huge believer in the value of hardware RAID arrays, they needed to prove their merit. At that [...]]]></description>
			<content:encoded><![CDATA[<p>When I first started at John Deere, I was working in a division that deployed systems to dealerships. Up until that point, they hadn&#8217;t done anything with hardware RAID. Dealerships are extremely cost-conscious, and while I was a huge believer in the value of hardware RAID arrays, they needed to prove their merit. At that time, HP was the preferred vendor for dealership equipment so I had gotten them to provide us a demonstration server with a hardware RAID card so I could show it off. The high point of my demo to the service staff was when I pulled a drive out of the running server while it was in the middle of running a very visible, high load process &#8211; and to everyone&#8217;s surprise it would just keep running! The first time I did the demo, it worked great &#8211; I pulled out the first drive and the server didn&#8217;t miss a beat.</p>
<p>A day later I was doing the same demo for a group of managers. The previous day&#8217;s work had been fruitful &#8211; it had gotten the attention I wanted and now a higher group wanted to discuss it. This time around, someone raised the question &#8220;so, any drive can fail and the system keeps running?&#8221; With much bravado I replied &#8220;sure! Watch!&#8221; and pulled out the <em>second </em>drive. Two seconds later to my shock the system froze and then went to a blue screen.</p>
<p>This was when I discovered that, unlike the Compaq systems I was used to, the HP system didn&#8217;t automatically rebuild by default when you reinserted the drive.</p>
<p>I took a number of lessons away from this:</p>
<ol>
<li>Don&#8217;t assume each vendor&#8217;s equipment works the same way, even if that way seems to make a lot of sense.</li>
<li>There is almost no amount of check &amp; recheck that is too much when removing redundant components.</li>
</ol>
<p>When you work with systems designed for high reliability, it&#8217;s often tempting to take advantage of the innate redundancy of the system to allow you to be somewhat more cavalier in your operational procedures. For example, if you have two web servers that are part of a load-balanced cluster, conceptually you can take one offline, reboot it and do whatever &#8211; right in the middle of the day when it&#8217;s convenient to your IT staff. On the surface, there&#8217;s nothing wrong with this &#8211; if everything operates as designed, you should be able to rip out the second server and do whatever you want without causing a problem. It&#8217;s very tempting to forget the cluster while working on the server.</p>
<p>However, it often pays to be vigilant in this circumstance. Don&#8217;t taunt the bear &#8211; just because it <em>shouldn&#8217;t</em> cause a problem, doesn&#8217;t mean it <em>won&#8217;t</em> cause a problem. For example &#8211; what if during the reboot the server comes back on line? Depending on how exactly your load balancing system works it may start getting new requests because it appears to be operational. It&#8217;s very hard to explain to your peers and the rest of the business why you went offline because you took a shortcut.</p>
<p>There is a fine line between <em>taking advantage</em> of redundancy and <em>causing </em>problems.</p>
<h2>Don&#8217;t count on Redundancy</h2>
<p>At a SaaS company I worked for we had a highly redundant SAN. Each server had two cards, they connected to two independent switches which in turn each had a connection to the two storage processors that ran the array. The whole system was designed and certified by the vendor to operate without interruption in the face of a failure of a card, switch, storage processor, etc. It also was designed to be continuously operational while having every component upgraded &#8211; the firmware of the switch, the storage processors, etc.</p>
<p>This highly redundant design opens the possibility of performing configuration changes, firmware upgrades, even component replacement during the day while business is going on &#8211; after all, it <em>should </em>work just fine. This is a good example of being tempted to taunt the bear &#8211; <strong>just because a system should be redundant and not have a problem with what you&#8217;re doing, don&#8217;t bank on that capability if you don&#8217;t have a compelling reason to do so</strong>. If you have to do it, don&#8217;t rely on automatic redundancy behavior &#8211; manually take the component offline.</p>
<p>Treat the bear with respect. If you can, schedule work for maintenance time periods so that if there is a service interruption it will have the smallest impact. If you have a good deal of experience that a particular action won&#8217;t cause a problem then you might perform it just outside of business hours instead of during maintenance time periods (which are often in the dark of night).</p>
<h2>Restoring Redundancy</h2>
<p>The rules change a little when dealing with a failure. For example, if you have a drive fail in a redundant array and get in a new drive, you have to balance the competing goals of restoring redundancy and the risk of replacing the drive. There are a number of risk elements in replacing a failed drive:</p>
<ol>
<li>You could pull the wrong drive, causing the whole array to fail.</li>
<li>The physical disconnection of the drive could cause a SCSI bus reset or some other momentary interruption of data on the array.</li>
<li>The new drive could be electrically defective and short the bus.</li>
<li>Mechanically inserting the drive could disrupt the bus or jar another drive or other physical part, causing the array to fail.</li>
</ol>
<p>So, how do you balance the desire to replace the failed drive with the risks of causing the array to fail?</p>
<ol>
<li>If the system is stable and still redundant, wait until the next scheduled maintenance period to perform corrective action. There&#8217;s no rush.</li>
<li>If it is not redundant, but operable, you need to balance risk with benefit. It is very unlikely that an independent part will fail within 24 hours of another failure, so you can almost always wait until a low activity time outside of business hours or even in the middle of the night to replace the component.</li>
<li>If the system is not stable, you have the most difficult decision. First, <strong>don&#8217;t make this on your own.</strong> Get together at least the available IT engineers and, if at all possible, a representative of the business process(es) affected by the problem. You need to balance the current instability with the probability that you will make it worse by changing the system. If it&#8217;s just a dead drive, this is pretty easy: Low risk, high benefit (however it&#8217;s unlikely you&#8217;d be in an unstable situation if this happened).</li>
</ol>
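<p>The three cases above can be written down as a small decision function. This is only a sketch in Python: the names <code>Action</code> and <code>triage</code> are made up for illustration, and the two boolean inputs are a simplification of a judgment that in practice involves people, not just flags.</p>

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the three outcomes described in the list above.
    WAIT_FOR_MAINTENANCE_WINDOW = "wait for the next scheduled maintenance period"
    REPLACE_AT_LOW_ACTIVITY_TIME = "replace at a low activity time outside business hours"
    CONVENE_GROUP_DECISION = "gather the IT engineers and a business representative"


def triage(stable: bool, redundant: bool) -> Action:
    """Map the state of the system to the next step."""
    if not stable:
        # Case 3: never decide alone when the system is unstable.
        return Action.CONVENE_GROUP_DECISION
    if redundant:
        # Case 1: stable and still redundant, so there is no rush.
        return Action.WAIT_FOR_MAINTENANCE_WINDOW
    # Case 2: operable but not redundant, so pick a quiet time.
    return Action.REPLACE_AT_LOW_ACTIVITY_TIME


if __name__ == "__main__":
    print(triage(stable=True, redundant=False).value)
```

<p>Checking stability first is deliberate: an unstable system goes to the group decision regardless of whether it is nominally still redundant.</p>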
<h2>Lockout / Tagout</h2>
<p>Clustering systems combine the ability to automatically recognize when a node is down (automatic failover) and be manually told to ignore a node (manual failover). <strong>Before performing invasive work on a node in the cluster that has been taken offline automatically, go back to the clustering system and place the node offline manually.</strong> Think of this as being the equivalent of procedures used when working with dangerous machinery &#8211; <a title="OSHA: Lockout / Tagout" href="http://www.osha.gov/SLTC/controlhazardousenergy/index.html">Lockout/Tagout</a>. Straight from our friends at OSHA:</p>
<blockquote><p>&#8220;Lockout/Tagout (LOTO)&#8221; refers to specific practices and procedures to safeguard employees from the unexpected energization or startup of machinery and equipment, or the release of hazardous energy during service or maintenance activities.</p></blockquote>
<p>This is exactly what we want to do &#8211; make sure while we&#8217;re performing actions that impair the availability of part of a reliable system we have the cluster configured so that the part can&#8217;t be accidentally used. There are two parts of this: Lock out the item so it can&#8217;t be unintentionally accessed and tag the device so that everyone knows that it&#8217;s locked out. You want to be clear on how to accomplish both for each cluster you have. The latter may take the form of just notification &#8211; an email to your support team &#8211; or a post on a central site. The point is you need a big, visible way of clearly communicating the status of the device.</p>
<p>If your clustering mechanism doesn&#8217;t have a way of doing this, or it relies on the node itself (such as Windows NLB) you should consider it always live and dangerous.</p>
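<p>To make the lock and the tag concrete, here is a minimal Python sketch. It is a toy model, not the API of any real clustering product: the <code>Cluster</code> and <code>Node</code> classes, their method names, and the <code>tags</code> list (standing in for an email or a central status page) are all assumptions for illustration. The key point is that a node only receives traffic when it is healthy <em>and</em> not manually locked out, so a node that looks healthy again mid-reboot stays out of rotation.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True      # what automatic failover would detect
    locked_out_by: str = ""   # empty string means no manual lockout


@dataclass
class Cluster:
    nodes: dict = field(default_factory=dict)
    tags: list = field(default_factory=list)  # stand-in for email or a status page

    def add(self, name):
        self.nodes[name] = Node(name)

    def lock_out(self, name, who, reason):
        """Lock: take the node out of rotation. Tag: tell everyone about it."""
        self.nodes[name].locked_out_by = who
        self.tags.append(f"{name} locked out by {who}: {reason}")

    def release(self, name, who):
        """Only the person who applied the lock may remove it."""
        node = self.nodes[name]
        if node.locked_out_by != who:
            raise PermissionError(f"{name} is not locked out by {who}")
        node.locked_out_by = ""
        self.tags.append(f"{name} released by {who}")

    def eligible(self):
        """Nodes that may receive requests: healthy and not locked out."""
        return [n.name for n in self.nodes.values()
                if n.healthy and not n.locked_out_by]
```

<p>In this model, a node that failed automatically and then reports healthy again halfway through your work is still excluded by <code>eligible()</code> until the person who locked it out releases it.</p>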
<h2>Nice Bear. Friendly Bear.</h2>
<p>If the bear is working well, let him continue doing what he&#8217;s doing. Your running system should be treated with respect at all times, because there is a great deal of complexity that goes into each of the elements and how they work together, even if it appears simple on the outside. As a person responsible for a reliable system, you need to always be thinking in the long term. You don&#8217;t want to cause an outage just to deploy an upgraded component or firmware. Almost without question, the theoretical issues fixed by the firmware update aren&#8217;t going to be as important to your customers as the real issues caused by a service interruption.</p>
]]></content:encoded>
			<wfw:commentRss>http://reliable.esymmetrix.com/infrastructure/dont-taunt-the-bear/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

