Pick Your Scale, any Scale.

Written by Kendall Miller on July 6, 2008 – 11:51 pm

Let’s say you’re starting a project to create a new software system. How big does it need to scale? Realistically, either:

  1. This new system fits into an existing business, possibly replacing a prior application, so you can predict with some accuracy the different aspects of scalability that apply to it.
  2. It doesn’t, and you can’t.

The second scenario is the most interesting one. First off, let’s face it - your new system isn’t going to be the next Facebook, MySpace, or eBay. In short, you don’t need to design your system front to back as a super-scalable system. This is good because the options at that level are time-consuming and resource-intensive.

The key question when laying out a new software system is this: to what degree does it need to scale without being rewritten? This scale is unlikely to be your “best case” business size, because scalability has an opportunity cost. The scale should be defined as specifically as is reasonable, and clearly understood and validated by both business and technical staff. That way, if your business grows beyond expectations, it won’t come as a surprise if you need to make even major changes to your system.

Creating Facts from Air

Let’s say you’re starting to develop an application that fits into the second category above. You still need to work out what your scalability target is.

To make any decision that is better than random, you have to work out some aspects of the expected scaling of the application. In the absence of real facts to extrapolate scalability from, you need to cooperate with the business side to establish presumed facts about the scalability requirements. These may sound a lot like assumptions, but they really go beyond that because they will become facts as you develop the system. As a starting point, make it clear to all involved that:

  1. If the targets are low, it should be assumed you’ll have to turn away business because the system can’t scale above them.
  2. If the targets are high, the system will cost more and take longer to create.

In most businesses, the second outcome is worse than the first. Why? Because the second is a price you pay up front, before the system goes into service. The first is based on an assumption: you might have to turn away business. You also might be able to see it coming in time and address the issue. From a business standpoint, this is a better trade-off. Finally, there are the non-technical aspects:

  1. The sooner you have a working system, the sooner the business can validate the market and start getting real data on uptake to adjust your scalability goals.
  2. Unless the product is a failure, you expect demand to eventually exceed the capacity of the system; it’s just a matter of when. If it does, then you should be able to afford rewriting all or part of the system. In other words, the funds to solve the problem should be available if you have the problem.

From this comes an axiom of scalability:

The system needs to be based on the lowest scale that will provide enough time and money to replace it with a new system.

Put another way, a system that is faster or more scalable than it needs to be for the business was more expensive and took longer to develop than necessary. Think of it like a race car: The ideal Indy Car would fall apart just after the judges validated it won without breaking the rules. Any stronger and that strength could have been put into something else. The time you spent making it more scalable than necessary could have added more features, fixed more defects, or gotten it out the door sooner.

Establish a Growth Curve

The growth curve needs to be sufficient to inform the developers of what decisions to make at each point. To get there, start by describing the scale from the business standpoint. During design of the actual system you can keep translating this into the specific requirements for speed, storage, and capacity based on the behavior of the actual system. This will prevent you from achieving technical goals that don’t satisfy the business goals.

For most systems, you want to establish the business goals for:

  1. Number of Possible Users: How many accounts will there be on the system? This is an upper bound of the number of people that could access the system if they wanted to.
  2. Number of Simultaneous Users: Number of accounts that will be accessing the system at the same time. For most applications, “at the same time” is likely best thought of as within the same 15-30 minutes.
  3. Number of Customers: For most applications delivered to businesses, some parts of the system (such as configuration and data storage) will scale based on the number of customers (i.e. businesses), not the number of accounts those customers have.
  4. Data In and Out: If the system is going to have any imports and exports that aren’t user-driven (such as EDI feeds or a public API) then the number of partners (other entities that will exchange information with you) and the frequency of exchange need to be determined.

Things to not bother with:

  1. Response Time: For customer-interactive products, response time is dictated by what end users will tolerate and is not really going to be a business decision (aside from deciding if you’re going to produce something your customers are willing to use). For non-interactive products or back-end systems this may need more discussion with the business, but again - the business is going to expect you to be able to figure out what will make it a success.
  2. Data Retention: Assume all of it has to be kept indefinitely. In the end, storage is cheap, and this design decision rarely costs a lot if made up front but is expensive to reverse. Data also has the amazing power to make heroes out of IT when the business starts posing questions later and you can answer them. Generate as many facts as you can now to help you out later.

These items are past the point of diminishing returns with the business. You should work them out within the development team and document them, but you shouldn’t believe that any business sign off you might get is binding or useful.

Build to the Scale

Once you’ve established your growth curves, pick your candidate architecture and translate the growth curves into system performance requirements.

Hypothetical Example: If you need to support 1000 simultaneous users for a web application, work out the dynamic web hits per second by determining how often an average user will request a dynamic page (say every 5 seconds, which is very fast for most dynamic applications). These two numbers give you dynamic hits per second of (1000 / 5) = 200. Then multiply by how long each page will take to calculate (make a goal of, say, 250ms) to get how many requests you need to be able to process at the same time: (200 * 0.250) = 50. This is the key scale point for your web application: When deployed, it must support 50 requests being processed in parallel. You’ll need to get to this point by either making it really scalable on a single server, or splitting the load over multiple servers.

One thing that should jump out of the math behind this is that anything you can do to make the calculation time of a single page drop pays big dividends: If you drop the average calculation time by half (125ms) then the number of requests in parallel drops by half (200*0.125) = 25. This in turn may well cut the number of servers you need in half, easing your maintenance and deployment cost. If you can’t do this, reduce the number of dynamic pages requested per second by either making more static pages (such as pre-rendering pages that change but don’t change frequently) or caching dynamic pages that have some predictable consistency (which really makes them static pages). This is often much trickier to do and test, so your best first option is to reduce the time for each page.
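
The arithmetic in the hypothetical example can be written down as a small sketch, which makes it easy to re-run when the business inputs change. The function names and the per-server capacity of 25 parallel requests are assumptions made purely for illustration; only the 1000 users, the 5 second interval, and the 250ms and 125ms page times come from the example.

```python
import math


def concurrent_requests(simultaneous_users, seconds_between_requests, seconds_per_page):
    """Return (dynamic hits per second, requests being processed in parallel)."""
    hits_per_second = simultaneous_users / seconds_between_requests
    in_parallel = hits_per_second * seconds_per_page
    return hits_per_second, in_parallel


def servers_needed(in_parallel, per_server_capacity):
    """Round up: a fraction of a server still means deploying a whole one."""
    return math.ceil(in_parallel / per_server_capacity)


# Assumption for illustration only: one server handles 25 requests in parallel.
PER_SERVER_CAPACITY = 25

for page_seconds in (0.250, 0.125):
    hits, in_parallel = concurrent_requests(1000, 5, page_seconds)
    servers = servers_needed(in_parallel, PER_SERVER_CAPACITY)
    print(page_seconds, hits, in_parallel, servers)
```

With these inputs the sketch reproduces the 50 and 25 parallel requests from the text, which under the assumed per-server capacity means two servers versus one.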

Side Point: This also highlights an easy way to accommodate guessing low on a system that’s been in service for a year or more: If you’re processor bound you can replace that hardware with current units and often pick up 30% per year it’s been since you purchased the original hardware. This won’t save you from network problems, disk storage problems, or some memory problems, but it is surprisingly handy.
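
To get a rough feel for how much headroom a hardware refresh might buy, the 30% per year figure can be turned into a quick estimate. This is only a sketch: the text doesn’t say whether the gain compounds, so both readings are shown, and the function name and its parameters are made up for this illustration. It only applies if you’re processor bound.

```python
def refresh_speedup(years_in_service, gain_per_year=0.30, compounding=True):
    """Estimated processor speedup factor from replacing old hardware.

    gain_per_year is the rule-of-thumb 30% from the text; whether it
    compounds year over year is an assumption, so both options exist.
    """
    if compounding:
        return (1 + gain_per_year) ** years_in_service
    return 1 + gain_per_year * years_in_service


for years in (1, 2, 3):
    compounded = refresh_speedup(years)
    simple = refresh_speedup(years, compounding=False)
    print(f"{years} year(s): {compounded:.2f}x compounded, {simple:.2f}x simple")
```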

As you look at each candidate architecture, look at each component and determine the critical “how much, how fast, how often” factors based on the business inputs. If you change your architecture or external interface design (the user interface or import/export capabilities) you need to re-evaluate if you’ve moved the targets as well because your design goals no longer reflect the business growth curves.

Really, to the Scale

Within your development team you will typically have two types of developers you need to watch: Those that never consider scale and those that obsessively consider scale. The former will build it however and then wait to see if there is a performance problem. The latter will try to make every system the next Amazon. Neither situation is good. Identify people’s tendencies early and work to manage them to the center. Remember that the system is only as scalable as its slowest part, and there is always a slowest part.

You can get good results by having the people that are most concerned about scalability move around on the project to different subsystems. This will tend to keep them too busy to earn the keeper of the nanosecond award on any one system (which they will do if you let them stay put and just work on one system) and will make it unlikely that more cavalier developers can hide a problem. It will also help the team learn from each other: It often isn’t worth making a specific feature as fast as possible, and it is always worth thinking about what will make a feature fast before coding it.

Finally, budget time in the development team to fix scalability issues. Regardless of how much work you put into it, once the real system is built and tested you’ll find places that are slower and less scalable than you expected. If nothing else, you need to develop an accurate model of how the system should perform in production so you can check the real world against it later. As your business grows, you need to be able to get ahead of it and understand when it is time to make the code faster, add hardware, or do something else to stay one step ahead.

Disk is Your Friend, but Beware the Network

If you’ve gone over the system from nose to tail and you’re disk bound, you’ve probably optimized that design as well as you can. Disk has gotten faster at a much slower pace than memory or processor, and being disk bound means you’re getting all the requests where they need to go in a timely manner and are able to process the inputs and outputs, so now it’s in the hands of the hardware. Unfortunately at that point there generally isn’t much more you can do: The difference in performance between server drives and the fastest drives money can buy isn’t very much.

If you’re finding that you aren’t disk bound and you aren’t processor bound then be worried. You’re either network throughput bound or you’re network latency bound. If you’re network throughput bound, you can probably fix it cost effectively with some basic engineering either in how you select what to send across the network or what you cache so you don’t need to send it across. You should try to give yourself some headroom here for growth, but faster networks can be purchased and you can generally tweak the software to mitigate this in minor updates.

Being network latency bound is a more serious issue because it often means that you are at the practical scalability limit of your application. The difference in network latency between relatively cheap hardware and the best hardware isn’t very much, and has been essentially constant for the last 10 years. You can’t buy your way out of this problem. It is also typically caused by a badly designed interface between components of the system, which will need to be substantially or entirely rethought and rebuilt, and that isn’t easy to do with a running system. If you find yourself in this situation and you aren’t sure you have met your business goals you should rethink your approach immediately. Because no amount of money on hardware can get you out of this problem, caution is the word of the day.
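
A quick back-of-the-envelope sketch shows why a chatty interface between components becomes a hard floor. The 0.5ms round trip, the 200 items, and the function name are all assumptions picked for illustration; the 250ms figure is the page calculation goal from the earlier hypothetical example.

```python
def network_floor_ms(sequential_round_trips, round_trip_ms):
    """Minimum time a page spends waiting on the network, ignoring all other work."""
    return sequential_round_trips * round_trip_ms


ROUND_TRIP_MS = 0.5   # assumed latency of one call between two components
PAGE_GOAL_MS = 250    # page calculation goal from the earlier example

chatty = network_floor_ms(200, ROUND_TRIP_MS)  # one call per item, 200 items
batched = network_floor_ms(2, ROUND_TRIP_MS)   # same data fetched in 2 bulk calls

print(f"chatty:  {chatty:.1f} ms of the {PAGE_GOAL_MS} ms goal")
print(f"batched: {batched:.1f} ms of the {PAGE_GOAL_MS} ms goal")
```

Faster processors or disks don’t change the first number at all; only redesigning the interface to make fewer round trips does.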


Posted in Management, Software Development

Technology is not Scalable

Written by Kendall Miller on March 24, 2008 – 11:22 pm

I was watching Start-Up Junkies the other day, which follows a group of people attempting to get Earth Class Mail off the ground. On one recent episode, the main focus was converting their system from PHP to Microsoft .NET.

As you watch the episode, you hear two different reasons given for this transition. The CEO argues that they started with PHP because they needed to get something done quickly, but now they need to switch to ASP.NET to make it scale. They always knew they’d have to do it, but now they’re against the wall because they have been invited to demo at a Microsoft conference. The lead engineering staff and operations manager advance a different point: we are about to demo in Europe in Microsoft’s booth at a major convention that’s key to our growth, so we need to be using Microsoft’s technology for this to happen.

Of these two reasons, which do you think is the better reason for their technology choice?

  1. Convert to ASP.NET to scale up because PHP can’t scale.
  2. Convert to ASP.NET because we’re getting marketing and sales assistance from Microsoft, which we’ll only get on their platform.

If you picked #1, two things are likely true: First, you haven’t worked with enough technology to know that the choice between PHP and ASP.NET for scalability is pretty far down on the list of “things that control how much we can scale”. Second, you’re probably not balancing business and technology interests effectively.

Go and check out the technology portfolio used at the most scalable web sites in the world - say the top ten super scalable systems, systems that are going to be at least two orders of magnitude greater than anything you’re likely to create (and this isn’t a negative thing; it’s a liberating thing). Notice anything in particular? You don’t tend to see either Microsoft ASP.NET or J2EE in the web infrastructure. In fact, you tend to see a lot of… PHP.

There are a few key reasons that the super scalable sites like these solutions:

  1. Open source means they have the source: You can bet that these sites aren’t able to use anything off the shelf. Their needs so outstrip the normal system that it isn’t reasonable that any off-the-shelf framework is going to fit their needs entirely. In fact, if it did that framework is likely seriously over-engineered for the majority of its user base.
  2. Licensing costs add up: If you’re a small shop, licensing costs are highly unlikely to be a significant percentage of your total cost of goods for a technology product; bandwidth, hosting, and above all people are the big numbers. If you’re Google, you don’t want to pay even $10 in licensing per server. This is similar to a large manufacturer worrying about saving $.05 on a bolt; small incremental costs still add up.
  3. Scalability is their first concern: More important than ease of development, cool debuggers, third party component libraries, or anything else. If it can’t meet their scale, it isn’t even a potential solution. Perhaps more importantly, they have to have both the experience and human belief that it will scale. If you’re one of these sites, there is no way a vendor has tested their solution at your scale - you’ll be the first. If you’re going to be the first, you want to have a simple solution that you can adjust and correct.

If you’re honest, your decision matrix isn’t the same as this. It’s highly unlikely you’ll create the next MySpace, even if you are successful. While the principles of scalability are constant, the importance of scalability vs. other constraints changes. More likely, you need to base your technology choices on a mix of:

  1. What resources do you have? If you already have a staff of people experienced at technology X, they are likely to produce more results in any moderate interval of time (say one to three months) with this technology than any new one. If you have a large body of existing code in technology X, this is a big accelerator to your project.
  2. What resources can you get? When picking a technology, buy the community, not the product. If you can take a number of pieces off the shelf, particularly for things you aren’t attempting to innovate (such as security, content management, grid controls, reporting, etc.) it will accelerate your product curve. Conversely, if you can’t get great people that want to work with a technology, it really doesn’t matter how great the technology is.
  3. What religion is your market? Many markets have a non-rational product selection bias. For example, if you want to sell your product primarily to Macintosh users, you probably shouldn’t use ASP.NET. It isn’t that it should make a difference to how the product works for them, but as a group Macintosh users tend to put “Not Microsoft” on their evaluation lists. The same goes for Linux users. Conversely, there are several products that are defined as “just like X, but in ASP.NET!” If your market typically has a technology selection criterion that isn’t based on business or practical fundamentals, it’s best to respect it; otherwise you’ll have to focus additional energy during your sales and marketing efforts to overcome what your market will perceive as a natural disadvantage. The coolest technology, developed quickly and cheaply, is no good if your target customer won’t even invite you to the dance.

Back to Earth Class Mail. Could they scale using PHP? Absolutely, others have. Should they switch to ASP.NET? Probably - they wanted to leverage the marketing advantage of Microsoft. I suspect that if IBM were the big animal in the space they wanted and a deal could have been made, it would have been WebSphere instead of ASP.NET. Each of these technologies can scale, or not scale, depending on how they are used.


Posted in Infrastructure, Software Development