We deliver 99.9% connectivity
rather than the 97% delivered by others
Many hosters will proudly tout their Cisco-based BGP, BGP2, or
BGP4 routing for enhanced Internet connectivity reliability. In fact, a properly
managed BGP service will give a server site connectivity reliability that exceeds
99.99%. But the problem with BGP is that there is always only one pathway to
a web site at any given time. If anything along the multi-thousand-mile route
"breaks," there is nothing that either the server or the client can
do to get around the problem. BGP4 is 99.99% "reliable," but it is
only 97% "accessible."
The Internet breaks, regularly!
When you make a connection to a web site, chances are that you
will pass through 20 to 30 routers. You will frequently also pass through thousands
of miles of optical fiber, with all of its signal amplifiers and segments that
can be assaulted by backhoes. Finally, there are times when pieces of the Internet
get congested, and some or all of your data just never gets through. The Internet,
overall, does a pretty good job of routing around these problems. The BGP routing
system generally finds breakage and posts alternate routes over time. But until
it does, the affected users still cannot access the web site. While there
will be many routes that are never broken or almost never broken, in our experience
about 3 percent of the time someone tries to make a connection to a particular
web site, they will be unable to connect or stay connected for the entire session
due to some form of route failure or congestion.
BGP fixes last mile breakage,
but it does not fix intermediate breakage
BGP as used by hosting service providers delivers
two things to the provider. The first is rapid cutover to alternate carriers
if the last mile to an individual Internet carrier is cut. The second
is load balancing to make more efficient use of bandwidth. BGP is extremely
useful for a hoster because it almost instantly restores local service for the
provider if one of its carriers loses local Internet connectivity (though
the change may take some time to propagate so that all users can actually
connect). But while BGP rapidly restores service after a failure in the
provider's own local environment, neither the service provider nor the client at
the other end can do anything to get around breakage in the middle because neither
knows where the problem is. Both must wait for the Internet to self-heal through
the advertising of a new route.
MultiPathing fixes all but first mile breakage
All modern browsers have the ability to go to a
named website via multiple paths if the web site advertises multiple paths.
Browsers arbitrarily choose one, try it, and then try each item in the list
sequentially if there is no response from the prior route in about 30 seconds.
Browsers also do the same thing if a then-current connection is interrupted
during a session.
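The try-the-next-address behavior described above can be sketched in a few lines of Python. This is only an illustration, not how any particular browser is implemented. It assumes the alternate paths are published as a plain list of IP addresses, and the function name connect_multipath, its connect parameter, and the addresses in the usage note are hypothetical names chosen for this sketch. The 30-second timeout is the figure quoted above.

```python
import socket


def connect_multipath(addresses, port, timeout=30.0,
                      connect=socket.create_connection):
    """Try each advertised address in turn and return the first that answers.

    addresses: IP addresses that all lead to the same site, ideally each
               reached through a different carrier (assumed input).
    timeout:   seconds to wait on one path before moving to the next.
    connect:   callable that opens the connection; replaceable for testing.
    Returns a (address, connection) tuple for the first path that works.
    """
    failures = []
    for address in addresses:
        try:
            connection = connect((address, port), timeout=timeout)
            return address, connection
        except OSError as exc:
            # Timeouts and refused or unreachable routes all land here;
            # record the failure and fall through to the next path.
            failures.append((address, str(exc)))
    raise ConnectionError(f"all {len(addresses)} paths failed: {failures}")
```

A caller would pass the list of addresses the site publishes, for example connect_multipath(["192.0.2.10", "198.51.100.10"], 80), where both addresses are documentation placeholders. A connection fails only when every listed path fails, which is the property the rest of this page relies on.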
When these alternate destinations have different
Internet carriers (UUnet, AT&T, and Sprint are examples of carriers), the routes normally
diverge from each other within a few router hops from the user's browser. Thus,
the only common point of failure is the first couple of hops, which is normally
the responsibility of the user's ISP. After divergence, there must be simultaneous
failure on all routes for connection to be impossible.
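The effect of requiring simultaneous failure can be put into rough numbers. The sketch below assumes that each diverged route fails independently and uses the 3 percent per-connection failure figure quoted earlier on this page. It ignores the shared first hops at the user's ISP, and the function name multipath_accessibility is a hypothetical name for this illustration, so treat the result as an estimate rather than a measurement.

```python
def multipath_accessibility(route_failure_rates):
    """Chance that at least one route works.

    route_failure_rates: one failure probability per route (0.03 means the
    route is unusable 3 percent of the time). Failures are assumed to be
    independent, which real networks only approximate.
    """
    all_routes_fail = 1.0
    for failure_rate in route_failure_rates:
        # Every route must be down at the same moment for access to fail.
        all_routes_fail *= failure_rate
    return 1.0 - all_routes_fail


single_path = multipath_accessibility([0.03])
two_paths = multipath_accessibility([0.03, 0.03])
print(f"one route:  {single_path:.2%}")
print(f"two routes: {two_paths:.2%}")
```

Under these assumptions a single route is accessible 97 percent of the time, while two independent routes are both down only 0.03 x 0.03 = 0.09 percent of the time, for roughly 99.91 percent accessibility.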
It is this early divergence of routes that makes
the MultiPathing approach far more "accessible" than BGP, because
MultiPathing self-heals in about 30 seconds, while BGP needs a number of minutes
to self-heal, if it can find the problem in the first place.
The Interstate highway analogy
The best way to understand the difference between
BGP and MultiPathing is to think of traveling on the Interstate from Washington
DC to Los Angeles.
In BGP, at any given time, the driver might only
be permitted to drive via I-70/I-15. If there is an accident in Columbus, they
will never get to the destination, because there is only one path, and divergence
is not permitted until the Internet advertises a new satisfactory route.
Conversely, in MultiPath, the driver has at least
two routes, perhaps I-70/I-15 and I-66/I-81/I-10. If they cannot get through
on the first, they get to try the second route a few seconds later. All
they need to be able to do to get through is to get to the Beltway where both
start.
The mirrored variant of MultiPathing
If someone in Washington DC has a choice between going to Miami
or going to Boston for their data, they intrinsically have MultiPath capabilities,
because most of the route is not in common, and the endpoints are not in common.
Mirrored solutions are the most reliable for most web sites because they have
nothing in common with each other, and thus extremely few common points of failure.
This is the best solution for static content because the data being delivered
changes only occasionally.
True MultiPathing for dynamic systems
When web sites are actually collecting data from customers and
prospects, rather than just giving the customers data, the data ultimately needs
to go back to one logical system. If you try to collect it on two or more servers
simultaneously, you end up with two data sets, and possibly the problem of someone starting
on one server and being unable to complete on another (due to Internet path
failure). Logically, therefore, most dynamic databases need to be on a single
server.
For such situations, we offer true MultiPathing: multiple routes
on multiple carriers, all pointing at the same machine, to deliver connectivity
that is as reliable as the machine itself ... something in excess of 99.9%.
Mirroring with MultiPathing
squares the reliability
Our Ultra product line squares reliability by delivering mirrored
machines, each of which is MultiPathed. Thus even with the loss of a server,
traffic remains highly reliable. Similarly, delivery remains highly reliable
even if the Internet is having severe problems. Ultra costs a good deal more
than our other products, but if loss of business is a concern, it is a wise
investment.
Why 3 percent failure matters to you
Three percent matters to you because use of the
Internet is so inexpensive relative to the value it produces. EasyCo, for instance,
will deliver 20,000 typical web pages for just one dollar. Because costs are
so low, a typical e-tailer will have standard web hosting bills that amount
to about 1/10 of a percent of sales.
The costs of doing business are not in the hosting
service, but in the costs of people to prepare the materials, prepare the programs
that deliver it, process the orders, and keep the computer equipment that manages
it all alive. Losing 3 percent of sales in the face of these high and generally
fixed costs has a major impact on the bottom line. Reliability matters not
because of the intrinsic waste of bits that never get delivered, but because
of the massive economic loss that is a consequence of non-delivery.
The business question is: do you want to waste 3%
of your sales potential to save a fraction of 1/10 of a percent in relative costs?
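To make the comparison concrete, the short sketch below works the arithmetic for a hypothetical e-tailer. The sales figure of $1,000,000 is an assumption picked only for illustration. The 3 percent loss rate and the hosting bill of 1/10 of a percent of sales are the figures quoted above, and the function name hosting_tradeoff is hypothetical.

```python
def hosting_tradeoff(annual_sales, lost_sales_rate=0.03,
                     hosting_cost_rate=0.001):
    """Compare sales lost to failed connections with the hosting bill.

    lost_sales_rate:   share of connection attempts that fail (3 percent).
    hosting_cost_rate: hosting bill as a share of sales (1/10 of a percent).
    Returns (lost_sales, hosting_bill, lost_sales / hosting_bill).
    """
    lost_sales = annual_sales * lost_sales_rate
    hosting_bill = annual_sales * hosting_cost_rate
    return lost_sales, hosting_bill, lost_sales / hosting_bill


# Hypothetical e-tailer with $1,000,000 in annual sales.
lost, bill, ratio = hosting_tradeoff(1_000_000)
print(f"sales at risk: ${lost:,.0f}")
print(f"hosting bill:  ${bill:,.0f}")
print(f"ratio:         {ratio:.0f}x")
```

On these assumptions the sales at risk are about $30,000 against a hosting bill of about $1,000, so even doubling the hosting bill to remove the failures would cost a small fraction of what the failures lose.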
Why don't more providers offer MultiPathing?
To be honest, we only offer MultiPathing because
our business started in database hosting, where reliable connectivity is absolutely
needed for reliable service. In database hosting, people must remain connected
all day long, or even (in some cases) 24x7. We could not afford to have a company
with 100 employees totally out of business for 20 to 30 minutes at a time due
to a BGP route failure, nor the associated tech-support costs. MultiPathing
gives us the ability to transparently reconnect on a line failure, with "real"
downtime of a virtually unnoticeable 30 seconds.
This said, MultiPathing is not a simple engineering
project because there are some complex interrelationship and configuration issues.
Most hosters therefore buy the off-the-shelf stuff, and presume that it is the
best possible. In fact, many professionals don't understand the problem. They
ask the question "Can I connect reliably to the Internet?" rather
than asking the more important question: "Can someone on the Internet connect
reliably to me?" They design for what is in their control, rather than
designing for what is not in the control of the remote user.