Going with EC2/S3

December 8, 2007 – 18:14

There are a lot of crucial decisions to make when building a start up from scratch. One key call is your hardware strategy.

Matt Kenison, Mindsite’s first employee and our development architect, wrote a report earlier this summer on the hardware setup we should use and why. Taking into consideration some of our business constraints, such as optimizing for scalability and cost, Matt thought that the EC2/S3 system was the best fit. We can’t speak to Amazon’s performance as a production partner just yet, but we’ve been incredibly happy so far. If you’re looking for this kind of information, we think Matt’s report will give you valuable insight into some of your options.

- - - - -

Mindsite Production Architecture

Last modified: 17 July 2007

This document compares architecture choices for hosting the Mindsite production website through the first two years.

Architecture choices

The hosting options we are comparing are:

  • Virtual Private Server (VPS): each server instance runs in a virtual machine on a server shared by other VMs in a data center owned by a hosting company. This means we get root on the VPS, but can’t control the hardware or network.
  • Dedicated server: a physical machine rented from a hosting company. This is a step up from the VPS in that we get access to the whole machine (providing better performance), but can’t upgrade it or control the network.
  • Amazon Elastic Compute Cloud (EC2): Amazon provides an API for creating and tearing down VPSes, charging per hour per instance. EC2 integrates with their S3 storage system.
  • Purchasing hardware: Purchasing our hardware to run at a colo is too expensive and time-consuming for us to consider it a viable option at this point.

Managed hosting needs to either be fully automated or have 24x7 support.

Criteria

The main goals of the production site are scalability, reliability, performance and price. Scalability is the ability to add more hardware as load increases (e.g., more web servers to handle traffic or more database slaves to handle read queries). The one exception is the master database, of which we will have only one for the near future. Reliability is the ability of the hosting environment to detect and route around failed hardware; if a database slave dies, web servers should stop sending it queries. In short, redundancy should be built in everywhere, so that individual parts can fail without bringing down the whole application. Performance is affected by a number of factors: CPU, memory, hard disk, network speed, network latency, other users (in the case of a VPS), etc.

In general, a scalable architecture mitigates performance problems since we can add more hardware, but the hardware may not be customized to suit the specific bottleneck causing the problem. Because network latency between components affects performance, all parts of the system will need to be hosted with the same provider. Price is the total cost of ownership, including hosting fees, bandwidth, and any additional requirements imposed by the hosting choice.
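To make the reliability criterion concrete, here is a minimal sketch of how a web server might stop sending read queries to a dead database slave. The class name, the replica names, and the idea that some external health check calls `mark_down` are all assumptions for illustration, not part of the report.

```python
import random


class ReplicaPool:
    """Tracks which database slaves are healthy and picks one for reads."""

    def __init__(self, replicas):
        # All replicas start out healthy; the names are hypothetical.
        self.healthy = {name: True for name in replicas}

    def mark_down(self, name):
        # Called by whatever health check notices the failure.
        self.healthy[name] = False

    def mark_up(self, name):
        self.healthy[name] = True

    def pick(self, rng=random):
        """Return a healthy replica, or None if every replica is down."""
        candidates = [n for n, ok in self.healthy.items() if ok]
        if not candidates:
            return None  # the caller could fall back to the master here
        return rng.choice(candidates)


pool = ReplicaPool(["slave-a", "slave-b"])
pool.mark_down("slave-a")
print(pool.pick())  # only slave-b is eligible while slave-a is marked down
```

How failures are detected (timeouts, heartbeats, a monitoring server) is a separate design choice that this sketch leaves open.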

Hosting environment

It is possible and easy to run a small-scale website off of one or two servers, if you don’t worry about reliability (if either server goes down, the whole site is dead – it’s easy to just restart it, though). While we may start with this, pricing will be generated by assuming we will use the following setup in the future:

  • 2 front-end load balancers
  • 3 web servers
  • 3 cache servers
  • 2 slave databases
  • 2 master databases (1 primary, 1 replicating secondary)
  • 1 monitoring server / batch processor
  • 2 extra servers for testing and staging

We’re planning for this now so we don’t spend a lot of time migrating to a new architecture every time we increase capacity. Overall, I am assuming a highly redundant setup, but not at the level of HA. A few failed requests are acceptable, provided they were actually being handled by the server when it failed.

I think two servers and one slave would be sufficient for base load, but the additional hardware is required to handle spikes in traffic gracefully.

Dedicated

Dedicated hosting is essentially renting out a physical box at a data center. The data center will manage the rack on which our boxes are located and the network/internet connection they use. Dedicated hosting easily provides the best performance, since boxes can be customized for their purpose (e.g., extra RAM in the caching server, faster hard drives on the database, etc.), and placed next to each other to reduce network latency.

Since the hosting company manages our machines, any changes to the architecture would need to be planned a few weeks to a month in advance. Additionally, we would be unable to respond to traffic spikes without prior planning – this means that most of the time we would have unused machines. We may be able to alleviate some of the cost by using these machines for testing, but that would add some overhead time to switch them into production when they’re needed. It would also screw up the testing schedule to have the machines unavailable.

Dedicated machines provide more power per instance, which really becomes useful as we grow: instead of having to manage 20 VMs, we would have far fewer dedicated boxes, perhaps 4 or 5. Since we need redundancy early on, however, we have a minimum requirement of n+1 boxes, where n is the number needed to satisfy our load (the +1 is so that a failed box does not push us past our capacity limit). The extra power also shows up in the ability to customize machines: instead of 3 cache servers, we can have 2 with more memory, or possibly even put the caches on the web servers to start. The downside is that this requires more environment configuration, which means less development time.
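The n+1 rule can be written down as a small calculation. This is only a sketch: the function name and the load and capacity figures are made up for illustration, and real capacity per box would have to be measured.

```python
import math


def boxes_required(peak_load, capacity_per_box, spares=1):
    """Boxes needed to serve peak_load (n), plus spare boxes for failures."""
    if capacity_per_box <= 0:
        raise ValueError("capacity_per_box must be positive")
    n = math.ceil(peak_load / capacity_per_box)
    # Always keep at least one serving box, even at zero load.
    return max(n, 1) + spares


# Hypothetical numbers: 900 requests/s at peak, 400 requests/s per box.
print(boxes_required(900, 400))  # 3 boxes for the load + 1 spare = 4
```

With more powerful boxes, n shrinks but the spare becomes a larger share of the total, which is one reason redundancy is relatively expensive at small scale.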

Overall, dedicated hosting is the “safest” bet: it gives us the advantage of controlling our hardware without the overhead of purchasing it and configuring it for a colo. It’s also the most expensive and least flexible option up front, since we would pay for a lot more than we actually use, especially at the beginning. In the long run, though, the ability to deploy customized hardware makes the overall cost of dedicated boxes much cheaper.

VPS

Virtual private servers tend to be run on physical machines shared with other VPSes, so performance is lower, but the hosting is cheaper. For the most part, we can expand by adding more VPS instances up to the point where it becomes cost-ineffective. The main benefit of VPS over dedicated hosting is that VPS instances are generally quicker to provision: dedicated boxes require new hardware, but VPSes can run wherever a system is underutilized.

In general, unmanaged VPS instances can be used in the same manner as an equivalent dedicated box, but the overall throughput and capacity will be more limited. We may be able to mix VPSes with dedicated hardware (this is not provided by Rackspace). This would alleviate some scalability problems for the master database, since we can run it on more powerful hardware.

A VPS solution would still require leasing load balancers, and depending on the company may require us to roll our own in software as VPS instances.

Overall, the VPS solution may be good to start out, but I think unless the company is really helpful in managing our environment, we would be better off getting dedicated hosting. VPS instances are not drastically cheaper than dedicated machines, but offer less customization per box than dedicated hosting and less flexibility than EC2.

EC2

EC2 provides dynamic computing instances, charging per hour for each running instance. Each instance is a VM providing the equivalent of a 1.7 GHz x86 processor, 1.75 GB of RAM, 160 GB of local disk, and 250 Mb/s of network bandwidth [NOTE: additional options are now available]. Internal bandwidth and bandwidth from EC2 to S3 are free. The benefits are mainly in its flexibility: we can provision new instances as we need them. This makes responding to traffic spikes almost free, and won’t require up-front planning to ensure the hardware is in place beforehand.
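A rough way to see why per-hour billing makes spikes cheap is to count instance-hours under two policies: resizing every hour versus provisioning for the peak all day. The traffic profile and per-instance capacity below are invented for illustration; multiplying either total by the provider's hourly rate (not assumed here) gives a cost.

```python
import math


def instance_hours_elastic(hourly_load, capacity_per_instance, spares=1):
    """Instance-hours if the fleet is resized every hour to match load."""
    return sum(math.ceil(load / capacity_per_instance) + spares
               for load in hourly_load)


def instance_hours_fixed(hourly_load, capacity_per_instance, spares=1):
    """Instance-hours if we provision for the peak hour around the clock."""
    peak = math.ceil(max(hourly_load) / capacity_per_instance) + spares
    return peak * len(hourly_load)


# Hypothetical day: 20 quiet hours at 100 req/s and 4 busy hours at
# 900 req/s, with each instance assumed to handle 300 req/s.
day = [100] * 20 + [900] * 4
print(instance_hours_elastic(day, 300))  # 20*2 + 4*4 = 56
print(instance_hours_fixed(day, 300))    # 4 * 24 = 96
```

The gap between the two totals is the capacity a fixed setup pays for but leaves idle; it widens as traffic gets spikier.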

There are a few negative aspects of running on EC2. The minor ones are that it is still in beta, doesn’t have an SLA, and has no real competitors yet. A bigger concern is the lack of persistent storage on each instance. If an instance crashes, we will lose everything stored on its disk; in practice this only matters for the master database. We can eliminate a lot of risk by using master-master replication, but we may still lose a few records if the primary crashes before synchronization. (With VPS or dedicated hosting, that data would only be unavailable until the primary master is back up.) The lost records can be caught in software with some extra planning. Another concern, also about the master database, is that the equivalent hardware of an EC2 instance is pretty low for a highly-trafficked database. We will need to start clustering the data very early in comparison to other solutions; with a large enough site we would have to do this anyway, but not for a long time.

The biggest consequence of using EC2 is that it does not provide static IP addresses. Amazon is supposedly working on this feature, but doesn’t have a release date. If we register a load balancer’s public name as the CNAME for our domain, and the load balancer fails, we can update the DNS entry, but clients will not refresh it until the TTL expires (ideally), or until their own cache expires the entry. The second scenario is the problem: we cannot control how long clients will cache the bad DNS entry.

Round-robin DNS against multiple load balancers would reduce the number of users affected, but would not eliminate the problem. We could have the DNS entry point to a hosted reverse proxy that connects only to good instances; this adds the cost of the reverse proxies, plus a few milliseconds of latency to every page request. Lastly, instead of reverse proxies we can simply send redirects to the hostname of a good load balancer (www0.mindsite.com, www1.mindsite.com, etc.). This would be visible to the client and looks weird, but it would be temporary, and a load balancer failing is a low-probability occurrence; it also adds the cost of running the web servers that issue the redirects.
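The redirect option could look something like the sketch below: a small, stably hosted front end that answers every request with a redirect to the first load balancer believed to be healthy. The function name and the way the healthy set is maintained are assumptions; only the www0/www1 hostnames come from the text above.

```python
def redirect_response(balancers, healthy, path="/"):
    """Build a minimal HTTP 302 pointing at the first healthy balancer.

    balancers: ordered hostnames, e.g. www0.mindsite.com, www1.mindsite.com
    healthy:   set of hostnames currently believed to be up
    """
    for host in balancers:
        if host in healthy:
            return (
                "HTTP/1.1 302 Found\r\n"
                f"Location: http://{host}{path}\r\n"
                "Content-Length: 0\r\n"
                "\r\n"
            )
    # Nothing healthy to redirect to: report the outage instead.
    return "HTTP/1.1 503 Service Unavailable\r\nContent-Length: 0\r\n\r\n"


hosts = ["www0.mindsite.com", "www1.mindsite.com"]
# Pretend a health check has found www0 down and www1 up.
print(redirect_response(hosts, {"www1.mindsite.com"}, "/about"))
```

A temporary (302) redirect fits here because the client should return to the normal hostname once the failed load balancer has been replaced.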

