March 13, 2019 CLOUD COMPUTING, CLOUD SECURITY

Cloud Security – Why We Chose A Mixed Solution & How Much It Has Cost Us

If you refer to my previous post, you’ll know that we were considering whether to build a private cloud solution or move everything to the public cloud. After careful evaluation, we ended up adopting a mixed (hybrid) solution, where most of the VMs are in our private cloud, but those directly exposed to the internet are in the public one. The reasons for this decision should be clear as well. We chose to put the public-facing servers in the public cloud because it’s a lot easier to react to DDoS attacks there. First of all, our own infrastructure, no matter how big and mighty, could never have the bandwidth, resiliency and scalability that a public provider can offer. Second, if an attack does occur, I’d rather that it affects their routers and not mine (shh, don’t tell them I said that).

Now that we’re well into our third year in this particular configuration, I’m able to generate some actual numbers to verify whether our choices were correct. I can go back to 2015 to check costs, compare scenarios and _what if’s_, and figure out whether this was the most cost-effective and efficient choice.

And indeed, it turns out that it was.

All the servers we’ve been running on AWS for 3 years have so far cost us an average of $1,000/year each. That may not sound like much, but we must also consider that on AWS we’re running small-footprint machines with very small disks and high bandwidth. A single server in our private infrastructure (with 48 CPUs and 1TB of RAM) would’ve cost us merely $5,000, and it could run at least 20 of those VMs. At AWS rates, those same 20 VMs come to about $20,000/year. Thus, comparatively speaking, the public cloud is _not_ cheaper than a private one.
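To make the comparison concrete, here is a rough sketch of the arithmetic using only the figures quoted above. The 3-year window is the period discussed in this post, and the variable names are mine. Note that the private-side number covers the server hardware only; rack, hosting and connectivity are separate costs and are not included here.

```python
# Figures quoted in the post; names are illustrative.
AWS_COST_PER_VM_PER_YEAR = 1_000   # USD, average per AWS server
PRIVATE_SERVER_PRICE = 5_000       # USD, one-time (48 CPUs, 1TB of RAM)
VMS_PER_PRIVATE_SERVER = 20        # "at least 20" small VMs per server
YEARS = 3                          # period covered by the post


def aws_cost(vm_count: int, years: int) -> int:
    """Total AWS spend for vm_count VMs at the average yearly rate."""
    return vm_count * AWS_COST_PER_VM_PER_YEAR * years


# Same 20 VMs on AWS over 3 years
aws_total = aws_cost(VMS_PER_PRIVATE_SERVER, YEARS)

# One-time hardware cost per VM slot on the private server (hardware only)
hardware_per_vm = PRIVATE_SERVER_PRICE / VMS_PER_PRIVATE_SERVER

print(f"AWS, {VMS_PER_PRIVATE_SERVER} VMs over {YEARS} years: ${aws_total:,}")
print(f"Private server hardware per VM: ${hardware_per_vm:,.0f}")
```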

In our private cloud, we spent about $100,000 to build a single 42RU rack. Remember that the firewalls are ours, and we’re running 4 of them in each rack. The remaining 38 rack units are used for switches, CPU servers (compute nodes), disk servers and controllers.

One aspect that certainly reduced costs is the use of OpenStack. VMware would’ve been very expensive to adopt, and that’s a factor that must be accounted for if you’re running your own numbers. I didn’t include the cost of the expertise needed to implement the private versus the public cloud, simply because while the skills may be different, you’ll still need someone with the appropriate know-how to implement either one. On our part, we just happened to have someone who knew how to implement an AWS infrastructure, and who was also capable of figuring out how to implement OpenStack with Ceph.

The advantage of having our own infrastructure is that now, every time we need a new VM, we just bring one up without adding to the overall cost. If anything, every time we bring up a new VM, the cost per VM goes down, because the same fixed cost is spread across more machines.
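A minimal sketch of that effect, assuming the fixed cost is the $100,000 rack build mentioned earlier and that the rack still has spare capacity. The VM counts below are hypothetical, picked only to show the trend.

```python
def cost_per_vm(fixed_cost: float, vm_count: int) -> float:
    """Spread a fixed infrastructure cost evenly across the running VMs."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return fixed_cost / vm_count


RACK_BUILD_COST = 100_000  # USD, build cost of one rack (from the post)

# Hypothetical VM counts, for illustration only
for vm_count in (50, 100, 200):
    per_vm = cost_per_vm(RACK_BUILD_COST, vm_count)
    print(f"{vm_count:>4} VMs -> ${per_vm:,.0f} per VM")
```

Doubling the number of VMs halves the cost per VM, which only holds as long as no new hardware has to be purchased.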

The monthly cost of renting the rack in a high-tier data center is also not to be underestimated. Given our requirement for almost zero downtime (99.999% uptime), we chose to let the data center handle our connectivity. We give them our IP addresses and they advertise those addresses through 3 separate ISPs, on 2 border gateway routers. In 16 years of service, we have had zero downtime.
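To put the 99.999% figure in perspective, this short sketch converts an availability percentage into a yearly downtime budget. It assumes a 365-day year, and the function name is mine.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year allowed at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


budget = downtime_budget_minutes(99.999)
print(f"99.999% uptime allows about {budget:.2f} minutes of downtime per year")
```

That works out to a little over five minutes a year, which is why a single ISP line is not an option for us.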

This comes with a price tag though.

Purchasing our own ISP line would be far less expensive, but it would never have the uptime that we need. And I really don’t want to have 200 angry clients calling me because their VM connectivity went down. That network link must stay up, at all costs, all the time. At this juncture, this is costing us $60,000/year. Add 3 years of that to the $100,000 build cost, and the rack has cost us $280,000 in 3 years. Would it be cheaper to have it on AWS? No, it wouldn’t. We’ve calculated that, based on the size of the servers we’re running and our bandwidth use (1Gbps almost constantly utilized), we’d likely have spent double that amount on AWS over the same 3 years.
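For anyone running their own numbers, the 3-year total can be reproduced as follows. This is a simple sketch: the build cost and yearly hosting cost are the figures from this post, the "double" multiplier is our own rough estimate for AWS rather than a quoted price, and the variable names are illustrative.

```python
RACK_BUILD_COST = 100_000   # USD, one-time cost to build the rack
HOSTING_PER_YEAR = 60_000   # USD, data center rental and connectivity
YEARS = 3
AWS_MULTIPLIER = 2          # our estimate: AWS would cost about double

# One-time build cost plus recurring hosting over the period
private_total = RACK_BUILD_COST + HOSTING_PER_YEAR * YEARS

# Estimated equivalent spend on AWS over the same period
aws_estimate = private_total * AWS_MULTIPLIER

print(f"Private rack over {YEARS} years: ${private_total:,}")
print(f"Estimated AWS equivalent:      ${aws_estimate:,}")
```

Swap in your own build cost, hosting fee and time horizon; the longer the horizon, the more the one-time build cost fades against the recurring fees.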

The above numbers refer to 1 rack only, simply because it’s easier to calculate costs this way. Each rack has its own connectivity and is fully autonomous. Even if your case is more complicated (for instance, you have CPU servers in one rack and disks in another, and many, many racks side by side), the situation doesn’t really change. Maybe a rack has no internet connectivity, so its yearly cost goes way down. But that only strengthens the case for the private cloud. It’s something you’ll need to figure out for your own environment. After all, mine is definitely not “the ultimate business case”. It serves merely as an example of how things can be done, specifically one that has worked well for us for 3 years, and that we don’t intend to change.

Will I think differently in the future?

Not any time soon, no. But if the prices on AWS start coming down because competition truly starts picking up, then we may reconsider our strategy. Ideally, I’d love to get away completely from having to manage hardware. We built all this to simplify our lives. I can tell you, in hindsight, we didn’t fully accomplish that objective. We’re still managing hardware: disks that break, journaling disks that don’t perform, memory that runs out, and all the usual issues hardware poses, obsolescence included. Being able to get away from all this altogether would be idyllic for me.

Right now, it’s simply not possible, unless I’m willing to literally double my expenses.

All said and done, public cloud costs will at some point come down far enough that everything I’ve written here today will prove inaccurate. And I look forward to that day. But until that moment arrives, a hybrid infrastructure is our optimal solution.