Salesforce’s Database

If you’re building systems to run at a large scale, then rather than waste time and money trying to avoid any failure, you need to suck it up and accept that faults will happen – and make sure you’ve got enough cheap gear to recover.

So says Salesforce, which argued that this strategy can save you a bunch of cash you’d otherwise spend on expensive hardware and can make it easier for your applications to survive catastrophes.

In a candid keynote speech at the Ricon West distributed systems conference on Tuesday, Salesforce architect and former Amazon infrastructure brain Pat Helland talked up Salesforce’s internal “Keystone” system, which lets the company provide greater backup and replication capabilities for data stored in Oracle without spending Oracle prices on the supporting infrastructure.

“You need to have enterprise-trust with web-class resilience,” Helland said. “Failures are normal, mean time between failure makes failures common. You have to expect things to break so you have to layer it with immutable data.

“The ideal design approach is ‘web scale and I want to build it out of shit’.”

Salesforce’s Keystone system takes data from Oracle and layers it on top of a set of cheap infrastructure running on commodity servers, Helland explained. The Oracle technology gives Salesforce confidence and consistency, he said, while the secondary layer of commodity systems and open-source software gives the company greater flexibility and a cheaper way of providing storage infrastructure.

What’s inside Keystone

Keystone consists of several clusters of storage servers running ten 4TB drives and two 750GB SSDs apiece, he explained. Salesforce prefers to buy “the shittiest SSDs money can buy,” he said, and then design systems that route around failures. Keystone’s storage underbelly consists of clusters of tens to hundreds of nodes, possibly scaling up to thousands, he explained.

The design methodology behind Keystone comes from two techniques named ROC – recovery-oriented computing – and SOFT – storage over flaky technologies. ROC grew out of research done at the University of California, Berkeley in the mid-2000s and is about designing systems that recover rapidly from failures, while SOFT is Helland’s own term for building storage systems that offer guarantees despite running on cheap-as-chips hardware.

‘You get better system behavior if you assume everything is a hunk of crap’

Salesforce is designing systems using these techniques so that it can better deal with the flakiness of commodity infrastructure, without having to upgrade to more expensive systems, he said. “Storage servers may crash and they can lie with their frickin’ teeth. You get better system behavior if you assume everything is a hunk of crap.”
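One way to act on the advice to assume storage servers can crash or lie is to keep a checksum for every blob (assumed here to live alongside the data's catalog entry) and accept a copy only if it verifies, moving on to the next replica otherwise. The sketch below is a minimal, hypothetical illustration; `Replica`, `verified_read` and the in-memory dictionaries are stand-ins, not Keystone's actual interfaces.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one storage server."""
    name: str
    blobs: dict = field(default_factory=dict)
    alive: bool = True

    def get(self, key: str) -> bytes:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.blobs[key]  # KeyError if this server lost the copy


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verified_read(replicas, key, expected_digest):
    """Return (data, server) from the first replica whose copy checks out."""
    for replica in replicas:
        try:
            data = replica.get(key)
        except (ConnectionError, KeyError):
            continue  # crashed server or missing copy: route around it
        if sha256(data) == expected_digest:
            return data, replica.name
        # Digest mismatch: the server returned bad data without complaint.
    raise OSError(f"no intact replica of {key!r}")


good = b"customer record"
digest = sha256(good)  # assumed to be stored with the catalog entry
replicas = [
    Replica("a", {"k1": good}, alive=False),   # crashed
    Replica("b", {"k1": b"customer recorD"}),  # silently corrupted copy
    Replica("c", {"k1": good}),                # healthy
]
print(verified_read(replicas, "k1", digest))  # (b'customer record', 'c')
```

The read succeeds from server "c" even though one server is down and another hands back corrupted bytes without reporting an error, which is the behaviour you get by treating every device as untrustworthy.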

Keystone has four main elements: a Catalog for keeping track of data, a Store for storing it, a Vault for long-term storage, and a Pump for shuffling data between systems over WAN.

The Catalog acts as an intermediary between primary storage systems, such as Oracle, and secondary storage systems built on commodity hardware. Primary storage can point to Keystone, which in turn points to the secondary systems, making it easy to move secondary data without having to touch the primary system.
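The indirection the Catalog provides can be sketched as a lookup table between a stable key held by the primary system and wherever the secondary copy currently lives. The `Catalog` class, the key format and the location strings below are hypothetical; the point is only that relocating data rewrites one catalog entry and leaves the primary row untouched.

```python
class Catalog:
    """Hypothetical catalog: maps stable keys to current secondary locations."""

    def __init__(self):
        self._locations = {}

    def register(self, key, location):
        self._locations[key] = location

    def resolve(self, key):
        return self._locations[key]

    def relocate(self, key, new_location):
        # Moving secondary data only touches the catalog entry,
        # never the primary row that holds the key.
        self._locations[key] = new_location


# The primary database row stores only the Keystone key, not a physical location.
primary_row = {"id": 42, "attachment_key": "ks-0001"}

catalog = Catalog()
catalog.register("ks-0001", "store-cluster-a/node-17")
print(catalog.resolve(primary_row["attachment_key"]))  # store-cluster-a/node-17

catalog.relocate("ks-0001", "store-cluster-b/node-03")
print(catalog.resolve(primary_row["attachment_key"]))  # store-cluster-b/node-03
```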

The Store holds the data that is fed into the Catalog. Here Salesforce adopts the design approach pioneered by major companies – such as Google, Facebook, and Amazon – of using large quantities of low-cost hardware for backend storage while achieving solid guarantees and reliability through a software layer.

This design approach has one glaring problem: failures. These happen a lot, Helland explained. About 4 per cent or more of SATA hard drives fail each year, he said, so a data center with 1,200 of Salesforce’s storage-stuffed servers will lose 480 drives every 12 months. Storage therefore needs to be triple-replicated to cope with these failures. Servers also fail, at a shade under 1 per cent per year, he explained, so data must also be replicated at a distance, away from any single rack.
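The failure arithmetic is easy to check: 1,200 servers with 10 drives each is 12,000 drives, and 4 per cent of that is 480 failures a year, or more than one a day. Because whole servers fail too, a common remedy is to put each of the three copies on a different rack. The placement function below is a hypothetical illustration of that idea, not Salesforce's policy; the 30-rack, 40-server layout is an assumption chosen only to add up to 1,200 servers.

```python
import random

# Figures from the talk; the rack layout further down is a made-up example.
SERVERS = 1_200
DRIVES_PER_SERVER = 10
ANNUAL_DRIVE_FAILURE_RATE = 0.04  # "about 4 per cent or more"

drives = SERVERS * DRIVES_PER_SERVER
failures_per_year = round(drives * ANNUAL_DRIVE_FAILURE_RATE)
print(drives, failures_per_year, round(failures_per_year / 365, 1))  # 12000 480 1.3


def place_replicas(racks, copies=3, rng=random):
    """Choose one server on each of `copies` distinct racks (hypothetical policy)."""
    if len(racks) < copies:
        raise ValueError("need at least as many racks as replicas")
    chosen = rng.sample(sorted(racks), copies)  # distinct racks, no repeats
    return [(rack, rng.choice(racks[rack])) for rack in chosen]


# Hypothetical layout: 30 racks x 40 servers = 1,200 servers.
racks = {
    f"rack-{r:02d}": [f"rack-{r:02d}/srv-{s:02d}" for s in range(40)]
    for r in range(30)
}
placement = place_replicas(racks, rng=random.Random(7))
print(placement)
```

Sampling racks first and servers second guarantees that losing one server, or one whole rack, takes out at most one of the three copies.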

This raises the problem of maintaining consistency when reading from a replica, and so requires a caching layer that stores the locations of the data and keeps them up to date. This layer is built on cheap consumer-grade SSDs, he said, which have awful reliability.

“Consumer-grade SSDs will just die – the device stops less than 1 percent per year,” he said. They also wear out after about 3,000 write cycles, so with a 750GB SSD you can probably write 2.25PB to it before it fails, he said.
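The endurance figure follows from multiplying capacity by write cycles. The sketch below reproduces it in decimal units and, purely as an assumption for illustration, converts it into a lifetime at a steady 1TB of writes per day; it ignores write amplification, which would shorten the real figure.

```python
CAPACITY_GB = 750
WRITE_CYCLES = 3_000               # approximate endurance quoted in the talk
ASSUMED_WRITES_GB_PER_DAY = 1_000  # hypothetical workload: 1TB written per day

total_gb = CAPACITY_GB * WRITE_CYCLES  # 2,250,000GB
total_pb = total_gb / 1_000_000        # decimal units: 1PB = 1,000,000GB
days = total_gb / ASSUMED_WRITES_GB_PER_DAY
print(total_pb, days, round(days / 365, 1))  # 2.25 2250.0 6.2
```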

They also sometimes suffer from bit rot, which, in his experience with the cheapest possible hardware, amounts to one uncorrected bit error for every 10^14 bits (12.5TB, or about 11.4TiB) written. By comparison, top-of-the-line enterprise-class SSDs have an error rate of one bit in every 10^19 (1.25EB, or about 1.08EiB), but they are much more expensive.
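The capacity equivalents of those error rates depend on whether decimal or binary units are used; figures of roughly 11.3 and 1.08 correspond to binary units (TiB and EiB). The sketch below does the conversion and, as a back-of-the-envelope extension not made in the talk, estimates how many uncorrected errors a consumer drive at one error per 10^14 bits would see over the 2.25PB it can absorb before wearing out.

```python
def bits_to_units(bits):
    """Express a number of bits in decimal (TB, EB) and binary (TiB, EiB) units."""
    nbytes = bits / 8
    return {
        "TB": nbytes / 10**12,
        "TiB": nbytes / 2**40,
        "EB": nbytes / 10**18,
        "EiB": nbytes / 2**60,
    }


consumer = bits_to_units(10**14)    # one uncorrected error per 10^14 bits
enterprise = bits_to_units(10**19)  # one uncorrected error per 10^19 bits
print(f"{consumer['TB']:.1f}TB = {consumer['TiB']:.2f}TiB")      # 12.5TB = 11.37TiB
print(f"{enterprise['EB']:.2f}EB = {enterprise['EiB']:.2f}EiB")  # 1.25EB = 1.08EiB

# Back-of-the-envelope: expected errors over a 2.25PB write lifetime (integer maths).
lifetime_bits = 2_250_000 * 10**9 * 8  # 2.25PB expressed in bits
expected_errors = lifetime_bits // 10**14
print(expected_errors)  # 180
```

Around 180 silent errors over a drive's life is why a design built on such devices has to checksum and re-replicate rather than trust what comes back.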

Modern software needs to be built to fail, he said, because if you design it in a monolithic, interlinked manner, a simple hardware brownout can ripple through the entire system and take you offline.

“If everything in the system can break it’s more robust if it does break. If you run around and nobody knows what happens when it breaks then you don’t have a robust system,” he said.

From: The Register
