Petabyte-Scale Analytics

Data volumes grow every year and can quickly become unmanageable. How is a company supposed to manage all that data? We can help. We implement big data strategies that help customers store their data efficiently and cost-effectively. We also help customers access, analyze, and interpret that data.


Everyone is talking about Big Data analytics and the associated business intelligence marvels these days, but before organizations can leverage the data, they have to figure out how to store it. Managing data stores at the petabyte scale and beyond is fundamentally different from managing traditional large-scale data sets.

Managing the petabyte-scale and larger data stores that are a fact of life with Big Data is a different beast from managing traditional large-scale data infrastructures. Online photo site Shutterfly, which manages more than 30 petabytes of data, shares its strategy for taming the storage beast.

Shutterfly differentiates itself by allowing users to store an unlimited number of images, kept at their original resolution and never downscaled. The company also says it never deletes a photo.

"Our image archive is north of 30 petabytes of data," says Neil Day, Shutterfly senior vice president and chief technology officer. He adds, "Our storage pool grows faster than our customer base. When we acquire a customer, the first thing they do is upload a bunch of photos to us. And then when they fall in love with us, the first thing they do is upload a bunch of additional photos."

"Petabyte-scale infrastructures are just an entirely different ballgame," Day says. "They're very difficult to build and maintain. The administrative load on a petabyte or multi-petabyte infrastructure is just a night and day difference from the traditional large-scale data sets. It's like the difference between dealing with the data on your laptop and the data on a RAID array."

When Day joined Shutterfly in 2009, storage had already become one of the company's biggest buckets of expense, and it was growing at a rapid clip—not just in terms of raw capacity, but in terms of staffing.

"Every n petabytes of additional storage meant we needed another storage administrator to support that physical and logical infrastructure," Day says. With such massive data stores, he says, "things break much more frequently. Anyone who's managing a really large archive is dealing with hardware failures on an ongoing basis. The fundamental problem that everyone is trying to solve is, knowing that a fraction of your drives are going to fail in any given interval, how do you make sure your data remains available and the performance doesn't degrade?"