In a recent article, Marco Kotrotsos claims that the cloud has failed and advocates a P2P cloud architecture instead:
I am referring to the slew of outages the last day’s and weeks of some high profile infrastructures like Amazon S3 and Google. I always thought Googles infrastructure was untouchable. But it seemed that I had woken up to a different world last week. When I saw tweets and messages going around with "Gmail is down".
...
P2P Cloud computing in general will be the future of infrastructure.
At first, this sounds like a good idea. At least it did in 2000, when it was called grid computing. Unfortunately, it's not.
It's been tried; it failed. This Nature article from 2000 is optimistic:
Internet computing and Grid technologies promise to change the way we tackle complex problems. They will enable large-scale aggregation and sharing of computational, data and other resources across institutional boundaries. And harnessing these new technologies effectively will transform scientific disciplines ranging from high-energy physics to the life sciences.
At the end of the article is a list of companies trying to commercialize the concept: Entropia, United Devices, Parabon and Popular Power. Guess what: most of them are no longer around. Parabon is the only one still operating; it hosts the ComputeAgainstCancer project. Looking at their site, I get the feeling that their customers are mostly universities and government agencies.
Incentive and economics. People download SETI@Home out of philanthropy and because they think it's cool. You can't count on people to host your distributed app platform on their computers out of philanthropy, so you'd have to pay them. Unfortunately, their computational power is worth so little money that it wouldn't work. Would you bother to download a program and run it in the background for ~$5 / month? Also, you could no longer put your computer to sleep. Parabon, the company mentioned above, is a case in point: you can't download and run their "Frontier Compute Engine", which is their distributed node software. The reason is probably that nobody wants to, anyway. So if a company wants to use Parabon's system, they either run their own nodes or buy the capacity from Parabon's marketplace.
Latency. Google serves search requests in 10-100 milliseconds, wherever you are. Can you do that with P2P? Remember, latency vs. bandwidth:
Years ago David Cheriton at Stanford taught me something that seemed very obvious at the time -- that if you have a network link with low bandwidth then it's an easy matter of putting several in parallel to make a combined link with higher bandwidth, but if you have a network link with bad latency then no amount of money can turn any number of them into a link with good latency.
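The asymmetry in that quote can be sketched in a few lines of Python. This is a deliberately simplified model, not a measurement: the function name `transfer_time`, the 100 ms latency and the 1 Mbit/s bandwidth are all made-up values for illustration. The payload is split evenly across parallel links, but the latency is paid once no matter how many links there are.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bps, links=1):
    """Seconds to deliver a payload striped evenly over `links` parallel links.

    Simplified model: the one-way latency is paid in full regardless of the
    number of links, while the payload is divided across them.
    """
    if links < 1:
        raise ValueError("links must be >= 1")
    return latency_s + (size_bytes * 8) / (bandwidth_bps * links)


# Hypothetical link parameters, chosen only for illustration.
LATENCY_S = 0.100          # 100 ms
BANDWIDTH_BPS = 1_000_000  # 1 Mbit/s per link

SMALL = 1_000        # a 1 KB response, e.g. a small search result
LARGE = 100_000_000  # a 100 MB bulk transfer

for n in (1, 10, 100):
    small_t = transfer_time(SMALL, LATENCY_S, BANDWIDTH_BPS, n)
    large_t = transfer_time(LARGE, LATENCY_S, BANDWIDTH_BPS, n)
    print(f"{n:>3} links: small={small_t:.4f}s large={large_t:.1f}s")
```

In this model, adding links shrinks the bulk transfer almost linearly, while the small request never gets faster than the latency floor. Interactive services like search sit in the second category.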
It's not attacking the right problems. I'm not sure about the Gmail outage, but Amazon posted a short explanation of their problem:
Early this morning, at 3:30am PST, we started seeing elevated levels of authenticated requests from multiple users in one of our locations. While we carefully monitor our overall request volumes and these remained within normal ranges, we had not been monitoring the proportion of authenticated requests. Importantly, these cryptographic requests consume more resources per call than other request types.
Shortly before 4:00am PST, we began to see several other users significantly increase their volume of authenticated calls. The last of these pushed the authentication service over its maximum capacity before we could complete putting new capacity in place. In addition to processing authenticated requests, the authentication service also performs account validation on every request Amazon S3 handles. This caused Amazon S3 to be unable to process any requests in that location, beginning at 4:31am PST. By 6:48am PST, we had moved enough capacity online to resolve the issue.
Even in a P2P architecture, I imagine that the authentication service would be located in the grid provider's own network, so P2P wouldn't make a difference. Also note that today's clouds (Google, Amazon, etc.) are explicitly designed to be unaffected by disk, machine or switch failures. So the only ways to bring them down are an allocation mismatch (human error?), which is basically what happened with Amazon; a software or protocol bug (human error); an extreme power outage that outlasts the local generators (extremely unlikely?); or some kind of DDoS attack against the entire data center. Even with a P2P architecture, some components still have to run in the grid provider's local data center, so a DDoS against that would still bring the whole thing down. I think that the Gmail and Amazon problems were not really scalability problems, so P2P wouldn't have helped.
To close the article, I'll include one last quote from Jim Gray's short Distributed Computing Economics (2003) paper (emphasis mine):
Computing economics are changing. Today there is rough price parity between (1) one database access, (2) ten bytes of network traffic, (3) 100,000 instructions, (4) 10 bytes of disk storage, and (5) a megabyte of disk bandwidth. This has implications for how one structures Internet-scale distributed computing: one puts computing as close to the data as possible in order to avoid expensive network traffic.
...
The ideal mobile task is stateless (no database or database access), has a tiny network input and output, and has huge computational demand. For example, a cryptographic search problem: given the encrypted text, the clear text, and a key search range. This kind of problem has a few kilobytes input and output, is stateless, and can compute for days. Computing zeros of the zeta function is a good example. Monte Carlo simulation for portfolio risk analysis is another good example. And of course, SETI@Home is a good example: it computes for 12 hours on half a megabyte of input.
...
Most web and data processing applications are network or state intensive and are not economically viable as mobile applications. An FTP server, an HTML web server, a mail server, and Online Transaction Processing (OLTP) server represent a spectrum of services with increasing database state and data access. A 100MB FTP task costs 10 cents, and is 99% network cost. An HTML web access costs 10 microdollars and is 88% network cost. A Hotmail transaction costs 10 microdollars and is more cpu intensive so that networking and cpu are approximately balanced. None of these applications fits the cpu-intensive stateless requirement.
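Gray's parity figures imply a rough break-even point: if 10 bytes of network traffic cost about as much as 100,000 instructions, a task only pays for being shipped elsewhere when it executes more than about 10,000 instructions per byte it moves. The Python sketch below applies that rule of thumb. It ignores the disk and database terms of the parity, and several inputs are my own assumptions rather than figures from the paper: the function names, the instruction rate of 10^9 per second, reading "half a megabyte" as 500,000 bytes, and the web-style task (1 ms of CPU for a 10 KB response).

```python
# Price parity from the quoted passage: 10 bytes of network traffic cost
# roughly the same as 100,000 instructions.
PARITY_INSTRUCTIONS = 100_000
PARITY_NETWORK_BYTES = 10
BREAK_EVEN_INSTRUCTIONS_PER_BYTE = PARITY_INSTRUCTIONS / PARITY_NETWORK_BYTES

# Assumption for illustration only: a CPU executing 1e9 instructions/second.
ASSUMED_IPS = 1e9


def instructions_per_byte(cpu_seconds, network_bytes, ips=ASSUMED_IPS):
    """Instructions executed per byte of network traffic for one task."""
    return cpu_seconds * ips / network_bytes


def network_cost_fraction(cpu_seconds, network_bytes, ips=ASSUMED_IPS):
    """Share of (CPU + network) cost that is network, under the parity above."""
    cpu_units = cpu_seconds * ips / PARITY_INSTRUCTIONS
    net_units = network_bytes / PARITY_NETWORK_BYTES
    return net_units / (cpu_units + net_units)


tasks = {
    # From the quote: 12 hours of computation on half a megabyte of input.
    "SETI@Home work unit": (12 * 3600, 500_000),
    # Hypothetical web-style request: 1 ms of CPU for a 10 KB response.
    "web-style request": (0.001, 10_000),
}

for name, (cpu_s, net_bytes) in tasks.items():
    ratio = instructions_per_byte(cpu_s, net_bytes)
    verdict = "above" if ratio > BREAK_EVEN_INSTRUCTIONS_PER_BYTE else "below"
    share = network_cost_fraction(cpu_s, net_bytes)
    print(f"{name}: {ratio:.3g} instr/byte ({verdict} break-even), "
          f"network share {share:.2%}")
```

Under these assumptions the SETI@Home work unit lands several orders of magnitude above the break-even ratio, while the web-style request lands well below it, which is the distinction Gray draws between tasks that can and cannot move economically.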