Monday, 23 February 2015

How Much Data Is Big Data?


In January 2015, I was invited to speak at a bank's corporate day.  The audience comprised analysts from the finance industry, drawn from various firms. Their normal job is to look at company financials, speak to other analysts and industry pundits, and come up with pronouncements on the future performance of companies' stocks.


Since the audience was not from the IT industry but had been subjected to the blizzard of terms we in IT carelessly toss about, I thought I'd give them a layman's perspective on two frequently encountered buzzwords: Big Data and the IoT (Internet of Things).

We talk a lot about Big Data as if we grew up with the stuff, but the truth is far more interesting than this blasé attitude suggests.  There's a lot of data in the world, and 90% of it was created in the last two years. You've probably heard this before, but what you may not know is that the statement will still be true two, three, or five years from now, just as it was true last year, and the year before that, and the year before that.
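
Why does that claim keep renewing itself? It's a property of exponential growth, not of any particular year. Here's a minimal sketch in Python, assuming (purely for illustration) that the world's total data grows tenfold every two years, which is exactly the rate a perpetual "90% in the last two years" implies:

    # Under constant exponential growth, the share of data created in the
    # trailing two years is the same no matter which year you ask about.
    GROWTH_PER_2_YEARS = 10.0  # assumed: total data grows 10x every 2 years

    def total_data(year, base_year=2013, base_amount=1.0):
        """Total data accumulated by `year`, in arbitrary units."""
        return base_amount * GROWTH_PER_2_YEARS ** ((year - base_year) / 2)

    for year in (2015, 2017, 2020):
        created_recently = total_data(year) - total_data(year - 2)
        share = created_recently / total_data(year)
        print(f"{year}: {share:.0%} of all data was created in the last two years")

    # Every year prints 90%: the fraction is 1 - 1/10, independent of the year.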


What it means is that the rate of data generation is not rising at a steady, arithmetic pace, but at something far more frenetic. It's coming at us from our cellphones, from sensors embedded in the ground, in machines, in satellites, and across the Internet; it's being generated by social media, by machines programmed to churn out spam (most of the world's email is spam, by the way), by the mass media, and more.

The rate of data accumulation is truly staggering – IDC expects that by 2020, some 40 zettabytes of data will be generated. That number makes no real sense if I just tell you it's a 40 followed by 21 zeros, but you can get a feel for the scale if I say it is about 57 times the number of grains of sand on all the beaches on Earth, combined.  And by the way, that's 40 ZB of data generated PER YEAR!
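
You can sanity-check that comparison yourself. A zettabyte is 10^21 bytes, and the grains-of-sand figure the 57x ratio implies is on the order of 7 × 10^20 – an assumed, back-solved estimate, not a measurement. A quick back-of-the-envelope in Python:

    ZETTABYTE = 10 ** 21            # bytes in one zettabyte
    data_2020 = 40 * ZETTABYTE      # IDC's 40 ZB projection, in bytes

    # Assumed order-of-magnitude count of sand grains on the world's beaches,
    # back-solved from the commonly quoted 57x comparison; estimates vary widely.
    SAND_GRAINS = 7e20

    print(f"Bytes of data per grain of sand: {data_2020 / SAND_GRAINS:.0f}")
    # -> about 57 bytes of data for every grain of beach sand on Earth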

I could chuck out a couple more staggering statistics like that, but it raises the question: how did we get here?

[Photo: The world's first solid-state transistor, invented at Bell Labs in 1947]

The answer is somewhat prosaically described by Moore's Law, which is actually an astute observation made by Gordon Moore back in 1965 (and refined a decade later): the number of transistors that can be crammed into an integrated circuit doubles roughly every two years.  Consider that the transistor was invented in 1947, and that we now make some 400 million transistors per year for every one of the 7.3 billion people on this planet, and it rather gives you a headache trying to get a sense of the scale.  But that is how we continue to generate prodigious amounts of data, and how the volume keeps increasing, year after year, at an exponential rate.
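
The compounding is easy to underestimate, so it's worth doing the arithmetic on the two numbers above. A short sketch – the doubling count is just the textbook Moore's Law formula, and the transistor total simply multiplies the per-person figure by the population:

    # Doublings between 1965 and 2015, at one doubling every two years:
    doublings = (2015 - 1965) / 2        # 25 doublings
    density_gain = 2 ** doublings        # 2^25, about a 33.5-million-fold gain
    print(f"Transistor density gain since 1965: ~{density_gain:,.0f}x")

    # Total transistors manufactured per year, from the figures in the text:
    per_person = 400e6                   # 400 million transistors per person
    population = 7.3e9                   # 7.3 billion people
    print(f"Transistors made per year: ~{per_person * population:.1e}")  # ~2.9e+18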

This relentless doubling of computing capacity, by the way, has a corollary: processing power has grown by almost unimaginable leaps and bounds.  The ignition chip in your car, for example, contains more computing power than NASA had at its disposal back in 1969 when it sent men to the Moon.  And at a tiny, tiny, tiny fraction of the price.

The second part of my talk focused on the IoT, on how it and Big Data are intimately related, and on the applications of all this; but for this story, I think simply getting a sense of the scale of things is a fairly good start.

