What does “Big Data” mean? Is there a “Small Data”? Don’t
they mean “Lots of Data” rather than “Big Data”? And where’s it all coming
from? (I tell you, these IT folks are
just plain weird sometimes…)
Actually, “Big Data” sounds a lot better than “Plenty of
Data” or “Lots of Data”, both of which sound terribly mundane. “Data” is not very useful on its own; it’s
just a fact or a statistic (for example, a single record of a toaster sale).
But data can become useful if we know how many toasters were sold, in which
colours, and where they were sold, because then we
know which coloured toasters to make more of, or whether our pricing is right. At
this stage, our data has become “information” because it informs us. In
effect, collections of facts or statistics, meaningless in themselves, can be
organized or assembled to provide useful ‘information’.
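For the curious, here is a minimal sketch in Python of that step from data to information. The toaster records, colours and city names are all made up for illustration; the point is only that counting individual facts produces something you can act on.

```python
from collections import Counter

# Hypothetical raw data: each record is a single toaster sale.
# On its own, one record tells us almost nothing.
sales = [
    {"colour": "red", "city": "Leeds", "price": 25.0},
    {"colour": "red", "city": "Leeds", "price": 25.0},
    {"colour": "white", "city": "York", "price": 22.0},
    {"colour": "red", "city": "York", "price": 25.0},
    {"colour": "black", "city": "Leeds", "price": 27.0},
]

def summarise(records):
    """Assemble individual sales into counts per colour and per city."""
    by_colour = Counter(r["colour"] for r in records)
    by_city = Counter(r["city"] for r in records)
    return by_colour, by_city

by_colour, by_city = summarise(sales)

# The counts are "information": they tell us which colour to make more of.
print(by_colour.most_common(1))  # [('red', 3)]
print(by_city.most_common(1))    # [('Leeds', 3)]
```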
Astonishingly, it’s estimated that 90 percent of all the data
in the world was created in the last 2 years! How can that be? You wonder what the rest of human
civilization, going back, back, to early societies, was doing. If you think about it, knowledge then was a
scarce commodity, and in early human civilisations only a privileged elite,
such as the priesthood or royalty, was educated. They wrote on papyri, or
inscribed on stone, leaves or wood, so recorded information was very, very
limited.
It wasn’t until the printing press, less than six
hundred years ago, and the spread of education that knowledge could be more
widely distributed, but everything was recorded or printed on a medium such as
paper. Every technological advance since then has increased the amount of data
– electricity, industrialization, mass-production, radio, the TV, the
computer. In fact, it wasn’t until the
mid-twentieth century that computers began to have a significant impact on society. The early computers were behemoths, but they
were able to process lots of information rapidly in a tireless manner, so
something like a population census was ideal for a fast recording and
transaction machine, which is what early computers were.
Suddenly, there was a lot of data, because these big, early
computers could process hundreds or thousands of times faster than a roomful of
clerks writing away with their pens.
This data was, and still is, in a certain form – the name was recorded
in a certain number of fields, followed by the gender, the date of birth and so
on. In other words, all this data had a
‘structure’ – so logically, they’re called ‘structured data’.
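To make ‘structure’ concrete, here is a small Python sketch of a census-style record. The layout (20 characters for the name, 1 for the gender, 8 for the date of birth) and the sample person are purely assumptions for illustration, not how any real census system stored its records.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CensusRecord:
    # Every record has the same fields, in the same order.
    name: str
    gender: str
    date_of_birth: date

def parse_record(line: str) -> CensusRecord:
    """Parse one fixed-width line using the assumed 20 + 1 + 8 layout."""
    name = line[0:20].strip()
    gender = line[20]
    dob = line[21:29]  # assumed format: YYYYMMDD
    return CensusRecord(
        name=name,
        gender=gender,
        date_of_birth=date(int(dob[0:4]), int(dob[4:6]), int(dob[6:8])),
    )

# A made-up record, padded to fit the assumed layout.
line = "Jane Doe".ljust(20) + "F" + "19700102"
record = parse_record(line)
print(record.name, record.gender, record.date_of_birth)
```

Because every line follows the same layout, the same few lines of code can read millions of records, which is exactly what made this kind of data easy for early computers to handle.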
(Which raises the question: so what is ‘unstructured data’?
Patience…)
Computers became smaller and more affordable and companies,
and not just governments, could afford to use them – their efficiency launched
humankind into a new era of civilisation that many of us lived through in the
twentieth century, ushering in more prosperity and economic activity than could
have been possible otherwise.
Computers (Information Technology, or IT) became more
pervasive with personal computers, or PCs. Besides the fact that the man in
the street could now own one, it also meant that there was lots and lots of
digital data being generated (spreadsheets, documents, presentations, and so
on), and when these PCs became connected or networked together, well!!
With the Internet, data traffic exploded, and
all sorts of new ways of generating and using data emerged. The cost of computing fell
dramatically: the smartphone in your pocket has more computing power than NASA did
when it sent man to the moon in 1969, and it costs a minuscule fraction of what
NASA had to pay for roomfuls of heat-breathing mainframes back then. Astronauts used slide rules instead of
digital calculators, and this was only back in 1969!
In the modern era, a large part of our lives is digitized.
Our governments, commerce, banking systems, transport, factories, industries,
public services, healthcare, even agriculture are computerized, busily churning
out data and transactions. Much of this
computerization is invisible to the end-consumer, but supply chains, financial
transactions, transport schedules, traffic systems, financial institutions,
government machinery, all depend on computers whirring silently away, largely
unseen.
We as individuals churn out lots and lots of data ourselves:
every time you send an email, make a phone call, text your boyfriend or
girlfriend, take a photo of the food you’re eating, post an update on Facebook,
or download a video or a music clip, you’re generating data. A large part of the data generated by
humankind isn’t apparent at all: satellites, CCTVs, sensors in machines and RFID tags
are all generating and sending data as well, humming silently away 24 hours a
day, every day, the unseen lubricants of modern human civilization. No wonder 90 percent of all the data in the world was
generated in the last two years. And the rate of data generation is accelerating
rapidly, by the way, in an increasingly globalised, fast-paced world
driven by commerce and human interaction.
All this stuff is, by and large, messy. Your email
makes sense to you, but it has no formal structure, and the next email will
look completely different. Same with your Facebook postings, and the vacation
pictures you took: they don’t conform to a standard form, and therefore
have no ‘structure’. Guess what all this
messy stuff is called?
“Unstructured data”, of course. And there’s so much ‘unstructured data’
generated by you and me and machines sending out messages that it’s estimated
that 80 percent of all the data in the world is unstructured.
There are some seven billion people on this planet.
Depending on which estimate you go by, there are going to be some fifty billion
connected devices in the near future, far exceeding the number of humans. Many
of these devices are sending signals indicating their condition, mundane things
such as oil pressure and temperatures, revs per minute, fluid strength, and so
on, and then there are the sorts of signals used for telecommunications, or for
location – such as GPS, and a whole multitude of other signals emitted by
machines. To be useful, these signals need to be transmitted, often to a
controlling device that can make sense of it all. That controlling device may be manned, as in
a human operator who looks at a screen, but it can also be unmanned, a machine
or system that detects anomalies and acts accordingly.
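As a rough illustration of the ‘unmanned’ case, here is a minimal Python sketch of a system that flags anomalies in a stream of sensor readings. The oil-pressure values and the acceptable range are invented for the example, and real monitoring systems are of course far more sophisticated than a simple range check.

```python
def find_anomalies(readings, low, high):
    """Return (position, value) pairs for readings outside the range low..high."""
    return [(i, v) for i, v in enumerate(readings) if not (low <= v <= high)]

# Hypothetical oil-pressure readings sent by a machine, one per minute.
oil_pressure = [41.0, 40.5, 42.1, 12.3, 41.7, 75.0]

# Assumed acceptable range; anything outside it would trigger an action.
anomalies = find_anomalies(oil_pressure, low=30.0, high=60.0)
print(anomalies)  # [(3, 12.3), (5, 75.0)]
```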
You may have noticed that all this data being generated is
of limited use if it’s not communicated or transmitted, so Big Data doesn’t
exist independently. Rather, it is co-dependent on the evolution of other
technologies at the same time, such as the Cloud as a business enabler and delivery
platform, and the quantum leaps in computing power and storage, which have
been accompanied by equally dramatic decreases in the associated costs.
Remember, early in this discussion, the difference between
‘data’ and ‘information’, and how data needs to be organized and assembled to
become useful? The value of data lies in the ability to make some sense out of it.
In the ‘good old’ days (which aren’t that old, and not that ‘good’ either),
data could be used for reporting: how many sales were made, who sold more, which units
did better, how the organization was doing against its targets, who its
customers were, what the best-selling items were, and so on and on.
The proposition was really quite different when the data was
‘structured’ and all conformed to some sort of standard form: it was
easy to assemble similar records to extract useful bits of information. But
when the majority of the data is ‘unstructured’, of varying lengths and forms,
including pictures and sounds, that becomes a lot trickier.
How, in fact, do you make sense of what appear to be completely random
and dissimilar pieces of information?
Now having established what “Big Data” is, as well as some
of its qualities – volume, randomness, speed, lack of coherent structure, and
so on, what can you do with it, and how do you make sense of it?
Now that’s a story for another day.