Sorry guys and girls, I’ve had it. Too many people in this field are confusing things and I need to clarify them for everyone’s sake (including my own). Here’s part 1 of 2 of my Big Data Mythbuster series.
Myth 1: “There is Small Data and Big Data”
Not really. There is just data, full stop. The difference lies within the business approach versus the technology approach. If you follow a technology-agnostic business strategy then even the smallest dataset has a “Big” component to it, and therefore merits a closer look in terms of business value. If that business value is substantial then the datasets could and should be expanded to make the mashed-up sample more meaningful.
For example: Your partner turns 50 and you are wondering whether you want to stay together. You look at him/her and make a decision based purely on the visual information alone. Right? No, you don’t. You think about the good times and the bad times, you think about their shortcomings and strengths, your kids (if any), your financial situation and sooooo much more than just looks. You may even listen to your gut feeling and let emotions influence your decision 😉
All of that is data. So no: even a seemingly small question (at least at first glance) has a whole load of implications, consequences and considerations, all of which are dependencies, associations and patterns. Even the smallest decision needs a complete dataset to support it. ‘Complete’ may mean a different size, depending on the question, but there is certainly no small, isolated data that exists in a vacuum.
In the business context you have social data, statistical data, more and more historical data, benchmarks, best practices and all the bits and pieces you and your data-universe are creating on a daily basis. Even if you are just a small number of shops selling pizzas, you can still benefit from looking at the bigger picture and drawing informed conclusions based on all the available information.
Myth 2: “It is easy to find Data Scientists”
Nope, it isn’t. You need coders who understand maths. You need mathematicians who understand how to write code. You need business-minded people who understand both, and subject matter experts who understand the data, the coding, the maths and the business side. Only then can you assemble a team that remotely resembles your idea of a good Data Scientist. Ahh, and you need a manager who has done that before, ideally…
These days everyone who has ever used a calculator thinks they are a Data Scientist, but it will be quite some time until these various disciplines come together and create the perfect hybrid – if ever.
One would expect a consultant to state that companies need consultants for that set of tasks, and I strive not to disappoint. You need help from companies like ours. Not people flogging software, but people who have actually assembled teams to solve the skills challenge described above. We’ve built a few teams like that, and let me tell you: it is quite a journey!
Myth 3: “Cognitive is needed to do Big Data”
By cognitive I mean artificial intelligence (A.I.) and Machine Learning. These are separate fields that, as far as we know, have never actually been applied in a business-centric Big Data project. There is talk, there is PowerPoint-ware and then there is real life. In real life, over 85% of all analytics challenges are simple statistical computations that can be done on a calculator and don’t even need an algorithm. (Free Buster-Bonus: Algorithms are time-intensive and difficult to develop. Find a list of the most common ones here.)
If you want to go all Watson, then you need to massage data for a very long time before you can get meaningful insight from cognitive technology. Don’t get me wrong, I love tools such as Watson as they stir the inner geek in me, but in terms of real-life applications for the average company, it will be a while until technologies converge. Right now, A.I. needs data, but big data does not need A.I.
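To make the "calculator" point concrete, here is a minimal sketch of what such a simple statistical computation might look like for a handful of pizza shops. The shop names and sales figures are invented purely for illustration, and the choice of statistics is an assumption about what a small business might look at, not a prescription.

```python
from statistics import mean, median, pstdev

# Hypothetical weekly pizza sales per shop (made-up numbers for illustration).
weekly_sales = {
    "shop_a": [120, 135, 128, 142],
    "shop_b": [98, 101, 110, 95],
    "shop_c": [150, 149, 160, 171],
}


def summarize(sales):
    """Return basic descriptive statistics for each shop."""
    return {
        shop: {
            "total": sum(values),
            "mean": mean(values),
            "median": median(values),
            "spread": pstdev(values),  # population standard deviation
        }
        for shop, values in sales.items()
    }


summary = summarize(weekly_sales)

# The shop with the highest average weekly sales.
best_shop = max(summary, key=lambda shop: summary[shop]["mean"])

for shop, stats in summary.items():
    print(shop, stats)
print("Highest average:", best_shop)
```

Nothing here goes beyond sums, averages and a standard deviation, which is the kind of arithmetic that arguably covers a large share of everyday analytics questions.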