Big data is a popular buzzword, or catch-phrase, used to describe a volume of structured and unstructured data so massive that it is difficult to process using traditional databases and software.
Data is dead. Long live big data.
In most enterprise scenarios the volume of data is too big, it moves too fast, or it exceeds current processing capacity. Despite these problems, big data has the potential to help companies improve operations, make faster and more intelligent decisions, and evolve from lean to smart.
It is messy: it appears almost random, is sometimes unstructured or only partially structured, not easily correlated, not consistent, not exhaustive, all over the place, unclear and (for the human brain to compute) seemingly unrelated. However, there are ways to profile it, analyse it and bring order to chaos – for example, using design of experiments to derive viable statistical models, results and trends, then simulating and closing the gaps to turn these into predictive models, working from a mix of raw and partially translated information. A small profiling sketch follows this series of observations.
It is diverse, cross-functional and multi-format, disseminated across organisational silos and application/database systems, and difficult to capture, store, search, profile and analyse.
It is complex and misunderstood, structured for a specific purpose and sometimes needing to be re-purposed or restructured for a different one; it cannot be 'computed' by the human brain alone, and it needs IT before humans can translate and simplify it.
It is hidden, difficult to see and access; it needs to be converted into business information and 'intelligence', presented in a simplified way for easy consumption, and validated or refined through visual representations.
It is everywhere, and perhaps 'nowhere': in the cloud, in the Internet of Things (IoT) and on many 'connected' devices. It is shared and re-used, interpreted in different ways, and it needs to be channelled, shared and communicated appropriately.
It is live, dynamic and constantly augmented, streaming in at unprecedented speed and with varying levels of reliability, fed at high velocity and simultaneously to many devices and people who make different uses of it and have different expectations of it.
It is big, with large volumes of it…
It is real, a better representation of the 'real', closer to the human than to IT (relatively speaking). It is raw: it contains more information than traditional data models and structures, but also more 'noise', so it can be tricky to extract the various messages from it; at the same time, it offers many more opportunities for decoding and interpretation.
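As a purely illustrative sketch of what profiling such messy, partially structured data can look like in practice, the Python snippet below summarises the completeness and apparent types of a hypothetical extract; the file name, columns and pandas-based approach are assumptions for the example, not a prescribed method.

```python
# A minimal profiling sketch, assuming a hypothetical CSV extract of mixed quality.
import pandas as pd

# Load a (hypothetical) partially structured extract; malformed rows are skipped rather than rejected.
df = pd.read_csv("customer_events.csv", dtype=str, on_bad_lines="skip")

# Basic profile per column: volume, completeness and cardinality.
profile = pd.DataFrame({
    "non_null": df.notna().sum(),
    "null_pct": (df.isna().mean() * 100).round(1),
    "distinct": df.nunique(),
})
print(profile)

# Flag columns that are only partially numeric, a common symptom of 'messy' data.
for col in df.columns:
    parse_rate = pd.to_numeric(df[col], errors="coerce").notna().mean()
    if 0 < parse_rate < 1:
        print(f"{col}: {parse_rate:.0%} of values parse as numbers; the rest need cleaning")
```

Even a crude profile like this turns 'all over the place' into a concrete list of gaps and inconsistencies that can feed the statistical and predictive work described above.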
New types of data analysis tools are required to process and make sense of big data, which is as important to business – and society – as the Internet has become.
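To make the 'velocity' point a little more concrete, here is a hedged sketch of the kind of incremental processing such tools tend to favour: a running aggregate is kept up to date as events arrive, instead of waiting for a complete dataset. The event generator, window size and values are invented for the example.

```python
# A minimal streaming sketch: maintain a rolling average as events arrive,
# rather than loading a complete dataset first. The event source is simulated.
import random
import time
from collections import deque

def simulated_event_stream(n=50):
    """Hypothetical stand-in for a live feed (sensor readings, clicks, transactions...)."""
    for _ in range(n):
        yield random.gauss(100, 15)   # one reading per event
        time.sleep(0.01)              # events trickle in continuously, not as a batch

window = deque(maxlen=20)             # keep only the 20 most recent readings
for reading in simulated_event_stream():
    window.append(reading)
    rolling_avg = sum(window) / len(window)
    # Downstream consumers see an always-current view rather than an end-of-day report.
    print(f"latest={reading:6.1f}  rolling average={rolling_avg:6.1f}")
```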
What are your thoughts?
This post was originally published on LinkedIn on 4 September 2015.