After attending the Big Bang Data exhibition the other day, I am slowly putting together the pieces of what makes up Digital Humanities. It’s funny how we learn and absorb better when we are physically present to see and experience otherwise intangible concepts. Reflecting on that brings me to this week’s topic: unstructured data.
Unstructured data, mainly made up of text, is information that does not follow any pre-defined data model and is not organized in any pre-determined manner. The growth of unstructured data is staggering, with “Twitter seeing about 175 million tweets each day and has more than 465 million accounts; 571 new websites are created every minute of every day; and the world creates 2.5 quintillion bytes of data per day from unstructured data sources like sensors, social media posts and digital photos”.
Why are most current analytics applications and technologies focused more on structured than unstructured data?
Unstructured data includes social media content (tweets, blogs, posts, etc.) and pictures, among other forms. Its value is often underestimated, although organizations are increasingly finding ways to extract meaning from it.
A good way to understand the value of unstructured data is to compare it with structured data. The following excerpt provides a relatively clear view:
For example, while a company might track the sales of specific products and services, and correlate structured sales data with all kinds of variables (like time of year and customer demographics), without unstructured data (like social media and call center logs) it’s impossible to fully understand why sales rise or fall: structured data analytics can describe and explain what’s happening and unstructured data analytics can explain why it’s happening. Together you get the whole picture.
But of course, sifting through millions and millions of such uncategorized, unorganized records means there will be plenty of “noise” that organizations have to sieve through before getting to the essence of the analysis.
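To make the idea of “noise” a little more concrete for myself, here is a minimal Python sketch of how a handful of posts might be sieved. Everything in it is an assumption made for illustration: the sample posts are invented, and the keyword set, the `tokenize` and `is_relevant` helpers, and the `min_words` threshold are hypothetical names and choices, not a method taken from any of the sources quoted here. Real text analytics would be far more involved.

```python
import re

# Hypothetical sample posts, invented for illustration only.
posts = [
    "Loving the new espresso machine, best purchase this year!",
    "lol",
    "Check out my profile http://example.com",
    "The espresso machine broke after two weeks, very disappointed.",
    "Traffic is awful today",
]

# Hypothetical keywords an organization might care about.
KEYWORDS = {"espresso", "machine"}


def tokenize(text):
    """Lowercase the text, drop URLs, and split it into words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def is_relevant(text, keywords=KEYWORDS, min_words=3):
    """Keep a post only if it is long enough and mentions a keyword."""
    words = tokenize(text)
    return len(words) >= min_words and any(w in keywords for w in words)


# Everything that fails the check is treated as noise and dropped.
relevant = [p for p in posts if is_relevant(p)]
for post in relevant:
    print(post)
```

Only the two posts that mention the product survive the filter; the one-word reply, the link spam and the off-topic remark are discarded. A keyword match is a crude stand-in for relevance, but it shows why so much has to be thrown away before the interesting part is left.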
This brings me to my next point: parameterization, which I will be reading more about to understand it better and to see how I can link the two concepts together.
(To be edited)