Week 6 Weekly Post

After attending the Big Bang Data exhibition the other day, I am slowly putting together the pieces of what makes up Digital Humanities. It’s funny how we learn and absorb better when we are physically present to see and experience otherwise intangible concepts. Reflecting on that brings me to this week’s topic: unstructured data.

Unstructured data, mainly made up of text, is information that does not follow any pre-defined data model and is not organized in any pre-determined manner. Its growth is staggering, with “Twitter seeing about 175 million tweets each day and has more than 465 million accounts; 571 new websites are created every minute of every day; and the world creates 2.5 quintillion bytes of data per day from unstructured data sources like sensors, social media posts and digital photos”.

Source: http://www.digitalreasoning.com/resources/Holistic-Analytics.pdf

Why are most current analytics applications and technologies focused more on structured than unstructured data?

Unstructured data includes social media content (tweets, blogs, posts, etc.) and pictures, among other forms. Its value often gets underestimated, although organizations are increasingly finding ways to extract meaning from it.

A good way to understand the value of unstructured data is to compare it with structured data. The following excerpt gives a relatively clear view:

For example, while a company might track the sales of specific products and services, and correlate structured sales data with all kinds of variables (like time of year and customer demographics), without unstructured data (like social media and call center logs) it’s impossible to fully understand why sales rise or fall: structured data analytics can describe and explain what’s happening and unstructured data analytics can explain why it’s happening. Together you get the whole picture.

Source: http://www.forbes.com/sites/steveandriole/2015/03/05/the-other-side-of-analytics/#341149839a86

But of course, analyzing millions and millions of such uncategorized, unorganized pieces of data means there will be plenty of “noise” that organizations have to sieve through before getting to the essence of the analytics.
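
To make the idea of sieving through “noise” a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the sample posts are invented, the stop-word list is tiny and hand-picked, and `sieve` and `top_terms` are hypothetical helper names rather than part of any real analytics tool. It only shows one simple way that free text could be reduced to structured word counts.

```python
import re
from collections import Counter

# Invented sample posts, for illustration only (not real data).
POSTS = [
    "Loving the new phone!!! http://example.com/pic @friend #blessed",
    "The new phone battery dies so fast... not happy",
    "RT @shop: the phone is on sale now http://example.com/sale",
]

# A tiny hand-picked stop-word list (assumption; real lists are far longer).
STOP_WORDS = {"the", "is", "on", "so", "a", "an", "of", "to", "rt", "now"}


def sieve(post):
    """Return the lowercase words of a post with common noise removed."""
    text = post.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"[@#]\w+", " ", text)       # drop mentions and hashtags
    words = re.findall(r"[a-z']+", text)       # keep alphabetic words only
    return [w for w in words if w not in STOP_WORDS]


def top_terms(posts, n=3):
    """Count the remaining words across all posts and return the n most common."""
    counts = Counter()
    for post in posts:
        counts.update(sieve(post))
    return counts.most_common(n)


print(top_terms(POSTS, 2))  # [('phone', 3), ('new', 2)]
```

Even in this toy version, the choice of what counts as noise is a judgment call: throwing away hashtags, for instance, may discard exactly the context an analyst cares about.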

This brings me to my next point: parameterization, which I will be reading more about to understand it better and to see how I can link the two concepts together.

(To be edited)

4 thoughts on “Week 6 Weekly Post”

Gabriel Lim says:

September 12, 2016 at 10:16 am

That’s a neat metaphor for unstructured data! I believe observing the content of social media posts is related to sentiment mining and analysis – the frequency of “positive” posts might make up structured data, but the context of these posts must also be taken into account as the unstructured, interpreted parts of that data.

This is because automated sentiment analysis on social media posts usually only picks up words like “advantage” or “amazing” and marks them as positive, even if such words are describing a competitor’s product or are meant sarcastically. It then weighs the number of positive words against the number of negative words within a post to determine whether it leans more toward one or the other. In this regard, sentiment analysis by computers is still somewhat limited, although…

http://www.scientificamerican.com/article/computers-can-sense-sarcasm-yeah-right/

…our computers might be able to pick up on the irony some day.
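
The word-counting approach described in this comment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular sentiment tool works: the `POSITIVE` and `NEGATIVE` word lists are tiny and invented, and `naive_sentiment` is a hypothetical function name.

```python
import re

# Tiny invented word lists, for illustration only.
POSITIVE = {"advantage", "amazing", "great", "love"}
NEGATIVE = {"bad", "awful", "slow", "hate"}


def naive_sentiment(post):
    """Label a post by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", post.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


# A sarcastic complaint is still scored as positive,
# because only individual words are counted.
print(naive_sentiment("Amazing, my phone died after one hour. Great job."))
```

Because only individual words are counted, the sarcastic complaint comes out as "positive", which is the limitation the comment points to.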

itsukikurosagi says:

September 14, 2016 at 6:36 am

I think you could link parameterisation to the idea that parameterisation technically doesn’t exist with unstructured data. When I think about it, the point of unstructured data is that it is beautifully vast. (And trust me, I know how this sounds really up in the air and artsy.) But parameterisation doesn’t exist because of the specificity and individualism of each set of data, and the more I consider both topics together, the more I see it as a metaphor for individuals – there are major similarities between each set, but there’s always that one tiny thing that makes it different from the others.

sharmainesie says:

September 14, 2016 at 5:48 pm

Great example that illustrates the problem and value of unstructured data! I believe that technical difficulties have played a big part in shaping how we have been structuring and understanding our data for the past few decades. Having a whole chunk of text is simply not an efficient way to analyze data. This is quite an interesting project that analyzes your tweets and aggregates them to gauge your personality, social styles and emotions: http://www.analyzewords.com/ It is one way of structuring unstructured data to give insights!

I think natural language processing is going to be quite a game changer for text analysis as well. Perhaps we could look into what kind of parameters they use? https://www.wired.com/2015/06/ais-next-frontier-machines-understand-language/

Miss Despoinas (@Miss_Despoinas) says:

September 22, 2016 at 12:54 am

http://www.nltk.org/ comprehensive documentation of NLTK

NM3213 Digital Humanities (Hilary's blog)
