CONTEXT AND TEXTUAL ANALYTICS
By W H Inmon
For years computer technology has focused on structured, transaction-based data. Whole operating systems and whole database management systems have been built around the efficient processing of highly repetitive data.
Lost in this infrastructure has been the ability to process text. Text is not highly repetitive. Text is classically unstructured. As such, computer technology has been a misfit for handling textual data.
AN IRONY
The irony is that there is great business value in handling text. The customer expresses his/her voice through text. And hearing the voice of the customer is one of the greatest – if not the greatest – undertakings that a corporation can address. Once the corporation hears the voice of the customer, the corporation is in a position of great strength. Management can make decisions with great accuracy and insight. Without hearing the voice of the customer, management makes decisions by educated guesses. The problem with educated guesses is that they are often wrong. And being wrong costs the corporation both direct revenue and market share, the very things that corporations hold dearest.
PROCESSING TEXT MANUALLY
It has long been recognized that computers do not process text efficiently, or even well at all.
In the past, the inefficiencies of the computer in processing text led people to try to process text manually. While it is possible to process text manually, there are some basic problems –
Humans need to be trained
Humans are expensive
Humans get sick, go on vacation, and have to go to the bathroom
The human mind goes numb when trying to process many, many records
Humans can concentrate for only so long, and they make mistakes
Organizations quickly discovered that although humans can process text manually, doing so is expensive and ineffective when there is a large amount of text.
PROCESSING TEXT AUTOMATICALLY
There have been several attempts to overcome the computer’s inability to process text. There have been blobs. There have been comments fields in a relational table. There has been Soundex. There has been tagging.
All of these attempts to make a structured computer do unstructured processing have addressed some aspect of the problem. But in truth these measures have been, at most, partially successful.
SENTIMENT ANALYSIS
More recently there has been sentiment analysis. In sentiment analysis the tone of the text is captured. And with sentiment analysis the computer can process far more text than people can handle manually.
Sentiment analysis makes use of taxonomies and ontologies. The taxonomies are used to classify the text according to sentiment. The taxonomies “take the temperature” of the text that is being examined.
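As a rough illustration of how a taxonomy can “take the temperature” of text, the following minimal sketch classifies a comment by counting the taxonomy terms it contains. The taxonomy contents and the function name are assumptions made for this example only; a production taxonomy would be far larger and would be tested against real text.

```python
import re

# Hypothetical mini-taxonomy: each sentiment class lists terms that signal it.
SENTIMENT_TAXONOMY = {
    "positive": {"love", "great", "delicious", "friendly"},
    "negative": {"hate", "cold", "rude", "slow"},
}


def classify_sentiment(text):
    """Return 'positive', 'negative', or 'neutral' by counting taxonomy terms."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = {
        label: sum(word in terms for word in words)
        for label, terms in SENTIMENT_TAXONOMY.items()
    }
    if scores["positive"] > scores["negative"]:
        return "positive"
    if scores["negative"] > scores["positive"]:
        return "negative"
    return "neutral"
```

A call such as classify_sentiment("The staff was rude and the soup was cold") yields a negative classification, but it says nothing about why the customer felt that way.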
Finding and testing taxonomies is important and is a really positive step forward in processing text by a computer. Finding the temperament of the customer is a very important thing to do.
But sentiment analysis and taxonomical resolution do not go far enough.
CONTEXTUAL ANALYSIS
The missing ingredient in sentiment analysis is context.
The context of text in many ways is as important or even more important than the text itself. The context of text allows the text itself to be interpreted properly. Stated differently, text without context cannot be properly interpreted. And without proper interpretation the messages contained in text cannot be properly understood.
AN EXAMPLE
To illustrate the value of context, consider the following simple example –
Two gentlemen are standing on a street corner and a young lady walks by. One of them says, “She’s hot.”
Now what is meant by “she’s hot”?
One interpretation is that the young lady is attractive and the gentleman is expressing a desire to have a date with her. In this case “she’s hot” means that the lady is attractive.
Another interpretation is that it is Houston, Texas on a July day and the temperature is 95 degrees and the humidity is 100%. The lady is covered with sweat. She is physically hot.
Another interpretation is that the gentlemen are in a hospital and they are doctors. One doctor has just taken the temperature of the lady and she has a temperature of 104 degrees. She is internally burning up. The doctor is saying that the lady is sick.
So the words “she’s hot” cannot be interpreted properly without context. And there is NOTHING in the words “she’s hot” that tells you what the words mean.
THE NEED FOR CONTEXT EVERYWHERE
There is nothing unique about the words “she’s hot”. The need to understand context is there for ALL words.
Stated differently, it is one thing to understand the sentiment that is expressed. It is an entirely different thing to understand the context of the sentiment that has been expressed.
To focus on the difference between understanding sentiment and understanding the context of that sentiment, consider the following statement –
I don’t like enchiladas.
Sentiment analysis tells you that the author does not like enchiladas.
Now go one step further and consider the full statement –
I don’t like enchiladas because they have too much cheese
Contextual analysis tells you that the author doesn’t like enchiladas because they have too much cheese.
Which of these two assessments of the text is more useful? The answer, of course, is that sentiment analysis plus contextual analysis is much more meaningful than sentiment analysis by itself.
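The enchilada statements can be used to sketch the difference in code. The example below is a deliberately narrow illustration, not a general technique: it assumes statements of the form “I like X” or “I don’t like X”, optionally followed by “because ...”, and the pattern and function name are invented for this example.

```python
import re

# Assumed sentence shape: "I (don't) like <subject> [because <reason>]".
# Both straight and curly apostrophes are accepted.
STATEMENT = re.compile(
    r"^I (?P<verb>don['’]t like|like) (?P<subject>.+?)"
    r"(?: because (?P<reason>.+?))?\.?$",
    re.IGNORECASE,
)


def analyze_statement(statement):
    """Split a statement into sentiment, subject, and (if present) reason."""
    match = STATEMENT.match(statement.strip())
    if match is None:
        return None  # the statement does not fit the assumed shape
    negative = match.group("verb").lower().startswith("don")
    return {
        "sentiment": "negative" if negative else "positive",
        "subject": match.group("subject"),
        "reason": match.group("reason"),  # None when no context is given
    }
```

For the short statement the reason comes back empty, so only the sentiment is known. For the full statement the reason field holds “they have too much cheese”, which is the part management can act on.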
So let’s take some examples that show the power of contextual analysis coupled with sentiment analysis.
ANALYZING RESTAURANT CUSTOMER FEEDBACK
Let’s start with the analysis of restaurant customer feedback. Suppose there is a restaurant chain that tries to listen to its customers. On a monthly basis the restaurant chain gets feedback on many aspects of the operation of the chain. There are far too many customer feedback comments to process manually. So the restaurant chain processes the customer feedback comments with contextual analysis.
The first step in analyzing the voice of the customer is to ask what is on the mind of the customer in their feedback. Fig 1 shows this analysis –
In this analysis the analyst has looked at the raw mentions of products that are on the menu. There is no sentiment expressed at this point.
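A count of raw mentions like this can be sketched in a few lines. The menu items, the function name, and the sample comments in the usage note are hypothetical; the sketch simply counts how many comments mention each item, with no sentiment attached.

```python
from collections import Counter

# Hypothetical menu; a real chain would load this from its own product list.
MENU_ITEMS = ["pad thai", "spring rolls", "green curry"]


def count_mentions(comments, menu_items=MENU_ITEMS):
    """Count how many comments mention each menu item at least once."""
    counts = Counter()
    for comment in comments:
        lowered = comment.lower()
        for item in menu_items:
            if item in lowered:
                counts[item] += 1
    return counts
```

Given a list of comment strings, count_mentions(comments).most_common() ranks the menu items by how often customers bring them up.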
The next step is to look at the sentiment that has been expressed about the items on the menus. Fig 2 shows this analysis –
In this analysis one item, pad thai, has been selected. It is seen that the comments about pad thai are very negative. Customers have expressed significant negative sentiment about pad thai.
The question now becomes: why have customers expressed negative sentiment about pad thai? There could be any number of reasons. The pad thai was too spicy. The pad thai was too cold (or too hot). The pad thai was not cooked enough.
While it is important to know that the customer was upset by pad thai, it is even more important to know WHY they were dissatisfied with pad thai. It is contextualization that allows this level of analysis to occur.
Fig 3 shows the contextual analysis of the dissatisfaction with pad thai –
The contextual analysis of the comments about pad thai is broken into two general categories – positive comments and negative comments. In the negative comments category it is seen that the single biggest factor for complaints was portion size. People wanted to have larger portions when they placed an order.
The reasons for complaints could have been many other factors. Sentiment analysis has told management that customers are unhappy with pad thai. Contextual analysis has shown WHY customers are unhappy with pad thai. Now management knows exactly what it needs to do to make customers pleased with the food and service they get at the restaurant chain.
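One way to picture this breakdown is as a second taxonomy, this time of reasons rather than sentiments. The sketch below is an assumption-laden illustration: the reason categories, trigger phrases, and function name are all invented here, and simple substring matching stands in for whatever the real contextual analysis does.

```python
from collections import Counter

# Hypothetical reason taxonomy: category -> phrases that indicate it.
REASON_TAXONOMY = {
    "portion size": ["small portion", "tiny", "not enough food", "larger portion"],
    "temperature": ["cold", "lukewarm"],
    "spiciness": ["too spicy", "bland"],
}


def reasons_for(item, negative_comments):
    """Tally the reasons given in negative comments that mention `item`."""
    tally = Counter()
    for comment in negative_comments:
        lowered = comment.lower()
        if item not in lowered:
            continue  # comment is about something else
        for reason, phrases in REASON_TAXONOMY.items():
            if any(phrase in lowered for phrase in phrases):
                tally[reason] += 1
    return tally.most_common()  # most frequent reason first
```

Calling reasons_for("pad thai", negative_comments) returns the complaint categories ranked by frequency, which is the kind of list that would put portion size at the top.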
Contextual analysis is hardly limited to restaurants. Contextual analysis applies anywhere there is text. It is as inherent to text as water is to fish.
A BANKING EXAMPLE
As another example of contextual analysis, consider a study of banks and banking customers. The study used banking customer feedback from the Internet. The feedback on the Internet is publicly available and as such there are no security requirements for the data.
The banks that were analyzed were Bank of America, US Bank, BB&T and Chase Bank. These banks were chosen as being representative of the banking community.
The study was done for current banking customer feedback. For the most part the feedback was of the negative variety. For this reason sentiment – for the most part – is a moot point. But what is of interest is the reason for the sentiment. The reason for the sentiment is expressed by the context found in each of the customer feedback messages.
Fig 4 shows the study that was done –
The dashboard shows the synopsis of the banking comments found on the Internet. At the top of the spreadsheet is the symbol used for each of the banks.
The data found in column 1 is the information found in the feedback ranked by order of number of occurrences.
A glance at this column shows what is on the minds of the people who post complaints on the Internet.
However, the data in column 1 can be analyzed in more detail. An item in column 1 can be selected to see the reasons related to that item.
Fig 5 shows that bank fees are selected –
Fig 5 shows that in level 2 there is a breakdown of exactly what people have to say about bank fees. In other words, what is it about bank fees that is on people’s minds as they make a complaint? The ability to do the drill down is a result of the contextual analysis that has been done.
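The drill down can be thought of as a two-level index built from the output of contextual analysis. The following sketch assumes each complaint has already been tagged with a topic and a more specific detail; the tagged data, topic names, and function names are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict

# Hypothetical output of contextual analysis: one (topic, detail) pair per complaint.
TAGGED_COMPLAINTS = [
    ("fees", "overdraft fee"),
    ("fees", "overdraft fee"),
    ("fees", "monthly maintenance fee"),
    ("customer service", "long hold time"),
    ("fees", "atm fee"),
    ("customer service", "long hold time"),
]


def build_index(tagged):
    """Group detail counts under each topic."""
    index = defaultdict(Counter)
    for topic, detail in tagged:
        index[topic][detail] += 1
    return index


def top_level(index):
    """Level 1: topics ranked by number of occurrences."""
    totals = [(topic, sum(details.values())) for topic, details in index.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


def drill_down(index, topic):
    """Level 2: details under one selected topic, most frequent first."""
    return index.get(topic, Counter()).most_common()
```

Here top_level(build_index(TAGGED_COMPLAINTS)) plays the role of column 1, and drill_down(index, "fees") plays the role of selecting bank fees to see what lies underneath.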
But further drill down is possible. Suppose you wanted to select the top category – fees – and find out exactly what it is about fees that upset people. Fig 6 shows the ability to do further drill down –
So it is seen that looking at sentiment is not enough. Looking at the reasons behind the sentiment is where insight lies. And it is context and contextual analysis that provide this foundation.







