From keyword research to topic modelling and then concept modelling

Keywords are very powerful in today’s digital landscape. We all love us some keywords, right? The monthly search volume and competitiveness of keywords guide our product, brand, content and advertising strategies. Many blogs have been written on keyword research strategy, and they usually lean on tools such as Google Keyword Planner, Ahrefs, AnswerThePublic, Moz Keyword Explorer, Google Trends and other leading tools.

Google Trends is a great tool for gauging search interest in keywords, topics and entities over time. A quick search for London Marathon returns it both as a search term and as a topic. The search term covers the ‘London Marathon’ queries users actually type, whereas London Marathon as a topic groups together related search terms and engagement with London Marathon publications. Google Trends does not provide exact or bucketed search volumes. Instead, it uses an index from 0 to 100 to show the relative popularity of a search term over a time range, with 100 marking the peak.
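Google describes Trends data as sampled searches in which each data point is divided by the total searches for its geography and time range, and the result is then scaled so that the highest point equals 100. The minimal sketch below reproduces only that scaling step on invented numbers; Google does not publish the raw counts, and the function name `trends_style_index` is hypothetical.

```python
def trends_style_index(term_searches, total_searches):
    """Scale a series to a 0-100 index, Google Trends style (simplified sketch).

    term_searches: searches for one term in each time period (invented data)
    total_searches: all searches in the same region for each period
    """
    # Step 1: normalise by total search volume so busier periods are not favoured.
    shares = [term / total for term, total in zip(term_searches, total_searches)]
    # Step 2: rescale so the peak share maps to 100.
    peak = max(shares)
    return [round(100 * share / peak) for share in shares]


# Invented monthly counts around a marathon weekend.
print(trends_style_index([50, 200, 100], [1000, 1000, 1000]))  # [25, 100, 50]
# Same term count, but twice the total searches in the second period.
print(trends_style_index([100, 100], [1000, 2000]))  # [100, 50]
```

The second call shows why the normalisation matters: an unchanged number of searches scores lower when overall search activity doubles.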

Google Keyword Planner gives an indication of the monthly search volume for London Marathon, so a charity hoping to reach runners looking for fundraising places can plan its advertising campaign accordingly.

The screenshot below highlights the search volume for London Marathon as a keyword.

Topic Modelling:

Topic modelling is a natural language processing technique used to discover, or extract, hidden topics from a large corpus. Gensim, a Python library, is a popular choice for topic modelling.

Latent Dirichlet Allocation (LDA) is a popular generative statistical model for topic modelling. It assumes that each document is a mixture of topics and that each topic is a distribution over words; every word in a document is then allocated to one of the topics. LDA has also been applied to document classification, sentiment analysis and bioinformatics. It is a bag-of-words model with two hyperparameters, alpha and beta:

Alpha: the prior on the per-document topic distribution

Beta: the prior on the per-topic word distribution

High Alpha: each document contains a mixture of most of the topics

Low Alpha: each document contains only a few of the topics

High Beta: each topic contains a mixture of most of the words

Low Beta: each topic contains only a few of the words

From these scenarios, one can conclude that:

High Alpha: documents appear more similar to one another

High Beta: topics appear more similar to one another
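To see what alpha does, you can draw topic mixtures from a symmetric Dirichlet distribution, which is the prior LDA places on each document's topic proportions (beta plays the same role for each topic's word distribution). The minimal sketch below uses only Python's standard library and builds Dirichlet samples from gamma draws. The number of topics, the alpha values and the seed are arbitrary choices for illustration. A higher average entropy means a document's probability mass is spread over more topics.

```python
import math
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one k-dimensional sample from a symmetric Dirichlet(alpha)."""
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws normalised to sum to 1.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def entropy(p):
    """Shannon entropy in nats; higher means the mass is spread more evenly."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mean_topic_entropy(alpha, k=5, samples=2000, seed=42):
    """Average entropy of simulated per-document topic mixtures over k topics."""
    rng = random.Random(seed)
    return sum(entropy(sample_dirichlet(alpha, k, rng)) for _ in range(samples)) / samples


for alpha in (0.1, 1.0, 10.0):
    print(alpha, round(mean_topic_entropy(alpha), 3))
```

With a low alpha, most sampled documents concentrate on one or two topics; with a high alpha, the average entropy approaches its maximum of ln(5), about 1.609, meaning each document mixes most of the topics.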

Google Trends is also a great platform for discovering topics across industries. Searching for London Marathon, for example, reveals trends for both the search term and the topic.

While the ‘London Marathon’ search term shows the popularity of that exact keyword, London Marathon as a topic also includes related search terms, synonyms and the number of people reading about the marathon. This gives a clearer picture of people’s interest in the London Marathon. We can also analyse a London Marathon article for the topics or subtopics it contains.
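Gensim's `LdaModel` is the usual route for this in Python, but the core idea can be sketched without any library. The block below is a minimal collapsed Gibbs sampler for LDA, a standard inference technique (Gensim itself uses online variational Bayes rather than Gibbs sampling). Treating each paragraph of an article as a document is an assumption, and the toy paragraphs, topic count and hyperparameter values are invented for illustration.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of tokenised documents (lists of words).
    Returns (theta, topic_word): per-document topic proportions and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []

    # Start from a random topic for every word token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word


paragraphs = [  # invented toy "paragraphs"
    "kipchoge won the marathon in london",
    "runners crossed tower bridge during the marathon",
    "charity runners raised money for the marathon",
    "the restaurant served a romantic dinner",
    "couples booked dinner for their anniversary",
]
docs = [p.split() for p in paragraphs]
theta, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top_words = sorted(counts, key=counts.get, reverse=True)[:4]
    print(f"topic {k}: {top_words}")
```

On a corpus this small the topics are noisy; the point is the update rule, which combines how much a document likes a topic (alpha term) with how much a topic likes the word (beta term).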

The New York Times published a great article on the 2019 London Marathon.

Concept Modelling:

Concepts are an important aspect of human cognition, and previous work has shown that they are products of relational thinking and relational reasoning. Concepts formed through relational thinking are thought to require lower-order cognitive ability, while the more advanced concepts formed through relational reasoning are viewed as higher-order. As we’ve covered in a previous article, relational thinking generates lower-order concepts that tend to be limited to the inherent relations of an entity. Schema markup highlights the inherent relations that are apparent in most entities.

A quick look at Google’s Knowledge Graph for the London Marathon reveals the inherent relations of this event: a location (London), a date (April), course record holders and the date it was established. These relationships help us better comprehend the concept of a marathon.
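Schema markup is one place where these inherent relations are made explicit for search engines. The sketch below builds a schema.org `SportsEvent` as JSON-LD with Python's `json` module, using only relations named above: the event, its London location and an April date (28 April 2019, the Sunday of the 2019 race). It illustrates the structure only; it is not the markup the London Marathon actually publishes, and course records or the founding date would need further properties.

```python
import json

# Relations from the Knowledge Graph example: the event, where it happens and when.
london_marathon_2019 = {
    "@context": "https://schema.org",
    "@type": "SportsEvent",
    "name": "London Marathon 2019",
    "startDate": "2019-04-28",  # a Sunday in April
    "location": {
        "@type": "Place",
        "name": "London",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "London",
            "addressCountry": "GB",
        },
    },
}

json_ld = json.dumps(london_marathon_2019, indent=2)
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```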

I used the AllenNLP fine-grained named entity recognition tool to visualise the properties in the London Marathon article. It highlights that the concept of the London Marathon embodies properties such as London (GPE), Sunday (date), Eliud Kipchoge (person and winner), Kenya (the winner’s country of origin), 2 hours, 2 minutes and 37 seconds (finish time), 43,000 (total number of participants) and Tower Bridge (located at mile 12). This short text gives us a good understanding of the London Marathon and indicates a relationship between Kenya and the London Marathon, or marathons in general. It also conveys the competitive nature of a marathon through ordinal and cardinal expressions such as ‘second fastest time’ and ‘some 43,000’.

Surprisingly, a search for London Marathon on Google Trends also produced the related topics and breakout queries below. I wondered what connected the London Marathon to Lauren London. On a literal level, Lauren London was Nipsey Hussle’s partner, and his store was called The Marathon Store.

This article establishes a connection between the London Marathon and Nipsey Hussle, stating that ‘London’ shares three numbers with ‘Nipsey’ in the base cyphers. Setting aside the numerological links between the London Marathon, Nipsey Hussle, Lauren London and The Marathon Store, there is also a conceptual connection: a runner’s London Marathon journey is often inspired by the loss of a relative, and sales at Nipsey’s Marathon Store skyrocketed to about $10 million after his passing. Both entities, the London Marathon (event) and The Marathon Store (organisation), are fuelled by goodwill, so one can place them in the same semantic space on both the literal and the conceptual level.

Count-based and Predictive Vector Models in the Semantic Age

Computer applications such as search engines and dialogue systems process large volumes of text on a regular basis. These systems have progressed over the years thanks to better word embeddings, ongoing research and modern machine learning libraries. Image and audio systems work with dense, high-dimensional data encoded as vectors, such as the raw pixel intensities of an image. Text, by contrast, has traditionally been represented with sparse vectors, because each word is treated as a separate, discrete symbol.

Vector space models (VSMs) embed, or represent, words in a continuous vector space in which semantically similar words are mapped to nearby points. Representing words as unique, distinct IDs leads to data sparsity, so a very large amount of data would have to be collected to train statistical models effectively. Vector representations of words address this sparsity problem and improve semantic interpretation. When I searched for romantic dinner on Google, one of the ‘People also ask’ questions was ‘Where can I eat for anniversary?’ This shows the semantic similarity of ‘romantic’ and ‘anniversary’ in the context of food and dining. You might expect the vectors for these words to be far apart, but in context an anniversary is usually expected to be romantic, as it involves a couple celebrating a milestone in their relationship or marriage.
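The sparsity point can be made concrete with cosine similarity. In the minimal sketch below, one-hot IDs make every pair of distinct words equally unrelated, while dense vectors can place ‘romantic’ near ‘anniversary’. The three-dimensional vectors are hand-picked for illustration, not learned from data, and ‘spreadsheet’ is just an unrelated contrast word.

```python
import math


def cosine(u, v):
    """Cosine similarity: 1 for identical directions, 0 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vocab = ["romantic", "anniversary", "spreadsheet"]
# One-hot encoding: each word is its own dimension, so distinct words never overlap.
one_hot = {w: [1.0 if i == j else 0.0 for j in range(len(vocab))] for i, w in enumerate(vocab)}

# Hypothetical dense vectors, hand-picked for illustration only.
dense = {
    "romantic": [0.9, 0.8, 0.1],
    "anniversary": [0.8, 0.9, 0.2],
    "spreadsheet": [0.1, 0.0, 0.9],
}

print(cosine(one_hot["romantic"], one_hot["anniversary"]))  # 0.0
print(round(cosine(dense["romantic"], dense["anniversary"]), 2))
print(round(cosine(dense["romantic"], dense["spreadsheet"]), 2))
```

With one-hot IDs the similarity is always zero, so nothing learned about ‘romantic’ transfers to ‘anniversary’; dense vectors let that knowledge be shared.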

Words that appear in the same contexts or are semantically related lie close together in the vector space. There are two broad ways of learning word vectors from text: count-based methods and predictive methods.

The Count-based Method in Vector Representation

The first family of word embeddings is count-based. A count-based method represents a target word by the words that co-occur with it across many contexts, using some form of co-occurrence statistic. In other words, the meaning of a word is inferred from the words it keeps company with in a variety of situations.
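The counting step itself is simple. The sketch below tallies, for every target word, the words that appear within a fixed window around it; the window size of 2 and the example sentence are arbitrary choices for illustration, and real count-based models run this over millions of sentences.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window=2):
    """Map each target word to a Counter of words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target word itself
                counts[target][tokens[j]] += 1
    return counts


tokens = "the head coach was fired after the season".split()
counts = cooccurrence_counts(tokens)
print(dict(counts["coach"]))  # {'the': 1, 'head': 1, 'was': 1, 'fired': 1}
```

Each row of these counts is a sparse vector over the vocabulary, which is the raw material for the measures and reductions described below.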

A quick search for words that occur around the term ‘basketball coach’ on the Bleacher Report sports website revealed some unsurprising named entities and terms.

Terms describing the level of competition clearly co-occur with ‘basketball coach’: phrases such as Division 1 and high school illustrate this. Other related terms could be the NBA or the name of a professional team or school athletic club. I also ran an excerpt from a Bleacher Report article through the AllenNLP fine-grained named entity visualisation tool to find other related terms. Dates, the coach’s name and the sports team or institution can co-occur with ‘basketball head coach’ in the same sentence. Some terms highlight the hierarchical nature of the role, such as head, lead and assistant. With these roles come responsibility, reward and consequences, so it is not surprising to see terms such as firing and sacking, or remuneration-related phrases like ‘how much?’

Various techniques have been suggested for evaluating the strength of the co-occurrence between two words, including the log-likelihood ratio (LLR), the χ² measure and pointwise mutual information (PMI). The Oxford English Dictionary (Second Edition) is believed to contain about 171,476 words in current use and 47,156 obsolete words. With count-based vector representations, only a tiny fraction of this vocabulary, reportedly about 29 words, will co-occur with a given target word.
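Of the three measures, pointwise mutual information is the easiest to sketch: PMI(x, y) = log2 [p(x, y) / (p(x) p(y))], which is positive when two words co-occur more often than chance would predict and zero when they are independent. The sketch below estimates the probabilities from sentence-level co-occurrence in four invented sentences; real systems use large corpora and often a context window instead of whole sentences.

```python
import math
from collections import Counter
from itertools import combinations


def sentence_cooccurrence(sentences):
    """Count how many sentences contain each word and each unordered word pair."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        vocab = sorted(set(sentence.lower().split()))
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))  # pairs come out alphabetically sorted
    return word_counts, pair_counts, len(sentences)


def pmi(pair, word_counts, pair_counts, n):
    """PMI in bits; -inf if the two words never appear together."""
    x, y = sorted(pair)
    joint = pair_counts[(x, y)]
    if joint == 0:
        return float("-inf")
    p_xy = joint / n
    p_x, p_y = word_counts[x] / n, word_counts[y] / n
    return math.log2(p_xy / (p_x * p_y))


sentences = [
    "the coach was fired",
    "the head coach signed a contract",
    "the coach was hired",
    "the weather was sunny",
]
words, pairs, n = sentence_cooccurrence(sentences)
print(round(pmi(("coach", "fired"), words, pairs, n), 3))  # 0.415
print(pmi(("coach", "the"), words, pairs, n))  # 0.0
```

‘the’ appears in every sentence, so its co-occurrence with ‘coach’ carries no information and scores zero, while the rarer ‘fired’ scores positively.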

Count-based approaches are fast to train and make efficient use of corpus statistics. On the less positive side, they are mainly useful for capturing word similarity, and they give disproportionate importance to large counts. One of the most popular count-based methods is Latent Semantic Analysis (LSA), also known as Latent Semantic Indexing (LSI). LSA is built on a term-document co-occurrence matrix, and such a matrix is large and sparse and struggles to capture new words. To address this drawback, LSA applies a dimensionality reduction technique, Singular Value Decomposition (SVD), to the matrix as a post-processing step.
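The SVD step in LSA can be illustrated on a tiny term-document matrix. Rather than rely on NumPy, the sketch below uses power iteration to find only the leading singular value and its singular vectors, which is enough to show how SVD surfaces a latent dimension. The terms, documents and counts are invented, and practical LSA keeps many dimensions using a library routine such as NumPy's `linalg.svd` or scikit-learn's `TruncatedSVD`.

```python
import math
import random


def leading_singular_triplet(matrix, iterations=200, seed=0):
    """Return (sigma, u, v): the largest singular value and its singular vectors.

    Power iteration on A^T A: repeatedly multiply a vector by A, then by A^T,
    and normalise. It converges to the top right singular vector v.
    """
    rows, cols = len(matrix), len(matrix[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(cols)]  # positive random start
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # A v
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


terms = ["coach", "basketball", "recipe"]
# Rows are terms, columns are documents (two sports articles, one cooking article).
counts = [
    [2, 1, 0],
    [1, 2, 0],
    [0, 0, 1],
]
sigma, term_loadings, doc_loadings = leading_singular_triplet(counts)
print(round(sigma, 3))  # 3.0
print([round(x, 3) for x in doc_loadings])  # [0.707, 0.707, 0.0]
```

The leading dimension loads equally on the two sports documents and not at all on the cooking one, which is the kind of latent "topic" LSA recovers from raw counts.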

Prediction-based word vectorisation methods

Prediction-based approaches to word vectorisation tend to perform better across a variety of NLP tasks such as named entity recognition, machine translation and semantic role labelling. They typically produce lower-dimensional, dense word representations. Predictive methods include recurrent neural networks (RNNs), neural network language models (NLMs) and the popular Word2vec model developed by Tomas Mikolov and a group of Google researchers.

Word2vec has two model architectures, Continuous Bag of Words (CBOW) and Skip-gram, and two training methods, hierarchical softmax and negative sampling. CBOW predicts the centre word from its context words, while Skip-gram predicts the surrounding context words from the centre word. Each word has two vector representations: one used when it is the centre word and another used when it is a context word. Training optimises an objective that places words occurring in similar contexts close together. Prediction-based methods generally give improved performance on downstream tasks and can capture complex patterns beyond word similarity.
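The two architectures differ mainly in how training examples are built from a sentence. The sketch below generates Skip-gram pairs and CBOW examples from a token list and computes the negative-sampling loss for one pair, showing the separate centre and context vectors. It illustrates the objective only, with no training loop; the window size and example sentence are arbitrary, and libraries such as Gensim's `Word2Vec` handle real training.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Skip-gram training pairs: (centre word, one context word)."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW training examples: (list of context words, centre word)."""
    examples = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, centre))
    return examples


def negative_sampling_loss(v_centre, u_context, u_negatives):
    """Skip-gram negative-sampling loss for one (centre, context) pair.

    v_centre is the word's centre-word vector; u_context and u_negatives are
    context-word vectors, mirroring Word2vec's two vectors per word.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Reward a high score for the true context word...
    loss = -math.log(sigmoid(dot(u_context, v_centre)))
    # ...and low scores for randomly sampled "negative" words.
    for u_neg in u_negatives:
        loss -= math.log(sigmoid(-dot(u_neg, v_centre)))
    return loss


tokens = "where can i eat for anniversary".split()
print(skipgram_pairs(tokens, window=1)[:3])  # [('where', 'can'), ('can', 'where'), ('can', 'i')]
print(cbow_examples(tokens, window=1)[1])  # (['where', 'i'], 'can')
```

Skip-gram turns one position into several (centre, context) pairs, while CBOW turns it into a single example whose input is the whole context.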

On the downside, these algorithms make inefficient use of corpus statistics, and their training cost scales with corpus size.


Introducing the concept of relational intent

I have been reading research articles and theses on relational reasoning. It is a fascinating concept, deeply rooted in cognitive science, neuroscience and artificial intelligence. Humans are generally regarded as relational beings, as we constantly seek interaction and affection from others. The ability to discern meaningful patterns in a stream of data ensures we are not prisoners of our own senses. We use senses such as sight, smell, sound and touch to encode data every day. A small portion of this data takes on meaning when we find useful patterns, and these patterns enable us to understand concepts and take necessary actions.

Businesses around the world are preoccupied with the intent of their potential customers. An airline will be interested in customers who type terms such as ‘flight tickets from Rome to London’ or who like the Skyscanner Facebook page. These intents are rarely expressed in isolation. It is important to analyse the entity of a flight ticket and understand its attributes, and then to understand the relationships between those attributes. Some attributes of flight tickets are that they: are made of paper or card, contain a unique ticket number, are issued in exchange for cash or air miles, vary in price depending on seat class, cost more in peak periods and closer to departure, are presented before boarding, are printed and verified at stations or airports, and have designated or flexible dates. Understanding these attributes helps you dissect the relational nature of the intent of customers researching flight tickets.

Intent as a product of relational thinking

Every product or service a brand sells is a percept. Without relational thinking or reasoning, the numerous percepts humans are flooded with through advertising, branded placements, sponsorship or content marketing would remain separate entities and fail to influence human thought and action. The power of patterning transforms the percept of an airline ticket advert to Barcelona into a concept: a ticket to watch a Barcelona game for a football fan, or an opportunity for a culture traveller to visit the historic landmarks of the Catalan city.

In our earlier post, we discovered that relational thinking is spontaneous, less personal and shallower than its relational reasoning counterpart. In terms of customer intent, relational thinking can transform percepts into concepts when the item or service of interest is inexpensive or an everyday purchase. In the airline example, one might think relationally and spontaneously buy a cheap ticket from London to Barcelona. Purchasing a ticket from the UK to Australia, however, may call for relational reasoning because it is a more expensive purchase.

Intent as a product of relational reasoning

In the example of buying a flight ticket to Australia, relational reasoning may be needed because it is an expensive, long-haul experience. One has to dig deep into a personal narrative to justify the purchase, whether it is going to see loved ones or fulfilling a childhood dream. Most people need a deeper, personal narrative before buying a flight from London to Australia. Relational reasoning is considered a higher-order cognitive process. Our ability as humans to perceive patterns helps us understand the connection between an airline ticket, spending time with our loved ones in Australia and the happiness that visit will bring.

We’ve discovered that the higher-order form of thinking, relational reasoning, comes in four types. The first is analogical reasoning, which looks at similarities between a source and a target. In the airline example, we could draw similarities with buying a train ticket: both need to be paid for, usually have a departure gate or platform and a departure time, must be presented to gain entry and may come with allocated seats.

The second form is anomalous reasoning, which identifies a discrepancy, gap or deviation between the source and the target. Applied to the flight ticket example, you can compare a flight ticket to a concert ticket and discover a gap between the two entities. A concert ticket gives you entry to a venue but does not transport you anywhere, whereas a flight ticket gets you onto an aeroplane that takes you from one place to another.

The third type is antithetical reasoning, which identifies opposition between the source and the target: the target is essentially the opposite of the source. Let’s say you are marketing coffee and exploring the relational elements of the intent to drink coffee. The common sense reasoning toolkit reminds us of some common properties of coffee: it is served hot, tastes bitter and is good in the morning. As hot is the antithesis of cold, it is best to position your product and content as the opposite of cold, i.e. coffee is hot enough to keep you warm. Producing more coffee during winter or for colder climates is also an antithetical reasoning approach.

The final form of relational reasoning is antinomous reasoning, which identifies incompatibility between the source and the target. Chilli as a source is incompatible with coffee as a target, as people do not usually add chilli to their coffee. In the airline example, we can also detect an incompatibility in booking a flight more than a year in advance: people are generally only able to book flights for the next 12 months. Suppose you are an airline planning to advertise to music festival fans in a source destination. You are better off advertising to this audience about 12 months before the festival date rather than any earlier.

Customers all have intents when deciding to purchase your product or service. Their intent does not exist in isolation but exists in connection or relationship to other events.

10 key differences between relational thinking and relational reasoning

Our brains encounter billions of objects, or percepts, every day. This data comes in varying forms such as sound, smell and images. Our sensory organs, such as the eyes, nose, ears, tongue and skin, encode these signals, which are either transmitted to the brain or ignored. It is impossible to process every signal our senses come in contact with each day. All of these signals are known as percepts, and only a few of them become concepts. The entities or signals that become concepts are those with which we can develop a relational identity. For example, when you visit a sports store and see running shoes displayed in the far right corner, they are percepts. When you relate the shoes to running a 5k, 10k, half or full marathon, they become a concept. It takes relational thinking and reasoning to transform percepts into concepts, but there are differences between the two.

Differences between relational thinking and reasoning

1) Lower cost vs higher cost: Spontaneity tends to be attributed to relational thinking. Customers use relational thinking or reasoning before making any purchase, online or in store. For less expensive items, we are most likely to be relational thinkers. Suppose you see an ad on your Facebook feed for a local music concert and the ticket is cheap. You connect the ticket with music or dancing and, without building deeper connections with the event, click on the ad to complete your purchase. If the festival ticket were expensive, however, you would apply relational reasoning. You would link the ticket to music and build richer narratives, perhaps recalling the last time you attended a music festival, how you felt afterwards and what you would have to forgo to buy this ticket. These mental narratives help you decide whether to scroll past the ad or click to buy the ticket.

2) Earlier vs later concepts: Our senses are bombarded with innumerable percepts, and only a few of these entities capture our attention. Relational thinking links these percepts together into an initial, early form of concept and helps us draw meaning from it. Relational reasoning can then take these earlier concepts to an advanced level, giving them a deeper meaning.

3) Rudimentary vs refined concepts: Relational thinking deserves credit for piecing disparate percepts together into a meaningful concept. Although the concepts formed through relational thinking are considered rudimentary, they can still influence human thought or action. For example, you might see a pair of sneakers as a percept and connect them to basketball, forming the concept of basketball trainers. It is rudimentary but meaningful. When you then link the basketball trainers to the time you played basketball in high school, the concept takes on a more refined meaning.

4) Weaker vs stronger intent: User intent is always important in marketing. Digital advertising aims to send the right message to inspire an audience with strong intent to take action. By some estimates, people see about 5,000 ads a day, on video platforms like YouTube, search engines like Google and Bing, social media such as Facebook, and out-of-home placements on buses and tube platforms. Most ad impressions remain percepts, and only a few become lower- or higher-order concepts. When a product and an ad resonate with an audience, people link them to their personal experience, which strengthens their intent and makes them more likely to take action or talk about it.

5) Spontaneous vs intentional: In relational thinking, connections between entities are made in a fleeting, spontaneous manner. It is instinctive and requires little mental work. Imagine you arrive at the train station and the gates are shut. While wondering why, you notice a poster stand announcing a two-day strike. That was an instinctive, spontaneous connection: you linked the percept of a locked gate to the notice to form the concept of a workers’ strike. You can then intentionally dig deeper by recalling a newspaper article you read last night that explained the reason behind the strike.

6) Lower vs higher forms of cognitive thought and performance: In her article, Patricia Alexander emphasises that relational reasoning is a higher-order form of cognitive thought and performance. It is therefore safe to say that relational thinking is a lower-order form of cognition, where cognition refers to the acquisition of knowledge through the senses, reasoning and experience.

7) External vs internal: In relational thinking, we create external links between entities to form concepts. They are external because they do not require a personal connection or narrative. The workers’ strike example above illustrates how relational thinking is external and relational reasoning is internal: with relational reasoning, we draw a personal connection or link the entity to our own experience.

8) Unconscious vs conscious: We derive meaning from the objects or entities we encounter every day in either an unconscious or a conscious manner. In relational thinking, we instinctively draw meaning without realising we have connected entities. Seeing people queuing for coffee, we may unconsciously assume the shop is understaffed. In relational reasoning, we might ask someone in the queue whether they have been waiting long and what the reason for the long queue might be: rush hour, a group of tourists, a shortage of staff, a broken coffee machine or something else. Relational reasoning involves that conscious effort.

9) Effortless vs effortful: Relational thinking is considered effortless, as it tends to create light patterns between entities to form a concept. Relational reasoning, by contrast, requires more effort because of the deeper connections needed to form meaningful concepts.

10) Less emotional vs more emotional: As we’ve seen, relational thinking is superficial and lacks the deep connections that relational reasoning possesses. When we link entities to our personal experiences or narratives, our emotions are more likely to be triggered. The advertising campaigns that connect with us emotionally are the ones that bond with our personal stories or experiences.

It is a thin line that differentiates the intuitive relational thinking from the intentional relational reasoning. Harnessing the power of patterning is quite important and products that trigger our relational reasoning faculty are more likely to succeed.

The importance of fine-grained named entity recognition

Named entity recognition (NER) is usually viewed as a low-level NLP task, but it can be crucial to other tasks such as named entity disambiguation and linking. It is also relevant to information retrieval and question answering applications. Early research identified three core classes: person, location and organisation. At the 2003 Conference on Computational Natural Language Learning (CoNLL), a miscellaneous type was added to the mix, so the standard NER classes are person, location, organisation and miscellaneous. I used the AllenNLP demo application to run a quick NER test on the Hacksaw Ridge storyline. The text was extracted from the IMDb website, and the image below shows the entities.

The output below shows the four coarse-grained entity classes, with all four tags (person, organisation, location and miscellaneous) highlighted. Desmond T. Doss is the main character in the story, and the model accurately identifies him as a person. When only his surname is mentioned (Doss’s), it is also correctly tagged as a person. The miscellaneous tag is used for an event, the ‘Battle of Okinawa’, and for a thing, the ‘Congressional Medal of Honor’.

Fine-grained named entity recognition

Further research introduced additional entity types such as geopolitical entities, weapons, vehicles and facilities. These are discussed in the article “An empirical study on fine-grained named entity recognition”, whose authors note that the main challenges in developing a fine-grained entity recognizer are selecting the tag set, creating training data and building a fast and accurate multi-class labelling algorithm.

Using AllenNLP, I then ran a fine-grained entity recognition model. The phrase ‘the Congressional Medal of Honor’, tagged as miscellaneous in the standard NER (named entity recognition) task, receives a different tag in the fine-grained version: ‘Work of art’, which carries more meaning than a miscellaneous tag.

fine-grained named entity recognition

Previous research on fine-grained named entity recognition has produced more in-depth tags by dividing the main tags into sub-tags that give the entities more meaning. For example, the ‘Person’ entity is broken down into sub-categories such as actor, architect, artist, athlete, author, coach, director, doctor, engineer, monarch, politician, religious leader, soldier and terrorist.

The popular Python NLP library spaCy also has a named entity recognition feature. Its supported tags include PERSON, NORP (nationalities or religious or political groups), FAC (buildings, airports, highways, bridges, etc.), GPE (countries, cities, states) and many more. A fine-grained named entity recognition application or library could be instrumental in narrative intelligence and relational reasoning: the more fine-grained the entities in a narrative, the richer the story becomes and the better it can support strong relational reasoning.
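The information lost by coarse tags can be shown with a small mapping. The sketch below collapses OntoNotes-style labels, the scheme behind spaCy's English tags such as PERSON, NORP, FAC and GPE, into the four CoNLL-2003 classes. The mapping table is a hypothetical simplification written for this example (spaCy does not ship it), and the entity list reuses the Hacksaw Ridge examples above with tags entered by hand rather than produced by a model.

```python
# Hypothetical mapping from fine-grained OntoNotes-style labels to CoNLL-2003 classes.
FINE_TO_COARSE = {
    "PERSON": "PER",
    "ORG": "ORG",
    "GPE": "LOC",
    "LOC": "LOC",
    "FAC": "LOC",
    "NORP": "MISC",
    "EVENT": "MISC",
    "WORK_OF_ART": "MISC",
    "PRODUCT": "MISC",
    "LAW": "MISC",
    "LANGUAGE": "MISC",
}


def to_coarse(entities):
    """Collapse (text, fine_label) pairs to CoNLL classes.

    CoNLL-2003 has no class for dates, times, money or numbers,
    so labels missing from the mapping are dropped.
    """
    return [(text, FINE_TO_COARSE[label]) for text, label in entities if label in FINE_TO_COARSE]


# Fine-grained tags entered by hand for the Hacksaw Ridge examples.
fine = [
    ("Desmond T. Doss", "PERSON"),
    ("Doss", "PERSON"),
    ("the Battle of Okinawa", "EVENT"),
    ("the Congressional Medal of Honor", "WORK_OF_ART"),
]
coarse = to_coarse(fine)
print(sorted({label for _, label in fine}))  # ['EVENT', 'PERSON', 'WORK_OF_ART']
print(sorted({label for _, label in coarse}))  # ['MISC', 'PER']
```

After collapsing, the battle and the medal share the uninformative MISC label, which is exactly the distinction the fine-grained tags preserve. With spaCy installed, the `(text, label)` pairs could instead come from `[(ent.text, ent.label_) for ent in nlp(text).ents]`.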


Relational reasoning as the basis for human intelligence

A blog post and research paper from Google DeepMind brought relational reasoning to our attention. As humans, we have the innate ability to connect the dots, plotting a narrative from pieces of information to decide whether to run a search, make a purchase or predict the outcome of a movie. Artificial agents have yet to attain this creative human ability to connect entities through a narrative that leads to action. A few days ago, my wife told me an interesting story. She ran into a friend at the entrance of a shop who had just grabbed a drink and some popcorn. She asked where he was headed and he said West India Quay, a place in east London with a handful of restaurants, bars and a cinema. She pieced together the popcorn, the drink and West India Quay and asked whether he was going to the cinema. He was surprised and confirmed that he was. She then predicted that he had a monthly Cineworld membership, and he was quite shocked as he nodded. She told me that her own experience of buying popcorn and a drink, and of choosing the West India Quay Cineworld with her Meetup movie mates, helped her build a relationship and narrative from the little information she had and correctly predict her friend’s intention. This is relational reasoning at work, as the DeepMind team put it: “We carve our world into relations between things. And we understand how the world works through our capacity to draw logical conclusions about how these different things – such as physical objects, sentences, or even abstract ideas – are related to one another.”
