
From keyword research to topic modelling and then concept modelling

Keywords are very powerful in today’s digital landscape. We all love us some keywords, right? The monthly search volume and competitiveness of keywords guide our product, brand, content and advertising strategy. Many blog posts have been written on keyword research strategy, and these strategies rely on tools such as Google Keyword Planner, Ahrefs, AnswerThePublic, Moz Keyword Explorer, Google Trends and other leading tools.

Google Trends is a great tool from Google for gauging search interest in keywords, topics and entities over time. A quick search for London Marathon shows it both as a search term and as a topic. Whilst the search term covers searches for ‘London Marathon’ carried out by users, London Marathon as a topic is a collection of London Marathon-related search terms and engagement with London Marathon-related publications. Google Trends does not provide exact or bucketed search volumes; instead, its 0-100 index shows the relative popularity of a search term over a time range, with 100 marking peak popularity.
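To make the relative index concrete, here is a minimal sketch of a Trends-style score. According to Google’s Trends help pages, each data point is divided by the total searches for its region and time range, and the results are scaled so the highest point becomes 100. Google does not publish the raw counts, so the function name `trends_style_index` and all the numbers below are hypothetical illustrations, not Google’s actual data or code.

```python
def trends_style_index(term_searches: list[int], total_searches: list[int]) -> list[int]:
    """Hypothetical sketch: turn raw counts into a 0-100 index relative to the peak."""
    if len(term_searches) != len(total_searches):
        raise ValueError("both series must cover the same time slots")
    # Step 1: the term's share of all searches in each time slot.
    shares = [t / total if total else 0.0 for t, total in zip(term_searches, total_searches)]
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0] * len(shares)
    # Step 2: scale so the peak share maps to 100 and round to whole numbers.
    return [round(100 * share / peak) for share in shares]


if __name__ == "__main__":
    # Hypothetical weekly counts around a marathon weekend (not real data).
    term = [200, 400, 900, 300]
    total = [10000, 10000, 12000, 10000]
    print(trends_style_index(term, total))
```

Because every value is relative to the peak, the index tells you when interest was highest, but not how many people searched, which is why Keyword Planner is still needed for volume estimates.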

Google Keyword Planner gives an indication of the monthly search volume for London Marathon, so a charity that wants to target runners looking for fundraising places can plan its advertising campaign accordingly.

The screenshot below shows the search volume for London Marathon as a keyword.

Topic Modelling:

Topic modelling is a natural language processing technique for discovering hidden topics in a large corpus of documents. Gensim, a Python library, is widely used for topic modelling.

Latent Dirichlet Allocation (LDA) is a popular generative statistical model that explains observed data through unobserved groups, which account for why some parts of the data are similar. It assumes that each document is a mixture of topics and each topic is a distribution over words; every word in a document is then allocated to one of that document’s topics. LDA is also used in document classification, sentiment analysis and bioinformatics. It is a bag-of-words model (word order is ignored) with two hyperparameters, alpha and beta:

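Alpha: prior on the per-document topic distribution

Beta: prior on the per-topic word distribution

High Alpha: each document is likely to contain a mixture of most of the topics

Low Alpha: each document is likely to contain only a few of the topics

High Beta: each topic is likely to contain a mixture of most of the words

Low Beta: each topic is likely to contain only a few of the words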

From the above, one can infer that:

High Alpha: documents appear more similar to one another, because their topic mixtures are similar

High Beta: topics appear more similar to one another, because their word distributions overlap
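To show where alpha and beta actually enter the model, here is a minimal, dependency-free sketch of LDA fitted with collapsed Gibbs sampling. Note that this is a swapped-in technique: Gensim’s `LdaModel` uses online variational Bayes instead and calls the beta prior `eta`. The function name `lda_gibbs`, the toy corpus and the default values are illustrative assumptions, not a production implementation.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists; only word counts matter (bag of words).
    Returns (doc_topic, topic_word, vocab) as smoothed probability estimates.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab, topics = len(vocab), range(num_topics)

    n_dk = [[0] * num_topics for _ in docs]        # topic counts per document
    n_kw = [[0] * n_vocab for _ in topics]         # word counts per topic
    n_k = [0] * num_topics                         # total tokens per topic
    z = []                                         # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # p(topic t | everything else) is proportional to
                # (doc-topic count + alpha) * (topic-word count + beta) / (topic size + V * beta)
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + n_vocab * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed estimates: alpha smooths document mixtures, beta smooths topic vocabularies.
    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[t][v] + beta) / (n_k[t] + n_vocab * beta) for v in range(n_vocab)]
        for t in topics
    ]
    return doc_topic, topic_word, vocab


if __name__ == "__main__":
    docs = [
        "marathon runners race london course finish".split(),
        "runners finish marathon time record london".split(),
        "keyword search volume advertising campaign planner".split(),
        "search keyword volume planner advertising trends".split(),
    ]
    doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
    for k, dist in enumerate(topic_word):
        top = sorted(range(len(vocab)), key=lambda v: dist[v], reverse=True)[:4]
        print(f"topic {k}:", [vocab[v] for v in top])
    for d, dist in enumerate(doc_topic):
        print(f"doc {d}:", [round(p, 2) for p in dist])
```

With a small alpha, each document’s mixture can be dominated by a single topic. Re-running with a large alpha (for example 100) pulls every document’s mixture towards uniform, which is the “High Alpha: documents appear more similar” effect; a large beta does the same to the topics’ word distributions.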

Google Trends is a great platform for discovering topics across many industries. A quick search below for the term London Marathon reveals trends for both the search term and the topic.

Whilst the ‘London Marathon’ search term reveals the popularity of that keyword, London Marathon as a topic includes the search term, its synonyms and the volume of people reading about the marathon. The topic therefore gives a fuller picture of people’s interest in the London Marathon. We can also analyse an article about the London Marathon for the topics and subtopics it covers.

The New York Times published a great article about the 2019 London Marathon.

Concept Modelling:

Concepts are important aspects of human cognition, and previous work has shown that concepts are products of relational thinking and relational reasoning. Concepts in the first category are thought to require lower-order cognitive ability, while the more advanced form is viewed as higher-order. As we’ve covered in a previous article, relational thinking generates lower-order concepts that tend to be limited to the inherent relations of an entity. Schema markup highlights the inherent relations that are apparent in most entities.

A quick look at Google’s Knowledge Graph for London Marathon shows that the inherent relations of this event include a location (London), a date (April), course record holders and the date it was established. These relations, shown below, help us better understand the concept of a marathon.
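Schema markup is one way to make inherent relations like these machine-readable. The sketch below builds schema.org JSON-LD for the 2019 race using the `SportsEvent`, `Place`, `Person` and `Country` types; the function names and the choice of properties are illustrative assumptions, not a reproduction of how Google’s Knowledge Graph stores the event.

```python
import json


def london_marathon_markup() -> dict:
    """Illustrative schema.org JSON-LD for the 2019 London Marathon."""
    return {
        "@context": "https://schema.org",
        "@type": "SportsEvent",
        "name": "2019 London Marathon",
        "startDate": "2019-04-28",
        # Inherent relation: where the event takes place.
        "location": {"@type": "Place", "name": "London"},
        # Inherent relation: who competed (here, the men's winner and his nationality).
        "competitor": {
            "@type": "Person",
            "name": "Eliud Kipchoge",
            "nationality": {"@type": "Country", "name": "Kenya"},
        },
    }


def as_script_tag(markup: dict) -> str:
    """Wrap the markup in the <script> element used to embed JSON-LD in a web page."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(markup, indent=2)
            + "\n</script>")


if __name__ == "__main__":
    print(as_script_tag(london_marathon_markup()))
```

Each nested object is an entity in its own right, so the markup states the relations (event to place, event to competitor, competitor to country) explicitly rather than leaving them implicit in prose.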

I used the AllenNLP fine-grained named entity recognition tool to visualise the properties in the London Marathon article. It highlights that the concept of the London Marathon embodies properties such as London (GPE, geopolitical entity), Sunday (date), Eliud Kipchoge (person and winner), Kenya (the winner’s country of origin), 2 hours, 2 minutes and 37 seconds (finish time), 43,000 (total number of participants) and Tower Bridge (located at mile 12). This sample text gives us a good understanding of the London Marathon. It also indicates a relationship between Kenya and the London Marathon, or marathons in general. Finally, it establishes the competitive nature of a marathon through ordinals and cardinals such as ‘second-fastest time’ and ‘some 43,000’, which portray the sporting nature of the event.

Surprisingly, a search for London Marathon on Google Trends also surfaced the related topics and breakout queries below. I wondered about the connection between the London Marathon and Lauren London. On a literal level, Lauren London was the partner of the rapper Nipsey Hussle, and his store was called The Marathon Store.

This article draws a connection between the London Marathon and Nipsey Hussle, stating that London shares three numbers with Nipsey in the base cyphers. Setting aside the numerological connection between the London Marathon, Nipsey Hussle, Lauren London and The Marathon Store, there is also a conceptual link: running the London Marathon can be inspired by the loss of a relative, and sales at Nipsey’s Marathon Store skyrocketed to about $10 million after his passing. Both entities, the London Marathon (event) and The Marathon Store (organisation), are fuelled by goodwill, so one can place them in the same semantic space at both the literal and the conceptual level.
