Rethinking keyword clustering through a semantic and cognitive lens

Keyword clustering remains an important task for search marketers: it is key to driving organic traffic, improving rankings and enhancing a website's authority on a given topic.

There are a handful of great, practical tools for clustering keywords, such as Keywordsinsights.io, SurferSEO and Inlinks, as well as incumbent greats such as Semrush and Ahrefs. For clustering People Also Ask (PAA) questions, the likes of AlsoAsked and AnswerThePublic come in handy. The former clusters PAA questions taken from the SERP, while the latter uses Google Autosuggest and has since added PAA data to its collection.

These tools, and a few others not mentioned here, are the conventional choices for keyword clustering; they have delivered ranking benefits and served as an inspiring source of content ideas for most search marketers. But I have a problem with the approaches and logic these tools use. They tend to pigeonhole human search behaviour into a lexical, linear route that ignores how searchers think, reason and act. They focus on clustering by word co-occurrence and the cosine similarity of lexical representations, and so often lack the semantic and sequential depth that characterises human cognition and behaviour.
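To make the critique concrete, here is a minimal sketch of purely lexical clustering: each keyword becomes a bag of words, and keywords are grouped greedily by cosine similarity. This is a simplified illustration of the kind of approach described above, not how any of the named tools actually work; the function names, the greedy strategy and the 0.5 threshold are all hypothetical choices.

```python
from collections import Counter
import math


def bag_of_words(keyword):
    """Lower-cased token counts: a purely lexical representation."""
    return Counter(keyword.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # missing tokens count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def lexical_clusters(keywords, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed keyword is at
    least `threshold` similar, otherwise start a new cluster."""
    clusters = []  # list of (seed_vector, member_keywords)
    for kw in keywords:
        vec = bag_of_words(kw)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(kw)
                break
        else:
            clusters.append((vec, [kw]))
    return [members for _, members in clusters]


keywords = [
    "cheap flights to paris",
    "cheap paris flights",
    "inexpensive airfare paris",
    "paris hotels",
]
print(lexical_clusters(keywords))
```

Because "cheap flights" and "inexpensive airfare" share no tokens, their cosine similarity is zero, so the sketch places "inexpensive airfare paris" in its own cluster even though a searcher would treat it as the same intent as the first two keywords.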


The role of the betweenness centrality measure in networks

Ever wondered how to detect the most influential individual, station, motorway or node in a network? Betweenness centrality is not a simple popularity count but a mathematical way of determining which nodes have the most impact on the flow of information through a network. It is a very good way of identifying nodes that act as connectors for moving from one part of a graph to another. In a real-world situation, when these nodes are removed, moving between the remaining nodes becomes much harder. For each node, betweenness centrality reveals how many shortest paths pass through it: in a connected graph, the algorithm computes the shortest paths between every pair of nodes and, for each node, sums the fraction of those paths that pass through it. Edge weights are important in determining the shortest paths, since factors such as frequency, capacity, time, flow and influence determine these weights.
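The definition above can be made concrete with Brandes' algorithm, a standard way of computing betweenness centrality. The sketch below assumes an undirected graph stored as a hypothetical nested dict `graph[u][v] = weight`, with positive weights treated as costs (distances), so Dijkstra's algorithm finds the shortest paths. It computes C_B(v) = sum over node pairs (s, t) of sigma_st(v) / sigma_st, where sigma_st is the number of shortest paths from s to t and sigma_st(v) is the number of those passing through v.

```python
import heapq


def betweenness_centrality(graph):
    """Unnormalised betweenness for an undirected, positively weighted graph
    given as graph[u][v] = weight (Brandes' algorithm with Dijkstra)."""
    cb = {v: 0.0 for v in graph}
    for s in graph:
        # Single-source shortest paths, counting how many there are.
        dist = {s: 0}
        sigma = {v: 0 for v in graph}  # number of shortest s->v paths
        sigma[s] = 1
        preds = {v: [] for v in graph}  # predecessors on shortest paths
        order = []  # nodes in order of non-decreasing distance from s
        done = set()
        heap = [(0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue  # stale heap entry
            done.add(v)
            order.append(v)
            for w, weight in graph[v].items():
                nd = d + weight
                if w not in dist or nd < dist[w]:
                    dist[w] = nd  # strictly shorter path found
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == dist[w]:
                    sigma[w] += sigma[v]  # another equally short path
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Undirected graph: every pair (s, t) was counted from both ends.
    return {v: c / 2 for v, c in cb.items()}


# Going A -> B -> C (cost 2) is cheaper than the direct A -> C edge (cost 5),
# so B sits on the only shortest path between A and C.
g = {"A": {"B": 1, "C": 5}, "B": {"A": 1, "C": 1}, "C": {"A": 5, "B": 1}}
print(betweenness_centrality(g))  # {'A': 0.0, 'B': 1.0, 'C': 0.0}
```

Changing the A to C weight to 2 creates two equally short paths, and B's score drops to 0.5, which illustrates how weights shape the result. In practice, NetworkX's `networkx.betweenness_centrality(G, normalized=False, weight="weight")` computes the same quantity.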


The importance of fine grained named entity recognition

Named entity recognition is usually viewed as a low-level NLP task, but it can be crucial to other tasks such as named entity disambiguation and linking. It is also relevant to information retrieval and question answering applications. The standard named entity recognition classes are usually person, location, organisation and miscellaneous. I used the AllenNLP demo application to run a quick NER test on the Hacksaw Ridge storyline. The text was taken from the IMDB website, and the image below indicates the entities. Earlier research identified three core classes (person, location and organisation); at the 2003 Conference on Computational Natural Language Learning (CoNLL), a miscellaneous type was added to the mix.

The image below shows the four main (coarse-grained) entity classes, with all four tags (person, organisation, location and miscellaneous) highlighted. Desmond T. Doss is the star character in the story, and the model accurately identifies him as a person. When his surname is mentioned on its own (Doss’s), it also receives the correct person tag. The miscellaneous tag is used for events such as the ‘Battle of Okinawa’ and for a thing, the ‘Congressional Medal of Honor.’
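Sequence-labelling NER models typically emit one tag per token, and the coarse CoNLL-style classes are commonly encoded with the BIO scheme (B- begins an entity, I- continues it, O marks tokens outside any entity). The sketch below uses a hypothetical helper, `bio_to_spans`, and a hand-tagged sentence (not output from the AllenNLP demo) to show how such tags are grouped into entity spans like those described above.

```python
def bio_to_spans(tokens, tags):
    """Group per-token BIO tags (B-TYPE, I-TYPE, O) into (text, type) spans."""
    spans = []
    current, current_type = [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            if current:  # close the previous entity
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the current entity
        else:  # "O": outside any entity
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans


# Hand-tagged illustration using the coarse PER/LOC/ORG/MISC classes.
tokens = ["Desmond", "T.", "Doss", "served", "in", "the", "Battle", "of", "Okinawa"]
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "O", "B-MISC", "I-MISC", "I-MISC"]
print(bio_to_spans(tokens, tags))
# [('Desmond T. Doss', 'PER'), ('Battle of Okinawa', 'MISC')]
```

An I- tag whose type differs from the open entity is treated as the start of a new entity, a lenient choice that tolerates slightly malformed model output.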

Further research also introduced additional entity types such as geopolitical entities, weapons, vehicles and facilities. These are discussed in the article “An empirical study on fine-grained named entity recognition”, whose authors further note that the main challenges of developing a fine-grained entity recognizer lie in selecting the tag set, creating training data and building a fast and accurate multi-class labelling algorithm.
