Across the SEO industry, the topic cluster is being heralded as one of the best templates for setting up a website to rank favourably for target keywords by interlinking pages in a tightly knit structure. HubSpot has put together a great resource on topic clusters, including an experiment carried out by former team members which indicates that effectively interlinked pages achieved better placements on Google’s SERP. It is worth adding that internal linking is a sound SEO strategy whether or not topic clusters are used.

Topic clusters are significant in SEO because they structure content and pages around an overarching theme, with sub-pages arranged hierarchically and linked back to it. The drive here is for businesses and marketers to think about the topics they intend to own and not just keywords. Implementing this strategy requires pillar pages and cluster content. The pillar page covers all the key elements or content areas of the topic, while each piece of cluster content goes deeper into an area mentioned or highlighted on the pillar page. For example, for the seed term ‘mortgage calculator’, the pillar page could be ‘Ultimate guide to using a mortgage calculator’ and the cluster pages could cover topics such as ‘What is a mortgage calculator and how does it work’ or ‘Types of mortgage calculators explained’. There is no doubt that well-executed topic clusters have tremendous benefits.

Rearranging the content of a website so that pages do not cannibalise each other but are interlinked to a central pillar page is likely to improve organic rankings and generate traffic. Even so, frameworks and approaches need to be consistently challenged to give room for innovation and breakthroughs. Sometimes a simple face-lift of an existing approach is not enough to bring about progress, and a paradigm shift is needed: a bold but intellectually grounded approach that complements topic clusters while changing how search marketers aggregate keywords, based not just on the search volume of a Semrush- or Ahrefs-inspired topic but on a fundamental understanding of the key concept or seed term. This ushers in the concept of ‘semantic clusters’: breaking down and clustering keywords with models drawn from cognitive science, causal reasoning and decision theory. It is a more user-first and structured way to cluster keywords.

Clustering in search marketing should not begin and end with topic clusters. A sure way of gaining more context-rich insights around a seed term, to aid forecasting, trend analysis and demand intelligence, is to develop semantic clusters as part of the clustering exercise. To better explain this position, let’s explore the “running shoes” clusters from Semrush. A strategy driven solely by topic clusters is fuelled by the search volume of related search terms per pillar page. These terms are clustered based on co-occurrence, syntactic similarity and SERP similarity.

Semantic clusters, on the other hand, take a meaning-first and model-driven approach. They treat keywords as concepts and create semantic relations, or edges, around the seed keyword or term. It is a user-first approach that models the clustering on how people view and engage with the concept in the real world; in short, it is cognitive, contextual, causal and decision-framed. By clustering search terms using cognitive, causal and decision models, we cover all aspects of the meaning of the concept and can target relevant keywords even when keyword research and SEO tools report no search volume for them. To make the case for semantic clusters stronger, it is important to use an established framework or, in this case, a scaffold. This is where a tool like ConceptNet comes into play.

Exploring some of the ConceptNet relations that will serve as a premise for developing Semantic Clusters

ConceptNet is a multi-lingual knowledge graph or a semantic network used to represent commonsense knowledge about the world.

Below are some of the core semantic relations, or edges, demonstrated for a concept like ‘car’:

IsA: Represents a subclass relationship (e.g., “A car IsA vehicle”).

PartOf: Indicates that something is a part of a larger whole (e.g., “Engine PartOf car”).

HasProperty: Attributes a property to a concept (e.g., “Car HasProperty Loud”).

UsedFor: Describes the typical use of an object (e.g., “Car UsedFor transportation”).

CapableOf: Indicates an ability or function (e.g., “Car CapableOf moving”).

Causes: Represents a causal relationship (e.g., “Car Causes traffic”).

HasA: Specifies a necessary part or attribute (e.g., “Car HasA windshield”).

MadeOf: Indicates the materials used to make a concept (e.g., “Car MadeOf metal”).
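As a rough illustration of how relations like these could seed semantic clusters, the sketch below stores the example edges from the list above as (subject, relation, object) triples and groups the neighbours of a seed term by relation. The triples are typed in by hand and nothing is fetched from ConceptNet; the `semantic_clusters` helper and the "(incoming)" label are assumptions made purely for this example.

```python
from collections import defaultdict

# Hand-written (subject, relation, object) triples copied from the list above.
# They are illustrative only and are not fetched from ConceptNet itself.
TRIPLES = [
    ("car", "IsA", "vehicle"),
    ("engine", "PartOf", "car"),
    ("car", "HasProperty", "loud"),
    ("car", "UsedFor", "transportation"),
    ("car", "CapableOf", "moving"),
    ("car", "Causes", "traffic"),
    ("car", "HasA", "windshield"),
    ("car", "MadeOf", "metal"),
]


def semantic_clusters(seed, triples):
    """Group the concepts linked to `seed` by the relation that links them."""
    clusters = defaultdict(list)
    for subject, relation, obj in triples:
        if subject == seed:
            clusters[relation].append(obj)
        elif obj == seed:
            # The seed is the object, e.g. "engine PartOf car":
            # keep the neighbour and mark the edge as incoming.
            clusters[relation + " (incoming)"].append(subject)
    return dict(clusters)


clusters = semantic_clusters("car", TRIPLES)
for relation, concepts in clusters.items():
    print(f"{relation}: {', '.join(concepts)}")
```

Each relation then becomes a candidate cluster around the seed term (what the concept is used for, what it is made of, what it causes), independent of whether a keyword tool reports search volume for those terms.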

Some of these relations can form the foundation for building semantic clusters. This will be developed further in subsequent blog posts to establish clearly why more effort should go into moving beyond topic clusters to semantic clusters.

The role of the betweenness centrality measure in networks

Ever wondered how to detect the most influential individual, station, motorway or node in a network? Betweenness centrality is not a simple popularity test but a mathematical way of determining which node has the most impact on the flow of information within a network. It is a very good way of identifying the nodes that act as connectors for moving from one part of a graph to another. In a real-world situation, when these nodes are removed, reaching other nodes in the graph becomes quite challenging. Betweenness centrality also reveals the number of shortest paths a node is part of. In a connected graph, the betweenness centrality algorithm calculates the shortest paths between every pair of nodes in the network and counts how many of them pass through each node. The weights on the edges between nodes are important in determining the shortest path, and they can reflect factors such as frequency, capacity, time, flow and influence.
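To make the idea concrete, here is a minimal sketch of betweenness centrality using Brandes' algorithm. It assumes an unweighted, undirected graph, so shortest paths are found with breadth-first search; a weighted version would swap in Dijkstra's algorithm. The `NETWORK` graph and its node names are made up for the example.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    `graph` maps each node to an iterable of its neighbours.
    Returns unnormalised scores: the number of shortest paths between
    other node pairs that pass through each node (fractional when a pair
    has several equally short paths).
    """
    scores = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from `source`, counting shortest paths.
        order = []
        predecessors = {node: [] for node in graph}
        path_count = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in graph[node]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Each undirected pair was counted from both endpoints.
    return {node: score / 2.0 for node, score in scores.items()}


# Hypothetical network: two triangles joined through a single "bridge" node.
NETWORK = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "bridge"],
    "bridge": ["C", "D"],
    "D": ["bridge", "E", "F"],
    "E": ["D", "F"],
    "F": ["D", "E"],
}

scores = betweenness_centrality(NETWORK)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score}")
```

In this toy network the "bridge" node ranks highest even though it has only two connections, because every shortest path between the two triangles runs through it. That is the difference between being a connector and simply being popular.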


The importance of fine grained named entity recognition

Named entity recognition is usually viewed as a low-level NLP task but can be crucial to other tasks such as named entity disambiguation and linking. It is also relevant for information retrieval and question answering applications. The standard named entity recognition classes are usually person, organisation, location and miscellaneous. Previous research led to the identification of three core classes: person, location and organisation. At the Conference on Computational Natural Language Learning in 2003, a miscellaneous type was added to the mix. I used the AllenNLP demo application to run a quick NER test on the Hacksaw Ridge storyline. The text was extracted from the IMDb website and the image below shows the entities.

The output covers the four main, non-fine-grained entity classes, and all four entity tags (person, organisation, location and miscellaneous) are highlighted. Desmond T. Doss is the name of the main character in the story, and the model accurately identifies him as a person. When only his surname is mentioned (Doss’s), it also receives the correct person tag. The miscellaneous tag was used for events like the ‘Battle of Okinawa’ and for things like the ‘Congressional Medal of Honor’.

Further research also introduced entity types such as geopolitical entities, weapons, vehicles and facilities. These were all discussed in the article “An empirical study on fine-grained named entity recognition”, whose authors further explain that the challenges of developing a fine-grained entity recognizer stem from the selection of the tag set, the creation of training data and the creation of a fast and accurate multi-class labelling algorithm.
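One way to picture the relationship between a fine-grained tag set and the four standard classes is as a hierarchy in which each fine-grained label rolls up to a coarse one. The sketch below assumes path-style labels such as '/person/soldier'. The labels, the `coarse_label` helper and the assignments to the Hacksaw Ridge mentions are all hypothetical and are not taken from the article or from any tool's output.

```python
# Hypothetical fine-grained labels written as paths; the first path segment
# is treated as the coarse class. These labels are illustrative, not the tag
# set of any particular paper or tool.
COARSE_CLASSES = {"person", "organisation", "location"}


def coarse_label(fine_label):
    """Collapse a fine-grained label such as '/person/soldier' to a coarse class."""
    top_level = fine_label.strip("/").split("/")[0]
    # Anything outside the three core classes falls back to miscellaneous.
    return top_level if top_level in COARSE_CLASSES else "miscellaneous"


mentions = [
    ("Desmond T. Doss", "/person/soldier"),
    ("Battle of Okinawa", "/event/battle"),
    ("Congressional Medal of Honor", "/award"),
]

for text, fine in mentions:
    print(f"{text}: {fine} -> {coarse_label(fine)}")
```

The coarse view lumps the battle and the medal together as miscellaneous, while the fine-grained labels keep them apart. That extra detail is what the tag set, the training data and the labelling algorithm all have to support.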
