
An introduction to comparable entity mining at a structural level.

Comparing entities is an important part of human decision making.

We compare entities every day, from holiday destinations and new mobile phones to the next family car. Comparing entities means looking at the relatedness of these objects or concepts. Relatedness covers not only similarity (analogy) but also other relational structures such as resistance (antithesis), discrepancy (anomaly) and incompatibility (antinomy).

Comparative entity mining is crucial to any keyword, concept, content, product and marketing strategy, because no product exists in isolation. It is therefore important for businesses to place themselves in the shoes of potential customers and explore the alternative products that are vying for the same attention and mind space. Conducting this exercise will help brands position their products more effectively through engaging branding and compelling storytelling.

The field of biomedical informatics has employed comparative entity mining between genes and proteins. These comparisons have now been extended to diseases. Comparisons in the biomedical field look at functional similarities more than sequential similarities. This has inspired me to classify comparative entity mining into three stages: structural, functional and sequential.

This blog post will focus on the structural stage of comparing entities, using bananas and plantains as the example. When looking at an entity or a concept at the structural level, we focus on size, colour, shape and other physical properties, including the internal properties that make up the product.

Shape: Bananas have a curved shape, which they obtain through a process called negative geotropism. Bananas grow against gravity, turning towards the sun rather than the ground as they develop. The botanical history of the banana, which grew in the rainforest and had to cope with insufficient light, shaped this curve.

Colour: Most bananas start out green and turn yellow as they ripen. Plantains are green when unripe, turn yellow as they ripen, and can become black when overripe. In other words, at the height of being overripe these plantains are more black than yellow.

Size: The average plantain is bigger than a banana. Plantains are also more thick-skinned than bananas, in the literal sense. Overall, plantains are considered tougher than bananas because of their thick skin.

A Google search for the term ‘plantain’ generates a comparative result: an article from FoodRepublic that compares bananas to plantains. One could guess that Google anticipates that someone who types the term ‘plantain’ will also be interested in ‘banana’, or keen to understand the difference between the two terms.

plantain search on Google

Using WordNet to understand the properties of both plantain and banana

WordNet is a lexical database of English that groups words into sets of synonyms (synsets) and records the relations among them. It gives words a structure of categories, concepts and properties, and is commonly used in text analytics and artificial intelligence applications. We will use the NLTK Python toolkit to query these synsets. WordNet splits entries by part of speech, such as nouns and verbs. Terms are further separated, in numbered order, by their differing meanings (more on this later). The relations between synsets help us understand the category and properties of terms. A holonym names the whole of which something is a part (e.g. head is a holonym of ear). A hypernym is the broader category of a specific concept (e.g. colour is a hypernym of green). A hyponym is the opposite of a hypernym: a specific concept within a broader category (e.g. mountain bike is a hyponym of bike). Finally, a meronym is a term that names a part of a whole (e.g. bicycle seat is a meronym of bicycle).
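The relations described above correspond to methods on NLTK's synset objects. Below is a minimal sketch of how they can be queried, assuming NLTK is installed and the WordNet corpus has been downloaded once with nltk.download("wordnet"). The helper names load_wordnet and relation_summary are my own and not part of NLTK, and the word "bicycle" is only an illustrative choice.

```python
def load_wordnet():
    """Return NLTK's WordNet reader, or None when NLTK or the corpus is missing."""
    try:
        from nltk.corpus import wordnet as wn
        wn.synsets("bicycle")  # forces the lazy corpus loader to read the data
        return wn
    except (ImportError, LookupError):
        return None


def relation_summary(synset):
    """Collect the names of the synsets related to `synset`, keyed by relation.

    Hypothetical helper: it only wraps the standard Synset methods.
    """
    relations = {
        "hypernyms": synset.hypernyms(),          # broader categories
        "hyponyms": synset.hyponyms(),            # more specific sub-types
        "part_meronyms": synset.part_meronyms(),  # parts of this thing
        "part_holonyms": synset.part_holonyms(),  # wholes this thing is part of
    }
    return {
        label: sorted(related.name() for related in found)
        for label, found in relations.items()
    }


if __name__ == "__main__":
    wn = load_wordnet()
    if wn is None:
        print('WordNet unavailable: install nltk, then run nltk.download("wordnet")')
    else:
        for synset in wn.synsets("bicycle", pos=wn.NOUN):
            print(synset.name(), "-", synset.definition())
            for label, names in relation_summary(synset).items():
                print(f"  {label}: {names}")
```

Each synset name has the form word.pos.number, which is the numbered ordering of meanings mentioned above.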

A plantain query on WordNet reveals three noun synsets (‘plantain.n.01’, ‘plantain.n.02’ and ‘plantain.n.03’). We then looked at the hyponyms, or sub-types, of ‘plantain.n.03’ and discovered examples such as the English plantain, Plantago psyllium, buckthorn and a few more.

wordnet analysis of plantain
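The plantain query can be reproduced with a few lines of NLTK. This is a sketch rather than the exact code behind the screenshot, and it assumes the WordNet corpus is already downloaded. The function describe_senses is a hypothetical helper of mine. It lists every noun sense with its definition and the lemma names of its hyponyms, so you can see for yourself which sense carries which sub-types.

```python
def describe_senses(word, wn):
    """Return (name, definition, hyponym lemma names) for each noun sense of `word`.

    `wn` is expected to be NLTK's WordNet reader (nltk.corpus.wordnet).
    """
    rows = []
    for synset in wn.synsets(word, pos="n"):
        hyponym_lemmas = sorted(
            lemma
            for hyponym in synset.hyponyms()
            for lemma in hyponym.lemma_names()
        )
        rows.append((synset.name(), synset.definition(), hyponym_lemmas))
    return rows


if __name__ == "__main__":
    try:
        from nltk.corpus import wordnet as wn
        rows = describe_senses("plantain", wn)
    except (ImportError, LookupError):
        rows = []
        print('WordNet unavailable: install nltk, then run nltk.download("wordnet")')
    for name, definition, hyponym_lemmas in rows:
        print(name, "-", definition)
        print("  hyponyms:", hyponym_lemmas if hyponym_lemmas else "none")
```

Reading the definitions next to the synset names is a quick way to tell the plant senses apart from the fruit sense.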

I was keen to dig into the part_meronyms, or parts, of a plantain and compare them to those of the banana to understand similarities from a structural perspective. Unfortunately, there are no meronyms for any of the plantain synsets.

Plantain Meronyms
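A check like the one above can be sketched as follows. WordNet distinguishes part, substance and member meronyms, and NLTK exposes a method for each, so the sketch looks at all three. The helper names meronyms_by_kind and senses_without_meronyms are my own, and the code assumes the WordNet corpus is available locally.

```python
def meronyms_by_kind(synset):
    """Return the names of a synset's part, substance and member meronyms."""
    kinds = {
        "part": synset.part_meronyms(),
        "substance": synset.substance_meronyms(),
        "member": synset.member_meronyms(),
    }
    return {kind: [m.name() for m in found] for kind, found in kinds.items()}


def senses_without_meronyms(word, wn):
    """List the noun senses of `word` for which no meronyms of any kind are recorded."""
    return [
        synset.name()
        for synset in wn.synsets(word, pos="n")
        if not any(meronyms_by_kind(synset).values())
    ]


if __name__ == "__main__":
    try:
        from nltk.corpus import wordnet as wn
        for synset in wn.synsets("plantain", pos="n"):
            print(synset.name(), meronyms_by_kind(synset))
        print("No meronyms recorded for:", senses_without_meronyms("plantain", wn))
    except (ImportError, LookupError):
        print('WordNet unavailable: install nltk, then run nltk.download("wordnet")')
```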

WordNet reveals two different senses of banana (banana.n.01 and banana.n.02). Interestingly, WordNet captures plantain, or plantain tree, as a hyponym of banana. WordNet only has hyponyms for banana.n.01 and not banana.n.02. As with plantain, no part_meronyms are present for banana. As such, it is not possible to compare the structural properties (meronyms) of the two entities from a WordNet perspective. But, as described earlier, there are clear similarities in shape and colour.
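The hyponym link between the two terms can also be checked programmatically. The sketch below looks for any sense of one word that appears among the direct hyponyms of any sense of another. The function hyponym_links is a hypothetical helper of mine, and the code assumes the WordNet corpus has been downloaded.

```python
def hyponym_links(parent_word, child_word, wn):
    """Return (parent sense, child sense) pairs where the child is a direct hyponym.

    Only direct hyponyms are checked; deeper levels of the hierarchy are ignored.
    """
    child_names = {synset.name() for synset in wn.synsets(child_word, pos="n")}
    links = []
    for parent in wn.synsets(parent_word, pos="n"):
        for hyponym in parent.hyponyms():
            if hyponym.name() in child_names:
                links.append((parent.name(), hyponym.name()))
    return links


if __name__ == "__main__":
    try:
        from nltk.corpus import wordnet as wn
        links = hyponym_links("banana", "plantain", wn)
    except (ImportError, LookupError):
        links = []
        print('WordNet unavailable: install nltk, then run nltk.download("wordnet")')
    for parent_name, child_name in links:
        print(f"{child_name} is a hyponym of {parent_name}")
```

A non-empty result is one simple, structural signal that WordNet itself treats the two terms as closely related.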

Overall, based on the similarity in structure and the results from Google Search, it is quite clear that plantain and banana are comparable entities.
