The role of the betweenness centrality measure in networks

Ever wondered how to detect the most influential individual, station, motorway or node in a network? It is not a normal popularity test but a mathematical way for determining a node with the most impact in the flow of information within a network. A very good way of determining nodes that are great connectors for moving from one point of a graph to another. In a real-world situation, when these nodes are removed, the movement to other nodes in the graph becomes quite challenging. With betweenness centrality, the number of paths a node is a part of is also revealed. In a connected graph, the Betweenness Centrality algorithm calculates the shortest path between nodes in the given network. The weight between nodes is quite important in determining the shortest path as factors such as frequency, capacity, time, flow and influence determine these weights.

Continue reading →

Exploring Dijkstra’s shortest path algorithm

Several graph algorithms can help reveal hidden patterns in connected data. These algorithms can be classified into several categories such as approximations (e.g clustering), assortativity (e,g average neighbour degree), communities (e.g K-Clique) and centrality (e.g shortest path). In this blog, we will be looking at one of the most popular shortest path algorithms known as the Dijkstra’s algorithm. We will look at an example table and code implementation for this algorithm. Shortest path algorithm can be relevant in a traffic network situation a user desires to discover the fastest way to move from a source to a destination. It is an iterative algorithm that provides us with the shortest path from an origin node to all other nodes in the graph. This algorithm can work in weighted and unweighted graph scenarios.

Continue reading →

The impact of bridges in a graph

In graph theory, a bridge is an edge whose removal leads to the disconnection of the network. It is usually found in undirected graphs and can also be viewed within the context of disconnected undirected graphs. In these disconnected undirected graphs, removing a bridge leads to a further separation of the subgraphs. Let’s explore some examples below.

Bridge in an undirected connected graph:

The below graph is undirected but it is connected. There are 9 nodes and 10 edges in this graph. You can determine that one edge is in red and the rest are in grey/black. The edge in red is the bridge and removal will lead to a disconnection of node [L] from the rest of the graph. Assuming all of these nodes were train stations and the edges are London tube lines. A rail track maintenance work, shortage of control room staff or a broken train on the track are some of the reasons that can cause a train line that connects node [H] to node[L] to become disconnected or temporarily removed from the train network or graph. When this occurs, people living close to the station [L] will become stranded or short of transportation options. This is a connected graph because the disconnected unit is not a sub-graph on its own but a single node stranded and possibly struggling in the network.

Continue reading →

Implementing Bellman-Ford’s algorithm on negative edge weights of a graph

Negative cycles usually occur in weighted directed graphs. What then is a negative cycle? A negative cycle takes place when the total sum of the cycle in a graph is negative. Negative weights are required for these cycles to exist. It will be great to explore the meaning and scenarios of negative weights.

Negative weights of a graph:

In an earlier article, I touched on the different types of weights in a graph. We’ve got flow, capacity, frequency, distance, time, resistance and cost weights. Negative edges and cycles are not always common in graphs. Their existence determines the most suitable algorithm to either detect or print the negative cycle. Before looking into the best algorithmic implementation of the negative cycles in a graph, it will be great to explore some scenarios or examples of negative edges.

Logistics Example: Let’s assume you are a delivery driver contracting for an eCommerce company and the weight of your edges W(a,b,c) of an edge a,b,c is the cost of petrol or diesel from the depot to the customer destination. Nodes x,y,z form a negative cycle as these are last-minute delivery destinations that the contracting driver has just accepted from a different supplier.

Continue reading →

How to discover sink nodes in a graph

We all understand graphs or networks are composed of nodes and edges. Some of these nodes can exist in a directed or undirected structure. In a directed graph, there edges can be either incoming or outgoing from a given node. A sink refers to a node in directed graph with no outgoing edges. In a simple fashion, you can view a sink as someone who receives a lot but never chooses to give. Or, is obsessed with hoarding. In a social network, a sink or friendship node will be poor in passing on important information to the group. On the other hand, they might be great in keeping secrets and avoid allowing things to slip up. A sink could either be local or global. Local sinks are also referred to as a terminal and are directed graphs with no departing edge. A global sink is usually viewed as a sink and is a node that is reached by all the nodes in a given directed graph.

Simple illustration of a sink node

Continue reading →

The world’s leading graph databases

The increase in data has led to a growing need for graph databases or technologies. With a graph database, the relationships that exist within the data can be stored, refined and queried properly. A graph database, therefore, is a database created to store data without restricting it to a pre-set model. The data in these graph-based technology expresses how each entity is related to others. Nodes and edges are quite important when looking at graph databases as the later represents the relationship with the former. This nodes and edges setup, makes the retrieval and querying of relationships easier. Retrieving complex hierarchical structures is an advantage that these graph technologies have over relational databases. The software review forum G2 has a list of the top-rated graph databases in the market. The leading graph database technologies on G2 have had more reviews, a higher percentage of positive feedback, more data generated from other online networks and social platforms.

Continue reading →

Introduction to a graph data structure in Python

Graphs play an important part in intelligent technologies such as recommendation engines, search engines, fraud detection applications, network mapping, customer journey applications, latency evaluation and dependency management. Graphs data structures are also powerful in social networks such as Twitter, Facebook, LinkedIn and a few others. Graphs in the context of the social graph are used to recommend friends, events, company pages, jobs and personalise the ad experience. These are some of the use cases of graph data structures and applications.

A quick reminder of the definition, graphs are considered to be the composition of vertices (nodes) with respective pairs of edges(also known as links or arcs). In a nutshell, they are viewed to comprise a determined set of nodes and edges which connect these given nodes.

The example below adapted from GeekforGeeks indicates 5 nodes {0,1,2,3,4} and 7 edges {01,12,23,34,04,31,13,14}

Continue reading →

Different ways of representing Graphs

Graph data can be represented in different formats for onward computation. The choice of the graph representation hugely relies on the density of the graph, space required, speed and weight of edges. The main ways a graph can be represented are as an adjacency matrix, incidence matrix, adjacency list and incidence list.

Adjacency Matrix: This is one of the most popular ways a graph is represented. One of the core aims of this matrix is to assess if the pairs of vertices in a given graph are adjacent or not. In an adjacency matrix, row and columns represent vertices. The row sum equals the degree of the vertex that the row represents. It is advisable to use the adjacency matrix for weighted edges. You replace the standard ‘1’ with the respective weight. It is easier to represent directed graphs with edge weights through an adjacency matrix. Adjacency matrix works best with dense graphs. As dense graphs usually have twice the size of the edges to the given nodes.

Continue reading →

The rise of eight different types of graphs

We are witnessing the rise and adoption of graph databases across different verticals. Gartner acknowledged the five different types of graphs as social, intent, consumption, mobile and interest. In a presentation titled: Graph All the things! Introduction to graph databases, the team from Neo4j captured Gartner’s graph classifications in the illustration below. There is a slight difference in how one of the types of a graph is named by Neo4j in comparison to Gartner. To Neo4j it is a payment graph while to Gartner it is the consumption graph. There is the argument that a consumption graph is a better name as we do not necessarily pay for every consumption. We will now look at each graph and add some additional types.

Continue reading →

An introduction to comparable entity mining at a structural level

The comparing of entities is usually important in human decision making.

We are constantly comparing entities daily from holiday destinations, new mobile phone and next family car. Comparing entities look at the relatedness of these objects or concepts. Relatedness does not look at only similarity (analogy) but other relational structures such as resistance (antithesis), discrepancy (anomaly) and incompatibility (antinomy).

Comparative entity mining is crucial to any keyword, concept, content, product and marketing strategy. This is because no product exists in isolation. It is therefore important for businesses to place themselves in the shoes of potential customers and explore the alternative products that are vying for the same attention and mind space. Conducting this exercise will help brands position their products in a more compellingly through through engaging branding and compelling storytelling.

The field of biomedical informatics have employed comparative entity mining between genes and proteins. These comparisons have now extended to diseases. The comparisons in the biomedical field looks at functional similarities more than sequential similarities. This has inspired me to classify comparative entity mining to three stages: structural, functional and sequential.

This blog will focus on the structural stage of comparing entities. We will be comparing bananas to plantains. When looking at an entity or a concept from the structural level we focus more on the size, colour, shape and physical properties. The internal properties that make up the product.

Continue reading →