ToriTompkins

Tori Tompkins' Blog

Visualising Network Data in Power BI with Python Integration and NetworkX

The long awaited Python Integration in Power BI added earlier this month welcomes the opportunity for further customised reporting by exploiting the vast range of Python visualisation libraries.

Among my favourite of these Python visualisation/ data science libraries is NetworkX, a powerful package designed to manipulate and study the structure and dynamics of complex networks. While NetworkX excels most at applying graph theory algorithms on network graphs in excess of 100 million edges, it also provides the capability to visualise these networks efficiently and, in my opinion, easier than the equivalent packages in R.

In this article, I will explain how to visualise network data in Power BI utilising the new Python Integration and the NetworkX Python library.

Getting Started

To begin experimenting with NetworkX and Python in Power BI, there are several pre-requisites:

  • Enable Python integration in the preview settings by going to File –> Options and Settings –> Options –> Preview features and enabling Python support.

clip_image002_thumb4_thumb

  • Ensure Python is installed and fully up-to-date.
  • Install the following Python libraries:
    • NetworkX
    • NumPy
    • pandas
    • Matplotlib

Loading Data

The data I used was created to demonstrate this task in Power BI but there are many real-world network datasets to experiment with provided by Stanford Network Analysis Project. This small dummy dataset represents a co-purchasing network of books.

The data I loaded into Power BI consisted of two separate CSVs. One, Books.csv, consisted of metadata pertaining to the top 40 bestselling books according to Wikipedia and their assigned IDs. The other, Relationship.csv, was an edgelist of the book IDs which is a popular method for storing/ delivering network data. The graph I wanted to create was an undirected, unweighted graph which I wanted to be able to cross-filter accurately. Because of this, I duplicated this edgelist and reversed the columns so the ToNodeId and FromNodeId were swapped. Adding this new edge list onto the end of the original edgelist has created a dataset with can be filtered on both columns later down the line. For directed graphs, this step is unnecessary and can be ignored.

Once loaded into Power BI, I duplicated the Books table to create the following relationship diagram as it isn’t possible to replicate the relationship between FromNodeId to Book ID and ToNodeId to Book ID with only one Books table.

clip_image004_thumb3_thumb

From here I can build my network graph.

Building the Network Graph

Finally, we can begin the Python Integration!

Select the Python visual from the visualizations pane and drag this onto the dashboard.

clip_image006_thumb3_thumb

Drag the Book Title columns of both Books and Books2 into Values.

clip_image008_thumb1_thumb

Power BI will create a data frame from these values. This can be seen in the top 4 lines in the Python script editor.

clip_image010_thumb3_thumb

The following Python code (also shown above) will create and draw a simple undirected and unweighted network graph with node labels from the data frame Power BI generated:

import networkx as nx
import matplotlib.pyplot as plt
G = nx.from_pandas_edgelist(dataset, source="Book Title", target="Book Title.1")
nx.draw(G, with_labels = True)
plt.show()

** NOTE: You may find that the code above will fail to work with large networks. This is because by default networkx will draw the graph according to the Fruchterman Reingold layout, which will position the nodes for the highest readability. This layout is unsuitable for networks larger than 1000 nodes due to the memory and run time required to run the algorithm. As an alternative, you can position the nodes in a circle or randomly by editing the line

nx.draw(G, with_labels = True)

to

nx.draw(G, with_labels = True, pos=nx.circular_layout(G))

or

nx.draw(G, with_labels = True, pos=nx.random_layout(G))
**

This will produce the network graph below:

clip_image012_thumb3_thumb

You are also able to cross filter the network graph by selecting rows in the table on the right-hand side:

clip_image014_thumb4_thumb

Conclusion

Python visuals are simple to produce and although the visual itself isn’t interactive, they will update with data refreshes and cross filtering, much like the R integration added 3 years ago. The introduction of Python in Power BI has opened doors for visualisation with libraries such as NetworkX, to visualise all BI networks from Airline Connection Flights and Co-Purchasing networks to Social Network Analysis.

Loading