Conf42 Python 2022 - Online

Financial Network Analysis using Python


Abstract

Historically, networks have been studied extensively in graph theory, an area of mathematics. After applications in fields as diverse as physics, health science and sociology, network analysis has in recent years become an active topic not only in data science but also in finance. In a nutshell, a network is a system of nodes connected by links. Network analysis is widely used to describe the characteristics and behavior of complex networks. There has also been research on modeling the stock market as a network. The motivation is that the performance of certain stocks is often correlated, either because of the general market direction or because of the cyclicality of particular market segments. To model the stock market as a network, each stock is represented as a node. However, defining the interactions (that is, creating edges between nodes) is not intuitive, unlike in some real-world networks such as friendship networks, where the interaction between nodes can be defined explicitly. A traditional way to create edges between stocks is to look at the correlations of some chosen attributes. In our case, we analyze data from a well-known stock index and identify the relationships between its stocks. We propose a model that depicts these relationships and builds networks of stocks. We investigate and create different networks according to the degree of correlation between stocks. Finally, we visualize and evaluate the results. In this talk, we are going to cover the following points:
• Introduction to networks
• History and why graphs
• The evolution of networks in finance
• Understanding network structure
• Leveraging the power of Python graphs
• Real-world use of network analysis in finance, with two hands-on examples
• Wrap-up

Goal

By the end of the talk, you will be able to answer:
• How is data connected with other data?
• How do these financial connections matter?
• How do complex systems in the stock market move over time?
I promise you, it is an interesting one!

Summary

  • Kalyan Prasad is a self-taught data scientist and analytics manager. He talks about financial network analysis using Python. Feel free to follow or connect with him, and if you have any feedback or suggestions, feel free to write to him.
  • Graphs were first introduced in the 18th century by the Swiss mathematician Leonhard Euler. His attempt at, and ultimate solution to, the famous Königsberg bridge problem is generally regarded as the origin of graph theory. In modern times, graph algorithms, graph applications and graph analytics have been booming.
  • Network data are generated when we consider the relationships between two or more entities. In a nutshell, a network is a system of nodes connected by links. The best part of any conference is the networking.
  • This is exactly how we define the network structure for any problem in network analysis. Real-world examples of network analysis include social, biological and financial networks. All of these complex systems can be understood better when seen through the lens of a network.
  • Centrality aims to identify the most important nodes in a network. Which nodes count as important depends on how importance is defined, and each flavor of centrality (degree, closeness, betweenness) defines it in a different way. That covers the indicators.
  • Financial network analysis has been on the research agenda since the 2008 financial crisis. Financial networks have become an active topic not only in data science but also in finance. Major areas of interest and applications include interbank networks, stock correlation networks and agent-based models.
  • Financial data is very complex. Dealing with this complexity, and building better networks from financial data, is where the power of Python graphs comes into the picture. Let's jump straight into some hands-on financial network analysis.
  • Here is the code notebook I created for this demonstration. It is divided into two sections. In the first section we take some sample stocks and do a basic network analysis. The second section dives deep into financial network analysis and builds some interesting visuals.
  • Let's start with the basics of network analysis. I combine the Tesla and Google data frames with the pandas concat function, build a network graph from the result, and finally plot it.
  • Our data starts on 11 January 2013 and ends on 10 December 2017, and we have asset prices for this period. Once the date column is converted and set as the index, we can see all our asset classes.
  • Before calculating the correlation matrix and comparing assets, we first convert the asset prices into daily log returns. This makes it much easier to compare the returns of two assets.
  • Now we can calculate the correlation matrix. Let's visualize it and try to understand the insights through visualization. The traditional way of visualizing a correlation matrix is a heat map, in which we can clearly see which assets behave similarly to each other.
  • The heat map is color coded. For example, ETF assets such as EWI, EWQ, EM and EWJ are highly correlated, and these are riskier asset classes. We'll investigate these findings further with network graphs.
  • NetworkX is one of the most popular Python libraries for complex network analysis. To analyze correlations as a network, we first convert the correlation matrix into an edge list and then visualize the network. These first plots look pretty fancy, but they fail to convey the information we are looking for.
  • First, I remove edges: cutting the unwanted edges from the graph makes it show more meaningful information. Next, we distinguish positive from negative correlations. These will become clear when you see the final plot.
  • Most of the asset classes here are strongly correlated. What we cannot yet tell from this plot is which assets are similar to each other in terms of correlation. We can improve the visual by using a different layout, and a minimum spanning tree can also help us reach this goal.
  • In this talk we have seen how graphs came into the picture historically. We also saw the power of Python graphs, and why NetworkX and Python are so powerful for complex network analysis. If you have any questions, feel free to ping me on the platform.
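The pruning, positive/negative split and minimum-spanning-tree ideas in the summary above can be sketched as follows. Everything here is an assumption for illustration. The correlation values are made up. The 0.5 threshold is hypothetical. The distance sqrt(2(1 - rho)) is a common choice in the stock-correlation literature, not necessarily the one the talk uses.

```python
import numpy as np
import networkx as nx

# Made-up, symmetric correlations between five hypothetical assets (illustration only).
assets = ["A", "B", "C", "D", "E"]
corr = np.array([
    [ 1.0,  0.9,  0.8, -0.4,  0.1],
    [ 0.9,  1.0,  0.7, -0.3,  0.2],
    [ 0.8,  0.7,  1.0, -0.6,  0.0],
    [-0.4, -0.3, -0.6,  1.0,  0.6],
    [ 0.1,  0.2,  0.0,  0.6,  1.0],
])

# Complete graph: one edge per pair, storing the correlation and a correlation distance.
G = nx.Graph()
for i in range(len(assets)):
    for j in range(i + 1, len(assets)):
        rho = corr[i, j]
        # Distance is 0 for rho = 1 and 2 for rho = -1, so stronger correlation = shorter edge.
        G.add_edge(assets[i], assets[j], correlation=rho, distance=np.sqrt(2 * (1 - rho)))

# 1) Prune weak links: drop edges whose |correlation| is below a hypothetical threshold.
THRESHOLD = 0.5
pruned = G.copy()
pruned.remove_edges_from(
    [(u, v) for u, v, rho in G.edges(data="correlation") if abs(rho) < THRESHOLD]
)

# 2) Tag surviving edges as positive or negative correlation (e.g. for edge colours).
edge_colors = ["blue" if rho > 0 else "red" for _, _, rho in pruned.edges(data="correlation")]

# 3) Minimum spanning tree on the distance: the n - 1 links that connect all assets most tightly.
mst = nx.minimum_spanning_tree(G, weight="distance")

print(sorted(pruned.edges()))
print(sorted(mst.edges()))
```

Because the distance shrinks as the correlation grows, the minimum spanning tree keeps the strongest positive links that still connect every asset, which makes groups of similar assets easier to spot than in the fully connected graph.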

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, I am Kalyan Prasad and I'm going to talk about financial network analysis using Python. Thank you so much for joining my talk; I am really excited to be here today. Some cheap marketing: I am a self-taught data scientist and analytics manager. And of course, I'm a community person. I love being involved with different communities and I try to help them as much as I can. Currently I'm associated with the following organizations: PyCon India, PyCon Hyderabad, HydPy and Humans for AI, where I perform different roles and responsibilities. In all of these organizations I love to give back to the community, so I always look for opportunities to share my knowledge, and I also mentor at hackathons and in other community activities. These are my social platforms. Feel free to follow or connect with me, and if you have any feedback or suggestions, feel free to write to me. I'll respond to each and every message. That's pretty much it about me. Here is the outline for today's talk. We'll start by understanding the history of graphs. Then we'll talk about what networks are and how we can construct a network structure. Then we'll look at the evolution of networks in finance, and then we'll try to understand the power and importance of Python graphs. After that, we'll jump straight into action on financial network analysis with two different case studies. So without any further delay, let's get started. History and graphs: as we all know, data visualization is a powerful way to simplify and interpret the underlying patterns in data. Graphs are one such visualization technique; they are incredibly useful and help businesses make better data-driven decisions. Now, what exactly are graphs? To understand graphs, we first need to understand the concept called graph theory.
So here I'll quickly talk about the origin of graph theory to get a better understanding of graphs. Graphs were first introduced in the 18th century by the Swiss mathematician Leonhard Euler. His attempt at, and ultimate solution to, the famous Königsberg bridge problem, which you are seeing here, is generally regarded as the origin of graph theory. So let's try to understand what exactly the Königsberg bridge problem is, how Euler solved it, and how graph theory arose from it. First things first: Königsberg has four main land areas and seven bridges. The question asked here was pretty straightforward: can you cross each bridge exactly once and return to your starting point? While crossing, you should keep two things in mind. First, you must not leave any bridge uncrossed. Second, no bridge may be crossed more than once. Euler's insight was that the only relevant data are the land areas and the bridges connecting them; in other words, he recognized that the relevant constraints are the four land areas and the seven bridges. He then drew the first visual representation of a modern graph, which you can see here. This graph represents a set of points, known as nodes, connected by a set of lines, known as edges. So this was the problem, and this was the insight he shared. Later, after experimenting with multiple graphs with different numbers of nodes and edges, Euler abstracted the problem and created a very generic rule based on the nodes and their relationships, which applies to any connected system, as you can see here. For such a round trip to exist, every land area must touch an even number of bridges; in Königsberg all four land areas touch an odd number, so the trip is impossible. From there, graph theory has been developing ever since. In modern times, graph algorithms, graph applications and graph analytics have been booming and are being exploited in multiple industries. Now, what are networks?
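Euler's argument can be checked in a few lines of NetworkX. This is a minimal sketch that assumes the usual textbook encoding of Königsberg as a multigraph. The node labels A to D are arbitrary, with A as the central island.

```python
import networkx as nx

# Königsberg as a multigraph: four land areas (arbitrary labels), seven bridges.
bridges = [
    ("A", "B"), ("A", "B"),  # two bridges between the central island A and bank B
    ("A", "C"), ("A", "C"),  # two bridges between the central island A and bank C
    ("A", "D"), ("B", "D"), ("C", "D"),
]
G = nx.MultiGraph()
G.add_edges_from(bridges)

# Euler's insight: only the number of bridges touching each land area (its degree) matters.
degrees = dict(G.degree())
odd = [node for node, d in degrees.items() if d % 2 == 1]
print(degrees)            # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(odd)                # ['A', 'B', 'C', 'D']

# A closed walk using every bridge exactly once requires every degree to be even.
print(nx.is_eulerian(G))  # False
```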
Network data are generated when we consider the relationships between two or more entities in the data, such as highways connecting cities, friendships between people, or their phone calls. In recent times, a huge amount of network data is being generated and analyzed in multiple fields. For example, in sociology there is great interest in analyzing blog networks, built from the citations between blogs, to study how political discussion is structured. Networks have been studied extensively in graph theory, an area of mathematics; in mathematics, networks are known as graphs. In a nutshell, a network is a system of nodes connected by links. A node can be a person, a firm, an industry, or even a geographical area. Correspondingly, different types of relationships are represented as links. Both nodes and edges can hold specific properties that describe their characteristics. Don't worry if you can't yet picture what exactly a node or an edge is, or how they hold properties; I'll explain all of this with an interesting example on the next slide. As I mentioned earlier, a network consists of two main items, nodes and edges, which together form a network or graph. Networks can also hold metadata. Now let's try to understand network structure with an interesting example. The best part of any conference is the networking, whether it is a physical or a virtual conference. We all love networking. Do you agree with me? I'm sure you will. At a conference we all love to connect with people and to build relationships and friendships. So let me use this conference example to explain network structure.
Let's say that Kalyan and Mark are two friends who connected on 20 January 2022 at a conference. The nodes here are Kalyan and Mark, and they also have associated metadata, stored as key-value pairs, just like Python dictionaries. The keys here are age and location, and the values are a number and a country. The conference friendship is represented as a line between the nodes, and it also has associated metadata: the date, meaning the date when we first connected. This is how our friendship was built through the conference, which is why I named it the Conf network. Coming back to network structure: this is exactly how we define the network structure for any problem in network analysis. I hope you now have a better understanding of network structure. Here are some real-world examples of network analysis. First, social networks like Facebook, Instagram and Twitter. In social networks, we model the relationships between people; for example, we try to identify influencers in social media and model the relationships between them. Next, biological networks: in a human disease network, two diseases are linked if they share at least one common gene, and we study those links. And in financial networks, we study the correlation between stocks based on their daily prices or other parameters. There are also many other examples in different domains. All of these complex systems can be understood better when we see them through the lens of a network.
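The conference example maps directly onto NetworkX, where node and edge metadata are stored as key-value attributes. The following is a minimal sketch. The age and location values are hypothetical placeholders, not taken from the talk.

```python
from datetime import date

import networkx as nx

conf_network = nx.Graph()

# Node metadata: key-value pairs, stored internally as a dict per node.
# The ages and locations below are hypothetical placeholders.
conf_network.add_node("Kalyan", age=30, location="India")
conf_network.add_node("Mark", age=35, location="USA")

# Edge metadata: the date the two people first connected at the conference.
conf_network.add_edge("Kalyan", "Mark", date=date(2022, 1, 20))

print(conf_network.nodes["Kalyan"])          # {'age': 30, 'location': 'India'}
print(conf_network.edges["Kalyan", "Mark"])  # {'date': datetime.date(2022, 1, 20)}
```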
I believe that with this you now have a fair understanding of how graphs, graph theory, networks and data science connect. Next we have indicators. Indicators are very important in network analysis. The crucial task in network analysis is to identify the important nodes in a network; this is known as measuring network centrality. Centrality aims to identify the most important nodes in a network; in simple terms, how central a node is within the graph. Which nodes are considered important depends on how importance is defined. Centrality comes in different flavors, and each flavor defines the importance of a node in a different way, which leads to a variety of centrality measures. Some of the most commonly used flavors are degree centrality, closeness centrality and betweenness centrality. I'll quickly talk about each of them; I won't go into detail, because that goes beyond the scope of this talk. Degree centrality: as the name suggests, a node with a higher degree has a higher centrality, meaning that the higher the degree of a node, the more important it is in the graph. That is why we call it the most connected node. Closeness centrality calculates the shortest paths between all nodes and assigns each node a score based on the sum of its shortest-path distances to all other nodes; the smaller that sum, the more central the node. So it identifies the fastest-communicating node. Finally, betweenness centrality measures the number of times a node lies on the shortest path between other nodes, and it represents the degree to which a node stands between others. So it identifies the most influential nodes in a graph. That's all about indicators. Next is the most awaited and important topic in our talk, the evolution of networks in finance. Financial network analysis has been on the research agenda since the 2008 financial crisis.
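The following sketch compares the three flavors on NetworkX's built-in Krackhardt kite graph, a standard toy network rather than data from the talk. Each measure picks a different "most important" node.

```python
import networkx as nx

# Krackhardt's "kite": a standard 10-node toy network that ships with NetworkX.
G = nx.krackhardt_kite_graph()

degree = nx.degree_centrality(G)            # fraction of other nodes directly linked to a node
closeness = nx.closeness_centrality(G)      # (n - 1) / sum of shortest-path distances to all others
betweenness = nx.betweenness_centrality(G)  # share of shortest paths between other pairs through a node

def top(scores):
    """Return the node with the highest score (the first one wins on ties)."""
    return max(scores, key=scores.get)

print("most connected (degree):         ", top(degree))       # 3
print("fastest communicator (closeness):", top(closeness))    # 5 (tied with 6)
print("broker (betweenness):            ", top(betweenness))  # 7
```

On this graph, the measures disagree. Node 3 has the most links. Nodes 5 and 6 are closest to everyone else. Node 7 is the only bridge to nodes 8 and 9.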
The crisis played a huge role in advancing the understanding of financial networks. After the 2008 crisis, many economists came around to the view that the very network architecture of a financial system plays a central role in shaping systemic risk. In fact, many of the ensuing policy actions were motivated by these insights. As a result, network science concepts were cross-applied to finance after the 2008 crisis. From there, financial network research got into full swing and has become an active topic not only in data science but also in finance. There are some major areas of interest and applications in the study of financial networks, for example interbank networks, stock correlation networks and agent-based models, and there are many other applications as well. In this talk we are dealing with stock correlation networks, and we'll look at them in practice. Several studies have been conducted on stock correlation networks, and research is ongoing to find even better techniques for studying them. So far, stock correlation networks have proven their efficiency in predicting market movements, which is very positive news for financial networks. Now, as we all know, financial data is very complex. So how do we deal with this complex data, and how do we build better networks from it? This is where the power of Python graphs comes into the picture. Now, why Python? Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability, with clear syntax, dynamic typing, a strong online community, numerous libraries and fast prototyping. It also has expressive features. That is why Python is so powerful.
Now, in order to create powerful graphs, we need the right software. NetworkX is a very good, highly productive package for complex network analysis. It is very flexible: nodes can be any hashable object in Python, so they can be text, images or XML records, and edges can hold arbitrary data, such as weights or time series. This package is a treasure trove of graph algorithms, meaning that many standard graph algorithms are already built in and we can solve many complex problems with it. And it is very easy to use. I think I have given you enough of a download on the theoretical part. Let's jump straight into action and see some real financial network analysis. Let me quickly switch to my code notebook. Okay. All right, here is the code notebook I created for this demonstration. Considering the time constraint, I have already executed the entire code, but don't worry, I'll explain each and every step so that you get a better understanding of the concepts. I installed a couple of libraries for this demonstration: NetworkX, and also yfinance to fetch data from Yahoo Finance. This notebook is divided into two sections. In the first section we'll take some sample stocks and do a basic network analysis. In the second section we'll take some asset prices, dive deep into financial network analysis, build some interesting visuals and find some interesting insights from them. Let's start with section one. As usual, I have imported the necessary libraries here, and then I'm loading my data. For the first equity, I created a variable called ticker, and then I'm creating a ticker object and passing the ticker for Tesla.
So the first equity I've selected is Tesla. Once I execute this, we get a ticker object for Tesla, and with this ticker object we can access all the information related to the stock. I've created another variable, tesla, and using the ticker object I'm accessing the institutional holders of Tesla. Once I execute this, we get the institutional holders, their shares and the value of their holdings. All these companies are institutional holders of Tesla, meaning that each of them owns part of Tesla's stock. Next, I'm adding a new column to my data frame that holds the ticker symbol of the company. The reason for adding this company column is easy mapping when we build the network graph. What I mean here is, for example, BlackRock holds this many shares with this much value, and that holding is mapped to Tesla. That is why I'm creating a company column. So this is our clean data frame for the Tesla stock. Next I'm taking another stock; this time I'm selecting Google. Again I've created a ticker object for Google, requested the institutional holders of Google, and added the company column, so that we have a clean data frame of Google's institutional holders and their holdings. Next, I'm combining the Tesla and Google data frames with the pandas concat function: I create a variable called combined, call pd.concat and pass these two data frames. Once I execute this, we get a single data frame covering both Google and Tesla and their respective institutional holders. So far so good. Now we'll start with the basics of network analysis.
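The following sketch shows the company-column and concat steps, using small made-up frames in place of the live download. With network access, the talk's approach corresponds roughly to yfinance's `Ticker("TSLA").institutional_holders`. The "Holder" column name, the GOOG symbol and all values are assumptions for illustration. The exact columns returned vary between yfinance versions.

```python
import pandas as pd

# Hypothetical stand-ins for yf.Ticker(...).institutional_holders (made-up names and numbers).
tesla = pd.DataFrame({"Holder": ["Holder A", "Holder B"], "Shares": [100, 80]})
google = pd.DataFrame({"Holder": ["Holder A", "Holder C"], "Shares": [120, 60]})

# Tag every row with the ticker it belongs to, so holders can later be mapped to companies.
tesla["Company"] = "TSLA"
google["Company"] = "GOOG"

# Stack the two frames vertically; ignore_index=True gives the result a fresh 0..n-1 index.
combined = pd.concat([tesla, google], ignore_index=True)
print(combined)
```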
I've created a variable called p, and I'm calling a function named nx.from_pandas_edgelist. NetworkX has this pretty handy function for data frames: I pass my data frame along with a source and a target. The source here is the institutional holder and the target is the company, because we want to map each company to its institutional holders. Once I execute this, we get a network graph object and can inspect the nodes of our graph: the nodes are Tesla, Google and their respective institutional holders. We can also look at the edges: for example, Vanguard is linked to Tesla, and Vanguard is also linked to Google. That is the edge list of our graph. Finally, we'll plot our network graph. The NetworkX plotting function is nx.draw, where we pass the graph we created, p, and also with_labels=True, which means I want labels shown on the graph. Once I plot this, you can see the network graph with its nodes and edges. For me it is easy to tell what the nodes and edges are in this graph, but a person looking at it for the first time won't know which nodes are which. So I'll make the plot clearer by adding colors. I've created an empty list called colors and I loop over the nodes: if a node appears in the company column of my combined data frame, I color it red and append that color to my colors list.
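The following sketch shows the graph-building and node-coloring steps, with a list comprehension in place of the talk's loop. The rows are made up, and "Holder" and "Company" are assumed column names.

```python
import networkx as nx
import pandas as pd

# A small combined holders table (made-up rows) shaped like the one built in section one.
combined = pd.DataFrame({
    "Holder": ["Holder A", "Holder B", "Holder A", "Holder C"],
    "Company": ["TSLA", "TSLA", "GOOG", "GOOG"],
})

# One edge per row: institutional holder <-> company it holds.
G = nx.from_pandas_edgelist(combined, source="Holder", target="Company")

# Company nodes in red, holder nodes in green, so the two roles are easy to tell apart.
companies = set(combined["Company"])
colors = ["red" if node in companies else "green" for node in G.nodes()]

# Drawing needs matplotlib, e.g.: nx.draw(G, with_labels=True, node_color=colors)
print(list(G.nodes()))
print(colors)
```

Note that a holder appearing for both companies, like "Holder A" here, becomes a single node linked to both. That shared node is what makes overlapping ownership visible in the plot.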
Otherwise, it is colored green. Then we plot the final graph with the draw function again, passing the graph and the labels, and this time also the node colors, since we have created the colors list: node_color=colors. Once I execute this code, you can see that the red nodes are Google and Tesla, and the green nodes are their institutional holders; each edge links a holder to a company it holds. So you can clearly see Tesla and its institutional holders, and Google and its institutional holders. From here, we could expand the analysis further by identifying the majority institutional holders in Tesla or Google, comparing the institutional holders of the two, or identifying the top five or top ten institutional holders in each. For example, the management group LLP you see here holds Tesla, but it may not be among Tesla's top five or top ten institutional holders. You can find insights like these if you expand your analysis further. For now, we keep things simple: my main objective for this section is to show how to do a basic network analysis with financial stocks. But I definitely suggest you give it a shot and try to find some interesting insights in these graphs. With this, I'll conclude the first section and jump into the second. In this section, we take ETF prices, which are simply asset prices over a period of time, dive into financial network analysis, and come up with meaningful visuals and insights from network graphs. As usual, I've imported the necessary libraries for this case. The objective with this data set is to identify the correlation between asset classes.
To achieve that, we need to analyze and visualize the relationships between our asset classes, which we'll be doing now; you'll see it in a while. Again, I am loading the data here. I created a variable called etf where I read my asset price data, and once I execute this, I get my ETF prices. An ETF is an exchange-traded fund, a type of security that can hold different types of investments: stocks, bonds, commodities, currencies and other types of investments as well. All these kinds of securities are included in the exchange-traded fund data. As you can see, we have 40 columns and 1,013 rows; apart from the date column, that is 39 different asset classes with 1,013 rows each. Okay, cool. Next, we are converting the date to datetime objects. When we are dealing with time series data, we first need to check that the date is in the right data type, and we need to set the date as the index. This is exactly what I'm doing here: I call pd.to_datetime to convert the date column from an object (string) type to datetime, and then I set the date column as the index. Once I execute this, you can see that the date is now the index, and we can see all our asset classes. Cool. If you want to know the start and end of our data, our data starts on 11 January 2013 and ends on 10 December 2017, and we have asset prices for this period. Cool. Next, we convert the prices into daily log returns. What is a log return? It is a way of calculating the rate of return on an investment.
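The following is a minimal sketch of the date conversion on a tiny made-up price table. The tickers and prices are illustrative only. The same result can also be obtained at load time with `pd.read_csv(..., parse_dates=["Date"], index_col="Date")`.

```python
import pandas as pd

# A tiny made-up price table with the date stored as text, as it is straight after read_csv.
etf = pd.DataFrame({
    "Date": ["2013-01-11", "2013-01-14", "2013-01-15"],
    "EWJ": [10.0, 10.2, 10.1],
    "EWQ": [22.0, 21.8, 22.1],
})

# Convert the date strings to datetime64 values and use them as the index.
etf["Date"] = pd.to_datetime(etf["Date"])
etf = etf.set_index("Date")

# With a DatetimeIndex, the start and end of the sample are just the index min and max.
print(etf.index.min(), etf.index.max())
```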
To calculate the correlation matrix and compare correlations between assets, we first need to convert our asset prices into daily log returns. The reason is that log returns make it much easier to compare the returns of two assets. So I'm creating an empty data frame for log returns, because we want to calculate the daily log return of each asset. I loop over each column in my price data frame, calculate its daily log return, and store all the daily log returns in my log-return data frame. Once I execute this, you can see the daily log-return values for all our asset classes. Cool. Now we can proceed to calculate the correlation matrix. I've created a variable called correlation_matrix, and we compute the pairwise correlation using the built-in pandas function corr: I take my log-return data frame and call corr on it. Once I execute this, you can see the correlation values of our asset classes. Instead of looking at these numbers, let's visualize the correlation matrix and try to understand the insights through visualization. As we all know, the traditional way of visualizing a correlation matrix is a heat map. This is exactly how I do it in practice: when comparing correlations, I simply plot a heat map and analyze which correlations are positive and which are negative. I did the same thing here, and I've written some HTML styling for my cluster map.
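The daily log return is r_t = ln(P_t / P_(t-1)). One convenience of log returns is that returns over consecutive days simply add up. The following sketch uses made-up prices and a vectorized expression instead of the per-column loop described in the talk, then calls pandas' `corr`.

```python
import numpy as np
import pandas as pd

# Made-up prices for three hypothetical assets on four business days.
prices = pd.DataFrame(
    {"A": [100.0, 101.0, 99.0, 102.0],
     "B": [50.0, 50.5, 49.6, 51.0],
     "C": [20.0, 19.8, 20.3, 19.9]},
    index=pd.date_range("2013-01-11", periods=4, freq="B"),
)

# Daily log return r_t = ln(P_t / P_{t-1}) for every column at once;
# the first row has no previous price, so it is NaN and gets dropped.
log_returns = np.log(prices / prices.shift(1)).dropna()

# Pairwise Pearson correlation of the return series (pandas' built-in DataFrame.corr).
correlation_matrix = log_returns.corr()
print(correlation_matrix.round(2))
```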
I've used a seaborn clustermap, which visualizes the matrix as a heat map and also identifies clusters of assets, so that we can clearly see which assets behave similarly to each other. Once I plot this, we get our cluster heat map of the correlations between ETF price returns. First things first: the heat map is color coded. The dark blue color highlighted here indicates a strong positive correlation, where the correlation value equals one. The yellow color indicates that assets are uncorrelated, where the correlation value equals zero, and red indicates a negative correlation, where the correlation value equals minus one. If you observe this heat map, we can see some interesting insights. For example, ETF assets like EWI, EWQ, EM and EWJ are all highly correlated, and you can see that they sit close to each other. Similarly, EWB and NLU are strongly correlated assets. An ETF like PXX is negatively correlated with equities. And if you look at the currency ETFs FXY and FXC, you can see that FXY, the Japanese yen, moves in the opposite direction. You can also see that these are riskier asset classes. However, the heat map conveys information in only one dimension: we can see only the strength of the correlation between assets. If we want to see how volatility and annualized returns compare across asset classes, we cannot find that in this heat map.
What we can do is take the insights and findings from this heat map, investigate them further with network graphs, and build more meaningful visuals. Let's see that now: financial network analysis using NetworkX. As I mentioned, NetworkX is one of the most popular Python libraries for complex network analysis. To analyze correlations as a network, we need to convert our correlation matrix into an edge list so that we can easily create graphs and compare the correlations. Here I create a variable called edges, reshape the correlation matrix into an edge list, reset its index, and rename the columns to asset 1, asset 2 and correlation. Once we execute this, you can see our edge-list data frame. The nodes are the asset classes, and the connections between nodes, known as edges, carry a numeric value: the correlation between the respective pair of nodes. So we have successfully created an edge list. With this edge list, we can create a graph. I create a variable called G using the function nx.from_pandas_edgelist, which you saw earlier in the first section and which builds a graph from an edge list. I pass it my edge list, edges, along with the source and target columns and the edge attribute, which here is the correlation.
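A minimal sketch of the edge-list conversion, assuming the same kind of square correlation matrix as above (the four tickers and values are made up). `stack()` is my choice for the reshape; the column names `asset_1`, `asset_2` and `correlation` follow the talk. Note that the diagonal (each asset's correlation of 1 with itself) is dropped, and because `nx.Graph` is undirected, the duplicate (A, B) and (B, A) rows collapse into one edge, so n assets give n(n - 1)/2 edges.

```python
import networkx as nx
import pandas as pd

tickers = ["AAA", "BBB", "CCC", "DDD"]
correlation_matrix = pd.DataFrame(
    [
        [1.0, 0.9, 0.1, -0.6],
        [0.9, 1.0, 0.2, -0.5],
        [0.1, 0.2, 1.0, 0.0],
        [-0.6, -0.5, 0.0, 1.0],
    ],
    index=tickers,
    columns=tickers,
)

# Reshape the square matrix into one row per (asset, asset) pair.
edges = correlation_matrix.stack().reset_index()
edges.columns = ["asset_1", "asset_2", "correlation"]
# Drop self-pairs such as (AAA, AAA), whose correlation is always 1.
edges = edges.loc[edges["asset_1"] != edges["asset_2"]].copy()

# Build an undirected graph; the correlation is stored as an edge attribute.
G = nx.from_pandas_edgelist(
    edges, source="asset_1", target="asset_2", edge_attr="correlation"
)
print(G.number_of_nodes(), G.number_of_edges())  # prints: 4 6
```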
If you look at the information for our graph, it contains 39 nodes and 741 edges, meaning that we have 39 asset classes and 741 connections between them. That is every possible pair of assets, since 39 × 38 / 2 = 741. Now let's visualize our network. I create subplots and pass some Matplotlib properties; you can look these up in the documentation. I also create several layouts, and I want each of them plotted separately, so I write a quick conditional statement to place each layout on its own axis. Then I call the NetworkX plotting function, nx.draw, passing my graph, the labels, the node size and the node color (you can find all the available node and edge colors in the documentation), as well as the layout and the axis. I also give each layout a title. Once I execute this, you get four different layout plots: a circular layout, a random layout, a spring layout and a spectral layout. These plots look pretty fancy, but they fail to convey the information we are actually looking for. The main thing we want from our network graphs is to identify the correlations between assets, and these plots fail to show that. So we can improve these plots by taking certain steps so that we end up with a meaningful network graph, and we'll see how to do that now.
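A sketch of the four-layout comparison, assuming a made-up complete graph in place of the 39-asset correlation graph. Instead of a conditional statement, this version zips the axes of a 2 × 2 grid with a dictionary of NetworkX layout functions; the sizes, colors and seeds are arbitrary placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Made-up stand-in for the correlation graph: every pair of assets is linked.
G = nx.complete_graph(["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"])

layouts = {
    "Circular layout": nx.circular_layout,
    "Random layout": lambda g: nx.random_layout(g, seed=1),
    "Spring layout": lambda g: nx.spring_layout(g, seed=1),
    "Spectral layout": nx.spectral_layout,
}

fig, axes = plt.subplots(2, 2, figsize=(12, 12))
for ax, (title, layout) in zip(axes.flat, layouts.items()):
    nx.draw(
        G,
        pos=layout(G),
        ax=ax,
        with_labels=True,
        node_size=800,
        node_color="lightblue",
        edge_color="grey",
        font_size=8,
    )
    ax.set_title(title)
fig.tight_layout()
```

With every pair connected, each layout shows a dense hairball, which illustrates why these plots say little about which correlations matter.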
First, I'm removing edges. I want to cut the unwanted edges in the graph so that it shows more meaningful information. To do that, I set a minimum correlation threshold of 0.5 for keeping an edge. I then take the graph's edges with their source, target and correlation, and create a list to store the edges to be removed. I loop through the edge list and, whenever the absolute correlation is less than the threshold, append that edge to the remove list. Finally, I remove all the collected edges from the graph. Once we execute this, you can see that 530 edges were removed in total: where we previously had 741 edges, we now have only 211. So we have removed the weak, less meaningful edges from the graph, and you'll see the effect when we plot the final picture.
Next, I do some styling. The reason for the styling is to make the plot more meaningful; it is not here to show you something fancy, but to convey meaningful information visually. To avoid writing many repeated lines of code, I've written some simple custom functions. First, I define a function to select the color, which takes a correlation parameter: if the correlation is less than or equal to zero, it returns red; otherwise, it returns green. Similarly, I define a function that selects the thickness of the edges.
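The edge-removal step described above can be sketched as follows, assuming a small made-up graph whose edges carry a `correlation` attribute and the 0.5 threshold from the talk. Collecting the edges first and removing them afterwards matters: deleting edges from a NetworkX graph while iterating over `G.edges` raises a `RuntimeError`, because the underlying dictionaries change size during iteration.

```python
import networkx as nx

# Made-up fully connected graph of four hypothetical assets.
G = nx.Graph()
G.add_edges_from(
    [
        ("AAA", "BBB", {"correlation": 0.92}),
        ("AAA", "CCC", {"correlation": 0.15}),
        ("AAA", "DDD", {"correlation": -0.65}),
        ("BBB", "CCC", {"correlation": 0.30}),
        ("BBB", "DDD", {"correlation": -0.55}),
        ("CCC", "DDD", {"correlation": 0.05}),
    ]
)

threshold = 0.5  # minimum absolute correlation an edge needs to survive

# Collect weak edges first; removing them while iterating would fail.
edges_to_remove = []
for asset_1, asset_2, data in G.edges(data=True):
    if abs(data["correlation"]) < threshold:
        edges_to_remove.append((asset_1, asset_2))

G.remove_edges_from(edges_to_remove)
print(f"Removed {len(edges_to_remove)} edges; {G.number_of_edges()} remain.")
```

Using the absolute value keeps strong negative correlations (AAA-DDD, BBB-DDD) as well as strong positive ones; CCC stays in the graph as an isolated node.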
It takes a correlation parameter along with two parameters called benchmark thickness and scaling factor, and returns the benchmark thickness multiplied by the absolute correlation raised to the power of the scaling factor. You'll understand all of this when you see the final plot, so don't get confused or scared by these formulas. I've also written a custom function for the node size. Let me execute these. Next, we identify the positive and negative correlations, because it is important to know which assets are positively correlated and which are negatively correlated. Edge colors will help us here, since they show whether a correlation is positive or negative. I create two empty lists, edge colors and edge widths. Then I loop over key, value in nx.get_edge_attributes(G, 'correlation').items(): for each edge, the value returned by our custom select-color function is appended to the edge colors list, and likewise the value returned by the select-thickness function is appended to the edge widths list. Finally, I do the same for node size, assigning each node a size depending on its number of connections: the more connections a node has, the bigger it is, which shows how many strong correlations it has.
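A sketch of the three styling helpers and the lists built from them. The function names and default values (`benchmark_thickness=2.0`, `scaling_factor=3.0`, and the node-size constants) are hypothetical; only the shape of each rule comes from the talk: red for correlations at or below zero and green otherwise, a width equal to the benchmark thickness times the absolute correlation raised to the scaling factor, and a node size that grows with the number of strong connections (the node's degree after filtering).

```python
import networkx as nx


def assign_colour(correlation):
    """Red for a non-positive correlation, green for a positive one."""
    return "red" if correlation <= 0 else "green"


def assign_thickness(correlation, benchmark_thickness=2.0, scaling_factor=3.0):
    """Width = benchmark * |corr| ** scaling; the exponent exaggerates strong links."""
    return benchmark_thickness * abs(correlation) ** scaling_factor


def assign_node_size(degree, base_size=100, size_per_edge=50):
    """Bigger nodes for assets with more strong correlations (higher degree)."""
    return base_size + degree * size_per_edge


# Made-up graph after filtering out weak correlations; CCC has no strong links.
G = nx.Graph()
G.add_edges_from(
    [
        ("AAA", "BBB", {"correlation": 0.92}),
        ("AAA", "DDD", {"correlation": -0.65}),
        ("BBB", "DDD", {"correlation": -0.55}),
    ]
)
G.add_node("CCC")

edge_colours = []
edge_widths = []
# get_edge_attributes yields edges in the same order as G.edges().
for key, value in nx.get_edge_attributes(G, "correlation").items():
    edge_colours.append(assign_colour(value))
    edge_widths.append(assign_thickness(value))

node_sizes = [assign_node_size(degree) for _, degree in G.degree()]
print(edge_colours, [round(w, 3) for w in edge_widths], node_sizes)
```

Raising the absolute correlation to a power above 1 shrinks moderate correlations much more than strong ones, so the strongest links stand out visually.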
You'll see that as well when we get to the visual. Let me execute this one. Now it's time for our final graph. We have taken several steps to improve it, so let's see whether they help us reach our goal and draw meaningful insights from our network graph. Again, I set a figure size, pass a font size, and call my plotting function with my graph and a layout. This time I want only the circular layout. I pass the node labels, the node size list we created above, and likewise the edge color and edge width lists. Then I set the title to "Price correlation", since we are examining correlations. Once I execute this, you can see the price correlation graph. Let's understand the changes we made compared with the graphs we saw above. First, we removed the edges with weak correlations and kept only those with strong, significant correlations. Second, we added colors to indicate positive and negative correlations: all positive correlations are green and all negative correlations are red. The edge thickness also shows the relative strength of the correlation between nodes. Finally, we adjusted the size of each node to represent its number of strong correlations. For example, the node for VGT, a Vanguard ETF, is pretty big, and it has quite strong correlations with others in the network.
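A sketch of the final circular plot, assuming the same kind of made-up filtered graph and inline versions of the styling rules (the constants are hypothetical). Passing `edgelist` explicitly keeps the colors and widths aligned with the edges they describe, and `node_sizes` follows the default node order that `nx.draw` uses.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Made-up graph after filtering; CCC has no strong correlations left.
G = nx.Graph()
G.add_edges_from(
    [
        ("AAA", "BBB", {"correlation": 0.92}),
        ("AAA", "DDD", {"correlation": -0.65}),
        ("BBB", "DDD", {"correlation": -0.55}),
    ]
)
G.add_node("CCC")

# Styling rules described in the talk, with hypothetical constants.
edgelist = list(G.edges())
correlations = [G.edges[u, v]["correlation"] for u, v in edgelist]
edge_colours = ["red" if c <= 0 else "green" for c in correlations]
edge_widths = [2.0 * abs(c) ** 3 for c in correlations]
node_sizes = [100 + 50 * G.degree(n) for n in G.nodes()]

fig, ax = plt.subplots(figsize=(8, 8))
nx.draw(
    G,
    pos=nx.circular_layout(G),
    ax=ax,
    with_labels=True,
    edgelist=edgelist,
    node_size=node_sizes,
    node_color="lightblue",
    edge_color=edge_colours,
    width=edge_widths,
    font_size=10,
)
ax.set_title("Price correlation")
```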
Back in the graph, DIA also has a big node and some strong correlations, and the same goes for EBOD and XLK. These have strong correlations in the network compared with the others. Looking at the graph as a whole, the majority of the asset classes are strongly correlated. If you observe the small nodes, for example GDX, XLU or FXF, these are different ETFs that are negatively correlated with the other assets. The one thing we cannot figure out from this network is which assets are similar to each other in terms of correlation. To identify that, we can further improve this visual with a different layout. Here I'm using the Fruchterman-Reingold layout, which clusters nodes that are strongly correlated with each other and so lets us identify groups of assets with similar properties. Let's see how it looks. I call nx.draw, passing my graph and, this time, the Fruchterman-Reingold layout; the rest of the parameters remain the same as before. Once I execute this, you can clearly see how it has clustered the nodes that are strongly correlated with each other, and it has clearly identified groups of assets with similar properties. For example, GLD, which is a gold (commodity) ETF, has been grouped with assets of similar properties.
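A sketch of the Fruchterman-Reingold version, assuming a made-up filtered graph with two tightly knit groups (hypothetical tickers EQ1 to EQ4 and BD1 to BD3). `nx.fruchterman_reingold_layout` is NetworkX's force-directed layout (it is the same function as `nx.spring_layout`): connected nodes attract and all nodes repel, so densely connected groups settle into visible clusters. The seed only makes the picture reproducible.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Made-up filtered correlation graph with two tightly knit groups.
equities = ["EQ1", "EQ2", "EQ3", "EQ4"]
bonds = ["BD1", "BD2", "BD3"]
G = nx.Graph()
for group, rho in ((equities, 0.85), (bonds, 0.70)):
    for i, asset_1 in enumerate(group):
        for asset_2 in group[i + 1:]:
            G.add_edge(asset_1, asset_2, correlation=rho)
G.add_edge("EQ1", "BD1", correlation=-0.55)  # one negative cross-group link

# Force-directed layout: edges pull connected nodes together.
pos = nx.fruchterman_reingold_layout(G, seed=7)

edge_colours = [
    "red" if data["correlation"] <= 0 else "green"
    for _, _, data in G.edges(data=True)
]

fig, ax = plt.subplots(figsize=(8, 8))
nx.draw(
    G,
    pos=pos,
    ax=ax,
    with_labels=True,
    node_color="lightblue",
    edge_color=edge_colours,
)
ax.set_title("Price correlation (Fruchterman-Reingold layout)")
```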
Likewise, BND and the other bond ETFs have been grouped together with assets of similar properties. And here is quite a large cluster of equities, which have also been grouped by their similar properties. This is pretty cool, but the one glitch in this visual is that the labels overlap in the large clusters, and we can't see the nodes clearly because they are tightly packed. We can quickly improve this visual with a method called the minimum spanning tree. So what exactly is a minimum spanning tree? It is a well-known technique that is often used in financial network analysis. For a connected graph with n nodes, a minimum spanning tree keeps only the n - 1 edges that connect all the nodes with the smallest possible total edge weight, so it minimizes the number of edges and removes the clutter from the network. Let's see how the minimum spanning tree helps us reach our goal. I create the minimum spanning tree, add colors to it, and call my plotting function. The best part is that NetworkX has a built-in function that calculates the minimum spanning tree for us, so I pass that in here along with the labels. The layout is again the Fruchterman-Reingold layout, because it helps us identify groups of assets with similar properties and see the correlations quickly. The other parameters remain the same, and I give it a title. Now you can clearly see how it has removed the clutter.
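A sketch of the minimum spanning tree step on a made-up graph. One assumption to flag: `nx.minimum_spanning_tree` minimizes the edge attribute named by `weight`, so this sketch first converts each correlation into the distance sqrt(2(1 - rho)), a choice commonly used in financial network analysis that is small for strongly positively correlated pairs. The talk does not say which weight it uses; if the graph only has a `correlation` attribute and no `weight`, NetworkX treats every edge as having weight 1.

```python
import math

import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Made-up correlations for four hypothetical assets.
correlations = {
    ("AAA", "BBB"): 0.92,
    ("AAA", "CCC"): 0.80,
    ("BBB", "CCC"): 0.75,
    ("AAA", "DDD"): -0.65,
    ("BBB", "DDD"): -0.55,
    ("CCC", "DDD"): -0.60,
}

G = nx.Graph()
for (asset_1, asset_2), rho in correlations.items():
    # Distance is 0 for rho = 1 and 2 for rho = -1.
    distance = math.sqrt(2 * (1 - rho))
    G.add_edge(asset_1, asset_2, correlation=rho, distance=distance)

# Kruskal's algorithm (the default) keeps n - 1 edges with the smallest total distance.
mst = nx.minimum_spanning_tree(G, weight="distance")

edge_colours = [
    "red" if data["correlation"] <= 0 else "green"
    for _, _, data in mst.edges(data=True)
]

fig, ax = plt.subplots(figsize=(6, 6))
nx.draw(
    mst,
    pos=nx.fruchterman_reingold_layout(mst, seed=7),
    ax=ax,
    with_labels=True,
    node_color="lightblue",
    edge_color=edge_colours,
)
ax.set_title("Minimum spanning tree")
```

Here the tree keeps AAA-BBB and AAA-CCC, and attaches DDD through its least negative link (BBB-DDD), so all four assets stay in the picture with only three edges.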
Our minimum spanning tree is now more readable, because the unnecessary edges have been removed from the graph. You can clearly see the cluster of equities with similar properties, as well as the commodities, bonds and currencies. The structure is very clear, and we have successfully identified the correlations between assets and the groups of assets with similar properties. With this, I'll conclude section two and summarize. In this talk, we have seen the history of graphs and how they came into the picture, what networks are and how to define a network structure, and how financial networks evolved. We also saw the power of Python graphs: why NetworkX and Python are so powerful for complex network analysis. In the hands-on part, we covered two sections. In the first section, we did some basic network analysis on financial data, and in the second section, we took ETF prices and deep-dived into a network analysis of asset correlations. Initially, we looked at the asset correlations with a heat map, found some interesting insights, and identified its limitations. We then investigated and analyzed them further with network graphs. We also saw the potential issues with our fancy graphs and improved them by taking certain steps and different approaches. Finally, we used different layout approaches to reach our final goal: seeing the correlations between our asset classes, which correlations are positive and which are negative, and removing the unnecessary edges.
We also saw the groups of assets with similar properties. All of this was achieved through network graphs with the power of Python and NetworkX. That's all I have on my plate today. Let me quickly jump to my slides: these are some great references if you want to study network analysis and graph theory, so feel free to check them out. I really appreciate you all being patient and listening to my talk. If you have any questions, feel free to ping me on the platform; I'll be addressing each and every question. Thank you so much for having me today. Have a great day.
...

Kalyan Prasad

Data Scientist & Analytics Manager @ Creative Crewz
