Web Crawler

Ayush Vyas

Graduate Student

Python

A breadth-first search (BFS) web crawler for the Ontario Tech University website, written in Python with BeautifulSoup. The resulting link graph is visualized in Gephi.
Analysis:
Nodes: 4127
Edges: 62021
Directed Graph
Average Degree: 15.028
Diameter: 3
Average Path length: 2.4249902310359985
Density: 0.004
Average Clustering Coefficient: 0.205
Number of Weakly Connected Components: 1
Number of Strongly Connected Components: 4127
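As a quick sanity check, the average degree and density above can be recomputed from the node and edge counts alone. This is a minimal sketch. It assumes the average degree is reported as edges divided by nodes (the mean out-degree of a directed graph) and that density uses the directed formula without self-loops. Both assumptions reproduce the listed values.

```python
# Counts reported in the analysis above.
nodes = 4127
edges = 62021

# In a directed graph every edge adds one out-degree and one in-degree,
# so the mean out-degree (and mean in-degree) is edges / nodes.
average_degree = edges / nodes

# Directed density: edges present divided by the n * (n - 1) possible
# ordered pairs of distinct nodes (self-loops excluded).
density = edges / (nodes * (nodes - 1))

print(f"Average degree: {average_degree:.3f}")  # Average degree: 15.028
print(f"Density: {density:.3f}")                # Density: 0.004
```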
Top 10 nodes with the highest Betweenness Centrality:
[Figure: nodes/pages having the highest betweenness centrality]
Top 10 nodes with the highest degree, i.e., Degree Centrality:
[Figure: nodes/pages having the highest degree centrality]
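A degree ranking like this one can be reproduced directly from an edge list. The sketch below is illustrative only: `top_degree_nodes` and the toy edge list are hypothetical, and it assumes "degree" means in-degree plus out-degree.

```python
from collections import Counter


def top_degree_nodes(edge_list, k=10):
    """Return the k nodes with the highest total degree (in + out)."""
    degree = Counter()
    for source, target in edge_list:
        degree[source] += 1  # outgoing link from source
        degree[target] += 1  # incoming link to target
    return degree.most_common(k)


# Toy directed edge list standing in for the crawled link graph.
toy_edges = [
    ("home", "about"),
    ("home", "policy"),
    ("about", "policy"),
    ("policy", "home"),
]

top = top_degree_nodes(toy_edges, k=2)
print(top)
```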
From this graph we can see that there are 18 communities, each shown in a different color. I used the ForceAtlas2 layout in Gephi to make the graph more visually appealing. The graph is directed because the crawl starts from one main node, “ontariotechu.ca”, and follows links outward using a BFS algorithm. After cleaning the data with the check_link function, I was left with 4127 nodes and about 62K edges. The highest degree of any node is 422, and the policy page has the highest betweenness centrality in the network. The average path length is 2.42, which is the average number of link hops needed to get from one page to another. The average clustering coefficient is 0.205.
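The crawl described above can be sketched roughly as follows. This is not the project's actual code. The body of `check_link` is a guess at what such a filter might do (keep only http(s) links on the ontariotechu.ca domain), and `fetch_links`, `bfs_crawl`, and `max_pages` are hypothetical names introduced for illustration.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

ALLOWED_DOMAIN = "ontariotechu.ca"  # start domain named in the write-up


def check_link(url):
    """Hypothetical filter: keep only http(s) links on the allowed domain."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    on_domain = host == ALLOWED_DOMAIN or host.endswith("." + ALLOWED_DOMAIN)
    return parsed.scheme in ("http", "https") and on_domain


def fetch_links(url):
    """Download a page and return the absolute URLs of its <a href> links."""
    # Imported here so the BFS core below can run without these packages.
    import requests
    from bs4 import BeautifulSoup

    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        return []  # treat unreachable pages as having no links
    soup = BeautifulSoup(response.text, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(url, anchor["href"]))
        links.append(absolute)
    return links


def bfs_crawl(start_url, get_links=fetch_links, max_pages=100):
    """Breadth-first crawl; returns (visited pages, directed edge list)."""
    visited = {start_url}
    queue = deque([start_url])
    edges = []
    while queue:
        page = queue.popleft()  # FIFO order is what makes this BFS
        for link in get_links(page):
            if not check_link(link):
                continue
            if link not in visited:
                if len(visited) >= max_pages:
                    continue  # page budget reached; skip new pages
                visited.add(link)
                queue.append(link)
            edges.append((page, link))
    return visited, edges
```

Calling `bfs_crawl("https://ontariotechu.ca/")` would return the set of pages and a list of (source, target) pairs. Writing those pairs to a CSV file with Source and Target columns is one way to get them into Gephi.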

Posted Jul 27, 2022

Clients

University of Ontario Institute of Technology
