Original source: Tuo end data tribal official account
Create topic network
I am researching publications in the social sciences and in computing and informatics by analyzing their text and their co-authorship social networks.
One of the questions I have run into is: how do you measure the relationship (relevance) between topics? I want to create a network visualization that connects similar topics and helps users browse a large number of topics more easily.
As an alternative to loading files, you can use the output of the LDA function in the topicmodels package to create the word-topic and document-topic matrices.
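A minimal sketch of that alternative, assuming the topicmodels package is installed. The AssociatedPress corpus that ships with the package stands in for the real data here, and k = 5 and the object names are arbitrary choices for illustration. In the author-centric setup described in this post, each "document" would be one author's merged abstracts.

```r
library(topicmodels)

# Example corpus bundled with topicmodels; a small subset keeps the fit fast.
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:50, ]

# Fit LDA; k = 5 topics is an arbitrary value for this sketch.
lda_fit <- LDA(dtm, k = 5, control = list(seed = 123))

# posterior() returns the per-topic word probabilities ($terms)
# and the per-document topic probabilities ($topics).
post <- posterior(lda_fit)
word.topic <- t(post$terms)  # words in rows, topics in columns
doc.topic  <- post$topics    # documents in rows, topics in columns

dim(word.topic)
dim(doc.topic)
```

Transposing `post$terms` puts one topic per column, which is the layout `cor()` needs to correlate topics with each other.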
# Load the author-topic matrix; the first column is the author name
author.topic <- read.csv("topics.csv", stringsAsFactors = F)
# Load the word-topic matrix; the first column is the word
# Rename the topics
colnames(author.topic) <- c("author_name", name$topic_name)
Unlike standard LDA, I run an “author-centric” LDA in which all of an author's abstracts are merged and treated as a single document per author. This is because my ultimate goal is to use topic modeling as an information retrieval process to determine the expertise of researchers.
Create static network
In the next step, I use the correlation between the word probabilities of each topic to create a network.
First, I decided to keep only relationships (edges) with a significant correlation (0.2 or higher). I use 0.2 because, for a sample of 100 observations, a correlation of about 0.2 is significant at the 0.05 level.
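The 0.2 cutoff can be checked with the usual t-test for a Pearson correlation. This is a small worked calculation, assuming a two-sided test and n = 100; the variable names are illustrative.

```r
# Smallest |r| that is significant at alpha = 0.05 (two-sided) for n = 100.
n     <- 100
alpha <- 0.05
df    <- n - 2

# Critical t value, then invert t = r * sqrt(df) / sqrt(1 - r^2) for r.
t_crit <- qt(1 - alpha / 2, df)
r_crit <- t_crit / sqrt(df + t_crit^2)

round(r_crit, 3)  # about 0.197
```

So rounding the threshold up to 0.2 is slightly conservative for 100 observations.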
cor_threshold <- .2
Next, we use the correlation matrix to create the igraph data structure, dropping every edge whose correlation is below the 0.2 threshold.
library(igraph)
Let's draw a simple igraph network.
title( cex.main=.8)
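The graph construction described here might look roughly like the following. It is a sketch on synthetic data, not the original code: the toy `word.topic` matrix and its column names are made up so the example runs on its own, and zeroing out weak correlations before building the graph is one of several ways to drop those edges.

```r
library(igraph)

set.seed(1)
cor_threshold <- .2

# Synthetic word-topic matrix (hypothetical): 100 words, 5 topics.
# t1/t2 share one latent profile, t3/t4 another, t5 is unrelated noise.
n_words <- 100
base_a <- runif(n_words)
base_b <- runif(n_words)
word.topic <- cbind(
  t1 = base_a + runif(n_words, 0, 0.5),
  t2 = base_a + runif(n_words, 0, 0.5),
  t3 = base_b + runif(n_words, 0, 0.5),
  t4 = base_b + runif(n_words, 0, 0.5),
  t5 = runif(n_words)
)

# Topic-by-topic correlation of word probabilities.
cor_mat <- cor(word.topic)

# Zero entries produce no edge, so weak correlations and self-loops vanish.
cor_mat[cor_mat < cor_threshold] <- 0
diag(cor_mat) <- 0

# Undirected weighted graph built from the lower triangle.
graph <- graph_from_adjacency_matrix(cor_mat, mode = "lower", weighted = TRUE)

plot(graph)
```

Each remaining edge carries the correlation as its `weight` attribute, which can later drive edge width or layout.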
Each node is a topic, labeled with its identifying number.
We use community detection, specifically the label propagation algorithm in igraph, to identify clusters in the network.
clp <- cluster_label_prop(graph)
class(clp)
Community detection found 13 communities, plus several single-topic communities made up of isolated topics (i.e. topics without any connections).
In line with my initial observations, the algorithm found the three main clusters we identified in the first graph, but it also added smaller clusters that do not seem to fit into any of the three main ones.
V(graph)$community <- clp$membership
V(graph)$degree <- degree(graph, v = V(graph))
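To see how the community counts and the isolated topics show up in the result, here is a small sketch on a toy graph (two cliques plus two unconnected nodes, all hypothetical) rather than on the real topic network. Label propagation is randomized in general, but on this toy graph the grouping is unambiguous.

```r
library(igraph)

# Toy graph: a 5-node clique, a 4-node clique, and 2 isolated nodes.
g <- disjoint_union(make_full_graph(5), make_full_graph(4))
g <- add_vertices(g, 2)

clp <- cluster_label_prop(g)

length(clp)           # number of communities, isolated nodes included
sizes(clp)            # number of nodes in each community
which(degree(g) == 0) # isolated nodes, each forming its own community

# Store the results as vertex attributes, as in the main text.
V(g)$community <- membership(clp)
V(g)$degree <- degree(g)
```

Filtering on `degree` afterwards is what removes the single-node communities from the interactive plot.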
In this section, we use the visNetwork package to build an interactive network diagram.
First, let's load the library and run visIgraph, which builds an interactive network directly from the igraph structure (graph).
We create the visNetwork data structure, and then split the list into two data frames: nodes and edges.
data <- toVisNetworkData(graph)
nodes <- data[[1]]
edges <- data[[2]]
Remove unconnected nodes (topics with degree = 0).
nodes <- nodes[nodes$degree != 0,]
Add colors and other network parameters to improve the network.
library(RColorBrewer)
col <- brewer.pal(12, "Set3")[as.factor(nodes$community)]
nodes$shape <- "dot"
…s$betweenness))+.2)*20 # Node size
nodes$color.highlight.background <- "orange"
Finally, we create our network with interactive charts. You can use the mouse wheel to zoom.
visNetwork(nodes, edges) %>%
  visOptions(highlightNearest = TRUE, selectedBy = "community", nodesIdSelection = TRUE)
First, there are two drop-down menus. The first drop-down list allows you to find any topic by name (the top five words by word probability).
The second drop-down list highlights the communities detected by our algorithm.
The three largest seem to be:
- Computing (grey, cluster 4)
- Social (green-blue, cluster 1)
- Health (yellow, cluster 2)
What is unique about the smaller communities detected? Can you explain?