Link to the original text: http://tecdat.cn/?p=4012
Using Twitter data collected with R as an example, we mine the tweet text and then run a sentiment analysis to see what the data reveals.
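library(dplyr)
library(tidyr)
library(ggplot2)
# Extract the device name from statusSource; keep iPhone and Android tweets
tweets <- tweets_df %>%
  select(id, statusSource, text, created) %>%
  extract(statusSource, "source", "Twitter for (.*?)<") %>%
  filter(source %in% c("iPhone", "Android"))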
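To make the extraction step concrete, here is a small sketch on made-up rows. The `toy_df` data frame and its `statusSource` strings are assumptions that imitate the HTML anchor format this field typically has; they are not rows from the real data set.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows; the real statusSource values may differ in detail
toy_df <- data.frame(
  id = 1:3,
  statusSource = c(
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'
  ),
  stringsAsFactors = FALSE
)

toy_sources <- toy_df %>%
  # Capture whatever follows "Twitter for " up to the next "<";
  # rows that do not match the pattern get NA
  tidyr::extract(statusSource, "source", "Twitter for (.*?)<") %>%
  # Keep only the two devices being compared (this also drops the NA row)
  filter(source %in% c("iPhone", "Android"))

print(toy_sources)
```

Rows that do not come from a "Twitter for ..." client, such as the web client row here, are dropped by the final filter.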
We first visualize the data by plotting, for each device, the share of its tweets posted at each hour of the day.
We also compare the number of tweets sent from the Android phone and from the iPhone.
The comparison chart shows a clear difference in when the two devices tweet: tweets from the Android phone are concentrated between 5:00 and 10:00, while tweets from the iPhone mostly fall between 10:00 and 20:00. The Android phone also posts more tweets overall than the iPhone.
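The hourly shares behind such a chart could be computed roughly as follows. This is a minimal sketch on invented timestamps; `toy_tweets` and the column names `hour` and `percent` are assumptions, and the time zone is fixed to UTC only to keep the example reproducible.

```r
library(dplyr)

# Hypothetical timestamps standing in for the `created` column
toy_tweets <- data.frame(
  source = c("Android", "Android", "Android", "iPhone", "iPhone"),
  created = as.POSIXct(
    c("2016-08-01 06:15:00", "2016-08-01 07:40:00", "2016-08-02 06:05:00",
      "2016-08-01 14:30:00", "2016-08-02 18:10:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

hourly_share <- toy_tweets %>%
  # Hour of day (0-23) at which each tweet was posted
  mutate(hour = as.integer(format(created, "%H"))) %>%
  # Number of tweets per device and hour
  count(source, hour) %>%
  # Share of each device's tweets that falls in each hour
  group_by(source) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()

print(hourly_share)
```

Normalizing within each device makes the two curves comparable even when one device tweets far more often than the other.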
Next, we check whether each tweet is a quotation and compare the counts on the two platforms.
ggplot(aes(source, n, fill = quoted)) + geom_bar(stat = "identity", position = "dodge") + labs(x = "", y = "Number of tweets", fill = "")
The comparison shows that the share of unquoted tweets is markedly lower on the Android phone than on the iPhone, and that the Android phone sends far more quoted tweets. Tweets from the iPhone are therefore mostly original, while quoted tweets come predominantly from the Android phone.
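One way to produce the per-platform counts for such a chart is sketched below. It assumes that a quoted tweet is one whose text starts with a double quotation mark, and it runs on invented tweets; `toy_tweets` and the labels "Quoted" and "Not quoted" are illustrative choices.

```r
library(dplyr)

# Hypothetical tweets; the text is invented for illustration
toy_tweets <- data.frame(
  source = c("Android", "Android", "Android", "iPhone", "iPhone"),
  text = c('"Example quote one" via a follower',
           '"Example quote two" via a follower',
           "An original remark",
           "Join us tonight",
           "Thank you everyone"),
  stringsAsFactors = FALSE
)

quote_counts <- toy_tweets %>%
  # Assumption: a tweet counts as quoted when it begins with a double quote
  count(source,
        quoted = ifelse(grepl('^"', text), "Quoted", "Not quoted"))

print(quote_counts)
```

The resulting columns `source`, `quoted` and `n` match the aesthetics used in the bar chart call.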
Next, we check whether each tweet contains a link or a picture and compare the two platforms.
ggplot(tweet_picture_counts, aes(source, n, fill = picture)) + geom_bar(stat = "identity", position = "dodge") + labs(x = "",
The comparison chart shows that tweets without pictures or links are more common on the Android phone than on the iPhone. In other words, tweets from the Android phone generally contain no picture or link, while tweets from the iPhone usually do.
# Convert counts to within-platform shares, then take the iPhone-to-Android ratio
spr <- tweet_picture_counts %>%
  spread(source, n) %>%
  mutate(across(c(Android, iPhone), ~ .x / sum(.x)))
rr <- spr$iPhone / spr$Android
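A worked example may help show what the ratio `rr` measures. The counts below are invented, and the category labels in `picture` are assumptions; `pivot_wider()` is used here in place of `spread()` only because it is the current tidyr function for the same reshaping.

```r
library(dplyr)
library(tidyr)

# Hypothetical counts, not the real tweet_picture_counts
toy_counts <- data.frame(
  source  = c("Android", "Android", "iPhone", "iPhone"),
  picture = c("No picture/link", "Picture/link",
              "No picture/link", "Picture/link"),
  n = c(90, 10, 40, 60),
  stringsAsFactors = FALSE
)

toy_spr <- toy_counts %>%
  # One column per platform, one row per category
  pivot_wider(names_from = source, values_from = n) %>%
  # Turn each platform's counts into shares that sum to 1
  mutate(across(c(Android, iPhone), ~ .x / sum(.x)))

# How many times more likely a category is on iPhone than on Android
toy_rr <- toy_spr$iPhone / toy_spr$Android
names(toy_rr) <- toy_spr$picture

print(toy_rr)
```

With these toy numbers, 60% of iPhone tweets but only 10% of Android tweets carry a picture or link, so the ratio for that category is about 6.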
Next, we strip links and other unwanted characters from the tweets, split the text into words, and rank the words by how often they occur.
library(stringr)
library(tidytext)
# Split on anything that is not a letter, digit, #, @ or an in-word apostrophe
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
tweet_words <- tweets %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))
tweet_words %>%
  count(word, sort = TRUE) %>%
  head(20) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) + geom_b
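To see what the splitting pattern does, the following sketch applies the same idea to a single invented tweet using only base R. It assumes the digit classes in the patterns are `\\d` and that the HTML entity to strip is `&amp;`; `toy_text` and `toy_stop_words` are hypothetical stand-ins, and `strsplit()` replaces `unnest_tokens()` purely to keep the example dependency-free.

```r
# Hypothetical tweet text, invented for illustration
toy_text <- "Don't miss it: #Example2016 rally @exampleuser https://t.co/AbC123 &amp; more"

# Split on anything that is not a letter, digit, #, @ or an in-word apostrophe
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Remove shortened links and the HTML-escaped ampersand
cleaned <- gsub("https://t.co/[A-Za-z\\d]+|&amp;", "", toy_text, perl = TRUE)

# Tokenize, drop the empty strings left by adjacent separators, lower-case
tokens <- strsplit(cleaned, reg, perl = TRUE)[[1]]
tokens <- tolower(tokens[nzchar(tokens)])

# Tiny stand-in for a stop-word list (assumption, not tidytext's stop_words)
toy_stop_words <- c("it", "more")
tokens <- tokens[!tokens %in% toy_stop_words & grepl("[a-z]", tokens)]

print(tokens)
```

Hashtags, mentions and contractions such as "don't" survive as single tokens, which is the point of using a custom pattern instead of a plain word tokenizer.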
We then run a sentiment analysis and compute, for each word, how much more often it is used on the Android phone than on the iPhone.
Using the sentiment associated with these characteristic words, we calculate the sentiment ratio between the two platforms and visualize it.
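The text does not show how the per-word ratio is defined. One common choice, shown here purely as an assumption, is a log2 ratio of smoothed usage shares, which would correspond to the `logratio` column that appears in the later pipeline. The counts and the add-one smoothing in `smooth_share()` are illustrative, not taken from the analysis.

```r
# Hypothetical word counts per platform
toy_word_counts <- data.frame(
  word    = c("word_a", "word_b", "word_c"),
  Android = c(8, 1, 0),
  iPhone  = c(1, 6, 7),
  stringsAsFactors = FALSE
)

# Assumption: add-one smoothing so that zero counts do not give infinite ratios
smooth_share <- function(x) (x + 1) / sum(x + 1)

# Positive values: relatively more common on Android; negative: on iPhone
toy_word_counts$logratio <- log2(
  smooth_share(toy_word_counts$Android) / smooth_share(toy_word_counts$iPhone)
)

# Most Android-leaning words first
toy_word_counts <- toy_word_counts[order(-toy_word_counts$logratio), ]
print(toy_word_counts)
```

A log scale is convenient because a word used twice as often on Android and a word used twice as often on iPhone end up equally far from zero, at +1 and -1.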
After counting the words associated with each sentiment, we plot the results with their confidence intervals. The figure shows that, relative to the iPhone, tweets from the Android phone lean most strongly toward negative sentiment, followed by disgust and sadness, and show little tendency toward positive sentiment.
Then we count the number of keywords in each emotion category.
android_iphone_ratios %>%
  inner_join(nrc, by = "word") %>%
  filter(!sentiment %in% c("positive", "negative")) %>%
  mutate(sentiment = reorder(sentiment, -logratio), word = reorder(word, -logratio)) %>%
The results show that most of the negative words come from the Android phone; far fewer negative words appear in tweets from the iPhone.
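The join between word ratios and a sentiment lexicon can be illustrated with a tiny hand-made lexicon. Everything below is hypothetical: `toy_ratios`, `toy_lexicon` and their values are invented, and base R `merge()` stands in for `inner_join()` so that the sketch runs without extra packages.

```r
# Hypothetical per-word log ratios (positive = more common on Android)
toy_ratios <- data.frame(
  word     = c("bad", "sad", "great", "table"),
  logratio = c(2.1, 1.4, -1.8, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical lexicon; a word can carry several sentiments
toy_lexicon <- data.frame(
  word      = c("bad", "bad", "sad", "great", "great"),
  sentiment = c("negative", "disgust", "sadness", "positive", "joy"),
  stringsAsFactors = FALSE
)

# Inner join: words missing from the lexicon ("table") drop out
joined <- merge(toy_ratios, toy_lexicon, by = "word")

# Keep the specific emotions and drop the broad positive/negative labels
joined <- joined[!joined$sentiment %in% c("positive", "negative"), ]

# Number of words per remaining sentiment category
per_sentiment <- table(joined$sentiment)
print(per_sentiment)
```

Because one word can map to several sentiments, the joined table can have more rows than the word list; counts per category should be read with that in mind.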