[big data tribe] Text sentiment analysis of Twitter data in R

Time: 2021-4-18

Link to the original text: http://tecdat.cn/?p=4012

Using Twitter data collected with R as an example, we mine the tweet text and then run a sentiment analysis on it, which surfaces a number of interesting patterns.

First, determine whether each tweet was sent from an iPhone or an Android phone, and drop the tweets that came from any other source.

library(dplyr); library(tidyr)  # provide %>%, select(), filter() and extract()
tweets <- tweets_df %>% select(id, statusSource, text, created) %>%
  extract(statusSource, "source", "Twitter for (.*?)<") %>%
  filter(source %in% c("iPhone", "Android"))
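
To make the extraction step concrete, here is a minimal base R sketch of the same idea. The three `statusSource` strings are hypothetical examples of the HTML form the regular expression expects, and `source_name` is an illustrative variable name, not one from the article.

```r
# Hypothetical statusSource values (illustrative only).
status_source <- c(
  '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
  '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'
)

# Capture the text between "Twitter for " and the next "<".
pattern <- "Twitter for (.*?)<"
matched <- regmatches(status_source, regexec(pattern, status_source, perl = TRUE))

# A successful match has two elements: the full match and the captured group.
source_name <- vapply(
  matched,
  function(m) if (length(m) == 2) m[2] else NA_character_,
  character(1)
)

# Keep only the two platforms of interest.
keep <- source_name %in% c("iPhone", "Android")
print(source_name[keep])
```

Strings without a "Twitter for ..." label produce `NA` and are dropped by the final filter, which is the same effect as the `filter()` call in the pipeline.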

Next, visualize the data by computing the share of tweets posted at each hour of the day.

Then compare the number of tweets sent from Android and from iPhone.
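
The article does not show the code for this step. A minimal sketch of one way to compute the hourly share per platform in base R follows; the sample data frame, its timestamps, and the names `tweets_sample` and `hourly_share` are assumptions made for illustration.

```r
# Hypothetical sample: source and posting time (UTC) of six tweets.
tweets_sample <- data.frame(
  source = c("Android", "Android", "Android", "iPhone", "iPhone", "iPhone"),
  created = as.POSIXct(
    c("2016-08-01 06:10:00", "2016-08-01 06:45:00", "2016-08-01 09:30:00",
      "2016-08-01 11:05:00", "2016-08-01 15:20:00", "2016-08-01 15:50:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Hour of day (0-23) of each tweet.
tweets_sample$hour <- as.integer(format(tweets_sample$created, "%H"))

# Count tweets per source and hour, then turn each source's row into proportions.
counts <- table(tweets_sample$source, tweets_sample$hour)
hourly_share <- prop.table(counts, margin = 1)
print(round(hourly_share, 2))
```

Normalizing within each platform (rather than over all tweets) keeps the two curves comparable even when one platform posts far more often than the other.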

The resulting comparison chart shows a clear difference in posting times between the two platforms: Android tweets cluster between 5:00 and 10:00, while iPhone tweets are generally posted between 10:00 and 20:00. The chart also shows that more tweets were sent from Android than from iPhone.

Next, check whether tweets contain quoted text, and compare the counts on the two platforms.

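# Expects per-source counts with columns source, quoted and n, piped in from a preceding step.
ggplot(aes(source, n, fill = quoted)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "", y = "Number of tweets", fill = "")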
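
The step that produces the `quoted` counts is not shown in the article. The sketch below is one possible way to build them in base R, assuming a tweet counts as quoted when its text begins with a double quotation mark (the same `'^"'` test that appears in the article's word-extraction step). The sample tweets and the name `quote_counts` are hypothetical.

```r
# Hypothetical sample tweets (illustrative only).
tweets_sample <- data.frame(
  source = c("Android", "Android", "iPhone", "iPhone"),
  text = c('"Great event tonight" - a supporter', "Heading to Ohio",
           "Join us tomorrow", "Thank you!"),
  stringsAsFactors = FALSE
)

# Assumption: a tweet is "quoted" when its text starts with a double quote.
tweets_sample$quoted <- ifelse(grepl('^"', tweets_sample$text), "Quoted", "Not quoted")

# Long-format counts with columns source, quoted and n, ready for a dodged bar chart.
quote_counts <- as.data.frame(
  table(source = tweets_sample$source, quoted = tweets_sample$quoted),
  responseName = "n"
)
print(quote_counts)
```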

The comparison shows that the share of unquoted tweets is noticeably lower on Android than on iPhone, and that Android accounts for far more quoted tweets. It is therefore reasonable to conclude that most tweets sent from iPhone are original, while most tweets sent from Android are quotations.

Next, check whether tweets contain links or pictures, and compare the two platforms.

ggplot(tweet_picture_counts, aes(source, n, fill = picture)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "")  # the remaining labs() arguments are cut off in the source

The resulting chart shows that tweets without pictures or links are more common on Android than on iPhone. In other words, Android tweets generally contain neither, whereas iPhone tweets usually include a photo or a link.

The following code normalizes the counts within each platform and takes the ratio of the iPhone share to the Android share.

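# Convert counts to within-platform shares, then take the iPhone/Android ratio for the second row.
spr <- tweet_picture_counts %>%
  spread(source, n) %>%
  mutate(across(c(Android, iPhone), ~ .x / sum(.x)))  # replaces the deprecated mutate_each(funs(...))
rr <- spr$iPhone[2] / spr$Android[2]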
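
As a worked example of this ratio, the base R sketch below uses made-up counts. The category labels and all numbers are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical counts: rows are picture/link categories, columns are platforms.
counts <- matrix(
  c(90, 40,
    10, 60),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("No picture/link", "Picture/link"), c("Android", "iPhone"))
)

# Share of each category within each platform (columns sum to 1).
shares <- prop.table(counts, margin = 2)

# How many times more likely an iPhone tweet is to carry a picture or link.
rr <- shares["Picture/link", "iPhone"] / shares["Picture/link", "Android"]
print(rr)
```

With these made-up numbers, 60% of iPhone tweets and 10% of Android tweets carry a picture or link, so the ratio is 6.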

Next, strip links and stray characters from the tweet text, split the text into words, and rank the words by how often they occur.

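library(stringr); library(tidytext)  # str_detect()/str_replace_all(), unnest_tokens() and stop_words
# Split on anything that is not a letter, digit, #, @ or an apostrophe inside a word.
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
tweet_words <- tweets %>%
  filter(!str_detect(text, '^"')) %>%   # skip tweets that open with a quotation mark
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%   # drop links and &amp;
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))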


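tweet_words %>% count(word, sort = TRUE) %>% head(20) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_bar(stat = "identity")  # source is cut off after "geom_b"; geom_bar is assumed from the earlier plots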
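
For readers who want to see the cleaning and counting steps in isolation, here is a simplified base R sketch. It uses a plainer splitting rule than the article's regular expression, and the sample texts and the tiny stop word list `stop_list` are hypothetical.

```r
# Hypothetical tweet texts (illustrative only).
texts <- c(
  "Thank you Ohio! https://t.co/abc123",
  "Join us tomorrow in Ohio &amp; bring friends",
  "thank you all"
)

# Hypothetical, deliberately tiny stop word list.
stop_list <- c("you", "us", "in", "all")

# Remove shortened links and the HTML-escaped ampersand.
clean <- gsub("https://t.co/[A-Za-z\\d]+|&amp;", "", texts, perl = TRUE)

# Lowercase, then split on runs of characters that cannot be part of a word.
words <- unlist(strsplit(tolower(clean), "[^a-z\\d#@']+", perl = TRUE))

# Drop empty tokens and stop words, then rank by frequency.
words <- words[nzchar(words) & !words %in% stop_list]
word_freq <- sort(table(words), decreasing = TRUE)
print(head(word_freq, 20))
```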

Next, run a sentiment analysis on the data and compute the relative word-usage ratio between Android and iPhone tweets.

The sentiment ratios of the two platforms are then computed from the sentiment of the individual words and visualized.
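
The article does not show how the per-word ratio is built. One common formulation, assumed here, is a smoothed log2 ratio of each word's share on the two platforms. The word counts below are hypothetical; the column name `logratio` matches the one used later in the article.

```r
# Hypothetical word counts per platform (illustrative only).
word_counts <- data.frame(
  word = c("badly", "crazy", "join", "thank"),
  Android = c(12, 8, 1, 3),
  iPhone = c(1, 0, 9, 10),
  stringsAsFactors = FALSE
)

# Add-one smoothing keeps zero counts finite; then normalize within each platform.
android_share <- (word_counts$Android + 1) / sum(word_counts$Android + 1)
iphone_share <- (word_counts$iPhone + 1) / sum(word_counts$iPhone + 1)

# Positive values lean toward Android, negative values toward iPhone.
word_counts$logratio <- log2(android_share / iphone_share)
word_counts <- word_counts[order(-word_counts$logratio), ]
print(word_counts)
```

Using a log scale makes the measure symmetric: a word twice as common on Android scores +1, and a word twice as common on iPhone scores -1.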

After counting the words associated with each sentiment, plot the ratios together with their confidence intervals. The resulting plot shows that, compared with iPhone tweets, Android tweets lean most strongly toward negative sentiment, followed by disgust and sadness, and show little tendency toward positive sentiment.
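
The article does not say how the confidence intervals were obtained. As one possible approach, assumed here rather than taken from the source, a two-sample Poisson test from base R's `stats` package gives a rate ratio with an interval. All counts below are hypothetical.

```r
# Hypothetical counts: words of one sentiment and total words, per platform.
android_hits <- 320
android_total <- 4900
iphone_hits <- 120
iphone_total <- 3850

# Compare the two rates; the estimate is (Android rate) / (iPhone rate).
result <- poisson.test(c(android_hits, iphone_hits), c(android_total, iphone_total))

rate_ratio <- unname(result$estimate)
interval <- as.numeric(result$conf.int)  # 95% interval by default

print(rate_ratio)
print(interval)
```

An interval that lies entirely above 1 suggests the sentiment is more frequent in Android tweets; one that spans 1 gives no clear evidence of a difference.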

Then count the keywords in each emotion category.

# Keep the emotion categories only; the plotting code that followed is cut off in the source.
android_iphone_ratios %>%
  inner_join(nrc, by = "word") %>%
  filter(!sentiment %in% c("positive", "negative")) %>%
  mutate(sentiment = reorder(sentiment, -logratio),
         word = reorder(word, -logratio))
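
The join-and-filter logic can be tried on a small scale with base R's `merge()`. The miniature lexicon below is a hypothetical stand-in for the `nrc` table, assumed to have `word` and `sentiment` columns as the `inner_join()` call implies, and the ratios are made up.

```r
# Hypothetical per-word log ratios (illustrative only).
ratios <- data.frame(
  word = c("badly", "crazy", "join", "thank"),
  logratio = c(2.5, 2.9, -2.5, -1.7),
  stringsAsFactors = FALSE
)

# Hypothetical miniature lexicon; a word may carry several labels.
lexicon <- data.frame(
  word = c("badly", "badly", "crazy", "crazy", "thank", "thank"),
  sentiment = c("negative", "sadness", "negative", "anger", "positive", "joy"),
  stringsAsFactors = FALSE
)

# Inner join: words missing from the lexicon (here "join") drop out.
joined <- merge(ratios, lexicon, by = "word")

# Keep the emotion categories, excluding the broad positive/negative labels.
emotions <- joined[!joined$sentiment %in% c("positive", "negative"), ]

# Number of keywords in each emotion category.
keyword_counts <- table(emotions$sentiment)
print(keyword_counts)
```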

The results show that most of the negative words appear in Android tweets, while far fewer appear in iPhone tweets.

