Analysis and visualization of WeChat friend feature data with Python


1、 Background and research status

As the Internet has developed in China, the PC Internet has become increasingly saturated, while the mobile Internet is growing explosively. Data show that by the end of 2013, China had more than 500 million mobile Internet users, a share of 81%. With falling prices for mobile terminals and the wide availability of WiFi, the number of mobile Internet users continues to grow rapidly.

WeChat has become an important tool for connecting online and offline, the virtual and the real, and consumption and industry, and it improves the user conversion rate of O2O marketing. In the past, programmers building software often had to account for the languages of different development environments, device compatibility and cost. Now developers can build applications on an operating-system-like underlying platform, free of the previously restricted development environment.

2、 Research significance and purpose

With the rapid development of broadband wireless access and mobile terminal technology, people increasingly expect to obtain information and services from the Internet easily, anytime and anywhere, even while on the move. The mobile Internet emerged to meet this need and has developed rapidly. However, it still faces a series of challenges in mobile terminals, access networks, application services, security and privacy protection. Research into its basic theory and key technologies has important practical significance for the overall development of the national information industry.

3、 Research content and data acquisition

After the user scans the QR code with a mobile phone and confirms the login on the phone, wxpy automatically retrieves the friend list through the user's WeChat web session, including each friend's nickname, region, personal signature, gender and other information.
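To make the shape of the retrieved data concrete, here is a minimal sketch of how the fields listed above could be tallied once they have been read. The records in SAMPLE_FRIENDS are made up for illustration, and the helper names count_by_sex and count_by_province are hypothetical; the gender codes (1, 2, 0) follow the counting loop in section 4.

```python
from collections import Counter

# Made-up stand-in records (hypothetical); a live session would read the same
# fields from the friend objects that wxpy returns.
SAMPLE_FRIENDS = [
    {"nick_name": "friend_a", "sex": 1, "province": "Hebei", "signature": "Hello"},
    {"nick_name": "friend_b", "sex": 2, "province": "Shandong", "signature": ""},
    {"nick_name": "friend_c", "sex": 1, "province": "", "signature": "carpe diem"},
    {"nick_name": "friend_d", "sex": 0, "province": "Hebei", "signature": ""},
]

# Gender codes as used in the counting loop of section 4:
# 1 = male, 2 = female, 0 = not set.
SEX_LABELS = {1: "male", 2: "female", 0: "no_set"}


def count_by_sex(friends):
    """Tally friends per gender label; unknown codes fall back to 'no_set'."""
    counts = {label: 0 for label in SEX_LABELS.values()}
    for friend in friends:
        counts[SEX_LABELS.get(friend["sex"], "no_set")] += 1
    return counts


def count_by_province(friends):
    """Tally friends per province, skipping empty province strings."""
    return Counter(f["province"] for f in friends if f["province"])


print(count_by_sex(SAMPLE_FRIENDS))
print(count_by_province(SAMPLE_FRIENDS).most_common())
```

Working on plain dictionaries like this keeps the counting logic testable without logging in to WeChat.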

4、 Python Programming

# Analysis and visualization of WeChat friend feature data
# 1. Import packages
from wxpy import Bot
import re
import jieba
import numpy as np
from imageio import imread  # scipy.misc.imread was removed in SciPy 1.2; imageio is assumed to be installed
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import matplotlib as mpl
import pandas as pd
from pyecharts.charts import Map
from pyecharts import options as opts

# Use the SimHei font so that Chinese labels render correctly in Matplotlib
mpl.rcParams['font.sans-serif'] = ['SimHei']
# 2. Log in (a QR code is shown for scanning)
bot = Bot()
# List all friends of the logged-in account
all_friends = bot.friends()
print("all friends list", all_friends)

# Get all official accounts followed by the logged-in account
all_maps = bot.mps()
print("all official accounts list", all_maps)

# Get the group chat list of the logged-in account
all_groups = bot.groups()
print("all group chat list", all_groups)

# Search for a friend by remark name
# myfriend = bot.friends().search('xu Kuan')[0]
# print("search friends:", myfriend)

# Search for a friend and send a message
# bot.friends().search('xu Kuan')[0].send('Hello')

# Send a message to the File Transfer assistant
# bot.file_helper.send('Hello')

# 3. Count friends by gender (1 = male, 2 = female, 0 = not set)
sex_dict = {'male': 0, 'female': 0, 'no_set': 0}
for friend in all_friends:
    if friend.sex == 1:
        sex_dict['male'] += 1
    elif friend.sex == 2:
        sex_dict['female'] += 1
    elif friend.sex == 0:
        sex_dict['no_set'] += 1

# 4. Visualize the gender ratio with Matplotlib
slices = [sex_dict["male"], sex_dict["female"], sex_dict["no_set"]]
activities = ["male", "female", "no_set"]
cols = ["r", "m", "g"]
# startangle: the angle at which drawing starts, rotating counterclockwise
# shadow: draw a shadow beneath the pie
# autopct='%1.1f%%': format string with a minimum width of 1 and one digit
# after the decimal point; %% is an escaped percent sign
plt.pie(slices, labels=activities, colors=cols, startangle=90, shadow=True, autopct='%1.1f%%')
plt.title("WeChat friends gender ratio")
plt.show()

# Count the logged-in account's friends by province.
# The keys must match the strings returned by friend.province exactly.
province_dict = {'Hebei': 0, 'Shandong': 0, 'Liaoning': 0, 'Guangxi': 0, 'Jilin': 0,
                 'Gansu': 0, 'Qinghai': 0, 'Henan': 0, 'Jiangsu': 0, 'Hubei': 0,
                 'Hunan': 0, 'Jiangxi': 0, 'Zhejiang': 0, 'Guangdong': 0, 'Yunnan': 0,
                 'Fujian': 0, 'Taiwan': 0, 'Hainan': 0, 'Shanxi': 0, 'Sichuan': 0,
                 'Shaanxi': 0, 'Guizhou': 0, 'Anhui': 0, 'Beijing': 0, 'Tianjin': 0,
                 'Chongqing': 0, 'Shanghai': 0, 'Hong Kong': 0, 'Macao': 0, 'Xinjiang': 0,
                 'Inner Mongolia': 0, 'Tibet': 0, 'Heilongjiang': 0, 'Ningxia': 0}
# Tally each friend's province; friends whose province is not a key are skipped
for friend in all_friends:
    # print(friend.province)
    if friend.province in province_dict:
        province_dict[friend.province] += 1


# To make the data easier to present, build a JSON-style list of records
data = []
for key, value in province_dict.items():
    data.append({'name': key, 'value': value})  # append one dictionary per province

data_process = pd.DataFrame(data)  # create a data frame
data_process.columns = ['city', 'popu']

# Draw the distribution map; "china" is the default map type of Map.add
map_chart = (
    Map()
    .add("WeChat friends city distribution map",
         [list(z) for z in zip(data_process['city'], data_process['popu'])],
         "china")
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Map-VisualMap (continuous)"),
        visualmap_opts=opts.VisualMapOpts(max_=10),
    )
)
map_chart.render()  # writes render.html by default

# When the with ... as ... block ends, f.close() is called automatically
# Mode 'a' appends to the end of the file
def write_txt_file(path, txt):
    """Append txt to the file at path."""
    with open(path, 'a', encoding='gbk') as f:
        return f.write(txt)

# Because the file is opened in append mode, delete the file from the
# previous run before running the program again
# The files are read and written with GBK encoding
def read_txt_file(path):
    """Return the whole content of the file at path."""
    with open(path, 'r', encoding='gbk') as f:
        return f.read()

# Collect the personal signatures of the logged-in account's friends
# Clean the data: punctuation and other characters that would distort the
# word frequency statistics are removed
# [...] matches any single character inside the brackets
# r'...' is a raw string, so backslashes are not treated as escapes
# 一-龥 is the range U+4E00 to U+9FA5, i.e. common Chinese characters
pattern = re.compile(r'[一-龥]+')
for friend in all_friends:
    print(friend, friend.signature)
    # Keep only the Chinese characters of the signature; the result is a list
    filterdata = re.findall(pattern, friend.signature)
    write_txt_file('signatures.txt', ''.join(filterdata))

# Read the file back and print it
content = read_txt_file('signatures.txt')
print(content)  # the content contains Chinese characters only

# Segment the text into words; the result is a list
segment = jieba.lcut(content)  # precise mode: no redundant tokens, suitable for text analysis

# Build a data frame with one row per token
word_df = pd.DataFrame({'segment': segment})

# index_col=False: do not use the first column as the index
# sep=" ": field separator
# names=['stopword']: column name
# "stopwords.txt": the stop word list
stopwords = pd.read_csv("stopwords.txt", index_col=False, sep=" ", names=['stopword'], encoding='gbk')

# Drop the stop words and view the filtered data frame
word_df = word_df[~word_df.segment.isin(stopwords.stopword)]
print("filtered:", word_df)

# Count how often each word occurs
# groupby groups the rows by the value of the 'segment' column
# size() counts the rows in each group; reset_index(name='count') turns the
# result into a data frame with the columns 'segment' and 'count'
# (the older form agg({'count': np.size}) raised a warning and no longer
# works in pandas 1.0 and later)
words_stat = word_df.groupby('segment').size().reset_index(name='count')

# Sort in descending order by the count column
words_stat = words_stat.sort_values(by='count', ascending=False)

#Read in background image
color_mask = imread("black_mask.png")

# Set the word cloud properties
wordcloud = WordCloud(font_path="hiragino.ttf",  # font file; it must support Chinese
                      background_color="pink",   # background color
                      max_words=100,             # maximum number of words displayed
                      mask=color_mask,           # background image used as the mask
                      max_font_size=100)         # maximum font size

# Build the word frequency dictionary from the 100 most frequent words
word_frequence = {x[0]: x[1] for x in words_stat.head(100).values}

# Draw the word cloud
wordcloud.generate_from_frequencies(word_frequence)

# Display the image
plt.imshow(wordcloud)
plt.axis("off")  # hide the axes
plt.show()
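The signature cleaning and word counting steps in the script depend on jieba and pandas. As a dependency-free illustration of the same idea, the sketch below keeps only Chinese characters with the same character range and counts tokens with the standard library. The function names extract_chinese and top_words and the sample strings are hypothetical, and the token list stands in for the output of a real segmenter.

```python
import re
from collections import Counter

# Same range as in the script: U+4E00 to U+9FA5 (common Chinese characters)
CHINESE_RUN = re.compile(r"[\u4e00-\u9fa5]+")


def extract_chinese(text):
    """Return only the Chinese characters of text, joined in order."""
    return "".join(CHINESE_RUN.findall(text))


def top_words(tokens, stopwords, n=100):
    """Drop stop words, then return the n most frequent tokens with counts."""
    kept = [token for token in tokens if token not in stopwords]
    return Counter(kept).most_common(n)


# Made-up signature: punctuation, Latin letters and digits are removed
cleaned = extract_chinese("Hello, 世界! 你好 123")
print(cleaned)

# Made-up token list standing in for the result of word segmentation
tokens = ["你好", "世界", "你好", "的"]
print(top_words(tokens, stopwords={"的"}, n=2))
```

The list of (word, count) pairs returned by top_words can be turned into a dictionary with dict(...), which is the shape that the word_frequence dictionary in the script has.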

5、 Data analysis and visualization

Gender ratio of WeChat friends


Distribution of WeChat friends by province


The territorial sovereignty of the motherland is sacred and inviolable!
Some areas are not marked, please understand!

Word cloud of WeChat friends' personal signatures


