A problem of Python data processing and dictionary generation

Problem description
I have two dictionary files on hand, file1 and file2.
I need to generate a new file from these two dictionary files.
The contents of file1 are:

zhangwei
wangwei
wangfang
liwei
lina
zhangmin
lijing
wangjing
liuwei
wangxiuying
zhangli
lixiuying
wangli
zhangjing
zhangxiuying
liqiang
wangmin
limin
wanglei
liuyang
wangyan
wangyong
lijun
zhangyong
lijie
zhangjie
zhanglei
wangqiang
lijuan
wangjun
zhangyan
zhangtao
wangtao
liyan
wangchao
liming
liyong
wangjuan
liujie
liumin
lixia
lili
......

The contents of file2 are:

123
123456
@123
888
999
666
2015
2016
521

I need to combine file1 + file2 and generate output like the following:

zhangwei123
zhangwei123456
zhangwei@123
zhangwei888
zhangwei999
zhangwei666
zhangwei2015
zhangwei2016
zhangwei521
wangwei123
wangwei123456
wangwei@123
wangwei888
wangwei999
wangwei666
wangwei2015
wangwei2016
wangwei521
wangfang123
wangfang123456
wangfang@123
wangfang888
wangfang999
wangfang666
wangfang2015
wangfang2016
wangfang521

This is the dictionary file I want to generate.
So far, I have written this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

f = open('zidian.txt','w')
with open('file1.txt','r') as users:
    for user in users:
        print(user)
        with open('file2.txt','r') as dict:
            for dic in dict.readlines():
                f.write(user.strip()+dic.strip()+'\n')



But this approach has a drawback: the generated dictionary file is too large.
I now want to change it so that file1 + lines 1 to 5 of file2 generate one file, file1 + lines 6 to 10 of file2 generate the next, and so on in a loop until file2 is exhausted.
How should I improve this? Any guidance from the experts would be appreciated.

If you do not split the output, itertools.product can help you do this more succinctly:

import itertools

with open('zidian.txt', 'w') as z:
    with open('file1.txt') as f1, open('file2.txt') as f2:
        for a, b in itertools.product(f1, f2):
            a, b = a.strip(), b.strip()
            print(a+b, file=z)
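
To see what itertools.product yields here, a small in-memory sketch may help. The two lists are made-up stand-ins for a few lines of file1 and file2, not the real files:

```python
import itertools

# Hypothetical stand-ins for a few lines of file1 and file2.
names = ["zhangwei", "wangwei"]
suffixes = ["123", "@123", "888"]

# product pairs every name with every suffix, names varying slowest.
combined = [a + b for a, b in itertools.product(names, suffixes)]

for entry in combined:
    print(entry)
```

Every name is paired with every suffix, so m names and n suffixes give m * n entries.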

Approach that splits the output:

import itertools

with open('file2.txt') as f2:
    # group the lines of file2 into blocks of 5 by line index
    for key, group in itertools.groupby(enumerate(f2), lambda t: t[0]//5):
        with open('file1.txt') as f1, open('zidian-{}.txt'.format(key), 'w') as z:
            for a, (_, b) in itertools.product(f1, group):
                a, b = a.strip(), b.strip()
                print(a+b, file=z)
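
The grouping trick can be tried in isolation. Below is a rough sketch on an in-memory list; the helper name chunked and the sample data are illustrative assumptions, not part of the answer above:

```python
import itertools

def chunked(lines, size):
    """Yield (chunk_number, lines_in_chunk), at most `size` lines per chunk."""
    numbered = enumerate(lines)
    # index // size is constant within each block of `size` consecutive lines
    for key, group in itertools.groupby(numbered, lambda t: t[0] // size):
        yield key, [line for _, line in group]

# Hypothetical sample: the first seven lines of file2.
suffixes = ["123", "123456", "@123", "888", "999", "666", "2015"]
chunks = list(chunked(suffixes, 5))
print(chunks)
```

The last chunk simply comes out shorter when the number of lines is not a multiple of 5.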

• f = open('zidian.txt','w'): you open the file here but forget to close it. It is better to use with for reading and writing files.
• dict.readlines(): do not use readlines unless absolutely necessary. Remember that! Please refer to the article "Text format conversion code optimization".
• In addition, the names dic and dict have a specific meaning in Python. Any reasonably experienced Python programmer will assume they are Python dictionaries, which easily causes misunderstanding.
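
A minimal sketch of these points together, assuming small text files like the ones in the question: every file is handled by with, lines are iterated lazily instead of going through readlines, and no name shadows dict. The function name combine and the temporary demo files are hypothetical:

```python
import os
import tempfile

def combine(names_path, suffixes_path, out_path):
    """Write every name+suffix pair, reading one line at a time."""
    count = 0
    with open(names_path) as names, open(out_path, "w") as out:
        for name in names:
            name = name.strip()
            if not name:
                continue
            # Re-open the suffix file for each name instead of caching it.
            with open(suffixes_path) as suffixes:
                for suffix in suffixes:
                    suffix = suffix.strip()
                    if suffix:
                        out.write(name + suffix + "\n")
                        count += 1
    return count

# Tiny demo in a temporary directory so nothing real is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    names_path = os.path.join(tmp, "file1.txt")
    suffixes_path = os.path.join(tmp, "file2.txt")
    out_path = os.path.join(tmp, "zidian.txt")
    with open(names_path, "w") as fh:
        fh.write("zhangwei\nwangwei\n")
    with open(suffixes_path, "w") as fh:
        fh.write("123\n@123\n")
    written = combine(names_path, suffixes_path, out_path)
    with open(out_path) as fh:
        result = fh.read().splitlines()
```

Re-opening the suffix file for every name trades some I/O for constant memory use, which only matters when the files are large.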

MichaelXoX replied 4 weeks ago

Did you actually run this? Does it meet your requirements? To me it does not seem to match the example output in your question. The requirement in your question reads like this: every name in file1 must be combined with every string in file2, so the result is m * n values, and the result of this code is not just that!

OK, I misunderstood the question and have rewritten the code. I admit that using filehandler.readlines() is a slap in my own face~
In fact, if you just think the generated file is a little too big, there is a *nix utility called split that is well suited to this. It can freely split a large file into several smaller ones.
If you do not need to split the results, the code below only needs a small change to the write2file function, and the id_generator function and its related modules (random, string) can be deleted:

def write2file(item):
    with open("dict.txt", "a") as fh, open("file1.txt", "r") as f1:
        for i in f1:
            for j in item:
                fh.write("{}{}\n".format(i.strip(), j))

import random
import string
from multiprocessing.dummy import Pool

def id_generator(size=8, chars=string.ascii_letters + string.digits):
    # random 8-character string used in the output file name
    return ''.join(random.choice(chars) for _ in range(size))

def generate_index(n, step=5):
    # yield (start, stop) slice bounds covering range(n) in blocks of `step`
    for i in range(0, n, step):
        if i + step < n:
            yield i, i+step
        else:
            yield i, None

def write2file(item):
    # combine every name in file1 with each string of this block
    ext_id = id_generator()
    with open("dict_{}.txt".format(ext_id), "w") as fh, open("file1.txt", "r") as f1:
        for i in f1:
            for j in item:
                fh.write("{}{}\n".format(i.strip(), j))

def multi_process(lst):
    pool = Pool()
    pool.map(write2file, lst)
    pool.close()
    pool.join()

if __name__ == "__main__":
    with open("file2.txt") as f2:
        _b_lst = [_.strip() for _ in f2.readlines()]
    b_lst = (_b_lst[i: j] for i, j in generate_index(len(_b_lst), 5))
    multi_process(b_lst)


The result is as shown in the figure: several text files named dict_ plus a random 8-character string.

One of them, dict_3txVnToL.txt:

zhangwei123
zhangwei123456
zhangwei@123
zhangwei888
zhangwei999
wangwei123
wangwei123456
wangwei@123
wangwei888
wangwei999
...
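
For the idea of splitting one large output file after the fact, here is a rough Python sketch of line-based splitting. It only imitates the general idea of the split utility and is not that tool; the function name split_by_lines, the part-file naming scheme and the demo data are all assumptions:

```python
import os
import tempfile

def split_by_lines(path, lines_per_file, prefix):
    """Copy `path` into numbered files of at most `lines_per_file` lines."""
    out = None
    parts = []
    with open(path) as src:
        for index, line in enumerate(src):
            if index % lines_per_file == 0:
                # start a new part file every `lines_per_file` lines
                if out is not None:
                    out.close()
                part_name = "{}{:03d}.txt".format(prefix, len(parts))
                parts.append(part_name)
                out = open(part_name, "w")
            out.write(line)
    if out is not None:
        out.close()
    return parts

# Tiny demo: a 7-line file split into parts of 3 lines.
with tempfile.TemporaryDirectory() as tmp:
    big = os.path.join(tmp, "zidian.txt")
    with open(big, "w") as fh:
        for n in range(7):
            fh.write("line{}\n".format(n))
    parts = split_by_lines(big, 3, os.path.join(tmp, "part_"))
    part_names = [os.path.basename(p) for p in parts]
    sizes = []
    for p in parts:
        with open(p) as fh:
            sizes.append(len(fh.read().splitlines()))
```

Note that this cuts by line count only, so one name's combinations may end up spread over two part files.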

Here is an old-fashioned way:

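with open("file1") as f1, open("file2") as f2, open("new", "w") as new:
    # assumed: a and b are read one line at a time with readline
    b = f2.readline().strip()
    while b:
        a = f1.readline().strip()
        for i in range(5):
            if b:
                new.write("{}{}\n".format(a, b))
            else: break
            b = f2.readline().strip()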


It reads only one line at a time, so it can cope with a file of any size. Energy-saving and environmentally friendly. The result looks like this:

$ head new
zhangwei123
zhangwei123456
zhangwei@123
zhangwei888
zhangwei999
wangwei666
wangwei2015
wangwei2016
wangwei521
wangwei123


PS: as mentioned above, try to avoid the readlines method. With limited memory, it will be a disaster when you run into a large file.

Save each line of file2 into a list, then take five from the list each time.
I don't have Python on hand and I am writing this purely by hand, so there are probably mistakes. Just follow the idea:

names = []
with open('file1.txt', 'r') as f1:
    for line in f1:
        names.append(line)

lines = []
with open('file2.txt', 'r') as f2:
    for line in f2:
        lines.append(line)
for i in range(len(lines) // 5):
    f = open('zidian' + str(i + 1) + '.txt', 'w')
    for j in range(5):
        for name in names:
            f.write(name.strip() + lines[i * 5 + j].strip() + '\n')
    f.close()
# The remainder after dividing by 5, i.e. the last few lines, needs one more file; that code is not written here
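
One possible way to cover the leftover lines as well is to slice the list with a step of 5, so the final slice is simply shorter. This is only a sketch; the helper name chunks and the sample list are illustrative:

```python
def chunks(items, size):
    """Yield consecutive slices of `items`; the last one may be shorter."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# The nine strings of file2 from the question.
suffixes = ["123", "123456", "@123", "888", "999", "666", "2015", "2016", "521"]
groups = list(chunks(suffixes, 5))
print(groups)
```

Each slice could then be written to its own file in the same way as the loop above, with no special case for the remainder.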

@dokelung itertools.cycle is a clever trick. I have a better way:

with open('file2') as file2_handle:
#With the name "password" dict, I don't want to do anything, no matter what the other files are