Regular Chinese character number case in Python


Remove special characters and keep only Chinese, English and numbers

import re
String = "123I 123456abcdefgabcvdff? /,.:;: ';'] {} () ()"
123I 123456abcdefgabcvdff? /,.:; "';"' [] {} () ()
sub_str = re.sub(u"([^\u4e00-\u9fa5\u0030-\u0039\u0041-\u005a\u0061-\u007a])","",string)

123I 123456abcdefgabcvdff
function explain
sub(pattern,repl,string) Replace all the matching expressions in the string with repl
[^**] Indicates that it does not match any character in this character set
\u4e00-\u9fa5 Unicode range of Chinese characters
\u0030-\u0039 Unicode range of numbers
\u0041-\u005a Upper case Unicode range
\u0061-\u007a Lowercase Unicode range
\uAC00-\uD7AF Unicode range of Korean
\u3040-\u31FF Unicode range of Japanese