Java implementation of sensitive word tool detailed instructions

Time:2021-2-26

sensitive-word

In daily work, as long as the user can speak freely (blog, document, forum), we should consider the sensitivity of the content.

Sensitive word is a high performance sensitive word tool based on DFA algorithm. The tool is implemented in Java to help us solve common problems.

characteristic

  • 6W + thesaurus, and constantly optimize and update
  • Based on DFA algorithm, the performance is good
  • Based on fluent API implementation, using elegant and concise
  • Support the judgment, return, desensitization and other common operations of sensitive words
  • Support full angle and half angle interchange
  • Support English case interchange

Quick start

get ready

  • JDK1.7+
  • Maven 3.x+

Maven introduction

<dependency>
    <groupId>com.github.houbb</groupId>
    <artifactId>sensitive-word</artifactId>
    <version>0.0.4</version>
</dependency>

API overview

SensitiveWordBsAs the leading class of sensitive words, the core method is as follows:

method parameter Return value explain
newInstance() nothing Guidance class Initialize boot class
contains(String) String to verify Boolean value Verify that the string contains sensitive words
findAll(String) String to verify String list Returns all sensitive words in a string
replace(String, char) Replace sensitive words with the specified char character string Returns the desensitized string
replace(String) use*Replace sensitive words character string Returns the desensitized string

Use examples

For all test cases, please refer to sensitivewordbstest

Determine whether sensitive words are included

Final string Text: "the five-star red flag is flying in the wind, and the portrait of Chairman Mao stands in front of Tiananmen Square. ";

Assert.assertTrue(SensitiveWordBs.newInstance().contains(text));

Returns the first sensitive word

Final string Text: "the five-star red flag is flying in the wind, and the portrait of Chairman Mao stands in front of Tiananmen Square. ";

String word = SensitiveWordBs.newInstance().findFirst(text);
Assert.assertEquals (five star red flag, word);

Return all sensitive words

Final string Text: "the five-star red flag is flying in the wind, and the portrait of Chairman Mao stands in front of Tiananmen Square. ";

List<String> wordList = SensitiveWordBs.newInstance().findAll(text);
Assert.assertEquals ("[five star red flag, Chairman Mao, Tian'anmen]", wordList.toString ());

Default replacement policy

Final string Text: "the five-star red flag is flying in the wind, and the portrait of Chairman Mao stands in front of Tiananmen Square. ";
String result = SensitiveWordBs.newInstance().replace(text);
Assert.assertEquals ("* * *" flutters in the wind, and the portrait of * * stands in front of * *. ", result);

Specifies what to replace

Final string Text: "the five-star red flag is flying in the wind, and the portrait of Chairman Mao stands in front of Tiananmen Square. ";
String result = SensitiveWordBs.newInstance().replace(text, '0');
Assert.assertEquals "0000 flutters in the wind, 000's portrait stands in front of 000. ", result);

More features

Many subsequent features, mainly for all kinds of processing, as far as possible to improve the hit rate of sensitive words.

It’s a long offensive and defensive battle.

ignore case

final String text = "fuCK the bad words.";

String word = SensitiveWordBs.newInstance().findFirst(text);
Assert.assertEquals("fuCK", word);

Ignore half angle fillet

final String text = "fuck the bad words.";

String word = SensitiveWordBs.newInstance().findFirst(text);
Assert.assertEquals("fuck", word);

Late road map

  • Digital conversion processing
  • Exchange of traditional and simplified
  • Repetition
  • Pauses
  • Pinyin interchange
  • User defined sensitive words and white list
  • Text image flipping
  • Sensitive word label support

Expand reading

Implementation of sensitive word tool

Explanation of DFA algorithm

Optimization process of sensitive Thesaurus

Thinking record of stop words