# Implementing a simple text diff tool, starting from an algorithm problem

Time: 2022-1-2

As we all know, many communities have a content review mechanism: not only the first publication but every later modification has to be reviewed. The crudest approach is to reread the whole text from the beginning, but the editors would certainly want to kill you for it, and it is obviously inefficient. If only a single typo was changed, a reviewer may read the text several times without spotting it. It is far more convenient to see exactly what changed each time, like `diff` in `git`. This article implements a simple version.

# Find the longest common subsequence

To find the difference between two texts, we can first find what they have in common; whatever is left over has been either deleted or added. In algorithms this is a classic problem. LeetCode has it as problem 1143, Longest Common Subsequence: given two strings `text1` and `text2`, return the length of their longest common subsequence, or `0` if they have none.

Problems that ask for an optimal value like this are usually solved with dynamic programming. Dynamic programming is much like step-by-step reasoning: it can be done top-down with recursion, or bottom-up with a `for` loop. The loop version typically uses an array called `dp`, and how many dimensions it needs depends on the problem. Here there are two variables (the lengths of the two strings), so we use a two-dimensional array and define `dp[i][j]` as the length of the longest common subsequence of the first `i` characters of `text1` and the first `j` characters of `text2`. Consider the boundary first: when `i` is `0`, the prefix of `text1` is the empty string, so whatever `j` is, the longest common subsequence has length `0`. The same holds when `j` is `0`. So we can initialize a `dp` array filled with `0`:

```javascript
let longestCommonSubsequence = function (text1, text2) {
  let m = text1.length
  let n = text2.length
  // fill(0) comes first because forEach skips the empty slots of `new Array(m + 1)`
  let dp = new Array(m + 1).fill(0)
  dp.forEach((item, index) => {
    dp[index] = new Array(n + 1).fill(0)
  })
}
```

When neither `i` nor `j` is `0`, there are two cases to consider:

1. When `text1[i - 1] === text2[j - 1]`, the characters at these two positions are the same, so they extend the longest common subsequence of the prefixes in front of them, that is `dp[i][j] = 1 + dp[i - 1][j - 1]`;

2. When `text1[i - 1] !== text2[j - 1]`, `dp[i][j]` depends on the earlier states, of which there are three: `dp[i - 1][j - 1]`, `dp[i][j - 1]` and `dp[i - 1][j]`. The first can be ruled out because it can never be longer than the other two: each of them covers one more character than the first, so it may be longer by `1`. We therefore take the larger of the latter two.

Next, a double loop fills in every cell of the two-dimensional array:

```javascript
let longestCommonSubsequence = function (text1, text2) {
  let m = text1.length
  let n = text2.length
  // Initialize the 2D array
  let dp = new Array(m + 1).fill(0)
  dp.forEach((item, index) => {
    dp[index] = new Array(n + 1).fill(0)
  })
  for (let i = 1; i <= m; i++) {
    let t1 = text1[i - 1]
    for (let j = 1; j <= n; j++) {
      let t2 = text2[j - 1]
      if (t1 === t2) { // Case 1
        dp[i][j] = 1 + dp[i - 1][j - 1]
      } else { // Case 2
        dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1])
      }
    }
  }
  // Length of the longest common subsequence
  return dp[m][n]
}
```
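As a quick sanity check, here is a self-contained sketch of the same table-filling loop, run on the sample input from the LeetCode problem (`abcde` and `ace`). The helper name `buildLcsTable` is made up for this example; it returns the whole table rather than only the length so the intermediate values can be inspected.

```javascript
// Hypothetical helper: fills the dp table as described above
// and returns the whole table instead of only the final length.
function buildLcsTable(text1, text2) {
  const m = text1.length;
  const n = text2.length;
  // (m + 1) x (n + 1) table; row 0 and column 0 stay 0
  const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      if (text1[i - 1] === text2[j - 1]) {
        dp[i][j] = 1 + dp[i - 1][j - 1];
      } else {
        dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
      }
    }
  }
  return dp;
}

const table = buildLcsTable('abcde', 'ace');
console.log(table[5][3]); // 3, the common subsequence is "ace"
// Print the table row by row to see how the values grow
console.log(table.map((row) => row.join(' ')).join('\n'));
```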

The value of `dp[m][n]` is the length of the longest common subsequence, but the length alone is of no use; we need the actual positions, which takes one more recursive pass. Why not collect the positions in the `t1 === t2` branch of the loop above? Because the loop compares every position of one string with every position of the other, so when a character occurs more than once, the same position gets collected repeatedly.
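To make the duplication concrete, the sketch below records a pair every time two characters match, which is what collecting inside the `t1 === t2` branch would amount to. The function name `naiveCollect` and the sample strings are invented for illustration.

```javascript
// Hypothetical illustration: record (i - 1, j - 1) whenever two characters match,
// exactly as collecting inside the `t1 === t2` branch would.
function naiveCollect(text1, text2) {
  const pairs = [];
  for (let i = 1; i <= text1.length; i++) {
    for (let j = 1; j <= text2.length; j++) {
      if (text1[i - 1] === text2[j - 1]) {
        pairs.push([i - 1, j - 1]);
      }
    }
  }
  return pairs;
}

// "aba" and "a" share a common subsequence of length 1,
// yet two pairs are recorded: [0, 0] and [2, 0].
// Index 0 of the second string shows up twice.
console.log(naiveCollect('aba', 'a'));
```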

We define a `collect` function that recursively decides whether positions `i` and `j` belong to the longest common subsequence. For positions `i` and `j`, if `text1[i - 1] === text2[j - 1]`, these two positions are clearly part of it, and we go on to check `i - 1` and `j - 1`. If the characters differ, we can consult the `dp` array, because by now every value in `dp` is known:

So there is no need to try every position again, and nothing gets collected twice. If `dp[i][j - 1] > dp[i - 1][j]`, the next positions to check are `i` and `j - 1`; otherwise they are `i - 1` and `j`. The recursion ends when either `i` or `j` reaches `0`:

```javascript
let arr1 = []
let arr2 = []
let collect = function (dp, text1, text2, i, j) {
  if (i <= 0 || j <= 0) {
    return
  }
  if (text1[i - 1] === text2[j - 1]) {
    // Collect the indexes of the matching character in the two strings
    arr1.push(i - 1)
    arr2.push(j - 1)
    return collect(dp, text1, text2, i - 1, j - 1)
  } else {
    if (dp[i][j - 1] > dp[i - 1][j]) {
      return collect(dp, text1, text2, i, j - 1)
    } else {
      return collect(dp, text1, text2, i - 1, j)
    }
  }
}
```
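The same walk can also be written as a loop, which avoids one stack frame per step. The following is a self-contained sketch rather than the article's own code: `lcsIndexes` is a made-up name, it builds the table itself, and it returns the two index arrays already in ascending order.

```javascript
// Hypothetical helper: build the dp table, then walk back from (m, n)
// with a loop instead of recursion.
function lcsIndexes(text1, text2) {
  const m = text1.length;
  const n = text2.length;
  const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] = text1[i - 1] === text2[j - 1]
        ? 1 + dp[i - 1][j - 1]
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  const arr1 = [];
  const arr2 = [];
  let i = m;
  let j = n;
  while (i > 0 && j > 0) {
    if (text1[i - 1] === text2[j - 1]) {
      // Matching characters belong to the common subsequence
      arr1.push(i - 1);
      arr2.push(j - 1);
      i--;
      j--;
    } else if (dp[i][j - 1] > dp[i - 1][j]) {
      j--;
    } else {
      i--;
    }
  }
  // Indexes were collected back to front, so reverse them
  return [arr1.reverse(), arr2.reverse()];
}

const [oldIdx, newIdx] = lcsIndexes('abcde', 'ace');
console.log(oldIdx, newIdx); // [ 0, 2, 4 ] [ 0, 1, 2 ]
```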

The results are as follows:

As you can see, the indexes are in reverse order. If you don't like that, you can sort them into ascending order:

```javascript
arr1.sort((a, b) => {
  return a - b
});
arr2.sort((a, b) => {
  return a - b
});
```

We are not done yet. From the longest common subsequence we still have to work out the positions of the deletions and additions. This is fairly simple: traverse both strings, and every character whose position is not in `arr1` (for the old string) or `arr2` (for the new string) has been deleted or added, respectively:

```javascript
let getDiffList = (text1, text2, arr1, arr2) => {
  let delList = []
  let addList = []
  // Traverse the old string
  for (let i = 0; i < text1.length; i++) {
    // A position in the old string that is not in the common subsequence was deleted
    if (!arr1.includes(i)) {
      delList.push(i)
    }
  }
  // Traverse the new string
  for (let i = 0; i < text2.length; i++) {
    // A position in the new string that is not in the common subsequence was added
    if (!arr2.includes(i)) {
      addList.push(i)
    }
  }
  return {
    delList,
    addList
  }
}
```
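`includes` rescans the index array for every character, which is fine for short lines. As a hedged variation (not the article's code), the lookup can go through a `Set` instead. The name `getDiffListWithSet` is hypothetical, and the sample index arrays are the common-subsequence positions of `abcde` and `axce`.

```javascript
// Hypothetical variant of getDiffList that uses Sets for the membership test
function getDiffListWithSet(text1, text2, arr1, arr2) {
  const kept1 = new Set(arr1);
  const kept2 = new Set(arr2);
  const delList = [];
  const addList = [];
  for (let i = 0; i < text1.length; i++) {
    // Old positions outside the common subsequence were deleted
    if (!kept1.has(i)) delList.push(i);
  }
  for (let i = 0; i < text2.length; i++) {
    // New positions outside the common subsequence were added
    if (!kept2.has(i)) addList.push(i);
  }
  return { delList, addList };
}

// 'abcde' -> 'axce': "b" and "d" were deleted, "x" was added
const sample = getDiffListWithSet('abcde', 'axce', [0, 2, 4], [0, 2, 3]);
console.log(sample.delList, sample.addList); // [ 1, 3 ] [ 1 ]
```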

Now that we know the common subsequence and the indexes of the additions and deletions, we can mark them up, for example with a red background for deleted text and a green background for added text, so that the changes can be seen at a glance.

For simplicity, additions and deletions are displayed together in the same text, like this:

Suppose there are two pieces of text to compare, each using `\n` for line breaks. We first split them into arrays of lines and compare them line by line. If the old and new lines are equal, the line goes straight into the array to display. Otherwise we work on the basis of the new text: a character that is new gets wrapped in a tag, and a deleted character is also wrapped in a tag and inserted at its corresponding position in the new text. The template part is as follows:

``{{ index + 1 }}``

Then compare the texts line by line:

```javascript
// `oldText` and `newText` are the two texts to compare, assumed to be defined elsewhere in this module
export default {
  data () {
    return {
      oldTextArr: [],
      newTextArr: [],
      showTextArr: []
    }
  },
  mounted () {
    this.diff()
  },
  methods: {
    diff () {
      // Split the old and new text into arrays of lines
      this.oldTextArr = oldText.split(/\n+/g)
      this.newTextArr = newText.split(/\n+/g)
      let len = this.newTextArr.length
      for (let row = 0; row < len; row++) {
        // If the old and new lines are identical, there is nothing to compare
        if (this.oldTextArr[row] === this.newTextArr[row]) {
          this.showTextArr.push(this.newTextArr[row])
          continue
        }
        // Otherwise, compute the positions of the longest common subsequence.
        // longestCommonSubsequence is expected here to return the two index arrays gathered by `collect`
        let [arr1, arr2] = longestCommonSubsequence(
          this.oldTextArr[row],
          this.newTextArr[row]
        )
        // Mark up the differences
        this.mark(row, arr1, arr2)
      }
    }
  }
}
```

The `mark` method generates the final string carrying the difference information. It first obtains the deleted and added indexes through the `getDiffList` method above. Because we work on the basis of the new text, additions are relatively simple: traverse the added indexes, find the character at each position in the new string, and wrap it in a tag:

```javascript
/*
  oldArr: index array of the longest common subsequence in the old text
  newArr: index array of the longest common subsequence in the new text
*/
mark (row, oldArr, newArr) {
  let oldText = this.oldTextArr[row];
  let newText = this.newTextArr[row];
  // Get the deleted and added position indexes
  let { delList, addList } = getDiffList(
    oldText,
    newText,
    oldArr,
    newArr
  );
  // Each inserted span tag also takes up space and shifts the later indexes,
  // so the total length of the tags inserted so far has to be added
  let addTagLength = 0;
  // Traverse the added positions
  addList.forEach((index) => {
    let pos = index + addTagLength;
    // The string before the current position
    let pre = newText.slice(0, pos);
    // The string after the current position
    let post = newText.slice(pos + 1);
    // The class name `add` is an assumption; any 3-character name keeps the tag at 25 characters
    newText = pre + `<span class="add">${newText[pos]}</span>` + post;
    addTagLength += 25; // <span class="add"></span> is 25 characters long
  });
  this.showTextArr.push(newText);
}
```

The effects are as follows:

Deletion is a little more troublesome, because a deleted character obviously does not exist in the new text. We need to work out where it would be if it had not been deleted, and insert it back there. Let's draw a picture:

Look at the first deleted character, `Flash`. Its position in the old string is `3`. Through the longest common subsequence we can find the index, in the new text, of the common character just in front of it; the position right after that index is where the deleted character belongs in the new string:

First, write a function that returns the index of a deleted character in the new text:

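```javascript
// oldArr and newArr are expected in ascending order
getDelIndexInNewTextIndex (index, oldArr, newArr) {
  // Find the nearest common character in front of the deleted one
  for (let i = oldArr.length - 1; i >= 0; i--) {
    if (index > oldArr[i]) {
      return newArr[i] + 1;
    }
  }
  // No common character in front of it: it belongs at the very start
  return 0;
}
```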
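A standalone version makes the lookup easy to try out. It assumes `oldArr` and `newArr` are sorted in ascending order, that is, after the `sort` step shown earlier. The function name `delIndexInNewText` and the sample data (old text `abcde`, new text `axce`) are invented for this sketch.

```javascript
// Hypothetical standalone copy of the lookup, for experimenting outside the component
function delIndexInNewText(index, oldArr, newArr) {
  // Scan from the back to find the closest common character before `index`
  for (let i = oldArr.length - 1; i >= 0; i--) {
    if (index > oldArr[i]) {
      return newArr[i] + 1;
    }
  }
  return 0;
}

// Common subsequence "ace": positions in 'abcde' and in 'axce'
const oldArr = [0, 2, 4];
const newArr = [0, 2, 3];
console.log(delIndexInNewText(1, oldArr, newArr)); // 1: "b" belongs right after "a"
console.log(delIndexInNewText(3, oldArr, newArr)); // 3: "d" belongs right after "c"
```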

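The next step is to calculate the actual position in the marked-up string. Each added character in front of it now takes up 26 characters (the character plus its tag), and each unchanged character takes up 1. For `Flash`, the position is calculated as follows: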

```javascript
mark (row, oldArr, newArr) {
  // ...

  // Traverse the deleted indexes
  delList.forEach((index) => {
    let newIndex = this.getDelIndexInNewTextIndex(index, oldArr, newArr);
    // The number of added characters before it
    let addLength = addList.filter((item) => {
      return item < newIndex;
    }).length;
    // The number of unchanged characters before it
    let noChangeLength = newArr.filter((item) => {
      return item < newIndex;
    }).length;
    let pos = addLength * 26 + noChangeLength;
    let pre = newText.slice(0, pos);
    let post = newText.slice(pos);
    // The class name `del` is an assumption
    newText = pre + `<span class="del">${oldText[index]}</span>` + post;
  });

  this.showTextArr.push(newText);
}
```

Now `Flash` ends up in the right position. See the result:

You can see that everything after it is a mess. The reason is simple: for `crystal`, for example, we did not account for the space taken up by the newly inserted `Flash`:

```javascript
// Total length taken up by the deleted characters inserted so far
let insertStrLength = 0;
delList.forEach((index) => {
  let newIndex = this.getDelIndexInNewTextIndex(index, oldArr, newArr);
  // The number of added characters before it
  let addLength = addList.filter((item) => {
    return item < newIndex;
  }).length;
  // The number of unchanged characters before it
  let noChangeLength = newArr.filter((item) => {
    return item < newIndex;
  }).length;
  // Include the total length of the previously inserted characters
  let pos = insertStrLength + addLength * 26 + noChangeLength;
  let pre = newText.slice(0, pos);
  let post = newText.slice(pos);
  // The class name `del` is an assumption
  newText = pre + `<span class="del">${oldText[index]}</span>` + post;
  // <span class="del">X</span> is 26 characters long
  insertStrLength += 26;
});
```
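The offset bookkeeping is easy to get wrong. As an alternative sketch (a different technique from the string splicing used in this article), the marked-up line can be assembled in a single pass by walking both strings alongside the common-subsequence indexes, so no tag lengths need to be tracked. The function name `renderDiff` and the `add`/`del` class names are assumptions of this example, and the index arrays must be in ascending order.

```javascript
// Hypothetical one-pass renderer: oldArr/newArr are the ascending
// common-subsequence indexes in oldText/newText.
function renderDiff(oldText, newText, oldArr, newArr) {
  let html = '';
  let o = 0; // next unread position in oldText
  let n = 0; // next unread position in newText
  for (let k = 0; k <= oldArr.length; k++) {
    // Position of the next common character, or the end of the string after the last one
    const oStop = k < oldArr.length ? oldArr[k] : oldText.length;
    const nStop = k < newArr.length ? newArr[k] : newText.length;
    // Everything skipped in the old text was deleted
    for (; o < oStop; o++) {
      html += `<span class="del">${oldText[o]}</span>`;
    }
    // Everything skipped in the new text was added
    for (; n < nStop; n++) {
      html += `<span class="add">${newText[n]}</span>`;
    }
    // Copy the common character itself
    if (k < oldArr.length) {
      html += newText[n];
      o++;
      n++;
    }
  }
  return html;
}

console.log(renderDiff('abcde', 'axce', [0, 2, 4], [0, 2, 3]));
// a<span class="del">b</span><span class="add">x</span>c<span class="del">d</span>e
```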

With that, our rough `diff` tool is complete:

# Existing problems

I believe you have already noticed a problem with the implementation above: if a line is deleted or added entirely, the old and new texts have different numbers of lines. Let's patch the `diff` function first:

```javascript
diff () {
  this.oldTextArr = oldText.split(/\n+/g);
  this.newTextArr = newText.split(/\n+/g);
  // If the old and new texts have different numbers of lines, pad the shorter one with empty strings
  let oldTextArrLen = this.oldTextArr.length;
  let newTextArrLen = this.newTextArr.length;
  let diffRow = Math.abs(oldTextArrLen - newTextArrLen);
  if (diffRow > 0) {
    let fixArr = oldTextArrLen > newTextArrLen ? this.newTextArr : this.oldTextArr;
    for (let i = 0; i < diffRow; i++) {
      fixArr.push('');
    }
  }
  // ...
}
```

If the added or deleted line is the last one, there is no problem:

But if a line in the middle is added or deleted, the `diff` of every line after it becomes meaningless:

The reason is simple: deleting a line staggers all the later line-by-line comparisons. What can we do? One idea is to detect that a line has been deleted or added and then correct which lines are compared with each other. Another is to stop diffing line by line and `diff` the whole text directly, so that deleted or added lines no longer matter.
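One possible direction for the first idea, offered only as a sketch under assumptions and not as part of the demo, is to run the same longest-common-subsequence walk on whole lines instead of characters. The hypothetical `alignLines` helper below pairs up identical lines and reports the others as deleted (`[line, null]`) or added (`[null, line]`). It does not pair a modified line with its old version, which a real implementation would still need to handle.

```javascript
// Hypothetical helper: align two arrays of lines using LCS on whole lines
function alignLines(oldLines, newLines) {
  const m = oldLines.length;
  const n = newLines.length;
  const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] = oldLines[i - 1] === newLines[j - 1]
        ? 1 + dp[i - 1][j - 1]
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  const rows = [];
  let i = m;
  let j = n;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && oldLines[i - 1] === newLines[j - 1]) {
      rows.push([oldLines[i - 1], newLines[j - 1]]); // unchanged line
      i--;
      j--;
    } else if (j > 0 && (i === 0 || dp[i][j - 1] > dp[i - 1][j])) {
      rows.push([null, newLines[j - 1]]); // line only in the new text
      j--;
    } else {
      rows.push([oldLines[i - 1], null]); // line only in the old text
      i--;
    }
  }
  // Rows were collected back to front
  return rows.reverse();
}

// The middle line "b" was deleted; "a" and "c" stay aligned
console.log(alignLines(['a', 'b', 'c'], ['a', 'c']));
```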

I could not work out how to do the first one, so let's look at the second. We remove the line-splitting logic and `diff` the entire text directly:

```javascript
diff () {
  this.oldTextArr = [oldText]; // .split(/\n+/g);
  this.newTextArr = [newText]; // .split(/\n+/g);
  // ...
}
```

It seems to work. Let's increase the amount of text:

Sure enough, it fails. Clearly our simple longest-common-subsequence algorithm cannot cope with that much text: either the `dp` array takes up too much space, or the recursion goes too deep, and memory runs out.

For an author who is weak at algorithms, this is beyond fixing. What can we do? We can only rely on the power of open source. Ta-da, here it is: diff-match-patch
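A rough back-of-the-envelope calculation shows how quickly the cost grows. The helper names and the text lengths below are arbitrary choices for illustration; the depth figure is only an upper bound, since each `collect` call decreases `i`, `j`, or both by one.

```javascript
// Number of cells in the (m + 1) x (n + 1) dp table
function dpCellCount(m, n) {
  return (m + 1) * (n + 1);
}

// Rough upper bound on how deep the recursive collect can go
function maxCollectDepth(m, n) {
  return m + n;
}

console.log(dpCellCount(100, 100)); // 10201
console.log(dpCellCount(20000, 20000)); // 400040001
console.log(maxCollectDepth(20000, 20000)); // 40000
```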

# The diff-match-patch library

`diff-match-patch` is a high-performance library for working with text, available in a variety of programming languages. Besides computing the difference between two texts, it can also do fuzzy matching and patching, as the name suggests.

It is easy to use. First bring it in. To load it with `import`, the source file has to be modified: by default it attaches the class to the global environment, so we need to export the class manually. Then `new` an instance and call its `diff_main` method:

```javascript
// The source file has been modified to export the class
import diff_match_patch from './diff_match_patch_uncompressed';

const dmp = new diff_match_patch();

// A component method
diffAll () {
  let diffList = dmp.diff_main(oldText, newText);
  console.log(diffList);
}
```

The returned result is as follows:

The result is an array in which each item represents one difference: `0` means no difference, `1` means added, and `-1` means deleted. We just need to traverse the array and concatenate the strings. Very simple:

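```javascript
diffAll () {
  let diffList = dmp.diff_main(oldText, newText);
  let htmlStr = '';
  diffList.forEach((item) => {
    // item[0] is the operation, item[1] is the text
    switch (item[0]) {
      case 0:
        htmlStr += item[1];
        break;
      case 1:
        // The class name `add` is an assumption
        htmlStr += `<span class="add">${item[1]}</span>`;
        break;
      case -1:
        // The class name `del` is an assumption
        htmlStr += `<span class="del">${item[1]}</span>`;
        break;
      default:
        break;
    }
  });
  this.showTextArr = htmlStr.split(/\n+/);
}
```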
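Because the result ends up on the page as HTML, text that itself contains `<` or `&` would be interpreted as markup. The sketch below shows one way to escape it first. It works on a hand-written sample in the `[operation, text]` shape described above rather than on real library output, and the names `escapeHtml` and `renderDiffList` as well as the `add`/`del` classes are assumptions of this example.

```javascript
// Escape the characters that have a special meaning in HTML
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical renderer for a list of [operation, text] tuples
function renderDiffList(diffList) {
  return diffList
    .map(([op, text]) => {
      const safe = escapeHtml(text);
      if (op === 1) return `<span class="add">${safe}</span>`;
      if (op === -1) return `<span class="del">${safe}</span>`;
      return safe; // op === 0: unchanged text
    })
    .join('');
}

// Hand-written sample, not actual output of the library
const sampleDiff = [[0, 'Hello '], [-1, 'world'], [1, '<b>there</b>']];
console.log(renderDiffList(sampleDiff));
// Hello <span class="del">world</span><span class="add">&lt;b&gt;there&lt;/b&gt;</span>
```

The library also ships a `diff_prettyHtml` method that produces marked-up HTML from the same list, which may be enough if custom class names are not needed.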

In testing, a `diff` of `21432` characters took around `4ms`, which is very fast.

Well, from now on the editors can happily slack off~

# Summary

This article worked through the algorithm problem of finding the longest common subsequence and looked at its practical application to text `diff`. Our simple algorithm, however, cannot support a real project, so if you have such a requirement, use the open source library introduced here.

Complete sample code: https://github.com/wanglin2/text_diff_demo