As we all know, many communities have a content review process. Besides the first release, every later modification also has to be reviewed. The crudest approach is to read the whole thing again from the beginning, but that is inefficient and the reviewer will want to kill you. If you only fixed a typo, for example, the reviewer may not spot it even after several readings. It would be much more convenient to see exactly what changed each time, like git diff does. This article implements a simple version of that.
Find the longest common subsequence
To find the difference between two texts, we can first find what they have in common; whatever remains was either deleted or added. This is a classic algorithm problem. It appears on LeetCode as problem 1143, Longest Common Subsequence: given two strings text1 and text2, return the length of their longest common subsequence, or 0 if they have none.
Problems that ask for an optimal value like this are usually solved with dynamic programming. Dynamic programming is a kind of step-by-step reasoning: it can be done top-down with recursion, or bottom-up with for loops. The for-loop version typically uses a table called dp, whose number of dimensions depends on the problem. Here there are two variables (the lengths of the two strings), so we use a two-dimensional array and define dp[i][j] as the length of the longest common subsequence of the first i characters of text1 and the first j characters of text2. First consider i = 0: the prefix of text1 is the empty string, so whatever j is, the longest common subsequence has length 0. The same holds for j = 0. We can therefore start with a dp array filled entirely with 0:
let longestCommonSubsequence = function (text1, text2) {
    let m = text1.length
    let n = text2.length
    // fill(0) is needed first: forEach skips the empty slots of new Array(m + 1)
    let dp = new Array(m + 1).fill(0)
    dp.forEach((item, index) => {
        dp[index] = new Array(n + 1).fill(0)
    })
}
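The initialization has one pitfall worth seeing in isolation: an array created with new Array(length) has only empty slots, and forEach does not visit them. The sketch below demonstrates that and shows one alternative way to build the table. The helper name createTable is hypothetical and only used for illustration.

```javascript
// An array created with new Array(3) has three empty slots; forEach skips them.
const sparse = new Array(3);
let visited = 0;
sparse.forEach(() => { visited++; });
console.log(visited); // 0

// Hypothetical helper: build an (m + 1) x (n + 1) table of zeros.
// Array.from calls the mapping function for every index, so no fill(0) is needed.
function createTable(m, n) {
    return Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
}

const table = createTable(2, 3);
console.log(table.length, table[0].length); // 3 4
```

Each row is created by its own call, so the rows are separate arrays and writing to one does not affect the others.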
When neither i nor j is 0, there are two cases to consider:
1. When text1[i - 1] === text2[j - 1], the two characters are the same, so they can be counted in the longest common subsequence, and the rest depends on the prefixes in front of them: dp[i][j] = 1 + dp[i - 1][j - 1];
2. When text1[i - 1] !== text2[j - 1], dp[i][j] depends on the earlier states, of which there are three: dp[i - 1][j - 1], dp[i][j - 1] and dp[i - 1][j]. The first can be ignored, because each of the other two looks at one more character than it does, so neither can be shorter (and either may be longer by 1). We therefore take the larger of the latter two.
Next, we just need a double loop to traverse all cases of the two-dimensional array:
let longestCommonSubsequence = function (text1, text2) {
    let m = text1.length
    let n = text2.length
    // Initialize the 2D array
    let dp = new Array(m + 1).fill(0)
    dp.forEach((item, index) => {
        dp[index] = new Array(n + 1).fill(0)
    })
    for (let i = 1; i <= m; i++) {
        // i and j both start at 1, so subtract 1 to get the character index
        let t1 = text1[i - 1]
        for (let j = 1; j <= n; j++) {
            let t2 = text2[j - 1]
            // Case 1
            if (t1 === t2) {
                dp[i][j] = 1 + dp[i - 1][j - 1]
            } else { // Case 2
                dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1])
            }
        }
    }
}
The value of dp[m][n] is the length of the longest common subsequence, but the length alone is not enough: we need to know the actual positions, which takes one more pass, this time recursive. Why not collect the positions in the t1 === t2 branch of the loop above? Because the loop compares every position of one string with every position of the other, so when a character occurs more than once, the same character gets collected repeatedly.
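A small experiment makes the duplication concrete. The sketch below fills the dp table as before but also records every matching pair inside the equality branch. The function name naiveCollect and the sample strings are made up for this illustration.

```javascript
// Naive attempt: record every matching (i, j) pair while filling dp.
function naiveCollect(text1, text2) {
    const m = text1.length;
    const n = text2.length;
    const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
    const pairs = [];
    for (let i = 1; i <= m; i++) {
        for (let j = 1; j <= n; j++) {
            if (text1[i - 1] === text2[j - 1]) {
                dp[i][j] = 1 + dp[i - 1][j - 1];
                pairs.push([i - 1, j - 1]);
            } else {
                dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }
    return { length: dp[m][n], pairs };
}

const result = naiveCollect('aa', 'a');
console.log(result.length); // 1
console.log(result.pairs);  // [ [ 0, 0 ], [ 1, 0 ] ]
```

The longest common subsequence has length 1, yet two pairs were collected, because the single 'a' of the second string matched both characters of the first.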
We define a collect function that recursively decides whether positions i and j belong to the longest common subsequence. For a given i and j, if text1[i - 1] === text2[j - 1], both positions are in the subsequence, and we move on to i - 1 and j - 1. If the characters differ, we can consult the dp array, because by now every value in it is known, so there is no need to try every position again and nothing is collected twice. If dp[i][j - 1] > dp[i - 1][j], the next positions to check are i and j - 1; otherwise they are i - 1 and j. The recursion ends when either i or j reaches 0:
let arr1 = []
let arr2 = []
let collect = function (dp, text1, text2, i, j) {
    if (i <= 0 || j <= 0) {
        return
    }
    if (text1[i - 1] === text2[j - 1]) {
        // Collect the index of the same character in both strings
        arr1.push(i - 1)
        arr2.push(j - 1)
        return collect(dp, text1, text2, i - 1, j - 1)
    } else {
        if (dp[i][j - 1] > dp[i - 1][j]) {
            return collect(dp, text1, text2, i, j - 1)
        } else {
            return collect(dp, text1, text2, i - 1, j)
        }
    }
}
The indexes come out in reverse order, because the recursion walks from the end of both strings toward the start. If you prefer, you can sort them into ascending order:
arr1.sort((a, b) => {
    return a - b
});
arr2.sort((a, b) => {
    return a - b
});
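The pieces so far can be put together roughly as follows. This is a sketch rather than the article's exact code: it assumes the function should return both index arrays as a pair, and it swaps the recursive collect for a while loop that applies the same rules, which avoids deep call stacks on long inputs.

```javascript
// Sketch: dp table plus an iterative version of the backtracking step.
// Returns the matched indexes of both strings in ascending order.
function longestCommonSubsequence(text1, text2) {
    const m = text1.length;
    const n = text2.length;
    const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
    for (let i = 1; i <= m; i++) {
        for (let j = 1; j <= n; j++) {
            dp[i][j] = text1[i - 1] === text2[j - 1]
                ? 1 + dp[i - 1][j - 1]
                : Math.max(dp[i - 1][j], dp[i][j - 1]);
        }
    }
    const arr1 = [];
    const arr2 = [];
    let i = m;
    let j = n;
    while (i > 0 && j > 0) {
        if (text1[i - 1] === text2[j - 1]) {
            arr1.push(i - 1);
            arr2.push(j - 1);
            i--;
            j--;
        } else if (dp[i][j - 1] > dp[i - 1][j]) {
            j--;
        } else {
            i--;
        }
    }
    // Indexes were collected back to front, so reversing is enough
    return [arr1.reverse(), arr2.reverse()];
}

const [arr1, arr2] = longestCommonSubsequence('abcde', 'ace');
console.log(arr1); // [ 0, 2, 4 ]
console.log(arr2); // [ 0, 1, 2 ]
```

Because the backtracking only ever moves toward smaller indexes, the collected arrays are already strictly descending, so reverse gives the same result as the numeric sort.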
We are not done yet. From the longest common subsequence we still have to work out which positions were deleted and which were added. This is fairly simple: walk through both strings, and every position that is not in arr1 or arr2 holds a character that was deleted or added:
let getDiffList = (text1, text2, arr1, arr2) => {
    let delList = []
    let addList = []
    // Traverse the old string
    for (let i = 0; i < text1.length; i++) {
        // A position in the old string that is not in the common subsequence was deleted
        if (!arr1.includes(i)) {
            delList.push(i)
        }
    }
    // Traverse the new string
    for (let i = 0; i < text2.length; i++) {
        // A position in the new string that is not in the common subsequence was added
        if (!arr2.includes(i)) {
            addList.push(i)
        }
    }
    return {
        delList,
        addList
    }
}
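To see what this step produces, here is a small standalone variant with a sample call. It is a sketch, not the article's code: the name getDiffListWithSet is hypothetical, and it looks indexes up in a Set instead of calling includes, which may matter on long lines since includes scans the array each time.

```javascript
// Hypothetical variant of getDiffList that looks indexes up in a Set.
function getDiffListWithSet(text1, text2, arr1, arr2) {
    const kept1 = new Set(arr1);
    const kept2 = new Set(arr2);
    const delList = [];
    const addList = [];
    for (let i = 0; i < text1.length; i++) {
        if (!kept1.has(i)) delList.push(i); // only in the old text: deleted
    }
    for (let i = 0; i < text2.length; i++) {
        if (!kept2.has(i)) addList.push(i); // only in the new text: added
    }
    return { delList, addList };
}

// 'abcd' -> 'axcd': 'b' (old index 1) was deleted, 'x' (new index 1) was added
const lists = getDiffListWithSet('abcd', 'axcd', [0, 2, 3], [0, 2, 3]);
console.log(lists); // { delList: [ 1 ], addList: [ 1 ] }
```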
Marking deletions and additions
Now that we know the common subsequence and the indexes of the additions and deletions, we can mark them up, for example with a red background for deleted characters and a green background for new ones, so that the changes are visible at a glance.
For simplicity, additions and deletions are shown together in the same text.
Suppose there are two pieces of text to compare, each using \n to separate lines. We first split them into arrays of lines and then compare the lines pair by pair. If the old and new lines are equal, the line goes straight into the display array. Otherwise we work from the new line: a character that is new gets wrapped in a tag, and a deleted character is wrapped in a tag and inserted at the position it would occupy in the new line. The template part is as follows:
{{ index + 1 }}
Then make a pairwise comparison:
export default {
    data () {
        return {
            oldTextArr: [],
            newTextArr: [],
            showTextArr: []
        }
    },
    mounted () {
        this.diff()
    },
    methods: {
        diff () {
            // Split the old and new text into arrays of lines
            // (oldText and newText are the two texts being compared)
            this.oldTextArr = oldText.split(/\n+/g)
            this.newTextArr = newText.split(/\n+/g)
            let len = this.newTextArr.length
            for (let row = 0; row < len; row++) {
                // If the old and new lines are identical, there is nothing to compare
                if (this.oldTextArr[row] === this.newTextArr[row]) {
                    this.showTextArr.push(this.newTextArr[row])
                    continue
                }
                // Otherwise compute the positions of the longest common subsequence of the two lines
                let [arr1, arr2] = longestCommonSubsequence(
                    this.oldTextArr[row],
                    this.newTextArr[row]
                )
                // Mark up the differences
                this.mark(row, arr1, arr2)
            }
        }
    }
}
The mark method generates the final string with the difference information. It first calls getDiffList from above to get the indexes of the deleted and added characters. Because we work from the new text, additions are the simpler case: loop over the added indexes, find the character at that position in the new string, and splice the opening and closing tag around it:
/*
    oldArr: index array of the longest common subsequence in the old text
    newArr: index array of the longest common subsequence in the new text
*/
mark (row, oldArr, newArr) {
    let oldText = this.oldTextArr[row];
    let newText = this.newTextArr[row];
    // Get the indexes of the deleted and added characters
    let { delList, addList } = getDiffList(
        oldText,
        newText,
        oldArr,
        newArr
    );
    // Every span tag we add takes up space and shifts the later indexes, so each index is corrected by adding the total length of the tags inserted so far
    let addTagLength = 0;
    // Loop over the added indexes
    addList.forEach((index) => {
        let pos = index + addTagLength;
        // The string before the current position
        let pre = newText.slice(0, pos);
        // The string after the current position
        let post = newText.slice(pos + 1);
        newText = pre + `<span class="add">${newText[pos]}</span>` + post;
        addTagLength += 25;// The length of <span class="add"></span> is 25
    });
    this.showTextArr.push(newText);
}
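The offset bookkeeping is easier to follow in a standalone function. The sketch below is a hypothetical extraction of the addition loop, not the article's code. The tag text is an assumption: the class name add is a guess that fits the 25-character tag length mentioned above.

```javascript
// Assumed tag text; together the two parts are 25 characters long.
const OPEN = '<span class="add">';
const CLOSE = '</span>';

// Hypothetical standalone version of the addition loop in mark.
// addList must be in ascending order.
function markAdditions(newText, addList) {
    let offset = 0; // total length of the tags inserted so far
    addList.forEach((index) => {
        const pos = index + offset;
        newText = newText.slice(0, pos) + OPEN + newText[pos] + CLOSE + newText.slice(pos + 1);
        offset += OPEN.length + CLOSE.length;
    });
    return newText;
}

console.log(markAdditions('axcy', [1, 3]));
// a<span class="add">x</span>c<span class="add">y</span>
```

Deriving the offset from the tag strings, rather than hard-coding 25, keeps the arithmetic correct if the tag or class name changes.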
That handles additions. Deletions are a little more troublesome, because a deleted character does not exist in the new text. We have to work out where it would be if it had not been deleted, and insert it back there.
Take the first deleted character, "Flash". Its position in the old string is 3. Using the longest common subsequence, we can find the common character just before it and look up that character's index in the new text; the position right after that index is where the deleted character belongs in the new string.
First write a function that returns the index of a deleted character in the new text:
getDelIndexInNewTextIndex (index, oldArr, newArr) {
    // Search backward for the nearest common character before the deleted one
    for (let i = oldArr.length - 1; i >= 0; i--) {
        if (index > oldArr[i]) {
            return newArr[i] + 1;
        }
    }
    return 0;
}
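A quick standalone check shows how the lookup behaves. The function below is a copy of the same logic written as a plain function, and the sample strings and index arrays are made up for illustration.

```javascript
// Standalone copy of the lookup, written as a plain function for illustration.
function getDelIndexInNewText(index, oldArr, newArr) {
    // Walk the common indexes from the back to find the nearest one before index
    for (let i = oldArr.length - 1; i >= 0; i--) {
        if (index > oldArr[i]) {
            return newArr[i] + 1;
        }
    }
    return 0; // nothing in common before it: it belongs at the very start
}

// Old 'abcXd' -> new 'abYcd': the common characters are a, b, c, d
const oldArr = [0, 1, 2, 4];
const newArr = [0, 1, 3, 4];
console.log(getDelIndexInNewText(3, oldArr, newArr)); // 4
console.log(getDelIndexInNewText(0, [1, 2], [0, 1])); // 0
```

The deleted 'X' followed 'c' in the old text, and 'c' sits at index 3 in the new text, so 'X' belongs at index 4. The search relies on oldArr being in ascending order.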
The next step is to compute the actual position in the string, which by now already contains the tags around the added characters. For "Flash" the calculation looks like this:
mark (row, oldArr, newArr) {
    // ...
    // Loop over the deleted indexes
    delList.forEach((index) => {
        let newIndex = this.getDelIndexInNewTextIndex(index, oldArr, newArr);
        // Number of added characters before this position
        let addLength = addList.filter((item) => {
            return item < newIndex;
        }).length;
        // Number of unchanged characters before this position
        let noChangeLength = newArr.filter((item) => {
            return item < newIndex;
        }).length;
        // Each added character takes up 26 characters together with its tag
        let pos = addLength * 26 + noChangeLength;
        let pre = newText.slice(0, pos);
        let post = newText.slice(pos);
        newText = pre + `<span class="del">${oldText[index]}</span>` + post;
    });
    this.showTextArr.push(newText);
}
With this, "Flash" appears in the right place. But everything after it is in a mess. The reason is simple: when we place the next deleted character, "crystal" in this example, we do not count the space taken up by the "Flash" we just inserted:
// Total length taken up by the deleted characters inserted so far
let insertStrLength = 0;
delList.forEach((index) => {
    let newIndex = this.getDelIndexInNewTextIndex(index, oldArr, newArr);
    let addLength = addList.filter((item) => {
        return item < newIndex;
    }).length;
    let noChangeLength = newArr.filter((item) => {
        return item < newIndex;
    }).length;
    // Add the total length of the characters inserted so far
    let pos = insertStrLength + addLength * 26 + noChangeLength;
    let pre = newText.slice(0, pos);
    let post = newText.slice(pos);
    newText = pre + `<span class="del">${oldText[index]}</span>` + post;
    // The length of <span class="del">X</span> is 26
    insertStrLength += 26;
});
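For comparison, the same markup can also be produced without any offset arithmetic, by walking both strings once and emitting pieces in order. This is an alternative technique, not the method used above. The function name renderDiff is hypothetical and the class names add and del are assumptions chosen to fit the tag lengths of 25 and 26.

```javascript
// Hypothetical helper: render one line as HTML from the LCS index arrays.
// oldArr and newArr are ascending and pair up position by position.
function renderDiff(oldText, newText, oldArr, newArr) {
    const del = (ch) => `<span class="del">${ch}</span>`;
    const add = (ch) => `<span class="add">${ch}</span>`;
    let html = '';
    let i = 0; // cursor in oldText
    let j = 0; // cursor in newText
    for (let k = 0; k <= oldArr.length; k++) {
        // After the last common pair, run to the end of both strings
        const oldStop = k < oldArr.length ? oldArr[k] : oldText.length;
        const newStop = k < newArr.length ? newArr[k] : newText.length;
        while (i < oldStop) html += del(oldText[i++]); // deleted characters
        while (j < newStop) html += add(newText[j++]); // added characters
        if (k < oldArr.length) {
            html += newText[j]; // common character, emitted unchanged
            i++;
            j++;
        }
    }
    return html;
}

console.log(renderDiff('abc', 'axc', [0, 2], [0, 2]));
// a<span class="del">b</span><span class="add">x</span>c
```

Because the output is built from left to right, no inserted tag can shift a later position. The sketch does not escape HTML special characters in the text, which a real page would need.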
With that, our quick and dirty diff tool is complete.
Existing problems
You have probably noticed a problem with the implementation above: if a whole line is deleted or a whole new line is added, the old and new texts no longer have the same number of lines. Let's first patch the diff function:
diff () {
    this.oldTextArr = oldText.split(/\n+/g);
    this.newTextArr = newText.split(/\n+/g);
    // If the old and new texts have different numbers of lines, pad the shorter one with empty strings
    let oldTextArrLen = this.oldTextArr.length;
    let newTextArrLen = this.newTextArr.length;
    let diffRow = Math.abs(oldTextArrLen - newTextArrLen);
    if (diffRow > 0) {
        let fixArr = oldTextArrLen > newTextArrLen ? this.newTextArr : this.oldTextArr;
        for (let i = 0; i < diffRow; i++) {
            fixArr.push('');
        }
    }
    // ...
}
If the line was added or deleted at the end, this is fine.
However, if a line in the middle is added or deleted, the diff of every line after it becomes meaningless.
The reason is simple: removing a line shifts all the later pairwise comparisons out of step. What can we do? One idea is to detect that a line was deleted or added and then correct which lines are compared. Another is not to diff line by line at all, but to diff the whole text at once, so that adding or deleting a line no longer matters.
I could not work out how to do the first, so that leaves the second. We drop the line-splitting logic and diff the entire text directly:
diff () {
    this.oldTextArr = [oldText];// .split(/\n+/g);
    this.newTextArr = [newText];// .split(/\n+/g);
    // ...
}
It seems to work. Let's increase the amount of text.
Sure enough, it breaks. Our simple longest common subsequence algorithm clearly cannot handle a large amount of text: either the dp array takes up too much space or the recursion goes too deep, resulting in a memory overflow.
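A rough count shows how quickly the table grows. The helper below is hypothetical and the input sizes are illustrative, not measurements from the article.

```javascript
// Number of cells in the (m + 1) x (n + 1) dp table.
function dpCells(m, n) {
    return (m + 1) * (n + 1);
}

console.log(dpCells(100, 100));     // 10201
console.log(dpCells(20000, 20000)); // 400040001
```

Two texts of 20,000 characters each already need about 400 million cells. The recursive collect step adds its own cost, since each call lowers i or j by at least one and the call depth can therefore approach m + n.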
For someone as weak at algorithms as the author, this is hard to fix. What now? We can only turn to the power of open source. Ta-da, here it is: diff-match-patch.
The diff-match-patch library
diff-match-patch is a high-performance library for working with text. It is available in several programming languages. Besides computing the difference between two texts, it can also do fuzzy matching and patching, as the name suggests.
It is easy to use. First bring it in. To load it with import you need to modify the source file: by default the source attaches the class to the global environment, so we have to export the class manually. Then create an instance with new and call the diff method:
import diff_match_patch from './diff_match_patch_uncompressed';
const dmp = new diff_match_patch();

diffAll () {
    let diffList = dmp.diff_main(oldText, newText);
    console.log(diffList);
}
The result is an array in which each item represents one difference, as a pair of an operation and a piece of text: 0 means no change, 1 means added and -1 means deleted. We just need to loop over the array and splice the strings together, which is very simple:
diffAll () {
    let diffList = dmp.diff_main(oldText, newText);
    let htmlStr = '';
    diffList.forEach((item) => {
        switch (item[0]) {
            case 0: // unchanged
                htmlStr += item[1];
                break;
            case 1: // added
                htmlStr += `<span class="add">${item[1]}</span>`;
                break;
            case -1: // deleted
                htmlStr += `<span class="del">${item[1]}</span>`;
                break;
            default:
                break;
        }
    });
    this.showTextArr = htmlStr.split(/\n+/);
}
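The splicing step can be tried without the library by feeding it a hand-written list in the same shape. The sketch below also escapes HTML special characters, which is an addition of mine rather than something the article does; it may be worth having if the result is rendered as raw HTML. The function names and the class names add and del are assumptions for this illustration.

```javascript
// Escape characters that have a special meaning in HTML.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Turn a list of [operation, text] pairs into an HTML string.
function renderDiffList(diffList) {
    return diffList.map(([op, text]) => {
        const safe = escapeHtml(text);
        if (op === 1) return `<span class="add">${safe}</span>`;
        if (op === -1) return `<span class="del">${safe}</span>`;
        return safe;
    }).join('');
}

// Hand-written sample in the [operation, text] shape, not real library output
const sample = [[0, 'a < '], [-1, 'b'], [1, 'c']];
console.log(renderDiffList(sample));
// a &lt; <span class="del">b</span><span class="add">c</span>
```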
In a test, diffing 21432 characters took around 4 ms, which is very fast.
Now the editors can happily slack off~
Summary
This article worked through the algorithm problem of finding the longest common subsequence and looked at how it applies to text diffing. Our simple algorithm cannot support a real project, though, so if you have this kind of requirement, you can use the open source library introduced here.
Complete sample code: https://github.com/wanglin2/text_diff_demo