Spring Boot: generic object list comparison and deduplication

Time:2022-7-30

1. Foreword

When importing data in batches from Excel, you usually need to compare it with the data already in the database, and you also need to allow for duplicates within the imported data itself.

An imported row generally corresponds to an entity class (object). That entity class often has more fields in the database than the imported row does; the primary key ID, for example, generally does not appear in imported rows. Imported objects are then identified by some other unique key. When importing a company's user data, for instance, users are uniquely identified by user name or mobile phone number, and there is no user ID field (the system has yet to allocate an ID to a new user).

During a batch import, each object has to be compared with the existing data: new objects are inserted and existing objects are updated. Querying the database row by row to decide between insert and update is very inefficient and bloats the code. A statement such as MySQL's ON DUPLICATE KEY UPDATE is one solution, but when global IDs are used, the ID primary key and the other unique keys may conflict, causing the update to fail.

It is therefore safer to compare the set of imported objects with the set of existing objects in memory. The comparison yields four sets: new objects, objects whose attribute values are exactly the same, objects with the same key values but changed attributes, and the remaining existing objects that no imported object matched.
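The four result sets can be illustrated with a small sketch. This is not the utility class presented in section 2: it indexes the existing rows in a HashMap instead of using nested loops, and it works on a hypothetical Row class with a single key field (phone) and a single ordinary attribute (age). The names Row, CompareResult and FourWaySplit are assumptions made for this illustration, and both lists are assumed to contain no duplicate keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical imported row: "phone" is the unique key, "age" an ordinary attribute.
final class Row {
    final String phone;
    final int age;

    Row(String phone, int age) {
        this.phone = phone;
        this.age = age;
    }
}

// Holds the four result sets described in the text.
final class CompareResult {
    final List<Row> added = new ArrayList<>();
    final List<Row> same = new ArrayList<>();
    final List<Row> updated = new ArrayList<>();
    final List<Row> remainder = new ArrayList<>();
}

final class FourWaySplit {
    static CompareResult split(List<Row> imported, List<Row> existing) {
        // Index the existing rows by key so that each imported row needs one lookup.
        Map<String, Row> byKey = new HashMap<>();
        for (Row old : existing) {
            byKey.put(old.phone, old);
        }

        CompareResult result = new CompareResult();
        for (Row row : imported) {
            // remove() returns the matching existing row, or null if there is none.
            Row old = byKey.remove(row.phone);
            if (old == null) {
                result.added.add(row);      // key not in the database: insert
            } else if (old.age == row.age) {
                result.same.add(row);       // nothing changed: no action needed
            } else {
                result.updated.add(row);    // same key, changed attribute: update
            }
        }
        // Existing rows that no imported row matched.
        result.remainder.addAll(byKey.values());
        return result;
    }
}
```

Only the added and updated lists lead to database writes; the same list can be skipped, and what to do with the remainder depends on the business rules of the import.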

Batch imports are generally limited in scope. An import of user data, for example, is limited to a single company, so the data sets being compared will not be too large.

Object list comparison is therefore a basic building block of batch data import. In addition, before the import itself, duplicate objects within the imported data must be eliminated, since they are abnormal data.

2. Solutions

Object list comparison should work for any entity class, so that the code can be reused and repetitive, inefficient code written for specific entity classes is eliminated. This calls for generics and reflection.

In addition, comparing two objects means comparing the values of a set of attribute fields. These comprise the key fields, which identify the object, and the ordinary attribute fields.
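To make the reflection step concrete, here is a minimal sketch of how a field name from the comparison dictionary can be turned into a field value. The classes BaseEntity and Person and the helper FieldReader are hypothetical names used only for this illustration. The sketch also walks up the superclass chain, because Class.getDeclaredField only sees fields declared directly on the class it is called on; the utility class below does not do this, so it only works for fields declared on the entity class itself.

```java
import java.lang.reflect.Field;

// Hypothetical entity classes used only for this illustration.
class BaseEntity {
    private Long id = 7L;
}

class Person extends BaseEntity {
    private String userName = "alice";
}

final class FieldReader {
    // Reads a field value by name; the field may be private and may be inherited.
    static Object read(Object target, String fieldName)
            throws NoSuchFieldException, IllegalAccessException {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field field = c.getDeclaredField(fieldName);
                field.setAccessible(true);   // allow access to private fields
                return field.get(target);
            } catch (NoSuchFieldException ignored) {
                // Not declared on this class; continue with the superclass.
            }
        }
        throw new NoSuchFieldException(fieldName);
    }
}
```

With these classes, FieldReader.read(new Person(), "userName") reads the private field declared on Person, and FieldReader.read(new Person(), "id") reads the field inherited from BaseEntity.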

The code is as follows:

package com.abc.example.common.utils;

import java.lang.reflect.Field;
import java.util.List;
import java.util.Map;

import lombok.extern.slf4j.Slf4j;

/**
 * @className	        : ObjectListUtil
 * @description	: object list utility class
 * @summary		:
 * @history		:
 * ------------------------------------------------------------------------------
 * date		version		modifier		remarks                   
 * ------------------------------------------------------------------------------
 *2021/08/15 1.0.0 sheng.zheng First Edition
 *
 */
@Slf4j
public class ObjectListUtil {
    /**
     * 
     * @methodName	: compareTwoList
     * @description	: compares the new list with the old list, field by field according to the
     *                comparison field dictionary, and sorts the objects into four lists:
     *                1. new objects; 2. objects with the same key values but different attributes,
     *                which need to be updated; 3. objects whose compared fields are exactly the same;
     *                4. objects left over in the old list
     * @param <T>           : generic type
     * @param fieldMap      : comparison field dictionary; key is the field name, value is the key flag (1 = key field, 0 = ordinary field)
     * @param newList       : new list; must not contain duplicate key values
     * @param oldList       : old list; must not contain duplicate key values; it is modified by this method
     * @param addList       : receives the new objects
     * @param sameList      : receives the unchanged objects
     * @param updateList    : receives the objects to be updated
     * @param remainderList : receives the objects remaining in the old list
     * @history		:
     * ------------------------------------------------------------------------------
     * date		version		modifier		remarks                   
     * ------------------------------------------------------------------------------
     *2021/08/15 1.0.0 sheng.zheng First Edition
     *
     */
    public static <T> void compareTwoList(Map<String,Integer> fieldMap,
    	List<T> newList,List<T> oldList,
    	List<T> addList, List<T> sameList, 
    	List<T> updateList, List<T> remainderList) {
    	
    	//Traverse new list
    	for(int i = 0; i < newList.size(); i++) {
    		T newItem = newList.get(i);
    		//Mark whether the objects match
    		boolean found = false;
        	//Traverse the old list, in reverse order
    		for (int j = oldList.size() - 1; j >= 0; j--) {
    		    T oldItem = oldList.get(j);
    		    //Compare two objects
    		    int compare = compareTwoItem(fieldMap,newItem,oldItem);
    		    if (compare == 1) {
    			//If two objects are the same, add them to the same list
    			sameList.add(newItem);
    			//Remove from list
    			oldList.remove(j);
    			found = true;
    			//End this round of traversal
    			break;
    		    }else if(compare == 2) {
    			// pass
    		    }else if(compare == 3) {
    			//Matching objects with different attributes are added to the modification table
    			updateList.add(newItem);
    			//Remove from list
    			oldList.remove(j);
    			found = true;
    			//End this round of traversal
    			break;
    		    }else {
    			//Exception occurred
    			return;
    		    }
    		}
    		if (found == false) {
    		    //If no match is found in this round of traversal, add it to the new list
    		    addList.add(newItem);
    		}
    	}
    	
    	//Add the items left in oldList to the remainder list
    	remainderList.addAll(oldList);
    	
    }
    
    /**
     * 
     * @methodName	: compareTwoItem
     *@description: compare two objects of the same type
     *@param: generic type T
     *@param fieldMap: compare the field dictionary. Key is the field name, value is the key value, and 1 indicates the key value field.
     *@param newitem: new object
     *@param olditem: old object
     *@return: the return value is defined as follows:
     *0: data processing exception
     *1: the objects are exactly the same (the field values of the comparison fields are the same)
     *2: different objects (field values of key value fields are different)
     *3: the object is the same, but the attribute is different (the field value of the key value field is the same, but the attribute value is different)
     * @history		:
     * ------------------------------------------------------------------------------
     * date		version		modifier		remarks                   
     * ------------------------------------------------------------------------------
     *2021/08/15 1.0.0 sheng.zheng First Edition
     *
     */
    public static <T> int compareTwoItem(Map<String,Integer> fieldMap,T newItem, T oldItem) {
    	int retCode = 1;
    	try {
            for (Map.Entry<String,Integer> entry : fieldMap.entrySet()) {
        	//Get the field name and its key flag
        	String fieldName = entry.getKey();
        	Integer keyFlag = entry.getValue();
        		
        	//Get the current field value of the new object
    		Class<?> newClazz = newItem.getClass();
        	Field newField = newClazz.getDeclaredField(fieldName);
        	newField.setAccessible(true);
        	Object newValue = newField.get(newItem);
        		
        	//Get the current field value of the old object
    		Class<?> oldClazz = oldItem.getClass();
        	Field oldField = oldClazz.getDeclaredField(fieldName);
        	oldField.setAccessible(true);
        	Object oldValue = oldField.get(oldItem);
        		
        	//Compare the two field values; Objects.equals also handles null values
        	if (!java.util.Objects.equals(newValue, oldValue)) {
        	    //If the field values are not equal
        	    if (keyFlag == 1) {
        		//If it is a key value field, it indicates two objects
        		return 2;
        	    }else {
        		//Non key value field, indicating that the attribute has changed
        		retCode = 3;
        	    }
        	}        		
            }
    	}catch (NoSuchFieldException | IllegalAccessException e) {
    		//Log the message together with the stack trace
    		log.error(e.getMessage(), e);
    		return 0;
    	}
    	return retCode;
    }
    
    /**
     * 
     * @methodName	: removeDuplicate
     * @description	: removes duplicate objects from an object list
     * @param <T>       : generic type
     * @param fieldMap  : comparison field dictionary; key is the field name, value is the key flag (1 = key field)
     * @param inputList : object list; duplicates are removed from it in place
     * @param dupList   : receives the duplicate objects removed from inputList
     * @history		:
     * ------------------------------------------------------------------------------
     * date		version		modifier		remarks                   
     * ------------------------------------------------------------------------------
     *2021/08/15 1.0.0 sheng.zheng First Edition
     *
     */
    public static <T> void removeDuplicate(Map<String,Integer> fieldMap,List<T> inputList,List<T> dupList){
    	//Index of the object currently being checked
    	int pos = 0;
    	while (pos < inputList.size()) {
    	    //Marks whether the current object has a duplicate later in the list
    	    boolean found = false;    		
    		
    	    //Compared object
    	    T compItem = inputList.get(pos);
    		
    	    //Traversal list
    	    for (int i = pos + 1; i < inputList.size(); i++) {
    		//Comparison object
    		T newItem = inputList.get(i);
    		int compare = compareTwoItem(fieldMap,newItem,compItem);
    		if (compare == 1 || compare == 3) {
    			//The key value is the same, which is a duplicate object
    			found = true;
    			//End this comparison
    			break;
    		}
    	    }
    	    if (found == true) {
    		//Duplicate object
    		dupList.add(compItem);
    		//Remove duplicates
    		inputList.remove(pos);
    		//Note that after removal, the current position remains unchanged
    	    }else {
    		//Do not repeat, process next
    		pos ++;    			
    	    }    		
    	}
    }
}
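The removeDuplicate method above compares every pair of objects, so its cost grows quadratically with the list size. For larger imports, a hash-based variant is one possible alternative. The sketch below is not the author's method: the class name KeyDedup and the keyOf parameter are hypothetical, and the caller supplies the key as a function instead of a reflective field dictionary. It also differs in behaviour: it keeps the first occurrence of each key and moves later occurrences to dupList, whereas the method above keeps the last occurrence.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class KeyDedup {
    // Removes from inputList every element whose key was already seen
    // and appends the removed elements to dupList.
    static <T> void removeDuplicate(List<T> inputList, List<T> dupList,
                                    Function<? super T, ?> keyOf) {
        Set<Object> seen = new HashSet<>();
        List<T> unique = new ArrayList<>();
        for (T item : inputList) {
            // Set.add returns false when an equal key is already present.
            if (seen.add(keyOf.apply(item))) {
                unique.add(item);
            } else {
                dupList.add(item);
            }
        }
        // Replace the list contents in place, as the original method does.
        inputList.clear();
        inputList.addAll(unique);
    }
}
```

A composite key such as phone number plus user name can be expressed as a list, for example item -> Arrays.asList(phone, name), because lists compare element by element in equals and hashCode.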

3. Call method

The calling method is as follows:

    	//Deduplication
    	//Field dictionary
    	Map<String,Integer> fieldMap = new HashMap<String,Integer>();
    	//Mobile number and user name are the identifying (key) attributes
    	fieldMap.put("phoneNumber", 1);
    	fieldMap.put("userName", 1);
    	List dupList = new ArrayList();
    	ObjectListUtil.removeDuplicate(fieldMap,importUserList,dupList);
    	//dupList can be reported as the erroneous rows of the import

    	//Query the existing data in the database. orgId is an additional import parameter that gives the company ID
    	List dbUserList = userDao.selectItemsByOrgId(orgId);

    	//Compare old and new data list
    	List addList = new ArrayList();
    	List updateList = new ArrayList();
    	List sameList = new ArrayList();
    	List remainderList = new ArrayList();
    	
    	//Other attribute fields of the object (non-key fields)
    	fieldMap.put("age", 0);
    	fieldMap.put("address", 0);    	    	
    	fieldMap.put("height", 0);
    	ObjectListUtil.compareTwoList(fieldMap, importUserList, dbUserList, 
    		addList, sameList, updateList, remainderList);

    	//Insert the objects in addList in batches

    	//Update the objects in updateList one by one
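For the batch insert step, the list usually has to be cut into chunks of a bounded size before it is sent to the database. The sketch below shows one way to do that with plain Java; the class name BatchSplitter and the batch size are assumptions, and the actual insert call depends on the DAO in use, so it is not shown.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchSplitter {
    // Splits source into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < source.size(); from += batchSize) {
            int to = Math.min(from + batchSize, source.size());
            // Copy the view so that each batch is independent of the source list.
            batches.add(new ArrayList<>(source.subList(from, to)));
        }
        return batches;
    }
}
```

Each batch returned by partition(addList, batchSize) can then be handed to whatever batch insert method the DAO provides.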
