Analyzing the problems with batch insert and batch update in Spring Data JPA

Time: 2021-12-31

Recently I set out to write a personal project on the full Spring stack. I had already learned most of what I needed: I started with JdbcTemplate and later used MyBatis, but I had never touched the JPA (Hibernate) family. I used to think Hibernate was too heavyweight, but after Spring Boot and Spring Data JPA came out it started to look quite good. On top of that, Google Trends shows something interesting…

Only China, Japan and South Korea use MyBatis on a large scale (I strongly suspect this comes from China's outsourcing industry), which is rather odd. Although China's IT industry is rising steadily, the US and Europe still seem to lead software development, and there JPA and Hibernate are the absolute mainstream. So I decided that learning and building on JPA would be the choice for my next personal project.

After a few days of simple exploration, I found JPA genuinely pleasant to use. In particular, its fit with DDD matches the current design philosophy of Spring Boot and microservices very well (just my personal opinion).

However, one problem came up along the way: the crawler I had written earlier inserts data very slowly. On the one hand, the MySQL instance on my server performs poorly; for details see:

[Benchmark] A simple comparison of PostgreSQL and MySQL read/write performance in a low-spec environment (MySQL optimization suggestions are welcome)

On the other hand, there is JPA's batch insert source code:
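
The snippet itself is not reproduced above; for reference, SimpleJpaRepository.saveAll in the Spring Data JPA 2.x sources looks roughly like the following (abridged, so treat it as a sketch rather than the exact code):

// org.springframework.data.jpa.repository.support.SimpleJpaRepository (abridged)
@Transactional
@Override
public <S extends T> List<S> saveAll(Iterable<S> entities) {
    Assert.notNull(entities, "Entities must not be null!");
    List<S> result = new ArrayList<>();
    for (S entity : entities) {
        result.add(save(entity)); // every entity goes through save() one at a time
    }
    return result;
}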

This saveAll is clearly just a loop calling the save method. Let's write a simple test method that inserts some data and try it:
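
The original test isn't shown here; a minimal sketch could look like the following, where User, its constructor, and UserRepository are hypothetical names of mine, not from the original post:

import java.util.ArrayList;
import java.util.List;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class BatchInsertTest {

    @Autowired
    private UserRepository userRepository; // hypothetical repository

    @Test
    void testSaveAll() {
        // Build a handful of entities, then time the stock saveAll call.
        List<User> users = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            users.add(new User("user" + i)); // hypothetical entity
        }
        long start = System.currentTimeMillis();
        userRepository.saveAll(users);
        System.out.println("saveAll took " + (System.currentTimeMillis() - start) + " ms");
    }
}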

We enabled Hibernate's SQL logging. Take a look at the output:
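
For anyone following along, the standard Spring Boot properties to turn this logging on are:

spring.jpa.show-sql=true
spring.jpa.properties.hibernate.format_sql=true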

Look at the log: for every entity, JPA first issues a SELECT to check whether the row already exists in the database; if it does, the row is updated, and if not, it is inserted (the abridged save() source after the next link shows why). Let's also take a look at Alibaba's Druid monitoring; if it isn't configured yet, it can be set up following this article:

Spring Boot 2.0: configuring a connection pool (Hikari, Druid)
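
As mentioned above, the SELECT-before-write behavior comes from SimpleJpaRepository.save, which looks roughly like this (again abridged, a sketch rather than the exact code):

// org.springframework.data.jpa.repository.support.SimpleJpaRepository (abridged)
@Transactional
@Override
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity); // no ID yet: plain INSERT
        return entity;
    } else {
        // ID already assigned: merge() loads the current state with a
        // SELECT, then issues an UPDATE or INSERT as appropriate.
        return em.merge(entity);
    }
}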

I inserted 5 rows and 10 SQL statements were executed (a SELECT plus an INSERT per row). I don't quite get it… maybe I'm just too much of a rookie? The crawler script I ran inserts at least thousands of rows at a time, and all of it was being written to my poor, underpowered database this way.

And it's terribly inefficient. Couldn't the existence check use an IN query? Couldn't the inserts be combined into a single SQL statement? Besides, in many scenarios I already know whether I'm inserting or updating; I don't need JPA to verify it for me at all. I just want to insert my data quietly. Is there a way? Yes, there is.

Add the following to the configuration file:


spring.jpa.properties.hibernate.jdbc.batch_size=500
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
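
One caveat from my own experience, not part of the original configuration: with MySQL, the Connector/J driver only rewrites a JDBC batch into a single multi-row INSERT when rewriteBatchedStatements=true is set on the connection URL (host and database name below are placeholders):

spring.datasource.url=jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true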

I'd suggest setting this batch size to roughly half the maximum number of writes per second your database can handle. There's no hard reasoning behind it; it just feels right to me…

Define two methods, batchSave and batchUpdate, in a custom repository interface and implement them:
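
The original only shows the implementation; a minimal sketch of the interface declaration could look like this (the name BatchRepository, the BATCH_SIZE constant, and the injected EntityManager em used below are my assumptions):

import java.io.Serializable;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.repository.NoRepositoryBean;

// Custom base repository exposing the two batch methods.
@NoRepositoryBean
public interface BatchRepository<T, ID extends Serializable> extends JpaRepository<T, ID> {
    <S extends T> Iterable<S> batchSave(Iterable<S> entities);
    <S extends T> Iterable<S> batchUpdate(Iterable<S> entities);
}

One common way to wire this up is a base class extending SimpleJpaRepository (registered via @EnableJpaRepositories(repositoryBaseClass = …)), with the EntityManager em injected through its constructor and BATCH_SIZE set to the same value as hibernate.jdbc.batch_size. The implementation: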


@Override
@Transactional
public <S extends T> Iterable<S> batchSave(Iterable<S> entities) {
    Iterator<S> iterator = entities.iterator();
    int index = 0;
    while (iterator.hasNext()) {
        em.persist(iterator.next());
        index++;
        // Flush and clear every BATCH_SIZE entities so Hibernate sends the
        // accumulated statements as one JDBC batch and the persistence
        // context (first-level cache) does not grow without bound.
        if (index % BATCH_SIZE == 0) {
            em.flush();
            em.clear();
        }
    }
    // Flush whatever remains from the last partial batch.
    if (index % BATCH_SIZE != 0) {
        em.flush();
        em.clear();
    }
    return entities;
}

@Override
@Transactional // flush() requires an active transaction here as well
public <S extends T> Iterable<S> batchUpdate(Iterable<S> entities) {
    Iterator<S> iterator = entities.iterator();
    int index = 0;
    while (iterator.hasNext()) {
        em.merge(iterator.next());
        index++;
        if (index % BATCH_SIZE == 0) {
            em.flush();
            em.clear();
        }
    }
    if (index % BATCH_SIZE != 0) {
        em.flush();
        em.clear();
    }
    return entities;
}

Then make a small change to the batch-insert entry point so that it calls the batchSave method we just implemented.
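
With the hypothetical test from earlier, the change is a one-liner:

// before: userRepository.saveAll(users);
userRepository.batchSave(users); // repository now extends the custom BatchRepository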

Let’s run it and see the results:

This time 500 inserts took 778 milliseconds, not far from the 712 milliseconds the earlier 5 inserts took. But the console still printed 500 individual INSERT statements… which for a while made me think the batching had failed. Judging by the timing it must have succeeded, so I then set up Alibaba's Druid monitoring:

No need to worry: the issue is with Hibernate's own log output, which prints each statement as it is prepared rather than the actual JDBC batches sent to the database. So I later turned off Hibernate's logging and relied on Alibaba's Druid instead.

This is the end of this article on enabling batch insert and batch update with Spring Data JPA.