Ten common errors of Java developers when writing SQL statements

Time:2021-2-2

The coordination degree between object-oriented programming thinking and command-line programming thinking of Java developers depends on their abilities as follows:

Tips (anyone can write command line code)
Dogma (some people use the “pattern pattern” approach, that is, patterns are ubiquitous and identified by name)
Emotional state (in the early days, real object-oriented code is more difficult to understand than imperative code. )

However, when Java developers write SQL statements, everything becomes different. SQL is a kind of descriptive language, which has nothing to do with object-oriented thinking and imperative thinking. In SQL language, query is very easy to express. But it’s not so easy to write in the best or most correct way. Developers not only need to rethink their own programming mode, but also need to think deeply from the perspective of set theory.

Here’s how Java developers use JDBC or jooq to write SQL statements,Several common mistakes

1. Forget null

Misunderstanding the meaning of null is probably the most common mistake Java developers make when writing SQL. This may be because null is also known as unknown, but there are other reasons. Of course, it would be easier to understand if it was only called unknown. Another reason is that when JDBC gets data or binds variables, null in SQL is mapped to null in Java. This may lead people to think that similar to the case of null = = null in Java, null = null also exists in SQL.

A more bizarre example of misunderstanding null is when null predicates are used in row valued expressions.

Another subtle problem is the misunderstanding of the meaning of null in the anti join of notin.

terms of settlement

Keep training yourself. Always be clear about the meaning of null. Every time you write SQL, you should consider:

Is the predicate correct for null?
Does null affect the result of this function?
2. Processing data in JAVA memory

Some Java developers are well aware of SQL features. Occasionally join, scattered Union, no problem. But what about windows, result set grouping and so on? Many Java developers will load SQL data into memory, convert the data into some suitable collection types, and perform annoying mathematical operations on the collection with a very lengthy loop structure (at least before Java 8 improved the container).

However, some SQL databases not only support SQL standard, but also support advanced OLAP features, so they are more efficient and easier to write. A nonstandard example is the oracle model clause. Just let the database process the data and load the final result into JAVA memory. Because some very smart people have optimized these expensive products. So, in fact, by migrating to an OLAP database, you get two benefits:

Simplicity. It may make it relatively easier to write the right code in SQL than in Java
Performance. The database will probably be faster than your algorithm. More importantly, you don’t have to transfer millions of records over the network.
terms of settlement

Every time you implement a data centric algorithm in Java, try to ask yourself: is there a way for the database to perform these tasks and only deliver the results to me?

3. Try to use union instead of union all

Compared with union, union all needs additional keywords. If the following support has been defined in the SQL standard, it will be much better:

Union (repeat allowed)
Union distinct
In general, it is rarely necessary to remove duplication (sometimes it is even wrong), and for large result sets with many columns, it is often very slow, because the two subqueries need to be sorted, and each tuple needs to be compared with the subsequent tuples.

It should be noted that even though the SQL standard specifies intersectall and exceltall, almost no database implements these useless operations.

terms of settlement

Every time you write Union, consider whether you actually want to write union all.

4. Use JDBC paging function to page a large number of results

Most databases support pagination of results in some way through limit.. offset, top.. start at, offset.. fetch and other clauses. There is no support for these clauses, but there is still rownum (Oracle) or row_ Number () over () (DB2, SQL Server 2008 and earlier), which is much faster than paging in memory. And this is even more obvious for large data sets.

terms of settlement

As long as you use those clauses or tools (such as jooq), you can simulate the above paging clauses for you.

5. Connect Java in memory

From the early development of SQL, some developers still feel uneasy when facing SQL connection. There has always been an inherent fear that join is slow. If the cost based optimizer chooses to execute a nested loop and load the complete table into the database memory before creating a connection table source, it will be very slow indeed. But it rarely happens. With proper predicates, constraints, and indexes, mergejoin and hashjoin operations are very fast. It’s about the right metadata (I don’t have to cite Tom Kyte anymore). However, it is still possible that many Java developers will load two tables from a single query into the map container, and connect them in some way in JAVA memory.

terms of settlement

If you select from multiple tables in multiple steps, consider carefully whether you can express the query function you need in a single statement.
6. Use distinct or union to remove duplicates from a Cartesian product

The existence of long connection will lead to the loose relationship in SQL statement. Specifically, if multi column foreign key relationships are involved, it is likely to forget to add predicates to the join clause. This may lead to duplicate records, but perhaps only in exceptional cases. Then some developers may choose to use distinct to delete these duplicate records again. This kind of mistake has three kinds of harms

There may be a temporary cure but not a permanent cure. Even in some marginal cases, it can’t be cured
This can be very slow on large result sets with many columns. Distinct performs an order by operation to remove the duplicate.
This is also very slow in large Cartesian products, because doing so will still cause a lot of data to be loaded in memory.
terms of settlement

As a rule of thumb, when you get unwanted duplicate results, you should first check your join predicate. Because there may be an imperceptible Cartesian product somewhere.

7. Do not use merge statement

Strictly speaking, this is not a real mistake. It may be just a lack of cognition or fear of the powerful merge sentence. Some databases include other forms of update statements, such as the duplicate key update clause of MySQL. But merge is really powerful. The most important thing is that in the database, it extends the SQL standard to a large extent, such as SQL server.

terms of settlement

If you link insert and update or select… For update to achieve updating, you should think more about it. Regardless of the risk associated with running conditions, you may be able to achieve this with a simple merge statement.

8. Aggregate function is used instead of form function

Before the introduction of window function, using group by clause and projection aggregation function is the only way to summarize data. This is very effective in most cases. If the aggregated data needs to be supplemented by conventional data, the grouped query can be placed in the join sub query.

However, SQL: 2003 defines the window function, and many mainstream database manufacturers have implemented the window function one after another. The window function can aggregate the ungrouped data in the result set. In fact, the function of each window supports its own independent partition by clause, which is a very useful tool for report application.

Using the window function will:

Results in more readable SQL (reduces the presence of non dedicated group by clauses in subqueries)
To improve performance, as an RDBMS, it is likely to be easier to optimize its window function.
terms of settlement

When you write a group by clause in a subquery, think about whether it can be done with a window function.

9. Use memory sort method to sort indirectly

Multiple types of expressions are supported in the SQL order by clause, including case statements, which are very useful for indirect sorting. You should always be able to sort data in JAVA memory because you think:

SQL sorting is too slow
SQL sorting can’t do this
terms of settlement

If you sort any SQL data in memory, please think about whether you can migrate the sort to the database. This is the same reason for migrating paging to the database.

Insert a large number of records one by one

JDBC includes batch processing, and you should use it. In the face of thousands of records, do not create a new Preparedstatement for each record to insert. If you want to insert all records into the same table, use a single SQL statement and multiple sets of bound values to create a batch insert statement. Depending on your database and database configuration, you may need to commit after a certain number of inserted records, in order to keep the undo log not too large.

terms of settlement

Always bulk insert large datasets.

Java developers write SQL statements common 10 kinds of errors, you have a general understanding, I hope in the process of writing must pay special attention to!