Personal opinion – use and efficiency of Oracle’s exists and in in practical application – record of problems encountered and ideas

Time:2021-3-20

In my work, the number of in parameters in a SQL query will reach 110000, so I want to improve the running efficiency, so I wrote another kind of exists writing method. The SQL execution result surprised me very much.

On the parameter limitation of Oracle for in

The number of Oracle 9i cannot exceed 256, and the number of Oracle 10g cannot exceed 1000Oracle 11g has removed this restriction

I’m usingOracle 11g

Execute SQL

SQL A :

In writing

select DISTINCT

    a.aay002 as CODENAME,

    a.aay003 as CODEVALUE

    from A a left join P p on a.aay002 = p.aay002

    where 1=1

    and a.bya343 is not null  and a.bya343 <> 0

    and a.aae100 = ‘1’

    and a.aay103 = ‘2AA’

    and a.aab038 in (

        select unit_code from 

          (select code UNIT_CODE, name COMENAME from B 

           where 1=1

           and unit_level in (‘2′,’3′,’4′,’5’) 

           and is_enabled = ‘1’

           start with id= ‘00000000’ connect by prior   id = parent_id 

           )

    )

 

order by a.aay002

Query results: 212 data, time consuming: 2.773 seconds

SQL B :

 

EXISTSHow to write it

 

select DISTINCT

    a.aay002 as CODENAME,

    a.aay003 as CODEVALUE

    from A a left join P p on a.aay002 = p.aay002

    where 

    exists 

    (select 1 from B 

    where code = a.aab038 

    start with id = ‘00000000’ 

    connect by prior id = parent_id)

    and a.bya343 is not null  and a.bya343 <> 0

    and a.aae100 = ‘1’

    and a.aay103 = ‘2AA’

order by a.aay002;

 

Data volume of table a: 9040

Table B: 111839 items

Data volume of table P: 50

Query result: 212 data, time consuming: 137.548 seconds

Personal summary: in SQL query after Oracle 11g, the efficiency of in is much faster than exists

But the information on the Internet is:

give an example:

select *  from A where id in(select id from B);

select *  from A where exists (select 1 from B where A.id = B.id);

For example, if table a has 10000 records and table B has 1000000 records, then exists() will execute 10000 times to determine whether the ID in table a is equal to that in table B.
For example, if there are 10000 records in table a and 100000000 records in table B, then exists() is executed 10000 times, because it only executes a.length times. It can be seen that the more data in table B, the more suitable exists() is.
Another example: if table a has 10000 records and table B has 100 records, then exists() is still executed 10000 times, which is better than in() traversing 10000 * 100 times, because in() is traversing and comparing in memory, while exists() needs to query the database. We all know that querying the database consumes higher performance, and the memory comparison is fast

Conclusion: exists () is suitable for the case that the data of table B is larger than that of table a

When the data in table a is as large as that in table B, the efficiency of in and exists is almost the same. You can choose any one to use.

It contradicts the conclusion on the Internet. At present, it is suspected that the reason may be related to the table structure and index. If there is any new progress, it will continue to update. At present, in is still highly efficient