A ten-thousand-word article detailing ShardingSphere's support for XA distributed transactions

Time:2021-12-2

Apache ShardingSphere is an ecosystem of open source distributed database middleware solutions. It consists of JDBC, Proxy, and Sidecar (planned), which are independent of each other but can be deployed and used together. All of them provide standardized data sharding, distributed transaction, and database governance functions, and can be applied to scenarios such as homogeneous Java applications, heterogeneous languages, and cloud native environments.

ShardingSphere became a top-level project of the Apache Software Foundation on April 16, 2020.

CAP theorem for distributed systems

Consistency
  • Consistency means that all nodes see the same data at the same time. That is, once an update succeeds and the result is returned to the client, the data on all nodes is fully consistent at that moment, with no intermediate state.
  • If users always see consistent data, this is called strong consistency. If an intermediate state is allowed and the data becomes consistent after a period of time, this is called eventual consistency. If some data is allowed to remain inconsistent, this is called weak consistency.
Availability
  • Availability means that the services provided by the system must always be available, and every user request must return a result within a limited time. "Limited time" means that the system must return the processing result for a request within a specified time; if it exceeds that time, the system is considered unavailable.
  • Returning a result is another important indicator of availability. The system must return a normal response after processing the user's request, whether that response reports success or failure.
Partition tolerance
  • When a distributed system encounters a network partition failure, it must still be able to provide services that satisfy consistency and availability, unless the entire network fails.

X/Open DTP model and XA specification

X/Open, now The Open Group, is an independent organization mainly responsible for formulating industry technical standards. Official website: http://www.opengroup.org/. The organization is backed by major well-known companies and vendors, which not only follow the standards it defines but also take part in formulating them. The following figure shows the current main members of The Open Group (screenshot of the official website):

[Figure: main members of The Open Group, screenshot from the official website]

DTP Model

  • Application program (AP): defines transaction boundaries (that is, where a transaction starts and ends) and operates on resources within those boundaries.
  • Resource manager (RM, also commonly called a transaction participant): a database, file system, or similar component that provides access to resources.
  • Transaction manager (TM, also commonly called the transaction coordinator): allocates unique transaction identifiers, monitors the progress of transactions, and commits or rolls back transactions.
XA specification

The specification defines many interfaces. We only cover the most important ones here.

  • xa_start: called on the RM side to start an XA transaction; it takes the XID as a parameter.
  • xa_end: disassociates the current thread from the transaction branch; it is paired with xa_start.
  • xa_prepare: asks the RM whether the transaction is ready to commit.
  • xa_commit: notifies the RM to commit the transaction branch.
  • xa_rollback: notifies the RM to roll back the transaction branch.
XA two-phase commit
  • Phase 1: the TM asks each RM to prepare to commit its transaction branch. If an RM determines that its work can be committed, it persists that work and gives the TM a positive reply; in any other case it gives the TM a negative reply. After sending a negative reply and rolling back its work, the RM can discard the transaction branch information.
  • Phase 2: the TM decides whether to commit or roll back the transaction based on the prepare results from phase 1. If all RMs prepared successfully, the TM tells all RMs to commit; if any RM failed to prepare, the TM tells all RMs to roll back their transaction branches.
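
The two phases can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: Participant, RecordingParticipant, and TwoPhaseCommitSketch are hypothetical names invented for this example, not XA or Atomikos types, and the coordinator does no logging, timeouts, or recovery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an RM; this is not a real XA interface. */
interface Participant {
    boolean prepare();   // phase 1: true is a positive reply, false a negative one
    void commit();       // phase 2 when every participant replied positively
    void rollback();     // phase 2 when at least one participant replied negatively
}

/** Records what the coordinator asked it to do, so the flow can be inspected. */
class RecordingParticipant implements Participant {
    private final boolean vote;
    final List<String> calls = new ArrayList<>();

    RecordingParticipant(boolean vote) {
        this.vote = vote;
    }

    @Override public boolean prepare() { calls.add("prepare"); return vote; }
    @Override public void commit() { calls.add("commit"); }
    @Override public void rollback() { calls.add("rollback"); }
}

class TwoPhaseCommitSketch {
    /** Plays the TM role; returns true if the global transaction committed. */
    static boolean run(List<? extends Participant> participants) {
        // Phase 1: collect votes, stopping at the first negative reply.
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allPrepared = false;
                break;
            }
        }
        // Phase 2: one decision for everybody. Simplification: rollback is sent to
        // every participant, including the one that already replied negatively.
        for (Participant p : participants) {
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }

    public static void main(String[] args) {
        RecordingParticipant db0 = new RecordingParticipant(true);
        RecordingParticipant db1 = new RecordingParticipant(false);
        System.out.println("committed = " + run(Arrays.asList(db0, db1)));
        System.out.println("db0 calls = " + db0.calls);
        System.out.println("db1 calls = " + db1.calls);
    }
}
```

A production TM also has to persist its decision before phase 2 so that it can finish the transaction after a crash, which is what the transaction log discussed later in this article is for.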

MySQL support for XA protocol

MySQL has supported XA distributed transactions since version 5.0.3, and only the InnoDB storage engine supports XA transactions.
In the DTP model, MySQL plays the role of a resource manager (RM).

SQL syntax for MySQL XA transactions
XA START xid                // start an XA transaction; xid is a unique value that identifies the transaction branch
XA END xid                  // end the XA transaction
XA PREPARE xid              // prepare to commit
XA COMMIT xid [ONE PHASE]   // commit the transaction; if only one RM participates, two-phase commit can be optimized to one-phase commit
XA ROLLBACK xid             // roll back
XA RECOVER [CONVERT XID]    // list all XA transactions in the PREPARED state
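
To make the statement order concrete, here is a small Java sketch that builds the statements for one branch. MySqlXaStatements is a hypothetical helper written for this article, it only assembles strings and does not talk to a database, and it assumes the gtrid and bqual values contain no quote characters.

```java
import java.util.ArrayList;
import java.util.List;

class MySqlXaStatements {
    /** Returns the XA statements for one branch, in the order they must be issued. */
    static List<String> forBranch(String gtrid, String bqual, boolean onePhase) {
        // In MySQL an xid is written as gtrid [, bqual [, formatID]].
        String xid = "'" + gtrid + "','" + bqual + "'";
        List<String> sql = new ArrayList<>();
        sql.add("XA START " + xid);
        // The application's own INSERT/UPDATE/DELETE statements run between START and END.
        sql.add("XA END " + xid);
        if (onePhase) {
            // Single-RM optimization: skip PREPARE and commit directly.
            sql.add("XA COMMIT " + xid + " ONE PHASE");
        } else {
            sql.add("XA PREPARE " + xid);
            sql.add("XA COMMIT " + xid);
        }
        return sql;
    }

    public static void main(String[] args) {
        forBranch("order-tx", "db0", false).forEach(System.out::println);
    }
}
```

If PREPARE fails on any branch, the TM issues XA ROLLBACK with the same xid instead of XA COMMIT.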
MySQL XID details

MySQL uses XID as the identifier of a transaction branch. It is described in C language as follows:

/*
 * Transaction branch identification: XID and NULLXID:
 */
#define XIDDATASIZE 128  /* size in bytes */
#define MAXGTRIDSIZE 64  /* maximum size in bytes of gtrid */
#define MAXBQUALSIZE 64  /* maximum size in bytes of bqual */
struct xid_t {
    long formatID;     /* format identifier */
    long gtrid_length; /* value 1-64 */
    long bqual_length; /* value 1-64 */
    char data[XIDDATASIZE];
};
/*
 * A value of -1 in formatID means that the XID is null.
 */
typedef struct xid_t XID;
/*
 * Declarations of routines by which RMs call TMs:
 */
extern int ax_reg(int, XID *, long);
extern int ax_unreg(int, long);
  • gtrid: global transaction identifier, at most 64 bytes.
  • bqual: branch qualifier, at most 64 bytes.
  • formatID: records the format of gtrid and bqual, similar in purpose to the flags field in memcached.
  • data: the XID content, which is the concatenation of gtrid and bqual.
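
On the Java side the same structure appears as the javax.transaction.xa.Xid interface, which ships with the JDK. The sketch below shows one way to implement it with the 64-byte limits enforced. SimpleXid and its data() helper are hypothetical names for this example and are not classes from ShardingSphere or Atomikos.

```java
import java.nio.charset.StandardCharsets;
import javax.transaction.xa.Xid;

/** Minimal Xid sketch: a format id plus gtrid and bqual byte arrays. */
final class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] gtrid;
    private final byte[] bqual;

    SimpleXid(int formatId, String gtrid, String bqual) {
        this.formatId = formatId;
        this.gtrid = gtrid.getBytes(StandardCharsets.UTF_8);
        this.bqual = bqual.getBytes(StandardCharsets.UTF_8);
        // Xid.MAXGTRIDSIZE and Xid.MAXBQUALSIZE are both 64, matching the C header.
        if (this.gtrid.length > Xid.MAXGTRIDSIZE || this.bqual.length > Xid.MAXBQUALSIZE) {
            throw new IllegalArgumentException("gtrid and bqual are limited to 64 bytes each");
        }
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return bqual.clone(); }

    /** Mirrors the data field of the C struct: gtrid bytes followed by bqual bytes. */
    byte[] data() {
        byte[] data = new byte[gtrid.length + bqual.length];
        System.arraycopy(gtrid, 0, data, 0, gtrid.length);
        System.arraycopy(bqual, 0, data, gtrid.length, bqual.length);
        return data;
    }

    public static void main(String[] args) {
        SimpleXid xid = new SimpleXid(1, "order-tx", "db0");
        System.out.println(new String(xid.data(), StandardCharsets.UTF_8));
    }
}
```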
MySQL XA transaction states

[Figure: MySQL XA transaction state diagram]

JTA specification

JTA (Java Transaction API) provides distributed transaction capability for the J2EE platform. To some extent, the JTA specification can be regarded as the Java version of the XA specification: it abstracts the DTP model interaction interfaces specified in XA into methods on Java interfaces, and specifies what each method should do.

Interfaces defined by JTA
  • javax.transaction.TransactionManager: the transaction manager, responsible for operations such as begin, commit, and rollback.
  • javax.transaction.UserTransaction: used to declare a distributed transaction.
  • javax.transaction.TransactionSynchronizationRegistry: the transaction synchronization registry.
  • javax.transaction.xa.XAResource: defines the interface that the RM provides for the TM to operate on.
  • javax.transaction.xa.Xid: the transaction XID interface.
TM provider:
  • Implements the UserTransaction, TransactionManager, Transaction, TransactionSynchronizationRegistry, Synchronization, and Xid interfaces, and carries out distributed transactions by interacting with the XAResource interface.
RM provider:
  • The XAResource interface must be implemented by the resource manager. It defines methods that the TM calls, such as:

    • start: opens a transaction branch
    • end: ends a transaction branch
    • prepare: prepares to commit
    • commit: commits
    • rollback: rolls back
    • recover: lists all transaction branches in the prepared state
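
The order in which a TM calls these methods on one branch can be sketched as follows. javax.transaction.xa.XAResource, Xid, and XAException come from the JDK; RecordingXAResource and XaCallOrderSketch are hypothetical names for this example, and the "resource" is an in-memory stub that only records calls rather than a real database driver.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical in-memory RM: it only records which XAResource methods were called. */
class RecordingXAResource implements XAResource {
    final List<String> calls = new ArrayList<>();
    private final boolean canPrepare;

    RecordingXAResource(boolean canPrepare) { this.canPrepare = canPrepare; }

    @Override public void start(Xid xid, int flags) { calls.add("start"); }
    @Override public void end(Xid xid, int flags) { calls.add("end"); }
    @Override public int prepare(Xid xid) throws XAException {
        calls.add("prepare");
        if (!canPrepare) {
            // An XA_RB* code tells the TM that the RM has already rolled the branch back.
            throw new XAException(XAException.XA_RBROLLBACK);
        }
        return XAResource.XA_OK;
    }
    @Override public void commit(Xid xid, boolean onePhase) { calls.add("commit"); }
    @Override public void rollback(Xid xid) { calls.add("rollback"); }
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public void forget(Xid xid) { }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}

class XaCallOrderSketch {
    /** Drives a single branch the way a TM would; returns true if it was not rolled back. */
    static boolean runBranch(XAResource resource, Xid xid) {
        try {
            resource.start(xid, XAResource.TMNOFLAGS);
            // The application's SQL would run here, inside the branch.
            resource.end(xid, XAResource.TMSUCCESS);
            if (resource.prepare(xid) == XAResource.XA_OK) {
                resource.commit(xid, false);
            }
            // XA_RDONLY would mean there is nothing to commit for this branch.
            return true;
        } catch (XAException e) {
            if (e.errorCode >= XAException.XA_RBBASE && e.errorCode <= XAException.XA_RBEND) {
                return false;
            }
            throw new IllegalStateException("unexpected XA error code " + e.errorCode, e);
        }
    }

    /** Tiny Xid for the demo; the format id value 1 is arbitrary. */
    static Xid xid(final String gtrid, final String bqual) {
        return new Xid() {
            @Override public int getFormatId() { return 1; }
            @Override public byte[] getGlobalTransactionId() { return gtrid.getBytes(StandardCharsets.UTF_8); }
            @Override public byte[] getBranchQualifier() { return bqual.getBytes(StandardCharsets.UTF_8); }
        };
    }

    public static void main(String[] args) {
        RecordingXAResource resource = new RecordingXAResource(true);
        System.out.println(runBranch(resource, xid("order-tx", "db0")) + " " + resource.calls);
    }
}
```

With several resources, a real TM calls prepare on all of them before calling commit on any, as described in the two-phase commit section.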

ShardingSphere's support for XA distributed transactions

ShardingSphere provides a standard, SPI-based interface for XA distributed transactions and the JTA specification: org.apache.shardingsphere.transaction.spi.ShardingTransactionManager

public interface ShardingTransactionManager extends AutoCloseable {

    /**
     * Initialize sharding transaction manager.
     *
     * @param databaseType database type
     * @param resourceDataSources resource data sources
     */
    void init(DatabaseType databaseType, Collection<ResourceDataSource> resourceDataSources);

    /**
     * Get transaction type.
     *
     * @return transaction type
     */
    TransactionType getTransactionType();

    /**
     * Judge is in transaction or not.
     *
     * @return in transaction or not
     */
    boolean isInTransaction();

    /**
     * Get transactional connection.
     *
     * @param dataSourceName data source name
     * @return connection
     * @throws SQLException SQL exception
     */
    Connection getConnection(String dataSourceName) throws SQLException;

    /**
     * Begin transaction.
     */
    void begin();

    /**
     * Commit transaction.
     */
    void commit();

    /**
     * Rollback transaction.
     */
    void rollback();
}

The concrete implementation class for XA distributed transactions is org.apache.shardingsphere.transaction.xa.XAShardingTransactionManager.
This class calls the SPI-based org.apache.shardingsphere.transaction.xa.spi.XATransactionManager to manage XA transactions.

Summary

So far we have covered the CAP theorem for distributed systems, the X/Open DTP model, the XA interface specification, and MySQL's support for the XA protocol. We also explained the JTA specification and the SPI interface that ShardingSphere defines for integrating XA transactions. These are important theoretical foundations. Next, we walk through the Atomikos-based XATransactionManager in detail and analyze its source code.

ShardingSphere's Atomikos integration: source code analysis of XA distributed transactions

Atomikos (https://www.atomikos.com/) is actually the name of a company that provides a TM implementation for XA distributed transactions based on the JTA specification. Its best-known product is its transaction manager, which comes in two editions:

  • TransactionsEssentials: the free, open source product;
  • ExtremeTransactions: the commercial edition, which is paid.

The relationship between the two products is shown in the figure below:
[Figure: relationship between TransactionsEssentials and ExtremeTransactions]

ExtremeTransactions provides the following additional (and important) functions on top of TransactionsEssentials:

  • TCC support: a kind of flexible transaction.
  • Transaction propagation through remote procedure call technologies such as RMI, IIOP, and SOAP.
  • Transaction logs stored in the cloud, with cloud-based transaction recovery and a full management console.

org.apache.shardingsphere.transaction.xa.XAShardingTransactionManager details

Let's briefly review org.apache.shardingsphere.transaction.spi.ShardingTransactionManager:

public interface ShardingTransactionManager extends AutoCloseable {

    /**
     * Initialize sharding transaction manager.
     *
     * @param databaseType database type
     * @param resourceDataSources resource data sources
     */
    void init(DatabaseType databaseType, Collection<ResourceDataSource> resourceDataSources);

    /**
     * Get transaction type.
     *
     * @return transaction type
     */
    TransactionType getTransactionType();

    /**
     * Judge is in transaction or not.
     *
     * @return in transaction or not
     */
    boolean isInTransaction();

    /**
     * Get transactional connection.
     *
     * @param dataSourceName data source name
     * @return connection
     * @throws SQLException SQL exception
     */
    Connection getConnection(String dataSourceName) throws SQLException;

    /**
     * Begin transaction.
     */
    void begin();

    /**
     * Commit transaction.
     */
    void commit();

    /**
     * Rollback transaction.
     */
    void rollback();
}

We focus on the init method. As its name suggests, this is the initialization method of the whole framework. Let's see how it initializes.


    private final Map<String, XATransactionDataSource> cachedDataSources = new HashMap<>();

    private final XATransactionManager xaTransactionManager = XATransactionManagerLoader.getInstance().getTransactionManager();

    @Override
    public void init(final DatabaseType databaseType, final Collection<ResourceDataSource> resourceDataSources) {
        for (ResourceDataSource each : resourceDataSources) {
            cachedDataSources.put(each.getOriginalName(), new XATransactionDataSource(databaseType, each.getUniqueResourceName(), each.getDataSource(), xaTransactionManager));
        }
        xaTransactionManager.init();
    }
  • First, the concrete XATransactionManager implementation is loaded through SPI. Here the returned implementation is org.apache.shardingsphere.transaction.xa.atomikos.manager.AtomikosTransactionManager.
  • Next, look at new XATransactionDataSource() and step into the org.apache.shardingsphere.transaction.xa.jta.datasource.XATransactionDataSource class.
    public XATransactionDataSource(final DatabaseType databaseType, final String resourceName, final DataSource dataSource, final XATransactionManager xaTransactionManager) {
        this.databaseType = databaseType;
        this.resourceName = resourceName;
        this.dataSource = dataSource;
        if (!CONTAINER_DATASOURCE_NAMES.contains(dataSource.getClass().getSimpleName())) {
            // Focus 1: build and return the XADataSource
            xaDataSource = XADataSourceFactory.build(databaseType, dataSource);
            this.xaTransactionManager = xaTransactionManager;
            // Focus 2: register the recovery resource
            xaTransactionManager.registerRecoveryResource(resourceName, xaDataSource);
        }
    }
  • Let's focus on XADataSourceFactory.build(databaseType, dataSource). As the name suggests, it returns the XADataSource from the JTA specification. Many functions in ShardingSphere can be guessed from their names, which is a sign of elegant code (a little bragging). Enough talk; let's step into the method.
public final class XADataSourceFactory {

    public static XADataSource build(final DatabaseType databaseType, final DataSource dataSource) {
        return new DataSourceSwapper(XADataSourceDefinitionFactory.getXADataSourceDefinition(databaseType)).swap(dataSource);
    }
}
  • First comes an SPI definition, XADataSourceDefinitionFactory, which loads a different dialect for each database type. Then we enter the swap method.
 public XADataSource swap(final DataSource dataSource) {
        XADataSource result = createXADataSource();
        setProperties(result, getDatabaseAccessConfiguration(dataSource));
        return result;
    }
  • Very concise. The first step creates the XADataSource; the second sets its properties (connection URL, user name, password, and so on) and then returns it.
  • Back in the XATransactionDataSource class, focus on xaTransactionManager.registerRecoveryResource(resourceName, xaDataSource);. As the name suggests, this registers the resource for transaction recovery. We explain it in detail when we cover transaction recovery.
  • Back in XAShardingTransactionManager.init(), focus on xaTransactionManager.init();, which finally enters AtomikosTransactionManager.init(). The flow chart is as follows:

[Figure: flow chart of AtomikosTransactionManager.init()]

code:

public final class AtomikosTransactionManager implements XATransactionManager {

    private final UserTransactionManager transactionManager = new UserTransactionManager();

    private final UserTransactionService userTransactionService = new UserTransactionServiceImp();

    @Override
    public void init() {
        userTransactionService.init();
    }

}
  • Step into UserTransactionServiceImp.init(); the initialization logic shown here lives in initialize():
    private void initialize() {
        // Add recovery resources (not our focus here)
        for (RecoverableResource resource : resources_) {
            Configuration.addResource ( resource );
        }
        for (LogAdministrator logAdministrator : logAdministrators_) {
            Configuration.addLogAdministrator ( logAdministrator );
        }
        // Register plug-ins (not our focus here)
        for (TransactionServicePlugin nxt : tsListeners_) {
            Configuration.registerTransactionServicePlugin ( nxt );
        }
        // Get the configuration properties (a key point)
        ConfigProperties configProps = Configuration.getConfigProperties();
        configProps.applyUserSpecificProperties(properties_);
        // Initialize
        Configuration.init();
    }
  • We focus on how the configuration properties are obtained. The call finally enters the com.atomikos.icatch.provider.imp.AssemblerImp.initializeProperties() method.
@Override
    public ConfigProperties initializeProperties() {
         //Read the default configuration transactions-defaults.properties under classpath
        Properties defaults = new Properties();
        loadPropertiesFromClasspath(defaults, DEFAULT_PROPERTIES_FILE_NAME);
        //Read the transactions.properties configuration under the classpath and overwrite the value of the same key in transactions-defaults.properties
        Properties transactionsProperties = new Properties(defaults);
        loadPropertiesFromClasspath(transactionsProperties, TRANSACTIONS_PROPERTIES_FILE_NAME);
        //Read jta.properties under the classpath and overwrite the value of the same key in transactions-defaults.properties and transactions.properties
        Properties jtaProperties = new Properties(transactionsProperties);
        loadPropertiesFromClasspath(jtaProperties, JTA_PROPERTIES_FILE_NAME);

        //Read the custom configuration file whose path is given by the JVM option -Dcom.atomikos.icatch.file, overwriting earlier values with the same key
        Properties customProperties = new Properties(jtaProperties);
        loadPropertiesFromCustomFilePath(customProperties);
        //Finally, a configproperties object is constructed to represent the actual configuration to be used
        Properties finalProperties = new Properties(customProperties);
        return new ConfigProperties(finalProperties);
    }
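
The layering in initializeProperties() relies on a standard java.util.Properties feature: a Properties object created with new Properties(parent) falls back to parent for any key it does not define itself. The sketch below demonstrates just that mechanism. LayeredPropertiesSketch is a hypothetical class, and the property values are illustrative rather than taken from a real Atomikos configuration file.

```java
import java.util.Properties;

class LayeredPropertiesSketch {
    /** Creates a new layer on top of parent and applies one override to it. */
    static Properties override(Properties parent, String key, String value) {
        Properties layer = new Properties(parent);
        layer.setProperty(key, value);
        return layer;
    }

    public static void main(String[] args) {
        // Stand-in for transactions-defaults.properties (values are illustrative).
        Properties defaults = new Properties();
        defaults.setProperty("com.atomikos.icatch.max_timeout", "300000");
        defaults.setProperty("com.atomikos.icatch.max_actives", "50");

        // Stand-in for jta.properties overriding a single key.
        Properties jta = override(defaults, "com.atomikos.icatch.max_actives", "200");

        // The overridden key comes from the top layer, the other one from the defaults.
        System.out.println(jta.getProperty("com.atomikos.icatch.max_actives"));
        System.out.println(jta.getProperty("com.atomikos.icatch.max_timeout"));
    }
}
```

Note that only getProperty consults the parent chain; the inherited Hashtable method get does not, which is worth remembering when debugging why a default seems to be missing.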
  • Next, focus on Configuration.init(), which performs the initialization.
    public static synchronized boolean init() {
        boolean startupInitiated = false;
        if (service_ == null) {
            startupInitiated = true;
            // Load and register plug-ins through SPI (not our focus here)
            addAllTransactionServicePluginServicesFromClasspath();
            ConfigProperties configProperties = getConfigProperties();
            // Call each plug-in's beforeInit method (not our focus here)
            notifyBeforeInit(configProperties);
            // Very important: sets up transaction logging and recovery; explained in detail next
            assembleSystemComponents(configProperties);
            // Moderately important: initialize the system components
            initializeSystemComponents(configProperties);
            notifyAfterInit();
            if (configProperties.getForceShutdownOnVmExit()) {
                addShutdownHook(new ForceShutdownHook());
            }
        }
        return startupInitiated;
    }
  • Let's first focus on assembleSystemComponents(configProperties);. Stepping in, we reach the com.atomikos.icatch.provider.imp.AssemblerImp.assembleTransactionService() method:
    @Override
    public TransactionServiceProvider assembleTransactionService(
            ConfigProperties configProperties) {
        RecoveryLog recoveryLog = null;
        // Log the configuration properties
        logProperties(configProperties.getCompletedProperties());
        // Get the TM's unique name
        String tmUniqueName = configProperties.getTmUniqueName();

        long maxTimeout = configProperties.getMaxTimeout();
        int maxActives = configProperties.getMaxActives();
        boolean threaded2pc = configProperties.getThreaded2pc();
        // Load OltpLog through SPI; this is the most important extension point. It is null if the user has not provided an SPI extension
        OltpLog oltpLog = createOltpLogFromClasspath();
        if (oltpLog == null) {
            LOGGER.logInfo("Using default (local) logging and recovery...");
            // Create the repository that stores transaction logs
            Repository repository = createRepository(configProperties);
            oltpLog = createOltpLog(repository);
            // Assemble the recoveryLog
            recoveryLog = createRecoveryLog(repository);
        }
        StateRecoveryManagerImp recoveryManager = new StateRecoveryManagerImp();
        recoveryManager.setOltpLog(oltpLog);
        // Create the unique ID manager, which is used later to generate XIDs
        UniqueIdMgr idMgr = new UniqueIdMgr ( tmUniqueName );
        int overflow = idMgr.getMaxIdLengthInBytes() - MAX_TID_LENGTH;
        if ( overflow > 0 ) {
            // see case 73086
            String msg = "Value too long : " + tmUniqueName;
            LOGGER.logFatal ( msg );
            throw new SysException(msg);
        }
        return new TransactionServiceImp(tmUniqueName, recoveryManager, idMgr, maxTimeout, maxActives, !threaded2pc, recoveryLog);
    }
  • Let's analyze createOltpLogFromClasspath(), which loads the implementation through SPI. By default it returns null. What does that mean?
    When there is no extension, Atomikos creates its own built-in storage for transaction logs.
private OltpLog createOltpLogFromClasspath() {
        OltpLog ret = null;
        ServiceLoader<OltpLogFactory> loader = ServiceLoader.load(OltpLogFactory.class,Configuration.class.getClassLoader());
        int i = 0;
        for (OltpLogFactory l : loader ) {
            ret = l.createOltpLog();
            i++;
        }
        if (i > 1) {
            String msg = "More than one OltpLogFactory found in classpath - error in configuration!";
            LOGGER.logFatal(msg);
            throw new SysException(msg);
        }
        return ret;
    }
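
The pattern in createOltpLogFromClasspath(), "load through ServiceLoader, accept zero or one provider, fail on more", can be isolated into a small generic helper. The sketch below is an assumption-laden illustration: SingleProviderLoaderSketch and LogFactory are hypothetical names, and since no provider is registered under META-INF/services for the demo interface, the helper returns null, which corresponds to the default case described above.

```java
import java.util.ServiceLoader;

class SingleProviderLoaderSketch {
    /** Hypothetical SPI interface standing in for OltpLogFactory. */
    interface LogFactory {
        String create();
    }

    /** Returns the single registered provider, null if there is none, and fails on several. */
    static <T> T loadAtMostOne(Class<T> spi) {
        T found = null;
        int count = 0;
        for (T candidate : ServiceLoader.load(spi, spi.getClassLoader())) {
            found = candidate;
            count++;
        }
        if (count > 1) {
            throw new IllegalStateException("More than one " + spi.getSimpleName() + " found on the classpath");
        }
        return found;
    }

    public static void main(String[] args) {
        LogFactory factory = loadAtMostOne(LogFactory.class);
        // No provider is registered for LogFactory here, so the default path is taken.
        System.out.println(factory == null ? "no extension, use built-in logging" : factory.create());
    }
}
```

To plug in a real extension, a provider class name would be listed in a META-INF/services file named after the SPI interface, which is the standard ServiceLoader registration mechanism.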
  • Next, follow Repository repository = createRepository(configProperties); into the method below:
private CachedRepository createCoordinatorLogEntryRepository(
            ConfigProperties configProperties) throws LogException {
        //Create memory resource store
        InMemoryRepository inMemoryCoordinatorLogEntryRepository = new InMemoryRepository();
       //Initialize
        inMemoryCoordinatorLogEntryRepository.init();
       //Create and use file storage resources as backup
        FileSystemRepository backupCoordinatorLogEntryRepository = new FileSystemRepository();
       //Initialize
        backupCoordinatorLogEntryRepository.init();
      //Merge memory and file resources
        CachedRepository repository = new CachedRepository(inMemoryCoordinatorLogEntryRepository, backupCoordinatorLogEntryRepository);
        repository.init();
        return repository;
    }
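
The repository assembled above pairs a fast in-memory store with a durable backup. The general idea can be sketched as a write-through cache. Everything in this sketch is hypothetical: LogStore, MemoryStore, AppendOnlyStore, and CachedStore are invented for illustration, the "file" is just an in-memory list of records, and the real Atomikos classes have different interfaces and more elaborate checkpoint logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical log store; the real Atomikos Repository interface has a different shape. */
interface LogStore {
    void put(String txId, String entry);
    String get(String txId);
}

/** Fast but volatile store. */
class MemoryStore implements LogStore {
    private final Map<String, String> entries = new HashMap<>();
    @Override public void put(String txId, String entry) { entries.put(txId, entry); }
    @Override public String get(String txId) { return entries.get(txId); }
}

/** Stand-in for a log file: an append-only list of "txId=entry" records. */
class AppendOnlyStore implements LogStore {
    final List<String> records = new ArrayList<>();
    @Override public void put(String txId, String entry) { records.add(txId + "=" + entry); }
    @Override public String get(String txId) {
        // The latest record for a transaction wins, so scan from the end.
        for (int i = records.size() - 1; i >= 0; i--) {
            String record = records.get(i);
            if (record.startsWith(txId + "=")) {
                return record.substring(txId.length() + 1);
            }
        }
        return null;
    }
}

/** Write-through combination: every write goes to the backup first, reads prefer memory. */
class CachedStore implements LogStore {
    private final LogStore memory;
    private final LogStore backup;

    CachedStore(LogStore memory, LogStore backup) {
        this.memory = memory;
        this.backup = backup;
    }

    @Override public void put(String txId, String entry) {
        backup.put(txId, entry);   // make the entry durable before caching it
        memory.put(txId, entry);
    }

    @Override public String get(String txId) {
        String cached = memory.get(txId);
        return cached != null ? cached : backup.get(txId);
    }

    public static void main(String[] args) {
        AppendOnlyStore file = new AppendOnlyStore();
        CachedStore store = new CachedStore(new MemoryStore(), file);
        store.put("tx-1", "COMMITTING");
        store.put("tx-1", "COMMITTED");
        System.out.println(store.get("tx-1") + " / backup records: " + file.records);
    }
}
```

The point of the backup is recovery: after a restart the memory store is empty, but the entries can still be read from the durable side.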
  • This creates a CachedRepository, which contains an InMemoryRepository and a FileSystemRepository.
  • Back on the main line in com.atomikos.icatch.config.Configuration.init(), let's finally analyze notifyAfterInit();
private static void notifyAfterInit() {
         //Initialize the plug-in
        for (TransactionServicePlugin p : tsListenersList_) {
            p.afterInit();
        }
        for (LogAdministrator a : logAdministrators_) {
            a.registerLogControl(service_.getLogControl());
        }
         //Set transaction recovery service to recover transactions
        for (RecoverableResource r : resourceList_ ) {
            r.setRecoveryService(recoveryService_);
        }

    }
  • Plug-in initialization enters com.atomikos.icatch.jta.JtaTransactionServicePlugin.afterInit():
public void afterInit() {
        TransactionManagerImp.installTransactionManager(Configuration.getCompositeTransactionManager(), autoRegisterResources);
          //If OltpLog has been customized through an extension, null is returned here; in that case XaResourceRecoveryManager is not installed
        RecoveryLog recoveryLog = Configuration.getRecoveryLog();
        long maxTimeout = Configuration.getConfigProperties().getMaxTimeout();
        if (recoveryLog != null) {
            XaResourceRecoveryManager.installXaResourceRecoveryManager(new DefaultXaRecoveryLog(recoveryLog, maxTimeout),Configuration.getConfigProperties().getTmUniqueName());
        }

    }
  • Pay close attention to RecoveryLog recoveryLog = Configuration.getRecoveryLog();. If the user has extended com.atomikos.recovery.OltpLog through SPI, null is returned here, and in that case XaResourceRecoveryManager is not initialized.
  • Back in notifyAfterInit(), let's analyze setRecoveryService:
public void setRecoveryService ( RecoveryService recoveryService )
            throws ResourceException
    {

        if ( recoveryService != null ) {
            if ( LOGGER.isTraceEnabled() ) LOGGER.logTrace ( "Installing recovery service on resource "
                    + getName () );
            this.branchIdentifier=recoveryService.getName();
            recover();
        }
    }
  • We enter the recover() method:
 public void recover() {
        XaResourceRecoveryManager xaResourceRecoveryManager = XaResourceRecoveryManager.getInstance();
        //null for LogCloud recovery
        if (xaResourceRecoveryManager != null) {
            try {
                xaResourceRecoveryManager.recover(getXAResource());
            } catch (Exception e) {
                refreshXAResource(); //cf case 156968
            }

        }
    }
  • Note the key comment: if the user has extended com.atomikos.recovery.OltpLog through SPI, XaResourceRecoveryManager is null and recovery is handled in the cloud (LogCloud); otherwise, transaction recovery is performed here. Transaction recovery is complex, so we cover it separately.

At this point, the basic initialization of Atomikos is complete.

Atomikos transaction begin process

Local transactions start with a transaction.begin call, and XA distributed transactions have a begin step of their own. Let's switch back to XAShardingTransactionManager.begin(), which calls com.atomikos.icatch.jta.TransactionManagerImp.begin(). The flow chart is as follows:
[Figure: flow chart of TransactionManagerImp.begin()]

code:

public void begin ( int timeout ) throws NotSupportedException,
            SystemException
    {
        CompositeTransaction ct = null;
        ResumePreviousTransactionSubTxAwareParticipant resumeParticipant = null;

        ct = compositeTransactionManager.getCompositeTransaction();
        if ( ct != null && ct.getProperty (  JTA_PROPERTY_NAME ) == null ) {
            LOGGER.logWarning ( "JTA: temporarily suspending incompatible transaction: " + ct.getTid() +
                    " (will be resumed after JTA transaction ends)" );
            ct = compositeTransactionManager.suspend();
            resumeParticipant = new ResumePreviousTransactionSubTxAwareParticipant ( ct );
        }

        try {
      //Create transaction compensation point
            ct = compositeTransactionManager.createCompositeTransaction ( ( ( long ) timeout ) * 1000 );
            if ( resumeParticipant != null ) ct.addSubTxAwareParticipant ( resumeParticipant );
            if ( ct.isRoot () && getDefaultSerial () )
                ct.setSerial ();
            ct.setProperty ( JTA_PROPERTY_NAME , "true" );
        } catch ( SysException se ) {
            String msg = "Error in begin()";
            LOGGER.logError( msg , se );
            throw new ExtendedSystemException ( msg , se );
        }
        recreateCompositeTransactionAsJtaTransaction(ct);
    }
  • Here we mainly focus on compositeTransactionManager.createCompositeTransaction():
public CompositeTransaction createCompositeTransaction ( long timeout ) throws SysException
    {
        CompositeTransaction ct = null , ret = null;

        ct = getCurrentTx ();
        if ( ct == null ) {
            ret = getTransactionService().createCompositeTransaction ( timeout );
            if(LOGGER.isDebugEnabled()){
                LOGGER.logDebug("createCompositeTransaction ( " + timeout + " ): "
                    + "created new ROOT transaction with id " + ret.getTid ());
            }
        } else {
             if(LOGGER.isDebugEnabled()) LOGGER.logDebug("createCompositeTransaction ( " + timeout + " )");
            ret = ct.createSubTransaction ();

        }

        Thread thread = Thread.currentThread ();
        setThreadMappings ( ret, thread );

        return ret;
    }
  • This creates a transaction compensation point and puts it into a map keyed by the current thread. A question to think about: why doesn't it use ThreadLocal?
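
One plausible answer, offered here as an assumption rather than a statement of Atomikos's actual reasoning, is that an explicit map keyed by Thread can be queried from any thread and can be paired with a reverse lookup from transaction to thread, while a ThreadLocal is only readable by its owning thread. The sketch below illustrates that idea with a hypothetical ThreadTransactionRegistry that uses plain string ids in place of CompositeTransaction objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry: thread to transaction stack, plus transaction to thread. */
class ThreadTransactionRegistry {
    private final Map<Thread, Deque<String>> threadToTx = new HashMap<>();
    private final Map<String, Thread> txToThread = new HashMap<>();

    /** Makes txId the current transaction of the given thread (sub-transactions stack up). */
    synchronized void bind(Thread thread, String txId) {
        threadToTx.computeIfAbsent(thread, t -> new ArrayDeque<>()).push(txId);
        txToThread.put(txId, thread);
    }

    /** Current transaction of a thread; callable from any thread, unlike ThreadLocal.get(). */
    synchronized String current(Thread thread) {
        Deque<String> stack = threadToTx.get(thread);
        return stack == null ? null : stack.peek();
    }

    /** Reverse lookup: which thread is running this transaction? */
    synchronized Thread ownerOf(String txId) {
        return txToThread.get(txId);
    }

    /** Removes and returns the current transaction of the thread, or null if there is none. */
    synchronized String unbind(Thread thread) {
        Deque<String> stack = threadToTx.get(thread);
        if (stack == null || stack.isEmpty()) {
            return null;
        }
        String txId = stack.pop();
        txToThread.remove(txId);
        if (stack.isEmpty()) {
            threadToTx.remove(thread);
        }
        return txId;
    }

    public static void main(String[] args) {
        ThreadTransactionRegistry registry = new ThreadTransactionRegistry();
        Thread me = Thread.currentThread();
        registry.bind(me, "root-tx");
        registry.bind(me, "sub-tx");
        System.out.println(registry.current(me) + " owned by " + registry.ownerOf("sub-tx").getName());
    }
}
```

The stack per thread mirrors the createSubTransaction() branch in the code above, where a nested begin creates a child of the current transaction instead of a new root.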

At this point, the Atomikos transaction begin process is complete. You may have some doubts: begin does not seem to do much, and XA START was never called? Don't worry; we continue in the next section.

XATransactionDataSource getConnection() process

To execute SQL statements, we must first obtain a connection to the database. Let's go back to XAShardingTransactionManager.getConnection(), which finally calls org.apache.shardingsphere.transaction.xa.jta.datasource.XATransactionDataSource.getConnection(). The flow chart is as follows:
[Figure: flow chart of XATransactionDataSource.getConnection()]

code:

public Connection getConnection() throws SQLException, SystemException, RollbackException {
        // First check whether the data source is one of CONTAINER_DATASOURCE_NAMES and, if so, use its connection directly. This step matters and is key to XA, because an XA transaction must stay on the same connection
        if (CONTAINER_DATASOURCE_NAMES.contains(dataSource.getClass().getSimpleName())) {
            return dataSource.getConnection();
        }
        // Get a database connection
        Connection result = dataSource.getConnection();
        // Wrap it as an XAConnection; underneath it is the same connection
        XAConnection xaConnection = XAConnectionFactory.createXAConnection(databaseType, xaDataSource, result);
        // Get the JTA Transaction object
        Transaction transaction = xaTransactionManager.getTransactionManager().getTransaction();
        if (!enlistedTransactions.get().contains(transaction)) {
            // Enlist the resource
            transaction.enlistResource(new SingleXAResource(resourceName, xaConnection.getXAResource()));
            transaction.registerSynchronization(new Synchronization() {
                @Override
                public void beforeCompletion() {
                    enlistedTransactions.get().remove(transaction);
                }

                @Override
                public void afterCompletion(final int status) {
                    enlistedTransactions.get().clear();
                }
            });
            enlistedTransactions.get().add(transaction);
        }
        return result;
    }
  • The first step deserves attention, especially for ShardingSphere. A transaction may run several SQL statements against the same database, so the same XAConnection must be obtained for that database in order to commit or roll back the XA transaction.
  • Next, look at transaction.enlistResource(new SingleXAResource(resourceName, xaConnection.getXAResource()));, which enters com.atomikos.icatch.jta.TransactionImp.enlistResource(). The code is long, so only an excerpt is shown.
try {
                restx = (XAResourceTransaction) res
                        .getResourceTransaction(this.compositeTransaction);

                // next, we MUST set the xa resource again,
                // because ONLY the instance we got as argument
                // is available for use now !
                // older instances (set in restx from previous sibling)
                // have connections that may be in reuse already
                // ->old xares not valid except for 2pc operations

                restx.setXAResource(xares);
                restx.resume();
            } catch (ResourceException re) {
                throw new ExtendedSystemException(
                        "Unexpected error during enlist", re);
            } catch (RuntimeException e) {
                throw e;
            }

            addXAResourceTransaction(restx, xares);
  • Let’s look directly at restx.resume();
public synchronized void resume() throws ResourceException {
        int flag = 0;
        String logFlag = "";
        if (this.state.equals(TxState.LOCALLY_DONE)) {// reused instance
            flag = XAResource.TMJOIN;
            logFlag = "XAResource.TMJOIN";
        } else if (!this.knownInResource) {// new instance
            flag = XAResource.TMNOFLAGS;
            logFlag = "XAResource.TMNOFLAGS";
        } else
            throw new IllegalStateException("Wrong state for resume: "
                    + this.state);

        try {
            if (LOGGER.isDebugEnabled()) {
                LOGGER.logDebug("XAResource.start ( " + this.xidToHexString
                        + " , " + logFlag + " ) on resource "
                        + this.resourcename
                        + " represented by XAResource instance "
                        + this.xaresource);
            }
            this.xaresource.start(this.xid, flag);

        } catch (XAException xaerr) {
            String msg = interpretErrorCode(this.resourcename, "resume",
                    this.xid, xaerr.errorCode);
            LOGGER.logWarning(msg, xaerr);
            throw new ResourceException(msg, xaerr);
        }
        setState(TxState.ACTIVE);
        this.knownInResource = true;
    }
  • Notice the call this.xaresource.start(this.xid, flag);. Let’s step into it, assuming the database in use is MySQL:
 public void start(Xid xid, int flags) throws XAException {
        StringBuilder commandBuf = new StringBuilder(300);
        commandBuf.append("XA START ");
        appendXid(commandBuf, xid);
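        switch(flags) {
        case 0: // XAResource.TMNOFLAGS: start a new transaction branch
            break;
        case 2097152: // XAResource.TMJOIN
            commandBuf.append(" JOIN");
            break;
        case 134217728: // XAResource.TMRESUME
            commandBuf.append(" RESUME");
            break;
        default:
            throw new XAException(-5); // XAException.XAER_INVAL
        }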

        this.dispatchCommand(commandBuf.toString());
        this.underlyingConnection.setInGlobalTx(true);
    }
  • This assembles the XA START xid SQL statement and executes it.

To summarize so far: when the database connection is obtained, the XA START xid statement of the XA protocol is executed.
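To make the flag handling concrete, here is a minimal sketch of how the three start flags map onto the statement text. XaStartSqlSketch and its methods are hypothetical names used only for illustration; they are not part of Atomikos or the MySQL driver, and the xid is assumed to be already formatted as a string. Only the XAResource constants come from the standard javax.transaction.xa API.

```java
import javax.transaction.xa.XAResource;

/** Hypothetical helper for illustration; not part of Atomikos or the MySQL driver. */
final class XaStartSqlSketch {

    /** Maps an XAResource.start() flag to the suffix appended after the xid. */
    static String suffixFor(int flag) {
        switch (flag) {
            case XAResource.TMNOFLAGS: // 0: a new transaction branch
                return "";
            case XAResource.TMJOIN:    // 2097152: join an existing branch
                return " JOIN";
            case XAResource.TMRESUME:  // 134217728: resume a suspended branch
                return " RESUME";
            default:
                throw new IllegalArgumentException("Unsupported flag: " + flag);
        }
    }

    /** xidLiteral is assumed to be pre-formatted, for example 'gtrid','bqual',1 */
    static String build(String xidLiteral, int flag) {
        return "XA START " + xidLiteral + suffixFor(flag);
    }

    public static void main(String[] args) {
        System.out.println(build("'tx1','branch1',1", XAResource.TMNOFLAGS));
        System.out.println(build("'tx1','branch1',1", XAResource.TMJOIN));
    }
}
```

In the resume() method shown earlier, Atomikos passes TMNOFLAGS for a new instance and TMJOIN for a reused one, so only the first two cases occur in this flow.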

Atomikos transaction commit process

We have started the transaction above. Now let’s analyze the commit process. Switching back to XAShardingTransactionManager.commit(), the call eventually reaches the com.atomikos.icatch.imp.CompositeTransactionImp.commit() method. The flow chart is as follows:
[Figure: flow chart of the commit process]

code:

public void commit () throws HeurRollbackException, HeurMixedException,
            HeurHazardException, SysException, SecurityException,
            RollbackException
    {
        // First, update the status in the transaction log
        doCommit ();
        setSiblingInfoForIncoming1pcRequestFromRemoteClient();

        if ( isRoot () ) {
            // The actual commit happens here
            coordinator.terminate ( true );
        }
    }
  • The part we care about is coordinator.terminate ( true );
protected void terminate ( boolean commit ) throws HeurRollbackException,
            HeurMixedException, SysException, java.lang.SecurityException,
            HeurCommitException, HeurHazardException, RollbackException,
            IllegalStateException

    {
        synchronized ( fsm_ ) {
            if ( commit ) {
                // Check how many participants there are; with only one, commit directly in one phase
                if ( participants_.size () <= 1 ) {
                    commit ( true );
                } else {
                    // Otherwise follow the XA two-phase commit process: prepare first, then commit
                    int prepareResult = prepare ();
                    // make sure to only do commit if NOT read only
                    if ( prepareResult != Participant.READ_ONLY )
                        commit ( false );
                }
            } else {
                rollback ();
            }
        }
    }
  • First, the number of participants is checked; here a participant can be understood as one MySQL database. If there is only one, the protocol degenerates to a one-phase commit and the transaction is committed directly.
    If there are several, the standard XA two-phase commit process is followed.
  • Let’s look at prepare ();. The call eventually reaches com.atomikos.icatch.imp.PrepareMessage.send() -> com.atomikos.datasource.xa.XAResourceTransaction.prepare()
int ret = 0;
        terminateInResource();

        if (TxState.ACTIVE == this.state) {
            // tolerate non-delisting apps/servers
            suspend();
        }

        // duplicate prepares can happen for siblings in serial subtxs!!!
        // in that case, the second prepare just returns READONLY
        if (this.state == TxState.IN_DOUBT)
            return Participant.READ_ONLY;
        else if (!(this.state == TxState.LOCALLY_DONE))
            throw new SysException("Wrong state for prepare: " + this.state);
        try {
            // refresh xaresource for MQSeries: seems to close XAResource after
            // suspend???
            testOrRefreshXAResourceFor2PC();
            if (LOGGER.isTraceEnabled()) {
                LOGGER.logTrace("About to call prepare on XAResource instance: "
                        + this.xaresource);
            }
            ret = this.xaresource.prepare(this.xid);

        } catch (XAException xaerr) {
            String msg = interpretErrorCode(this.resourcename, "prepare",
                    this.xid, xaerr.errorCode);
            if (XAException.XA_RBBASE <= xaerr.errorCode
                    && xaerr.errorCode <= XAException.XA_RBEND) {
                LOGGER.logWarning(msg, xaerr); // see case 84253
                throw new RollbackException(msg);
            } else {
                LOGGER.logError(msg, xaerr);
                throw new SysException(msg, xaerr);
            }
        }
        setState(TxState.IN_DOUBT);
        if (ret == XAResource.XA_RDONLY) {
            if (LOGGER.isDebugEnabled()) {
                LOGGER.logDebug("XAResource.prepare ( " + this.xidToHexString
                        + " ) returning XAResource.XA_RDONLY " + "on resource "
                        + this.resourcename
                        + " represented by XAResource instance "
                        + this.xaresource);
            }
            return Participant.READ_ONLY;
        } else {
            if (LOGGER.isDebugEnabled()) {
                LOGGER.logDebug("XAResource.prepare ( " + this.xidToHexString
                        + " ) returning OK " + "on resource "
                        + this.resourcename
                        + " represented by XAResource instance "
                        + this.xaresource);
            }
            return Participant.READ_ONLY + 1;
        }
  • Finally we reach the statement ret = this.xaresource.prepare(this.xid);. But wait: didn’t we say earlier that XA END xid has to be executed after XA START xid and before prepare? The answer lies in suspend();.
public synchronized void suspend() throws ResourceException {

        // BugzID: 20545
        // State may be IN_DOUBT or TERMINATED when a connection is closed AFTER
        // commit!
        // In that case, don't call END again, and also don't generate any
        // error!
        // This is required for some hibernate connection release strategies.
        if (this.state.equals(TxState.ACTIVE)) {
            try {
                if (LOGGER.isDebugEnabled()) {
                    LOGGER.logDebug("XAResource.end ( " + this.xidToHexString
                            + " , XAResource.TMSUCCESS ) on resource "
                            + this.resourcename
                            + " represented by XAResource instance "
                            + this.xaresource);
                }
                // This executes the XA END statement
                this.xaresource.end(this.xid, XAResource.TMSUCCESS);

            } catch (XAException xaerr) {
                String msg = interpretErrorCode(this.resourcename, "end",
                        this.xid, xaerr.errorCode);
                if (LOGGER.isTraceEnabled())
                    LOGGER.logTrace(msg, xaerr);
                // don't throw: fix for case 102827
            }
            setState(TxState.LOCALLY_DONE);
        }
    }

At this point we have executed XA START xid -> XA END xid -> XA PREPARE xid, and the next step is the final commit.
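The order of statements traced so far can be sketched as a tiny simulation. This is only an illustration under simplifying assumptions: TwoPhaseSketch and Branch are hypothetical names, each branch merely records the statement it would send instead of talking to a database, and the read-only optimization from the real code is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the statement order; not Atomikos code. */
final class TwoPhaseSketch {

    /** Stand-in for one database branch; it only records the XA statement it would send. */
    static final class Branch {
        private final String xid;
        private final List<String> sent;

        Branch(String xid, List<String> sent) {
            this.xid = xid;
            this.sent = sent;
        }

        void start() { sent.add("XA START " + xid); }
        void end() { sent.add("XA END " + xid); }
        void prepare() { sent.add("XA PREPARE " + xid); }
        void commit(boolean onePhase) {
            sent.add("XA COMMIT " + xid + (onePhase ? " ONE PHASE" : ""));
        }
    }

    /** Same shape as terminate(true): one participant commits in one phase, several use 2PC. */
    static void commitAll(List<Branch> branches) {
        if (branches.size() <= 1) {
            for (Branch b : branches) {
                b.end();
                b.commit(true);
            }
            return;
        }
        for (Branch b : branches) { // phase 1: end the branch, then prepare it
            b.end();
            b.prepare();
        }
        for (Branch b : branches) { // phase 2: commit every prepared branch
            b.commit(false);
        }
    }

    /** Starts branchCount branches, commits them, and returns the recorded statements. */
    static List<String> run(int branchCount) {
        List<String> sent = new ArrayList<>();
        List<Branch> branches = new ArrayList<>();
        for (int i = 1; i <= branchCount; i++) {
            Branch b = new Branch("'tx','db" + i + "'", sent);
            b.start();
            branches.add(b);
        }
        commitAll(branches);
        return sent;
    }

    public static void main(String[] args) {
        run(2).forEach(System.out::println);
    }
}
```

Calling run(1) shows the one-phase path (START, END, COMMIT ... ONE PHASE), while run(2) shows both branches being prepared before either is committed.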

  • Let’s go back to the terminate() method and look at the commit() step. Like the prepare process, it eventually reaches com.atomikos.datasource.xa.XAResourceTransaction.commit(). Once the commit has executed, the data is committed.
// The full method is long and complex, so only the core call is shown
this.xaresource.commit(this.xid, onePhase);

Food for thought: the participants are committed one by one in a loop. If the earlier participants have committed and a later one fails during its commit, the data becomes inconsistent.
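The failure mode just described is easy to reproduce in a toy model. The sketch below is an assumption-laden illustration, not Atomikos code: PartialCommitSketch is a hypothetical name, participants are plain strings, and a crash is simulated by throwing an exception when the loop reaches the chosen participant.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of a commit loop that fails partway through. */
final class PartialCommitSketch {

    enum State { PREPARED, COMMITTED }

    /**
     * Commits the participants in order. The participant named failAt simulates a failure,
     * so it and every participant after it stay PREPARED.
     */
    static Map<String, State> commitAll(String[] participants, String failAt) {
        Map<String, State> states = new LinkedHashMap<>();
        for (String p : participants) {
            states.put(p, State.PREPARED); // phase 1 is assumed to have succeeded everywhere
        }
        try {
            for (String p : participants) {
                if (p.equals(failAt)) {
                    throw new IllegalStateException("lost connection to " + p);
                }
                states.put(p, State.COMMITTED);
            }
        } catch (IllegalStateException e) {
            // The loop stops here; without recovery the remaining branches are never committed.
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(commitAll(new String[] {"db1", "db2"}, "db2"));
    }
}
```

With db2 failing, db1 ends up COMMITTED while db2 is still PREPARED, which is exactly the state the recovery process discussed later has to repair.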

Atomikos rollback() process

We have analyzed the commit process above; the rollback process is much the same. Switching back to org.apache.shardingsphere.transaction.xa.XAShardingTransactionManager.rollback(), the call eventually reaches com.atomikos.icatch.imp.CompositeTransactionImp.rollback().

public void rollback () throws IllegalStateException, SysException
    {
        // Release resources, update the transaction log status, and so on
        doRollback ();
        if ( isRoot () ) {
            try {
                coordinator.terminate ( false );
            } catch ( Exception e ) {
                throw new SysException ( "Unexpected error in rollback: " + e.getMessage (), e );
            }
        }
    }
  • Focus on coordinator.terminate ( false );. This is the same method as in the commit process, except that the commit process passes true.
protected void terminate ( boolean commit ) throws HeurRollbackException,
            HeurMixedException, SysException, java.lang.SecurityException,
            HeurCommitException, HeurHazardException, RollbackException,
            IllegalStateException

    {
        synchronized ( fsm_ ) {
            if ( commit ) {
                if ( participants_.size () <= 1 ) {
                    commit ( true );
                } else {
                    int prepareResult = prepare ();
                    // make sure to only do commit if NOT read only
                    if ( prepareResult != Participant.READ_ONLY )
                        commit ( false );
                }
            } else {
                 //If it's false, it's rollback
                rollback ();
            }
        }
    }
  • We focus on rollback(), which eventually reaches com.atomikos.datasource.xa.XAResourceTransaction.rollback()
public synchronized void rollback()
            throws HeurCommitException, HeurMixedException,
            HeurHazardException, SysException {
        terminateInResource();

        if (rollbackShouldDoNothing()) {
            return;
        }
        if (this.state.equals(TxState.TERMINATED)) {
            return;
        }

        if (this.state.equals(TxState.HEUR_MIXED))
            throw new HeurMixedException();
        if (this.state.equals(TxState.HEUR_COMMITTED))
            throw new HeurCommitException();
        if (this.xaresource == null) {
            throw new HeurHazardException("XAResourceTransaction "
                    + getXid() + ": no XAResource to rollback?");
        }

        try {
            if (this.state.equals(TxState.ACTIVE)) { // first suspend xid
                suspend();
            }

            // refresh xaresource for MQSeries: seems to close XAResource after
            // suspend???
            testOrRefreshXAResourceFor2PC();
            if (LOGGER.isDebugEnabled()) {
                LOGGER.logDebug("XAResource.rollback ( " + this.xidToHexString
                        + " ) " + "on resource " + this.resourcename
                        + " represented by XAResource instance "
                        + this.xaresource);
            }
            this.xaresource.rollback(this.xid);

First, the suspend() method executes the XA END xid statement; then this.xaresource.rollback(this.xid); rolls back the data.

Atomikos recover process

Before we talk about the transaction recovery process, let’s discuss why transaction recovery is needed at all. Isn’t the XA two-phase commit protocol strongly consistent? To answer this question, let’s look at the problems of the XA two-phase commit protocol.

Problem 1: single point of failure

The coordinator is critical: once the coordinator (TM) fails, the participants (RM) stay blocked. This is especially serious in the second phase: if the coordinator fails then, all participants keep their transaction resources locked and cannot finish the transaction. (If the coordinator goes down, a new coordinator can be elected, but that does not resolve the participants already blocked by the failure.)

Problem 2: inconsistent data

In the second phase of two-phase commit, after the coordinator starts sending commit requests to the participants, a local network failure or a coordinator crash partway through may mean that only some participants receive the request. Those participants commit after receiving it, while the others, which never received a commit request, cannot commit. The distributed system as a whole then holds inconsistent data.

How to solve it?

The solution is simple: at every step of the transaction, we record the transaction state in a log ourselves. The log can be stored wherever we choose, either locally or centrally. As analyzed earlier, the open source version of Atomikos uses memory plus a file, stored locally. With that setup, if a node in a cluster goes down, its log exists only on that node, so its transactions cannot be recovered promptly (the service has to be restarted).
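As a rough illustration of the idea, here is a minimal file-based state log. It is a sketch under stated assumptions and not the Atomikos log format: TxLogSketch, the one-line-per-event layout and the state names COMMITTING and TERMINATED are all made up for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical append-only transaction state log; the line format is an assumption. */
final class TxLogSketch {

    private final Path file;

    TxLogSketch(Path file) {
        this.file = file;
    }

    /** Appends one line of the form "txId state" to the log file. */
    void record(String txId, String state) throws IOException {
        String line = txId + " " + state + System.lineSeparator();
        Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Replays the log and returns the transactions that began committing but never finished. */
    Set<String> pendingCommits() throws IOException {
        Set<String> pending = new LinkedHashSet<>();
        if (!Files.exists(file)) {
            return pending;
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.split(" ");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            if ("COMMITTING".equals(parts[1])) {
                pending.add(parts[0]);
            } else if ("TERMINATED".equals(parts[1])) {
                pending.remove(parts[0]);
            }
        }
        return pending;
    }
}
```

The point of the design is the ordering: COMMITTING is written before the first commit request is sent, so a process that restarts after a crash can still tell which transactions had already been decided.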

Transaction recovery in Atomikos in multiple scenarios

Atomikos handles failures in two different ways, depending on the scenario.

  • Scenario 1: the service node has not gone down, but transaction recovery is needed for some other reason. In this case recovery is done by a scheduled task.

The relevant code is the com.atomikos.icatch.imp.TransactionServiceImp.init() method, which initializes a scheduled task for transaction recovery.

public synchronized void init ( Properties properties ) throws SysException
    {
        shutdownInProgress_ = false;
        control_ = new com.atomikos.icatch.admin.imp.LogControlImp ( (AdminLog) this.recoveryLog );
        ConfigProperties configProperties = new ConfigProperties(properties);
        long recoveryDelay = configProperties.getRecoveryDelay();
        recoveryTimer = new PooledAlarmTimer(recoveryDelay);
        recoveryTimer.addAlarmTimerListener(new AlarmTimerListener() {
            @Override
            public void alarm(AlarmTimer timer) {
                //Transaction recovery
                performRecovery();

            }
        });

        TaskManager.SINGLETON.executeTask(recoveryTimer);
        initialized_ = true;
    }
  • The call eventually enters the com.atomikos.datasource.xa.XATransactionalResource.recover() method.
   public void recover() {
        XaResourceRecoveryManager xaResourceRecoveryManager = XaResourceRecoveryManager.getInstance();
        if (xaResourceRecoveryManager != null) { //null for LogCloud recovery
            try {
                xaResourceRecoveryManager.recover(getXAResource());
            } catch (Exception e) {
                refreshXAResource(); //cf case 156968
            }

        }
    }
  • Scenario 2: the service node went down, and transactions are recovered when it restarts. The implementation is in the com.atomikos.datasource.xa.XATransactionalResource.setRecoveryService() method.
@Override
    public void setRecoveryService ( RecoveryService recoveryService )
            throws ResourceException
    {

        if ( recoveryService != null ) {
            if ( LOGGER.isTraceEnabled() ) LOGGER.logTrace ( "Installing recovery service on resource "
                    + getName () );
            this.branchIdentifier=recoveryService.getName();
         //Transaction recovery
            recover();
        }

    }
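The two triggers above, once at startup and then periodically, can be sketched with the standard library. Note the substitution: this example uses java.util.concurrent.ScheduledExecutorService in place of Atomikos’s own PooledAlarmTimer, and RecoverySchedulerSketch is a hypothetical name, so treat it as an illustration of the pattern rather than the actual implementation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: run recovery once at startup, then periodically on a timer. */
final class RecoverySchedulerSketch {

    /**
     * Runs the recovery task immediately (the restart scenario) and then schedules it
     * with a fixed delay (the scheduled-task scenario). The caller shuts the timer down.
     */
    static ScheduledExecutorService start(Runnable recovery, long delayMillis) {
        recovery.run(); // recover whatever was left behind before this process started

        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "recovery-timer");
            t.setDaemon(true); // do not keep the JVM alive just for recovery
            return t;
        });
        timer.scheduleWithFixedDelay(recovery, delayMillis, delayMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

A caller might pass a lambda that wraps its own recovery routine and a delay of a few seconds; a fixed delay rather than a fixed rate keeps slow recovery passes from piling up.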

com.atomikos.datasource.xa.XATransactionalResource.recover() process in detail


Main code:

public void recover(XAResource xaResource) throws XAException {
        // Fetch the prepared XIDs from the resource via the XA recovery protocol
        List<XID> xidsToRecover = retrievePreparedXidsFromXaResource(xaResource);
        Collection<XID> xidsToCommit;
        try {
            // Fetch the XIDs recorded in the transaction log, to match against
            xidsToCommit = retrieveExpiredCommittingXidsFromLog();
            for (XID xid : xidsToRecover) {
                if (xidsToCommit.contains(xid)) {
                    // Execute XA COMMIT for this XID
                    replayCommit(xid, xaResource);
                } else {
                    attemptPresumedAbort(xid, xaResource);
                }
            }
        } catch (LogException couldNotRetrieveCommittingXids) {
            LOGGER.logWarning("Transient error while recovering - will retry later...", couldNotRetrieveCommittingXids);
        }
    }
  • Let’s see how the XA recovery protocol obtains the XIDs stored on the RM side. The entry point is retrievePreparedXidsFromXaResource(xaResource), which eventually enters the com.atomikos.datasource.xa.RecoveryScan.recoverXids() method.
public static List<XID> recoverXids(XAResource xaResource, XidSelector selector) throws XAException {
        List<XID> ret = new ArrayList<XID>();

        boolean done = false;
        int flags = XAResource.TMSTARTRSCAN;
        Xid[] xidsFromLastScan = null;
        List<XID> allRecoveredXidsSoFar = new ArrayList<XID>();
        do {
            xidsFromLastScan = xaResource.recover(flags);
            flags = XAResource.TMNOFLAGS;
            done = (xidsFromLastScan == null || xidsFromLastScan.length == 0);
            if (!done) {
                // TEMPTATIVELY SET done TO TRUE
                // TO TOLERATE ORACLE 8.1.7 INFINITE
                // LOOP (ALWAYS RETURNS SAME RECOVER
                // SET). IF A NEW SET OF XIDS IS RETURNED
                // THEN done WILL BE RESET TO FALSE
                done = true;
                for ( int i = 0; i < xidsFromLastScan.length; i++ ) {
                    XID xid = new XID ( xidsFromLastScan[i] );
                    // our own XID implements equals and hashCode properly
                    if (!allRecoveredXidsSoFar.contains(xid)) {
                        // a new xid is returned -> we can not be in a recovery loop -> go on
                        allRecoveredXidsSoFar.add(xid);
                        done = false;
                        if (selector.selects(xid)) {
                            ret.add(xid);
                        }
                    }
                }
            }
        } while (!done);

        return ret;
    }
  • Focus on xidsFromLastScan = xaResource.recover(flags);. If we use MySQL, this call enters the MysqlXAConnection.recover() method, which executes the XA RECOVER statement to obtain the XIDs.
 protected static Xid[] recover(Connection c, int flag) throws XAException {
        /*
         * The XA RECOVER statement returns information for those XA transactions on the MySQL server that are in the PREPARED state. (See Section 13.4.7.2, "XA
         * Transaction States".) The output includes a row for each such XA transaction on the server, regardless of which client started it.
         *
         * XA RECOVER output rows look like this (for an example xid value consisting of the parts 'abc', 'def', and 7):
         *
         * mysql> XA RECOVER;
         * +----------+--------------+--------------+--------+
         * | formatID | gtrid_length | bqual_length | data |
         * +----------+--------------+--------------+--------+
         * | 7 | 3 | 3 | abcdef |
         * +----------+--------------+--------------+--------+
         *
         * The output columns have the following meanings:
         *
         * formatID is the formatID part of the transaction xid
         * gtrid_length is the length in bytes of the gtrid part of the xid
         * bqual_length is the length in bytes of the bqual part of the xid
         * data is the concatenation of the gtrid and bqual parts of the xid
         */

        boolean startRscan = ((flag & TMSTARTRSCAN) > 0);
        boolean endRscan = ((flag & TMENDRSCAN) > 0);

        if (!startRscan && !endRscan && flag != TMNOFLAGS) {
            throw new MysqlXAException(XAException.XAER_INVAL, Messages.getString("MysqlXAConnection.001"), null);
        }

        //
        // We return all recovered XIDs at once, so if not  TMSTARTRSCAN, return no new XIDs
        //
        // We don't attempt to maintain state to check for TMNOFLAGS "outside" of a scan
        //

        if (!startRscan) {
            return new Xid[0];
        }

        ResultSet rs = null;
        Statement stmt = null;

        List<MysqlXid> recoveredXidList = new ArrayList<MysqlXid>();

        try {
            // TODO: Cache this for lifetime of XAConnection
            stmt = c.createStatement();

            rs = stmt.executeQuery("XA RECOVER");

            while (rs.next()) {
                final int formatId = rs.getInt(1);
                int gtridLength = rs.getInt(2);
                int bqualLength = rs.getInt(3);
                byte[] gtridAndBqual = rs.getBytes(4);

                final byte[] gtrid = new byte[gtridLength];
                final byte[] bqual = new byte[bqualLength];

                if (gtridAndBqual.length != (gtridLength + bqualLength)) {
                    throw new MysqlXAException(XAException.XA_RBPROTO, Messages.getString("MysqlXAConnection.002"), null);
                }

                System.arraycopy(gtridAndBqual, 0, gtrid, 0, gtridLength);
                System.arraycopy(gtridAndBqual, gtridLength, bqual, 0, bqualLength);

                recoveredXidList.add(new MysqlXid(gtrid, bqual, formatId));
            }
        } catch (SQLException sqlEx) {
            throw mapXAExceptionFromSQLException(sqlEx);
        } finally {
            if (rs != null) {
                try {
                    rs.close();
                } catch (SQLException sqlEx) {
                    throw mapXAExceptionFromSQLException(sqlEx);
                }
            }

            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException sqlEx) {
                    throw mapXAExceptionFromSQLException(sqlEx);
                }
            }
        }

        int numXids = recoveredXidList.size();

        Xid[] asXids = new Xid[numXids];
        Object[] asObjects = recoveredXidList.toArray();

        for (int i = 0; i < numXids; i++) {
            asXids[i] = (Xid) asObjects[i];
        }

        return asXids;
    }
  • Note that with MySQL versions before 5.7.7, XA RECOVER may return no data here; MySQL fixed this in later versions. So if MySQL is used as the RM, the version must be >= 5.7.7, because:

MySQL 5.6 automatically rolls back prepared transactions when the client disconnects. Why does MySQL do this? It comes down to MySQL’s internal implementation. In versions before MySQL 5.7, a prepared transaction was not written to the binlog (officially, to reduce fsync calls as an optimization); the operations were written to the binlog only when the distributed transaction was committed. From the binlog’s point of view, a distributed transaction was therefore no different from an ordinary one, and the operations performed before prepare were held in the connection’s IO_CACHE. If the client exited at that point, this binlog information was lost. Allowing a commit after reconnecting would then lose binlog events and leave master and slave data inconsistent, so MySQL simply rolled back prepared transactions when the client exited.
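Returning to the XA RECOVER row format quoted in the driver comment, the sketch below shows how the data column splits back into the two parts of an xid. XaRecoverRowSketch is a hypothetical helper written for this article; it copies the idea of the driver code shown earlier but is not the driver’s API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper that splits the data column of an XA RECOVER row. */
final class XaRecoverRowSketch {

    /** Returns { gtrid, bqual }, cut from data according to the two length columns. */
    static byte[][] split(int gtridLength, int bqualLength, byte[] data) {
        if (data.length != gtridLength + bqualLength) {
            throw new IllegalArgumentException("data length does not match gtrid_length + bqual_length");
        }
        byte[] gtrid = Arrays.copyOfRange(data, 0, gtridLength);
        byte[] bqual = Arrays.copyOfRange(data, gtridLength, gtridLength + bqualLength);
        return new byte[][] { gtrid, bqual };
    }

    public static void main(String[] args) {
        // The example row from the driver comment: lengths 3 and 3, data "abcdef"
        byte[][] parts = split(3, 3, "abcdef".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(parts[0], StandardCharsets.US_ASCII));
        System.out.println(new String(parts[1], StandardCharsets.US_ASCII));
    }
}
```

The length check mirrors the driver’s own guard: a row whose data does not match the two length columns cannot be turned into a valid xid.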

  • Back to the main line: next, the XIDs are read from the transaction log that Atomikos records itself.
  Collection<XID> xidsToCommit = retrieveExpiredCommittingXidsFromLog();
  • Let’s look at how the retrieveExpiredCommittingXidsFromLog() method obtains the XIDs from the transaction log. It enters the com.atomikos.recovery.imp.RecoveryLogImp.getCommittingParticipants() method.
public Collection<ParticipantLogEntry> getCommittingParticipants()
            throws LogReadException {
        Collection<ParticipantLogEntry> committingParticipants = new HashSet<ParticipantLogEntry>();
        Collection<CoordinatorLogEntry> committingCoordinatorLogEntries = repository.findAllCommittingCoordinatorLogEntries();

        for (CoordinatorLogEntry coordinatorLogEntry : committingCoordinatorLogEntries) {
            for (ParticipantLogEntry participantLogEntry : coordinatorLogEntry.participants) {
                committingParticipants.add(participantLogEntry);
            }
        }
        return committingParticipants;
    }

Here is a brief introduction to the storage structure of the transaction log. First, CoordinatorLogEntry is the entity class that holds all the information about one XA transaction.

public class CoordinatorLogEntry implements Serializable {

    // Global transaction ID
    public final String id;

    // Whether the transaction has been committed
    public final boolean wasCommitted;

    /**
     * Only for subtransactions, null otherwise.
     */
    public final String superiorCoordinatorId;

   //Participant collection
    public final ParticipantLogEntry[] participants;
}
  • Now let’s look at the participant entity class, ParticipantLogEntry:
public class ParticipantLogEntry implements Serializable {

    private static final long serialVersionUID = 1728296701394899871L;

    /**
     * The ID of the global transaction as known by the transaction core.
     */

    public final String coordinatorId;

    /**
     * Identifies the participant within the global transaction.
     */

    public final String uri;

    /**
     * When does this participant expire (expressed in millis since Jan 1, 1970)?
     */

    public final long expires;

    /**
     * Best-known state of the participant.
     */
    public final TxState state;

    /**
     * For diagnostic purposes, null if not relevant.
     */
    public final String resourceName;
}
  • Going back to the com.atomikos.recovery.xa.DefaultXaRecoveryLog.getExpiredCommittingXids() method, we can obtain the XIDs stored in the transaction log during XA transactions.
public Set<XID> getExpiredCommittingXids() throws LogReadException {
        Set<XID> ret = new HashSet<XID>();
        Collection<ParticipantLogEntry> entries = log.getCommittingParticipants();
        for (ParticipantLogEntry entry : entries) {
            if (expired(entry) && !http(entry)) {
                XID xid = new XID(entry.coordinatorId, entry.uri);
                ret.add(xid);
            }
        }
        return ret;
    }
  • If an XID obtained from the RM through XA RECOVER is also among the XIDs read from the transaction log, it is committed; otherwise it is rolled back.
List<XID> xidsToRecover = retrievePreparedXidsFromXaResource(xaResource);
        Collection<XID> xidsToCommit;
        try {
            xidsToCommit = retrieveExpiredCommittingXidsFromLog();
            for (XID xid : xidsToRecover) {
                if (xidsToCommit.contains(xid)) {
                    replayCommit(xid, xaResource);
                } else {
                    attemptPresumedAbort(xid, xaResource);
                }
            }
        } catch (LogException couldNotRetrieveCommittingXids) {
            LOGGER.logWarning("Transient error while recovering - will retry later...", couldNotRetrieveCommittingXids);
        }
  • The replayCommit() method is as follows:
private void replayCommit(XID xid, XAResource xaResource) {
        if (LOGGER.isDebugEnabled()) LOGGER.logDebug("Replaying commit of xid: " + xid);
        try {
      //Commit transaction
            xaResource.commit(xid, false);
     //Update transaction log
            log.terminated(xid);
        } catch (XAException e) {
            if (alreadyHeuristicallyTerminatedByResource(e)) {
                handleHeuristicTerminationByResource(xid, xaResource, e, true);
            } else if (xidTerminatedInResourceByConcurrentCommit(e)) {
                log.terminated(xid);
            } else {
                LOGGER.logWarning("Transient error while replaying commit - will retry later...", e);
            }
        }
    }
  • The attemptPresumedAbort(xid, xaResource) method is as follows:
private void attemptPresumedAbort(XID xid, XAResource xaResource) {
        try {
            log.presumedAborting(xid);
            if (LOGGER.isDebugEnabled()) LOGGER.logDebug("Presumed abort of xid: " + xid);
            try {
         //Roll back
                xaResource.rollback(xid);
        //Update log status
                log.terminated(xid);
            } catch (XAException e) {
                if (alreadyHeuristicallyTerminatedByResource(e)) {
                    handleHeuristicTerminationByResource(xid, xaResource, e, false);
                } else if (xidTerminatedInResourceByConcurrentRollback(e)) {
                    log.terminated(xid);
                } else {
                    LOGGER.logWarning("Unexpected exception during recovery - ignoring to retry later...", e);
                }
            }
        } catch (IllegalStateException presumedAbortNotAllowedInCurrentLogState) {
            // ignore to retry later if necessary
        } catch (LogException logWriteException) {
            LOGGER.logWarning("log write failed for Xid: "+xid+", ignoring to retry later", logWriteException);
        }
    }
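Stripped of logging and error handling, the recovery rule boils down to a set-membership test. The sketch below shows only that decision, with xids simplified to strings; RecoveryDecisionSketch and its Action enum are hypothetical names, and the real code acts on the resource directly instead of returning a plan.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the commit-or-rollback decision made during recovery. */
final class RecoveryDecisionSketch {

    enum Action { COMMIT, ROLLBACK }

    /**
     * For every xid the resource reports as prepared: commit it if the transaction log
     * says it was committing, otherwise presume abort and roll it back.
     */
    static Map<String, Action> decide(List<String> preparedInResource, Set<String> committingInLog) {
        Map<String, Action> plan = new LinkedHashMap<>();
        for (String xid : preparedInResource) {
            plan.put(xid, committingInLog.contains(xid) ? Action.COMMIT : Action.ROLLBACK);
        }
        return plan;
    }
}
```

An xid that appears only in the log and not in the resource gets no entry in the plan, since the resource has nothing prepared for it.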

Summary

This has been a long article. We analyzed ShardingSphere’s XA solution, which provides a set of SPI-based solutions and integrates Atomikos, and we walked through Atomikos’s initialization, transaction start, connection acquisition, commit, rollback and recovery processes. I hope it helps you understand how XA works.

Join us

Apache ShardingSphere has always followed the Apache Way in its open source practice. The community is completely open and equal, and everyone enjoys the happiness that open source brings.

Address:https://github.com/apache/sha…

Author introduction: Xiao Yu, Apache ShardingSphere committer, author of the open source Hmily distributed transaction framework and of the open source Soul gateway. He loves open source and strives to write elegant code. He currently works in Jingdong’s digital science department, where he takes part in the open source development of ShardingSphere and in research and development on distributed databases.