Graphical JanusGraph series – Concurrency safety: analysis of the lock mechanism (local lock + distributed lock)

Date: 2021-01-21

Hello everyone, I am Yang Zai. The Graphical JanusGraph series is updated regularly~

Table of contents of the graph database article series:

For the related source-code analysis, see GitHub (writing code and articles is not easy, please leave a star~): https://github.com/YYDreamer/janusgraph

The flow charts below are available at: https://www.processon.com/view/link/5f471b2e7d9c086b9903b629

Version: janusgraph-0.5.2

Please retain the following attribution:

Author: Yang Zai Chat Programming
WeChat official account: Originality Java
Original address: https://segmentfault.com/u/yoylee


In a distributed system, concurrent operations on the same data are unavoidable. How do we keep data safe under concurrency in a distributed system? With distributed locks!

1: Distributed lock

Common ways to implement a distributed lock are:

1. Database-based distributed lock

For example, MySQL's for update. JanusGraph's lock is also database-based, where "database" refers to the third-party backend storage currently in use; the concrete implementation differs from MySQL's, and we analyze the details below.

2. Redis-based distributed lock

Implemented with a Lua script plus setNX.

3. ZK-based distributed lock

Implemented with the ordering of znodes, ephemeral nodes, and ZK's watcher mechanism.

4. Optimistic locking via MVCC (multi-version concurrency control)

This article focuses on JanusGraph's lock mechanism; the other implementations are not covered in detail here.

Now let's analyze how JanusGraph's lock mechanism is implemented~

2: JanusGraph lock mechanism

JanusGraph achieves locking through a combination of a local lock plus a distributed lock;

2.1 Consistency behavior

In JanusGraph there are three consistency modifiers, representing three different consistency behaviors that control the degree of concurrency when using the graph library;

public enum ConsistencyModifier {
    DEFAULT,
    LOCK,
    FORK
}

In the source code, the ConsistencyModifier enum mainly controls JanusGraph's consistency behavior on eventually-consistent or other non-transactional storage backends! The effects are as follows:

  • DEFAULT: the default consistency behavior. It does not use distributed locks; it relies on the default consistency model the configured storage backend guarantees for enclosing transactions. The consistency behavior therefore depends mainly on the storage backend's configuration and (optionally) on the configuration of the enclosing transaction; it applies without any explicit configuration
  • LOCK: provided the storage backend supports locks, explicitly acquires a distributed lock to guarantee consistency! The exact guarantee depends on the configured lock implementation; enabled via the management.setConsistency(element, ConsistencyModifier.LOCK); statement
  • FORK: applies only to multi-edges and list-properties. When modifying data, JanusGraph deletes the old element and adds the modified edge/property as a new one instead of overwriting the existing one, thereby avoiding potential concurrent write conflicts; enabled via management.setConsistency(element, ConsistencyModifier.FORK);

LOCK

Whether a distributed lock is used to control concurrency when querying or inserting data is decided during graph schema creation: a schema element can be configured with ConsistencyModifier.LOCK, and a distributed lock will then control concurrency whenever that element is used;

For efficiency, JanusGraph does not use locking by default. The user must therefore decide, for each schema element, whether to define consistency constraints that use locking.

Use JanusGraphManagement.setConsistency(element, ConsistencyModifier.LOCK) to explicitly enable locking on a schema element.

The code is as follows:

mgmt = graph.openManagement() 
name = mgmt.makePropertyKey('consistentName').dataType(String.class).make() 
index = mgmt.buildIndex('byConsistentName', Vertex.class).addKey(name).unique().buildCompositeIndex() 
mgmt.setConsistency(name, ConsistencyModifier.LOCK) // Ensures only one name per vertex 
mgmt.setConsistency(index, ConsistencyModifier.LOCK) // Ensures name uniqueness in the graph 
mgmt.commit()

FORK

Since an edge is stored as a single record in the underlying storage backend, concurrent modifications of the same edge will conflict.

FORK exists precisely as an alternative to LOCK: an edge label can be configured to use ConsistencyModifier.FORK.

The following example creates a new edge label and sets it to ConsistencyModifier.FORK:

mgmt = graph.openManagement() 
related = mgmt.makeEdgeLabel('related').make() 
mgmt.setConsistency(related, ConsistencyModifier.FORK) 
mgmt.commit()

After this configuration, modifying an edge whose label is set to FORK proceeds as follows:

  1. First, delete the edge
  2. Add the modified edge as a new edge

Therefore, if two concurrent transactions modify the same edge, two modified copies of the edge will exist at commit time; they can be resolved as needed during query traversal.
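For illustration, the duplicate copies left behind after such concurrent commits could be reconciled during traversal like this. This is a hypothetical sketch in plain Java: EdgeCopy and resolveLatest are invented names, not JanusGraph API, and "keep the most recently written copy" is just one possible resolution strategy.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical reconciliation of forked edge copies: after two transactions
// fork the same edge, a read sees both copies; here we keep the copy with
// the latest timestamp. EdgeCopy and resolveLatest are illustrative names.
public class ForkResolutionSketch {

    record EdgeCopy(String label, long timestamp, double weight) {}

    /** Picks the most recently written copy among the duplicates. */
    static EdgeCopy resolveLatest(List<EdgeCopy> copies) {
        return copies.stream()
                .max(Comparator.comparingLong(EdgeCopy::timestamp))
                .orElseThrow();
    }
}
```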

Note that edge forking only applies to MULTI edges. Edge labels with a multiplicity constraint cannot use this strategy, because the definition of a non-MULTI edge label builds in a uniqueness constraint, which requires explicit locking or the conflict-resolution mechanism of the underlying storage backend.

Now let's look concretely at how JanusGraph's lock mechanism is implemented.

2.2 LockID

Before introducing the lock mechanism, let's consider: what exactly should the lock protect?

As we know, in JanusGraph's underlying storage the vertex ID is used as the rowkey, while properties and edges are stored in cells composed of column + value.

When we modify a vertex's properties and edges (plus edge properties), it is clearly enough to lock the corresponding rowkey + column;

In JanusGraph, the basic identifier of this lock is the LockID:

LockID = RowKey + Column

The source code is as follows:

KeyColumn lockID = new KeyColumn(key, column);

2.3 Local lock

The local lock must be acquired in all cases; only after it is acquired successfully can the subsequent distributed lock acquisition proceed!

The local lock exists at the graph-instance level; its main purpose is to ensure conflict-free operation within the current graph instance!

Local locks are implemented with a ConcurrentHashMap, which is unique per graph instance;

The current transaction + lockID together act as the lock identity.

The main process of acquisition is as follows:

(Flow chart: the local lock acquisition process; see the ProcessOn link above)

The source code is as follows:

It is recommended to follow the figure above against the source code. The code lives in the LocalLockMediator class; we analyze it in detail in the source-code analysis section below.

    public boolean lock(KeyColumn kc, T requester, Instant expires) {
        // full body analyzed in the source-code analysis section below
    }

The main purpose of the local lock is to add a layer of lock arbitration at the graph-instance level, reducing contention on the distributed lock and thus the performance cost of distributed locking.
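The idea behind this pre-filtering can be sketched with plain Java. This is a deliberately simplified illustration, not JanusGraph's actual LocalLockMediator: expiration times and audit records are omitted, and the class and method names are invented.

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the local-lock-first pattern: a ConcurrentHashMap keyed
// by lockID serializes lock attempts inside one JVM, so only one transaction
// per graph instance goes on to attempt the distributed lock.
public class LocalLockSketch {
    private final ConcurrentHashMap<String, String> locks = new ConcurrentHashMap<>();

    /** Returns true if txId now holds (or already held) the local lock for lockId. */
    public boolean tryLock(String lockId, String txId) {
        String prev = locks.putIfAbsent(lockId, txId); // atomic claim attempt
        return prev == null || prev.equals(txId);      // null: uncontended; equal: re-entrant
    }

    /** Releases the lock only if txId is the current holder. */
    public void unlock(String lockId, String txId) {
        locks.remove(lockId, txId); // atomic remove-if-value-matches
    }
}
```

The real LocalLockMediator additionally tracks expiration instants and replaces expired entries with a CAS, as the source code in section 3 shows.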

2.4 Distributed lock

Only after the local lock succeeds do we attempt to acquire the distributed lock.

Acquiring the distributed lock consists of two parts:

  1. Distributed lock information insertion
  2. State judgment of distributed lock information

Distributed lock information insertion

This part constructs a rowkey and column from the lockID and inserts that data into HBase; a successful insert means this part succeeded!

The specific process is as follows:

(Flow chart: distributed lock information insertion; see the ProcessOn link above)
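As the check phase later shows, each lock column read back carries a write timestamp and the writer's rid. A minimal sketch of such an encoding follows, with an assumed byte layout rather than the exact ConsistentKeyLockerSerializer format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch of encoding a lock claim into a single column value:
// [8-byte big-endian timestamp][rid bytes]. Because the timestamp comes
// first, claims read back from the lock row sort by write time.
public class LockColumnSketch {

    /** Encodes a claim as timestamp followed by the writer's rid. */
    static byte[] toLockColumn(long timestamp, String rid) {
        byte[] ridBytes = rid.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + ridBytes.length);
        buf.putLong(timestamp); // big-endian: preserves time ordering byte-wise
        buf.put(ridBytes);
        return buf.array();
    }

    /** Decodes the write timestamp back out of a lock column. */
    static long timestampOf(byte[] column) {
        return ByteBuffer.wrap(column).getLong();
    }
}
```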

State judgment of distributed lock information

This part runs after the previous one and determines whether the distributed lock was actually acquired!

It reads all columns of the corresponding rowkey from HBase, filters them down to the unexpired columns, and checks whether the first column of that set equals the column inserted by the current transaction;

If they are equal, acquisition succeeded! If not, acquisition failed!

The specific process is as follows:

(Flow chart: distributed lock state judgment; see the ProcessOn link above)
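The state-judgment logic just described can be sketched as: filter out expired claims, then check that the earliest remaining claim was written by us. A simplified illustration follows; LockClaim and holdsLock are invented names, and the real logic lives in checkSeniority.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Simplified model of the lock-check phase: among the claims read back from
// the lock row, drop expired ones; the transaction holds the lock only if
// the earliest surviving claim carries its own rid.
public class SeniorityCheckSketch {

    /** One lock column read back from storage: write time + writer's rid. */
    record LockClaim(Instant timestamp, String rid) {}

    /** True if ourRid wrote the earliest unexpired claim. */
    static boolean holdsLock(List<LockClaim> claims, String ourRid,
                             Instant now, Duration lockExpire) {
        Instant cutoff = now.minus(lockExpire);
        return claims.stream()
                .filter(c -> !c.timestamp().isBefore(cutoff))     // keep unexpired claims
                .min(Comparator.comparing(LockClaim::timestamp))  // earliest claim wins
                .map(c -> c.rid().equals(ourRid))                 // is the winner us?
                .orElse(false);
    }
}
```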

3: Source code analysis and overall process

The source-code analysis has been pushed to GitHub: https://github.com/YYDreamer/…

1. Lock acquisition entry point

public void acquireLock(StaticBuffer key, StaticBuffer column, StaticBuffer expectedValue, StoreTransaction txh) throws BackendException {
        //A locker is a consistent key lock object
        if (locker != null) {
            //Gets the current transaction object
            ExpectedValueCheckingTransaction tx = (ExpectedValueCheckingTransaction) txh;
            //Check whether mutations (inserts/deletes/updates) have already started in the current transaction
            if (tx.isMutationStarted())
                throw new PermanentLockingException("Attempted to obtain a lock after mutations had been persisted");
            //Assemble key + column into the lockID used for locking below!!!!!
            KeyColumn lockID = new KeyColumn(key, column);
            log.debug("Attempting to acquireLock on {} ev={}", lockID, expectedValue);
            //Acquire the write lock within the current local JVM process (see "1: analysis of write lock acquisition" below)
            //(Note: acquiring the lock here only stores the corresponding KCV in HBase! A successful store does not yet mean the lock was acquired)
            //1. If the store succeeds, execution continues
            //2. If the store fails, an exception propagates to the top layer; the error log "Could not commit transaction [" + transactionId + "] due to exception" is printed, the corresponding exception is thrown, and this insert ends
            locker.writeLock(lockID, tx.getConsistentTx());
            //Premise of execution: the above lock acquisition is successful!
            //Store the expected value. This ensures that when the same key + column + tx acquires the lock multiple times, only the first acquisition is processed
            //It is kept in the transaction object, identifying at commit time which lock information this transaction inserted when judging whether the lock was obtained
            tx.storeExpectedValue(this, lockID, expectedValue);
        } else {
            //When locker is null, this delegates to the store, which throws a runtime exception and terminates the operation
            store.acquireLock(key, column, expectedValue, unwrapTx(txh));
        }
    }

2. locker.writeLock(lockID, tx.getConsistentTx()) triggers lock acquisition

public void writeLock(KeyColumn lockID, StoreTransaction tx) throws TemporaryLockingException, PermanentLockingException {

        if (null != tx.getConfiguration().getGroupName()) {
            MetricManager.INSTANCE.getCounter(tx.getConfiguration().getGroupName(), M_LOCKS, M_WRITE, M_CALLS).inc();
        }

        //Check whether the current transaction already holds the lock for this lockID within the graph instance
        //lockState maps transaction -> (lockID -> lock status); the status (start time, expiration time, etc.) is recorded after a transaction successfully obtains local lock + distributed lock
        if (lockState.has(tx, lockID)) {
            log.debug("Transaction {} already wrote lock on {}", tx, lockID);
            return;
        }

        //The current transaction does not yet hold the lock for this lockID
        //Attempt the local lock via lockLocally(lockID, tx)
        if (lockLocally(lockID, tx)) {
            boolean ok = false;
            try {
                //On the premise of successful local lock acquisition:
                //Try to get the distributed lock based on HBase;
                //Attention!!! (the acquisition lock here only stores the corresponding klv in HBase! Successful storage does not mean successful lock acquisition)
                S stat = writeSingleLock(lockID, tx);
                //After obtaining the distributed lock successfully (that is, after writing successfully), update the expiration time of the local lock to the expiration time of the distributed lock
                lockLocally(lockID, stat.getExpirationTimestamp(), tx); // update local lock expiration time
                //Store the locks obtained above in the set identifying the current locks. Map < TX, map < lockid, s > >, key is the transaction, map in value is the lock obtained by the current transaction, key is the lockid, and value is the consistentkeystatus object of the current obtained distributed lock
                lockState.take(tx, lockID, stat);
                ok = true;
            } catch (TemporaryBackendException tse) {
                //After the failure of getting the distributed lock, the exception is caught and thrown
                throw new TemporaryLockingException(tse);
            } catch (AssertionError ae) {
                // Concession to ease testing with mocks & behavior verification
                ok = true;
                throw ae;
            } catch (Throwable t) {
                //Underlying storage error occurred! Direct locking failed!
                throw new PermanentLockingException(t);
            } finally {
                //Judge whether the lock is acquired successfully. If the distributed lock is not acquired, release the local lock
                if (!ok) {
                    //If the lock is not acquired successfully, the local lock is released
                    // lockState.release(tx, lockID); // has no effect
                    unlockLocally(lockID, tx);
                    if (null != tx.getConfiguration().getGroupName()) {
                        MetricManager.INSTANCE.getCounter(tx.getConfiguration().getGroupName(), M_LOCKS, M_WRITE, M_EXCEPTIONS).inc();
                    }
                }
            }
        } else {
            //If the local lock acquisition fails, an exception will be thrown directly without re local contention

            // Fail immediately with no retries on local contention
            throw new PermanentLockingException("Local lock contention");
        }
    }

writeLock consists of two parts:

  1. Local lock acquisition: lockLocally(lockID, tx)
  2. Distributed lock acquisition: writeSingleLock(lockID, tx). Note that this only writes the lock information into HBase; it does not mean the distributed lock was acquired. It is just the first stage introduced above: distributed lock information insertion

3. Local lock acquisition: lockLocally(lockID, tx)

public boolean lock(KeyColumn kc, T requester, Instant expires) {
        assert null != kc;
        assert null != requester;

        final StackTraceElement[] acquiredAt = log.isTraceEnabled() ?
                new Throwable("Lock acquisition by " + requester).getStackTrace() : null;

        //Build the audit record for this attempt, centered on the requesting transaction
        final AuditRecord<T> audit = new AuditRecord<>(requester, expires, acquiredAt);
        //The ConcurrentHashMap that implements local locks, keyed by lockID; putIfAbsent atomically attempts to claim the lock
        final AuditRecord<T> inMap = locks.putIfAbsent(kc, audit);

        boolean success = false;

        //No entry for this lockID existed: the lock was unoccupied and has now been acquired successfully
        if (null == inMap) {
            // Uncontended lock succeeded
            if (log.isTraceEnabled()) {
                log.trace("New local lock created: {} namespace={} txn={}",
                    kc, name, requester);
            }
            success = true;
        } else if (inMap.equals(audit)) {
            //An entry for this lockID exists, and its holder equals the requester: the current transaction already holds this lock
            // requester has already locked kc; update expiresAt
            //1. Use a CAS replace to refresh the expiration time
            //2. Under concurrency, if the entry was meanwhile replaced (e.g. taken over by another transaction after expiring), the replace fails
            success = locks.replace(kc, inMap, audit);
            if (log.isTraceEnabled()) {
                if (success) {
                    log.trace("Updated local lock expiration: {} namespace={} txn={} oldexp={} newexp={}",
                        kc, name, requester, inMap.expires, audit.expires);
                } else {
                    log.trace("Failed to update local lock expiration: {} namespace={} txn={} oldexp={} newexp={}",
                        kc, name, requester, inMap.expires, audit.expires);
                }
            }
        } else if (0 > inMap.expires.compareTo(times.getTime())) {
            //Compare expiration times: the recorded lock has expired, so the current transaction may take it over

            // the recorded lock has expired; replace it
            //1. The current transaction claims the lock
            //2. Under concurrency, if another transaction claimed the expired lock first, the replace fails
            success = locks.replace(kc, inMap, audit);
            if (log.isTraceEnabled()) {
                log.trace("Discarding expired lock: {} namespace={} txn={} expired={}",
                    kc, name, inMap.holder, inMap.expires);
            }
        } else {
            //Otherwise the lock is held by another transaction and has not expired: acquisition fails
            // we lost to a valid lock
            if (log.isTraceEnabled()) {
                log.trace("Local lock failed: {} namespace={} txn={} (already owned by {})",
                    kc, name, requester, inMap);
                log.trace("Owner stacktrace:\n        {}", Joiner.on("\n        ").join(inMap.acquiredAt));
            }
        }

        return success;
    }

As shown above, the local lock is implemented with a ConcurrentHashMap, unique per graph instance!

4. Distributed lock acquisition, stage one: inserting the lock information

protected ConsistentKeyLockStatus writeSingleLock(KeyColumn lockID, StoreTransaction txh) throws Throwable {

        //Assemble rowkey to insert HBase data
        final StaticBuffer lockKey = serializer.toLockKey(lockID.getKey(), lockID.getColumn());
        StaticBuffer oldLockCol = null;

        //The default number of attempts is 3
        for (int i = 0; i < lockRetryCount; i++) {
            //Try to write the lock data into HBase; oldLockCol is the column written by the previous attempt, which this write deletes
            WriteResult wr = tryWriteLockOnce(lockKey, oldLockCol, txh);
            //If the insertion is successful
            if (wr.isSuccessful() && wr.getDuration().compareTo(lockWait) <= 0) {
                final Instant writeInstant = wr.getWriteTimestamp();   // write time
                final Instant expireInstant = writeInstant.plus(lockExpire);   // expiration time
                return new ConsistentKeyLockStatus(writeInstant, expireInstant);   // return the lock status
            }
            //Assign the data to be inserted in the current attempt and delete it in the next attempt
            oldLockCol = wr.getLockCol();
            //Examine why the write failed: retry on a temporary exception, stop retrying on a permanent one!
            handleMutationFailure(lockID, lockKey, wr, txh);
        }
        //If the insertion fails after three attempts, delete the last attempt
        tryDeleteLockOnce(lockKey, oldLockCol, txh);
        // TODO log exception or successful too-slow write here
        //Throw an exception to identify the failure of importing data
        throw new TemporaryBackendException("Lock write retry count exceeded");
    }

This stage only inserts the lock information; a successful insert just marks the end of this stage.

5. Distributed lock acquisition, stage two: judging whether the distributed lock was acquired

This step is verified during the commit phase.

public void commit() throws BackendException {
        //This method calls checkSingleLock to check the distributed lock result
        flushInternal();
        tx.commit();
    }

Ultimately, the checkSingleLock method determines the state of the lock!

protected void checkSingleLock(final KeyColumn kc, final ConsistentKeyLockStatus ls,
                                   final StoreTransaction tx) throws BackendException, InterruptedException {

        //Check whether it has been checked
        if (ls.isChecked())
            return;

        // Slice the store
        KeySliceQuery ksq = new KeySliceQuery(serializer.toLockKey(kc.getKey(), kc.getColumn()), LOCK_COL_START,
            LOCK_COL_END);
        //Read all columns of the locked row from HBase! The query retries 3 times by default
        List<Entry> claimEntries = getSliceWithRetries(ksq, tx);

        //Extract the timestamp and rid from each returned entry's column, then filter out entries whose timestamp has expired
        final Iterable<TimestampRid> iterable = Iterables.transform(claimEntries,
            e -> serializer.fromLockColumn(e.getColumnAs(StaticBuffer.STATIC_FACTORY), times));
        final List<TimestampRid> unexpiredTRs = new ArrayList<>(Iterables.size(iterable));
        for (TimestampRid tr : iterable) { // filter to keep only unexpired locks!
            final Instant cutoffTime = now.minus(lockExpire);
            if (tr.getTimestamp().isBefore(cutoffTime)) {
                ...
            }
            //Store lock records that have not expired into a collection
            unexpiredTRs.add(tr);
        }
        //Judge whether the current tx holds the lock! If the column we inserted is the first column read, or the columns before it contain only our own rid (the same rid means we also hold the current distributed lock, since stage one succeeded under the local lock), then we hold the lock; otherwise another process holds it and we cannot get it
        //If lock acquisition failed, a TemporaryLockingException is thrown!!!! It propagates up to mutator.commitStorage(); the import fails and the transaction is rolled back
        checkSeniority(kc, ls, unexpiredTRs);
        //If the above steps do not throw an exception, it indicates that the current TX has successfully acquired the lock!
        ls.setChecked();
    }

4: Overall process

The general flow chart is as follows:

(Flow chart: the overall locking process; see the ProcessOn link above)

The overall process is as follows:

  1. Get local lock
  2. Get distributed lock

    1. Insert distributed lock information
    2. The commit phase judges whether the distributed lock acquisition is successful
  3. If the acquisition fails, try again

5: Summary

JanusGraph's lock mechanism combines a local lock and a distributed lock to achieve data consistency in a distributed system;

The distributed lock can be controlled at the granularity of properties, vertices, edges, and indexes;

JanusGraph lets you toggle the distributed lock for data import through the consistency behaviors introduced earlier, of which LOCK is the relevant one:

  • LOCK: enables the distributed lock during data import, guaranteeing distributed consistency
  • DEFAULT, FORK: disable the distributed lock during data import

On whether to enable the distributed lock:

Enabling the distributed lock makes data import much more expensive; if the data volume is large and strong consistency is not required, we can disable the distributed lock to speed up the import;

Conversely, for small volumes of data requiring high consistency, enable the distributed lock to keep the data safe;

In addition, we can probe the existing data in full ahead of time to reduce conflicts!

Whether to enable or disable the distributed lock for each element of the graph schema depends on the actual business situation.

If you have questions about this article, point them out via WeChat or in the comments. Thank you!

Writing code and articles is not easy; please leave a like and a star~
