# Learn how Clojure abstracts concurrency and shared state on the JVM

Time: 2019-11-30

Preface

Of all the next-generation languages for the Java platform, Clojure has the most radical concurrency mechanisms and capabilities. Groovy and Scala both offer a mix of improved abstractions and syntactic sugar for concurrency. Clojure, by contrast, holds firm to its stance of always providing unique behavior on the JVM. In this installment, I cover some of Clojure's many concurrency options, starting with the fundamental abstraction that underlies Clojure's mutable references: the epochal time model.

Epochal time model

Perhaps the most significant difference between Clojure and other languages lies in how it handles mutable state and values. A value in Clojure can be any data of interest: the number 42, the map {:first-name "Neal" :last-name "Ford"}, or some larger data structure, such as Wikipedia. Essentially, Clojure treats all values the way other languages treat numbers. The number 42 is a value, and you cannot redefine it. You can, however, apply a function to it that returns another value. For example, (inc 42) returns 43.
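A few lines of Clojure show these value semantics. This is a minimal sketch, and the names answer, next-answer, author, and renamed are illustrative only. It shows that inc and assoc return new values and leave their inputs untouched:

```clojure
;; Values are immutable: functions return new values instead of changing old ones.
(def answer 42)
(def next-answer (inc answer))   ; a new value, 43; answer is still 42

(def author {:first-name "Neal" :last-name "Ford"})
;; assoc returns a new map with the changed entry; author itself is unchanged
(def renamed (assoc author :first-name "N."))

(println answer next-answer)     ; prints 42 43
(println (:first-name author)    ; prints Neal N.
         (:first-name renamed))
```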

In Java and other C-based languages, a variable holds both identity and value. This is one of the factors that make concurrency so difficult in Java. Language designers created the variable abstraction before the thread abstraction, so variables weren't designed with the complexity of concurrency in mind. Because Java variables assume a single thread, protecting them in a multithreaded environment requires cumbersome mechanisms such as synchronized blocks. Clojure's creator, Rich Hickey, revived the archaic word complect, meaning "to intertwine or braid together," to describe this design flaw in Java variables.

Clojure separates values from references. In the Clojure worldview, data exists as a series of immutable values, as shown in Figure 1.

Figure 1. Values in epochal time model

Figure 1 draws values such as V1 as boxes. A value might represent data such as 42 or Wikipedia. Separate from values are functions, which take values as arguments and produce new values, as shown in Figure 2.

Figure 2. Functions in epochal time model

Figure 2 draws functions as circles, separate from the values. A function call takes values as arguments and produces new values as results. A succession of values is held by a reference, which represents the variable's identity. Over time, this identity may point to different values as functions are applied, but the identity itself never changes, as shown by the dotted line in Figure 3.

Figure 3. References in epochal time model

In Figure 3, the entire diagram represents a reference changing over time. The dotted line is a reference that holds a succession of values over its lifetime. At any point you can assign a new immutable value to the reference. What the reference points to changes, but the reference itself does not.
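Figure 3's timeline can be approximated in code. The sketch below uses an atom as the reference (atoms are covered in a later section). It registers a watch with add-watch purely to record each value the reference holds over its lifetime. The names timeline and seen are hypothetical:

```clojure
;; One identity (the atom) holding a succession of immutable values.
(def timeline (atom 42))
(def seen (atom [@timeline]))        ; record of every value timeline has held

;; For an atom, add-watch calls this fn synchronously after each change.
(add-watch timeline :record
           (fn [_key _ref _old-val new-val]
             (swap! seen conj new-val)))

(swap! timeline inc)                 ; timeline now points to 43
(swap! timeline * 2)                 ; timeline now points to 86

(println @seen)                      ; prints [42 43 86]
```

The atom timeline stays the same object throughout. Meanwhile 42, 43, and 86 are three distinct immutable values.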

During the lifetime of the reference, one or more observers (other programs, user interfaces, any objects interested in the value held by the reference) will dereference it and view its value (or perhaps perform some other operation), as shown in Figure 4.

Figure 4. Dereference

In Figure 4, an observer (represented by a pair of wedges) can hold the reference itself, shown by the arrow from the dotted-line reference. It can also dereference the reference to retrieve its value, shown by the arrow from the value. For example, you might write a function that receives a database connection as a parameter and passes it on to a lower-level persistence function. In that case, you hold the reference but never need its value. The persistence function may dereference it to obtain the value it needs to connect to the database.
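Here is a sketch of the database-connection scenario. The functions persist! and handle-request are hypothetical, and the "connection" is just a map standing in for a real one. The middle layer holds the reference without ever dereferencing it. Only the lower-level function reads the value:

```clojure
(defn persist!
  "Hypothetical lower-level function: dereferences the reference to use its current value."
  [conn-ref record]
  (let [conn @conn-ref]
    (str "saved " record " via " (:url conn))))

(defn handle-request
  "Hypothetical middle layer: holds the reference and passes it along without dereferencing it."
  [conn-ref record]
  (persist! conn-ref record))

(def db-conn (atom {:url "jdbc:example://localhost/orders"}))
(println (handle-request db-conn "order-17"))
;; prints saved order-17 via jdbc:example://localhost/orders

;; Point the reference at a new value; the middle layer needs no change.
(reset! db-conn {:url "jdbc:example://replica/orders"})
(println (handle-request db-conn "order-18"))
;; prints saved order-18 via jdbc:example://replica/orders
```

handle-request passes the reference rather than a copy of its value. As a result, repointing db-conn changes what the next persistence call sees without touching the middle layer.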

Note that the observers in Figure 4 don't coordinate; they don't depend on each other at all. This structure lets the Clojure runtime guarantee some useful properties throughout the language. One example is that readers never block, which makes reads very efficient. To change a reference (that is, to point it to a different value), you use one of Clojure's update APIs, all of which follow the epochal time model.

The epochal time model underpins reference updates everywhere in Clojure. Because the runtime controls all updates, it can guard against the threading conflicts that developers must handle themselves in less capable languages.

Clojure offers a wide range of ways to update references, depending on the semantics you want. I'll discuss two of them: simple atoms and the more sophisticated software transactional memory.

Atoms

An atom in Clojure is a reference to a piece of data that is updated atomically, no matter how large the data is. You create an atom with an initial value, then apply mutating functions to it. Here, I create an atom named counter, initialized to 0. To update the reference to a new value, I can use a function such as (swap!), which atomically swaps in a new value for the reference:


```clojure
(def counter (atom 0))
(swap! counter + 10)   ; calls (+ 0 10), so counter now holds 10
```

By Clojure convention, the names of functions that mutate state end with an exclamation point. The (swap!) function accepts the reference, the function to apply (in this case, the + operator), and any additional arguments to that function.
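The following sketch shows why atomicity matters. It starts ten futures that each increment the same atom 1,000 times (the name hits is illustrative). swap! applies its function atomically and retries if another thread changed the atom in the meantime, so no increment is lost. That possible retry is also why the function you pass to swap! should be free of side effects:

```clojure
(def hits (atom 0))

;; Start 10 futures (pooled threads), each incrementing the shared atom 1,000 times.
(let [workers (doall (for [_ (range 10)]
                       (future (dotimes [_ 1000] (swap! hits inc)))))]
  (run! deref workers))             ; block until every future has finished

(println @hits)                     ; prints 10000: no increment is lost

;; Standalone scripts only: release the pooled threads so the JVM exits promptly.
;; Omit this at the REPL, where later futures would be rejected.
(shutdown-agents)
```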

A Clojure atom can hold data of any size, not just primitive values. For example, I can create an atom around a person map and update it with map functions. Using a (create-person) function (not shown), I create a person record in an atom. I then update the reference with (swap!) and (assoc), which updates a key/value association in a map:


```clojure
(def person (atom (create-person)))   ; create-person is not shown
(swap! person assoc :name "John")     ; calls (assoc current-person :name "John")
```

Atoms also support a common optimistic-locking pattern through the (compare-and-set!) function:


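```clojure
(def a (atom 1))            ; a starts out holding 1
(compare-and-set! a 0 42)
=> false                    ; a holds 1, not 0, so nothing changes
(compare-and-set! a 1 7)
=> true                     ; a held 1, so it now holds 7
```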

The (compare-and-set!) function takes three arguments: the atom, the value you expect it to hold, and the new value. If the atom's current value is not identical to the expected value, the update doesn't occur and the function returns false.
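The optimistic-locking pattern usually takes the form of a loop: read, compute, compare-and-set!, and retry on failure. This is roughly what swap! does for you. The helper update-optimistically! below is a hypothetical sketch of that loop, not part of Clojure's API. Because compare-and-set! compares by identity, the loop passes back the exact object it read:

```clojure
;; Hypothetical helper: the optimistic read / compute / compare-and-set! / retry loop.
(defn update-optimistically!
  [a f & args]
  (loop []
    (let [old-val @a                       ; read the current value
          new-val (apply f old-val args)]  ; compute a replacement without locking
      (if (compare-and-set! a old-val new-val)
        new-val                            ; nobody changed a in between: done
        (recur)))))                        ; another thread won the race: try again

(def balance (atom 100))
(println (update-optimistically! balance - 30))  ; prints 70
```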

Clojure has a variety of mechanisms that follow the same reference semantics. For example, a promise (a different kind of reference) promises to provide a value later. Here, I create a reference to a promise named number-later. This code doesn't produce any value yet; it only promises one eventually. When you call the (deliver) function, a value is bound to number-later:


```clojure
(def number-later (promise))   ; no value yet
(deliver number-later 42)      ; binds 42; @number-later now returns 42
```

Although this example uses Clojure's futures library rather than atoms, the reference semantics are consistent with those of simple atoms.
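The sketch below delivers the promise from another thread; it is a minimal example, and the delays are arbitrary. Reading an atom never blocks, but dereferencing an undelivered promise does. It waits until a value arrives, or, with the three-argument form of deref, until a timeout passes:

```clojure
(def number-later (promise))

;; Deliver from another thread after a short delay.
(future
  (Thread/sleep 100)
  (deliver number-later 42))

(println @number-later)        ; blocks until delivered, then prints 42
(deliver number-later 99)      ; a promise is delivered only once; this has no effect
(println @number-later)        ; still prints 42

;; A timed deref returns a fallback value if nothing is delivered within 50 ms.
(println (deref (promise) 50 :timed-out))   ; prints :timed-out

;; Standalone scripts only: release the future thread pool so the JVM exits promptly.
(shutdown-agents)
```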

Software transactional memory

No Clojure feature has received more attention than software transactional memory (STM). STM is the internal mechanism that lets Clojure encapsulate concurrency the way Java encapsulates garbage collection. In other words, you can write high-performance multithreaded Clojure applications without thinking about synchronized blocks, deadlocks, threading libraries, and so on.

Clojure encapsulates concurrency by controlling mutation through its reference types. To update a ref (the reference type that STM manages), you must do so inside a transaction so that the Clojure runtime can coordinate the update. Consider the classic banking problem: debiting one account and crediting another as a single operation. Listing 1 shows a simple Clojure solution.

Listing 1. Bank transactions


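```clojure
(defn transfer
  [from to amount]
  (dosync                   ; both changes commit together or not at all
    (alter from - amount)   ; from's new value is (- @from amount)
    (alter to + amount)))   ; to's new value is (+ @to amount)
```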

In Listing 1, I define a (transfer) function that takes three arguments: the from and to accounts (both refs) and the amount. I subtract the amount from the from account and add it to the to account, but these operations must occur inside a (dosync) transaction. If I call alter outside a transaction block, the update fails with an IllegalStateException:


```clojure
(alter from - 1)
=> IllegalStateException No transaction running
```

In Listing 1, the (alter) function still follows the epochal time model, but it uses STM to ensure that either both operations complete or neither does. To do this, STM transparently retries transactions that conflict, much as a database server does. Your update functions should therefore have no side effects beyond the update itself. For example, if your function also writes to a log, you may see multiple log entries because of retries. STM also gives long-running transactions priority over time, another behavior common in database engines.
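The all-or-nothing guarantee is easiest to appreciate under contention. This sketch reuses the transfer function from Listing 1 with two hypothetical accounts, checking and savings, each starting at 1,000. It fires 200 concurrent transfers and checks that no money is created or lost:

```clojure
(defn transfer
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

(def checking (ref 1000))
(def savings (ref 1000))

;; 100 transfers in each direction, all started at once on pooled threads.
(let [jobs (doall (concat
                    (for [_ (range 100)] (future (transfer checking savings 3)))
                    (for [_ (range 100)] (future (transfer savings checking 2)))))]
  (run! deref jobs))                 ; wait for every transfer to commit

;; Conflicting transactions may have been retried, but each committed exactly once.
(println @checking @savings)         ; prints 900 1100

;; Standalone scripts only: release the future thread pool so the JVM exits promptly.
(shutdown-agents)
```

Keep the transaction body free of side effects such as println. If a transaction is retried, those effects would run more than once.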

Using STM is simple, but the underlying mechanism is sophisticated. As the name suggests, STM is a transactional system. It implements the ACI part of the ACID transaction standard: all changes are atomic, consistent, and isolated. Durability, the D in ACID, doesn't apply because STM operates in memory. It's rare to see a high-performance mechanism like STM built into the core of a language. Haskell is the only other mainstream language with a serious STM implementation, which is unsurprising because Haskell, like Clojure, strongly favors immutability. (The .NET ecosystem tried to build an STM manager but eventually abandoned the effort because handling transactions and mutability proved too complex.)

Reducers and number classification

An introduction to concurrency would be incomplete without an alternative implementation of the number classifier from the previous installment. Listing 2 shows a version without parallelism.

Listing 2. Number classifier in clojure


```clojure
(defn classify [num]
  (let [facts (->> (range 1 (inc num))            ; candidate factors 1..num
                   (filter #(= 0 (rem num %))))   ; keep the exact divisors
        sum (reduce + facts)
        aliquot-sum (- sum num)]                  ; sum of the proper factors
    (cond
      (= aliquot-sum num) :perfect
      (> aliquot-sum num) :abundant
      (< aliquot-sum num) :deficient)))
```

The classifier in Listing 2 condenses into a single function that returns a Clojure keyword (indicated by a leading colon). The (let) block lets me establish local bindings. To determine the factors, I use the thread-last operator (->>) to filter the range of numbers, which keeps the code readable. Computing sum and aliquot-sum is straightforward. The aliquot sum of a number is the sum of its factors minus the number itself, which simplifies my comparison code. The last form in the function is a (cond) expression that compares aliquot-sum against the number and returns the appropriate keyword. Interestingly, the methods in my previous implementations collapse into simple local bindings in this version. When the computation is simple and concise enough, you often need to create fewer functions.
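The following trace shows how the bindings in Listing 2 fit together for the number 12. It is written as top-level definitions so each intermediate value can be inspected. The names n, facts, factor-sum, and aliquot-sum exist only for this trace:

```clojure
(def n 12)
(def facts (->> (range 1 (inc n))
                (filter #(= 0 (rem n %)))))  ; the divisors of 12: (1 2 3 4 6 12)
(def factor-sum (reduce + facts))            ; 28
(def aliquot-sum (- factor-sum n))           ; 16, the sum of the proper factors
(println facts factor-sum aliquot-sum)       ; prints (1 2 3 4 6 12) 28 16
```

Because 16 is greater than 12, classify returns :abundant for 12.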

Clojure includes a powerful concurrency library called reducers. (The story of how the reducers library was developed, including optimizations that exploit the JVM's newer native fork/join facilities, is fascinating in its own right.) The library provides drop-in replacements for common operations such as map, filter, and reduce. For example, you can replace the standard (map) with (r/map), where r/ is the alias for the reducers namespace. The mapping can then take part in a parallel computation. When the result is combined with (r/fold) over a foldable collection such as a vector, the work is split across threads on the fork/join pool.
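Here is a brief sketch of the reducers API in use; the vector xs is illustrative. r/map and r/filter build a reducible recipe without intermediate sequences. r/fold is the function that can split a vector into chunks and combine the partial results in parallel:

```clojure
(require '[clojure.core.reducers :as r])

(def xs (vec (range 1000)))              ; vectors support parallel folding

;; r/map and r/filter return reducible recipes; into runs them in a single pass.
(println (into [] (r/filter even? (r/map inc [1 2 3 4]))))   ; prints [2 4]

;; r/fold splits the vector into chunks (512 elements by default), reduces them
;; on the fork/join pool, and combines the partial sums with +.
(println (r/fold + (r/map inc xs)))      ; prints 500500, same as (reduce + (map inc xs))
```

+ works as the combining function here because it is associative and (+) returns its identity, 0.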

Listing 3 shows a version of a number classifier that uses a reduction program.

Listing 3. A classifier with a reduced library


```clojure
(ns pperfect.core
  (:require [clojure.core.reducers :as r]))

(defn classify-with-reducer [num]
  (let [facts (->> (range 1 (inc num))
                   (r/filter #(= 0 (rem num %))))
        sum (r/reduce + facts)
        aliquot-sum (- sum num)]
    (cond
      (= aliquot-sum num) :perfect
      (> aliquot-sum num) :abundant
      (< aliquot-sum num) :deficient)))
```

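You have to look carefully to spot the difference between Listing 2 and Listing 3. The only changes are the reducers namespace and its alias, plus the r/ prefix on filter and reduce. With these subtle changes, filtering and summing happen in a single pass without building an intermediate sequence. Be aware, though, that (r/reduce) itself runs sequentially. The library uses multiple threads only through (r/fold), and only when the underlying collection, such as a vector, supports parallel folding.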
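To make the classifier's summation actually run on multiple threads, one option is r/fold over a vector. The variant below, classify-with-fold, is a hypothetical sketch rather than part of the original example. For small numbers the vector has 512 elements or fewer (fold's default chunk size), so the fold simply runs sequentially:

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical variant of Listing 3: r/fold over a vector can split the work
;; across fork/join threads once the vector exceeds the default chunk size.
(defn classify-with-fold [num]
  (let [candidates (vec (range 1 (inc num)))   ; a vector, so r/fold can split it
        sum (r/fold + (r/filter #(= 0 (rem num %)) candidates))
        aliquot-sum (- sum num)]
    (cond
      (= aliquot-sum num) :perfect
      (> aliquot-sum num) :abundant
      (< aliquot-sum num) :deficient)))

(println (map classify-with-fold [6 12 7 8128]))
;; prints (:perfect :abundant :deficient :perfect)
```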

Concluding remarks

This installment introduced a few of Clojure's concurrency options, a rich topic area. I discussed the core underlying abstraction, the epochal time model, and showed how atoms and STM use it. I also demonstrated reducers, a drop-in replacement library that lets existing code take advantage of advanced concurrency features such as fork/join.

Clojure has many other concurrency options, including simpler parallel functions such as pmap (parallel map). Clojure also includes agents, much like Scala's actors. Agents are autonomous workers (system- or user-defined) whose actions run on threads from a pool. Finally, Clojure incorporates the current concurrency advances in Java, making it easy to use modern libraries such as fork/join.
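Here is a minimal sketch of the two facilities just mentioned, pmap and an agent. The agent name tally is illustrative:

```clojure
;; pmap: like map, but applies the function to elements in parallel (semi-lazily).
(println (pmap #(* % %) [1 2 3 4]))     ; prints (1 4 9 16)

;; An agent holds a value that is changed asynchronously by actions sent to it.
(def tally (agent 0))
(send tally + 5)                         ; queue an action; returns immediately
(send tally + 10)
(await tally)                            ; block until the queued actions have run
(println @tally)                         ; prints 15

;; Standalone scripts only: release the agent and future thread pools so the JVM exits.
(shutdown-agents)
```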

Perhaps more than any other Clojure feature, the concurrency tools show the engineering focus of the Clojure ecosystem: using language features to build powerful abstractions. Clojure didn't try to create a Lispy version of Java. Instead, its designers fundamentally rethought the core infrastructure and implementation.