The inside story of ShowMeBug's core technology

Date: 2020-8-14

ShowMeBug is a remote interview tool in which both parties communicate in real time through an online interview board, so its key technology is real-time synchronization. ShowMeBug achieves it with the following techniques.

OT (Operational Transformation) algorithm

At its core, ShowMeBug is real-time collaborative editing by multiple people, and that is where the difficulty lies. Because of the network, operations may arrive out of order, get lost, or conflict with other people's operations. It is a complicated problem.

After some research, we concluded that OT gives the best user experience. The algorithm was first proposed by C. Ellis and S. Gibbs in 1989 and is now used by Quip and Google Docs.

OT lets users freely edit any line, and even conflicting operations are handled well without locking. The core of the algorithm is as follows.

Every document edit is expressed with three primitive operations:

  1. retain(n): keep n characters
  2. insert(s): insert string s
  3. delete(s): delete string s
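To make the three primitives concrete, here is a minimal JavaScript sketch (not ShowMeBug's code) of applying one operation to a document string. It assumes the compact array encoding popularized by the ot.js library, which also matches the [length, "string", length] event format described later: a positive integer is retain(n), a string is insert(s), and a negative integer -n deletes n characters, so the article's delete(s) is represented by the length of s. The name applyOperation is illustrative.

```javascript
// Minimal sketch: apply an OT operation to a string.
// Assumed encoding (ot.js style): n > 0 = retain(n), "s" = insert(s), -n = delete n characters.
function applyOperation(doc, op) {
  let pos = 0; // cursor into the original document
  const out = [];
  for (const c of op) {
    if (typeof c === "string") {
      out.push(c); // insert(s): emit new text without advancing in the old document
    } else if (c > 0) {
      if (pos + c > doc.length) throw new Error("retain past end of document");
      out.push(doc.slice(pos, pos + c)); // retain(n): copy n characters unchanged
      pos += c;
    } else if (c < 0) {
      if (pos - c > doc.length) throw new Error("delete past end of document");
      pos -= c; // delete: skip -c characters of the old document
    }
  }
  // An operation must cover exactly the whole document it is applied to.
  if (pos !== doc.length) throw new Error("operation does not span the whole document");
  return out.join("");
}

// Keep "hello", insert ",", keep " ", delete "world", insert "ShowMeBug".
console.log(applyOperation("hello world", [5, ",", 1, -5, "ShowMeBug"])); // prints: hello, ShowMeBug
```

Requiring the operation to span the whole document is what lets both ends detect an operation that was built against a different version of the text.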

The client and the server each keep a history of operations. Before an operation from one side is applied on the other, it is transformed against the concurrent operations that side has already applied.

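The core of the transformation is the following property. Given two concurrent operations o_1 and o_2 on the same document, the transformation T produces (o_1', o_2') = T(o_1, o_2) such that

$$S(o_1, o_2') = S(o_2, o_1')$$

where S(a, b) denotes the document obtained by applying a and then b to the shared starting state.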

In other words, concurrent operations are transformed into new operations that can be applied on top of each other's history, so both sides converge to the same document and synchronous editing works without locking.
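As an illustration, here is a minimal JavaScript sketch of the transform step for retain/insert/delete operations. It assumes the ot.js-style array encoding (positive integer = retain, string = insert, negative integer = delete) and follows the well-known approach of the ot.js library rather than ShowMeBug's actual source. When both sides insert at the same position, the first operation's insert is arbitrarily placed first. The guarantee is that applying a and then b' gives the same document as applying b and then a'.

```javascript
// Sketch of OT transform for the assumed encoding: n > 0 retain, "s" insert, -n delete.
const isRetain = (c) => typeof c === "number" && c > 0;
const isInsert = (c) => typeof c === "string";
const isDelete = (c) => typeof c === "number" && c < 0;

// Append a component, merging it with the previous one when both have the same kind.
function push(op, c) {
  const last = op[op.length - 1];
  const sameKind = op.length > 0 &&
    ((isRetain(last) && isRetain(c)) || (isDelete(last) && isDelete(c)) || (isInsert(last) && isInsert(c)));
  if (sameKind) op[op.length - 1] = last + c;
  else op.push(c);
}

function applyOperation(doc, op) {
  let pos = 0, out = "";
  for (const c of op) {
    if (isInsert(c)) out += c;
    else if (isRetain(c)) { out += doc.slice(pos, pos + c); pos += c; }
    else if (isDelete(c)) pos -= c;
  }
  if (pos !== doc.length) throw new Error("operation length does not match the document");
  return out;
}

// transform(a, b) -> [a', b'] for two operations made against the same document.
function transform(a, b) {
  const aPrime = [], bPrime = [];
  let i = 0, j = 0;
  let x = a[i++], y = b[j++];
  while (x !== undefined || y !== undefined) {
    if (isInsert(x)) { push(aPrime, x); push(bPrime, x.length); x = a[i++]; continue; } // a's insert wins ties
    if (isInsert(y)) { push(aPrime, y.length); push(bPrime, y); y = b[j++]; continue; }
    if (x === undefined || y === undefined) throw new Error("operations have different base lengths");
    const lx = Math.abs(x), ly = Math.abs(y);
    const n = Math.min(lx, ly); // number of base characters both components cover
    if (isRetain(x) && isRetain(y)) { push(aPrime, n); push(bPrime, n); }
    else if (isDelete(x) && isRetain(y)) push(aPrime, -n); // only a deletes these characters
    else if (isRetain(x) && isDelete(y)) push(bPrime, -n); // only b deletes these characters
    // If both delete the same characters, neither transformed operation needs to.
    x = lx > n ? Math.sign(x) * (lx - n) : a[i++];
    y = ly > n ? Math.sign(y) * (ly - n) : b[j++];
  }
  return [aPrime, bPrime];
}

const base = "abc";
const a = [1, "X", 2]; // insert "X" after "a"
const b = [2, "Y", 1]; // insert "Y" after "b"
const [aPrime, bPrime] = transform(a, b);
console.log(applyOperation(applyOperation(base, a), bPrime)); // prints: aXbYc
console.log(applyOperation(applyOperation(base, b), aPrime)); // prints: aXbYc
```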

The figure below shows the corresponding transformation process.

https://imgs.developpaper.com/imgs/070918.jpg

The difficulty lies in the distributed implementation: the client and the server both have to record the history, keep it in a consistent order, and run the transformation algorithm on it.

Handling OT on the Rails side

At its core, this is an algorithm running over WebSocket, so we chose Action Cable as its foundation without hesitation, thinking it would save us a lot of time. In fact, we were wrong.

Like Socket.IO in Node.js, Action Cable gives no delivery guarantees. That is fine for fun chat tools, or for notifications in weak scenarios where a message may be lost or even pushed twice, but it is not good enough for strict requirements like those of OT.

Because network transmission is unreliable, we have to process every operation strictly in order. So we first implemented a mutex: each interview board gets its own lock, and only one operation can be processed at a time. The lock is built on Redis, as follows:

def unlock_pad_history(lock_key)
  logger.debug "[padable] unlock( lock_key: #{lock_key} )..."
  old_lock_key = REDIS.get(_pad_lock_history_key)
  if old_lock_key == lock_key
    # We still hold the lock, so release it.
    # NOTE: GET and DEL are two separate commands, so this check-then-delete is not atomic.
    REDIS.del(_pad_lock_history_key)
  else
    # The lock expired (or now belongs to another request) before we released it: report it.
    log = "[FIXME] unlock_pad_history expired: lock_key=#{lock_key}, old_lock_key=#{old_lock_key}"
    logger.error(log)
    e = RuntimeError.new(log)
    ExceptionNotifier.notify_exception(e, lock_key: lock_key, old_lock_key: old_lock_key)
  end
end

# To prevent deadlocks the lock expires automatically after 5 minutes; if it has already
# expired when it is released, unlock_pad_history reports an exception.
def lock_pad_history(lock_key)
  # SET with NX (only if the key is absent) and EX (expiry in seconds); true if we got the lock
  REDIS.set(_pad_lock_history_key, lock_key, nx: true, ex: 5 * 60)
end

def wait_and_lock_pad_history(lock_key, retry_times = 200)
  total_retry_times = retry_times
  # Poll every 50 ms until the lock is acquired (200 retries * 0.05 s = 10 s by default)
  until lock_pad_history(lock_key)
    sleep(0.05)
    logger.debug '[padable] locked, waiting 50ms...'
    retry_times -= 1
    raise "wait_and_lock_pad_history(in #{total_retry_times * 0.05}s) #{lock_key} failed" if retry_times <= 0
  end
  logger.debug "[padable] locking it(lock_key: #{lock_key})..."
end
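The locking rules can be sketched without a Redis server. The JavaScript below is a hypothetical in-memory stand-in for SET key token NX EX: a lock is acquired only if it is absent or expired, it expires after a TTL, and only the holder of the matching token can release it. The class name LeaseLock and the injectable clock are illustrative assumptions. With a real Redis, the compare-and-delete in unlock should run as one atomic step (for example a Lua script), because a separate GET followed by DEL can delete a lock that another client acquired in between.

```javascript
// Hypothetical in-memory stand-in for a Redis lease lock (SET key token NX EX ttl).
class LeaseLock {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;            // injectable clock so expiry can be simulated
    this.entries = new Map();  // key -> { token, expiresAt }
  }

  // Like SET ... NX with an expiry: succeeds only if the key is absent or has expired.
  lock(key, token) {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > this.now()) return false;
    this.entries.set(key, { token, expiresAt: this.now() + this.ttlMs });
    return true;
  }

  // Release only if we still hold the lock; in Redis this check-and-delete must be atomic.
  unlock(key, token) {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= this.now() || entry.token !== token) {
      return false; // expired or owned by someone else: the caller should report this
    }
    this.entries.delete(key);
    return true;
  }
}

// Usage with a fake clock; the 5-minute TTL mirrors the lock time above.
let fakeTime = 0;
const locks = new LeaseLock(5 * 60 * 1000, () => fakeTime);
console.log(locks.lock("pad:1", "req-a"));   // true
console.log(locks.lock("pad:1", "req-b"));   // false: still held by req-a
fakeTime += 5 * 60 * 1000;                   // the lease runs out
console.log(locks.lock("pad:1", "req-b"));   // true: the expired lock is taken over
console.log(locks.unlock("pad:1", "req-a")); // false: req-a no longer owns it
```

The last call is the situation the Ruby unlock reports as an exception: the lease expired while its owner was still working.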

With concurrency control in place on the server, the client uses a "state queue" (a small state machine) to publish its operation records one at a time, in order. The core is as follows:

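// Client-side "state queue": at most one history is in flight (awaiting the server's ACK).
// composeHistory(a, b) and transformHistory(a, b) are OT helpers defined elsewhere (assumed names;
// transformHistory returns the transformed pair [a', b']). channel._sendHistory pushes a history to
// the server; channel._applyHistory is an assumed hook that applies a remote history to the editor.

class PadChannelSynchronized {
  // Nothing in flight: send right away and wait for the ACK.
  sendHistory(channel, history) {
    channel._sendHistory(history)
    return new PadChannelAwaitingConfirm(history)
  }

  // Nothing pending locally, so a remote history can be applied as-is.
  receiveHistory(channel, history) {
    channel._applyHistory(history)
    return this
  }

  confirmHistory(channel, history) {
    throw new Error('confirmHistory error: no outstanding history')
  }
}

class PadChannelAwaitingConfirm {
  constructor(outstanding_history) {
    this.outstanding_history = outstanding_history
  }

  // Still waiting for an ACK: buffer the new history instead of sending it.
  sendHistory(channel, history) {
    return new PadChannelAwaitingWithHistory(this.outstanding_history, history)
  }

  // Transform the in-flight history against the remote one, then apply the transformed remote history.
  receiveHistory(channel, history) {
    const pair_history = transformHistory(this.outstanding_history, history)
    channel._applyHistory(pair_history[1])
    return new PadChannelAwaitingConfirm(pair_history[0])
  }

  // The server acknowledged our history (matched by client_id): back to synchronized.
  confirmHistory(channel, history) {
    if (this.outstanding_history.client_id !== history.client_id) {
      throw new Error('confirmHistory error: client_id not equal')
    }
    return padChannelSynchronized
  }
}

class PadChannelAwaitingWithHistory {
  constructor(outstanding_history, buffer_history) {
    this.outstanding_history = outstanding_history
    this.buffer_history = buffer_history
  }

  // Fold further local edits into the single buffered history.
  sendHistory(channel, history) {
    const newHistory = composeHistory(this.buffer_history, history)
    return new PadChannelAwaitingWithHistory(this.outstanding_history, newHistory)
  }

  // Transform the remote history past both the in-flight and the buffered history.
  receiveHistory(channel, history) {
    const pair1 = transformHistory(this.outstanding_history, history)
    const pair2 = transformHistory(this.buffer_history, pair1[1])
    channel._applyHistory(pair2[1])
    return new PadChannelAwaitingWithHistory(pair1[0], pair2[0])
  }

  // The in-flight history was acknowledged: send the buffer and wait for its ACK.
  confirmHistory(channel, history) {
    if (this.outstanding_history.client_id !== history.client_id) {
      throw new Error('confirmHistory error: client_id not equal')
    }
    channel._sendHistory(this.buffer_history)
    return new PadChannelAwaitingConfirm(this.buffer_history)
  }
}

const padChannelSynchronized = new PadChannelSynchronized()

export default padChannelSynchronized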

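This implements queued transmission: at most one history is awaiting acknowledgement at any time, and local edits made in the meantime are buffered and composed into a single history.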

In addition, we designed a PadChannel that manages communication events with the server, maintains the history state, and handles retransmission after disconnection, operation transformation, and verification.
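The retransmission part can be sketched as follows, under stated assumptions: the client keeps the single unacknowledged history, resends it after reconnecting, and the server makes the write idempotent by de-duplicating on client_id (the history protocol in the next section uses client_id for de-duplication and ACK matching). FakeServer, PadChannelSketch, and the ackLost flag are hypothetical names, and the WebSocket transport is replaced by a direct method call.

```javascript
// Hypothetical sketch: resend-on-reconnect plus server-side de-duplication by client_id.
class FakeServer {
  constructor() {
    this.nextId = 1;
    this.byClientId = new Map(); // client_id -> stored history
    this.log = [];               // accepted histories, in id order
  }

  // Idempotent write: a retransmitted history gets the original record back as its ACK.
  receive(history) {
    const existing = this.byClientId.get(history.client_id);
    if (existing) return existing;
    const stored = { ...history, id: this.nextId++ };
    this.byClientId.set(stored.client_id, stored);
    this.log.push(stored);
    return stored;
  }
}

class PadChannelSketch {
  constructor(server) {
    this.server = server;
    this.outstanding = null; // the single history still waiting for an ACK
  }

  // `ackLost` simulates the connection dropping after the server received the history.
  send(history, ackLost = false) {
    this.outstanding = history;
    this.transmit(ackLost);
  }

  transmit(ackLost = false) {
    const ack = this.server.receive(this.outstanding);
    if (ackLost) return; // we never saw the ACK, so the history stays outstanding
    if (ack.client_id === this.outstanding.client_id) this.outstanding = null;
  }

  // After reconnecting, whatever is still unacknowledged is simply sent again.
  reconnect() {
    if (this.outstanding) this.transmit();
  }
}

const server = new FakeServer();
const channel = new PadChannelSketch(server);
channel.send({ client_id: "a1b2c3d4", op: "edit", event: [5, "!"] }, true); // ACK lost
channel.reconnect();               // retransmit the same history
console.log(server.log.length);    // 1: the duplicate was ignored
console.log(channel.outstanding);  // null: the retransmission was acknowledged
```

Because the server treats a repeated client_id as already written, the client can always retry blindly, which turns at-most-once delivery into an effectively exactly-once write.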

Defining our own history protocol

Solving editor collaboration was only the beginning of the real problem: besides editor edits, the history must also record terminal activity and other board operations. ShowMeBug therefore defines the following protocol:

# Operation types: edit (update editor content), run (execute a command), clear (clear),
# sync (synchronize data), select (selection) and locate (positioning)
# The history format is as follows:
#
# {
#   op: 'run' | 'edit' | 'select' | 'locate' | 'clear'
#   id: id                  // globally unique, auto-incrementing operation ID. It is null when the
#                           // front end first sends the history and is filled in by the server; if it
#                           // is still null in the response, the server refused to write the history
#   version: 'V1'           // data format version
#   prev_id: prev_id        // the ID last received from the server when the JS side generated this
#                           // history; used to identify the order of operations
#   client_id: client_id    // unique ID of this history, generated by the client
#   creator_id: creator_id  // user ID of the operator; for security it is null when first sent from
#                           // the front end and is filled in by the middle platform
#   event: {                // when op is edit: the editor's OT-transformed data, see:
#                           //   https://github.com/Aaaaash/bl…
#                           //   [length, "string", length]
#                           // when op is select: the editor selection range (including the cursor)
#   }
#   snapshot: {
#     editor_text: ''       // snapshot of the current editor content, filled in by the server
#     language_type: ''     // language type of the current editor
#     terminal_text: ''     // snapshot of the current terminal
#   }
#   created_at: created_at  // creation time
# }

It is worth noting that client_id is an 8-character random code generated by the client, used for de-duplication and for matching the server's ACK to the client's history.

id is an auto-increment ID generated on the server with Redis; the client uses it to decide whether a history is new. prev_id records where the operation sits in the history queue, which the transformation needs.

event is the most important part of the record. It stores the OT operation data, for example: [length, "string", length]
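A hypothetical sketch of how the ID fields could interact: the client creates a record with an 8-character client_id, null id and creator_id, and prev_id set to the last server id it has seen; the server assigns the auto-increment id (standing in for the Redis-generated one), fills creator_id, and collects the histories after prev_id that the incoming event would have to be transformed against. The refusal rule (a prev_id the server never issued gets id null) and all function names are assumptions for illustration, and the transformation itself is omitted.

```javascript
// Hypothetical sketch of the history protocol's ID fields (names assumed for illustration).
const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

// Client: an 8-character random client_id for de-duplication and ACK matching.
function newClientId() {
  let id = "";
  for (let k = 0; k < 8; k++) id += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
  return id;
}

// Client: id and creator_id start as null; prev_id is the last server id this client has seen.
function buildHistory(op, event, lastSeenId) {
  return { op, id: null, version: "V1", prev_id: lastSeenId, client_id: newClientId(), creator_id: null, event };
}

// Client: a history is new if its server id is beyond the last one already applied.
const isNewHistory = (history, lastSeenId) => history.id !== null && history.id > lastSeenId;

// Server: stands in for the Redis-generated auto-increment id and fills creator_id.
class HistoryStore {
  constructor() {
    this.lastId = 0;
    this.histories = [];
  }

  accept(history, userId) {
    // Assumed validation rule: a prev_id the server never issued is refused (id stays null).
    if (history.prev_id > this.lastId) return { stored: { ...history, id: null }, concurrent: [] };
    // Histories the client had not seen yet; its event would be transformed against these.
    const concurrent = this.histories.filter((h) => h.id > history.prev_id);
    const stored = { ...history, id: ++this.lastId, creator_id: userId };
    this.histories.push(stored);
    return { stored, concurrent };
  }
}

const store = new HistoryStore();
const first = store.accept(buildHistory("edit", ["a"], 0), 42);
const second = store.accept(buildHistory("edit", ["b"], 0), 7); // also based on prev_id 0
console.log(first.stored.id, second.stored.id); // prints: 1 2
console.log(second.concurrent.length);          // prints: 1 (it must be transformed past id 1)
console.log(isNewHistory(second.stored, 1));    // prints: true
```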

With this design, the protocol covers every operation on the interview board, which enables the core features: real-time synchronization for multi-person interviews, automatic synchronization of interview questions and programming language, operation playback, and more.
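Because every edit is stored as an OT event with a server-assigned id, operation playback can in principle be done by replaying the records in id order. A minimal sketch, assuming records trimmed to id, op, and event, the ot.js-style array encoding for edit events, and an optional starting snapshot; non-edit operations are only recorded as frames here, and the function names are illustrative.

```javascript
// Hypothetical playback sketch: rebuild the editor text step by step from stored histories.
// Edit events use the assumed encoding: n > 0 retain, "s" insert, -n delete.
function applyEvent(text, event) {
  let pos = 0;
  let out = "";
  for (const c of event) {
    if (typeof c === "string") out += c;                          // insert
    else if (c > 0) { out += text.slice(pos, pos + c); pos += c; } // retain
    else pos -= c;                                                 // delete
  }
  if (pos !== text.length) throw new Error("event does not match the text length");
  return out;
}

// Replays histories in server id order; only 'edit' changes the editor text in this sketch.
function playback(histories, initialText = "") {
  const frames = [];
  let text = initialText;
  for (const h of [...histories].sort((x, y) => x.id - y.id)) {
    if (h.op === "edit") text = applyEvent(text, h.event);
    frames.push({ id: h.id, op: h.op, text });
  }
  return frames;
}

const frames = playback([
  { id: 2, op: "edit", event: [5, " world"] },
  { id: 1, op: "edit", event: ["hello"] },
  { id: 3, op: "run" },
]);
console.log(frames.map((f) => f.text).join(" | ")); // prints: hello | hello world | hello world
```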

Summary

Space is limited, so here we have covered only ShowMeBug's core technology. We will share more details in future posts.

ShowMeBug currently holds 3,000 interview records and has supported a large number of real interviews, which further demonstrates its reliability. Two important programming lessons are worth considering:

  1. To design an ordered, reliable delivery protocol over an unreliable link, the key is to define clear protocol data and to handle asynchronous events carefully.
  2. Balance development efficiency against stability. The busy lock, for example, is allowed to fail occasionally, as long as the failure is surfaced to the user with a prompt and retried. This gets the feature built efficiently without hurting the user experience.

ShowMeBug (showmebug.com) makes your technical interviews more efficient and helps you find the candidates you want.