This directory contains the implementation of the distributed TensorFlow runtime, which uses gRPC as the underlying library for inter-process communication.
First, you need to build a server-side executable (`grpc_tensorflow_server`) and a gRPC-based client. Currently these can only be built from source, but they will be included in future binary releases. You can build the server with the following commands:
```shell
# CPU-only build.
$ bazel build -c opt //tensorflow/core/distributed_runtime/rpc:grpc_tensorflow_server

# GPU build.
$ bazel build -c opt --config=cuda //tensorflow/core/distributed_runtime/rpc:grpc_tensorflow_server
```
If you build the Python pip package from the latest source, it automatically includes the gRPC-based client. If you are using a previously released binary version, you will need to rebuild and reinstall it following the installation instructions. After you have successfully built the distributed TensorFlow components, you can verify your installation by starting a server:
```shell
# Start a TensorFlow server as a single-process "cluster".
$ bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server \
    --cluster_spec='local|localhost:2222' --job_name=local --task_index=0 &
```
Then start the Python interpreter and create a session:
```python
$ python
>>> import tensorflow as tf
>>> c = tf.constant("Hello, distributed TensorFlow!")
>>> sess = tf.Session("grpc://localhost:2222")
>>> sess.run(c)
'Hello, distributed TensorFlow!'
```
Command line arguments
`grpc_tensorflow_server` takes command-line arguments that define the membership of the cluster. The `--cluster_spec` flag determines the composition of the cluster: a set of jobs, each comprising one or more tasks. All processes in the cluster must be started with the same `--cluster_spec`.
The `--task_index` flag indicates which task will run in the current process. For example, `--job_name=local --task_index=0` means the process will be identified as `/job:local/task:0`, and all TensorFlow devices in the process will use this prefix in their names.
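To make the shape of a `--cluster_spec` value concrete, here is a tiny illustrative parser. The multi-job format (jobs separated by commas, tasks within a job separated by semicolons) is an assumption generalized from the single-job example above, and the hostnames `ps0`, `worker0`, etc. are hypothetical:

```python
def parse_cluster_spec(spec):
    """Parse a cluster spec string into {job_name: [task_address, ...]}.

    Assumed format: jobs separated by ',', each job written NAME|ADDR;ADDR;...
    """
    cluster = {}
    for job in spec.split(","):
        name, tasks = job.split("|")
        cluster[name] = tasks.split(";")
    return cluster

# The single-process example from above:
print(parse_cluster_spec("local|localhost:2222"))
# {'local': ['localhost:2222']}

# A hypothetical two-job cluster:
print(parse_cluster_spec("ps|ps0:2222,worker|worker0:2222;worker1:2222"))
# {'ps': ['ps0:2222'], 'worker': ['worker0:2222', 'worker1:2222']}
```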
Specifying these parameters manually can be tedious, especially for large clusters. We are developing tools for launching tasks programmatically, e.g. using a cluster manager such as Kubernetes. If there are cluster managers you would like to see supported, please raise a GitHub issue.
Specifying distributed devices in your model
To place operations on a particular process in a distributed environment, you can use the same `tf.device()` function that is used to specify whether ops run on the CPU or GPU. For example:
```python
with tf.device("/job:ps/task:0"):
  weights_1 = tf.Variable(...)
  biases_1 = tf.Variable(...)

with tf.device("/job:ps/task:1"):
  weights_2 = tf.Variable(...)
  biases_2 = tf.Variable(...)

with tf.device("/job:worker/task:7"):
  input, labels = ...
  layer_1 = tf.nn.relu(tf.matmul(input, weights_1) + biases_1)
  logits = tf.nn.relu(tf.matmul(layer_1, weights_2) + biases_2)
  # ...
  train_op = ...

with tf.Session("grpc://worker7:2222") as sess:
  for _ in range(10000):
    sess.run(train_op)
```
In the above example, the variables are created on two tasks in the `ps` job, and the compute-intensive part of the model is created on the `worker` job. TensorFlow automatically transfers data between the jobs (from `ps` to `worker` for the forward pass, and from `worker` to `ps` for applying gradients).
A common training configuration (data-parallel training) consists of shared parameters hosted on the `ps` job and multiple tasks on the `worker` job that train the same model. Each task generally runs on a different machine. There are several ways to implement this structure in TensorFlow, and we will provide simpler ways in the future. The main approaches are:
* Build a single graph (with `tf.Variable` nodes pinned to `/job:ps`) and create multiple copies of the model, each mapped to a different task in `/job:worker`. Each copy of the model has a different `train_op`, and for each worker `i` one or more client threads can call `sess.run(train_ops[i])`. This approach uses a single `tf.Session` whose target is one of the workers in the cluster.
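The one-client-thread-per-worker pattern described above can be sketched without TensorFlow. `FakeSession` below is a stand-in for the single shared `tf.Session`; in real code each thread would call `sess.run(train_ops[i])` on the actual session:

```python
import threading

class FakeSession:
    """Stand-in for a single shared tf.Session targeting one worker;
    it just records which op each client thread asked to run."""
    def __init__(self):
        self._lock = threading.Lock()
        self.calls = []

    def run(self, op):
        with self._lock:
            self.calls.append(op)

num_workers = 3
# One train_op per model copy, as in the in-graph replication above.
train_ops = ["train_op_%d" % i for i in range(num_workers)]
sess = FakeSession()

# One client thread per worker, each driving its own model copy.
threads = [threading.Thread(target=sess.run, args=(train_ops[i],))
           for i in range(num_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(sess.calls))
# ['train_op_0', 'train_op_1', 'train_op_2']
```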
* As above, but with the gradients from all workers averaged. See the CIFAR-10 multi-GPU trainer for an example of this form of replication. This approach implements synchronous training.
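The gradient averaging at the heart of this synchronous scheme can be sketched in plain Python (floats stand in for gradient tensors; this is an illustration, not the CIFAR-10 trainer's code):

```python
def average_gradients(worker_grads):
    """Average gradients elementwise across workers.

    worker_grads: one list of gradient values per worker
    (plain floats here in place of tensors).
    """
    n = len(worker_grads)
    return [sum(g) / n for g in zip(*worker_grads)]

# Two workers, each producing gradients for two parameters:
print(average_gradients([[1.0, 2.0], [3.0, 4.0]]))
# [2.0, 3.0]
```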
* Another approach to distributed training uses multiple graphs, one per worker, where each graph contains one set of parameters (pinned to `/job:ps`) and one copy of the model. The "container" mechanism is used to share variables between the graphs: when each variable is constructed, the optional `container` argument is set to the same value in each copy of the graph. For larger models this can be more efficient, because the per-worker graph is smaller. This approach uses multiple `tf.Session` objects, one per worker process, with each session pointing at a different worker. The `tf.Session` objects can be created either in a single Python client or in multiple clients.
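The container mechanism can be pictured as a shared store keyed by container and variable name. This is a deliberately simplified toy model (in TensorFlow the shared state actually lives on the server, not in the client):

```python
_containers = {}

def get_variable(name, initial_value, container=""):
    """Toy model of the container mechanism: requests for the same
    (container, name) pair resolve to the same shared slot."""
    key = (container, name)
    if key not in _containers:
        _containers[key] = {"value": initial_value}
    return _containers[key]

# Two graphs (e.g. two worker processes) pass the same container name:
w_graph1 = get_variable("weights", 0.0, container="trainer")
w_graph2 = get_variable("weights", 0.0, container="trainer")

w_graph1["value"] = 3.14
print(w_graph2["value"])  # 3.14 -- the update is visible to both graphs
```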
Client

A typical client builds a TensorFlow graph and uses a `tensorflow::Session` to interact with a cluster. Clients are usually written in Python or C++. Generally speaking, one client can interact with multiple servers concurrently (see the replicated training approaches above), and one server can serve multiple clients concurrently.
Cluster

A TensorFlow cluster contains one or more TensorFlow servers, partitioned into a set of named jobs, where each job comprises a list of tasks. A cluster is typically dedicated to a single relatively high-level goal, such as training a neural network with many machines in parallel.
Job

A job comprises a list of tasks that serve a common purpose. For example, a job named `ps` (for "parameter server") typically hosts nodes that store and update variables, while a job named `worker` typically hosts stateless nodes that perform compute-intensive work. Tasks in a job usually run on different machines.
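The `ps`/`worker` split and the job/task naming scheme can be written down concretely. The hostnames below are hypothetical; the dict has the shape that a cluster definition such as TF 1.x's `tf.train.ClusterSpec` accepts:

```python
# Hypothetical cluster: a "ps" job with 2 tasks and a "worker" job with 3.
cluster = {
    "ps": ["ps0:2222", "ps1:2222"],
    "worker": ["worker0:2222", "worker1:2222", "worker2:2222"],
}

# Each (job, index) pair names one task, usable as a device prefix:
device_names = ["/job:%s/task:%d" % (job, i)
                for job in sorted(cluster)
                for i in range(len(cluster[job]))]
print(device_names)
# ['/job:ps/task:0', '/job:ps/task:1',
#  '/job:worker/task:0', '/job:worker/task:1', '/job:worker/task:2']
```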
Master service

The master service is an RPC service that a client uses to interact with a set of distributed devices. It implements the `tensorflow::Session` interface and coordinates one or more worker services.
Task

A task typically corresponds to a single TensorFlow server process, belongs to a particular job, and has a unique index within that job's list of tasks.
TensorFlow server

A process running the `grpc_tensorflow_server` binary, which is a member of a cluster and exposes a master service and a worker service.
Worker service

An RPC service that executes parts of a TensorFlow graph.