Optimization with TensorFlow

TensorFlow is a free, open-source machine learning framework geared towards deep learning. Since training a neural network amounts to minimizing a loss function, optimization algorithms are at the heart of the framework, and we can just as well let TensorFlow solve general numerical optimization problems.

TensorFlow and TFX

For end-to-end machine learning (ML) workflows there is TensorFlow Extended (TFX), which can run on Kubeflow, an ML orchestration framework that leverages Kubernetes. Inside a TFX workflow the following components are available (a minimal pipeline sketch follows the list):

  • ExampleGen ingests and splits the input dataset. As of version 0.13, only CSV files and BigQuery result sets are supported!
  • StatisticsGen calculates statistics for the dataset.
  • SchemaGen examines the statistics and creates a data schema.
  • ExampleValidator looks for anomalies and missing values in the dataset.
  • Transform performs re-usable feature engineering on the dataset.
  • Trainer trains the model, for instance with (canned) estimators.
  • Evaluator performs analysis of the training results.
  • ModelValidator ensures that the model is ‘good enough’ to be pushed to production.
  • Pusher deploys the model to a serving infrastructure.
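
To give an idea of how these components fit together in code, here is a heavily abridged sketch loosely based on the Chicago taxi example that ships with TFX. The import paths and keyword arguments reflect the 0.13-era API and have changed in later releases, so treat it as indicative rather than definitive; _data_root is a placeholder for a directory of CSV files.

from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen
from tfx.components.example_validator.component import ExampleValidator
from tfx.components.schema_gen.component import SchemaGen
from tfx.components.statistics_gen.component import StatisticsGen
from tfx.utils.dsl_utils import csv_input

_data_root = './data'  # placeholder

# Ingest the CSV data, compute statistics, infer a schema, and check for anomalies.
example_gen = CsvExampleGen(input_base=csv_input(_data_root))
statistics_gen = StatisticsGen(input_data=example_gen.outputs.examples)
infer_schema = SchemaGen(stats=statistics_gen.outputs.output)
validate_stats = ExampleValidator(stats=statistics_gen.outputs.output,
                                  schema=infer_schema.outputs.output)

Transform, Trainer, Evaluator, ModelValidator and Pusher are wired up in the same fashion, and the resulting list of components is handed to an orchestrator such as Airflow or Kubeflow.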

For local development, Apache Airflow is used in the current TFX repository set-up, although it's possible to use any orchestration framework, including Spotify's open-source Luigi. TensorFlow Serving, a C++ backend with built-in support for hot-swapping models in production, can be used to serve the saved model.
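
TensorFlow Serving's own documentation suggests its stock Docker image as the easiest way to get started. A minimal sketch, assuming a SavedModel exported to ./my_model with a numeric version subdirectory such as 1/ (both names are placeholders):

docker pull tensorflow/serving

docker run -p 8501:8501 \
  --mount type=bind,source=$(pwd)/my_model,target=/models/my_model \
  -e MODEL_NAME=my_model -t tensorflow/serving

Predictions can then be requested over REST at localhost:8501/v1/models/my_model:predict.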

The TFX User Guide and the various TensorFlow Dev Summit talks contain plenty of useful information, including an overview of TFX and Kubeflow Pipelines.

Optimization

Only the Transform and Trainer components contain TensorFlow user code, and that code is what I'll focus on now, in particular the optimizers.

Algorithms

TensorFlow comes with a few optimization algorithms.

  • The GradientDescentOptimizer is the simplest and most intuitive option. With a high learning rate it can easily overshoot the optimum, and with a low learning rate it is excruciatingly slow; it is also prone to oscillating between values. Its learning rate is typically set in the range [0.0001, 0.1] and gradually decreased until the algorithm converges, which makes it a rather finicky optimizer. (The update rule, together with the momentum variant, is shown after this list.)
  • The next-best alternative is the MomentumOptimizer, which has a lot in common with the gradient-descent optimizer but typically converges more quickly and suffers less from oscillations.
  • The AdagradOptimizer is an adaptive gradient-descent algorithm that maintains a per-variable learning rate, which allows different variables to converge at different rates.
  • The RMSPropOptimizer is a gradient-descent algorithm that divides the learning rate by a decaying average of recent squared gradients, so the effective step size does not shrink towards zero the way it tends to with Adagrad.
  • Probably the best optimization algorithm in TensorFlow is the AdamOptimizer. It’s often recommended as the default and only in exceptional cases should anyone pick another algorithm.
  • An algorithm that ‘follows the regularized leader’ is the FtrlOptimizer. In each iteration the solution with the least loss over all past rounds (i.e. the ‘leader’) is selected. This is known as ‘follow the leader’ (FTL). The problem with that is that it can easily oscillate between local minima. To stabilize FTL, a regularization term is added to the loss up to the current step, which is referred to as FTRL, ‘follow the regularized leader’.
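
For reference, a single update of plain gradient descent (learning rate \(\eta\)) and of the momentum variant (momentum \(\mu\), in the accumulator form used by TensorFlow's MomentumOptimizer) look as follows:

\[\theta_{t+1} = \theta_t - \eta\,\nabla_\theta L(\theta_t)\] \[v_{t+1} = \mu\,v_t + \nabla_\theta L(\theta_t),\qquad \theta_{t+1} = \theta_t - \eta\,v_{t+1}\]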

There are also Proximal* versions of both the Adagrad and gradient-descent optimizers (ProximalAdagradOptimizer and ProximalGradientDescentOptimizer). These proximal methods can handle convex optimization problems with non-differentiable terms, such as an L1 penalty.
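
A minimal sketch of how they are constructed: the regularization strengths are simply extra constructor arguments (the toy |x| loss below is my own choice to illustrate a non-smooth term):

import tensorflow as tf

x = tf.Variable(5.0)
loss = tf.abs(x)  # non-differentiable at 0, where the proximal step helps

# ProximalAdagradOptimizer accepts the same two regularization arguments.
optimizer = tf.train.ProximalGradientDescentOptimizer(
    learning_rate=0.1,
    l1_regularization_strength=0.01,
    l2_regularization_strength=0.0)
train_op = optimizer.minimize(loss)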

Internally, these optimization algorithms use automatic differentiation to obtain accurate gradients at different points of interest.
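
In practice this means you never have to supply gradients yourself. A small sketch of the mechanism in TensorFlow 1.x, using the first loss function from the Problems section below (the starting point (3, 1) is arbitrary):

import tensorflow as tf

x = tf.Variable(3.0)
y = tf.Variable(1.0)
loss = 4.0 * x * x + 4.0 * y * y - 4.0 * x * y - 12.0 * x

# tf.gradients adds graph nodes that compute d(loss)/dx and d(loss)/dy
# via reverse-mode automatic differentiation.
grads = tf.gradients(loss, [x, y])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads))  # [8x - 4y - 12, 8y - 4x] at (3, 1), i.e. [8.0, -4.0]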

Set-up

As long as you have Python 3.4 or newer, you can get started with TensorFlow on your machine by:

  • installing it directly with pip (pip install tensorflow);
  • installing it inside a virtual environment such as virtualenv or conda;
  • running the official Docker image.

The third option is probably the easiest and does not even require a local Python installation:

docker pull tensorflow/tensorflow:latest-py3

docker run -it --rm -p 6006:6006 -v $(pwd):/tmp -w /tmp \
  tensorflow/tensorflow:latest-py3 python ./script.py

Here, script.py is a Python/TensorFlow script you wish to execute. You can also run an interactive Python session:

docker run -it --rm -p 6006:6006 -w /tmp \
  tensorflow/tensorflow:latest-py3 python

Since Python 2.7 reaches its end of life in 2020, there is little reason to not use Python 3, hence the tag.

If you are not familiar with the Docker CLI, let’s break down the entire command:

  • docker run tensorflow/tensorflow:latest-py3 python runs python in the specified container image;
  • -it attaches an interactive terminal to the container;
  • --rm removes the container after it exits;
  • -p 6006:6006 exposes port 6006 from the container as 6006 on the host, so we can use localhost:6006 in any browser to see TensorBoard;
  • -v $(pwd):/tmp mounts the present working directory on the host as /tmp inside the container;
  • -w /tmp sets the work directory to /tmp.

Problems

As (unconstrained) optimization problems I'll use the ones discussed in an introduction to genetic algorithms in optimization. The objective (or loss) functions, defined in the losses dictionary of the script below, are:

\[\min_{(x,y)\in\mathbb{R}^2}{4x^2+4y^2-4xy-12x}\] \[\min_{x\in\mathbb{R}}{\sin{(x)}\,\mathrm{e}^{-ax^2}}\]

The first has a global minimum at (x, y) = (2, 1). The second, with a = 0.01 as in the code, attains its maximum at approximately x = 1.540005942; since the function is odd in x, its minimum lies at approximately x = -1.540005942, which is what the minimizers below home in on.
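
The minimum of the first function is easy to verify by hand: setting the gradient to zero gives

\[\frac{\partial f}{\partial x} = 8x - 4y - 12 = 0, \qquad \frac{\partial f}{\partial y} = 8y - 4x = 0,\]

so y = x/2 and 6x - 12 = 0, i.e. (x, y) = (2, 1). The full script, optimize.py, is listed below.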

import os
import sys

import argparse
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

FLAGS = None


def main(_):
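    # Read the hyper-parameters from the command-line flags.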
    learning_rate = FLAGS.learning_rate
    epochs = FLAGS.epochs

    x_var = tf.Variable(0.0, name='x_opt')
    y_var = tf.Variable(0.0, name='y_opt')
    step_var = tf.Variable(0, trainable=False)

    losses = {1: 4.0 * x_var * x_var + 4.0 * y_var * y_var - 4.0 * x_var * y_var - 12.0 * x_var,
              2: tf.math.sin(x_var) * tf.math.exp(- 0.01 * x_var * x_var)}

    optimizers = {'gradientdescent': tf.train.GradientDescentOptimizer(learning_rate),
                  'momentum': tf.train.MomentumOptimizer(learning_rate, FLAGS.momentum),
                  'adagrad': tf.train.AdagradOptimizer(learning_rate),
                  'adam': tf.train.AdamOptimizer(learning_rate),
                  'ftrl': tf.train.FtrlOptimizer(learning_rate),
                  'rmsprop': tf.train.RMSPropOptimizer(learning_rate)}

    loss = losses[FLAGS.loss]
    optimizer = optimizers[FLAGS.optimizer.lower()]
    # Objective function must be defined outside of tf.Session() for all stateful optimizers
    # (i.e. all but GradientDescent).
    # See: https://github.com/tensorflow/tensorflow/issues/8057
    objective = optimizer.minimize(loss, global_step=step_var)
    optimizer_name = type(optimizer).__name__

    init = tf.global_variables_initializer()

    saver = tf.train.Saver()

    (x_op, y_op) = (tf.summary.scalar('x', x_var), tf.summary.scalar('y', y_var))
    summary_op = tf.summary.merge([x_op, y_op])
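    # One log directory per optimizer/loss/learning-rate/epochs combination, so runs can be compared in TensorBoard.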
    log_file = '{}/log_optimizer={}_loss={}_learning_rate={}_epochs={}'.format(FLAGS.log,
                                                                               optimizer_name.lower(),
                                                                               FLAGS.loss,
                                                                               learning_rate,
                                                                               epochs)
    writer = tf.summary.FileWriter(log_file, graph=tf.get_default_graph())

    with tf.Session() as sess:
        sess.run(init)

        for epoch in range(epochs):
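            # Run one optimization step and fetch the current global step, estimates, and merged summaries.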
            _, step, x_res, y_res, summary = sess.run(
                [objective, step_var, x_var, y_var, summary_op])

            tf.logging.info('%s - epoch %d / step %d: estimated optimum = (%f, %f)' % (
                optimizer_name, epoch, step, x_res, y_res))

            writer.add_summary(summary, global_step=step)
            writer.flush()

        saver.save(sess, os.getcwd() + '/output')
        x_opt = sess.run(x_var)
        y_opt = sess.run(y_var)

        tf.logging.info('%s - computed optimum = (%f, %f)' % (optimizer_name, x_opt, y_opt))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(prog='TensorFlow optimizer demonstration',
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        '--optimizer',
        type=str,
        help='Optimizer to be used: GradientDescent, Momentum, Adagrad, Adam, Ftrl, or RMSProp',
        default='GradientDescent'
    )
    parser.add_argument(
        '--loss',
        type=int,
        help='Loss function to be used: 1 or 2',
        default=1
    )
    parser.add_argument(
        '--log',
        type=str,
        help='Location of the logs',
        default='/tmp/tf'
    )
    parser.add_argument(
        '--learning-rate',
        type=float,
        help='Learning rate for optimizer',
        default=0.1
    )
    parser.add_argument(
        '--momentum',
        type=float,
        help='Momentum for the momentum optimizer',
        default=0.01
    )
    parser.add_argument(
        '--epochs',
        type=int,
        help='Number of epochs/iterations',
        default=100
    )

    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

You execute the script with the command

docker run -it --rm -p 6006:6006 -v $(pwd):/tmp -w /tmp \
  tensorflow/tensorflow:latest-py3 python ./optimize.py \
  --optimizer Adagrad \
  --loss 1 \
  --learning-rate 0.01 \
  --epochs 50

This presumes the Python script is called optimize.py and is located in the present working directory on the host. The command runs the script for 50 epochs using the AdagradOptimizer with a learning rate of 0.01 on the first loss function. The optimizer names are case-insensitive.

You can also supply -h or --help to obtain information on the accepted arguments.

TensorBoard

The code writes TensorBoard log files to /tmp/tf, the default --log location inside the container's work directory. Inside an interactive session on the container you can execute tensorboard --logdir /tmp/tf to start TensorBoard and visualize the progress. On the host machine (i.e. the computer on which you've executed docker run), navigate to localhost:6006 to see TensorBoard.
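
Alternatively, you can start TensorBoard in a container of its own, reusing the same image, mount, and port mapping:

docker run -it --rm -p 6006:6006 -v $(pwd):/tmp -w /tmp \
  tensorflow/tensorflow:latest-py3 tensorboard --logdir /tmp/tf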

To run all optimizers, let’s create a simple script optimize.sh:

#!/bin/bash
rm -rf tf

OPTIMIZERS=("GradientDescent" "Momentum" "Adagrad" "Adam" "FTRL" "RMSProp")
LOSSES=(1 2)

for optimizer in "${OPTIMIZERS[@]}"; do
  for loss in "${LOSSES[@]}"; do
    python optimize.py --optimizer "$optimizer" --loss "$loss"
  done
done

tensorboard --logdir /tmp/tf

We make the script executable (chmod +x optimize.sh) and run it with:

docker run -it --rm -p 6006:6006 -v $(pwd):/tmp -w /tmp \
  tensorflow/tensorflow:latest-py3 ./optimize.sh

After successful completion you can navigate to localhost:6006 and you should see TensorBoard:

TensorBoard for all optimizers for the first loss function

By default, it shows the values for x and y for each epoch for all optimizers and loss functions. You can use a regex loss=1 or loss=2 in the ‘Runs’ menu on the left (not shown here) to filter only for a single loss function.

TensorBoard for all optimizers for the second loss function

For the first loss function we see that Adagrad and FTRL have not converged after 100 iterations for the given (default) learning rate. For the second loss function, only Adam and RMSProp have approximated the optimum within 100 epochs.