Logging Data

Logging Scalar Metrics

Saving Matplotlib Figures

log_video([*frames...

log_images([*images...

log_pkl(data...

log_text("your text...

Save PyTorch modules

Save TensorFlow variables

from ml_logger import logger
# ~> logging data to /tmp/ml-logger-debug
logger.configure('/tmp/ml-logger-debug')

i = 0  # the current iteration, e.g. the index of your training loop
logger.log(metrics={'some_val/smooth': 10, 'status': f"step ({i})"}, reward=20, timestep=i)
# flush the data, otherwise the values would be overwritten by new values in the next iteration.
logger.flush()
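To see why flush() matters, here is a minimal sketch of a buffered key-value logger. It assumes (this is not ml_logger's actual implementation) that log() merges keys into a pending row and flush() commits that row; the class name RowBuffer is hypothetical.

```python
class RowBuffer:
    """Hypothetical buffered logger: log() updates a pending row, flush() commits it."""

    def __init__(self):
        self.pending = {}  # key/value pairs for the current, un-flushed row
        self.rows = []     # committed rows (stand-in for the file system or server)

    def log(self, metrics=None, **kwargs):
        # Later calls overwrite earlier values stored under the same key.
        self.pending.update(metrics or {})
        self.pending.update(kwargs)

    def flush(self):
        # Commit a copy of the pending row and start a fresh one.
        if self.pending:
            self.rows.append(dict(self.pending))
            self.pending.clear()


buf = RowBuffer()
for i in range(3):
    buf.log(metrics={"some_val/smooth": 10}, reward=20 + i, timestep=i)
buf.flush()
print(buf.rows)  # only the values from the last iteration survive
```

Without a flush() inside the loop, the rows for timesteps 0 and 1 are lost, which is exactly what the comment in the example above warns about.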

outputs ~>

╒════════════════════╤════════════════════════════╕
│       reward       │             20             │
├────────────────────┼────────────────────────────┤
│      timestep      │             0              │
├────────────────────┼────────────────────────────┤
│  some val/smooth   │             10             │
├────────────────────┼────────────────────────────┤
│       status       │          step (0)          │
├────────────────────┼────────────────────────────┤
│      timestamp     │'2018-11-04T11:37:03.324824'│
╘════════════════════╧════════════════════════════╛

Asynchronously Logging the Summary of Many Training Metrics

A common scenario is that you only want to upload averaged statistics of your metrics. A pattern that @jachiam uses is the following: store_metrics(), peek_stored_metrics(), and log_metrics_summary().

# You log lots of metrics during training.
for i in range(100):
    logger.store_metrics(metrics={'some_val/smooth': 10}, some=20, timestep=i)
# you can peek at what's inside the cache and print out a table like this:
logger.peek_stored_metrics(len=4)

outputs ~>

     some      |   timestep    |some_val/smooth
━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━
      20       |       0       |      10
      20       |       1       |      10
      20       |       2       |      10
      20       |       3       |      10
# The metrics are stored in-memory. Now we need to actually log the summaries:
logger.log_metrics_summary(silent=True)
# outputs ~> . (data is now logged to the server)
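The store / peek / summarize pattern can be sketched with an in-memory cache that collects values per key and only emits averages. This is an illustration under assumptions, not ml_logger's code: MetricsCache is hypothetical, and naming the averaged keys with a "/mean" suffix is our own choice.

```python
from collections import defaultdict
from itertools import islice
from statistics import mean


class MetricsCache:
    """Hypothetical in-memory cache for the store / peek / summarize pattern."""

    def __init__(self):
        self.data = defaultdict(list)  # key -> list of stored values

    def store_metrics(self, metrics=None, **kwargs):
        # Append every value to its key's list; nothing is uploaded yet.
        for key, value in {**(metrics or {}), **kwargs}.items():
            self.data[key].append(value)

    def peek(self, n=4):
        # Return the first n stored rows (columns sorted by key) without clearing the cache.
        keys = sorted(self.data)
        rows = zip(*(self.data[k] for k in keys))
        return keys, list(islice(rows, n))

    def summarize(self):
        # Average each key, then clear the cache so only the summary is sent.
        summary = {f"{k}/mean": mean(v) for k, v in self.data.items()}
        self.data.clear()
        return summary


cache = MetricsCache()
for i in range(100):
    cache.store_metrics(metrics={"some_val/smooth": 10}, some=20, timestep=i)
keys, rows = cache.peek(4)
summary = cache.summarize()
print(summary["timestep/mean"])  # 49.5
```

Only one row of averages leaves the process per summary, which is why this pattern is cheap even when metrics are stored on every iteration.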

Configuring The Experiment Folder

from ml_logger import logger, Color, percent
from datetime import datetime

now = datetime.now()
logger.configure(log_directory="/tmp/ml-logger-demo", prefix=f"deep_Q_learning/{now:%Y%m%d-%H%M%S}")

This is a singleton pattern, similar to matplotlib.pyplot. However, you can also use the logger constructor:

from datetime import datetime
from ml_logger import ML_Logger

now = datetime.now()
logger = ML_Logger(log_directory="/tmp/ml-logger-demo", prefix=f"deep_Q_learning/{now:%Y%m%d-%H%M%S}")
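Both calls build the run prefix with a datetime format spec inside an f-string. A quick check with a fixed timestamp (made up for illustration, in place of datetime.now()) shows the folder name this produces:

```python
from datetime import datetime

# A fixed timestamp chosen for illustration instead of datetime.now()
now = datetime(2018, 11, 4, 11, 37, 3, 324824)

prefix = f"deep_Q_learning/{now:%Y%m%d-%H%M%S}"
print(prefix)  # deep_Q_learning/20181104-113703

# Adding %f (microseconds) keeps names unique for runs started within the same second
stamp = f"{now:%Y%m%d-%H%M%S.%f}"
print(stamp)  # 20181104-113703.324824
```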

Logging Text and Metrics

logger.log({"some_var/smooth": 10}, some=Color(0.85, 'yellow', percent), step=3)

Colored output (the values are printed in yellow):

╒════════════════════╤════════════════════╕
│  some var/smooth   │         10         │
├────────────────────┼────────────────────┤
│        some        │       85.0%        │
╘════════════════════╧════════════════════╛

Logging Matplotlib pyplot Figures

ML-Logger supports every file format that pyplot supports; the format is chosen by the file extension. To save a figure locally or remotely:

import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(-5, 5)

plt.plot(xs, np.cos(xs), label='Cosine Func')
# saves current figure
logger.savefig('cosine_function.pdf')

fig = plt.figure(figsize=(4, 2))
xs = np.linspace(0, 5, 1000)
plt.plot(xs, np.cos(xs))
# saves a particular figure
logger.savefig("face_02.png", fig=fig)
# close the figure to release the memory
plt.close()

Logging Images

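try:
    from scipy.datasets import face as load_face  # SciPy >= 1.10 (requires the pooch package)
except ImportError:
    from scipy.misc import face as load_face  # older SciPy releases

face = load_face()
logger.log_image(face, "face.png")

# to log a grid of images
logger.log_image([face] * 20, key="face.png", n_rows=4, n_cols=5)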

For details, please refer to the API documentation.
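The n_rows and n_cols arguments tile many images into a single picture. Below is a minimal NumPy sketch of such tiling, assuming all images share one shape and the grid is filled row by row; make_grid is a hypothetical helper, not ml_logger's implementation.

```python
import numpy as np


def make_grid(images, n_rows, n_cols):
    """Hypothetical helper: tile equally-shaped (H, W) or (H, W, C) images into one array."""
    images = np.asarray(images)
    n, h, w = images.shape[:3]
    if n > n_rows * n_cols:
        raise ValueError(f"{n} images do not fit in a {n_rows}x{n_cols} grid")
    # Pad with blank images so the grid is completely filled.
    pad = np.zeros((n_rows * n_cols - n,) + images.shape[1:], dtype=images.dtype)
    images = np.concatenate([images, pad], axis=0)
    # (rows, cols, H, W, ...) -> (rows, H, cols, W, ...) -> (rows*H, cols*W, ...)
    grid = images.reshape((n_rows, n_cols) + images.shape[1:])
    grid = np.swapaxes(grid, 1, 2)
    return grid.reshape((n_rows * h, n_cols * w) + images.shape[3:])


faces = np.ones((20, 8, 6, 3), dtype=np.uint8)  # stand-ins for 20 RGB images
print(make_grid(faces, n_rows=4, n_cols=5).shape)  # (32, 30, 3)
```

Image k lands in grid row k // n_cols and column k % n_cols, matching the reading order of a 4 x 5 grid.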

Logging Videos

It is especially hard to visualize RL training sessions on a remote computer. With ML-Logger this is easy and fast. We optimized the serialization and transport process so that a large stack of video frames is first compressed with ffmpeg before being sent over the wire.

The compression ratio (and the resulting speed-up) can be as high as 2000:1.

import numpy as np

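def im(x, y):
    """Return a 200x200 canvas with a 9x9 square of ones centered at (x, y)."""
    canvas = np.zeros((200, 200))
    # equivalent to setting canvas[i, j] = 1 where x - 5 < i < x + 5 and y - 5 < j < y + 5
    canvas[max(x - 4, 0):x + 5, max(y - 4, 0):y + 5] = 1
    return canvas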

frames = [im(100 + i, 80) for i in range(20)]

logger.log_video(frames, "test_video.mp4")
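To put the 2000:1 figure in perspective, here is the arithmetic for the raw size of the 20 frames above (np.zeros defaults to float64, i.e. 8 bytes per pixel). The compressed size is only what a 2000:1 ratio would imply, not a measurement:

```python
n_frames, height, width = 20, 200, 200
bytes_per_pixel = 8  # float64, the default dtype of np.zeros

raw_bytes = n_frames * height * width * bytes_per_pixel
print(raw_bytes)  # 6400000, i.e. 6.4 MB of raw tensor data

ratio = 2000  # best-case ratio quoted above; actual ratios depend on the content
compressed_bytes = raw_bytes / ratio
print(compressed_bytes)  # 3200.0 bytes if the 2000:1 ratio is reached
```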

Saving PyTorch Modules

PyTorch has a very nice module saving and loading API that has inspired the one in Keras. We make it easy to save this state dictionary (state_dict) to a server and load it back. This way you can load weights from any of your 100+ previous experiments without having to download them into your code repository.

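import torch.nn as nn
from ml_logger import logger


class FastCNN(nn.Module):
    """A tiny placeholder network; substitute your own model."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)


fastCNN = FastCNN()

# save the module's state_dict
logger.save_module(fastCNN, "models/fastCNN.pkl")

# You can inspect the data saved.
state_dict, = logger.load_pkl("models/fastCNN.pkl")

# To load the weights back into a module instance
logger.load_module(fastCNN, "models/fastCNN.pkl")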

Saving TensorFlow Models

The format TensorFlow uses to save models is opaque. I prefer to save model weights as a pickled dictionary, so that the weight files are transparent. ML_Logger offers helper functions to save and load checkpoints in this format:

# To save a checkpoint
from ml_logger import logger
# This example uses the TensorFlow 1.x graph API; on TensorFlow 2.x it is available through tf.compat.v1.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

logger.configure(log_directory="/tmp/ml-logger-demos")

x = tf.get_variable('x', shape=[], initializer=tf.constant_initializer(0.0))
y = tf.get_variable('y', shape=[], initializer=tf.constant_initializer(10.0))
c = tf.Variable(1000)

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

trainables = tf.trainable_variables()
logger.save_variables(trainables, path="variables.pkl", namespace="checkpoints")

which creates a file checkpoints/variables.pkl under /tmp/ml-logger-demos.
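The transparent format described above, weights as a pickled dictionary, can be sketched with the standard library alone. This assumes a flat {variable_name: value} mapping; the exact structure ml_logger writes may differ, and save_weights / load_weights are hypothetical helpers. The "x:0"-style keys follow TensorFlow's tensor naming and are used here only for illustration.

```python
import os
import pickle
import tempfile


def save_weights(weights, path):
    """Hypothetical helper: pickle a {name: value} dict, creating parent folders."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(weights, f)


def load_weights(path):
    """Hypothetical helper: load the {name: value} dict back; no TensorFlow needed."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Values mirror the x, y and c variables in the TensorFlow example above.
weights = {"x:0": 0.0, "y:0": 10.0, "Variable:0": 1000}
path = os.path.join(tempfile.mkdtemp(), "checkpoints", "variables.pkl")
save_weights(weights, path)
restored = load_weights(path)
print(sorted(restored))  # ['Variable:0', 'x:0', 'y:0']
```

Because the file is a plain dict, you can inspect a checkpoint from any Python session without rebuilding the graph.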

Visualization

An ideal visualization dashboard should be:

  1. Fast and instantaneous. Running on a headless AWS server? View the plots as if they were on your local computer.
  2. Searchable and performant, so that you don’t have to remember where last week's experiment lives.
  3. Able to answer questions across 100+ experiments. We make Google’s internal hyperparameter visualization tool available on your own computer.

Searching for Hyperparameters

Experiments are identified by the metrics.pkl file. You can log to the same metrics.pkl file multiple times; later parameter values overwrite earlier ones with the same key. We enforce namespaces in this file, so each keyword argument you pass to logger.log_parameters has to be a dictionary.

Args = dict(
    learning_rate=10,
    hidden_size=200
)
logger.log_parameters(Args=Args)
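The namespacing rule can be sketched as follows. The function below is hypothetical and only mirrors the two rules stated above: each keyword must map to a dict, and later values replace earlier ones. We assume the overwrite happens key by key within a namespace; ml_logger may instead replace the whole namespace.

```python
def log_parameters(store, **namespaces):
    """Hypothetical namespaced parameter store: each keyword must map to a dict."""
    for namespace, params in namespaces.items():
        if not isinstance(params, dict):
            raise TypeError(f"{namespace} must be a dict, got {type(params).__name__}")
        # Assumed semantics: later values overwrite earlier ones key by key.
        store.setdefault(namespace, {}).update(params)
    return store


store = {}
log_parameters(store, Args=dict(learning_rate=10, hidden_size=200))
log_parameters(store, Args=dict(learning_rate=1e-3))
print(store)  # {'Args': {'learning_rate': 0.001, 'hidden_size': 200}}
```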

How to launch the Vis App

This currently requires a Node.js and Yarn development environment. We plan to remove these requirements soon.

  1. Download this repository.
  2. Go to the ml-vis-app folder.
  3. Install the dev dependencies:
    1. Install Node.js (see its installation guide).
    2. Install Yarn (see its installation guide).
    3. Install the dependencies of this visualization app:
      1. yarn install
  4. In that folder, run yarn.

The IP address of the server is currently hard-coded here. To use the app with your own instrumentation server, overwrite this line. I’m planning to make this configuration more accessible.

Full Logging API

from ml_logger import logger, Color, percent

logger.log_params(G=dict(some_config="hey"))
logger.log(some=Color(0.1, 'yellow'), step=0)
logger.log(some=Color(0.28571, 'yellow', lambda v: "{:.5f}%".format(v * 100)), step=1)
logger.log(some=Color(0.85, 'yellow', percent), step=2)
logger.log({"some_var/smooth": 10}, some=Color(0.85, 'yellow', percent), step=3)
logger.log(some=Color(10, 'yellow'), step=4)

Colored output (the values are printed in yellow):

╒════════════════════╤════════════════════╕
│        some        │        0.1         │
╘════════════════════╧════════════════════╛
╒════════════════════╤════════════════════╕
│        some        │     28.57100%      │
╘════════════════════╧════════════════════╛
╒════════════════════╤════════════════════╕
│        some        │       85.0%        │
╘════════════════════╧════════════════════╛
╒════════════════════╤════════════════════╕
│  some var/smooth   │         10         │
├────────────────────┼────────────────────┤
│        some        │       85.0%        │
╘════════════════════╧════════════════════╛
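The Color wrapper pairs a value with a terminal color and an optional formatter. A minimal sketch of how such a wrapper can work, using ANSI escape codes: Colored and as_percent are hypothetical stand-ins for ml_logger's Color and percent, whose internals may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

ANSI_COLORS = {"red": "\033[31m", "green": "\033[32m", "yellow": "\033[33m"}
ANSI_RESET = "\033[0m"


def as_percent(v):
    """Hypothetical formatter: a fraction as a percentage with one decimal place."""
    return f"{v * 100:.1f}%"


@dataclass
class Colored:
    """Hypothetical stand-in for Color: a value, a color name and an optional formatter."""
    value: Any
    color: str
    formatter: Optional[Callable[[Any], str]] = None

    def text(self):
        # Apply the formatter if one was given, otherwise fall back to str().
        return self.formatter(self.value) if self.formatter else str(self.value)

    def __str__(self):
        # Wrap the formatted text in ANSI escape codes so terminals render it in color.
        return f"{ANSI_COLORS[self.color]}{self.text()}{ANSI_RESET}"


print(Colored(0.85, "yellow", as_percent))
print(Colored(0.28571, "yellow", lambda v: "{:.5f}%".format(v * 100)))
```

Keeping the formatter separate from the color means the plain text (for files or the server) and the colored text (for stdout) come from the same value.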

In your project files, do:

import numpy as np
from params_proto import cli_parse
from ml_logger import logger

try:
    from scipy.datasets import face as load_face  # SciPy >= 1.10 (requires the pooch package)
except ImportError:
    from scipy.misc import face as load_face  # older SciPy releases


@cli_parse
class Args:
    seed = 1
    D_lr = 5e-4
    G_lr = 1e-4
    Q_lr = 1e-4
    T_lr = 1e-4
    plot_interval = 10
    log_dir = "http://54.71.92.65:8081"
    log_prefix = "ml-logger-debug"

logger.configure(log_directory="http://some.ip.address.com:2000", prefix="your-experiment-prefix!")
logger.log_params(Args=vars(Args))
logger.log_file(__file__)


for epoch in range(10):
    logger.log(step=epoch, D_loss=0.2, G_loss=0.1, mutual_information=0.01)
    logger.log_key_value(epoch, 'some string key', 0.0012)
    # when the step index updates, logger flushes all of the key-value pairs to file system/logging server

logger.flush()

# Images
face = load_face()
face_bw = load_face(gray=True)
logger.log_image(index=4, color_image=face, black_white=face_bw)
image_bw = np.zeros((64, 64, 1))
image_bw_2 = load_face(gray=True)[::4, ::4]

# log a short animation, indexed by the last epoch
logger.log_image(epoch, animation=[face] * 5)
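The comment in the training loop says the logger flushes buffered key-value pairs whenever the step index changes. A minimal sketch of that behavior, assuming rows are keyed by step; StepLogger is hypothetical and ml_logger's internals may differ.

```python
class StepLogger:
    """Hypothetical logger that commits the pending row whenever `step` changes."""

    def __init__(self):
        self.step = None
        self.pending = {}
        self.rows = []

    def log(self, step, **kwargs):
        if self.step is not None and step != self.step:
            self.flush()  # a new step index: commit the previous step's row first
        self.step = step
        self.pending.update(kwargs)

    def flush(self):
        if self.pending:
            self.rows.append({"step": self.step, **self.pending})
            self.pending = {}


step_logger = StepLogger()
for epoch in range(3):
    step_logger.log(step=epoch, D_loss=0.2, G_loss=0.1)
step_logger.flush()  # the last step has no successor, so flush it explicitly
print([row["step"] for row in step_logger.rows])  # [0, 1, 2]
```

This is also why the example calls logger.flush() after the loop: the final step never sees a newer step index.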

This version of the logger also prints a table of the data you are logging to stdout. It:
- can silence stdout per key (per logger.log call)
- can print with color: logger.log(timestep, some_key=green(some_data))
- can print with custom formatting: logger.log(timestep, some_key=green(some_data, percent)), where percent is a formatter imported from ml_logger
- uses proper box-drawing characters for tables (please stop using | and +; use ``│`` and ``┼`` instead)
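A minimal sketch of printing a key/value table with box-drawing characters, in the style of the printouts in this README. format_table is a hypothetical helper, and the column width of 20 is an assumption chosen to match the tables shown here.

```python
def format_table(row, width=20):
    """Hypothetical helper: render a {key: value} dict as a two-column box-drawing table."""
    top = "╒" + "═" * width + "╤" + "═" * width + "╕"
    sep = "├" + "─" * width + "┼" + "─" * width + "┤"
    bottom = "╘" + "═" * width + "╧" + "═" * width + "╛"
    lines = [top]
    for i, (key, value) in enumerate(row.items()):
        if i > 0:
            lines.append(sep)  # separator between consecutive rows
        lines.append("│" + str(key).center(width) + "│" + str(value).center(width) + "│")
    lines.append(bottom)
    return "\n".join(lines)


print(format_table({"timestep": 1999, "episode": 10}))
```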

A typical printout of this logger looks like the following:

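from datetime import datetime
from ml_logger import ML_Logger

logger = ML_Logger(log_directory=f"/mnt/bucket/deep_Q_learning/{datetime.now():%Y%m%d-%H%M%S.%f}")

# G, RUN and Reporting are your own configuration classes, defined elsewhere in your project
logger.log_params(G=vars(G), RUN=vars(RUN), Reporting=vars(Reporting))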

outputs the following

═════════════════════════════════════════════════════
              G
───────────────────────────────┬─────────────────────
           env_name            │ MountainCar-v0
             seed              │ None
      stochastic_action        │ True
         conv_params           │ None
         value_params          │ (64,)
        use_layer_norm         │ True
         buffer_size           │ 50000
      replay_batch_size        │ 32
      prioritized_replay       │ True
            alpha              │ 0.6
          beta_start           │ 0.4
           beta_end            │ 1.0
    prioritized_replay_eps     │ 1e-06
      grad_norm_clipping       │ 10
           double_q            │ True
         use_dueling           │ False
     exploration_fraction      │ 0.1
          final_eps            │ 0.1
         n_timesteps           │ 100000
        learning_rate          │ 0.001
            gamma              │ 1.0
        learning_start         │ 1000
        learn_interval         │ 1
target_network_update_interval │ 500
═══════════════════════════════╧═════════════════════
             RUN
───────────────────────────────┬─────────────────────
        log_directory          │ /mnt/slab/krypton/machine_learning/ge_dqn/2017-11-20/162048.353909-MountainCar-v0-prioritized_replay(True)
          checkpoint           │ checkpoint.cp
           log_file            │ output.log
═══════════════════════════════╧═════════════════════
          Reporting
───────────────────────────────┬─────────────────────
     checkpoint_interval       │ 10000
        reward_average         │ 100
        print_interval         │ 10
═══════════════════════════════╧═════════════════════
╒════════════════════╤════════════════════╕
│      timestep      │        1999        │
├────────────────────┼────────────────────┤
│      episode       │         10         │
├────────────────────┼────────────────────┤
│    total reward    │       -200.0       │
├────────────────────┼────────────────────┤
│ total reward/mean  │       -200.0       │
├────────────────────┼────────────────────┤
│  total reward/max  │       -200.0       │
├────────────────────┼────────────────────┤
│time spent exploring│       82.0%        │
├────────────────────┼────────────────────┤
│    replay beta     │        0.41        │
╘════════════════════╧════════════════════╛