ml_logger.ML_Logger(prefix='', *prefixae, root='/home/docs/checkouts/readthedocs.org/user_builds/ml-logger/checkouts/stable/ml_logger/docs', user=None, access_token=None, buffer_size=2048, max_workers=None, asynchronous=None, summary_cache_opts: dict = None)
ML_Logger, a logging utility for ML training.
Async(clean=False, **kwargs)
Returns a context in which the logger logs asynchronously. The new asynchronous request pool is cached on the logging client, so this context can be entered repeatedly without creating a runaway number of parallel threads.
The context object can only be used once because it is created through a generator using the @contextmanager decorator.
| Returns: | context object |
|---|---|
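The two properties described above (a cached pool, and a context object that can be entered only once) can be illustrated with a minimal sketch. The `Client` class and its `async_mode` method are hypothetical stand-ins written for this example; they are not the ml_logger implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class Client:
    """Hypothetical stand-in for a logging client; not the ml_logger class."""

    def __init__(self):
        self.asynchronous = False
        self._pool = None  # created lazily, then reused

    def pool(self, max_workers=4):
        # Cache the pool so that repeated contexts do not create new ones.
        if self._pool is None:
            self._pool = ThreadPoolExecutor(max_workers=max_workers)
        return self._pool

    @contextmanager
    def async_mode(self):
        previous = self.asynchronous
        self.asynchronous = True
        try:
            yield self.pool()
        finally:
            self.asynchronous = previous  # restore the mode on exit


client = Client()

# Calling the method again builds a fresh context that reuses the same pool.
with client.async_mode() as first_pool:
    pass
with client.async_mode() as second_pool:
    pass
same_pool = first_pool is second_pool

# A single context object, however, cannot be entered twice.
ctx = client.async_mode()
with ctx:
    pass
try:
    with ctx:
        pass
    reused = True
except (AttributeError, RuntimeError):
    reused = False
```

In practice this means calling the context method each time you need it, rather than storing one context object and re-entering it.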
AsyncContext(clean=False, **kwargs)
Returns a context in which the logger logs asynchronously. The new asynchronous request pool is cached on the logging client, so this context can be entered repeatedly without creating a runaway number of parallel threads.
The context object can only be used once because it is created through a generator using the @contextmanager decorator.
| Returns: | context object |
|---|---|
Prefix(*praefixa, metrics=None, sep='/')
Returns a context in which the prefix of the logger is set to the given prefix.
| Parameters: | *praefixa – the new prefix |
|---|---|
| Returns: | context object |
PrefixContext(*praefixa, metrics=None, sep='/')
Returns a context in which the prefix of the logger is set to the given prefix.
| Parameters: | *praefixa – the new prefix |
|---|---|
| Returns: | context object |
Sync(clean=False, **kwargs)
Returns a context in which the logger logs synchronously. The new synchronous request pool is cached on the logging client, so this context can be entered repeatedly without creating a runaway number of parallel threads.
The context object can only be used once because it is created through a generator using the @contextmanager decorator.
| Returns: | context object |
|---|---|
SyncContext(clean=False, **kwargs)
Returns a context in which the logger logs synchronously. The new synchronous request pool is cached on the logging client, so this context can be entered repeatedly without creating a runaway number of parallel threads.
The context object can only be used once because it is created through a generator using the @contextmanager decorator.
| Returns: | context object |
|---|---|
abspath(*paths)
Returns the absolute path w.r.t. the logging directory.
print(logger.abspath("some", "path"))
# /home/ge/some/path
| Parameters: | *paths – positional arguments for each segment of the path. |
|---|---|
| Returns: | absolute path w.r.t. the logging directory (excluding the prefix) |
configure(prefix=None, *prefixae, root: str = None, user=None, access_token=None, asynchronous=None, max_workers=None, buffer_size=None, summary_cache_opts: dict = None, register_experiment=None, silent=False)
Configure an existing logger with updated configurations.
# LogClient Behavior
The logger.client would be re-constructed if
- root_dir is changed
- max_workers is not None
- asynchronous is not None
Because the http LogClient contains http thread pools, one shouldn’t call this configure function in a loop. Instead, use the logger.(A)syncContext() contexts. That context caches the pool so that you don’t create new thread pools again and again.
# Cache Behavior
Both the key-value cache and the summary cache are cleared if summary_cache_opts is not None. A new summary cache is created, whereas the old key-value cache is cleared.
# Print Buffer Behavior
If configure is called with a buffer_size that is not None, the old print buffer is cleared.
todo: I’m considering also clearing this buffer when the summary cache is updated. The use case for changing print_buffer_size is pretty small. Should probably just deprecate this.
# Registering New Experiment
This is a convenient default for new users. It prints out a link to the dashboard url.
diff(diff_directory='.', diff_filename='index.diff', ref='HEAD', verbose=False)
Example usage:
from ml_logger import logger
logger.diff()  # => this writes a diff file to the root of your logging directory.
| Returns: | string containing the content of the patch |
|---|---|
every(n=1, key='default', start_on=0)
Returns True every n counts. Use the key to count different intervals.
Example:
for i in range(100):
    if logger.every(10):
        print('every tenth count!')
    if logger.every(100, "hundred"):
        print('every 100th count!')
    if logger.every(10, "hundred", start_on=1):
        print('every 10th count starting from the first call: i =', i)
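One way to picture the keyed counting is the sketch below. The `EveryCounter` class is hypothetical, and the exact firing rule (in particular how `start_on` shifts the count) is an assumption made for illustration, not taken from the ml_logger source.

```python
from collections import defaultdict


class EveryCounter:
    """Hypothetical keyed counter; a sketch, not the ml_logger implementation."""

    def __init__(self):
        self.counts = defaultdict(int)

    def every(self, n=1, key="default", start_on=0):
        # Each key owns an independent counter that advances on every call.
        self.counts[key] += 1
        count = self.counts[key]
        # Assumed semantics: fire on calls start_on, start_on + n, ...
        return count >= start_on and (count - start_on) % n == 0


counter = EveryCounter()
# Default key: fires on the 10th, 20th, ... call.
tenth = [i for i in range(100) if counter.every(10)]
# Separate key with start_on=1: fires on the 1st, 11th, ... call.
offset = [i for i in range(100) if counter.every(10, "offset", start_on=1)]
```

Because each key has its own counter, calling `every` twice with the same key inside one loop iteration advances that counter twice.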
fn_info(fn)
Logs information about the caller’s stack (module, filename, etc.).
| Parameters: | fn – |
|---|---|
| Returns: | info = dict(name=_['__name__'], doc=_['__doc__'], module=_['__module__'], file=_['__globals__']['__file__']) |
get_dataframe(*keys, x_key=None, path='metrics.pkl', wd=None, num_bins=None, bin_size=1, silent=False, default=None, collect='std', verbose=False)
Returns a pandas.DataFrame object that contains metrics from all files.
| Returns: | pandas.DataFrame or None when no metric file is found. |
|---|---|
get_parameters(*keys, path='parameters.pkl', not_exist_ok=False, **kwargs)
Utility to obtain the hyperparameters as a flattened dictionary.
If keys are passed, returns an array with each item corresponding to those keys
lr, global_metric = logger.get_parameters('Args.lr', 'Args.global_metric')
print(lr, global_metric)
this returns:
0.03 'ResNet18L2'
Raises FileNotFoundError if the parameter file pointed to by the path is empty. To avoid this, add a default keyword value to the call:
param = logger.get_parameters('does_not_exist', default=None)
assert param is None, "should be the default value: None"
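The dotted keys in the example above (such as 'Args.lr') suggest that nested namespaces are flattened into a single level. A minimal sketch of that idea follows; `flatten` and `lookup` are hypothetical helpers written for this illustration, and the separator and default handling are assumptions.

```python
def flatten(nested, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def lookup(flat, *keys, **kwargs):
    """Hypothetical helper: values for the given keys, or a default if given."""
    if "default" in kwargs:
        values = [flat.get(key, kwargs["default"]) for key in keys]
    else:
        values = [flat[key] for key in keys]  # raises KeyError when missing
    return values[0] if len(values) == 1 else values


params = flatten({"Args": {"lr": 0.03, "global_metric": "ResNet18L2"}})
lr, global_metric = lookup(params, "Args.lr", "Args.global_metric")
missing = lookup(params, "does_not_exist", default=None)
```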
git_rev(branch)
Helper function used by `logger.__head__` that returns the git revision hash of the branch that you pass in.
Full reference here: https://stackoverflow.com/a/949391. The show-ref and for-each-ref commands both show a list of refs. We only need the ref hash for the revision, not the entire branch or tag.
glob(query, wd=None, recursive=True, start=None, stop=None)
Globs files under the work directory (wd). Note that wd affects the file paths being returned. The default is the current logging prefix. Use an absolute path (with a leading slash, /) to escape the logging prefix. Use two leading slashes for an absolute path on the host of the logging server.
with logger.PrefixContext("<your-run-prefix>"):
    runs = logger.glob('**/metrics.pkl')
    for _ in runs:
        exp_log = logger.load_pkl(_)
| Returns: | None if the directory does not exist (internal FileNotFoundError) |
|---|---|
glob_gs(query='', wd=None, max_results=1000, **kwargs)
Does not support wildcard or pagination, but we could add it in the future.
glob_s3(query='*', wd=None, max_keys=1000, **KWargs)
Does not support wildcard or pagination, but we could add it in the future.
iload_pkl(key, **kwargs)
Load a pkl file as an iterator.
for chunk in logger.iload_pkl("episodeyang/weights.pkl"):
    print(chunk)
or alternatively just read a single data file:
data, = logger.iload_pkl("episodeyang/weights.pkl")
When the key starts with a single slash, as in “/debug/some-run”, the leading slash is removed and the remaining path is pathJoin’ed with the data_dir of the server.
So if you want to access an absolute path on the filesystem of the logging server, you should prepend two leading slashes. This way, when the leading slash is removed, the remaining path is still absolute, and joining it with the data_dir has no effect.
“//home/ubuntu/ins-runs/debug/some-other-run” would point to the system absolute path.
| Returns: | an iterator. |
|---|---|
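The leading-slash rule described above can be sketched in a few lines. The `resolve` function and the `data_dir` value are made up for this illustration; the server-side code may differ.

```python
import posixpath


def resolve(key, data_dir="/var/ml-logger/data"):
    """Sketch of the leading-slash rule; data_dir is a made-up example value."""
    # Drop exactly one leading slash, if there is one.
    relative = key[1:] if key.startswith("/") else key
    # posixpath.join discards data_dir when the second part is absolute.
    return posixpath.join(data_dir, relative)


inside = resolve("/debug/some-run")
outside = resolve("//home/ubuntu/ins-runs/debug/some-other-run")
```

Here `inside` lands under the data directory, while `outside` stays an absolute path on the host because the remaining path still begins with a slash.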
load_file(*keys, path=None)
Returns the binary stream; the most versatile loader.
todo: check handling of line-separated files
When the key starts with a single slash, as in “/debug/some-run”, the leading slash is removed and the remaining path is pathJoin’ed with the data_dir of the server.
So if you want to access an absolute path on the filesystem of the logging server, you should prepend two leading slashes. This way, when the leading slash is removed, the remaining path is still absolute, and joining it with the data_dir has no effect.
“//home/ubuntu/ins-runs/debug/some-other-run” would point to the system absolute path.
| Parameters: | *keys – path string fragments that are joined together |
|---|---|
| Returns: | a tuple of each of the data chunks logged into the file. |
load_module(module, path='weights.pkl', wd=None, stream=True, tries=5, matcher=None, map_location=None)
Load torch module from file.
Now supports a matcher, which you can use to manipulate the prefix of a checkpoint file.
Using Matcher for Partial or Prefixed load
Imagine you are trying to load weights from a different module that is missing a prefix for its keys. (For example, you have an L2 metric function and are trying to load from a VAE embedding function baseline, which is only half of the network.)
from ml_logger import logger
net = models.ResNet()
logger.load_module(
    net,
    path="/checkpoint/geyang/resnet.pkl",
    # strip the 'embed.' prefix from each key before the lookup
    matcher=lambda d, k, p: d[k.replace('embed.', '')])
To fill-in if there are missing keys:
from ml_logger import logger
net = models.ResNet()
logger.load_module(
    net,
    path="/checkpoint/geyang/resnet.pkl",
    matcher=lambda d, k, p: d[k] if k in d else p[k])
| Returns: | None |
|---|---|
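To see what a matcher does without torch, the sketch below applies one to plain dictionaries. The argument order (loaded dictionary, key, current parameters) is read off the examples above and is an assumption; `load_with_matcher` and `strip_prefix_or_keep` are hypothetical names, and floats stand in for tensors.

```python
def load_with_matcher(current, loaded, matcher=None):
    """Build a new state dict for `current`, one key at a time.

    Assumed matcher signature: matcher(loaded, key, current) -> value.
    """
    if matcher is None:
        matcher = lambda d, k, p: d[k]
    return {key: matcher(loaded, key, current) for key in current}


current = {"embed.weight": 0.0, "embed.bias": 0.0, "head.weight": 0.5}
checkpoint = {"weight": 1.0, "bias": 2.0}  # keys lack the "embed." prefix


def strip_prefix_or_keep(d, k, p):
    # Look the key up without its prefix; fall back to the current value.
    short = k.replace("embed.", "")
    return d[short] if short in d else p[k]


merged = load_with_matcher(current, checkpoint, strip_prefix_or_keep)
```

The fallback branch is what makes a partial load possible: keys that the checkpoint does not cover keep the values the module already has.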
load_np(*keys)
Load a np file.
When the key starts with a single slash, as in “/debug/some-run”, the leading slash is removed and the remaining path is pathJoin’ed with the data_dir of the server.
So if you want to access an absolute path on the filesystem of the logging server, you should prepend two leading slashes. This way, when the leading slash is removed, the remaining path is still absolute, and joining it with the data_dir has no effect.
“//home/ubuntu/ins-runs/debug/some-other-run” would point to the system absolute path.
| Parameters: | keys – path strings |
|---|---|
| Returns: | a tuple of each of the data chunks logged into the file. |
load_pkl(*keys, start=None, stop=None, tries=1, delay=1)
Load a pkl file as a tuple. By default, each file contains 1 data item.
data, = logger.load_pkl("episodeyang/weights.pkl")
You could also load a particular data chunk by index:
data_chunks = logger.load_pkl("episodeyang/weights.pkl", start=10)
When the key starts with a single slash, as in “/debug/some-run”, the leading slash is removed and the remaining path is pathJoin’ed with the data_dir of the server.
So if you want to access an absolute path on the filesystem of the logging server, you should prepend two leading slashes. This way, when the leading slash is removed, the remaining path is still absolute, and joining it with the data_dir has no effect.
“//home/ubuntu/ins-runs/debug/some-other-run” would point to the system absolute path.
Because loading is usually synchronous, we can encounter connection errors. We don’t want to halt our training session because of these errors without retrying a few times.
For this reason, logger.load_pkl (and iload_pkl in equal measure) both take a tries argument and a delay argument. The delay argument is multiplied by a random number, to avoid a synchronized DDoS attack on your instrumentation server.
| Returns: | a tuple of each of the data chunks logged into the file. |
|---|---|
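The retry behaviour described above (a bounded number of tries, with the delay scaled by a random factor) can be sketched as follows. `with_retries` and `flaky_load` are hypothetical names for this illustration, and the choice of ConnectionError as the retried exception is an assumption.

```python
import random
import time


def with_retries(fn, tries=3, delay=0.01):
    """Call fn(), retrying on ConnectionError with a randomized pause."""
    for attempt in range(1, tries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == tries:
                raise  # out of attempts: surface the error
            # Scale the delay by a random factor so that many workers
            # do not all retry at the same instant.
            time.sleep(delay * random.random())


calls = {"count": 0}


def flaky_load():
    # Fails twice, then succeeds; stands in for a network request.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated connection error")
    return ("data",)


data, = with_retries(flaky_load, tries=5)
```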
load_text(*keys)
Returns the text content of the file (in a single chunk).
todo: check handling of line-separated files
When the key starts with a single slash, as in “/debug/some-run”, the leading slash is removed and the remaining path is pathJoin’ed with the data_dir of the server.
So if you want to access an absolute path on the filesystem of the logging server, you should prepend two leading slashes. This way, when the leading slash is removed, the remaining path is still absolute, and joining it with the data_dir has no effect.
“//home/ubuntu/ins-runs/debug/some-other-run” would point to the system absolute path.
| Parameters: | *keys – path string fragments |
|---|---|
| Returns: | a tuple of each of the data chunks logged into the file. |
load_variables(path, variables=None)
Load the saved values from a pickle file into tensorflow variables.
The variables that are loaded are the intersection between the tf.global_variables() list and the variables saved in the weight_dict. When a variable in the weight_dict is not present in the current session’s computation graph, no error is reported. When a variable present in the global variables list is not present in the weight_dict, no exception is raised.
The variables argument overrides the global variable list. When a variable present in this list doesn’t exist in the weight list, an exception should be raised.
log(*args, metrics=None, silent=False, sep=' ', end='\n', flush=None, cache=None, file=None, _prefix=None, **_key_values) → None
Logs dictionaries of data as key=value pairs at step == step.
Logs *args as a line and kwargs as key/value pairs.
| param args: | (str) strings or objects to be printed |
|---|---|
| param metrics: | (dict) a dictionary of key/value pairs to be saved in the key_value_cache |
| param sep: | (str) separator between the strings in *args |
| param end: | (str) string to use for the end of line. Defaults to “\n” |
| param silent: | (boolean) whether to also print to stdout or just log to file |
| param flush: | (boolean) whether to flush the text logs |
| param cache: | optional (str) a specific cache key, useful for scoped reporting |
| param kwargs: | key/value arguments |
| return: | None |
log_data(data, path=None, overwrite=False)
Append data to the file located at the path specified.
| Returns: | None |
|---|---|
log_line(*args, sep=' ', end='\n', flush=True, file=None, **kwargs)
This is similar to the print function. It logs *args with a default end-of-line suffix.
n = 10
logger.log_line("Mary", "has", n, "sheep.", color="green")
This outputs:
>>> "Mary has 10 sheep." (colored green)
| Returns: | None |
|---|---|
log_metrics(metrics=None, _prefix=None, silent=None, cache: Optional[str] = None, file: Optional[str] = None, flush=None, **_key_values) → None
log_metrics_summary(key_values: dict = None, cache: str = None, key_stats: dict = None, default_stats=None, silent=False, flush: bool = True, _prefix=None, **_key_modes) → None
Logs the statistical properties of the stored metrics. Clears the summary_cache under tiled mode, and keeps the data otherwise (under rolling mode).
To enable explicit mode without specifying *only_keys, set get_only to True
Modes for the Statistics:
| Returns: | None |
|---|---|
log_params(path='parameters.pkl', silent=False, **kwargs)
Log namespaced parameters in a list.
Examples:
logger.log_params(some_namespace=dict(layer=10, learning_rate=0.0001))
generates a table that looks like:
══════════════════════════════════════════
some_namespace
────────────────────┬─────────────────────
layer │ 10
learning_rate │ 0.0001
════════════════════╧═════════════════════
| Returns: | None |
|---|---|
log_text(text: str = None, filename=None, dedent=False, overwrite=False)
Logs and prints a string object.
This does not log to the buffer. It calls the low-level log_text method right away without buffering.
logger.log_text('''
    some text
    with indent''', dedent=True)
This logs without the indentation at the beginning of the text.
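The effect of dedent=True can be previewed with the standard library. Whether log_text uses textwrap.dedent internally is an assumption; the sketch only shows what removing the common leading indentation looks like.

```python
import textwrap

text = '''
    some text
    with indent'''

# Strip the indentation shared by all non-blank lines.
dedented = textwrap.dedent(text)
```

The leading newline is preserved, since only the common leading whitespace is removed.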
ping(status='running', interval=None)
Pings the instrumentation server to stay alive. Gets a control signal in return. The background thread is responsible for making the call. This method just returns the buffered signal synchronously.
| Returns: | tuple signals |
|---|---|
plt2data(fig)
Converts a Matplotlib figure to a 3D numpy array with four (RGBA) channels and returns it.
| Parameters: | fig – a matplotlib figure |
|---|---|
| Returns: | a numpy 3D array of RGBA values |
read_metrics(*keys, x_key=None, path='metrics.pkl', wd=None, num_bins=None, bin_size=1, silent=False, default=None, collect='std', verbose=False)
Returns a pandas.DataFrame object that contains metrics from all files.
| Returns: | pandas.DataFrame or None when no metric file is found. |
|---|---|
read_params(*keys, path='parameters.pkl', not_exist_ok=False, **kwargs)
Utility to obtain the hyperparameters as a flattened dictionary.
If keys are passed, returns an array with each item corresponding to those keys
lr, global_metric = logger.get_parameters('Args.lr', 'Args.global_metric')
print(lr, global_metric)
this returns:
0.03 'ResNet18L2'
Raises FileNotFoundError if the parameter file pointed to by the path is empty. To avoid this, add a default keyword value to the call:
param = logger.get_parameters('does_not_exist', default=None)
assert param is None, "should be the default value: None"
save_image(image, key: str, cmap=None, normalize=None)
Log a single image.
save_images(stack, key, n_rows=None, n_cols=None, cmap=None, normalize=None, background=1)
Log images as a composite of a grid. Images are input as a 4-D stack.
| Returns: | None |
|---|---|
save_module(module, path='weights.pkl', tries=3, backup=3.0)
Save torch module. Overwrites existing file.
Now supports nn.DataParallel modules. It first tries to access the state dict and, if that is not available, tries the module.module attribute.
module = nn.DataParallel(lenet)
logger.save_module(module, "checkpoint.pk")
When the model is large, this function uploads the weight dictionary (state_dict) in chunks. You can specify the size for the chunks, measured in number of tensors.
The conversion convention for the upload chunks is roughly 32bit, or 8 bytes for each np.float32 entry, so the upload size for chunk = 100,000 is roughly
100_000 * 8 * <base56 encoding ratio> ~ 960k.
| Returns: | None |
|---|---|
save_pkl(data, *keys, path=None, append=False, use_dill=False)
Save data in pkl format.
| Returns: | None |
|---|---|
save_pyplot(path='plot.png', fig=None, format=None, **kwargs)
| Returns: | (str) path to which the figure is saved. |
|---|---|
save_variables(variables, path='variables.pkl', keys=None)
Save tensorflow variables in a dictionary.
| Parameters: | keys – … the list of variables. This parameter allows you to override the key we use to save the variables. By default, we generate the keys from the variable name, without the :[0-9] at the end that points to the tensor (from the variable itself). |
|---|---|
| Returns: | None |
save_video(frame_stack, key, format=None, fps=20, **imageio_kwargs)
Let’s do the compression here. Video frames are first written to a temporary file and the file containing the compressed data is sent over as a file buffer.
Save a stack of images to
savefig(key, fig=None, format=None, **kwargs)
| Returns: | (str) path to which the figure is saved. |
|---|---|
since(*keys)
Returns a float in seconds when 1 key is passed, or a list of floats when multiple keys are passed in. The returned values are in seconds, measured as a delta in perf_counter.
Note: This is idempotent.
from ml_logger import logger
logger.start('loop', 'iter')
it = 0
for i in range(10):
    it += logger.split('iter')
print('iteration', it / 10)
print('loop', logger.since('loop'))
| Parameters: | *keys – positional arguments are timed together. |
|---|---|
| Returns: | float (in seconds) |
split(*keys)
Returns a float in seconds when 1 key is passed, or a list of floats when multiple keys are passed in.
Automatically de-dupes the keys, but returns the same number of intervals. Duplicates receive the same result.
Note: This is not idempotent, which is why it is not a property.
from ml_logger import logger
logger.split('loop', 'iter')
it = 0
for i in range(10):
    it += logger.split('iter')
print('iteration', it / 10)
print('loop', logger.split('loop'))
| Parameters: | *keys – positional arguments are timed together. |
|---|---|
| Returns: | float (in seconds) |
start(*keys)
Starts a timer, saved as a float in seconds. The returned perf_counter value has no meaning on its own; only the difference between two perf_counter values makes sense as a time delta.
Automatically de-dupes the keys, but returns the same number of intervals. Duplicates receive the same result.
from ml_logger import logger
logger.start('loop', 'iter')
it = 0
for i in range(10):
    it += logger.split('iter')
print('iteration', it / 10)
print('loop', logger.since('loop'))
| Parameters: | *keys – positional arguments are timed together. |
|---|---|
| Returns: | float (in seconds) |
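The difference between the idempotent since and the non-idempotent split can be sketched with a small keyed timer. The `Timers` class is hypothetical and handles a single key per call for brevity; returning 0 on the first split of an unseen key is an assumption.

```python
import time


class Timers:
    """Hypothetical keyed timers; a sketch, not the ml_logger implementation."""

    def __init__(self):
        self.marks = {}

    def start(self, *keys):
        now = time.perf_counter()
        for key in keys:
            self.marks[key] = now
        return now

    def since(self, key):
        # Read-only: repeated calls keep measuring from the same mark.
        return time.perf_counter() - self.marks[key]

    def split(self, key):
        # Not idempotent: returns the lap time and then moves the mark.
        now = time.perf_counter()
        lap = now - self.marks.get(key, now)
        self.marks[key] = now
        return lap


timers = Timers()
loop_mark = timers.start("loop", "iter")
total = 0.0
for _ in range(10):
    time.sleep(0.001)
    total += timers.split("iter")
elapsed = timers.since("loop")
```

Summing the laps of 'iter' approximates the time elapsed for 'loop', because split resets its mark on every call while since leaves the 'loop' mark untouched.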
stem(path)
Returns the stem of the filename in the path, with the extension removed.
path = "/Users/geyang/some-proj/experiments/rope-cnn.py"
logger.stem(path)
returns:
"/Users/geyang/some-proj/experiments/rope-cnn"
You can use this in combination with the truncate function:
_ = logger.truncate(path, 4)
_ = logger.stem(_)
"experiments/rope-cnn"
This is useful for saving the relative path of your main script.
| Parameters: | path – “learning-to-learn/experiments/run.py” |
|---|---|
| Returns: | “run” |
store(metrics=None, silent=None, cache: Optional[str] = None, _prefix=None, **key_values)
Store the metric data (with the default summary cache) for making the summary later. This allows the logging/saving of training metrics asynchronously from the logging.
| Returns: | None |
|---|---|
store_key_value(key: str, value: Any, silent=None, cache: Optional[str] = None) → None
Store the key: value pair, awaiting a future summary.
store_metrics(metrics=None, silent=None, cache: Optional[str] = None, _prefix=None, **key_values)
Store the metric data (with the default summary cache) for making the summary later. This allows the logging/saving of training metrics asynchronously from the logging.
| Returns: | None |
|---|---|
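The store-now, summarize-later pattern can be sketched as below. `SummaryCache`, its `summarize` method, the "/mean" key suffix, and the use of the mean as the statistic are all assumptions made for this illustration; they are not the ml_logger API.

```python
from collections import defaultdict
from statistics import mean


class SummaryCache:
    """Hypothetical metric store; a sketch, not the ml_logger implementation."""

    def __init__(self):
        self.data = defaultdict(list)

    def store(self, metrics=None, **key_values):
        # Accumulate values per key without writing anything out yet.
        for key, value in {**(metrics or {}), **key_values}.items():
            self.data[key].append(value)

    def summarize(self, clear=True):
        # Reduce each key to its mean. Clearing afterwards mimics the
        # "tiled" mode; keeping the data mimics the "rolling" mode.
        summary = {f"{key}/mean": mean(values) for key, values in self.data.items()}
        if clear:
            self.data.clear()
        return summary


cache = SummaryCache()
for step in range(4):
    cache.store(loss=1.0 / (step + 1), reward=step)
summary = cache.summarize()
```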
truncate(path, depth=-1)
Truncates the path’s parent directories w.r.t. the given depth. By default, returns the filename of the path.
path = "/Users/geyang/some-proj/experiments/rope-cnn.py"
logger.truncate(path, -1)
"rope-cnn.py"
logger.truncate(path, 4)
"experiments/rope-cnn.py"
This is useful for saving the relative path of your main script.
| Returns: | “run” |
|---|---|
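The documented outputs of truncate and stem are consistent with simple string operations on the path. The two functions below are stand-ins written for this illustration (assuming "/" as the separator); they reproduce the examples above but are not the ml_logger source.

```python
import os


def truncate(path, depth=-1):
    """Keep the path segments from index `depth` onward (split on "/")."""
    return "/".join(path.split("/")[depth:])


def stem(path):
    """Drop the file extension, keeping any directories in front."""
    return os.path.splitext(path)[0]


path = "/Users/geyang/some-proj/experiments/rope-cnn.py"
filename = truncate(path, -1)
relative = truncate(path, 4)
script = stem(relative)
```

Note that an absolute path splits into an empty first segment, which is why a depth of 4 lands on the experiments directory in this example.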
upload_dir(dir_path, target, excludes=(), archive='tar', temp_dir=None)
Upload a directory to gs, s3, and ml-logger.
upload_file(file_path: str = None, target_path: str = 'files/', once=True) → None
Uploads a file (through a binary byte string) to a target_folder. The default target is “files”.
| Returns: | None |
|---|---|