Package/Class Reference¶

Submodules¶

Tensorforce-client has only two major submodules, Experiment and Cluster. Because all interaction with the client currently happens from the command line, you won’t need to know the details of these two classes. We may add support for Python scripting against this client library in the future, so that you will be able to automate your experiments and cluster creation/deletion tasks.

Experiments: (tensorforce_client.experiment)¶

class tensorforce_client.experiment.Experiment(**kwargs)¶

Bases: object

__init__(**kwargs)¶

Keyword Arguments:

file (str) – The Experiment’s json spec file (can contain all other args).
name (str) – The name of the Experiment. This is also the name of the folder where it is stored.
environment (str) – The filename of the json env-spec file to use (see TensorForce documentation).
agent (str) – The filename of the json agent-spec file to use (see TensorForce documentation).
network (str) – The filename of the json network-spec file to use (see TensorForce documentation).
cluster (str) – The filename of the json cluster-spec file to use (see class Cluster).
episodes (int) – The total number of episodes to run (all parallel agents).
total_timesteps (int) – The max. total number of timesteps to run (all parallel agents).
max_timesteps_per_episode (int) – The max. number of timesteps to run in each episode.
deterministic (bool) – Whether to not(!) use stochastic exploration on top of plain action outputs.
repeat_actions (int) – The number of times to repeat each selected action (an action is selected by calling agent.act()).
debug_logging (bool) – Whether to switch on debug logging (default: False).
run_mode (str) – Which runner mode to use. Valid values are only ‘single’, ‘multi-threaded’ and ‘distributed’.
num_workers (int) – The number of worker processes to use (see distributed and multi-threaded run_modes).
num_parameter_servers (int) – The number of parameter servers to use (see distributed tensorflow).
saver_frequency (str) – The frequency with which to save the model. This is a combination of an int and a unit (e.g. “600s”), where unit can be “s” (seconds), “e” (episodes), or “t” (timesteps).
summary_frequency (str) – The frequency with which to save a tensorboard summary. This is a combination of an int and a unit (e.g. “600s”), where unit can be “s” (seconds) or “t” (timesteps). The episode unit (e) is not allowed here.
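The frequency strings described above (an int followed by a one-letter unit) could be validated along the following lines. This is only a minimal sketch: `parse_frequency` and its `allowed_units` parameter are hypothetical names introduced here for illustration and are not part of tensorforce-client.

```python
import re

# Hypothetical helper, not part of tensorforce-client: splits a frequency
# string such as "600s" into its integer amount and its one-letter unit.
_FREQUENCY_PATTERN = re.compile(r"^(\d+)([a-z])$")


def parse_frequency(value, allowed_units=("s", "e", "t")):
    """Return (amount, unit) for strings like "600s"; raise ValueError otherwise."""
    match = _FREQUENCY_PATTERN.match(value.strip())
    if match is None:
        raise ValueError("expected an int followed by a unit, got %r" % value)
    amount, unit = int(match.group(1)), match.group(2)
    if unit not in allowed_units:
        raise ValueError("unit %r not allowed here" % unit)
    return amount, unit


# saver_frequency accepts seconds, episodes and timesteps.
print(parse_frequency("600s"))
# summary_frequency does not accept the episode unit, so restrict the units.
print(parse_frequency("1000t", allowed_units=("s", "t")))
```

Passing a narrower `allowed_units` tuple is one way to express that the episode unit is valid for saver_frequency but not for summary_frequency.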

download()¶: Downloads the experiment’s results (model checkpoints and tensorboard summary files) so far.

generate_locally()¶: Writes the local json spec file for this Experiment object into the Experiment’s dir. This file contains all settings (including agent, network, cluster, run-mode, etc.).

pause(project_id)¶

Pauses the already running Experiment.

Parameters:	project_id (str) – The remote gcloud project-ID.

setup_cluster(cluster, project_id, start=False)¶

Given a cluster name (or None) and a remote project-ID, sets up the cluster settings for this Experiment locally. Also starts the cluster if start is set to True.

Parameters:

cluster (str) – The name of the cluster. If None, will get cluster-spec from the Experiment, or create a default Cluster object.
project_id (str) – The remote gcloud project ID.
start (bool) – Whether to already create (start) the cluster in the cloud.

Returns: The Cluster object.

start(project_id, resume=False, cluster=None)¶

Starts the Experiment in the cloud (using kubectl). The respective cluster is started (if it’s not already running).

Parameters:

project_id (str) – The remote gcloud project-ID.
resume (bool) – Whether we are resuming an already started (and paused) experiment.
cluster (str) – The name of the cluster to use (will be started if not already running). If None, the Experiment’s own cluster is used or, if the Experiment does not specify one either, a default cluster.

stop(no_download=False)¶

Stops an already running Experiment by deleting the Kubernetes workload. If no_download is set to False (default), will download all results before stopping. If the cluster that the experiment runs on is dedicated to this experiment, will also delete the cluster.

Parameters:	no_download (bool) – Whether to not(!) download the experiment’s results so far (default: False).

write_json_file(file=None)¶

Writes all the Experiment’s settings to disk as a json file.

Parameters:	file (str) – The filename to use. If None, will use the Experiment’s filename.
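As an illustration of what writing settings to a json file can look like, here is a minimal sketch using only the standard library. The `write_settings` helper, the example values, and the fallback file name `<name>.json` are assumptions made for this example. The keys mirror the keyword arguments listed for `__init__`, but the actual file layout used by tensorforce-client is not documented in this section.

```python
import json
import os
import tempfile

# Illustrative settings only: keys mirror the Experiment keyword arguments,
# the values are made up for this example.
settings = {
    "name": "my_experiment",
    "run_mode": "distributed",
    "num_workers": 4,
    "num_parameter_servers": 1,
    "episodes": 1000,
    "saver_frequency": "600s",
    "summary_frequency": "120s",
}


def write_settings(settings, directory, file=None):
    """Write settings as json; fall back to '<name>.json' when file is None."""
    # Assumption: the default file name is derived from the experiment name.
    filename = file if file is not None else settings["name"] + ".json"
    path = os.path.join(directory, filename)
    with open(path, "w") as handle:
        json.dump(settings, handle, indent=2, sort_keys=True)
    return path


# Round trip through a temporary directory to show the file is valid json.
with tempfile.TemporaryDirectory() as directory:
    path = write_settings(settings, directory)
    with open(path) as handle:
        restored = json.load(handle)

print(os.path.basename(path))
```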

tensorforce_client.experiment.get_experiment_from_string(experiment, running=False)¶

Returns an Experiment object given a string that is either the name of a json file or the name of an already existing experiment.

Parameters:

experiment (str) – The string to look for (either local json file or local experiment’s name).
running (bool) – Whether this experiment is already running.
Returns:	The found Experiment object.
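The "json file or experiment name" lookup can be pictured roughly as follows. This is a hedged sketch, not the library’s implementation: `resolve_experiment` and its `known_names` argument are hypothetical, and the order in which the two cases are tried is an assumption.

```python
import os


def resolve_experiment(experiment, known_names):
    """Classify the string as a json spec file or a known experiment name.

    Hypothetical helper: returns ("file", absolute_path) or ("name", name).
    """
    # Assumption: an existing *.json path takes precedence over a name match.
    if experiment.endswith(".json") and os.path.isfile(experiment):
        return "file", os.path.abspath(experiment)
    if experiment in known_names:
        return "name", experiment
    raise ValueError("no json file or local experiment called %r" % experiment)


print(resolve_experiment("my_experiment", ["my_experiment", "other"]))
```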

tensorforce_client.experiment.get_local_experiments(as_objects=False)¶

Parameters:	as_objects (bool) – Whether to return a list of strings (names) or actual Experiment objects.

Returns: A list of all Experiment names/objects that already exist in this project.

Clusters: (tensorforce_client.cluster)¶

class tensorforce_client.cluster.Cluster(**kwargs)¶

Bases: object

__init__(**kwargs)¶

A cloud cluster object specifying things like: number of nodes, GPUs per node and GPU type, memory per node, disk size, zone, etc.

Parameters:

kwargs (any) – See below.

Keyword Arguments:

file (str) – The filename of a cluster spec json file to use. Single settings in this file can be overridden by passing them as further kwargs to this constructor.
name (str) – The name of the cluster.
machine_type (str) – The machine type to use for all nodes in the cluster. Machine types can either be gcloud-accepted strings, such as those listed by gcloud compute machine-types list, or custom strings that conform to these rules: https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type. When the kwargs cpus_per_node and memory_per_node are given, tensorforce-client will automatically create the correct machine-type.
cpus_per_node (int) – The number of vCPUs per node.
gpus_per_node (int) – The number of (physical) GPUs per node.
gpu_type (str) – The GPU type to use. Supported are only ‘nvidia-tesla-k80’ and ‘nvidia-tesla-p100’.
memory_per_node (int) – The memory (in GB) per node.
num_nodes (int) – The number of nodes for the cluster.
disk_size (int) – The amount of disk space per node in GB.
location (str) – The location of the cluster. Defaults to the default zone set for gcloud/the project.
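To make the relationship between cpus_per_node, memory_per_node and a custom machine type concrete, the sketch below builds a string in the `custom-<vCPUs>-<memory in MB>` form that gcloud accepts for custom machine types. The function name `custom_machine_type` is hypothetical, and whether tensorforce-client derives the string exactly this way is an assumption.

```python
def custom_machine_type(cpus_per_node, memory_per_node):
    """Build a custom machine-type string from vCPUs and memory in GB."""
    if cpus_per_node < 1 or memory_per_node <= 0:
        raise ValueError("cpus_per_node and memory_per_node must be positive")
    # gcloud expects the memory of a custom machine type in MB.
    memory_mb = int(memory_per_node * 1024)
    return "custom-{}-{}".format(cpus_per_node, memory_mb)


print(custom_machine_type(4, 16))  # custom-4-16384
```

Note that gcloud imposes further constraints on valid vCPU and memory combinations (see the linked rules). This sketch does not check them.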

create()¶: Create the Kubernetes cluster with the options given in self. This also sets up the local kubectl app to point to the new cluster automatically.

delete()¶: Deletes (shuts down) this cluster in the cloud.

get_spec()¶: Returns: Dict of the important settings of this Cluster.

ssh_parallel(*items, **kwargs)¶

Runs ssh and/or scp commands on all nodes in the cluster in parallel using multiple threads.

Parameters:

items (List[Union[str,tuple]]) – List of commands to execute. Each item is either a str (an ssh command) or a tuple/list of two items (`from` and `to`) for an scp command.
kwargs (any) – silent (bool): Whether to execute all commands silently (default: True).
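The item convention (a str for an ssh command, a two-item tuple for an scp copy) and the per-node threading can be sketched as below. This is an illustration under stated assumptions, not the library’s code: `build_command`, `run_on_all_nodes`, the node names, and the plain `ssh`/`scp` argument layout are all hypothetical, and the example uses a dry-run runner so that nothing is actually executed.

```python
from concurrent.futures import ThreadPoolExecutor


def build_command(node, item):
    """Turn one item into an argv list: str -> ssh, (from, to) pair -> scp."""
    if isinstance(item, str):
        return ["ssh", node, item]
    source, target = item
    return ["scp", source, "{}:{}".format(node, target)]


def run_on_all_nodes(nodes, items, runner, max_threads=8):
    """Run all items on each node, handling nodes concurrently in threads."""
    def run_node(node):
        # Items for a single node run in order; different nodes run in parallel.
        return [runner(build_command(node, item)) for item in items]

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map preserves the order of `nodes`, so zip pairs them correctly.
        return dict(zip(nodes, pool.map(run_node, nodes)))


# Dry run: the runner only renders the command line instead of executing it.
results = run_on_all_nodes(
    ["node-0", "node-1"],
    ["mkdir -p /experiment", ("spec.json", "/experiment/spec.json")],
    runner=lambda argv: " ".join(argv),
)
for node, lines in results.items():
    print(node, lines)
```

Injecting the runner keeps the sketch testable. A real runner could, for example, pass the argv list to `subprocess.check_call`.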

tensorforce_client.cluster.get_cluster_from_string(cluster, running_clusters=None)¶

Returns a Cluster object given a string of either a json file or an already running remote cluster’s name.

Parameters:

cluster (str) – The string to look for (either local json file or remote cluster’s name).
running_clusters (dict) – Specs for already running cloud clusters by cluster name.
Returns:	The found Cluster object.

Command Functions¶

tensorforce_client.commands.cmd_cluster_create(args)¶

tensorforce_client.commands.cmd_cluster_delete(args)¶

tensorforce_client.commands.cmd_cluster_list()¶

tensorforce_client.commands.cmd_experiment_download(args)¶

tensorforce_client.commands.cmd_experiment_list()¶

tensorforce_client.commands.cmd_experiment_new(args, project_id=None)¶

tensorforce_client.commands.cmd_experiment_pause(args, project_id)¶

tensorforce_client.commands.cmd_experiment_start(args, project_id)¶

tensorforce_client.commands.cmd_experiment_stop(args)¶

tensorforce_client.commands.cmd_init(args)¶