`torch_simple_timing.timer`

This class enables timing of (PyTorch) code blocks. It internally leverages the Clock class to measure execution times.

When the constructor argument gpu is set to True, the timer’s clocks will use torch.cuda.Event to time GPU code. For timings to be meaningful, torch.cuda.synchronize() must be called before and after the code block. In the case of distributed training, torch.distributed.barrier() will also be called.

The Clock class takes care of these synchronization calls, but be aware that they may slow down your code.
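To make the synchronization requirement concrete, here is a minimal, stdlib-only sketch of a clock. It is a hypothetical simplification, not the library's Clock class: the sync callback stands in for torch.cuda.synchronize(), and time.perf_counter() replaces CUDA events. The point it shows is that the callback runs right before each timestamp is read, so asynchronously queued work is finished before the clock starts and before it stops.

```python
import time
from typing import Callable, List, Optional


class SketchClock:
    """Hypothetical simplified clock (not the library's Clock class).

    sync stands in for torch.cuda.synchronize(): it runs right before each
    timestamp is taken, so queued asynchronous work is finished first.
    """

    def __init__(self, sync: Optional[Callable[[], None]] = None) -> None:
        self.sync: Callable[[], None] = sync if sync is not None else (lambda: None)
        self.times: List[float] = []
        self._start: Optional[float] = None

    def start(self) -> "SketchClock":
        self.sync()  # wait for previously queued work before starting
        self._start = time.perf_counter()
        return self

    def stop(self) -> None:
        if self._start is None:
            raise RuntimeError("stop() called before start()")
        self.sync()  # wait for the timed work itself to finish
        self.times.append(time.perf_counter() - self._start)
        self._start = None

    def __enter__(self) -> "SketchClock":
        return self.start()

    def __exit__(self, *exc_info: object) -> None:
        self.stop()


# Usage: record every sync call to see where synchronization happens.
calls: List[str] = []
clock = SketchClock(sync=lambda: calls.append("sync"))
with clock:
    total = sum(range(10_000))  # placeholder for the code being timed
print(len(calls), len(clock.times))  # 2 1
```

Every measurement costs two synchronizations, which is where the slowdown discussed below comes from.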

Note

Wait, what?? Timing slows code down?? Yes, it does. But it’s not as bad as you might think. It mainly means that you should be careful about where you place Clock and Timer objects. For example, if you want to time the forward function of a model, it does not matter that the overall epoch gets slower: what you want is an accurate measure of the time spent in the forward function.

Warning

Because of the torch.cuda.synchronize() calls, the Timer class should be used carefully during training. For instance, if you time whole epochs, the synchronization overhead will be negligible. However, if you time individual training iterations, you should only do so for one epoch (or a few) rather than for the whole training; otherwise, the overhead may become significant. Use the ignore argument to disable timing for a specific clock.
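The sketch below illustrates why ignoring a clock removes the overhead: an ignored clock neither synchronizes nor records anything. It is a hypothetical, stdlib-only simplification (the clock() context manager and the fake_sync() counter are made up for illustration and are not the library's API). Timing only the first 2 of 5 epochs of 4 batches costs 2 * 2 * 4 = 16 synchronizations instead of 2 * 5 * 4 = 40.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List

# Hypothetical stand-in for torch.cuda.synchronize(); here it only counts calls.
sync_calls = 0


def fake_sync() -> None:
    global sync_calls
    sync_calls += 1


times: Dict[str, List[float]] = {}


@contextmanager
def clock(name: str, ignore: bool = False,
          sync: Callable[[], None] = fake_sync) -> Iterator[None]:
    """Simplified sketch of an ignorable clock (not the library's code)."""
    if ignore:
        # Ignored: no synchronization, no measurement, the block just runs.
        yield
        return
    sync()
    start = time.perf_counter()
    yield
    sync()
    times.setdefault(name, []).append(time.perf_counter() - start)


epochs, batches = 5, 4
for epoch in range(epochs):
    for batch in range(batches):
        # Only time the batches of the first 2 epochs (epochs 0 and 1).
        with clock("train-batch", ignore=epoch >= 2):
            sum(range(1_000))  # placeholder for one training iteration

print(len(times["train-batch"]), sync_calls)  # 8 16
```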

Example:

import torch
from torch.nn import Sequential, Linear, ReLU
from torch_simple_timing import Timer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
gpu = device.type == "cuda"
timer = Timer(gpu=gpu)

# manual start
timer.clock("init").start()

batches = 32
bs = 64
n = batches * bs
dim = 64
labels = 10
hidden = 1024
epochs = 5

t = torch.randn(n, dim, device=device)
y = torch.randint(0, labels, (n,), device=device)

model = Sequential(
    Linear(dim, hidden),
    ReLU(),
    Linear(hidden, hidden),
    ReLU(),
    Linear(hidden, labels),
).to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_func = torch.nn.CrossEntropyLoss()

timer.clock("init").stop()

with timer.clock("train-loop"):
    for epoch in range(epochs):
        with timer.clock("train-epoch"):
            for batch in range(batches):
                optimizer.zero_grad()
                # only time the first 3 epochs (epochs 0, 1 and 2)
                with timer.clock("train-batch", ignore=epoch > 2):
                    with timer.clock("forward"):
                        pred = model(t[batch * bs : (batch + 1) * bs])
                    with timer.clock("loss", ignore=epoch > 2):
                        loss = loss_func(pred, y[batch * bs : (batch + 1) * bs])
                    with timer.clock("backward", ignore=epoch > 2):
                        loss.backward()
                    optimizer.step()

                if batch % 10 == 0:
                    print(f"Epoch {epoch}, batch {batch}, loss {loss.item():.3f}")

# compute mean/std stats for each clock in the timer
stats = timer.stats()

# stats will be computed internally if not provided
print(timer.display(stats=stats, precision=5))

init        : 0.01141           (n=  1)
train-loop  : 1.97650           (n=  1)
train-epoch : 0.39529 ± 0.07439 (n=  5)
train-batch : 0.01087 ± 0.00189 (n= 96)
forward     : 0.00327 ± 0.00703 (n=160)
loss        : 0.00010 ± 0.00023 (n= 96)
backward    : 0.00438 ± 0.00074 (n= 96)
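The n values in this output follow directly from the loop structure. forward has no ignore argument, so it is timed on all 5 * 32 = 160 batches, even while the enclosing train-batch clock is ignored. train-batch, loss and backward use ignore=epoch > 2 and therefore cover only epochs 0, 1 and 2, i.e. 3 * 32 = 96 batches. This short check recomputes the counts:

```python
epochs, batches = 5, 32  # values from the example above

# "forward" has no ignore argument: timed on every batch of every epoch.
n_forward = sum(1 for epoch in range(epochs) for batch in range(batches))

# "train-batch", "loss" and "backward" use ignore=epoch > 2: epochs 0, 1, 2 only.
n_batch = sum(1 for epoch in range(epochs) for batch in range(batches) if not epoch > 2)

n_epoch = epochs  # "train-epoch": one measurement per epoch

print(n_forward, n_batch, n_epoch)  # 160 96 5
```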

Module Contents

Classes

Timer

Clock manager. Store and display timing statistics.

class torch_simple_timing.timer.Timer(gpu=False, ignore=False)

Clock manager. Store and display timing statistics.

Warning

In order to accurately measure GPU timings, torch.cuda.synchronize() will be called before and after each clock’s start and stop.

Parameters:

gpu (bool, optional) – Whether to time GPU code using CUDA events. Defaults to False.
ignore (bool, optional) – Whether to disable this timer. This can be useful when the same piece of code runs in several contexts: for instance, you may want to disable timing in training or in validation mode. Defaults to False.

__repr__()

Return repr(self).

Return type: str

clock(name, ignore=None, gpu=None)

Create a new Clock object with the given name and add it to the Timer. If a Clock with that name already exists, it is returned instead.

Note

If ignore is None, the Timer’s ignore attribute will be used.

Note

If ignore is not None, the Clock’s ignore attribute will be updated.

Warning

Don’t forget to call .start() and .stop() on the returned Clock if you’re not using timer.clock() as a context manager.

Parameters:

name (str) – A name for the requested clock.
ignore (Optional[bool], optional) – Whether to ignore this clock, i.e. not time anything. This is useful when timing slows you down (because of torch.cuda.synchronize() and torch.distributed.barrier()) and you only want to time the first epoch, for instance. Defaults to None.
gpu (Optional[bool], optional) – Whether to enable GPU timing with CUDA events. Defaults to None.

Returns:

The requested Clock object.

Return type:

Clock
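The lookup rules of clock() can be summarized with the following hypothetical sketch (SketchTimer and SketchClock are illustrative names, not the library's classes). Two behaviors are assumptions beyond the notes above: a new clock’s gpu flag falls back to the Timer’s when gpu is None, and an existing clock keeps its current ignore value when ignore is None.

```python
from typing import Dict, Optional


class SketchClock:
    """Hypothetical placeholder that only stores the clock's flags."""

    def __init__(self, name: str, ignore: bool, gpu: bool) -> None:
        self.name, self.ignore, self.gpu = name, ignore, gpu


class SketchTimer:
    """Simplified sketch of the clock() lookup rules (not the library's code)."""

    def __init__(self, gpu: bool = False, ignore: bool = False) -> None:
        self.gpu, self.ignore = gpu, ignore
        self.clocks: Dict[str, SketchClock] = {}

    def clock(self, name: str, ignore: Optional[bool] = None,
              gpu: Optional[bool] = None) -> SketchClock:
        if name not in self.clocks:
            # New clock: unspecified flags fall back to the timer's own values.
            self.clocks[name] = SketchClock(
                name,
                ignore=self.ignore if ignore is None else ignore,
                gpu=self.gpu if gpu is None else gpu,
            )
        elif ignore is not None:
            # Existing clock: an explicit ignore value overrides the stored one.
            self.clocks[name].ignore = ignore
        return self.clocks[name]


timer = SketchTimer(ignore=False)
forward = timer.clock("forward")            # created, inherits ignore=False
same = timer.clock("forward", ignore=True)  # same object, ignore updated
print(same is forward, forward.ignore)      # True True
```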

disable(clock_names=None)

Disable the specified clocks based on their names.

Parameters: clock_names (Optional[List[str]], optional) – The list of clock names to disable. If None, all clocks in this timer are disabled. Defaults to None.
Return type: None
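A minimal sketch of what disable() amounts to, assuming (hypothetically) that each clock’s state reduces to an ignore flag stored in a dict keyed by clock name: the listed clocks, or all of them when clock_names is None, are marked as ignored.

```python
from typing import Dict, List, Optional


def disable(ignore_flags: Dict[str, bool],
            clock_names: Optional[List[str]] = None) -> None:
    """Hypothetical sketch: mark the named clocks, or all of them, as ignored."""
    names = list(ignore_flags) if clock_names is None else clock_names
    for name in names:
        ignore_flags[name] = True


flags = {"forward": False, "loss": False, "backward": False}
disable(flags, ["loss", "backward"])
print(flags)  # {'forward': False, 'loss': True, 'backward': True}
partial = dict(flags)  # snapshot before disabling everything
disable(flags)
print(all(flags.values()))  # True
```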

display(clock_names=None, precision=3, sort_keys_func=None, stats=None)

Build a string displaying the mean, standard deviation and support (number of measurements, shown as n) of the times for each clock.

Timer.stats() is called internally to compute the stats. You can pre-compute stats independently and pass them to this method with the stats= argument.

Optionally, you can provide a function sort_keys_func to sort the clocks by a specific key. For instance, you can sort them alphabetically with sort_keys_func=lambda k: k. By default, they will be displayed according to their creation order.

>>> print(timer.display())
epoch    : 0.251 ± 0.027 (n=10)
forward  : 0.002 ± 0.005 (n=50)
backward : 0.002 ± 0.004 (n=50)

Parameters:

clock_names (Optional[List[str]], optional) – The list of clock names to display. If None, all clocks in this timer are displayed. Defaults to None.
precision (int, optional) – The number of digits to display after the decimal point. Defaults to 3.
sort_keys_func (Callable, optional) – A function to use to sort the displayed clocks. Defaults to None, i.e. creation order.
stats (Dict[str, Dict[str, Union[int, float]]], optional) – The stats to display. If None, the stats will be computed internally. Defaults to None.

Returns:

A string representation of the stats.

Return type:

str
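The layout can be approximated with the hypothetical function below, written only from the example outputs on this page (it is not the library’s implementation). Names are left-aligned to the longest one, mean and std use precision decimals, the ± std part is blanked out when n == 1 (as for init and train-loop in the example above), and n is right-aligned. Clocks keep their insertion (creation) order unless sort_keys_func is given.

```python
from typing import Any, Callable, Dict, List, Optional, Union

Stats = Dict[str, Dict[str, Union[int, float]]]


def format_stats(
    stats: Stats,
    precision: int = 3,
    sort_keys_func: Optional[Callable[[str], Any]] = None,
) -> str:
    """Hypothetical approximation of the display() layout (not the library's code)."""
    names: List[str] = list(stats)  # dicts preserve insertion (creation) order
    if sort_keys_func is not None:
        names = sorted(names, key=sort_keys_func)
    name_width = max(len(name) for name in names)
    n_width = max(len(str(stats[name]["n"])) for name in names)
    lines = []
    for name in names:
        s = stats[name]
        mean = f"{s['mean']:.{precision}f}"
        std = f" ± {s['std']:.{precision}f}"
        if s["n"] == 1:
            std = " " * len(std)  # a single measurement: leave the std column blank
        lines.append(f"{name.ljust(name_width)} : {mean}{std} (n={s['n']:>{n_width}})")
    return "\n".join(lines)


stats = {
    "epoch": {"mean": 0.251, "std": 0.027, "n": 10},
    "forward": {"mean": 0.002, "std": 0.005, "n": 50},
    "backward": {"mean": 0.002, "std": 0.004, "n": 50},
}
print(format_stats(stats))                               # creation order
print(format_stats(stats, sort_keys_func=lambda k: k))   # alphabetical order
```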

reset(keys=None)

Delete the specified keys (clock names). If keys is None, reset all clocks.

Parameters: keys (Union[str, List[str]], optional) – Name(s) of the clocks to reset, or all of them if keys is None. Defaults to None.
Return type: None
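Since keys accepts either a single name or a list, an implementation has to normalize both forms. The hypothetical sketch below operates on a plain dict of recorded times rather than on the library’s internals; silently skipping unknown names is an assumption.

```python
from typing import Dict, List, Optional, Union


def reset(times: Dict[str, List[float]],
          keys: Optional[Union[str, List[str]]] = None) -> None:
    """Hypothetical sketch: delete the given clocks' entries, or all of them."""
    if keys is None:
        times.clear()
        return
    if isinstance(keys, str):
        keys = [keys]  # a single name is accepted as well as a list
    for key in keys:
        times.pop(key, None)  # unknown names are skipped (assumption)


times = {"forward": [0.1, 0.2], "loss": [0.01], "backward": [0.3]}
reset(times, "loss")
reset(times, ["forward", "unknown"])
remaining = list(times)
print(remaining)  # ['backward']
reset(times)
print(times)  # {}
```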

stats(clock_names=None, map_funcs=None)

Compute the mean and standard deviation of the times for each clock, as well as the number of measurements n. Returns a dictionary of dictionaries with the following structure:

{
    "clock_name": {
        "mean": float,
        "std": float,
        "n": int
    }
}
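As a rough, stdlib-only illustration of how this structure can be produced from raw per-clock times, here is a hypothetical sketch (not the library’s code). The choice of estimator is an assumption: this page does not say whether the population or the sample standard deviation is used. The sketch uses the population one, which is also defined (as 0.0) for a single measurement.

```python
import statistics
from typing import Dict, List, Optional, Union


def compute_stats(
    times: Dict[str, List[float]],
    clock_names: Optional[List[str]] = None,
) -> Dict[str, Dict[str, Union[int, float]]]:
    """Hypothetical sketch returning the {"mean", "std", "n"} structure."""
    names = list(times) if clock_names is None else clock_names
    return {
        name: {
            "mean": statistics.mean(times[name]),
            # Assumption: population standard deviation (0.0 when n == 1).
            "std": statistics.pstdev(times[name]),
            "n": len(times[name]),
        }
        for name in names
    }


times = {"init": [0.5], "forward": [0.1, 0.3]}
stats = compute_stats(times)
print(stats)
```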

Optionally, you can provide a dictionary of functions to apply to the list of times for each clock. If a clock name is not in the dictionary, no function will be applied (equivalent to lambda t: t).

throughput = timer.stats(
    map_funcs={"forward": lambda t: batch_size / t}
)
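Assuming each map function is applied to every recorded time individually (which is what the lambda t: batch_size / t example suggests), the arithmetic looks like this with made-up numbers. Note that the mean of the per-batch throughputs is not the same as batch_size divided by the mean time.

```python
import statistics

batch_size = 64              # hypothetical batch size
forward_times = [0.5, 0.25]  # hypothetical forward times (seconds), one per batch


def to_throughput(t: float) -> float:
    """Samples per second for one batch; plays the role of a map_funcs entry."""
    return batch_size / t


throughputs = [to_throughput(t) for t in forward_times]
mean_throughput = statistics.mean(throughputs)
throughput_of_mean = batch_size / statistics.mean(forward_times)
print(throughputs, mean_throughput, round(throughput_of_mean, 2))  # [128.0, 256.0] 192.0 170.67
```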

This method is called internally by timer.display(). Alternatively, you can call it yourself and pass its output to timer.display(stats=...), which is useful if you also want to do something else with the stats (log them, for instance).

Parameters:

clock_names (List[str], optional) – List of clock names to compute the stats for, or all of them if None. Defaults to None.
map_funcs (Dict[str, callable], optional) – Dictionary of functions to pre-process the list of times for each clock. Defaults to None.

Returns:

A dictionary of dictionaries, mapping clock names to a dictionary of statistics.

Return type:

Dict[str, Dict[str, Union[int, float]]]
