torch_simple_timing.timer
This class enables timing of (PyTorch) code blocks. It internally
leverages the Clock class to measure
execution times.
When the constructor argument gpu is set to True, the timer’s clocks
will use torch.cuda.Event to time GPU code. For timings to be meaningful,
torch.cuda.synchronize() must be called before and after the code block.
In the case of distributed training, torch.distributed.barrier() will also
be called.
The Clock class takes care of these calls for you,
but be aware that they may slow down your code.
Note
Wait, what?? Timing slows code down?? Yes, it does. But it’s not as bad as
you might think. It mainly means that you should be careful about where you
define Clock and Timer objects. For example, if you want to time the forward
function of a model, it does not matter that the overall epoch is slower:
what you want is an accurate measurement of the time spent in the forward
function.
Warning
Because of the torch.cuda.synchronize() calls, the Timer class should
be used carefully in the context of training. For instance, if you want to time
epochs, the synchronization overhead will be negligible. However, if you want
to time training iterations, you should only do that for one (or a few) epochs
and not for the whole training. Otherwise, the overhead may become
significant. Use the ignore argument to disable timing for a specific clock.
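To make the role of the ignore argument concrete, here is a minimal, CPU-only sketch of a context-manager clock that skips all timing work when it is ignored. SketchClock is a hypothetical stand-in written for illustration; it is not the library's Clock class and it does not perform any CUDA synchronization.

```python
import time
from typing import List, Optional


class SketchClock:
    """Hypothetical stand-in for a clock; NOT the library's Clock class."""

    def __init__(self, ignore: bool = False) -> None:
        self.ignore = ignore
        self.times: List[float] = []
        self._start: Optional[float] = None

    def __enter__(self) -> "SketchClock":
        if not self.ignore:
            # A GPU-aware clock would synchronize here before starting.
            self._start = time.perf_counter()
        return self

    def __exit__(self, *exc) -> None:
        if not self.ignore and self._start is not None:
            # ... and synchronize again here before stopping.
            self.times.append(time.perf_counter() - self._start)
        self._start = None


clock = SketchClock()
for epoch in range(5):
    clock.ignore = epoch > 0  # only time the first epoch
    with clock:
        sum(range(1000))

print(len(clock.times))
```

When the clock is ignored, entering and leaving the block costs only a flag check, which is why restricting fine-grained timing to a few epochs keeps the overhead low.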
Example:
import torch
from torch.nn import Sequential, Linear, ReLU
from torch_simple_timing import Timer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
gpu = device.type == "cuda"
timer = Timer(gpu=gpu)

# manual start
timer.clock("init").start()

batches = 32
bs = 64
n = batches * bs
dim = 64
labels = 10
hidden = 1024
epochs = 5

t = torch.randn(n, dim, device=device)
y = torch.randint(0, labels, (n,), device=device)
model = Sequential(
    Linear(dim, hidden),
    ReLU(),
    Linear(hidden, hidden),
    ReLU(),
    Linear(hidden, labels),
).to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_func = torch.nn.CrossEntropyLoss()

# manual stop
timer.clock("init").stop()

with timer.clock("train-loop"):
    for epoch in range(epochs):
        with timer.clock("train-epoch"):
            for batch in range(batches):
                optimizer.zero_grad()
                # only time the first 3 epochs (epochs 0, 1 and 2)
                with timer.clock("train-batch", ignore=epoch > 2):
                    # "forward" has no ignore argument: it is timed in every epoch
                    with timer.clock("forward"):
                        pred = model(t[batch * bs : (batch + 1) * bs])
                    with timer.clock("loss", ignore=epoch > 2):
                        loss = loss_func(pred, y[batch * bs : (batch + 1) * bs])
                    with timer.clock("backward", ignore=epoch > 2):
                        loss.backward()
                    optimizer.step()
                if batch % 10 == 0:
                    print(f"Epoch {epoch}, batch {batch}, loss {loss.item():.3f}")

# compute mean/std stats for each clock in the timer
stats = timer.stats()
# stats will be computed internally if not provided
print(timer.display(stats=stats, precision=5))
init        : 0.01141           (n=  1)
train-loop  : 1.97650           (n=  1)
train-epoch : 0.39529 ± 0.07439 (n=  5)
train-batch : 0.01087 ± 0.00189 (n= 96)
forward     : 0.00327 ± 0.00703 (n=160)
loss        : 0.00010 ± 0.00023 (n= 96)
backward    : 0.00438 ± 0.00074 (n= 96)
Module Contents
Classes
- class torch_simple_timing.timer.Timer(gpu=False, ignore=False)
Clock manager. Store and display timing statistics.
Warning
In order to accurately measure GPU timings, torch.cuda.synchronize() will be called before and after each clock’s start and stop.
- Parameters:
gpu (bool, optional) – Whether or not to use GPU timing using CUDA events. Defaults to False.
ignore (bool, optional) – Whether to disable this timer. This can be useful when the same piece of code is used in various contexts; for instance, you may want to disable timing in training or validation mode. Defaults to False.
- clock(name, ignore=None, gpu=None)
Create a new Clock object with the given name and add it to the Timer. If the Clock already exists, it will be returned.
Note
If ignore is None, the Timer’s ignore attribute will be used.
Note
If ignore is not None, the Clock’s ignore attribute will be updated.
Warning
Don’t forget to call .start() and .stop() on the returned Clock if you’re not using timer.clock() as a context manager.
- Parameters:
name (str) – A name for the requested clock.
ignore (Optional[bool], optional) – Whether to ignore this clock and not time anything. This is useful in case timing slows you down (because of torch.cuda.synchronize() and torch.distributed.barrier()) and you only want to time the first epoch, for instance. Defaults to None.
gpu (Optional[bool], optional) – Whether to enable GPU timing with CUDA events. Defaults to None.
- Returns:
The requested Clock object.
- Return type:
Clock
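The get-or-create behaviour and the two notes about ignore can be sketched as follows. RegistryClock and RegistryTimer are hypothetical names used only for illustration, not the library's classes. What happens to an existing clock when ignore is None is not specified above, so this sketch assumes the clock is left unchanged in that case.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RegistryClock:
    """Hypothetical clock record; NOT the library's Clock class."""
    name: str
    ignore: bool = False


class RegistryTimer:
    """Hypothetical timer that only models the clock() lookup logic."""

    def __init__(self, ignore: bool = False) -> None:
        self.ignore = ignore
        self.clocks: Dict[str, RegistryClock] = {}

    def clock(self, name: str, ignore: Optional[bool] = None) -> RegistryClock:
        if name not in self.clocks:
            # New clock: fall back to the timer-wide setting when ignore is None.
            resolved = self.ignore if ignore is None else ignore
            self.clocks[name] = RegistryClock(name, resolved)
        elif ignore is not None:
            # Existing clock: an explicit value updates its ignore attribute.
            self.clocks[name].ignore = ignore
        # Assumption: an existing clock is left untouched when ignore is None.
        return self.clocks[name]


timer = RegistryTimer(ignore=True)
a = timer.clock("forward")                # created, inherits ignore=True
b = timer.clock("forward", ignore=False)  # same object, ignore updated
print(a is b, a.ignore)
```

Because the same object is returned for a given name, calling clock() repeatedly inside a loop accumulates measurements under one name instead of creating a new clock each time.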
- disable(clock_names=None)
Disable the specified clocks based on their names.
- Parameters:
clock_names (Optional[List[str]], optional) – The list of clock names to disable. If None, all clocks in this timer are disabled. Defaults to None.
- Return type:
None
- display(clock_names=None, precision=3, sort_keys_func=None, stats=None)
Display the mean, standard deviation and support of the times for each clock.
Timer.stats() is called internally to compute the stats. You can pre-compute stats independently and pass them to this method with the stats= argument.
Optionally, you can provide a function sort_keys_func to sort the clocks by a specific key. For instance, you can sort them alphabetically with sort_keys_func=lambda k: k. By default, they will be displayed according to their creation order.
>>> print(timer.display())
epoch    : 0.251 ± 0.027 (n=10)
forward  : 0.002 ± 0.005 (n=50)
backward : 0.002 ± 0.004 (n=50)
- Parameters:
clock_names (Optional[List[str]], optional) – The list of clock names to display. If None, all clocks in this timer are displayed. Defaults to None.
precision (int, optional) – The number of digits to display after the decimal point. Defaults to 3.
sort_keys_func (Callable, optional) – A function to use to sort the displayed clocks. Defaults to None, i.e. creation order.
stats (Dict[str, Dict[str, Union[int, float]]], optional) – The stats to display. If None, the stats will be computed internally. Defaults to None.
- Returns:
A string representation of the stats.
- Return type:
str
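As a rough illustration of how precision and sort_keys_func shape the output, the sketch below formats a pre-computed stats dictionary into the same kind of table. format_stats is a hypothetical helper, not part of the library. It assumes the standard deviation is omitted for clocks with a single measurement, and it does not pad the n column.

```python
from typing import Callable, Dict, Optional, Union

Stats = Dict[str, Dict[str, Union[int, float]]]


def format_stats(
    stats: Stats,
    precision: int = 3,
    sort_keys_func: Optional[Callable] = None,
) -> str:
    """Hypothetical formatter; NOT the library's display() implementation."""
    names = list(stats)  # dicts preserve insertion (creation) order
    if sort_keys_func is not None:
        names = sorted(names, key=sort_keys_func)
    width = max((len(name) for name in names), default=0)
    lines = []
    for name in names:
        s = stats[name]
        line = f"{name:<{width}} : {s['mean']:.{precision}f}"
        if s["n"] > 1:
            # Assumption: std is only shown when there are several measurements.
            line += f" ± {s['std']:.{precision}f}"
        line += f" (n={s['n']})"
        lines.append(line)
    return "\n".join(lines)


stats = {
    "forward": {"mean": 0.002, "std": 0.005, "n": 50},
    "epoch": {"mean": 0.251, "std": 0.027, "n": 10},
}
# Sort alphabetically, so "epoch" comes before "forward".
print(format_stats(stats, sort_keys_func=lambda k: k))
```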
- reset(keys=None)
Deletes the specified keys. If keys is None, resets all timers.
- Parameters:
keys (Union[str, List[str]], optional) – Specific named timers to reset, or all of them if keys is None. Defaults to None.
- Return type:
None
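Since keys accepts either a single name or a list of names, a reset routine has to normalize its argument first. The sketch below shows one way to do that on a plain dictionary of recorded times. reset_times is a hypothetical helper, and silently skipping unknown names is an assumption rather than documented behaviour.

```python
from typing import Dict, List, Optional, Union


def reset_times(
    times: Dict[str, List[float]],
    keys: Optional[Union[str, List[str]]] = None,
) -> None:
    """Hypothetical helper; NOT the library's reset() implementation."""
    if keys is None:
        times.clear()  # no keys given: drop every recorded clock
        return
    if isinstance(keys, str):
        keys = [keys]  # a single name is treated as a one-element list
    for key in keys:
        times.pop(key, None)  # assumption: unknown names are silently skipped


times = {"forward": [0.1, 0.2], "backward": [0.3], "loss": [0.01]}
reset_times(times, "forward")
print(sorted(times))
reset_times(times)
print(times)
```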
- stats(clock_names=None, map_funcs=None)
Computes the mean and standard deviation of the times for each clock. Returns a dictionary of dictionaries with the following structure:
{
    "clock_name": {
        "mean": float,
        "std": float,
        "n": int
    }
}
Optionally, you can provide a dictionary of functions to apply to the list of times for each clock. If a clock name is not in the dictionary, no function will be applied (equivalent to lambda t: t).
throughput = timer.stats(
    map_funcs={"forward": lambda t: batch_size / t}
)
This method is called internally by timer.display(). Alternatively, you can call it yourself and pass the result to timer.display() if you want to do something else with the stats (log them, for instance).
- Parameters:
clock_names (List[str], optional) – List of clock names to compute the stats for, or all of them if None. Defaults to None.
map_funcs (Dict[str, callable], optional) – Dictionary of functions to pre-process the list of times for each clock. Defaults to None.
- Returns:
A dictionary of dictionaries, mapping clock names to a dictionary of statistics.
- Return type:
Dict[str, Dict[str, Union[int, float]]]
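To show how the returned structure and map_funcs fit together, here is a small standard-library sketch that computes the same kind of dictionary from raw times. compute_stats is a hypothetical helper, not the library's code. It assumes each function in map_funcs is applied to every recorded time individually and that the population standard deviation is used; the documentation above does not state which variant the library computes.

```python
import statistics
from typing import Callable, Dict, List, Optional, Union


def compute_stats(
    times: Dict[str, List[float]],
    map_funcs: Optional[Dict[str, Callable[[float], float]]] = None,
) -> Dict[str, Dict[str, Union[int, float]]]:
    """Hypothetical helper; NOT the library's stats() implementation."""
    map_funcs = map_funcs or {}
    out: Dict[str, Dict[str, Union[int, float]]] = {}
    for name, values in times.items():
        # Clocks without an entry in map_funcs keep their raw times.
        func = map_funcs.get(name, lambda t: t)
        mapped = [func(v) for v in values]  # assumption: applied per measurement
        out[name] = {
            "mean": statistics.fmean(mapped),
            "std": statistics.pstdev(mapped),  # assumption: population std
            "n": len(mapped),
        }
    return out


batch_size = 64
times = {"forward": [0.5, 0.25], "backward": [1.0, 3.0]}
# Turn forward times (seconds per batch) into throughput (samples per second).
throughput = compute_stats(times, map_funcs={"forward": lambda t: batch_size / t})
print(throughput["forward"])
print(throughput["backward"])
```

Mapping before aggregating matters here: the mean of the per-batch throughputs is generally not the same as the batch size divided by the mean time.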