phlower.services.trainer.PhlowerTrainer¶
- class phlower.services.trainer.PhlowerTrainer(setting, restart_directory=None)[source]¶
Bases:
object
PhlowerTrainer is a class that manages the training process.
Examples
>>> trainer = PhlowerTrainer.from_setting(setting)
>>> trainer.train(
...     output_directory,
...     train_directories,
...     validation_directories
... )
Methods
__init__(setting[, restart_directory]) – Initialize PhlowerTrainer.
attach_handler(name, handler[, allow_overwrite]) – Attach handler to the trainer.
draw_model(output_directory) – Draw model.
from_setting(setting[, decrypt_key]) – Create PhlowerTrainer from PhlowerSetting.
Get the number of handlers.
get_registered_trainer_setting() – Get registered trainer setting.
load_pretrained(model_directory, selection_mode) – Load pretrained model.
restart_from(model_directory[, decrypt_key]) – Restart PhlowerTrainer from model directory.
train() – Train the model.
train_ddp(rank, world_size, output_directory) – Train the model with Distributed Data Parallel (DDP).
- Parameters:
setting (PhlowerSetting)
restart_directory (pathlib.Path | None)
- attach_handler(name, handler, allow_overwrite=False)[source]¶
Attach handler to the trainer.
- Raises:
ValueError – If handler with the same name already exists
- Parameters:
name (str) – Name of the handler
handler (PhlowerHandlersRunner) – Handler to attach
allow_overwrite (bool)
- Return type:
None
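A minimal usage sketch; the handler objects below are assumed to be already-constructed PhlowerHandlersRunner instances (their construction is not shown here).
>>> trainer = PhlowerTrainer.from_setting(setting)
>>> trainer.attach_handler("early_stopping", handler)
>>> # Re-attaching under the same name raises ValueError unless allow_overwrite=True
>>> trainer.attach_handler("early_stopping", new_handler, allow_overwrite=True)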
- draw_model(output_directory)[source]¶
Draw model
- Parameters:
output_directory (pathlib.Path) – Output directory
- Return type:
None
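Illustrative sketch; the output path is an arbitrary placeholder.
>>> import pathlib
>>> trainer = PhlowerTrainer.from_setting(setting)
>>> trainer.draw_model(pathlib.Path("out/model_graph"))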
- classmethod from_setting(setting, decrypt_key=None)[source]¶
Create PhlowerTrainer from PhlowerSetting
- Parameters:
setting (PhlowerSetting) – PhlowerSetting used to create the trainer
decrypt_key (bytes | None)
- Returns:
PhlowerTrainer
- Return type:
PhlowerTrainer
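A minimal sketch, assuming the setting is loaded from a YAML file via PhlowerSetting.read_yaml; the import path and file name are assumptions.
>>> from phlower.settings import PhlowerSetting
>>> setting = PhlowerSetting.read_yaml("settings.yml")
>>> trainer = PhlowerTrainer.from_setting(setting)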
- get_registered_trainer_setting()[source]¶
Get registered trainer setting
- Returns:
Trainer setting
- Return type:
- load_pretrained(model_directory, selection_mode, target_epoch=None, map_location=None, decrypt_key=None)[source]¶
Load pretrained model
- Parameters:
model_directory (pathlib.Path) – Model directory
selection_mode (Literal["best", "latest", "train_best", "specified"]) – Selection mode
target_epoch (int | None) – Target epoch. Defaults to None.
map_location (str | dict | None) – Map location (device) used when loading the model. Defaults to None.
decrypt_key (bytes | None) – Decrypt key. Defaults to None.
- Return type:
None
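Illustrative sketch; the paths are placeholders, and target_epoch is presumably required when selection_mode is "specified".
>>> import pathlib
>>> trainer = PhlowerTrainer.from_setting(setting)
>>> trainer.load_pretrained(
...     pathlib.Path("out/model"),
...     selection_mode="best",
... )
>>> # Load a specific snapshot by epoch number
>>> trainer.load_pretrained(
...     pathlib.Path("out/model"),
...     selection_mode="specified",
...     target_epoch=100,
... )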
- classmethod restart_from(model_directory, decrypt_key=None)[source]¶
Restart PhlowerTrainer from model directory
- Parameters:
model_directory (pathlib.Path) – Model directory
decrypt_key (bytes | None)
- Returns:
PhlowerTrainer
- Return type:
PhlowerTrainer
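A minimal sketch of restarting from a previous run's model directory and then continuing training with the same train() call shown in the class-level example; all paths are placeholders.
>>> import pathlib
>>> trainer = PhlowerTrainer.restart_from(pathlib.Path("out/model"))
>>> trainer.train(
...     output_directory,
...     train_directories,
...     validation_directories
... )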
- train_ddp(rank, world_size, output_directory, train_directories=None, validation_directories=None, disable_dimensions=False, decrypt_key=None, encrypt_key=None)[source]¶
Train the model with Distributed Data Parallel (DDP)
- Parameters:
rank (int) – Rank of the current process
world_size (int) – Total number of processes
output_directory (pathlib.Path) – Output directory
train_directories (list[pathlib.Path] | None, optional) – List of directories containing training data. If None, directories defined in the setting are used. Default is None.
validation_directories (list[pathlib.Path] | None, optional) – List of directories containing validation data. If None, directories defined in the setting are used. Default is None.
disable_dimensions (bool, optional) – Disable dimensions. Default is False.
decrypt_key (bytes | None, optional) – Key used for decrypting data files, if necessary. Default is None.
encrypt_key (bytes | None, optional) – Key used for encrypting output files, if necessary. Default is None.
- Return type:
float
Examples
>>> import torch.multiprocessing as mp
>>> trainer = PhlowerTrainer.from_setting(setting)
>>> mp.spawn(
...     trainer.train_ddp,
...     args=(world_size, output_directory),
...     nprocs=world_size,
...     join=True
... )
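A common PyTorch pattern (not specific to phlower) is to derive world_size from the number of visible GPUs and guard the spawn call with a __main__ check; a sketch assuming one process per GPU:
>>> import torch
>>> import torch.multiprocessing as mp
>>> if __name__ == "__main__":
...     world_size = torch.cuda.device_count()  # one process per GPU
...     # mp.spawn passes the process rank as the first argument to train_ddp
...     mp.spawn(
...         trainer.train_ddp,
...         args=(world_size, output_directory),
...         nprocs=world_size,
...         join=True,
...     )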