siml.preprocessing package¶
Subpackages¶
- siml.preprocessing.siml_scalers package
  - Subpackages
    - siml.preprocessing.siml_scalers.scale_functions package
      - Submodules
        - siml.preprocessing.siml_scalers.scale_functions.identity_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.interface_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.isoam_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.max_abs_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.min_max_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.sparse_standard_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.standard_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.user_defined_scaler module
      - Module contents
  - Submodules
    - siml.preprocessing.siml_scalers.scaler_result_save module
    - siml.preprocessing.siml_scalers.scaler_wrapper module
  - Module contents
Submodules¶
siml.preprocessing.converted_objects module¶
- class siml.preprocessing.converted_objects.SimlConvertedItem¶
Bases:
object
- failed(message: str | None = None) None ¶
Set the status to failed.
- Parameters:
message (Optional[str]) – If given, register it as the failure message.
- classmethod from_interim_directory(interim_directory: pathlib.Path, decrypt_key: bytes | None = None)¶
- get_failed_message() str ¶
- get_status() str ¶
- get_values() tuple[Optional[dict[str, numpy.ndarray]], Optional[femio.fem_data.FEMData]] ¶
Get the items this object manages.
- Returns:
Return dict_data and fem_data; each may be None if it has not been registered.
- Return type:
tuple[ Union[dict[str, np.ndarray], None], Union[femio.FEMData, None] ]
- property is_failed¶
- property is_skipped¶
- property is_successed¶
- register(*, dict_data: dict[str, numpy.ndarray] | None, fem_data: FEMData | None) None ¶
Register result items
- Parameters:
dict_data (Optional[dict[str, np.ndarray]]) – Dictionary of feature arrays.
fem_data (Optional[femio.FEMData]) – femio FEMData object.
- Raises:
ValueError – If dict_data has already been registered.
ValueError – If fem_data has already been registered.
- skipped(message: str | None = None) None ¶
Set the status to skipped.
- successed() None ¶
Set the status to successed.
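Below is a minimal usage sketch of SimlConvertedItem. The feature name, array shape, and error message are placeholder values, and the no-argument constructor is assumed from the class signature above.

```python
import numpy as np

from siml.preprocessing.converted_objects import SimlConvertedItem

item = SimlConvertedItem()

# Register converted features; fem_data may be None when femio data is not needed.
item.register(
    dict_data={"temperature": np.zeros((10, 1))},  # placeholder feature
    fem_data=None,
)
item.successed()  # mark the conversion as finished successfully

assert item.is_successed
dict_data, fem_data = item.get_values()

# A failed conversion records a message instead.
failed_item = SimlConvertedItem()
failed_item.failed(message="mesh file not found")  # placeholder message
print(failed_item.get_status(), failed_item.get_failed_message())
```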
- class siml.preprocessing.converted_objects.SimlConvertedItemContainer(values: dict[str, siml.preprocessing.converted_objects.SimlConvertedItem])¶
Bases:
object
- classmethod from_interim_directories(interim_directories: list[pathlib.Path], decrypt_key: bytes | None = None)¶
- property is_all_successed: bool¶
- keys() dict_keys[str] ¶
- merge(other: SimlConvertedItemContainer) SimlConvertedItemContainer ¶
Return a new object merging this container's data with the other's. If the same key exists in both containers, the value from other takes precedence.
- Parameters:
other (SimlConvertedItemContainer) – container to merge
- Returns:
A new container holding the merged data.
- Return type:
SimlConvertedItemContainer
- query_num_status_items(*status: str) int ¶
Query the number of items that have any of the given statuses.
- Returns:
Number of matching items.
- Return type:
int
- Raises:
ValueError – If a given status is not defined.
- select_non_successed_items() dict[str, siml.preprocessing.converted_objects.SimlConvertedItem] ¶
Select items whose status is not successed, such as failed, skipped, or unfinished.
- Returns:
non successed items
- Return type:
dict[str, SimlConvertedItem]
- select_successed_items() dict[str, siml.preprocessing.converted_objects.SimlConvertedItem] ¶
Select items whose status is successed.
- Returns:
successed items
- Return type:
dict[str, SimlConvertedItem]
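A brief sketch of how SimlConvertedItemContainer might be used. The interim directory paths are placeholders, and the "failed" status string is an assumption matching the status names suggested by the properties above.

```python
import pathlib

from siml.preprocessing.converted_objects import SimlConvertedItemContainer

# Load previously converted items from interim directories (paths are placeholders).
container = SimlConvertedItemContainer.from_interim_directories(
    [pathlib.Path("data/interim/case_0"), pathlib.Path("data/interim/case_1")]
)

# Merge with another container; keys present in `other` take precedence.
other = SimlConvertedItemContainer({})
merged = container.merge(other)

if not merged.is_all_successed:
    # "failed" is assumed to be one of the defined status names.
    n_failed = merged.query_num_status_items("failed")
    print(f"{n_failed} items failed")
    for name, item in merged.select_non_successed_items().items():
        print(name, item.get_status())
```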
siml.preprocessing.converter module¶
- class siml.preprocessing.converter.DefaultFilterFunction¶
Bases:
IFilterFunction
- class siml.preprocessing.converter.DefaultLoadFunction(file_type: str, read_npy: bool, read_res: bool, skip_femio: bool, time_series: bool, conversion_function: IConvertFunction | None = None)¶
Bases:
ILoadFunction
- class siml.preprocessing.converter.DefaultSaveFunction(main_setting: MainSetting, write_ucd: bool, to_first_order: bool, *, user_save_function: ISaveFunction | None = None)¶
Bases:
ISaveFunction
- class siml.preprocessing.converter.IConvertFunction¶
Bases:
object
- class siml.preprocessing.converter.IFilterFunction¶
Bases:
object
- class siml.preprocessing.converter.ILoadFunction¶
Bases:
object
- class siml.preprocessing.converter.ISaveFunction¶
Bases:
object
- class siml.preprocessing.converter.RawConverter(main_setting: MainSetting, *, recursive: bool = True, conversion_function: IConvertFunction | None = None, filter_function: IFilterFunction | None = None, load_function: ILoadFunction | None = None, save_function: ISaveFunction | None = None, force_renew: bool = False, read_npy: bool = False, write_ucd: bool = True, read_res: bool = True, max_process: int | None = None, to_first_order: bool = False)¶
Bases:
object
- convert(raw_directory: Path | None = None, *, return_results: bool = False) SimlConvertedItemContainer ¶
Perform conversion.
- Parameters:
raw_directory (pathlib.Path, optional) – Raw data directory name. If not given, self.setting.data.raw is used instead.
return_results (bool, optional) – If True, keep the converted results so that they are returned.
- Returns:
Container keyed by the raw directory path. If return_results is False, the items hold no converted values; if True, they hold the converted values.
- Return type:
SimlConvertedItemContainer
- convert_single_data(raw_path: Path, *, output_directory: Path | None = None, raise_when_overwrite: bool = False, return_results: bool = False) SimlConvertedItemContainer ¶
Convert a single raw data directory.
- Parameters:
raw_path (pathlib.Path) – Input path of the raw data.
output_directory (pathlib.Path, optional) – If given, use it as the output directory.
raise_when_overwrite (bool, optional) – If True, raise when the output directory already exists. The default is False.
- Returns:
Container keyed by the raw directory path. If return_results is False, the items hold no converted values; if True, they hold the converted values.
- Return type:
SimlConvertedItemContainer
- classmethod read_settings(settings_yaml, **args)¶
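A sketch of the typical RawConverter flow, assuming a settings YAML and a raw/interim directory layout; all paths are placeholders.

```python
import pathlib

from siml.preprocessing.converter import RawConverter

# Build a converter from a settings YAML (the path is a placeholder).
converter = RawConverter.read_settings(pathlib.Path("settings.yml"))

# Convert everything under the raw data root defined in the settings.
results = converter.convert()

# Or convert one raw directory explicitly, choosing the output location.
single_result = converter.convert_single_data(
    pathlib.Path("data/raw/case_0"),
    output_directory=pathlib.Path("data/interim/case_0"),
)
```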
- class siml.preprocessing.converter.SingleDataConverter(setting: ConversionSetting, raw_path: Path, load_function: ILoadFunction, filter_function: IFilterFunction, *, save_function: ISaveFunction | None = None, output_directory: Path | None = None, raise_when_overwrite: bool = False, force_renew: bool = False, return_results: bool = False)¶
Bases:
object
- property output_directory: Path¶
- run() SimlConvertedItem ¶
- siml.preprocessing.converter.save_dict_data(output_directory: pathlib.Path, dict_data: dict[str, numpy.ndarray], *, dtype=numpy.float32, encrypt_key=None, finished_file='converted', save_dtype_dict: Dict | None = None) None ¶
Save dict_data.
- Parameters:
output_directory (pathlib.Path) – Output directory path.
dict_data (dict) – Data dictionary to be saved.
dtype (type, optional) – Data type in which the arrays are saved.
encrypt_key (bytes, optional) – Key used to encrypt the saved data.
- Return type:
None
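A minimal sketch of saving converted features with save_dict_data; the output directory, feature names, and array shapes are placeholders.

```python
import pathlib

import numpy as np

from siml.preprocessing.converter import save_dict_data

# Feature arrays to persist; names, shapes, and the output path are placeholders.
dict_data = {
    "nodal_temperature": np.random.rand(100, 1).astype(np.float32),
    "elemental_volume": np.random.rand(80, 1).astype(np.float32),
}

save_dict_data(
    pathlib.Path("data/interim/case_0"),
    dict_data,
    dtype=np.float32,  # data type used when saving the arrays
)
```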
siml.preprocessing.scalers_composition module¶
- class siml.preprocessing.scalers_composition.ScalersComposition(variable_name_to_scalers: dict[str, str], scalers_dict: dict[str, siml.preprocessing.siml_scalers.scaler_wrapper.SimlScalerWrapper], max_process: int | None = None, decrypt_key: bytes | None = None)¶
Bases:
object
- REGISTERED_KEY: Final[str] = 'variable_name_to_scalers'¶
- classmethod create_from_dict(preprocess_dict: dict, max_process: int | None = None, key: bytes | None = None) ScalersComposition ¶
- classmethod create_from_file(converter_parameters_pkl: Path, max_process: int | None = None, key: bytes | None = None) ScalersComposition ¶
- get_dumped_object() dict ¶
- get_scaler(variable_name: str, allow_missing: bool = False) SimlScalerWrapper ¶
- get_scaler_names(group_id: int | None = None) list[str] ¶
- get_variable_names(group_id: int | None = None) list[str] ¶
- inverse_transform(variable_name: str, data: ndarray | coo_matrix | csr_matrix | csc_matrix) ndarray | coo_matrix | csr_matrix | csc_matrix ¶
- inverse_transform_dict(dict_data: dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]], raise_missing_warning: bool = True) dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]] ¶
- lazy_partial_fit(scaler_name_to_files: dict[str, list[siml.path_like_objects.siml_files.interface.ISimlNumpyFile]]) None ¶
- transform(variable_name: str, data: ndarray | coo_matrix | csr_matrix | csc_matrix) ndarray | coo_matrix | csr_matrix | csc_matrix ¶
- transform_dict(dict_data: dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]], raise_missing_warning: bool = True) dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]] ¶
- transform_file(variable_name: str, siml_file: ISimlNumpyFile) ndarray | coo_matrix | csr_matrix | csc_matrix ¶
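A sketch of restoring and applying fitted scalers with ScalersComposition. The directory is a placeholder (the file name follows PREPROCESSORS_PKL_NAME), the variable name is a placeholder, and the example assumes a scaler has been fitted for that variable.

```python
import pathlib

import numpy as np

from siml.preprocessing.scalers_composition import ScalersComposition

# Restore fitted scalers from a pickle written by ScalingConverter.save().
scalers = ScalersComposition.create_from_file(
    pathlib.Path("data/preprocessed/preprocessors.pkl")
)

# Transform / inverse-transform a single variable (the name is a placeholder).
x = np.random.rand(100, 1)
x_scaled = scalers.transform("nodal_temperature", x)
x_back = scalers.inverse_transform("nodal_temperature", x_scaled)

# Dictionary-level transforms handle all registered variables at once.
scaled_dict = scalers.transform_dict({"nodal_temperature": x})
```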
siml.preprocessing.scaling_converter module¶
- class siml.preprocessing.scaling_converter.Config¶
Bases:
object
- arbitrary_types_allowed = True¶
- frozen = True¶
- init = True¶
- class siml.preprocessing.scaling_converter.PreprocessInnerSettings(preprocess_dict: 'dict', interim_directories: 'list[pathlib.Path]', preprocessed_root: 'pathlib.Path', recursive: 'bool' = True, REQUIRED_FILE_NAMES: 'Optional[list[str]]' = None, FINISHED_FILE: 'str' = 'preprocessed', PREPROCESSORS_PKL_NAME: 'str' = 'preprocessors.pkl', cached_interim_directories: 'Optional[list[SimlDirectory]]' = None)¶
Bases:
object
- FINISHED_FILE: str = 'preprocessed'¶
- PREPROCESSORS_PKL_NAME: str = 'preprocessors.pkl'¶
- REQUIRED_FILE_NAMES: list[str] | None = None¶
- cached_interim_directories: list[siml.path_like_objects.siml_directory.SimlDirectory] | None = None¶
- collect_interim_directories() list[siml.path_like_objects.siml_directory.SimlDirectory] ¶
- classmethod default_list_check(v)¶
- get_default_preprocessors_pkl_path() Path ¶
- get_output_directory(data_directory: Path) Path ¶
- get_scaler_fitting_files(variable_name: str) list[siml.path_like_objects.siml_files.interface.ISimlNumpyFile] ¶
- interim_directories: list[pathlib.Path]¶
- preprocess_dict: dict¶
- preprocessed_root: Path¶
- recursive: bool = True¶
- class siml.preprocessing.scaling_converter.ScalingConverter(main_setting: MainSetting, *, force_renew: bool = False, save_func: IScalingSaveFunction | None = None, max_process: int | None = None, allow_missing: bool = False, recursive: bool = True, scalers: ScalersComposition | None = None)¶
Bases:
object
This is a facade class for the scaling process.
- fit_transform(group_id: int | None = None) None ¶
This function consists of three steps:
- Determine scaler parameters by lazily reading the data files
- Transform the interim data and save the results
- Save the scaler parameters to a file
- Parameters:
group_id (int, optional) – Specifies a chunk of the preprocessing groups. Useful when a MemoryError occurs if all variables are preprocessed at once. If not specified, all variables are processed.
- Return type:
None
- inverse_transform(dict_data: dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]]) dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]] ¶
- lazy_fit_all(*, group_id: int | None = None) None ¶
Determine preprocessing parameters by reading data files lazily.
- Parameters:
group_id (int, optional) – Specifies a chunk of the preprocessing groups. Useful when a MemoryError occurs if all variables are preprocessed at once. If not specified, all variables are processed.
- Return type:
None
- classmethod read_pkl(main_setting: MainSetting, converter_parameters_pkl: Path, key: bytes | None = None)¶
- classmethod read_settings(settings_yaml: Path, **args)¶
- save() None ¶
Save the parameters of the scaling converters.
- transform_interim(*, group_id: int | None = None) None ¶
Apply the scaling process to data in the interim directory and save the results in the preprocessed directory.
- Parameters:
group_id (int, optional) – Specifies a chunk of the preprocessing groups. Useful when a MemoryError occurs if all variables are preprocessed at once. If not specified, all variables are processed.
- Return type:
None
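A sketch of the ScalingConverter facade; the settings path is a placeholder. Both the one-shot fit_transform path and the equivalent step-by-step path are shown.

```python
import pathlib

from siml.preprocessing.scaling_converter import ScalingConverter

# Build the scaling facade from a settings YAML (the path is a placeholder).
scaling = ScalingConverter.read_settings(pathlib.Path("settings.yml"))

# One-shot path: fit scalers lazily, transform interim data, and save parameters.
scaling.fit_transform()

# Equivalent step-by-step path, e.g. when fitting and transforming
# should run in separate jobs.
scaling.lazy_fit_all()
scaling.transform_interim()
scaling.save()
```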