siml.preprocessing package¶
Subpackages¶
- siml.preprocessing.siml_scalers package
  - Subpackages
    - siml.preprocessing.siml_scalers.scale_functions package
      - Submodules
        - siml.preprocessing.siml_scalers.scale_functions.identity_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.interface_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.isoam_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.max_abs_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.min_max_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.sparse_standard_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.standard_scaler module
        - siml.preprocessing.siml_scalers.scale_functions.user_defined_scaler module
      - Module contents
  - Submodules
    - siml.preprocessing.siml_scalers.scaler_result_save module
    - siml.preprocessing.siml_scalers.scaler_wrapper module
  - Module contents
Submodules¶
siml.preprocessing.converted_objects module¶
- class siml.preprocessing.converted_objects.SimlConvertedItem¶
Bases:
object
- failed(message: str | None = None) None ¶
Set the status to failed.
- Parameters:
message (Optional[str]) – If given, register it as the failure message.
- classmethod from_interim_directory(interim_directory: pathlib.Path, decrypt_key: bytes | None = None)¶
- get_failed_message() str ¶
- get_status() str ¶
- get_values() tuple[Optional[dict[str, numpy.ndarray]], Optional[femio.fem_data.FEMData]] ¶
Get the items this object manages.
- Returns:
Return dict_data and fem_data; each may be None if it has not been registered.
- Return type:
tuple[ Union[dict[str, np.ndarray], None], Union[femio.FEMData, None] ]
- property is_failed¶
- property is_skipped¶
- property is_successed¶
- register(*, dict_data: dict[str, numpy.ndarray] | None, fem_data: FEMData | None) None ¶
Register result items
- Parameters:
dict_data (Optional[dict[str, np.ndarray]]) – Dictionary of feature arrays.
fem_data (Optional[femio.FEMData]) – femio FEMData object.
- Raises:
ValueError – If dict_data has already been registered.
ValueError – If fem_data has already been registered.
- skipped(message: str | None = None) None ¶
Set the status to skipped.
- successed() None ¶
Set the status to successed.
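Below is a minimal usage sketch of SimlConvertedItem. The feature name, array shape, and error message are placeholder values, and the no-argument constructor is assumed from the class signature above.

```python
import numpy as np

from siml.preprocessing.converted_objects import SimlConvertedItem

item = SimlConvertedItem()

# Register converted features; fem_data may be None when femio data is not needed.
item.register(
    dict_data={"temperature": np.zeros((10, 1))},  # placeholder feature
    fem_data=None,
)
item.successed()  # mark the conversion as finished successfully

assert item.is_successed
dict_data, fem_data = item.get_values()

# A failed conversion records a message instead.
failed_item = SimlConvertedItem()
failed_item.failed(message="mesh file not found")  # placeholder message
print(failed_item.get_status(), failed_item.get_failed_message())
```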
- class siml.preprocessing.converted_objects.SimlConvertedItemContainer(values: dict[str, siml.preprocessing.converted_objects.SimlConvertedItem])¶
Bases:
object
- classmethod from_interim_directories(interim_directories: list[pathlib.Path], decrypt_key: bytes | None = None)¶
- property is_all_successed: bool¶
- keys() dict_keys[str] ¶
- merge(other: SimlConvertedItemContainer) SimlConvertedItemContainer ¶
Return a new object merging this container's data with the other's. If the same key exists in both containers, the value from other takes precedence.
- Parameters:
other (SimlConvertedItemContainer) – container to merge
- Returns:
A new container holding the merged data.
- Return type:
SimlConvertedItemContainer
- query_num_status_items(*status: str) int ¶
Query the number of items that have any of the given statuses.
- Returns:
Number of matching items.
- Return type:
int
- Raises:
ValueError – If a given status is not defined.
- select_non_successed_items() dict[str, siml.preprocessing.converted_objects.SimlConvertedItem] ¶
Select items whose status is not successed, such as failed, skipped, or unfinished.
- Returns:
non successed items
- Return type:
dict[str, SimlConvertedItem]
- select_successed_items() dict[str, siml.preprocessing.converted_objects.SimlConvertedItem] ¶
Select items whose status is successed.
- Returns:
successed items
- Return type:
dict[str, SimlConvertedItem]
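A brief sketch of how SimlConvertedItemContainer might be used. The interim directory paths are placeholders, and the "failed" status string is an assumption matching the status names suggested by the properties above.

```python
import pathlib

from siml.preprocessing.converted_objects import SimlConvertedItemContainer

# Load previously converted items from interim directories (paths are placeholders).
container = SimlConvertedItemContainer.from_interim_directories(
    [pathlib.Path("data/interim/case_0"), pathlib.Path("data/interim/case_1")]
)

# Merge with another container; keys present in `other` take precedence.
other = SimlConvertedItemContainer({})
merged = container.merge(other)

if not merged.is_all_successed:
    # "failed" is assumed to be one of the defined status names.
    n_failed = merged.query_num_status_items("failed")
    print(f"{n_failed} items failed")
    for name, item in merged.select_non_successed_items().items():
        print(name, item.get_status())
```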
siml.preprocessing.converter module¶
- class siml.preprocessing.converter.DefaultFilterFunction¶
Bases:
IFilterFunction
- class siml.preprocessing.converter.DefaultLoadFunction(file_type: str, read_npy: bool, read_res: bool, skip_femio: bool, time_series: bool, conversion_function: IConvertFunction | None = None)¶
Bases:
ILoadFunction
- class siml.preprocessing.converter.DefaultSaveFunction(main_setting: MainSetting, write_ucd: bool, to_first_order: bool, *, user_save_function: ISaveFunction | None = None)¶
Bases:
ISaveFunction
- class siml.preprocessing.converter.IConvertFunction¶
Bases:
object
- class siml.preprocessing.converter.IFilterFunction¶
Bases:
object
- class siml.preprocessing.converter.ILoadFunction¶
Bases:
object
- class siml.preprocessing.converter.ISaveFunction¶
Bases:
object
- class siml.preprocessing.converter.RawConverter(main_setting: MainSetting, *, recursive: bool = True, conversion_function: IConvertFunction | None = None, filter_function: IFilterFunction | None = None, load_function: ILoadFunction | None = None, save_function: ISaveFunction | None = None, force_renew: bool = False, read_npy: bool = False, write_ucd: bool = True, read_res: bool = True, max_process: int | None = None, to_first_order: bool = False)¶
Bases:
object
- convert(raw_directory: Path | None = None, *, return_results: bool = False) SimlConvertedItemContainer ¶
Perform conversion.
- Parameters:
raw_directory (pathlib.Path, optional) – Raw data directory name. If not given, self.setting.data.raw is used instead.
return_results (bool, optional) – If True, keep the converted results so that they are returned.
- Returns:
Container keyed by the raw directory path. If return_results is False, the items hold no converted values; if True, they hold the converted values.
- Return type:
SimlConvertedItemContainer
- convert_single_data(raw_path: Path, *, output_directory: Path | None = None, raise_when_overwrite: bool = False, return_results: bool = False) SimlConvertedItemContainer ¶
Convert a single raw data directory.
- Parameters:
raw_path (pathlib.Path) – Input path of the raw data.
output_directory (pathlib.Path, optional) – If given, use it as the output directory.
raise_when_overwrite (bool, optional) – If True, raise when the output directory already exists. The default is False.
- Returns:
Container keyed by the raw directory path. If return_results is False, the items hold no converted values; if True, they hold the converted values.
- Return type:
SimlConvertedItemContainer
- classmethod read_settings(settings_yaml, **args)¶
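A sketch of the typical RawConverter flow, assuming a settings YAML and a raw/interim directory layout; all paths are placeholders.

```python
import pathlib

from siml.preprocessing.converter import RawConverter

# Build a converter from a settings YAML (the path is a placeholder).
converter = RawConverter.read_settings(pathlib.Path("settings.yml"))

# Convert everything under the raw data root defined in the settings.
results = converter.convert()

# Or convert one raw directory explicitly, choosing the output location.
single_result = converter.convert_single_data(
    pathlib.Path("data/raw/case_0"),
    output_directory=pathlib.Path("data/interim/case_0"),
)
```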
- class siml.preprocessing.converter.SingleDataConverter(setting: ConversionSetting, raw_path: Path, load_function: ILoadFunction, filter_function: IFilterFunction, *, save_function: ISaveFunction | None = None, output_directory: Path | None = None, raise_when_overwrite: bool = False, force_renew: bool = False, return_results: bool = False)¶
Bases:
object
- property output_directory: Path¶
- run() SimlConvertedItem ¶
- siml.preprocessing.converter.save_dict_data(output_directory: pathlib.Path, dict_data: dict[str, numpy.ndarray], *, dtype=numpy.float32, encrypt_key=None, finished_file='converted', save_dtype_dict: Dict | None = None) None ¶
Save dict_data.
- Parameters:
output_directory (pathlib.Path) – Output directory path.
dict_data (dict) – Data dictionary to be saved.
dtype (type, optional) – Data type in which the arrays are saved.
encrypt_key (bytes, optional) – Key used to encrypt the saved data.
- Return type:
None
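A minimal sketch of saving converted features with save_dict_data; the output directory, feature names, and array shapes are placeholders.

```python
import pathlib

import numpy as np

from siml.preprocessing.converter import save_dict_data

# Feature arrays to persist; names, shapes, and the output path are placeholders.
dict_data = {
    "nodal_temperature": np.random.rand(100, 1).astype(np.float32),
    "elemental_volume": np.random.rand(80, 1).astype(np.float32),
}

save_dict_data(
    pathlib.Path("data/interim/case_0"),
    dict_data,
    dtype=np.float32,  # data type used when saving the arrays
)
```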
siml.preprocessing.scalers_composition module¶
- class siml.preprocessing.scalers_composition.ScalersComposition(variable_name_to_scalers: dict[str, str], scalers_dict: dict[str, siml.preprocessing.siml_scalers.scaler_wrapper.SimlScalerWrapper], max_process: int | None = None, decrypt_key: bytes | None = None)¶
Bases:
object
- REGISTERED_KEY: Final[str] = 'variable_name_to_scalers'¶
- classmethod create_from_dict(preprocess_dict: dict, max_process: int | None = None, key: bytes | None = None) ScalersComposition ¶
- classmethod create_from_file(converter_parameters_pkl: Path, max_process: int | None = None, key: bytes | None = None) ScalersComposition ¶
- get_dumped_object() dict ¶
- get_scaler(variable_name: str, allow_missing: bool = False) SimlScalerWrapper ¶
- get_scaler_names(group_id: int | None = None) list[str] ¶
- get_variable_names(group_id: int | None = None) list[str] ¶
- inverse_transform(variable_name: str, data: ndarray | coo_matrix | csr_matrix | csc_matrix) ndarray | coo_matrix | csr_matrix | csc_matrix ¶
- inverse_transform_dict(dict_data: dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]], raise_missing_warning: bool = True) dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]] ¶
- lazy_partial_fit(scaler_name_to_files: dict[str, list[siml.path_like_objects.siml_files.interface.ISimlNumpyFile]]) None ¶
- transform(variable_name: str, data: ndarray | coo_matrix | csr_matrix | csc_matrix) ndarray | coo_matrix | csr_matrix | csc_matrix ¶
- transform_dict(dict_data: dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]], raise_missing_warning: bool = True) dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]] ¶
- transform_file(variable_name: str, siml_file: ISimlNumpyFile) ndarray | coo_matrix | csr_matrix | csc_matrix ¶
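A sketch of restoring and applying fitted scalers with ScalersComposition. The directory is a placeholder (the file name follows PREPROCESSORS_PKL_NAME), the variable name is a placeholder, and the example assumes a scaler has been fitted for that variable.

```python
import pathlib

import numpy as np

from siml.preprocessing.scalers_composition import ScalersComposition

# Restore fitted scalers from a pickle written by ScalingConverter.save().
scalers = ScalersComposition.create_from_file(
    pathlib.Path("data/preprocessed/preprocessors.pkl")
)

# Transform / inverse-transform a single variable (the name is a placeholder).
x = np.random.rand(100, 1)
x_scaled = scalers.transform("nodal_temperature", x)
x_back = scalers.inverse_transform("nodal_temperature", x_scaled)

# Dictionary-level transforms handle all registered variables at once.
scaled_dict = scalers.transform_dict({"nodal_temperature": x})
```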
siml.preprocessing.scaling_converter module¶
- class siml.preprocessing.scaling_converter.Config¶
Bases:
object
- arbitrary_types_allowed = True¶
- frozen = True¶
- init = True¶
- class siml.preprocessing.scaling_converter.PreprocessInnerSettings(preprocess_dict: 'dict', interim_directories: 'list[pathlib.Path]', preprocessed_root: 'pathlib.Path', recursive: 'bool' = True, REQUIRED_FILE_NAMES: 'Optional[list[str]]' = None, FINISHED_FILE: 'str' = 'preprocessed', PREPROCESSORS_PKL_NAME: 'str' = 'preprocessors.pkl', cached_interim_directories: 'Optional[list[SimlDirectory]]' = None)¶
Bases:
object
- FINISHED_FILE: str = 'preprocessed'¶
- PREPROCESSORS_PKL_NAME: str = 'preprocessors.pkl'¶
- REQUIRED_FILE_NAMES: list[str] | None = None¶
- cached_interim_directories: list[siml.path_like_objects.siml_directory.SimlDirectory] | None = None¶
- collect_interim_directories() list[siml.path_like_objects.siml_directory.SimlDirectory] ¶
- classmethod default_list_check(v)¶
- get_default_preprocessors_pkl_path() Path ¶
- get_output_directory(data_directory: Path) Path ¶
- get_scaler_fitting_files(variable_name: str) list[siml.path_like_objects.siml_files.interface.ISimlNumpyFile] ¶
- interim_directories: list[pathlib.Path]¶
- preprocess_dict: dict¶
- preprocessed_root: Path¶
- recursive: bool = True¶
- class siml.preprocessing.scaling_converter.ScalingConverter(main_setting: MainSetting, *, force_renew: bool = False, save_func: IScalingSaveFunction | None = None, max_process: int | None = None, allow_missing: bool = False, recursive: bool = True, scalers: ScalersComposition | None = None)¶
Bases:
object
This is a facade class for the scaling process.
- fit_transform(group_id: int | None = None) None ¶
This function consists of three steps:
- Determine scaler parameters by lazily reading the data files
- Transform the interim data and save the results
- Save the scaler parameters to a file
- Parameters:
group_id (int, optional) – Specifies a chunk of the preprocessing groups. Useful when a MemoryError occurs if all variables are preprocessed at once. If not specified, all variables are processed.
- Return type:
None
- inverse_transform(dict_data: dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]]) dict[str, Union[numpy.ndarray, scipy.sparse._coo.coo_matrix, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]] ¶
- lazy_fit_all(*, group_id: int | None = None) None ¶
Determine preprocessing parameters by reading data files lazily.
- Parameters:
group_id (int, optional) – Specifies a chunk of the preprocessing groups. Useful when a MemoryError occurs if all variables are preprocessed at once. If not specified, all variables are processed.
- Return type:
None
- classmethod read_pkl(main_setting: MainSetting, converter_parameters_pkl: Path, key: bytes | None = None)¶
- classmethod read_settings(settings_yaml: Path, **args)¶
- save() None ¶
Save the parameters of the scaling converters.
- transform_interim(*, group_id: int | None = None) None ¶
Apply the scaling process to data in the interim directory and save the results in the preprocessed directory.
- Parameters:
group_id (int, optional) – Specifies a chunk of the preprocessing groups. Useful when a MemoryError occurs if all variables are preprocessed at once. If not specified, all variables are processed.
- Return type:
None
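A sketch of the ScalingConverter facade; the settings path is a placeholder. Both the one-shot fit_transform path and the equivalent step-by-step path are shown.

```python
import pathlib

from siml.preprocessing.scaling_converter import ScalingConverter

# Build the scaling facade from a settings YAML (the path is a placeholder).
scaling = ScalingConverter.read_settings(pathlib.Path("settings.yml"))

# One-shot path: fit scalers lazily, transform interim data, and save parameters.
scaling.fit_transform()

# Equivalent step-by-step path, e.g. when fitting and transforming
# should run in separate jobs.
scaling.lazy_fit_all()
scaling.transform_interim()
scaling.save()
```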