time_series_transform.transform_core_api package

Submodules

time_series_transform.transform_core_api.base module

class time_series_transform.transform_core_api.base.Time_Series_Data(data=None, time_index=None)[source]

Bases: object

property data
dropna()[source]

dropna drops null values

it will drop null values aligned on the time index. For example, given time_index:[1,2,3], data1:[1,2,np.nan], data2:[1,2,3], dropna returns time_index:[1,2], data1:[1,2], data2:[1,2]

Returns

it will return a new Time_Series_Data without null values

Return type

Time_Series_Data
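The example above can be sketched in plain Python (a minimal illustration of the dropna semantics, not the library's implementation; None stands in for np.nan):

```python
def dropna_aligned(time_index, columns):
    """Drop every time position where any column holds a null value.

    columns maps column name -> list aligned with time_index.
    """
    keep = [
        i for i in range(len(time_index))
        if all(col[i] is not None for col in columns.values())
    ]
    new_index = [time_index[i] for i in keep]
    new_cols = {name: [col[i] for i in keep] for name, col in columns.items()}
    return new_index, new_cols

# Mirrors the example: data1 has a null at time 3, so that position is dropped.
idx, cols = dropna_aligned([1, 2, 3], {"data1": [1, 2, None], "data2": [1, 2, 3]})
```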

property labels
remove(key, remove_type=None)[source]

remove removes data or a label

this function removes the target key and its values from the data structure

Parameters
  • key (str) – the name of data or label

  • remove_type (['data','label'], optional) – passing the type of the removed data improves search performance, by default None

Returns

it will return self

Return type

self

set_data(inputData, label)[source]

set_data setter of data

an alternative way of setting data. The time index (time_series_Ix) should be initialized beforehand.

Parameters
  • inputData (list) – input value of data

  • label (str) – the name of list input

Returns

it will return self

Return type

self

Raises

ValueError – different time length error
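The length check behind this ValueError can be sketched as follows (a hypothetical helper illustrating the rule, not the library's code):

```python
def validate_set_data(time_index, input_data):
    """New data must align one-to-one with the already-initialized time index."""
    if len(input_data) != len(time_index):
        raise ValueError("different time length error")
    return input_data
```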

set_labels(inputData, label)[source]

set_labels setter of labels

an alternative way of setting labels. The time index (time_series_Ix) should be initialized beforehand.

Parameters
  • inputData (list) – input value of data

  • label (str) – the name of list input

Returns

it will return self

Return type

self

Raises

ValueError – different time length error

set_time_index(inputData, label)[source]

set_time_index alternative of setting time_index

sets the time index

Parameters
  • inputData (list) – input values

  • label (str) – name of time_index

Returns

it will return self

Return type

self

sort(ascending=True)[source]

sort sorting data by time_index

sorts data by the time index

Parameters

ascending (bool, optional) – whether to sort the time index ascending, by default True

Returns

it will return a sorted self

Return type

self

property time_index
transform(inputLabels, newName, func, *args, **kwargs)[source]

transform the way of manipulating data

this function is a wrapper for executing data manipulation

Parameters
  • inputLabels (str or list of str) – the input data passed into the function

  • newName (str) – the new name or prefix for the output data. If the function specifies the output name, newName becomes a prefix

  • func (function) – the function for data manipulation. The output of the function must be a dictionary of lists, a numpy array, or a pandas DataFrame. The final output must also have the same length as time_index

Returns

Return type

self
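A minimal sketch of the wrapper's contract, assuming a simplified store (a plain dict of lists) in place of Time_Series_Data; the transform and double names and the prefix-joining rule shown here are illustrative:

```python
def transform(data, input_labels, new_name, func, *args, **kwargs):
    """Apply func to the selected columns and merge its dict-of-lists output.

    data maps label -> list, all lists the same length (the time index length).
    """
    labels = [input_labels] if isinstance(input_labels, str) else input_labels
    selected = {lab: data[lab] for lab in labels}
    out = func(selected, *args, **kwargs)  # must return a dict of lists
    n = len(next(iter(data.values())))
    for key, values in out.items():
        if len(values) != n:
            raise ValueError("output length must match time_index")
        # new_name acts as a prefix when the function names its outputs
        data[f"{new_name}_{key}"] = values
    return data

def double(cols):
    # Example manipulation: double every selected column.
    return {lab: [v * 2 for v in vals] for lab, vals in cols.items()}

d = transform({"a": [1, 2, 3]}, "a", "x2", double)
```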

class time_series_transform.transform_core_api.base.Time_Series_Data_Collection(time_series_data, time_seriesIx, categoryIx)[source]

Bases: object

dropna(categoryKey=None)[source]

dropna drops null values for a specific key or all keys

if categoryKey is None, null values are dropped for every key

Parameters

categoryKey (str or numeric data, optional) – the key of target data, by default None

Returns

Return type

self

pad_time_index(fillMissing=nan)[source]

pad_time_index fills the missing time_index entries of each Time_Series_Data so that all keys share the same time index

Parameters

fillMissing (object, optional) – the filling values, by default np.nan

Returns

Return type

self
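The padding idea can be sketched in plain Python (None stands in for np.nan; pad_time_index here is an illustrative stand-in, not the library's code): every key is reindexed onto the union of all time indices, and missing positions receive the fill value.

```python
def pad_time_index(collection, fill=None):
    """collection maps key -> (time_index list, values list)."""
    # Union of every key's time index, sorted.
    full = sorted({t for idx, _ in collection.values() for t in idx})
    padded = {}
    for key, (idx, vals) in collection.items():
        lookup = dict(zip(idx, vals))
        padded[key] = (full, [lookup.get(t, fill) for t in full])
    return padded

c = pad_time_index({"a": ([1, 2], [10, 20]), "b": ([2, 3], [5, 6])})
```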

remove(key)[source]

remove removes the target key of Time_Series_Data

removes the Time_Series_Data stored under the target key

Parameters

key (str) – target key

Returns

Return type

self

remove_different_time_index()[source]

remove_different_time_index removes time periods that do not exist in every Time_Series_Data

Returns

Return type

self

set_time_series_data_collection(ix, time_series_data)[source]

set_time_series_data_collection alternative of setting time_series_collection data

using this function, one can add a new key of Time_Series_Data.

Parameters
  • ix (str or numeric) – the key under which the Time_Series_Data is stored

  • time_series_data (Time_Series_Data) – the Time_Series_Data to add

Raises

ValueError – invalid input data type

sort(ascending=True, categoryList=None)[source]

sort sorts the Time_Series_Data for specific keys or all keys

Parameters
  • ascending (bool, optional) – sorting for ascending order, by default True

  • categoryList (list, optional) – list of key names; if None, all keys are sorted, by default None

Returns

Return type

self

property time_series_data_collection
transform(inputLabels, newName, func, n_jobs=1, verbose=0, backend='loky', *args, **kwargs)[source]

transform the function for manipulating data for each key.

this function implements joblib parallel execution, so each key's data can be computed in parallel.

Parameters
  • inputLabels (str or list of str) – the input data passed into the function

  • newName (str) – the new name or prefix for the output data. If the function specifies the output name, newName becomes a prefix

  • func (function) – the function for data manipulation. The output of the function must be a dictionary of lists, a numpy array, or a pandas DataFrame. The final output must also have the same length as time_index

  • n_jobs (int, optional) – number of processes (joblib), by default 1

  • verbose (int, optional) – log level (joblib), by default 0

  • backend (str, optional) – backend type (joblib), by default ‘loky’

Returns

Return type

self
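The per-key semantics can be sketched as a sequential loop (the library dispatches these calls through joblib with n_jobs and backend; collection_transform here is illustrative):

```python
def collection_transform(collection, func):
    """Apply func independently to each key's data.

    The library runs these per-key calls through joblib; this sequential
    loop shows only the per-key semantics, not the parallel execution.
    """
    return {key: func(data) for key, data in collection.items()}

res = collection_transform({"a": [1, 2], "b": [3]}, lambda xs: [x + 1 for x in xs])
```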

time_series_transform.transform_core_api.tfDataset_adopter module

class time_series_transform.transform_core_api.tfDataset_adopter.TFRecord_Reader(fileName, dtypeDict, compression_type='GZIP')[source]

Bases: object

feature_des_builder()[source]

feature_des_builder creates a feature description object for a tensorflow dataset

uses dtypeDict to build the feature description object. Note: currently this builder only creates FixedLenFeature.

Returns

feature description object

Return type

dict

make_tfDataset(tensor_opt_dtype=tf.float32)[source]

make_tfDataset makes a tensorflow dataset

Parameters

tensor_opt_dtype (tf dtypes, optional) – the tensorflow data type used for casting dataset features, by default tf.float32

Returns

tensorflow dataset prepared for model training/testing

Return type

tensorflow dataset

class time_series_transform.transform_core_api.tfDataset_adopter.TFRecord_Writer(fileName, compression_type='GZIP')[source]

Bases: object

get_tfRecord_dtype(pickleDir=None)[source]

get_tfRecord_dtype gets the dtype dictionary for reading a tfRecord

this method returns or pickles the dictionary of tfRecord feature datatypes

Parameters

pickleDir (str, optional) – the directory for pickling the dataType dictionary if not None, by default None

Returns

dictionary for making TFRecord_Reader

Return type

dict

write_tfRecord(data)[source]

write_tfRecord writes a tfRecord

transforms a list of dict objects into a tfRecord

Parameters

data (list of dict) – list of dict data such as [{'col':1,'col2':"123",'col3':[1,2]}]

time_series_transform.transform_core_api.time_series_transformer module

class time_series_transform.transform_core_api.time_series_transformer.Time_Series_Transformer(data, timeSeriesCol, mainCategoryCol=None)[source]

Bases: object

dropna(categoryKey=None)[source]

dropna drops null values

removes null values for all categories or a specific one

Parameters

categoryKey (str or numeric, optional) – if None, all categories will be chosen, by default None

Returns

Return type

self

classmethod from_arrow_table(arrow_table, timeSeriesCol, mainCategoryCol)[source]

from_arrow_table import data from apache arrow table

Parameters
  • arrow_table (arrow table) – input data

  • timeSeriesCol (str or numeric) – time series column name

  • mainCategoryCol (str or numeric) – main category name

Returns

Return type

Time_Series_Transformer

classmethod from_feather(feather_dir, timeSeriesCol, mainCategoryCol, columns=None)[source]

from_feather import data from feather

Parameters
  • feather_dir (str) – directory of feather file

  • timeSeriesCol (str or numeric) – time series column name

  • mainCategoryCol (str or numeric) – main category name

  • columns (str or numeric, optional) – target columns (apache arrow implementation), by default None

Returns

Return type

Time_Series_Transformer

classmethod from_numpy(numpyData, timeSeriesCol, mainCategoryCol)[source]

from_numpy import data from numpy

Parameters
  • numpyData (numpy ndArray) – input data

  • timeSeriesCol (int) – index of time series column

  • mainCategoryCol (int) – index of main category column

Returns

Return type

Time_Series_Transformer

classmethod from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol)[source]

from_pandas import data from pandas dataFrame

Parameters
  • pandasFrame (pandas DataFrame) – input data

  • timeSeriesCol (str or numeric) – time series column name

  • mainCategoryCol (str or numeric) – main category name

Returns

Return type

Time_Series_Transformer

classmethod from_parquet(parquet_dir, timeSeriesCol, mainCategoryCol, columns=None, partitioning='hive', filters=None, filesystem=None)[source]

from_parquet import data from parquet file

Parameters
  • parquet_dir (str) – directory of parquet file

  • timeSeriesCol (str or numeric) – time series column name

  • mainCategoryCol (str or numeric) – main category name

  • columns (str or numeric, optional) – target columns (apache arrow implementation), by default None

  • partitioning (str, optional) – type of partitioning, by default ‘hive’

  • filters (str, optional) – filter (apache arrow implementation), by default None

  • filesystem (str, optional) – filesystem (apache arrow implementation), by default None

Returns

Return type

Time_Series_Transformer

make_identical_sequence(inputLabels, windowSize, suffix=None, verbose=0, n_jobs=1)[source]

make_identical_sequence making sequences having the same data

this function repeats the same value across a given sequence window. It can be useful for categorical data in deep learning.

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name of the input data

  • windowSize (int) – the length of sequence

  • suffix (str, optional) – the suffix of new data, by default None

  • verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self

make_label(key, collectionKey=None)[source]

make_label make label data

it will turn the data into a label. When using IO functions, the sepLabel parameter can separate labels and data.

Parameters
  • key (str or numeric data) – the target data name

  • collectionKey (str or numeric data, optional) – the target collection; if None, all collections are selected, by default None

Returns

Return type

self

make_lag(inputLabels, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]

make_lag making lag data for a given list of data

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name of the input data

  • lagNum (int) – the target lag period to make

  • suffix (str, optional) – the suffix of new data, by default None

  • fillMissing (object, optional) – the data for filling missing data, by default np.nan

  • verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
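The lag operation described above can be sketched in plain Python (None stands in for np.nan; make_lag here is an illustrative stand-in, not the library's code):

```python
def make_lag(values, lag_num, fill=None):
    """Shift values forward by lag_num steps; the first lag_num slots get fill.

    Output length equals the input length, matching the time index.
    """
    if lag_num <= 0:
        return list(values)
    return [fill] * lag_num + list(values[:-lag_num])

lagged = make_lag([10, 20, 30, 40], 2)  # [None, None, 10, 20]
```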

make_lag_sequence(inputLabels, windowSize, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]

make_lag_sequence making lag sequence data

this function could be useful for deep learning.

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name of the input data

  • windowSize (int) – the length of sequence

  • lagNum (int) – the lag period of sequence

  • suffix (str, optional) – the suffix of new data, by default None

  • fillMissing (object, optional) – the data for filling missing data, by default np.nan

  • verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
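One plausible reading of the lag-sequence semantics, sketched in plain Python (the library's exact window layout may differ; this is an illustration only, with None in place of np.nan):

```python
def make_lag_sequence(values, window_size, lag_num, fill=None):
    """For each time step t, collect the window of window_size values that
    ends lag_num steps before t; positions reaching before the start of
    the series are filled with fill.
    """
    out = []
    for t in range(len(values)):
        window = []
        for k in range(window_size):
            i = t - lag_num - (window_size - 1 - k)
            window.append(values[i] if i >= 0 else fill)
        out.append(window)
    return out

seqs = make_lag_sequence([1, 2, 3, 4], 2, 1)
```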

make_lead(inputLabels, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]

make_lead making lead data for a given list of data

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name of the input data

  • leadNum (int) – the target lead period to make

  • suffix (str, optional) – the suffix of new data, by default None

  • fillMissing (object, optional) – the data for filling missing data, by default np.nan

  • verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self

make_lead_sequence(inputLabels, windowSize, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]

make_lead_sequence making lead sequence data

this function could be useful for deep learning.

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name of the input data

  • windowSize (int) – the length of sequence

  • leadNum (int) – the lead period of sequence

  • suffix (str, optional) – the suffix of new data, by default None

  • fillMissing (object, optional) – the data for filling missing data, by default np.nan

  • verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self

make_stack_sequence(inputLabels, newName, axis=-1, verbose=0, n_jobs=1)[source]

make_stack_sequence stacking sequence data

combines multiple sequence data into one along the given axis

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name of the input data

  • newName (str) – new name for the stacking data

  • axis (int, optional) – the axis for stacking (numpy stack implementation), by default -1

  • verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self

pad_different_category_time(fillMissing=nan)[source]

pad_different_category_time pads the time length across categories. If mainCategoryCol is not specified, this function has no effect.

Parameters

fillMissing (object, optional) – data for filling padded positions, by default np.nan

Returns

Return type

self

remove_category(categoryName)[source]

remove_category removes a specific category's data

Parameters

categoryName (str or numeric data) – the target category to be removed

Returns

Return type

self

remove_different_category_time()[source]

remove_different_category_time removes time indices that differ across categories. If mainCategoryCol is not specified, this function has no effect.

Returns

Return type

self

remove_feature(colName)[source]

remove_feature removes certain data or labels

Parameters

colName (str or numeric) – target column or data to be removed

Returns

Return type

self

to_arrow_table(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)[source]

to_arrow_table output data as apache arrow table format

Parameters
  • expandCategory (bool, optional) – whether to expand category, by default False

  • expandTime (bool, optional) – whether to expand time index column, by default False

  • preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before outputting data, by default ‘ignore’

  • sepLabel (bool, optional) – whether to separate label data, by default False

Returns

Return type

arrow table

to_dict()[source]

to_dict output data as dictionary list

Returns

Return type

dict of list

to_feather(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version=1, chunksize=None)[source]

to_feather output data into feather format

Parameters
  • dirPaths (str) – directory of output data

  • expandCategory (bool, optional) – whether to expand category, by default False

  • expandTime (bool, optional) – whether to expand time index column, by default False

  • preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before outputting data, by default ‘ignore’

  • sepLabel (bool, optional) – whether to separate label data, by default False

  • version (int, optional) – feather version (apache arrow implementation), by default 1

  • chunksize (int, optional) – chunksize for output (apache arrow implementation), by default None

to_numpy(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)[source]

to_numpy output data into numpy format

Parameters
  • expandCategory (bool, optional) – whether to expand category, by default False

  • expandTime (bool, optional) – whether to expand time index column, by default False

  • preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before outputting data, by default ‘ignore’

  • sepLabel (bool, optional) – whether to separate label data, by default False

Returns

Return type

numpy ndArray

to_pandas(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)[source]

to_pandas output data into pandas dataFrame

Parameters
  • expandCategory (bool, optional) – whether to expand category, by default False

  • expandTime (bool, optional) – whether to expand time index column, by default False

  • preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before outputting data, by default ‘ignore’

  • sepLabel (bool, optional) – whether to separate label data, by default False

Returns

Return type

pandas dataFrame

to_parquet(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version='1.0', isDataset=False, partition_cols=None)[source]

to_parquet output data into parquet format

Parameters
  • dirPaths (str) – directory of output data

  • expandCategory (bool, optional) – whether to expand category, by default False

  • expandTime (bool, optional) – whether to expand time index column, by default False

  • preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before outputting data, by default ‘ignore’

  • sepLabel (bool, optional) – whether to separate label data, by default False

  • version (str, optional) – parquet version (apache arrow implementation), by default ‘1.0’

  • isDataset (bool, optional) – whether to output data in dataset format (apache arrow implementation), by default False

  • partition_cols (str or list, optional) – columns used to partition the data (apache arrow implementation), by default None

transform(inputLabels, newName, func, n_jobs=1, verbose=0, backend='loky', *args, **kwargs)[source]

transform a wrapper of functions performing data manipulation

This function provides a way to do different data manipulation. The output data should be either a pandas DataFrame, a numpy ndArray, or a list of dict. Also, the data should have the same time length as the original data.

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the input data columns passed to the function

  • newName (str) – the output data name or prefix. If the function provides the new name, it automatically becomes a prefix

  • func (function) – the data manipulation function

  • n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

  • verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0

  • backend (str, optional) – joblib implementation, only used when mainCategoryCol is given, by default ‘loky’

Returns

Return type

self

time_series_transform.transform_core_api.util module

time_series_transform.transform_core_api.util.differencing(arr, order=1)[source]

differencing time series differencing

it simply performs series differencing. For example, order 1:

Xt, Xt+1 -> Xt+1 - Xt

order 2 applies order-1 differencing twice:

Xt, Xt+1, Xt+2 -> Xt+1 - Xt, Xt+2 - Xt+1 = a, b -> b - a

and so on

Parameters
  • arr (numpy array) – input array

  • order (int, optional) – number of differencing, by default 1

Returns

differenced array

Return type

numpy array
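A plain-Python sketch of the differencing described above (the library operates on numpy arrays; this list version shows only the arithmetic):

```python
def differencing(arr, order=1):
    """Repeatedly take first differences: order 1 maps [x0, x1, ...] to
    [x1 - x0, x2 - x1, ...]; higher orders apply this again to the result.
    """
    out = list(arr)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

d1 = differencing([1, 4, 9, 16])      # [3, 5, 7]
d2 = differencing([1, 4, 9, 16], 2)   # [2, 2]
```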

time_series_transform.transform_core_api.util.ema(arr, com=None, span=None, halflife=None, alpha=None, adjust=True, min_periods=0, ignore_na=False, axis=0)[source]

this wraps the pandas exponential weighted moving average (ewm) implementation

time_series_transform.transform_core_api.util.geometric_ma(arr, windowSize)[source]

geometric_ma geometric moving average

it uses a pandas rolling window with the scipy gmean function

Parameters
  • arr (numpy array) – input array

  • windowSize (int) – grouping size

Returns

geometric moving average array

Return type

numpy array

time_series_transform.transform_core_api.util.madev(d, axis=None)[source]

Mean absolute deviation

time_series_transform.transform_core_api.util.moving_average(arr, windowSize=3)[source]

moving_average the arithmetic moving average

Given the window size, this function performs a simple moving average

Parameters
  • arr (numpy array) – the input array

  • windowSize (int, optional) – the grouping size, by default 3

Returns

the moving average array

Return type

numpy array
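A plain-Python sketch of the arithmetic; this version returns only the full-window averages and makes no claim about the library's output alignment:

```python
def moving_average(arr, window_size=3):
    """Simple (arithmetic) moving average over a sliding window.

    Returns one value per full window, i.e. len(arr) - window_size + 1 values.
    """
    return [
        sum(arr[i:i + window_size]) / window_size
        for i in range(len(arr) - window_size + 1)
    ]

ma = moving_average([1, 2, 3, 4, 5])  # [2.0, 3.0, 4.0]
```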

time_series_transform.transform_core_api.util.rfft_transform(arr, threshold=1000.0)[source]

rfft_transform real fast Fourier transform

fast Fourier transform, discarding the imaginary part. Note: numpy implementation

Parameters
  • arr (numpy array) – input array

  • threshold (float, optional) – the threshold used to filter frequencies, by default 1e3

Returns

rfft array

Return type

numpy array

time_series_transform.transform_core_api.util.wavelet_denoising(arr, wavelet='db4', coeff_mode='per', threshold_mode='hard', rec_mode='per', level=1, matchOriginLenth=True)[source]

wavelet_denoising wavelet transformation

wavelet transformation, with pywt implementation

Parameters
  • arr (numpy array) – input array

  • wavelet (str, optional) – wavelet transform family, by default ‘db4’

  • coeff_mode (str, optional) – the coefficient mode, by default “per”

  • threshold_mode (str, optional) – the threshold type, by default ‘hard’

  • rec_mode (str, optional) – recover mode, by default ‘per’

  • level (int, optional) – sigma level for the threshold, by default 1

  • matchOriginLenth (bool, optional) – whether to match the input array length, by default True

Returns

wavelet transformed array

Return type

numpy array

Module contents