time_series_transform.transform_core_api package¶
Submodules¶
time_series_transform.transform_core_api.base module¶
- class time_series_transform.transform_core_api.base.Time_Series_Data(data=None, time_index=None)[source]¶
Bases: object
- property data¶
- dropna()[source]¶
Drop null values.
Null values are dropped along the time index. For example, given time_index: [1, 2, 3], data1: [1, 2, np.nan], data2: [1, 2, 3], dropna returns time_index: [1, 2], data1: [1, 2], data2: [1, 2].
- Returns
a new Time_Series_Data without null values
- Return type
Time_Series_Data
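Example (a minimal sketch built from the setters documented further below; the column names are illustrative):

    import numpy as np
    from time_series_transform.transform_core_api.base import Time_Series_Data

    tsd = Time_Series_Data()
    tsd.set_time_index([1, 2, 3], 'time')      # the time index has to be set first
    tsd.set_data([1, 2, np.nan], 'data1')      # data must have the same length as the time index
    tsd.set_data([1, 2, 3], 'data2')
    clean = tsd.dropna()                       # time_index [1, 2], data1 [1, 2], data2 [1, 2]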
- property labels¶
- remove(key, remove_type=None)[source]¶
Remove data or a label.
This function removes the target key and its values from the data structure.
- Parameters
key (str) – the name of the data or label
remove_type ({'data','label'}, optional) – passing the type of the removed entry speeds up the lookup, by default None
- Returns
self
- Return type
self
- set_data(inputData, label)[source]¶
Setter of data.
An alternative way of setting data. The time index (time_series_Ix) has to be initialized beforehand.
- Parameters
inputData – the values of the data
label (str) – the name of the data
- Returns
self
- Return type
self
- Raises
ValueError – raised when the input length differs from the time index length
- set_labels(inputData, label)[source]¶
Setter of labels.
An alternative way of setting labels. The time index (time_series_Ix) has to be initialized beforehand.
- Parameters
inputData – the values of the label
label (str) – the name of the label
- Returns
self
- Return type
self
- Raises
ValueError – raised when the input length differs from the time index length
- set_time_index(inputData, label)[source]¶
An alternative way of setting the time_index.
- sort(ascending=True)[source]¶
Sort data by time_index.
- Parameters
ascending (bool, optional) – whether to sort the time index in ascending order, by default True
- Returns
the sorted self
- Return type
self
- property time_index¶
- transform(inputLabels, newName, func, *args, **kwargs)[source]¶
The main way of manipulating data.
This function is a wrapper for executing data manipulation.
- Parameters
inputLabels (str or list of str) – the input data passed into the function
newName (str) – the new name or prefix for the output data; if the function specifies its own output names, newName becomes a prefix
func (function) – the function performing the data manipulation. Its output has to be a dictionary of lists, a numpy array, or a pandas DataFrame, and it has to have the same length as time_index
- Return type
self
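Example (a minimal sketch continuing the one above; how the selected columns are handed to func is not spelled out in this reference, so the sketch assumes they arrive as positional arrays):

    import numpy as np

    def zscore(arr):
        # standardize one input column; the output keeps the same length as time_index
        arr = np.asarray(arr, dtype=float)
        return (arr - arr.mean()) / arr.std()

    tsd = tsd.transform('data2', 'data2_zscore', zscore)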
- property
- class time_series_transform.transform_core_api.base.Time_Series_Data_Collection(time_series_data, time_seriesIx, categoryIx)[source]¶
Bases: object
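Example (a minimal sketch; it assumes the category column already lives inside the Time_Series_Data and that categoryIx names it):

    from time_series_transform.transform_core_api.base import Time_Series_Data, Time_Series_Data_Collection

    tsd = Time_Series_Data()
    tsd.set_time_index([1, 2, 3, 1, 2], 'time')
    tsd.set_data(['a', 'a', 'a', 'b', 'b'], 'symbol')   # category 'b' has no record at time 3
    tsd.set_data([10, 11, 12, 20, 21], 'price')
    collection = Time_Series_Data_Collection(tsd, 'time', 'symbol')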
- dropna(categoryKey=None)[source]¶
Drop null values for a specific key or for all keys.
If categoryKey is None, null values are dropped for every key.
- Parameters
categoryKey (str or numeric, optional) – the key of the target data, by default None
- Return type
self
- pad_time_index(fillMissing=nan)[source]¶
Fill a given value for every time_index that is missing from one key's Time_Series_Data compared with the other keys.
- Parameters
fillMissing (object, optional) – the filling value, by default np.nan
- Return type
self
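Example (continuing the collection sketch above):

    import numpy as np

    collection = collection.pad_time_index(fillMissing=np.nan)   # category 'b' gains a np.nan record at time 3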
- remove(key)[source]¶
Remove the target key from the Time_Series_Data_Collection.
- Parameters
key (str) – target key
- Return type
self
- remove_different_time_index()[source]¶
Remove the time periods that do not exist in every other Time_Series_Data.
- Return type
self
- set_time_series_data_collection(ix, time_series_data)[source]¶
An alternative way of setting the time_series_collection data.
Using this function, one can add a new key of Time_Series_Data.
- Parameters
ix (str) – new key name
time_series_data (Time_Series_Data) – data of the key
- Raises
ValueError – invalid input data type
- sort(ascending=True, categoryList=None)[source]¶
Sort the Time_Series_Data for specific keys or for all keys.
- property time_series_data_collection¶
- transform(inputLabels, newName, func, n_jobs=1, verbose=0, backend='loky', *args, **kwargs)[source]¶
Manipulate the data of each key.
This function implements joblib parallel execution, so each key's data can be computed in parallel.
- Parameters
inputLabels (str or list of str) – the input data passed into the function
newName (str) – the new name or prefix for the output data; if the function specifies its own output names, newName becomes a prefix
func (function) – the function performing the data manipulation. Its output has to be a dictionary of lists, a numpy array, or a pandas DataFrame, and it has to have the same length as time_index
n_jobs (int, optional) – number of processes (joblib), by default 1
verbose (int, optional) – log level (joblib), by default 0
backend (str, optional) – backend type (joblib), by default ‘loky’
- Return type
self
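Example (continuing the collection sketch; the same positional-array assumption as above applies, and extra keyword arguments are assumed to be forwarded to func):

    from time_series_transform.transform_core_api.util import moving_average

    # run util.moving_average over the 'price' column of every category, two worker processes
    collection = collection.transform('price', 'price_ma', moving_average, n_jobs=2, windowSize=2)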
time_series_transform.transform_core_api.tfDataset_adopter module¶
- class time_series_transform.transform_core_api.tfDataset_adopter.TFRecord_Reader(fileName, dtypeDict, compression_type='GZIP')[source]¶
Bases: object
- feature_des_builder()[source]¶
Create the feature description object for a tensorflow dataset.
dtypeDict is used to build the feature description object. Note: this builder currently only creates FixedLenFeature.
- Returns
feature description object
- make_tfDataset(tensor_opt_dtype=tf.float32)[source]¶
Make a tensorflow dataset.
- Parameters
tensor_opt_dtype (tf dtype, optional) – the tensorflow data type used for casting dataset features, by default tf.float32
- Returns
tensorflow dataset prepared for model training/testing
- Return type
tensorflow dataset
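Example (a minimal sketch; the file name and the dtypeDict contents, feature names mapped to tensorflow dtypes, are assumptions):

    import tensorflow as tf
    from time_series_transform.transform_core_api.tfDataset_adopter import TFRecord_Reader

    reader = TFRecord_Reader('ts_data.tfRecord', {'price': tf.float32}, compression_type='GZIP')
    dataset = reader.make_tfDataset(tensor_opt_dtype=tf.float32)
    for batch in dataset.take(1):
        print(batch)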
- class time_series_transform.transform_core_api.tfDataset_adopter.TFRecord_Writer(fileName, compression_type='GZIP')[source]¶
Bases: object
time_series_transform.transform_core_api.time_series_transformer module¶
- class time_series_transform.transform_core_api.time_series_transformer.Time_Series_Transformer(data, timeSeriesCol, mainCategoryCol=None)[source]¶
Bases: object
- dropna(categoryKey=None)[source]¶
Drop null values.
Remove null values for all categories or for a specific category.
- Parameters
categoryKey (str or numeric, optional) – if None, all categories are chosen, by default None
- Return type
self
- classmethod from_arrow_table(arrow_table, timeSeriesCol, mainCategoryCol)[source]¶
Import data from an apache arrow table.
- Parameters
arrow_table (arrow table) – input apache arrow table
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
- classmethod from_feather(feather_dir, timeSeriesCol, mainCategoryCol, columns=None)[source]¶
Import data from a feather file.
- Parameters
feather_dir (str) – directory of the feather file
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
columns (str or numeric, optional) – target columns (apache arrow implementation), by default None
- classmethod from_numpy(numpyData, timeSeriesCol, mainCategoryCol)[source]¶
Import data from a numpy array.
- Parameters
numpyData (numpy ndArray) – input numpy array
timeSeriesCol (str or numeric) – time series column
mainCategoryCol (str or numeric) – main category column
- classmethod from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol)[source]¶
Import data from a pandas DataFrame.
- Parameters
pandasFrame (pandas DataFrame) – input pandas DataFrame
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
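Example (a minimal sketch of the from_pandas constructor; the column names are illustrative):

    import pandas as pd
    from time_series_transform.transform_core_api.time_series_transformer import Time_Series_Transformer

    df = pd.DataFrame({
        'time':   [1, 2, 3, 1, 2, 3],
        'symbol': ['a', 'a', 'a', 'b', 'b', 'b'],
        'price':  [10, 11, 12, 20, 21, 22],
    })
    trans = Time_Series_Transformer.from_pandas(df, timeSeriesCol='time', mainCategoryCol='symbol')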
- classmethod from_parquet(parquet_dir, timeSeriesCol, mainCategoryCol, columns=None, partitioning='hive', filters=None, filesystem=None)[source]¶
Import data from a parquet file.
- Parameters
parquet_dir (str) – directory of the parquet file
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
columns (str or numeric, optional) – target columns (apache arrow implementation), by default None
partitioning (str, optional) – type of partitioning, by default ‘hive’
filters (str, optional) – filter (apache arrow implementation), by default None
filesystem (str, optional) – filesystem (apache arrow implementation), by default None
- make_identical_sequence(inputLabels, windowSize, suffix=None, verbose=0, n_jobs=1)[source]¶
Make sequences that repeat the same data.
This function repeats the same value across a given sequence window. It can be useful for categorical data in deep learning.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
windowSize (int) – the length of the sequence
suffix (str, optional) – the suffix of the new data, by default None
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- Return type
self
- make_label(key, collectionKey=None)[source]¶
Make label data.
It turns the given data into a label. When using the IO functions, the sepLabel parameter can separate labels from data.
- make_lag(inputLabels, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]¶
Make lag data for a given list of data.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
lagNum (int) – the lag period to create
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- Return type
self
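Example (continuing the from_pandas sketch above; the suffixes are illustrative):

    trans = trans.make_lag('price', lagNum=1, suffix='_lag')     # price at t-1 as a new column
    trans = trans.make_lead('price', leadNum=1, suffix='_lead')  # price at t+1, e.g. a forecasting target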
- make_lag_sequence(inputLabels, windowSize, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]¶
Make lag sequence data.
This function can be useful for deep learning.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
windowSize (int) – the length of the sequence
lagNum (int) – the lag period of the sequence
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- Return type
self
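Example (continuing the sketch; a rolling window of two lagged prices, with the exact window alignment following the library's implementation):

    trans = trans.make_lag_sequence('price', windowSize=2, lagNum=1, suffix='_seq')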
- make_lead(inputLabels, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]¶
Make lead data for a given list of data.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
leadNum (int) – the lead period to create
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- Return type
self
- make_lead_sequence(inputLabels, windowSize, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]¶
Make lead sequence data.
This function can be useful for deep learning.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
windowSize (int) – the length of the sequence
leadNum (int) – the lead period of the sequence
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- Return type
self
- make_stack_sequence(inputLabels, newName, axis=-1, verbose=0, n_jobs=1)[source]¶
Stack sequence data.
Combines multiple sequence data into one along the given axis.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
newName (str) – new name for the stacked data
axis (int, optional) – the axis for stacking (numpy stack implementation), by default -1
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- pad_different_category_time(fillMissing=nan)[source]¶
Pad the time length across categories.
If mainCategoryCol is not specified, this function has no effect.
- Parameters
fillMissing (object, optional) – the value used to fill the padded data, by default np.nan
- Return type
self
- remove_category(categoryName)[source]¶
Remove the data of a specific category.
- Parameters
categoryName (str or numeric) – the target category to be removed
- Return type
self
- remove_different_category_time()[source]¶
Remove time indices that differ across categories.
If mainCategoryCol is not specified, this function has no effect.
- Return type
self
- remove_feature(colName)[source]¶
Remove certain data or labels.
- Parameters
colName (str or numeric) – target column or data to be removed
- Return type
self
- to_arrow_table(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)[source]¶
Output data in apache arrow table format.
- Parameters
expandCategory (bool, optional) – whether to expand the category, by default False
expandTime (bool, optional) – whether to expand the time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before outputting the data, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
- Return type
arrow table
- to_feather(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version=1, chunksize=None)[source]¶
Output data in feather format.
- Parameters
dirPaths (str) – directory of the output data
expandCategory (bool, optional) – whether to expand the category, by default False
expandTime (bool, optional) – whether to expand the time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before outputting the data, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
version (int, optional) – feather version (apache arrow implementation), by default 1
chunksize (int, optional) – chunk size for output (apache arrow implementation), by default None
- to_numpy(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)[source]¶
Output data in numpy format.
- Parameters
expandCategory (bool, optional) – whether to expand the category, by default False
expandTime (bool, optional) – whether to expand the time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before outputting the data, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
- Return type
numpy ndArray
- to_pandas(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)[source]¶
Output data as a pandas DataFrame.
- Parameters
expandCategory (bool, optional) – whether to expand the category, by default False
expandTime (bool, optional) – whether to expand the time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before outputting the data, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
- Return type
pandas DataFrame
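Example (continuing the sketch; expandCategory=True is assumed to spread each category into its own set of columns):

    df_out = trans.to_pandas(expandCategory=True, expandTime=False, preprocessType='pad')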
- to_parquet(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version='1.0', isDataset=False, partition_cols=None)[source]¶
Output data in parquet format.
- Parameters
dirPaths (str) – directory of the output data
expandCategory (bool, optional) – whether to expand the category, by default False
expandTime (bool, optional) – whether to expand the time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before outputting the data, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
version (str, optional) – parquet version (apache arrow implementation), by default ‘1.0’
isDataset (bool, optional) – whether to output the data in dataset format (apache arrow implementation), by default False
partition_cols (str, optional) – columns used to partition the data (apache arrow implementation), by default None
- transform(inputLabels, newName, func, n_jobs=1, verbose=0, backend='loky', *args, **kwargs)[source]¶
A wrapper for functions performing data manipulation.
This function provides a way to perform custom data manipulation. The output data should be a pandas DataFrame, a numpy ndArray, or a list of dict, and it should have the same time length as the original data.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the input data columns passed to the function
newName (str) – the output data name or prefix; if the function provides its own output names, newName automatically becomes a prefix
func (function) – the data manipulation function
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
backend (str, optional) – joblib implementation, only used when mainCategoryCol is given, by default ‘loky’
- Return type
self
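Example (continuing the sketch; it assumes the selected columns are handed to func as positional arrays and that extra keyword arguments are forwarded to it):

    from time_series_transform.transform_core_api.util import moving_average

    trans = trans.transform('price', 'price_ma', moving_average, windowSize=2)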
time_series_transform.transform_core_api.util module¶
- time_series_transform.transform_core_api.util.differencing(arr, order=1)[source]¶
Time series differencing.
It simply performs series differencing. For example:
order 1: Xt, Xt+1 -> Xt+1 - Xt
order 2: Xt, Xt+1, Xt+2 -> Xt+1 - Xt, Xt+2 - Xt+1 = a, b -> b - a
and so on.
- Parameters
arr (numpy array) – input array
order (int, optional) – number of differencing passes, by default 1
- Returns
differenced array
- Return type
numpy array
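Example (a minimal sketch of a util call):

    import numpy as np
    from time_series_transform.transform_core_api.util import differencing

    arr = np.array([1.0, 3.0, 6.0, 10.0])
    diff1 = differencing(arr, order=1)   # successive differences Xt+1 - Xt
    diff2 = differencing(arr, order=2)   # differencing applied twice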
- time_series_transform.transform_core_api.util.ema(arr, com=None, span=None, halflife=None, alpha=None, adjust=True, min_periods=0, ignore_na=False, axis=0)[source]¶
This is the pandas EMA (exponentially weighted moving average) implementation.
- time_series_transform.transform_core_api.util.geometric_ma(arr, windowSize)[source]¶
Geometric moving average.
It uses a pandas rolling window with the scipy gmean function.
- Parameters
arr (numpy array) – input array
windowSize (int) – grouping size
- Returns
geometric moving average array
- Return type
numpy array
- time_series_transform.transform_core_api.util.moving_average(arr, windowSize=3)[source]¶
The arithmetic moving average.
Given the window size, this function performs a simple moving average.
- Parameters
arr (numpy array) – the input array
windowSize (int, optional) – the grouping size, by default 3
- Returns
the moving average array
- Return type
numpy array
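Example (a minimal sketch comparing the smoothing helpers; how the start of each window is padded is an implementation detail not documented above):

    import numpy as np
    from time_series_transform.transform_core_api.util import ema, geometric_ma, moving_average

    arr = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
    sma = moving_average(arr, windowSize=3)   # arithmetic rolling mean
    gma = geometric_ma(arr, windowSize=3)     # rolling scipy gmean
    exp = ema(arr, span=3)                    # pandas exponentially weighted mean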
- time_series_transform.transform_core_api.util.rfft_transform(arr, threshold=1000.0)[source]¶
Real fast Fourier transformation.
Fast Fourier transformation that ignores the imaginary part. Note: numpy implementation.
- Parameters
arr (numpy array) – input array
threshold (float, optional) – the threshold used to filter frequencies, by default 1e3
- Returns
rfft array
- Return type
numpy array
- time_series_transform.transform_core_api.util.wavelet_denoising(arr, wavelet='db4', coeff_mode='per', threshold_mode='hard', rec_mode='per', level=1, matchOriginLenth=True)[source]¶
Wavelet denoising.
Wavelet transformation with the pywt implementation.
- Parameters
arr (numpy array) – input array
wavelet (str, optional) – wavelet transform family, by default ‘db4’
coeff_mode (str, optional) – the coefficient mode, by default ‘per’
threshold_mode (str, optional) – the threshold type, by default ‘hard’
rec_mode (str, optional) – recovery mode, by default ‘per’
level (int, optional) – sigma level for the threshold, by default 1
matchOriginLenth (bool, optional) – whether to match the input array length, by default True
- Returns
wavelet transformed array
- Return type
numpy array
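Example (a minimal sketch of the signal-processing helpers on a noisy sine wave):

    import numpy as np
    from time_series_transform.transform_core_api.util import rfft_transform, wavelet_denoising

    t = np.linspace(0, 1, 256)
    noisy = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(256)
    filtered = rfft_transform(noisy, threshold=1e3)                                     # drop weak frequency components
    denoised = wavelet_denoising(noisy, wavelet='db4', level=1, matchOriginLenth=True)  # pywt-based denoising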