time_series_transform.transform_core_api package¶
Submodules¶
time_series_transform.transform_core_api.base module¶
- class time_series_transform.transform_core_api.base.Time_Series_Data(data=None, time_index=None)[source]¶
Bases: object
- property data¶
- dropna()[source]¶
Drop null values.
Null values are dropped along the time index. For example, given time_index: [1, 2, 3], data1: [1, 2, np.nan], data2: [1, 2, 3], dropna returns time_index: [1, 2], data1: [1, 2], data2: [1, 2].
- Returns
a new Time_Series_Data without null values
- Return type
Time_Series_Data
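Example (a minimal sketch built from the setters documented further below; the column names are illustrative):

    import numpy as np
    from time_series_transform.transform_core_api.base import Time_Series_Data

    tsd = Time_Series_Data()
    tsd.set_time_index([1, 2, 3], 'time')      # the time index has to be set first
    tsd.set_data([1, 2, np.nan], 'data1')      # data must have the same length as the time index
    tsd.set_data([1, 2, 3], 'data2')
    clean = tsd.dropna()                       # time_index [1, 2], data1 [1, 2], data2 [1, 2]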
- property labels¶
- remove(key, remove_type=None)[source]¶
Remove data or a label.
This function removes the target key and its values from the data structure.
- Parameters
key (str) – the name of the data or label
remove_type ({'data','label'}, optional) – passing the type of the removed entry speeds up the lookup, by default None
- Returns
self
- Return type
self
- set_data(inputData, label)[source]¶
Setter of data.
An alternative way of setting data. The time index (time_series_Ix) has to be initialized beforehand.
- Parameters
inputData – the values of the data
label (str) – the name of the data
- Returns
self
- Return type
self
- Raises
ValueError – raised when the input length differs from the time index length
- set_labels(inputData, label)[source]¶
Setter of labels.
An alternative way of setting labels. The time index (time_series_Ix) has to be initialized beforehand.
- Parameters
inputData – the values of the label
label (str) – the name of the label
- Returns
self
- Return type
self
- Raises
ValueError – raised when the input length differs from the time index length
- set_time_index(inputData, label)[source]¶
An alternative way of setting the time_index.
- sort(ascending=True)[source]¶
Sort data by time_index.
- Parameters
ascending (bool, optional) – whether to sort the time index in ascending order, by default True
- Returns
the sorted self
- Return type
self
- property time_index¶
- transform(inputLabels, newName, func, *args, **kwargs)[source]¶
The main way of manipulating data.
This function is a wrapper for executing data manipulation.
- Parameters
inputLabels (str or list of str) – the input data passed into the function
newName (str) – the new name or prefix for the output data; if the function specifies its own output names, newName becomes a prefix
func (function) – the function performing the data manipulation. Its output has to be a dictionary of lists, a numpy array, or a pandas DataFrame, and it has to have the same length as time_index
- Return type
self
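Example (a minimal sketch continuing the one above; how the selected columns are handed to func is not spelled out in this reference, so the sketch assumes they arrive as positional arrays):

    import numpy as np

    def zscore(arr):
        # standardize one input column; the output keeps the same length as time_index
        arr = np.asarray(arr, dtype=float)
        return (arr - arr.mean()) / arr.std()

    tsd = tsd.transform('data2', 'data2_zscore', zscore)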
- property
- class time_series_transform.transform_core_api.base.Time_Series_Data_Collection(time_series_data, time_seriesIx, categoryIx)[source]¶
Bases: object
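Example (a minimal sketch; it assumes the category column already lives inside the Time_Series_Data and that categoryIx names it):

    from time_series_transform.transform_core_api.base import Time_Series_Data, Time_Series_Data_Collection

    tsd = Time_Series_Data()
    tsd.set_time_index([1, 2, 3, 1, 2], 'time')
    tsd.set_data(['a', 'a', 'a', 'b', 'b'], 'symbol')   # category 'b' has no record at time 3
    tsd.set_data([10, 11, 12, 20, 21], 'price')
    collection = Time_Series_Data_Collection(tsd, 'time', 'symbol')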
- dropna(categoryKey=None)[source]¶
Drop null values for a specific key or for all keys.
If categoryKey is None, null values are dropped for every key.
- Parameters
categoryKey (str or numeric, optional) – the key of the target data, by default None
- Return type
self
- pad_time_index(fillMissing=nan)[source]¶
Fill a given value for every time_index that is missing from one key's Time_Series_Data compared with the other keys.
- Parameters
fillMissing (object, optional) – the filling value, by default np.nan
- Return type
self
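Example (continuing the collection sketch above):

    import numpy as np

    collection = collection.pad_time_index(fillMissing=np.nan)   # category 'b' gains a np.nan record at time 3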
- remove(key)[source]¶
Remove the target key from the Time_Series_Data_Collection.
- Parameters
key (str) – target key
- Return type
self
- remove_different_time_index()[source]¶
Remove the time periods that do not exist in every other Time_Series_Data.
- Return type
self
- set_time_series_data_collection(ix, time_series_data)[source]¶
An alternative way of setting the time_series_collection data.
Using this function, one can add a new key of Time_Series_Data.
- Parameters
ix (str) – new key name
time_series_data (Time_Series_Data) – data of the key
- Raises
ValueError – invalid input data type
- sort(ascending=True, categoryList=None)[source]¶
Sort the Time_Series_Data for specific keys or for all keys.
- property time_series_data_collection¶
- transform(inputLabels, newName, func, n_jobs=1, verbose=0, backend='loky', *args, **kwargs)[source]¶
Manipulate the data of each key.
This function implements joblib parallel execution, so each key's data can be computed in parallel.
- Parameters
inputLabels (str or list of str) – the input data passed into the function
newName (str) – the new name or prefix for the output data; if the function specifies its own output names, newName becomes a prefix
func (function) – the function performing the data manipulation. Its output has to be a dictionary of lists, a numpy array, or a pandas DataFrame, and it has to have the same length as time_index
n_jobs (int, optional) – number of processes (joblib), by default 1
verbose (int, optional) – log level (joblib), by default 0
backend (str, optional) – backend type (joblib), by default ‘loky’
- Return type
self
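Example (continuing the collection sketch; the same positional-array assumption as above applies, and extra keyword arguments are assumed to be forwarded to func):

    from time_series_transform.transform_core_api.util import moving_average

    # run util.moving_average over the 'price' column of every category, two worker processes
    collection = collection.transform('price', 'price_ma', moving_average, n_jobs=2, windowSize=2)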
time_series_transform.transform_core_api.tfDataset_adopter module¶
- class time_series_transform.transform_core_api.tfDataset_adopter.TFRecord_Reader(fileName, dtypeDict, compression_type='GZIP')[source]¶
Bases: object
- feature_des_builder()[source]¶
Create the feature description object for a tensorflow dataset.
dtypeDict is used to build the feature description object. Note: this builder currently only creates FixedLenFeature.
- Returns
feature description object
- make_tfDataset(tensor_opt_dtype=tf.float32)[source]¶
Make a tensorflow dataset.
- Parameters
tensor_opt_dtype (tf dtype, optional) – the tensorflow data type used for casting dataset features, by default tf.float32
- Returns
tensorflow dataset prepared for model training/testing
- Return type
tensorflow dataset
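Example (a minimal sketch; the file name and the dtypeDict contents, feature names mapped to tensorflow dtypes, are assumptions):

    import tensorflow as tf
    from time_series_transform.transform_core_api.tfDataset_adopter import TFRecord_Reader

    reader = TFRecord_Reader('ts_data.tfRecord', {'price': tf.float32}, compression_type='GZIP')
    dataset = reader.make_tfDataset(tensor_opt_dtype=tf.float32)
    for batch in dataset.take(1):
        print(batch)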
- class time_series_transform.transform_core_api.tfDataset_adopter.TFRecord_Writer(fileName, compression_type='GZIP')[source]¶
Bases: object
time_series_transform.transform_core_api.time_series_transformer module¶
- class time_series_transform.transform_core_api.time_series_transformer.Time_Series_Transformer(data, timeSeriesCol, mainCategoryCol=None)[source]¶
Bases: object
- dropna(categoryKey=None)[source]¶
Drop null values.
Remove null values for all categories or for a specific category.
- Parameters
categoryKey (str or numeric, optional) – if None, all categories are chosen, by default None
- Return type
self
- classmethod from_arrow_table(arrow_table, timeSeriesCol, mainCategoryCol)[source]¶
Import data from an apache arrow table.
- Parameters
arrow_table (arrow table) – input apache arrow table
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
- classmethod from_feather(feather_dir, timeSeriesCol, mainCategoryCol, columns=None)[source]¶
Import data from a feather file.
- Parameters
feather_dir (str) – directory of the feather file
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
columns (str or numeric, optional) – target columns (apache arrow implementation), by default None
- classmethod from_numpy(numpyData, timeSeriesCol, mainCategoryCol)[source]¶
Import data from a numpy array.
- Parameters
numpyData (numpy ndArray) – input numpy array
timeSeriesCol (str or numeric) – time series column
mainCategoryCol (str or numeric) – main category column
- classmethod from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol)[source]¶
Import data from a pandas DataFrame.
- Parameters
pandasFrame (pandas DataFrame) – input pandas DataFrame
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
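Example (a minimal sketch of the from_pandas constructor; the column names are illustrative):

    import pandas as pd
    from time_series_transform.transform_core_api.time_series_transformer import Time_Series_Transformer

    df = pd.DataFrame({
        'time':   [1, 2, 3, 1, 2, 3],
        'symbol': ['a', 'a', 'a', 'b', 'b', 'b'],
        'price':  [10, 11, 12, 20, 21, 22],
    })
    trans = Time_Series_Transformer.from_pandas(df, timeSeriesCol='time', mainCategoryCol='symbol')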
- classmethod from_parquet(parquet_dir, timeSeriesCol, mainCategoryCol, columns=None, partitioning='hive', filters=None, filesystem=None)[source]¶
Import data from a parquet file.
- Parameters
parquet_dir (str) – directory of the parquet file
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
columns (str or numeric, optional) – target columns (apache arrow implementation), by default None
partitioning (str, optional) – type of partitioning, by default ‘hive’
filters (str, optional) – filter (apache arrow implementation), by default None
filesystem (str, optional) – filesystem (apache arrow implementation), by default None
- make_identical_sequence(inputLabels, windowSize, suffix=None, verbose=0, n_jobs=1)[source]¶
Make sequences that repeat the same data.
This function repeats the same value across a given sequence window. It can be useful for categorical data in deep learning.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
windowSize (int) – the length of the sequence
suffix (str, optional) – the suffix of the new data, by default None
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- Return type
self
- make_label(key, collectionKey=None)[source]¶
Make label data.
It turns the given data into a label. When using the IO functions, the sepLabel parameter can separate labels from data.
- make_lag(inputLabels, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]¶
Make lag data for a given list of data.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
lagNum (int) – the lag period to create
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- Return type
self
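Example (continuing the from_pandas sketch above; the suffixes are illustrative):

    trans = trans.make_lag('price', lagNum=1, suffix='_lag')     # price at t-1 as a new column
    trans = trans.make_lead('price', leadNum=1, suffix='_lead')  # price at t+1, e.g. a forecasting target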
- make_lag_sequence(inputLabels, windowSize, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]¶
Make lag sequence data.
This function can be useful for deep learning.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
windowSize (int) – the length of the sequence
lagNum (int) – the lag period of the sequence
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- Return type
self
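Example (continuing the sketch; a rolling window of two lagged prices, with the exact window alignment following the library's implementation):

    trans = trans.make_lag_sequence('price', windowSize=2, lagNum=1, suffix='_seq')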
- make_lead(inputLabels, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]¶
Make lead data for a given list of data.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
leadNum (int) – the lead period to create
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- Return type
self
- make_lead_sequence(inputLabels, windowSize, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)[source]¶
Make lead sequence data.
This function can be useful for deep learning.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
windowSize (int) – the length of the sequence
leadNum (int) – the lead period of the sequence
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- Return type
self
- make_stack_sequence(inputLabels, newName, axis=-1, verbose=0, n_jobs=1)[source]¶
Stack sequence data.
Combines multiple sequence data into one along the given axis.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
newName (str) – new name for the stacked data
axis (int, optional) – the axis for stacking (numpy stack implementation), by default -1
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
- pad_different_category_time(fillMissing=nan)[source]¶
Pad the time length across categories.
If mainCategoryCol is not specified, this function has no effect.
- Parameters
fillMissing (object, optional) – the value used to fill the padded data, by default np.nan
- Return type
self
- remove_category(categoryName)[source]¶
Remove the data of a specific category.
- Parameters
categoryName (str or numeric) – the target category to be removed
- Return type
self
- remove_different_category_time()[source]¶
Remove time indices that differ across categories.
If mainCategoryCol is not specified, this function has no effect.
- Return type
self
- remove_feature(colName)[source]¶
Remove certain data or labels.
- Parameters
colName (str or numeric) – target column or data to be removed
- Return type
self
- to_arrow_table(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)[source]¶
Output data in apache arrow table format.
- Parameters
expandCategory (bool, optional) – whether to expand the category, by default False
expandTime (bool, optional) – whether to expand the time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before outputting the data, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
- Return type
arrow table
- to_feather(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version=1, chunksize=None)[source]¶
Output data in feather format.
- Parameters
dirPaths (str) – directory of the output data
expandCategory (bool, optional) – whether to expand the category, by default False
expandTime (bool, optional) – whether to expand the time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before outputting the data, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
version (int, optional) – feather version (apache arrow implementation), by default 1
chunksize (int, optional) – chunk size for output (apache arrow implementation), by default None
- to_numpy(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)[source]¶
Output data in numpy format.
- Parameters
expandCategory (bool, optional) – whether to expand the category, by default False
expandTime (bool, optional) – whether to expand the time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before outputting the data, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
- Return type
numpy ndArray
- to_pandas(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)[source]¶
Output data as a pandas DataFrame.
- Parameters
expandCategory (bool, optional) – whether to expand the category, by default False
expandTime (bool, optional) – whether to expand the time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before outputting the data, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
- Return type
pandas DataFrame
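Example (continuing the sketch; expandCategory=True is assumed to spread each category into its own set of columns):

    df_out = trans.to_pandas(expandCategory=True, expandTime=False, preprocessType='pad')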
- to_parquet(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version='1.0', isDataset=False, partition_cols=None)[source]¶
Output data in parquet format.
- Parameters
dirPaths (str) – directory of the output data
expandCategory (bool, optional) – whether to expand the category, by default False
expandTime (bool, optional) – whether to expand the time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before outputting the data, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
version (str, optional) – parquet version (apache arrow implementation), by default ‘1.0’
isDataset (bool, optional) – whether to output the data in dataset format (apache arrow implementation), by default False
partition_cols (str, optional) – columns used to partition the data (apache arrow implementation), by default None
- transform(inputLabels, newName, func, n_jobs=1, verbose=0, backend='loky', *args, **kwargs)[source]¶
A wrapper for functions performing data manipulation.
This function provides a way to perform custom data manipulation. The output data should be a pandas DataFrame, a numpy ndArray, or a list of dict, and it should have the same time length as the original data.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the input data columns passed to the function
newName (str) – the output data name or prefix; if the function provides its own output names, newName automatically becomes a prefix
func (function) – the data manipulation function
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
backend (str, optional) – joblib implementation, only used when mainCategoryCol is given, by default ‘loky’
- Return type
self
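Example (continuing the sketch; it assumes the selected columns are handed to func as positional arrays and that extra keyword arguments are forwarded to it):

    from time_series_transform.transform_core_api.util import moving_average

    trans = trans.transform('price', 'price_ma', moving_average, windowSize=2)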
time_series_transform.transform_core_api.util module¶
- time_series_transform.transform_core_api.util.differencing(arr, order=1)[source]¶
Time series differencing.
It simply performs series differencing. For example:
order 1: Xt, Xt+1 -> Xt+1 - Xt
order 2: Xt, Xt+1, Xt+2 -> Xt+1 - Xt, Xt+2 - Xt+1 = a, b -> b - a
and so on.
- Parameters
arr (numpy array) – input array
order (int, optional) – number of differencing passes, by default 1
- Returns
differenced array
- Return type
numpy array
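Example (a minimal sketch of a util call):

    import numpy as np
    from time_series_transform.transform_core_api.util import differencing

    arr = np.array([1.0, 3.0, 6.0, 10.0])
    diff1 = differencing(arr, order=1)   # successive differences Xt+1 - Xt
    diff2 = differencing(arr, order=2)   # differencing applied twice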
- time_series_transform.transform_core_api.util.ema(arr, com=None, span=None, halflife=None, alpha=None, adjust=True, min_periods=0, ignore_na=False, axis=0)[source]¶
This is the pandas EMA (exponentially weighted moving average) implementation.
- time_series_transform.transform_core_api.util.geometric_ma(arr, windowSize)[source]¶
Geometric moving average.
It uses a pandas rolling window with the scipy gmean function.
- Parameters
arr (numpy array) – input array
windowSize (int) – grouping size
- Returns
geometric moving average array
- Return type
numpy array
- time_series_transform.transform_core_api.util.moving_average(arr, windowSize=3)[source]¶
The arithmetic moving average.
Given the window size, this function performs a simple moving average.
- Parameters
arr (numpy array) – the input array
windowSize (int, optional) – the grouping size, by default 3
- Returns
the moving average array
- Return type
numpy array
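Example (a minimal sketch comparing the smoothing helpers; how the start of each window is padded is an implementation detail not documented above):

    import numpy as np
    from time_series_transform.transform_core_api.util import ema, geometric_ma, moving_average

    arr = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
    sma = moving_average(arr, windowSize=3)   # arithmetic rolling mean
    gma = geometric_ma(arr, windowSize=3)     # rolling scipy gmean
    exp = ema(arr, span=3)                    # pandas exponentially weighted mean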
- time_series_transform.transform_core_api.util.rfft_transform(arr, threshold=1000.0)[source]¶
Real fast Fourier transformation.
Fast Fourier transformation that ignores the imaginary part. Note: numpy implementation.
- Parameters
arr (numpy array) – input array
threshold (float, optional) – the threshold used to filter frequencies, by default 1e3
- Returns
rfft array
- Return type
numpy array
- time_series_transform.transform_core_api.util.wavelet_denoising(arr, wavelet='db4', coeff_mode='per', threshold_mode='hard', rec_mode='per', level=1, matchOriginLenth=True)[source]¶
Wavelet denoising.
Wavelet transformation with the pywt implementation.
- Parameters
arr (numpy array) – input array
wavelet (str, optional) – wavelet transform family, by default ‘db4’
coeff_mode (str, optional) – the coefficient mode, by default ‘per’
threshold_mode (str, optional) – the threshold type, by default ‘hard’
rec_mode (str, optional) – recovery mode, by default ‘per’
level (int, optional) – sigma level for the threshold, by default 1
matchOriginLenth (bool, optional) – whether to match the input array length, by default True
- Returns
wavelet transformed array
- Return type
numpy array
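Example (a minimal sketch of the signal-processing helpers on a noisy sine wave):

    import numpy as np
    from time_series_transform.transform_core_api.util import rfft_transform, wavelet_denoising

    t = np.linspace(0, 1, 256)
    noisy = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(256)
    filtered = rfft_transform(noisy, threshold=1e3)                                     # drop weak frequency components
    denoised = wavelet_denoising(noisy, wavelet='db4', level=1, matchOriginLenth=True)  # pywt-based denoising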