time_series_transform.io package

Submodules

time_series_transform.io.arrow module

time_series_transform.io.arrow.from_arrow_record_batch(time_series, timeSeriesCol, mainCategoryCol)

Transform an Arrow record batch into Time_Series_Data or Time_Series_Data_Collection.

Parameters

time_series (arrow record batch) – input data
timeSeriesCol (str or int) – name or index of the time period column
mainCategoryCol (str or int) – name or index of the category column

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.arrow.from_arrow_table(time_series, timeSeriesCol, mainCategoryCol)

Transform an Arrow table into Time_Series_Data or Time_Series_Data_Collection.

Parameters

time_series (arrow table) – input data
timeSeriesCol (str or int) – name or index of the time period column
mainCategoryCol (str or int) – name or index of the category column

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.arrow.to_arrow_record_batch(time_series, max_chunksize, expandCategory, expandTime, preprocessType, seperateLabels=False)

Transform Time_Series_Data or Time_Series_Data_Collection into an Arrow record batch.

Parameters

time_series (Time_Series_Data or Time_Series_Data_Collection) – input data
max_chunksize (int) – max size of record batch
expandCategory (bool) – whether to expand category
expandTime (bool) – whether to expand time
preprocessType (['ignore','pad','remove']) – how to preprocess the time index across categories
seperateLabels (bool, optional) – whether to separate labels and data, by default False

Returns

Return type

arrow record batch
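
As a rough illustration of what a max_chunksize limit means, the sketch below splits a column-oriented dict of lists into consecutive row batches no larger than the limit. It uses plain Python rather than Arrow, it assumes the limit counts rows, and split_into_batches is a hypothetical helper that is not part of this package.

```python
def split_into_batches(columns, max_chunksize):
    """Split a dict of equal-length lists into row batches.

    Hypothetical helper: each batch holds at most `max_chunksize` rows.
    """
    if max_chunksize < 1:
        raise ValueError("max_chunksize must be a positive integer")
    # All columns are assumed to have the same length.
    n_rows = len(next(iter(columns.values()))) if columns else 0
    batches = []
    for start in range(0, n_rows, max_chunksize):
        stop = start + max_chunksize
        batches.append({name: values[start:stop] for name, values in columns.items()})
    return batches


data = {"time": [1, 2, 3, 4, 5], "value": [10, 20, 30, 40, 50]}
batches = split_into_batches(data, max_chunksize=2)
print([len(batch["time"]) for batch in batches])  # [2, 2, 1]
```

The last batch is smaller than the limit because five rows do not divide evenly by two.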

time_series_transform.io.arrow.to_arrow_table(time_series, expandCategory, expandTime, preprocessType, seperateLabels=False)

Transform Time_Series_Data or Time_Series_Data_Collection into an Arrow table.

Parameters

time_series (Time_Series_Data or Time_Series_Data_Collection) – input data
expandCategory (bool) – whether to expand category
expandTime (bool) – whether to expand time
preprocessType (['ignore','pad','remove']) – how to preprocess the time index across categories
seperateLabels (bool, optional) – whether to separate labels and data, by default False

Returns

Return type

arrow table
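
The three preprocessType options are not described further here. One plausible reading, sketched below in plain Python, is that 'ignore' leaves each category as it is, 'pad' fills time steps missing from a category, and 'remove' keeps only time steps shared by every category. This is an assumption about the semantics, and align_time_index and fill_value are hypothetical names, not the package's API.

```python
def align_time_index(series_by_category, preprocess_type="ignore", fill_value=None):
    """Align {category: {time_index: value}} across categories.

    Hypothetical sketch of the assumed 'ignore' / 'pad' / 'remove' behaviour.
    """
    if preprocess_type not in ("ignore", "pad", "remove"):
        raise ValueError(f"unknown preprocess_type: {preprocess_type!r}")
    if preprocess_type == "ignore" or not series_by_category:
        # Return copies without touching the time indices.
        return {cat: dict(series) for cat, series in series_by_category.items()}
    index_sets = [set(series) for series in series_by_category.values()]
    if preprocess_type == "pad":
        # Union of all time steps; gaps are filled with fill_value.
        keep = sorted(set.union(*index_sets))
    else:
        # Intersection: only time steps present in every category survive.
        keep = sorted(set.intersection(*index_sets))
    return {
        cat: {t: series.get(t, fill_value) for t in keep}
        for cat, series in series_by_category.items()
    }


raw = {"A": {1: 10, 2: 20, 3: 30}, "B": {2: 200, 3: 300, 4: 400}}
padded = align_time_index(raw, "pad")
removed = align_time_index(raw, "remove")
```

With this input, padding gives both categories the time steps 1 to 4, while removal leaves only steps 2 and 3.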

time_series_transform.io.base module

class time_series_transform.io.base.io_base(time_series, timeSeriesCol, mainCategoryCol)

Bases: object

from_collection(expandCategory, expandTimeIx, preprocessType='ignore')

Prepare a Time_Series_Data_Collection as a dict of lists.

Parameters

expandCategory (bool) – whether to expand category
expandTimeIx (bool) – whether to expand time
preprocessType (['ignore','pad','remove'], optional) – how to preprocess the time index across categories, by default 'ignore'

Returns

Return type

dict of list

Raises

ValueError – invalid data
KeyError – invalid key
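
To make the "dict of list" output and the expandCategory flag more concrete, here is a minimal plain-Python sketch. It assumes that expanding categories means placing each category's columns side by side under suffixed names; the suffix scheme and the helper name expand_category are illustrative guesses, not the package's actual behaviour.

```python
def expand_category(series_by_category, time_col="time"):
    """Merge {category: {column: [values]}} into one wide dict of lists.

    Hypothetical sketch: every category must share the same time index.
    """
    if not series_by_category:
        return {}
    categories = list(series_by_category)
    reference_time = series_by_category[categories[0]][time_col]
    wide = {time_col: list(reference_time)}
    for category in categories:
        columns = series_by_category[category]
        if columns[time_col] != reference_time:
            raise ValueError("time index differs across categories")
        for name, values in columns.items():
            if name != time_col:
                # Assumed naming scheme: <column>_<category>.
                wide[f"{name}_{category}"] = list(values)
    return wide


collection = {
    "A": {"time": [1, 2], "price": [10, 11]},
    "B": {"time": [1, 2], "price": [20, 21]},
}
wide = expand_category(collection)
```

The sketch raises ValueError when the time indices disagree, which is the kind of situation a preprocessType other than 'ignore' would presumably resolve first.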

from_single(expandTime)

Transform Time_Series_Data into a dict of lists.

Parameters: expandTime (bool) – whether to expand time
Returns
Return type: dict of list
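
The expandTime flag can be pictured as flattening the time axis into columns, so that a series of n time steps becomes a single row with n columns per variable. The sketch below shows that idea in plain Python; the naming scheme and the helper expand_time are assumptions made for illustration only.

```python
def expand_time(columns, time_col="time"):
    """Flatten {column: [values]} into one row with a column per time step.

    Hypothetical sketch of what expanding the time axis could look like.
    """
    expanded = {}
    for name, values in columns.items():
        if name == time_col:
            continue
        for time_index, value in zip(columns[time_col], values):
            # Assumed naming scheme: <column>_<time index>.
            expanded[f"{name}_{time_index}"] = [value]
    return expanded


single = {"time": [1, 2, 3], "price": [10, 11, 12]}
flat = expand_time(single)
```

Each resulting list has length one because the whole series now occupies a single row.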

to_collection()

Transform the data into a Time_Series_Data_Collection.

Returns
Return type: Time_Series_Data_Collection
Raises: KeyError – invalid input

to_single()

Transform the data into Time_Series_Data.

Returns
Return type: Time_Series_Data
Raises: KeyError – invalid data
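
The difference between to_single and to_collection appears to hinge on whether a category column is present. As a hedged sketch of the collection case, the plain-Python helper below groups a flat dict of lists by the values of a category column and raises KeyError when that column is missing. The helper split_by_category is hypothetical and only mirrors the documented error type.

```python
def split_by_category(columns, main_category_col):
    """Group a flat dict of lists into {category: {column: [values]}}.

    Hypothetical sketch; raises KeyError if the category column is absent.
    """
    if main_category_col not in columns:
        raise KeyError(main_category_col)
    other_cols = [name for name in columns if name != main_category_col]
    groups = {}
    for row_ix, category in enumerate(columns[main_category_col]):
        if category not in groups:
            groups[category] = {name: [] for name in other_cols}
        for name in other_cols:
            groups[category][name].append(columns[name][row_ix])
    return groups


flat_table = {
    "time": [1, 2, 1, 2],
    "ticker": ["A", "A", "B", "B"],
    "price": [10, 11, 20, 21],
}
by_ticker = split_by_category(flat_table, "ticker")
```

Each group keeps its own time and value columns, which is roughly the shape a per-category series would need.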

time_series_transform.io.feather module

time_series_transform.io.feather.from_feather(dirPath, timeSeriesCol, mainCategoryCol, columns=None)

Read a feather file into Time_Series_Data or Time_Series_Data_Collection.

Parameters

dirPath (str) – path to the feather file
timeSeriesCol (str or int) – name or index of the time period column
mainCategoryCol (str or int) – name or index of the category column
columns (list of str, optional) – column names to fetch, by default None

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.feather.to_feather(dirPaths, time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False, version=1, chunksize=None)

Write Time_Series_Data or Time_Series_Data_Collection to a feather file.

Parameters

dirPaths (str) – path to the feather file
time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data
expandCategory (bool) – whether to expand category
expandTime (bool) – whether to expand time
preprocessType (['ignore','pad','remove']) – how to preprocess the time index across categories
seperateLabels (bool, optional) – whether to separate labels and data, by default False
version (int, optional) – feather version, by default 1
chunksize (int, optional) – chunk size used when writing the feather file, by default None

time_series_transform.io.generator module

time_series_transform.io.numpy module

time_series_transform.io.numpy.from_numpy(numpyArray, timeSeriesCol, mainCategoryCol=None)

Transform a numpy ndarray into Time_Series_Data or Time_Series_Data_Collection.

Parameters

numpyArray (numpy ndarray) – input data
timeSeriesCol (str or int) – index of the time period column
mainCategoryCol (str or int, optional) – index of the category column, by default None

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

Raises

ValueError – invalid input data

time_series_transform.io.numpy.to_numpy(time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False)

Transform Time_Series_Data or Time_Series_Data_Collection into a numpy ndarray.

Parameters

time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data
expandCategory (bool) – whether to expand category
expandTime (bool) – whether to expand time
preprocessType (['ignore','pad','remove']) – how to preprocess the time index across categories
seperateLabels (bool, optional) – whether to separate labels and data, by default False

Returns

Return type

numpy ndarray

Raises

ValueError – invalid input data
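
The seperateLabels flag suggests that label columns can be returned apart from feature columns. This section does not say how labels are identified, so the sketch below simply takes an explicit list of label column names. Both separate_labels and label_cols are hypothetical and stand in for whatever mechanism the package uses.

```python
def separate_labels(columns, label_cols):
    """Split a dict of lists into (data, labels) by column name.

    Hypothetical sketch; `label_cols` is an assumed way to mark labels.
    """
    label_names = set(label_cols)
    data = {name: values for name, values in columns.items() if name not in label_names}
    labels = {name: values for name, values in columns.items() if name in label_names}
    return data, labels


table = {"time": [1, 2, 3], "price": [10, 11, 12], "target": [0, 1, 0]}
features, targets = separate_labels(table, label_cols=["target"])
```

Returning two objects instead of one is convenient when the result feeds a model that expects inputs and targets as separate arguments.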

time_series_transform.io.pandas module

time_series_transform.io.pandas.from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol=None)

Transform a pandas DataFrame into Time_Series_Data or Time_Series_Data_Collection.

Parameters

pandasFrame (pandas DataFrame) – input data
timeSeriesCol (str or int) – name or index of the time period column
mainCategoryCol (str or int, optional) – name or index of the category column, by default None

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.pandas.to_pandas(time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False)

Transform Time_Series_Data or Time_Series_Data_Collection into a pandas DataFrame.

Parameters

time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data
expandCategory (bool) – whether to expand category
expandTime (bool) – whether to expand time
preprocessType (['ignore','pad','remove']) – how to preprocess the time index across categories
seperateLabels (bool, optional) – whether to separate labels and data, by default False

Returns

Return type

pandas DataFrame

Raises

ValueError – invalid data input

time_series_transform.io.parquet module

time_series_transform.io.parquet.from_parquet(dirPath, timeSeriesCol, mainCategoryCol, columns=None, partitioning='hive', filters=None, filesystem=None)

Read parquet data into Time_Series_Data or Time_Series_Data_Collection.

Parameters

dirPath (str) – path to the parquet file
timeSeriesCol (str or int) – name or index of the time period column
mainCategoryCol (str or int) – name or index of the category column
columns (list, optional) – columns to fetch, by default None
partitioning (str, optional) – partition type, by default 'hive'
filters (str, optional) – parquet filter, by default None
filesystem (str, optional) – filesystem, by default None

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection
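
The default partitioning value 'hive' refers to the Hive directory convention, in which each partition column appears in the path as a key=value directory. The following sketch parses such a path with the standard library to show what information the layout carries. The path and the helper parse_hive_partitions are made up for illustration and are not part of this package.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path):
    """Extract partition keys and values from a Hive-style path.

    Hypothetical helper: directories named key=value become dict entries.
    """
    partitions = {}
    # Skip the final path component, which is the data file itself.
    for segment in PurePosixPath(path).parts[:-1]:
        if "=" in segment:
            key, _, value = segment.partition("=")
            partitions[key] = value
    return partitions


example_path = "data/ticker=A/year=2020/part-0.parquet"
print(parse_hive_partitions(example_path))  # {'ticker': 'A', 'year': '2020'}
```

Partition values are read back from directory names, so they arrive as strings unless the reader infers or is given a schema.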

time_series_transform.io.parquet.to_parquet(dirPaths, time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False, version='1.0', isDataset=False, partition_cols=None)

Write Time_Series_Data or Time_Series_Data_Collection to parquet.

Parameters

dirPaths (str) – path to the parquet file
time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data
expandCategory (bool) – whether to expand category
expandTime (bool) – whether to expand time
preprocessType (['ignore','pad','remove']) – how to preprocess the time index across categories
seperateLabels (bool, optional) – whether to separate labels and data, by default False
version (str, optional) – parquet version, by default '1.0'
isDataset (bool, optional) – whether to output as dataset, by default False
partition_cols (list, optional) – partition columns, by default None

Module contents

time_series_transform.io.from_arrow_table(time_series, timeSeriesCol, mainCategoryCol)

Transform an Arrow table into Time_Series_Data or Time_Series_Data_Collection.

Parameters

time_series (arrow table) – input data
timeSeriesCol (str or int) – name or index of the time period column
mainCategoryCol (str or int) – name or index of the category column

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.from_feather(dirPath, timeSeriesCol, mainCategoryCol, columns=None)

Read a feather file into Time_Series_Data or Time_Series_Data_Collection.

Parameters

dirPath (str) – path to the feather file
timeSeriesCol (str or int) – name or index of the time period column
mainCategoryCol (str or int) – name or index of the category column
columns (list of str, optional) – column names to fetch, by default None

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.from_numpy(numpyArray, timeSeriesCol, mainCategoryCol=None)

Transform a numpy ndarray into Time_Series_Data or Time_Series_Data_Collection.

Parameters

numpyArray (numpy ndarray) – input data
timeSeriesCol (str or int) – index of the time period column
mainCategoryCol (str or int, optional) – index of the category column, by default None

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

Raises

ValueError – invalid input data

time_series_transform.io.from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol=None)

Transform a pandas DataFrame into Time_Series_Data or Time_Series_Data_Collection.

Parameters

pandasFrame (pandas DataFrame) – input data
timeSeriesCol (str or int) – name or index of the time period column
mainCategoryCol (str or int, optional) – name or index of the category column, by default None

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.from_parquet(dirPath, timeSeriesCol, mainCategoryCol, columns=None, partitioning='hive', filters=None, filesystem=None)

Read parquet data into Time_Series_Data or Time_Series_Data_Collection.

Parameters

dirPath (str) – path to the parquet file
timeSeriesCol (str or int) – name or index of the time period column
mainCategoryCol (str or int) – name or index of the category column
columns (list, optional) – columns to fetch, by default None
partitioning (str, optional) – partition type, by default 'hive'
filters (str, optional) – parquet filter, by default None
filesystem (str, optional) – filesystem, by default None

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.to_arrow_table(time_series, expandCategory, expandTime, preprocessType, seperateLabels=False)

Transform Time_Series_Data or Time_Series_Data_Collection into an Arrow table.

Parameters

time_series (Time_Series_Data or Time_Series_Data_Collection) – input data
expandCategory (bool) – whether to expand category
expandTime (bool) – whether to expand time
preprocessType (['ignore','pad','remove']) – how to preprocess the time index across categories
seperateLabels (bool, optional) – whether to separate labels and data, by default False

Returns

Return type

arrow table

time_series_transform.io.to_feather(dirPaths, time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False, version=1, chunksize=None)

Write Time_Series_Data or Time_Series_Data_Collection to a feather file.

Parameters

dirPaths (str) – path to the feather file
time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data
expandCategory (bool) – whether to expand category
expandTime (bool) – whether to expand time
preprocessType (['ignore','pad','remove']) – how to preprocess the time index across categories
seperateLabels (bool, optional) – whether to separate labels and data, by default False
version (int, optional) – feather version, by default 1
chunksize (int, optional) – chunk size used when writing the feather file, by default None

time_series_transform.io.to_numpy(time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False)

Transform Time_Series_Data or Time_Series_Data_Collection into a numpy ndarray.

Parameters

time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data
expandCategory (bool) – whether to expand category
expandTime (bool) – whether to expand time
preprocessType (['ignore','pad','remove']) – how to preprocess the time index across categories
seperateLabels (bool, optional) – whether to separate labels and data, by default False

Returns

Return type

numpy ndarray

Raises

ValueError – invalid input data

time_series_transform.io.to_pandas(time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False)

Transform Time_Series_Data or Time_Series_Data_Collection into a pandas DataFrame.

Parameters

time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data
expandCategory (bool) – whether to expand category
expandTime (bool) – whether to expand time
preprocessType (['ignore','pad','remove']) – how to preprocess the time index across categories
seperateLabels (bool, optional) – whether to separate labels and data, by default False

Returns

Return type

pandas DataFrame

Raises

ValueError – invalid data input

time_series_transform.io.to_parquet(dirPaths, time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False, version='1.0', isDataset=False, partition_cols=None)

Write Time_Series_Data or Time_Series_Data_Collection to parquet.

Parameters

dirPaths (str) – path to the parquet file
time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data
expandCategory (bool) – whether to expand category
expandTime (bool) – whether to expand time
preprocessType (['ignore','pad','remove']) – how to preprocess the time index across categories
seperateLabels (bool, optional) – whether to separate labels and data, by default False
version (str, optional) – parquet version, by default '1.0'
isDataset (bool, optional) – whether to output as dataset, by default False
partition_cols (list, optional) – partition columns, by default None