time_series_transform.io package

Submodules

time_series_transform.io.arrow module

time_series_transform.io.arrow.from_arrow_record_batch(time_series, timeSeriesCol, mainCategoryCol)[source]

from_arrow_record_batch transform arrow record batch to Time_Series_Data or Time_Series_Data_Collection

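Parameters
  • time_series (arrow record batch) – input data

  • timeSeriesCol (str or int) – index of time period column

  • mainCategoryCol (str or int) – index of category column

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection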

time_series_transform.io.arrow.from_arrow_table(time_series, timeSeriesCol, mainCategoryCol)[source]
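
from_arrow_table transform arrow table to Time_Series_Data or Time_Series_Data_Collection

Parameters
  • time_series (arrow table) – input data

  • timeSeriesCol (str or int) – index of time period column

  • mainCategoryCol (str or int) – index of category column

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection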

time_series_transform.io.arrow.to_arrow_record_batch(time_series, max_chunksize, expandCategory, expandTime, preprocessType, seperateLabels=False)[source]

to_arrow_record_batch transform Time_Series_Data or Time_Series_Data_Collection to arrow record batch

Parameters
  • time_series (Time_Series_Data or Time_Series_Data_Collection) – input data

  • max_chunksize (int) – max size of record batch

  • expandCategory (bool) – whether to expand category

  • expandTime (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

  • seperateLabels (bool) – whether to separate labels and data

Returns

Return type

arrow record batch
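
The reference lists the three preprocessType options but does not describe what each one does. The sketch below shows one plausible reading, assuming 'ignore' leaves each category untouched, 'pad' fills missing time points with a placeholder, and 'remove' keeps only time points shared by every category. The helper align_times and its pad_value argument are hypothetical and not part of time_series_transform.

```python
def align_times(series_by_category, preprocess_type="ignore", pad_value=None):
    """Align time keys across categories (hypothetical helper, not library code).

    series_by_category maps category -> {time: value}.
    """
    if preprocess_type not in ("ignore", "pad", "remove"):
        raise ValueError(f"unknown preprocess_type: {preprocess_type!r}")
    if preprocess_type == "ignore" or not series_by_category:
        # Assumed meaning of 'ignore': return every category unchanged.
        return {cat: dict(s) for cat, s in series_by_category.items()}

    time_sets = [set(s) for s in series_by_category.values()]
    if preprocess_type == "pad":
        # Assumed meaning of 'pad': use the union of all time points.
        times = sorted(set.union(*time_sets))
        return {
            cat: {t: s.get(t, pad_value) for t in times}
            for cat, s in series_by_category.items()
        }
    # Assumed meaning of 'remove': keep only time points every category has.
    times = sorted(set.intersection(*time_sets))
    return {cat: {t: s[t] for t in times} for cat, s in series_by_category.items()}
```

With two categories covering times {1, 2} and {2, 3}, this reading of 'pad' yields three time points per category and 'remove' yields one.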

time_series_transform.io.arrow.to_arrow_table(time_series, expandCategory, expandTime, preprocessType, seperateLabels=False)[source]

to_arrow_table Time_Series_Data or Time_Series_Data_Collection to arrow table

Parameters
  • time_series (Time_Series_Data or Time_Series_Data_Collection) – input data

  • expandCategory (bool) – whether to expand category

  • expandTime (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

  • seperateLabels (bool) – whether to separate labels and data

Returns

Return type

arrow table

time_series_transform.io.base module

class time_series_transform.io.base.io_base(time_series, timeSeriesCol, mainCategoryCol)[source]

Bases: object

from_collection(expandCategory, expandTimeIx, preprocessType='ignore')[source]

from_collection prepare Time_Series_Data_Collection into dict of list

Parameters
  • expandCategory (bool) – whether to expand category

  • expandTimeIx (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

Returns

Return type

dict of list

Raises
from_single(expandTime)[source]

from_single transform Time_Series_Data into dict of list

Parameters

expandTime (bool) – whether to expand time

Returns

Return type

dict of list

to_collection()[source]

to_collection transform data into Time_Series_Data_Collection

Returns

Return type

Time_Series_Data_Collection

Raises

KeyError – invalid input

to_single()[source]

to_single transform data to Time_Series_Data

Returns

Return type

Time_Series_Data

Raises

KeyError – invalid data

time_series_transform.io.feather module

time_series_transform.io.feather.from_feather(dirPath, timeSeriesCol, mainCategoryCol, columns=None)[source]

from_feather read feather file into Time_Series_Data or Time_Series_Data_Collection

Parameters
  • dirPath (str) – directory to feather file

  • timeSeriesCol (str or int) – index of time period column

  • mainCategoryCol (str or int) – index of category column

  • columns (list of str) – column names to fetch

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.feather.to_feather(dirPaths, time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False, version=1, chunksize=None)[source]

transform Time_Series_Data or Time_Series_Data_Collection to feather file

Parameters
  • dirPaths (str) – directory to feather file

  • time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data

  • expandCategory (bool) – whether to expand category

  • expandTime (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

  • seperateLabels (bool) – whether to separate labels and data

  • version (int, optional) – feather version, by default 1

  • chunksize (int, optional) – chunk size of feather file, by default None

time_series_transform.io.generator module

time_series_transform.io.numpy module

time_series_transform.io.numpy.from_numpy(numpyArray, timeSeriesCol, mainCategoryCol=None)[source]

from_numpy transform numpy ndArray to Time_Series_Data or Time_Series_Data_Collection

Parameters
  • numpyArray (numpy ndArray) – input data

  • timeSeriesCol (str or int) – index of time period column

  • mainCategoryCol (str or int) – index of category column

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

Raises

ValueError – invalid input data
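
To illustrate the roles of timeSeriesCol and mainCategoryCol, here is a minimal sketch on plain row tuples. It assumes that omitting the category column yields a single series and that supplying it yields one series per category, mirroring the two documented return types. The function split_rows is hypothetical, and sorting by time is an illustrative choice rather than documented library behaviour.

```python
from collections import defaultdict


def split_rows(rows, time_col, category_col=None):
    """Order rows by time, optionally grouping them by category first.

    rows is a list of sequences; time_col and category_col are integer
    positions within each row. Hypothetical helper, not library code.
    """
    if category_col is None:
        # Single series: just order the rows by the time column.
        return sorted(rows, key=lambda r: r[time_col])
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_col]].append(row)
    # Collection: one time-ordered list of rows per category value.
    return {cat: sorted(g, key=lambda r: r[time_col]) for cat, g in groups.items()}
```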

time_series_transform.io.numpy.to_numpy(time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False)[source]

transform Time_Series_Data or Time_Series_Data_Collection to numpy ndArray

Parameters
  • time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data

  • expandCategory (bool) – whether to expand category

  • expandTime (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

  • seperateLabels (bool) – whether to separate labels and data

Returns

Return type

numpy ndArray

Raises

ValueError – invalid input data

time_series_transform.io.pandas module

time_series_transform.io.pandas.from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol=None)[source]

from_pandas transform dataFrame to Time_Series_Data or Time_Series_Data_Collection

Parameters
  • pandasFrame (pandas dataFrame) – input data

  • timeSeriesCol (str or int) – index of time period column

  • mainCategoryCol (str or int) – index of category column

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.pandas.to_pandas(time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False)[source]

transform Time_Series_Data or Time_Series_Data_Collection into pandas dataFrame

Parameters
  • time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data

  • expandCategory (bool) – whether to expand category

  • expandTime (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

  • seperateLabels (bool) – whether to separate labels and data

Returns

Return type

pandas dataFrame

Raises

ValueError – invalid data input
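
A minimal usage sketch for the two pandas functions, built only from the signatures documented above. The wrapper pandas_round_trip is hypothetical, the argument values are illustrative, and this reference does not state whether the returned frame matches the input exactly.

```python
def pandas_round_trip(frame, time_col, category_col=None):
    """Convert a dataFrame to time series data and back (hypothetical wrapper)."""
    # Imported lazily so the sketch can be loaded without the package installed.
    from time_series_transform.io import from_pandas, to_pandas

    time_series_data = from_pandas(frame, time_col, category_col)
    return to_pandas(
        time_series_data,
        expandCategory=False,
        expandTime=False,
        preprocessType="ignore",
        seperateLabels=False,  # spelling follows the documented parameter name
    )
```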

time_series_transform.io.parquet module

time_series_transform.io.parquet.from_parquet(dirPath, timeSeriesCol, mainCategoryCol, columns=None, partitioning='hive', filters=None, filesystem=None)[source]

from_parquet transform parquet into Time_Series_Data or Time_Series_Data_Collection

Parameters
  • dirPath (str) – directory to parquet file

  • timeSeriesCol (str or int) – index of time period column

  • mainCategoryCol (str or int) – index of category column

  • columns (list, optional) – columns to fetch, by default None

  • partitioning (str, optional) – partition type, by default 'hive'

  • filters (str, optional) – parquet filter, by default None

  • filesystem (str, optional) – filesystem, by default None

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.parquet.to_parquet(dirPaths, time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False, version='1.0', isDataset=False, partition_cols=None)[source]

to_parquet transform Time_Series_Data or Time_Series_Data_Collection to parquet

Parameters
  • dirPaths (str) – directory to parquet file

  • time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data

  • expandCategory (bool) – whether to expand category

  • expandTime (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

  • seperateLabels (bool) – whether to separate labels and data

  • version (str, optional) – parquet version, by default '1.0'

  • isDataset (bool, optional) – whether to output as dataset, by default False

  • partition_cols (list, optional) – partition columns, by default None
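
The parquet writer and reader can be paired as in the sketch below, which relies only on the signatures documented above. The wrapper parquet_round_trip and its argument names are hypothetical, and whether the reloaded object equals the original is not stated in this reference.

```python
def parquet_round_trip(time_series_data, path, time_col, category_col, columns=None):
    """Write time series data to parquet and read it back (hypothetical wrapper)."""
    # Imported lazily so the sketch can be loaded without the package installed.
    from time_series_transform.io import from_parquet, to_parquet

    to_parquet(
        path,
        time_series_data,
        expandCategory=False,
        expandTime=False,
        preprocessType="ignore",
    )
    # columns=None is the documented default and fetches every column.
    return from_parquet(path, time_col, category_col, columns=columns)
```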

Module contents

time_series_transform.io.from_arrow_table(time_series, timeSeriesCol, mainCategoryCol)[source]
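
from_arrow_table transform arrow table to Time_Series_Data or Time_Series_Data_Collection

Parameters
  • time_series (arrow table) – input data

  • timeSeriesCol (str or int) – index of time period column

  • mainCategoryCol (str or int) – index of category column

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection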

time_series_transform.io.from_feather(dirPath, timeSeriesCol, mainCategoryCol, columns=None)[source]

from_feather read feather file into Time_Series_Data or Time_Series_Data_Collection

Parameters
  • dirPath (str) – directory to feather file

  • timeSeriesCol (str or int) – index of time period column

  • mainCategoryCol (str or int) – index of category column

  • columns (list of str) – column names to fetch

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.from_numpy(numpyArray, timeSeriesCol, mainCategoryCol=None)[source]

from_numpy transform numpy ndArray to Time_Series_Data or Time_Series_Data_Collection

Parameters
  • numpyArray (numpy ndArray) – input data

  • timeSeriesCol (str or int) – index of time period column

  • mainCategoryCol (str or int) – index of category column

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

Raises

ValueError – invalid input data

time_series_transform.io.from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol=None)[source]

from_pandas transform dataFrame to Time_Series_Data or Time_Series_Data_Collection

Parameters
  • pandasFrame (pandas dataFrame) – input data

  • timeSeriesCol (str or int) – index of time period column

  • mainCategoryCol (str or int) – index of category column

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.from_parquet(dirPath, timeSeriesCol, mainCategoryCol, columns=None, partitioning='hive', filters=None, filesystem=None)[source]

from_parquet transform parquet into Time_Series_Data or Time_Series_Data_Collection

Parameters
  • dirPath (str) – directory to parquet file

  • timeSeriesCol (str or int) – index of time period column

  • mainCategoryCol (str or int) – index of category column

  • columns (list, optional) – columns to fetch, by default None

  • partitioning (str, optional) – partition type, by default 'hive'

  • filters (str, optional) – parquet filter, by default None

  • filesystem (str, optional) – filesystem, by default None

Returns

Return type

Time_Series_Data or Time_Series_Data_Collection

time_series_transform.io.to_arrow_table(time_series, expandCategory, expandTime, preprocessType, seperateLabels=False)[source]

to_arrow_table Time_Series_Data or Time_Series_Data_Collection to arrow table

Parameters
  • time_series (Time_Series_Data or Time_Series_Data_Collection) – input data

  • expandCategory (bool) – whether to expand category

  • expandTime (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

  • seperateLabels (bool) – whether to separate labels and data

Returns

Return type

arrow table

time_series_transform.io.to_feather(dirPaths, time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False, version=1, chunksize=None)[source]

transform Time_Series_Data or Time_Series_Data_Collection to feather file

Parameters
  • dirPaths (str) – directory to feather file

  • time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data

  • expandCategory (bool) – whether to expand category

  • expandTime (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

  • seperateLabels (bool) – whether to separate labels and data

  • version (int, optional) – feather version, by default 1

  • chunksize (int, optional) – chunk size of feather file, by default None

time_series_transform.io.to_numpy(time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False)[source]

transform Time_Series_Data or Time_Series_Data_Collection to numpy ndArray

Parameters
  • time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data

  • expandCategory (bool) – whether to expand category

  • expandTime (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

  • seperateLabels (bool) – whether to separate labels and data

Returns

Return type

numpy ndArray

Raises

ValueError – invalid input data

time_series_transform.io.to_pandas(time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False)[source]

transform Time_Series_Data or Time_Series_Data_Collection into pandas dataFrame

Parameters
  • time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data

  • expandCategory (bool) – whether to expand category

  • expandTime (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

  • seperateLabels (bool) – whether to separate labels and data

Returns

Return type

pandas dataFrame

Raises

ValueError – invalid data input

time_series_transform.io.to_parquet(dirPaths, time_series_data, expandCategory, expandTime, preprocessType, seperateLabels=False, version='1.0', isDataset=False, partition_cols=None)[source]

to_parquet transform Time_Series_Data or Time_Series_Data_Collection to parquet

Parameters
  • dirPaths (str) – directory to parquet file

  • time_series_data (Time_Series_Data or Time_Series_Data_Collection) – input data

  • expandCategory (bool) – whether to expand category

  • expandTime (bool) – whether to expand time

  • preprocessType (['ignore','pad','remove']) – preprocess data time across categories

  • seperateLabels (bool) – whether to separate labels and data

  • version (str, optional) – parquet version, by default '1.0'

  • isDataset (bool, optional) – whether to output as dataset, by default False

  • partition_cols (list, optional) – partition columns, by default None