time_series_transform package

Module contents

class time_series_transform.Stock_Transformer(time_series_data, time_seriesIx, symbolIx, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

Bases: time_series_transform.transform_core_api.time_series_transformer.Time_Series_Transformer

classmethod from_arrow_table(arrow_table, timeSeriesCol, symbolIx, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

from_arrow_table import data from an Apache Arrow table

Parameters
  • arrow_table (arrow table) – input data

  • timeSeriesCol (str or numeric) – time series column name

  • symbolIx (str or numeric) – symbol (main category) column name

  • symbolName (str or numeric, optional) – ticker name, only used when the data contains a single stock, by default None

  • High (str or int, optional) – the index or name for High, by default ‘High’

  • Low (str or int, optional) – the index or name for Low, by default ‘Low’

  • Close (str or int, optional) – the index or name for Close, by default ‘Close’

  • Open (str or int, optional) – the index or name for Open, by default ‘Open’

  • Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’

Returns

Return type

Stock_Transformer

classmethod from_feather(feather_dir, timeSeriesCol, symbolIx, symbolName=None, columns=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

from_feather import data from a feather file

Parameters
  • feather_dir (str) – directory of feather file

  • timeSeriesCol (str or numeric) – time series column name

  • symbolIx (str or numeric) – symbol (main category) column name

  • symbolName (str or numeric, optional) – ticker name, only used when the data contains a single stock, by default None

  • columns (str or numeric, optional) – target columns (Apache Arrow implementation), by default None

  • High (str or int, optional) – the index or name for High, by default ‘High’

  • Low (str or int, optional) – the index or name for Low, by default ‘Low’

  • Close (str or int, optional) – the index or name for Close, by default ‘Close’

  • Open (str or int, optional) – the index or name for Open, by default ‘Open’

  • Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’

Returns

Return type

Stock_Transformer

classmethod from_numpy(numpyData, timeSeriesCol, mainCategoryCol, High, Low, Close, Open, Volume, symbolName=None)

from_numpy import data from numpy

Parameters
  • numpyData (numpy ndarray) – input data

  • timeSeriesCol (int) – index of time series column

  • mainCategoryCol (int) – index of main category column

  • High (int) – the column index for High

  • Low (int) – the column index for Low

  • Close (int) – the column index for Close

  • Open (int) – the column index for Open

  • Volume (int) – the column index for Volume

  • symbolName (str or numeric, optional) – ticker name, only used when the data contains a single stock, by default None

Returns

Return type

Stock_Transformer

classmethod from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

from_pandas import data from a pandas DataFrame

Parameters
  • pandasFrame (pandas DataFrame) – input data

  • timeSeriesCol (str or numeric) – time series column name

  • mainCategoryCol (str or numeric) – main category name

  • symbolName (str or numeric, optional) – ticker name, only used when the data contains a single stock, by default None

  • High (str or int, optional) – the column name for High, by default ‘High’

  • Low (str or int, optional) – the column name for Low, by default ‘Low’

  • Close (str or int, optional) – the column name for Close, by default ‘Close’

  • Open (str or int, optional) – the column name for Open, by default ‘Open’

  • Volume (str or int, optional) – the column name for Volume, by default ‘Volume’

Returns

Return type

Stock_Transformer

classmethod from_parquet(parquet_dir, timeSeriesCol, symbolIx, symbolName=None, columns=None, partitioning='hive', filters=None, filesystem=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

from_parquet import data from a parquet file

Parameters
  • parquet_dir (str) – directory of parquet file

  • timeSeriesCol (str or numeric) – time series column name

  • symbolIx (str or numeric) – symbol (main category) column name

  • symbolName (str or numeric, optional) – ticker name, only used when the data contains a single stock, by default None

  • columns (str or numeric, optional) – target columns (Apache Arrow implementation), by default None

  • partitioning (str, optional) – type of partitioning, by default ‘hive’

  • filters (str, optional) – filter (Apache Arrow implementation), by default None

  • filesystem (str, optional) – filesystem (Apache Arrow implementation), by default None

  • High (str or int, optional) – the index or name for High, by default ‘High’

  • Low (str or int, optional) – the index or name for Low, by default ‘Low’

  • Close (str or int, optional) – the index or name for Close, by default ‘Close’

  • Open (str or int, optional) – the index or name for Open, by default ‘Open’

  • Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’

Returns

Return type

Stock_Transformer

classmethod from_stock_engine_date(symbols, start_date, end_date, engine, n_threads=8, *args, **kwargs)

from_stock_engine_date fetch stock data online for a given date range

Parameters
  • symbols (str or list) – ticker name, or a list of ticker names

  • start_date (str) – start date of the data in “%Y-%m-%d” format, e.g. “2020-02-20”

  • end_date (str) – end date of the data in “%Y-%m-%d” format, e.g. “2020-02-20”

  • engine ({'yahoo','investing'}) – fetching API

  • n_threads (int, optional) – number of fetching threads, only used when symbols is a list, by default 8

Returns

Return type

Stock_Transformer

classmethod from_stock_engine_intraday(symbols, start_date, end_date, engine='yahoo', interval='1m', n_threads=8, *args, **kwargs)

classmethod from_stock_engine_period(symbols, period, engine, n_threads=8, *args, **kwargs)

from_stock_engine_period fetch stock data online for a given period

The currently supported engines are yfinance (‘yahoo’) and investpy (‘investing’).

Parameters
  • symbols (str or list) – ticker name, or a list of ticker names

  • period (str) – period of the data, for example 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max

  • engine ({'yahoo','investing'}) – fetching API

  • n_threads (int, optional) – number of fetching threads, only used when symbols is a list, by default 8

Returns

Return type

Stock_Transformer

classmethod from_time_series_transformer(time_series_transformer, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

from_time_series_transformer create a Stock_Transformer from a Time_Series_Transformer

Parameters
  • time_series_transformer (Time_Series_Transformer) – input data

  • symbolName (str or numeric, optional) – ticker name, only used when the data contains a single stock, by default None

  • High (str or int, optional) – the index or name for High, by default ‘High’

  • Low (str or int, optional) – the index or name for Low, by default ‘Low’

  • Close (str or int, optional) – the index or name for Close, by default ‘Close’

  • Open (str or int, optional) – the index or name for Open, by default ‘Open’

  • Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’

Returns

Return type

Stock_Transformer

get_technial_indicator(strategy, n_jobs=1, verbose=10, backend='loky')

get_technial_indicator compute technical indicators

Based on pandas-ta: https://github.com/twopirllc/pandas-ta

Parameters
  • strategy (Strategy) – pandas-ta strategy

  • n_jobs (int, optional) – number of processes (joblib), by default 1

  • verbose (int, optional) – log level (joblib), by default 10

  • backend (str, optional) – backend type (joblib), by default ‘loky’

Returns

Return type

self

class time_series_transform.Time_Series_Transformer(data, timeSeriesCol, mainCategoryCol=None)

Bases: object

dropna(categoryKey=None)

dropna drop null values

Remove null values from all categories or from a specific category.

Parameters

categoryKey (str or numeric, optional) – the category to process; if None, all categories are processed, by default None

Returns

Return type

self
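As a rough illustration of per-category null removal, the pure-Python sketch below assumes the data is held as a dict of equal-length columns and that a row is dropped when any of its values is None or NaN. The helper names dropna_sketch and is_missing are hypothetical; this is not the library's implementation.

```python
import math
from typing import Dict, Optional


def is_missing(value) -> bool:
    # Treat None and float NaN as null values.
    return value is None or (isinstance(value, float) and math.isnan(value))


def dropna_sketch(columns: Dict[str, list], category_col: str,
                  category_key: Optional[str] = None) -> Dict[str, list]:
    # Hypothetical helper: drop every row that contains a null value.
    # If category_key is given, only rows of that category are considered.
    categories = columns[category_col]
    keep = []
    for i, cat in enumerate(categories):
        in_scope = category_key is None or cat == category_key
        has_null = any(is_missing(col[i]) for col in columns.values())
        keep.append(not (in_scope and has_null))
    return {name: [v for v, k in zip(col, keep) if k]
            for name, col in columns.items()}


data = {"symbol": ["A", "A", "B", "B"], "close": [1.0, float("nan"), None, 4.0]}
print(dropna_sketch(data, "symbol"))                    # all categories
print(dropna_sketch(data, "symbol", category_key="A"))  # only category "A"
```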

classmethod from_arrow_table(arrow_table, timeSeriesCol, mainCategoryCol)

from_arrow_table import data from an Apache Arrow table

Parameters
  • arrow_table (arrow table) – input data

  • timeSeriesCol (str or numeric) – time series column name

  • mainCategoryCol (str or numeric) – main category name

Returns

Return type

Time_Series_Transformer

classmethod from_feather(feather_dir, timeSeriesCol, mainCategoryCol, columns=None)

from_feather import data from a feather file

Parameters
  • feather_dir (str) – directory of feather file

  • timeSeriesCol (str or numeric) – time series column name

  • mainCategoryCol (str or numeric) – main category name

  • columns (str or numeric, optional) – target columns (Apache Arrow implementation), by default None

Returns

Return type

Time_Series_Transformer

classmethod from_numpy(numpyData, timeSeriesCol, mainCategoryCol)

from_numpy import data from numpy

Parameters
  • numpyData (numpy ndarray) – input data

  • timeSeriesCol (int) – index of time series column

  • mainCategoryCol (int) – index of main category column

Returns

Return type

Time_Series_Transformer

classmethod from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol)

from_pandas import data from a pandas DataFrame

Parameters
  • pandasFrame (pandas DataFrame) – input data

  • timeSeriesCol (str or numeric) – time series column name

  • mainCategoryCol (str or numeric) – main category name

Returns

Return type

Time_Series_Transformer

classmethod from_parquet(parquet_dir, timeSeriesCol, mainCategoryCol, columns=None, partitioning='hive', filters=None, filesystem=None)

from_parquet import data from a parquet file

Parameters
  • parquet_dir (str) – directory of parquet file

  • timeSeriesCol (str or numeric) – time series column name

  • mainCategoryCol (str or numeric) – main category name

  • columns (str or numeric, optional) – target columns (Apache Arrow implementation), by default None

  • partitioning (str, optional) – type of partitioning, by default ‘hive’

  • filters (str, optional) – filter (Apache Arrow implementation), by default None

  • filesystem (str, optional) – filesystem (Apache Arrow implementation), by default None

Returns

Return type

Time_Series_Transformer

make_identical_sequence(inputLabels, windowSize, suffix=None, verbose=0, n_jobs=1)

make_identical_sequence make sequences of identical values

This function makes, for the given data, sequences in which every element has the same value. It can be useful for categorical data in deep learning.

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name(s) of the input data

  • windowSize (int) – the length of the sequence

  • suffix (str, optional) – suffix for the names of the new data, by default None

  • verbose (int, optional) – joblib verbosity level, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – number of joblib jobs, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
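Our reading of the description is that each time step gets a sequence of windowSize copies of its own value, so that a categorical feature ends up with the same shape as a lag or lead sequence. The pure-Python sketch below illustrates that assumption only; identical_sequence is a hypothetical name, not part of the package.

```python
from typing import List, Sequence


def identical_sequence(values: Sequence, window_size: int) -> List[list]:
    # Hypothetical helper: for every time step, build a sequence of
    # length window_size whose elements all equal that step's value.
    if window_size < 1:
        raise ValueError("window_size must be positive")
    return [[v] * window_size for v in values]


# A categorical feature becomes one constant sequence per time step,
# matching numeric sequences built with the same window size.
print(identical_sequence(["tech", "tech", "energy"], 3))
```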

make_label(key, collectionKey=None)

make_label mark data as a label

This function marks the given data as a label. When using the output functions, setting the sepLabel parameter separates labels from data.

Parameters
  • key (str or numeric data) – the target data name

  • collectionKey (str or numeric data, optional) – the target collection; if None, all collections are selected, by default None

Returns

Return type

self
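To illustrate what separating labels from data means, the sketch below splits a dict of columns into a feature part and a label part. It assumes that columns marked with make_label are tracked by name and that sepLabel=True yields a (data, label) pair; the helper separate_labels and that return shape are assumptions for illustration, not the library's documented behaviour.

```python
from typing import Dict, Iterable, Tuple


def separate_labels(columns: Dict[str, list],
                    label_names: Iterable[str]) -> Tuple[Dict[str, list], Dict[str, list]]:
    # Hypothetical helper: columns registered as labels go to the second
    # dict; every other column stays in the first (feature) dict.
    label_set = set(label_names)
    data = {k: v for k, v in columns.items() if k not in label_set}
    labels = {k: v for k, v in columns.items() if k in label_set}
    return data, labels


cols = {"Close": [10.0, 11.0], "target": [11.0, 12.0]}
features, target = separate_labels(cols, ["target"])
print(features, target)
```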

make_lag(inputLabels, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)

make_lag create lagged data for the given data

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name(s) of the input data

  • lagNum (int) – the target lag period to make

  • suffix (str, optional) – suffix for the names of the new data, by default None

  • fillMissing (object, optional) – value used to fill missing data, by default np.nan

  • verbose (int, optional) – joblib verbosity level, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – number of joblib jobs, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
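The sketch below shows the lag semantics we assume: out[t] = values[t - lagNum], with the first lagNum positions set to fillMissing, computed separately for each category so that values never shift across symbols. Rows are assumed to be time-ordered within each category; lag and lag_by_category are hypothetical helpers, not the library's code.

```python
import math
from typing import Dict, List, Sequence


def lag(values: Sequence, lag_num: int, fill_missing=math.nan) -> list:
    # out[t] = values[t - lag_num]; positions with no history are filled.
    if lag_num < 0:
        raise ValueError("lag_num must be non-negative")
    n = len(values)
    if lag_num >= n:
        return [fill_missing] * n
    return [fill_missing] * lag_num + list(values[: n - lag_num])


def lag_by_category(values: Sequence, categories: Sequence, lag_num: int,
                    fill_missing=math.nan) -> list:
    # Group row positions by category, lag each group on its own,
    # then write the results back in the original row order.
    positions: Dict[object, List[int]] = {}
    for i, cat in enumerate(categories):
        positions.setdefault(cat, []).append(i)
    out = [fill_missing] * len(values)
    for idx in positions.values():
        shifted = lag([values[i] for i in idx], lag_num, fill_missing)
        for i, v in zip(idx, shifted):
            out[i] = v
    return out


print(lag_by_category([1, 2, 10, 20], ["A", "A", "B", "B"], 1, None))
```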

make_lag_sequence(inputLabels, windowSize, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)

make_lag_sequence create lag sequence data

This function can be useful for deep learning.

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name(s) of the input data

  • windowSize (int) – the length of the sequence

  • lagNum (int) – the lag period of the sequence

  • suffix (str, optional) – suffix for the names of the new data, by default None

  • fillMissing (object, optional) – value used to fill missing data, by default np.nan

  • verbose (int, optional) – joblib verbosity level, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – number of joblib jobs, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
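A lag sequence turns each time step into a window of past values, the usual input shape for recurrent models. The exact window alignment is not stated above, so the pure-Python sketch below assumes that the window for step t holds windowSize values ending lagNum steps before t, with earlier positions set to fillMissing. lag_sequence is a hypothetical helper.

```python
from typing import List, Sequence


def lag_sequence(values: Sequence, window_size: int, lag_num: int,
                 fill_missing=None) -> List[list]:
    # Assumed alignment: the window for step t covers positions
    # t - lag_num - window_size + 1 .. t - lag_num (inclusive);
    # positions before the start of the series are filled.
    if window_size < 1 or lag_num < 0:
        raise ValueError("window_size must be >= 1 and lag_num >= 0")
    out = []
    for t in range(len(values)):
        end = t - lag_num
        window = [values[p] if p >= 0 else fill_missing
                  for p in range(end - window_size + 1, end + 1)]
        out.append(window)
    return out


print(lag_sequence([1, 2, 3, 4], window_size=2, lag_num=1))
```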

make_lead(inputLabels, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)

make_lead create lead data for the given data

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name(s) of the input data

  • leadNum (int) – the target lead period to make

  • suffix (str, optional) – suffix for the names of the new data, by default None

  • fillMissing (object, optional) – value used to fill missing data, by default np.nan

  • verbose (int, optional) – joblib verbosity level, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – number of joblib jobs, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
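Lead data is the mirror image of lag data and is a common way to build a forecasting target (for example, a value later passed to make_label). The sketch below assumes out[t] = values[t + leadNum], with the last leadNum positions set to fillMissing; as with lagging, it would be applied within each category in practice. lead is a hypothetical helper.

```python
import math
from typing import Sequence


def lead(values: Sequence, lead_num: int, fill_missing=math.nan) -> list:
    # out[t] = values[t + lead_num]; the last lead_num positions have no
    # future value and are filled with fill_missing.
    if lead_num < 0:
        raise ValueError("lead_num must be non-negative")
    n = len(values)
    return [values[t + lead_num] if t + lead_num < n else fill_missing
            for t in range(n)]


closes = [10.0, 11.0, 12.5]
# Pair each value with the next step's value as a one-step-ahead target.
print(list(zip(closes, lead(closes, 1, None))))
```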

make_lead_sequence(inputLabels, windowSize, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)

make_lead_sequence create lead sequence data

This function can be useful for deep learning.

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name(s) of the input data

  • windowSize (int) – the length of the sequence

  • leadNum (int) – the lead period of the sequence

  • suffix (str, optional) – suffix for the names of the new data, by default None

  • fillMissing (object, optional) – value used to fill missing data, by default np.nan

  • verbose (int, optional) – joblib verbosity level, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – number of joblib jobs, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
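A lead sequence gives each time step a window of future values, for example a multi-step forecasting target. Because the alignment is not specified above, the pure-Python sketch below assumes the window for step t holds windowSize values starting leadNum steps after t, with positions past the end set to fillMissing. lead_sequence is a hypothetical helper.

```python
from typing import List, Sequence


def lead_sequence(values: Sequence, window_size: int, lead_num: int,
                  fill_missing=None) -> List[list]:
    # Assumed alignment: the window for step t covers positions
    # t + lead_num .. t + lead_num + window_size - 1 (inclusive);
    # positions past the end of the series are filled.
    if window_size < 1 or lead_num < 0:
        raise ValueError("window_size must be >= 1 and lead_num >= 0")
    n = len(values)
    return [[values[p] if p < n else fill_missing
             for p in range(t + lead_num, t + lead_num + window_size)]
            for t in range(n)]


print(lead_sequence([1, 2, 3, 4], window_size=2, lead_num=1))
```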

make_stack_sequence(inputLabels, newName, axis=-1, verbose=0, n_jobs=1)

make_stack_sequence stack sequence data

Stack multiple sequence data into one along the given axis.

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the name(s) of the input data

  • newName (str) – new name for the stacked data

  • axis (int, optional) – the axis for stacking (numpy.stack implementation), by default -1

  • verbose (int, optional) – joblib verbosity level, only used when mainCategoryCol is given, by default 0

  • n_jobs (int, optional) – number of joblib jobs, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
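Stacking on axis=-1 adds a trailing feature dimension. Assuming each input holds one window per time step (for example, two outputs of make_lag_sequence), the pure-Python sketch below reproduces what numpy.stack(inputs, axis=-1) does to such inputs, turning shape (time, window) into (time, window, number_of_inputs). stack_last_axis is a hypothetical name.

```python
from typing import List


def stack_last_axis(*sequence_columns: List[list]) -> List[List[list]]:
    # Pure-Python equivalent of numpy.stack(columns, axis=-1) for inputs of
    # shape (time, window): result[t][w] == [col[t][w] for col in columns].
    # zip stops at the shortest input, so inputs are assumed to share a shape.
    return [[list(items) for items in zip(*windows)]
            for windows in zip(*sequence_columns)]


close_seq = [[1, 2], [2, 3]]
volume_seq = [[100, 200], [200, 300]]
print(stack_last_axis(close_seq, volume_seq))
```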

pad_different_category_time(fillMissing=nan)

pad_different_category_time pad the time index of each category to the same length

If mainCategoryCol is not specified, this function has no effect.

Parameters

fillMissing (object, optional) – value used to fill padded data, by default np.nan

Returns

Return type

self
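One way to picture padding is to align every category on the union of all time indices and fill the gaps. The pure-Python sketch below assumes each category is stored as a mapping from (sortable) time index to value; pad_categories is a hypothetical helper, not the library's implementation.

```python
import math
from typing import Dict, Hashable, Tuple


def pad_categories(series_by_cat: Dict[str, Dict[Hashable, float]],
                   fill_missing=math.nan) -> Tuple[list, Dict[str, list]]:
    # Hypothetical helper: align every category on the union of all time
    # indices, filling the times a category lacks with fill_missing.
    all_times = sorted(set().union(*(s.keys() for s in series_by_cat.values())))
    padded = {cat: [s.get(t, fill_missing) for t in all_times]
              for cat, s in series_by_cat.items()}
    return all_times, padded


data = {"A": {"2020-01-01": 1.0, "2020-01-02": 2.0}, "B": {"2020-01-02": 5.0}}
print(pad_categories(data, None))
```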

remove_category(categoryName)

remove_category remove the data of a specific category

Parameters

categoryName (str or numeric data) – the target category to be removed

Returns

Return type

self

remove_different_category_time()

remove_different_category_time remove time indices that differ across categories

If mainCategoryCol is not specified, this function has no effect.

Returns

Return type

self
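The opposite strategy to padding keeps only the time indices shared by every category. The pure-Python sketch below assumes each category is stored as a mapping from (sortable) time index to value; keep_common_times is a hypothetical helper illustrating the idea, not the library's code.

```python
from typing import Dict, Hashable, Tuple


def keep_common_times(series_by_cat: Dict[str, Dict[Hashable, float]]
                      ) -> Tuple[list, Dict[str, list]]:
    # Hypothetical helper: keep only the time indices present in every
    # category and drop the rest.
    if not series_by_cat:
        return [], {}
    common = set.intersection(*(set(s) for s in series_by_cat.values()))
    times = sorted(common)
    return times, {cat: [s[t] for t in times] for cat, s in series_by_cat.items()}


data = {"A": {"2020-01-01": 1.0, "2020-01-02": 2.0}, "B": {"2020-01-02": 5.0}}
print(keep_common_times(data))
```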

remove_feature(colName)

remove_feature remove certain data or labels

Parameters

colName (str or numeric) – target column or data to be removed

Returns

Return type

self

to_arrow_table(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)

to_arrow_table output data as an Apache Arrow table

Parameters
  • expandCategory (bool, optional) – whether to expand category, by default False

  • expandTime (bool, optional) – whether to expand time index column, by default False

  • preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before output, by default ‘ignore’

  • sepLabel (bool, optional) – whether to separate label data, by default False

Returns

Return type

arrow table

to_dict()

to_dict output data as a dictionary of lists

Returns

Return type

dict of list

to_feather(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version=1, chunksize=None)

to_feather output data in feather format

Parameters
  • dirPaths (str) – directory of output data

  • expandCategory (bool, optional) – whether to expand category, by default False

  • expandTime (bool, optional) – whether to expand time index column, by default False

  • preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before output, by default ‘ignore’

  • sepLabel (bool, optional) – whether to separate label data, by default False

  • version (int, optional) – feather version (Apache Arrow implementation), by default 1

  • chunksize (int, optional) – chunk size for output (Apache Arrow implementation), by default None

to_numpy(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)

to_numpy output data in numpy format

Parameters
  • expandCategory (bool, optional) – whether to expand category, by default False

  • expandTime (bool, optional) – whether to expand time index column, by default False

  • preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before output, by default ‘ignore’

  • sepLabel (bool, optional) – whether to separate label data, by default False

Returns

Return type

numpy ndarray

to_pandas(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)

to_pandas output data as a pandas DataFrame

Parameters
  • expandCategory (bool, optional) – whether to expand category, by default False

  • expandTime (bool, optional) – whether to expand time index column, by default False

  • preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before output, by default ‘ignore’

  • sepLabel (bool, optional) – whether to separate label data, by default False

Returns

Return type

pandas DataFrame

to_parquet(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version='1.0', isDataset=False, partition_cols=None)

to_parquet output data in parquet format

Parameters
  • dirPaths (str) – directory of output data

  • expandCategory (bool, optional) – whether to expand category, by default False

  • expandTime (bool, optional) – whether to expand time index column, by default False

  • preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before output, by default ‘ignore’

  • sepLabel (bool, optional) – whether to separate label data, by default False

  • version (str, optional) – parquet version (Apache Arrow implementation), by default ‘1.0’

  • isDataset (bool, optional) – whether to output data in dataset format (Apache Arrow implementation), by default False

  • partition_cols (str, optional) – columns to partition the data by (Apache Arrow implementation), by default None

transform(inputLabels, newName, func, n_jobs=1, verbose=0, backend='loky', *args, **kwargs)

transform wrapper for functions that perform data manipulation

This function applies a user-supplied function to the selected data. The function's output must be a pandas DataFrame, a numpy ndarray, or a list of dict, and it must have the same time length as the original data.

Parameters
  • inputLabels (str, numeric, or list of str or numeric) – the input data columns passed to func

  • newName (str) – the output data name; if func's output already provides names, newName becomes their prefix

  • func (function) – the data manipulation function

  • n_jobs (int, optional) – number of joblib jobs, only used when mainCategoryCol is given, by default 1

  • verbose (int, optional) – joblib verbosity level, only used when mainCategoryCol is given, by default 0

  • backend (str, optional) – joblib backend, only used when mainCategoryCol is given, by default ‘loky’

Returns

Return type

self
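A minimal pure-Python sketch of the contract described above, under stated assumptions: the data is a dict of equal-length columns, func receives the selected columns as a dict, and a list-of-dict result has its keys joined to newName with an underscore. apply_transform, high_low_stats, and the underscore naming are hypothetical; the library's own wrapper works on its internal structures and adds joblib parallelism, which is omitted here.

```python
from typing import Callable, Dict, Sequence


def apply_transform(columns: Dict[str, list], input_labels: Sequence[str],
                    new_name: str, func: Callable, *args, **kwargs) -> Dict[str, list]:
    # Hypothetical wrapper: pass the selected columns to func and merge
    # its output back into a copy of the data.
    selected = {label: columns[label] for label in input_labels}
    length = len(columns[input_labels[0]])
    result = list(func(selected, *args, **kwargs))
    if len(result) != length:
        raise ValueError("func must return the same time length as its input")
    merged = dict(columns)
    if result and isinstance(result[0], dict):
        # Named outputs (list of dict): new_name is used as a prefix.
        for key in result[0]:
            merged[f"{new_name}_{key}"] = [row[key] for row in result]
    else:
        merged[new_name] = result
    return merged


def high_low_stats(selected: Dict[str, list]) -> list:
    # Example func: one dict per time step, so its keys become suffixes.
    return [{"range": h - l, "mid": (h + l) / 2}
            for h, l in zip(selected["High"], selected["Low"])]


prices = {"High": [3.0, 5.0], "Low": [1.0, 4.0]}
out = apply_transform(prices, ["High", "Low"], "hl", high_low_stats)
print(out["hl_range"], out["hl_mid"])
```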