time_series_transform package

Module contents

class time_series_transform.Stock_Transformer(time_series_data, time_seriesIx, symbolIx, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

Bases: time_series_transform.transform_core_api.time_series_transformer.Time_Series_Transformer

classmethod from_arrow_table(arrow_table, timeSeriesCol, symbolIx, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

Import data from an Apache Arrow table.

Parameters

arrow_table (arrow table) – input data
timeSeriesCol (str or numeric) – time series column name
symbolIx (str or numeric) – main category name
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None
High (str or int, optional) – the index or name for High, by default ‘High’
Low (str or int, optional) – the index or name for Low, by default ‘Low’
Close (str or int, optional) – the index or name for Close, by default ‘Close’
Open (str or int, optional) – the index or name for Open, by default ‘Open’
Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’

Returns

Return type

Stock_Transformer

classmethod from_feather(feather_dir, timeSeriesCol, symbolIx, symbolName=None, columns=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

Import data from a Feather file.

Parameters

feather_dir (str) – directory of feather file
timeSeriesCol (str or numeric) – time series column name
symbolIx (str or numeric) – main category name
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None
columns (str or numeric, optional) – target columns (Apache Arrow implementation), by default None
High (str or int, optional) – the index or name for High, by default ‘High’
Low (str or int, optional) – the index or name for Low, by default ‘Low’
Close (str or int, optional) – the index or name for Close, by default ‘Close’
Open (str or int, optional) – the index or name for Open, by default ‘Open’
Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’

Returns

Return type

Stock_Transformer

classmethod from_numpy(numpyData, timeSeriesCol, mainCategoryCol, High, Low, Close, Open, Volume, symbolName=None)

Import data from a numpy array.

Parameters

numpyData (numpy ndarray) – input data
timeSeriesCol (int) – index of time series column
mainCategoryCol (int) – index of main category column
High (int) – the column index for High
Low (int) – the column index for Low
Close (int) – the column index for Close
Open (int) – the column index for Open
Volume (int) – the column index for Volume
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None

Returns

Return type

Stock_Transformer

classmethod from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

Import data from a pandas DataFrame.

Parameters

pandasFrame (pandas DataFrame) – input data
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None
High (str or int, optional) – the column name for High, by default ‘High’
Low (str or int, optional) – the column name for Low, by default ‘Low’
Close (str or int, optional) – the column name for Close, by default ‘Close’
Open (str or int, optional) – the column name for Open, by default ‘Open’
Volume (str or int, optional) – the column name for Volume, by default ‘Volume’

Returns

Return type

Stock_Transformer

classmethod from_parquet(parquet_dir, timeSeriesCol, symbolIx, symbolName=None, columns=None, partitioning='hive', filters=None, filesystem=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

Import data from a Parquet file.

Parameters

parquet_dir (str) – directory of parquet file
timeSeriesCol (str or numeric) – time series column name
symbolIx (str or numeric) – main category name
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None
columns (str or numeric, optional) – target columns (Apache Arrow implementation), by default None
partitioning (str, optional) – type of partitioning, by default ‘hive’
filters (str, optional) – filter (Apache Arrow implementation), by default None
filesystem (str, optional) – filesystem (Apache Arrow implementation), by default None
High (str or int, optional) – the index or name for High, by default ‘High’
Low (str or int, optional) – the index or name for Low, by default ‘Low’
Close (str or int, optional) – the index or name for Close, by default ‘Close’
Open (str or int, optional) – the index or name for Open, by default ‘Open’
Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’

Returns

Return type

Stock_Transformer

classmethod from_stock_engine_date(symbols, start_date, end_date, engine, n_threads=8, *args, **kwargs)

Fetch data from an online source for a given date range.

Parameters

symbols (str or list) – ticker name or list of ticker names
start_date (str) – start date of the data, in “%Y-%m-%d” format, e.g. “2020-02-20”
end_date (str) – end date of the data, in “%Y-%m-%d” format, e.g. “2020-02-20”
engine ({'yahoo','investing'}) – fetching API
n_threads (int, optional) – number of threads for fetching, only used when symbols is a list, by default 8

Returns

Return type

Stock_Transformer
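
The start_date and end_date strings above are expected in “%Y-%m-%d” form. The following is a minimal sketch of checking a string against that format before passing it on, using only the standard library. The helper name is_valid_date is hypothetical and is not part of time_series_transform.

```python
from datetime import datetime

# Format stated in the start_date / end_date parameter descriptions.
DATE_FORMAT = "%Y-%m-%d"


def is_valid_date(text):
    """Return True if text parses under DATE_FORMAT (hypothetical helper)."""
    try:
        datetime.strptime(text, DATE_FORMAT)
    except ValueError:
        return False
    return True


print(is_valid_date("2020-02-20"))  # True
print(is_valid_date("20/02/2020"))  # False
```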

classmethod from_stock_engine_intraday(symbols, start_date, end_date, engine='yahoo', interval='1m', n_threads=8, *args, **kwargs)

classmethod from_stock_engine_period(symbols, period, engine, n_threads=8, *args, **kwargs)

Fetch data from an online source for a given period.

The currently supported engines are yfinance and investpy.

Parameters

symbols (str or list) – ticker name or list of ticker names
period (str) – period of the data, for example 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
engine ({'yahoo','investing'}) – fetching API
n_threads (int, optional) – number of threads for fetching, only used when symbols is a list, by default 8

Returns

Return type

Stock_Transformer

classmethod from_time_series_transformer(time_series_transformer, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')

Make a Stock_Transformer from a Time_Series_Transformer.

Parameters

time_series_transformer (Time_Series_Transformer) – input data
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None
High (str or int, optional) – the index or name for High, by default ‘High’
Low (str or int, optional) – the index or name for Low, by default ‘Low’
Close (str or int, optional) – the index or name for Close, by default ‘Close’
Open (str or int, optional) – the index or name for Open, by default ‘Open’
Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’

Returns

Return type

Stock_Transformer

get_technial_indicator(strategy, n_jobs=1, verbose=10, backend='loky')

Compute technical indicators.

Uses the pandas-ta implementation: https://github.com/twopirllc/pandas-ta

Parameters

strategy (Strategy) – pandas-ta strategy
n_jobs (int, optional) – number of processes (joblib), by default 1
verbose (int, optional) – log level (joblib), by default 10
backend (str, optional) – backend type (joblib), by default ‘loky’

Returns

Return type

self

class time_series_transform.Time_Series_Transformer(data, timeSeriesCol, mainCategoryCol=None)

Bases: object

dropna(categoryKey=None)

Drop null values.

Removes null values for all categories or for a specific category.

Parameters: categoryKey (str or numeric, optional) – the category to process; if None, all categories are selected, by default None
Returns
Return type: self

classmethod from_arrow_table(arrow_table, timeSeriesCol, mainCategoryCol)

Import data from an Apache Arrow table.

Parameters

arrow_table (arrow table) – input data
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name

Returns

Return type

Time_Series_Transformer

classmethod from_feather(feather_dir, timeSeriesCol, mainCategoryCol, columns=None)

Import data from a Feather file.

Parameters

feather_dir (str) – directory of feather file
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
columns (str or numeric, optional) – target columns (Apache Arrow implementation), by default None

Returns

Return type

Time_Series_Transformer

classmethod from_numpy(numpyData, timeSeriesCol, mainCategoryCol)

Import data from a numpy array.

Parameters

numpyData (numpy ndarray) – input data
timeSeriesCol (int) – index of time series column
mainCategoryCol (int) – index of main category column

Returns

Return type

Time_Series_Transformer

classmethod from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol)

Import data from a pandas DataFrame.

Parameters

pandasFrame (pandas DataFrame) – input data
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name

Returns

Return type

Time_Series_Transformer

classmethod from_parquet(parquet_dir, timeSeriesCol, mainCategoryCol, columns=None, partitioning='hive', filters=None, filesystem=None)

Import data from a Parquet file.

Parameters

parquet_dir (str) – directory of parquet file
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
columns (str or numeric, optional) – target columns (Apache Arrow implementation), by default None
partitioning (str, optional) – type of partitioning, by default ‘hive’
filters (str, optional) – filter (Apache Arrow implementation), by default None
filesystem (str, optional) – filesystem (Apache Arrow implementation), by default None

Returns

Return type

Time_Series_Transformer

make_identical_sequence(inputLabels, windowSize, suffix=None, verbose=0, n_jobs=1)

Make sequences in which every element holds the same value.

This function repeats the same data across a sequence of the given length. It can be useful for categorical data in deep learning.

Parameters

inputLabels (str, numeric, or list of str or numeric) – the name of input data
windowSize (int) – the length of sequence
suffix (str, optional) – the suffix of new data, by default None
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
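
As an illustration of the idea described above, the sketch below repeats each value windowSize times in plain Python. It is not the library's implementation; the function name identical_sequence is hypothetical, and the exact output layout of make_identical_sequence is an assumption.

```python
def identical_sequence(values, window_size):
    """Repeat every value window_size times (illustrative only)."""
    return [[value] * window_size for value in values]


# A categorical column turned into fixed-length sequences.
print(identical_sequence(["tech", "energy"], 3))
# [['tech', 'tech', 'tech'], ['energy', 'energy', 'energy']]
```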

make_label(key, collectionKey=None)

Mark data as label data.

This turns the given data into a label. When using the I/O functions, setting the sepLabel parameter separates the labels from the data.

Parameters

key (str or numeric) – the target data name
collectionKey (str or numeric, optional) – the target collection; if None, all collections are selected, by default None

Returns

Return type

self

make_lag(inputLabels, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)

Make lag data for a given list of data.

Parameters

inputLabels (str, numeric, or list of str or numeric) – the name of input data
lagNum (int) – the target lag period to make
suffix (str, optional) – the suffix of new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
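
To make the notion of a lag concrete, here is a minimal plain-Python sketch. It assumes the usual meaning: each position receives the value from lagNum steps earlier, and the first lagNum positions are filled with fillMissing. The function name lag is hypothetical and this is not the library's code.

```python
import math


def lag(values, lag_num, fill_missing=math.nan):
    """Shift values later in time by lag_num steps (lag_num >= 0)."""
    values = list(values)
    n = len(values)
    kept = max(n - lag_num, 0)  # how many original values survive the shift
    return [fill_missing] * (n - kept) + values[:kept]


print(lag([10, 11, 12, 13], 1, fill_missing=None))
# [None, 10, 11, 12]
```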

make_lag_sequence(inputLabels, windowSize, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)

Make lag sequence data.

This function can be useful for deep learning.

Parameters

inputLabels (str, numeric, or list of str or numeric) – the name of input data
windowSize (int) – the length of sequence
lagNum (int) – the lag period of sequence
suffix (str, optional) – the suffix of new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
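
One plausible reading of a lag sequence is sketched below: for each time step, collect windowSize past values, the most recent of which lies lagNum steps back. The ordering inside each window (oldest first) is an assumption, the function name lag_sequence is hypothetical, and the code is not taken from the library.

```python
import math


def lag_sequence(values, window_size, lag_num, fill_missing=math.nan):
    """Build one window of past values per time step (illustrative only)."""
    values = list(values)
    rows = []
    for t in range(len(values)):
        row = []
        # Oldest value first: offsets run from lag_num + window_size - 1 down to lag_num.
        for offset in range(lag_num + window_size - 1, lag_num - 1, -1):
            idx = t - offset
            row.append(values[idx] if idx >= 0 else fill_missing)
        rows.append(row)
    return rows


print(lag_sequence([1, 2, 3, 4], window_size=2, lag_num=1, fill_missing=None))
# [[None, None], [None, 1], [1, 2], [2, 3]]
```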

make_lead(inputLabels, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)

Make lead data for a given list of data.

Parameters

inputLabels (str, numeric, or list of str or numeric) – the name of input data
leadNum (int) – the target lead period to make
suffix (str, optional) – the suffix of new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
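
A lead is the mirror image of a lag. The sketch below assumes the usual meaning: each position receives the value from leadNum steps later, and the last leadNum positions are filled with fillMissing. The function name lead is hypothetical and this is not the library's code.

```python
import math


def lead(values, lead_num, fill_missing=math.nan):
    """Shift values earlier in time by lead_num steps (lead_num >= 0)."""
    values = list(values)
    n = len(values)
    kept = max(n - lead_num, 0)  # how many original values survive the shift
    return values[n - kept:] + [fill_missing] * (n - kept)


print(lead([10, 11, 12, 13], 1, fill_missing=None))
# [11, 12, 13, None]
```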

make_lead_sequence(inputLabels, windowSize, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)

Make lead sequence data.

This function can be useful for deep learning.

Parameters

inputLabels (str, numeric, or list of str or numeric) – the name of input data
windowSize (int) – the length of sequence
leadNum (int) – the lead period of sequence
suffix (str, optional) – the suffix of new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
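
A lead sequence can be pictured as a window of future values per time step. The sketch below assumes each window starts leadNum steps ahead and runs forward for windowSize values, with positions past the end filled with fillMissing. The ordering is an assumption, the function name lead_sequence is hypothetical, and the code is not the library's implementation.

```python
import math


def lead_sequence(values, window_size, lead_num, fill_missing=math.nan):
    """Build one window of future values per time step (illustrative only)."""
    values = list(values)
    n = len(values)
    rows = []
    for t in range(n):
        row = []
        # Nearest future value first: offsets lead_num .. lead_num + window_size - 1.
        for offset in range(lead_num, lead_num + window_size):
            idx = t + offset
            row.append(values[idx] if idx < n else fill_missing)
        rows.append(row)
    return rows


print(lead_sequence([1, 2, 3, 4], window_size=2, lead_num=1, fill_missing=None))
# [[2, 3], [3, 4], [4, None], [None, None]]
```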

make_stack_sequence(inputLabels, newName, axis=-1, verbose=0, n_jobs=1)

Stack sequence data.

Combines multiple sequence data into one along the given axis.

Parameters

inputLabels (str, numeric, or list of str or numeric) – the name of input data
newName (str) – new name for the stacked data
axis (int, optional) – the axis for stacking (numpy stack implementation), by default -1
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1

Returns

Return type

self
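
Stacking on the last axis, the default above, turns several sequence features of shape (time, window) into one of shape (time, window, features). The sketch below reproduces that layout in plain Python so it runs without numpy; with numpy the same result would come from numpy.stack(features, axis=-1). The function name stack_last_axis is hypothetical.

```python
def stack_last_axis(features):
    """Stack equally shaped (time, window) nested lists on a new last axis."""
    return [
        [list(step) for step in zip(*rows)]  # pair up values at each window position
        for rows in zip(*features)           # walk the features time step by time step
    ]


close_seq = [[1, 2], [3, 4]]
volume_seq = [[10, 20], [30, 40]]
print(stack_last_axis([close_seq, volume_seq]))
# [[[1, 10], [2, 20]], [[3, 30], [4, 40]]]
```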

pad_different_category_time(fillMissing=nan)

Pad the time length across categories. If mainCategoryCol is not specified, this function has no effect.

Parameters: fillMissing (object, optional) – the value used to fill padded data, by default np.nan
Returns
Return type: self
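
One way to picture padding across categories is to extend every category to the union of all time indices and fill the gaps. The sketch below does this on plain dictionaries. The data layout, the function name pad_to_common_times, and the union-of-times behaviour are assumptions made for illustration, not the library's internals.

```python
import math


def pad_to_common_times(series_by_category, fill_missing=math.nan):
    """Give every category a value at every time index seen in any category."""
    all_times = sorted({t for series in series_by_category.values() for t in series})
    return {
        category: {t: series.get(t, fill_missing) for t in all_times}
        for category, series in series_by_category.items()
    }


data = {"A": {1: 10, 2: 11}, "B": {2: 20, 3: 21}}
print(pad_to_common_times(data, fill_missing=None))
# {'A': {1: 10, 2: 11, 3: None}, 'B': {1: None, 2: 20, 3: 21}}
```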

remove_category(categoryName)

Remove the data of a specific category.

Parameters: categoryName (str or numeric) – the target category to be removed
Returns
Return type: self

remove_different_category_time()

Remove time indices that differ between categories. If mainCategoryCol is not specified, this function has no effect.

Returns
Return type: self
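
The opposite of padding is to keep only the time indices that every category has in common. The sketch below shows that on plain dictionaries. The data layout, the function name keep_shared_times, and the intersection behaviour are assumptions made for illustration, not the library's internals.

```python
def keep_shared_times(series_by_category):
    """Drop every time index that is missing from at least one category."""
    if not series_by_category:
        return {}
    shared = set.intersection(*(set(series) for series in series_by_category.values()))
    return {
        category: {t: value for t, value in series.items() if t in shared}
        for category, series in series_by_category.items()
    }


data = {"A": {1: 10, 2: 11}, "B": {2: 20, 3: 21}}
print(keep_shared_times(data))
# {'A': {2: 11}, 'B': {2: 20}}
```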

remove_feature(colName)

Remove certain data or labels.

Parameters: colName (str or numeric) – target column or data to be removed
Returns
Return type: self

to_arrow_table(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)

Output data as an Apache Arrow table.

Parameters

expandCategory (bool, optional) – whether to expand category, by default False
expandTime (bool, optional) – whether to expand time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before output, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False

Returns

Return type

arrow table

to_dict()

Output data as a dictionary of lists.

Returns
Return type: dict of list

to_feather(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version=1, chunksize=None)

Output data in Feather format.

Parameters

dirPaths (str) – directory of output data
expandCategory (bool, optional) – whether to expand category, by default False
expandTime (bool, optional) – whether to expand time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before output, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
version (int, optional) – feather version (Apache Arrow implementation), by default 1
chunksize (int, optional) – chunksize for output (Apache Arrow implementation), by default None

to_numpy(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)

Output data as a numpy array.

Parameters

expandCategory (bool, optional) – whether to expand category, by default False
expandTime (bool, optional) – whether to expand time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before output, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False

Returns

Return type

numpy ndarray

to_pandas(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)

Output data as a pandas DataFrame.

Parameters

expandCategory (bool, optional) – whether to expand category, by default False
expandTime (bool, optional) – whether to expand time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before output, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False

Returns

Return type

pandas DataFrame

to_parquet(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version='1.0', isDataset=False, partition_cols=None)

Output data in Parquet format.

Parameters

dirPaths (str) – directory of output data
expandCategory (bool, optional) – whether to expand category, by default False
expandTime (bool, optional) – whether to expand time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing type applied before output, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
version (str, optional) – parquet version (Apache Arrow implementation), by default ‘1.0’
isDataset (bool, optional) – whether to output data in dataset format (Apache Arrow implementation), by default False
partition_cols (str, optional) – partition columns (Apache Arrow implementation), by default None

transform(inputLabels, newName, func, n_jobs=1, verbose=0, backend='loky', *args, **kwargs)

A wrapper for functions that perform data manipulation.

This function provides a way to apply different data manipulations. The output of func should be a pandas DataFrame, a numpy ndarray, or a list of dicts, and it should have the same time length as the original data.

Parameters

inputLabels (str, numeric, or list of str or numeric) – the input data columns passed to the function
newName (str) – the output data name; if the function's output provides its own names, newName is used as a prefix
func (function) – the data manipulation function
n_jobs (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 1
verbose (int, optional) – joblib implementation, only used when mainCategoryCol is given, by default 0
backend (str, optional) – joblib implementation, only used when mainCategoryCol is given, by default ‘loky’

Returns

Return type

self
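
The description above asks that a transformation's output have the same time length as its input. The sketch below shows one way a function can satisfy that: a moving average that pads the warm-up positions with NaN instead of dropping them. How transform passes data to func is not specified in this section, so the sketch works on a plain list; the function name moving_average is hypothetical.

```python
import math


def moving_average(values, window):
    """Trailing mean over `window` values; output length equals input length."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)  # not enough history yet, keep the slot
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


prices = [1, 2, 3, 4]
smoothed = moving_average(prices, 2)
print(len(smoothed) == len(prices))  # True
print(smoothed[1:])                  # [1.5, 2.5, 3.5]
```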