time_series_transform package
Subpackages
- time_series_transform.io package
- Submodules
- time_series_transform.io.arrow module
- time_series_transform.io.base module
- time_series_transform.io.feather module
- time_series_transform.io.generator module
- time_series_transform.io.numpy module
- time_series_transform.io.pandas module
- time_series_transform.io.parquet module
- Module contents
- time_series_transform.plot package
- time_series_transform.sklearn package
- time_series_transform.stock_transform package
- time_series_transform.transform_core_api package
Module contents
- class time_series_transform.Stock_Transformer(time_series_data, time_seriesIx, symbolIx, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')
Bases: time_series_transform.transform_core_api.time_series_transformer.Time_Series_Transformer
- classmethod from_arrow_table(arrow_table, timeSeriesCol, symbolIx, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')
Import data from an Apache Arrow table.
- Parameters
arrow_table (arrow table) – input data
timeSeriesCol (str or numeric) – time series column name
symbolIx (str or numeric) – main category name
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None
High (str or int, optional) – the index or name for High, by default ‘High’
Low (str or int, optional) – the index or name for Low, by default ‘Low’
Close (str or int, optional) – the index or name for Close, by default ‘Close’
Open (str or int, optional) – the index or name for Open, by default ‘Open’
Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’
- Returns
- Return type
- classmethod from_feather(feather_dir, timeSeriesCol, symbolIx, symbolName=None, columns=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')
Import data from a Feather file.
- Parameters
feather_dir (str) – directory of feather file
timeSeriesCol (str or numeric) – time series column name
symbolIx (str or numeric) – main category name
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None
columns (str or numeric, optional) – target columns (Apache Arrow implementation), by default None
High (str or int, optional) – the index or name for High, by default ‘High’
Low (str or int, optional) – the index or name for Low, by default ‘Low’
Close (str or int, optional) – the index or name for Close, by default ‘Close’
Open (str or int, optional) – the index or name for Open, by default ‘Open’
Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’
- Returns
- Return type
- classmethod from_numpy(numpyData, timeSeriesCol, mainCategoryCol, High, Low, Close, Open, Volume, symbolName=None)
Import data from a NumPy array.
- Parameters
numpyData (numpy ndArray) – input data
timeSeriesCol (int) – index of time series column
mainCategoryCol (int) – index of main category column
High (int) – the column index for High
Low (int) – the column index for Low
Close (int) – the column index for Close
Open (int) – the column index for Open
Volume (int) – the column index for Volume
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None
- Returns
- Return type
- classmethod from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')
Import data from a pandas DataFrame.
- Parameters
pandasFrame (pandas DataFrame) – input data
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None
High (str or int, optional) – the column name for High, by default ‘High’
Low (str or int, optional) – the column name for Low, by default ‘Low’
Close (str or int, optional) – the column name for Close, by default ‘Close’
Open (str or int, optional) – the column name for Open, by default ‘Open’
Volume (str or int, optional) – the column name for Volume, by default ‘Volume’
- Returns
- Return type
- classmethod from_parquet(parquet_dir, timeSeriesCol, symbolIx, symbolName=None, columns=None, partitioning='hive', filters=None, filesystem=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')
Import data from a Parquet file.
- Parameters
parquet_dir (str) – directory of parquet file
timeSeriesCol (str or numeric) – time series column name
symbolIx (str or numeric) – main category name
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None
columns (str or numeric, optional) – target columns (Apache Arrow implementation), by default None
partitioning (str, optional) – type of partitioning, by default ‘hive’
filters (str, optional) – filter (Apache Arrow implementation), by default None
filesystem (str, optional) – filesystem (Apache Arrow implementation), by default None
High (str or int, optional) – the index or name for High, by default ‘High’
Low (str or int, optional) – the index or name for Low, by default ‘Low’
Close (str or int, optional) – the index or name for Close, by default ‘Close’
Open (str or int, optional) – the index or name for Open, by default ‘Open’
Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’
- Returns
- Return type
- classmethod from_stock_engine_date(symbols, start_date, end_date, engine, n_threads=8, *args, **kwargs)
Fetch stock data for the given symbols between start_date and end_date using the given engine.
- Parameters
- Returns
- Return type
- classmethod from_stock_engine_intraday(symbols, start_date, end_date, engine='yahoo', interval='1m', n_threads=8, *args, **kwargs)
- classmethod from_stock_engine_period(symbols, period, engine, n_threads=8, *args, **kwargs)
Fetch data from an online source. The engines currently supported are yfinance and investpy.
- Parameters
- Returns
- Return type
- classmethod from_time_series_transformer(time_series_transformer, symbolName=None, High='High', Low='Low', Close='Close', Open='Open', Volume='Volume')
Create a Stock_Transformer from a Time_Series_Transformer.
- Parameters
time_series_transformer (Time_Series_Transformer) – input data
symbolName (str or numeric, optional) – ticker name, only used for a single stock, by default None
High (str or int, optional) – the index or name for High, by default ‘High’
Low (str or int, optional) – the index or name for Low, by default ‘Low’
Close (str or int, optional) – the index or name for Close, by default ‘Close’
Open (str or int, optional) – the index or name for Open, by default ‘Open’
Volume (str or int, optional) – the index or name for Volume, by default ‘Volume’
- Returns
- Return type
- get_technial_indicator(strategy, n_jobs=1, verbose=10, backend='loky')
Compute technical indicators. This uses the pandas-ta implementation: https://github.com/twopirllc/pandas-ta
- class time_series_transform.Time_Series_Transformer(data, timeSeriesCol, mainCategoryCol=None)
Bases: object
- dropna(categoryKey=None)
Drop null values, either for all categories or for one specific category.
- Parameters
categoryKey (str or numeric, optional) – the category to clean; if None, all categories are used, by default None
- Returns
- Return type
self
- classmethod from_arrow_table(arrow_table, timeSeriesCol, mainCategoryCol)
Import data from an Apache Arrow table.
- Parameters
- Returns
- Return type
- classmethod from_feather(feather_dir, timeSeriesCol, mainCategoryCol, columns=None)
Import data from a Feather file.
- Parameters
- Returns
- Return type
- classmethod from_numpy(numpyData, timeSeriesCol, mainCategoryCol)
Import data from a NumPy array.
- Parameters
- Returns
- Return type
- classmethod from_pandas(pandasFrame, timeSeriesCol, mainCategoryCol)
Import data from a pandas DataFrame.
- Parameters
- Returns
- Return type
- classmethod from_parquet(parquet_dir, timeSeriesCol, mainCategoryCol, columns=None, partitioning='hive', filters=None, filesystem=None)
Import data from a Parquet file.
- Parameters
parquet_dir (str) – directory of parquet file
timeSeriesCol (str or numeric) – time series column name
mainCategoryCol (str or numeric) – main category name
columns (str or numeric, optional) – target columns (Apache Arrow implementation), by default None
partitioning (str, optional) – type of partitioning, by default ‘hive’
filters (str, optional) – filter (Apache Arrow implementation), by default None
filesystem (str, optional) – filesystem (Apache Arrow implementation), by default None
- Returns
- Return type
- make_identical_sequence(inputLabels, windowSize, suffix=None, verbose=0, n_jobs=1)
Make sequences that hold the same data at every position. This can be useful for categorical data in deep learning.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
windowSize (int) – the length of the sequence
suffix (str, optional) – the suffix of the new data, by default None
verbose (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 1
- Returns
- Return type
self
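To make the idea of an "identical sequence" concrete, here is a minimal pure-Python sketch. It is not the library's implementation; the helper name `identical_sequence` and the sample values are assumptions made for illustration. It shows one row per time step, each row repeating that step's value `window_size` times.

```python
def identical_sequence(values, window_size):
    """Return one row per time step, each repeating that step's value."""
    return [[v] * window_size for v in values]


# Hypothetical categorical codes, one per time step.
sector_codes = [3, 3, 7]
rows = identical_sequence(sector_codes, window_size=4)
print(rows)
```

A layout like this lets a static feature line up with sequence features of the same window length.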
- make_label(key, collectionKey=None)
Turn the given data into label data. When using the io functions, specifying the sepLabel parameter separates the labels from the data.
- make_lag(inputLabels, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)
Make lag data for a given list of data.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
lagNum (int) – the target lag period to make
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 1
- Returns
- Return type
self
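The following is a small pure-Python sketch of what lagging a series means, assuming the usual convention that position t of the result holds the value observed `lag_num` steps earlier. The helper `lag` and the sample prices are hypothetical and are not part of the library; the sketch only illustrates the roles of `lagNum` and `fillMissing`.

```python
import math


def lag(values, lag_num, fill_missing=math.nan):
    """Shift values forward in time by lag_num steps.

    Position t of the result holds values[t - lag_num]. The first
    lag_num positions have no earlier observation, so they receive
    fill_missing instead.
    """
    if lag_num < 0:
        raise ValueError("lag_num must be non-negative")
    n = len(values)
    head = [fill_missing] * min(lag_num, n)
    return head + list(values[: n - len(head)])


close = [10.0, 11.0, 12.0, 13.0]  # hypothetical closing prices
close_lag_1 = lag(close, lag_num=1, fill_missing=0.0)
print(close_lag_1)
```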
- make_lag_sequence(inputLabels, windowSize, lagNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)
Make lag sequence data. This function can be useful for deep learning.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
windowSize (int) – the length of the sequence
lagNum (int) – the lag period of the sequence
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 1
- Returns
- Return type
self
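As a rough illustration of a lag sequence, the sketch below builds a window of past values for every time step. The exact window alignment used by the library is not stated here, so the convention in the docstring (oldest value first, window ending `lag_num` steps back) is an assumption, and `lag_sequence` is a hypothetical helper.

```python
import math


def lag_sequence(values, window_size, lag_num, fill_missing=math.nan):
    """Build, for each time step t, a window of past values.

    Assumed convention: the window for step t covers the indices
    t - lag_num - window_size + 1 .. t - lag_num, oldest first.
    Indices before the start of the series receive fill_missing.
    """
    if lag_num < 0 or window_size < 1:
        raise ValueError("lag_num must be >= 0 and window_size >= 1")
    rows = []
    for t in range(len(values)):
        last = t - lag_num
        first = last - window_size + 1
        rows.append(
            [values[i] if i >= 0 else fill_missing for i in range(first, last + 1)]
        )
    return rows


series = [1, 2, 3, 4]
windows = lag_sequence(series, window_size=2, lag_num=1, fill_missing=0)
print(windows)
```

Each row has the same length, which is the shape sequence models typically expect as input.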
- make_lead(inputLabels, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)
Make lead data for a given list of data.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
leadNum (int) – the target lead period to make
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 1
- Returns
- Return type
self
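Lead data is the mirror image of lag data. The sketch below assumes that position t of the result holds the value observed `lead_num` steps later; `lead` is a hypothetical helper written for illustration, not the library's code.

```python
import math


def lead(values, lead_num, fill_missing=math.nan):
    """Shift values backward in time by lead_num steps.

    Position t of the result holds values[t + lead_num]. The last
    lead_num positions have no later observation, so they receive
    fill_missing instead.
    """
    if lead_num < 0:
        raise ValueError("lead_num must be non-negative")
    n = len(values)
    tail = [fill_missing] * min(lead_num, n)
    return list(values[len(tail):]) + tail


close = [10.0, 11.0, 12.0, 13.0]  # hypothetical closing prices
close_lead_1 = lead(close, lead_num=1, fill_missing=0.0)
print(close_lead_1)
```

A lead column of this kind is a common way to express a next-step prediction target alongside the current features.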
- make_lead_sequence(inputLabels, windowSize, leadNum, suffix=None, fillMissing=nan, verbose=0, n_jobs=1)
Make lead sequence data. This function can be useful for deep learning.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
windowSize (int) – the length of the sequence
leadNum (int) – the lead period of the sequence
suffix (str, optional) – the suffix of the new data, by default None
fillMissing (object, optional) – the value used to fill missing data, by default np.nan
verbose (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 1
- Returns
- Return type
self
- make_stack_sequence(inputLabels, newName, axis=-1, verbose=0, n_jobs=1)
Stack sequence data, combining multiple sequences into one along the given axis.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the name of the input data
newName (str) – the new name for the stacked data
axis (int, optional) – the axis for stacking (numpy stack implementation), by default -1
verbose (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 0
n_jobs (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 1
- Returns
- Return type
- pad_different_category_time(fillMissing=nan)
Pad the time length of each category. If mainCategoryCol is not specified, this function has no effect.
- Parameters
fillMissing (object, optional) – the value used to fill the padded data, by default np.nan
- Returns
- Return type
self
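The sketch below illustrates one plausible reading of padding categories to a common time length: every category is extended to the union of all time indexes, with gaps filled by a chosen value. That reading is an assumption, and the helper `pad_categories`, the tickers, and the prices are all hypothetical.

```python
import math


def pad_categories(series_by_category, fill_missing=math.nan):
    """Give every category the same time index.

    series_by_category maps a category name to a {time: value} dict.
    Assumption: the common index is the union of all observed times.
    """
    all_times = sorted({t for s in series_by_category.values() for t in s})
    return {
        category: {t: series.get(t, fill_missing) for t in all_times}
        for category, series in series_by_category.items()
    }


prices = {
    "AAA": {1: 10.0, 2: 11.0, 3: 12.0},
    "BBB": {2: 20.0, 3: 21.0},  # missing the observation at time 1
}
padded = pad_categories(prices, fill_missing=None)
print(padded["BBB"])
```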
- remove_category(categoryName)
Remove the data of a specific category.
- Parameters
categoryName (str or numeric data) – the target category to be removed
- Returns
- Return type
self
- remove_different_category_time()
Remove time indexes that differ between categories. If mainCategoryCol is not specified, this function has no effect.
- Returns
- Return type
self
- remove_feature(colName)
Remove certain data or labels.
- Parameters
colName (str or numeric) – target column or data to be removed
- Returns
- Return type
self
- to_arrow_table(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)
Output data as an Apache Arrow table.
- Parameters
expandCategory (bool, optional) – whether to expand category, by default False
expandTime (bool, optional) – whether to expand time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before output, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
- Returns
- Return type
arrow table
- to_feather(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version=1, chunksize=None)
Output data in Feather format.
- Parameters
dirPaths (str) – directory of output data
expandCategory (bool, optional) – whether to expand category, by default False
expandTime (bool, optional) – whether to expand time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before output, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
version (int, optional) – Feather version (Apache Arrow implementation), by default 1
chunksize (int, optional) – chunk size for output (Apache Arrow implementation), by default None
- to_numpy(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)
Output data in NumPy format.
- Parameters
expandCategory (bool, optional) – whether to expand category, by default False
expandTime (bool, optional) – whether to expand time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before output, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
- Returns
- Return type
numpy ndArray
- to_pandas(expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False)
Output data as a pandas DataFrame.
- Parameters
expandCategory (bool, optional) – whether to expand category, by default False
expandTime (bool, optional) – whether to expand time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before output, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
- Returns
- Return type
pandas dataFrame
- to_parquet(dirPaths, expandCategory=False, expandTime=False, preprocessType='ignore', sepLabel=False, version='1.0', isDataset=False, partition_cols=None)
Output data in Parquet format.
- Parameters
dirPaths (str) – directory of output data
expandCategory (bool, optional) – whether to expand category, by default False
expandTime (bool, optional) – whether to expand time index column, by default False
preprocessType ({'ignore','pad','remove'}, optional) – the preprocessing applied before output, by default ‘ignore’
sepLabel (bool, optional) – whether to separate label data, by default False
version (str, optional) – Parquet version (Apache Arrow implementation), by default ‘1.0’
isDataset (bool, optional) – whether to output data in dataset format (Apache Arrow implementation), by default False
partition_cols (str, optional) – whether to partition data (Apache Arrow implementation), by default None
- transform(inputLabels, newName, func, n_jobs=1, verbose=0, backend='loky', *args, **kwargs)
A wrapper for functions that perform data manipulation. The output of func should be a pandas DataFrame, a numpy ndarray, or a list of dict, and it should have the same time length as the original data.
- Parameters
inputLabels (str, numeric, or list of str or numeric) – the input data columns passed to func
newName (str) – the name of the output data; if func provides its own output names, newName is used as a prefix
func (function) – the data manipulation function
n_jobs (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 1
verbose (int, optional) – joblib implementation; only used when mainCategoryCol is given, by default 0
backend (str, optional) – joblib implementation; only used when mainCategoryCol is given, by default ‘loky’
- Returns
- Return type
self
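The requirements on func (same time length as the input, output as a list of dict among other accepted forms) can be illustrated with a small stand-alone function. This is only a sketch: the name `moving_average`, the output key `"ma"`, and the assumption that func receives a plain sequence of numbers are all hypothetical, since the exact input that transform hands to func is not described here.

```python
def moving_average(values, window):
    """Trailing moving average that keeps the input's time length.

    Early steps average over the values available so far, so the
    result has exactly one record per input time step.
    """
    records = []
    for t in range(len(values)):
        start = max(0, t - window + 1)
        chunk = values[start : t + 1]
        records.append({"ma": sum(chunk) / len(chunk)})
    return records


close = [1, 2, 3, 4]  # hypothetical input column
result = moving_average(close, window=2)
print(result)
```

Keeping one output record per time step is the property that matters: a function that drops the warm-up rows would no longer match the length of the original data.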