
Making Sense of Big Data

Time Series Feature Extraction on (Really) Large Data Samples

Nils Braun · Published in TDS Archive · 8 min read · Dec 7, 2020

Photo by Chris Liverani on Unsplash

Entering tsfresh

from tsfresh import extract_features

df_features = extract_features(df, column_id="id", column_sort="time")
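tsfresh expects the data in a long (stacked) format: one row per measurement, an id column that groups rows into individual time series, and a sort column that orders them. As a minimal sketch (the toy values below are made up, not from the article):

import pandas as pd
from tsfresh import extract_features

# Toy data in tsfresh's long format: one row per measurement,
# grouped by "id" and ordered by "time".
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],
    "time":  [0, 1, 2, 0, 1, 2],
    "value": [0.5, 0.7, 0.2, 1.1, 0.9, 1.3],
})

df_features = extract_features(df, column_id="id", column_sort="time")
# df_features has one row per id and one column per extracted feature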

Challenge: Large Data Samples

from tsfresh.examples import robot_execution_failures

robot_execution_failures.download_robot_execution_failures()
df, target = robot_execution_failures.load_robot_execution_failures()
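The robot execution failures dataset is deliberately small. To get a feeling for really large samples, one option is to replicate the data with fresh ids; this is a hypothetical sketch, not code taken from the article:

import pandas as pd

# Hypothetical helper: replicate the small example dataset n_copies times,
# shifting the ids so every copy looks like a new set of time series.
n_copies = 100
max_id = df["id"].max()
large_df = pd.concat(
    [df.assign(id=df["id"] + i * max_id) for i in range(n_copies)],
    ignore_index=True,
)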

Multiprocessing
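Out of the box, extract_features already spreads the feature calculation over the cores of the local machine. The n_jobs parameter controls the number of worker processes, and setting it to 0 switches multiprocessing off, which helps when debugging. A short sketch (the value 4 is just an example):

from tsfresh import extract_features

# Parallelize the feature extraction over 4 local worker processes.
# n_jobs=0 would disable multiprocessing entirely.
df_features = extract_features(
    df,
    column_id="id",
    column_sort="time",
    n_jobs=4,
)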

Distributors

from tsfresh.utilities.distribution import ClusterDaskDistributor

distributor = ClusterDaskDistributor(address="<dask-master>")
X = extract_features(df, distributor=distributor, ...)
$ dask-scheduler
Scheduler at: <dask-master>
$ dask-worker <dask-master>
Start worker at: ...
Registered to: <dask-master>
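To try the distributor machinery without a real cluster, tsfresh also ships a LocalDaskDistributor that starts an in-process Dask cluster. A minimal sketch (the n_workers value is an arbitrary choice):

from tsfresh import extract_features
from tsfresh.utilities.distribution import LocalDaskDistributor

# Spin up a small Dask cluster inside the current process, useful for
# testing the distributor setup before pointing it at a real cluster.
distributor = LocalDaskDistributor(n_workers=3)

X = extract_features(df, column_id="id", column_sort="time",
                     distributor=distributor)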

First Summary
