Dask: Scaling Data Science in Python

Dr. Dhavide Aruliah (Quansight LLC)

Start

March 3, 2020 - 12:30 pm

End

March 3, 2020 - 1:30 pm

Address

OnTechU, North Oshawa campus, UA 3230

Speaker: Dr. Dhavide Aruliah (Quansight LLC)

Abstract: Data engineers and data scientists have benefited significantly from the proliferation of open-source packages supporting data visualization and exploratory data analysis in languages like Python and R. These tools permit practitioners to rapidly develop sophisticated models and algorithms through interactive data exploration. At the same time, storage costs have dropped and rates of data accumulation have increased, so practitioners typically have to master highly specialized tools to manage ever-larger data sets, especially when moving models into production. This often involves patching together frameworks with distinct interfaces, which creates many debugging and maintenance challenges.
Following some historical context, we’ll present Dask as a simple Python package to help with moving data science into production. Dask uses familiar Pythonic interfaces (notably NumPy and Pandas) to provide simple API extensions that work easily with out-of-core data sets and that capitalize on available parallelism. We present a straightforward introduction to using Dask, with examples illustrating how existing code that manipulates in-memory NumPy arrays and Pandas DataFrames can be extended to work cleanly on massive data sets.
Familiarity with the Python data science stack (e.g., NumPy and Pandas) will be useful but not mandatory.
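
To give a flavour of the NumPy-style interface the abstract describes, here is a minimal sketch; it is not taken from the talk itself. It assumes the `dask` and `numpy` packages are installed, and the array size, chunk size, and variable names are illustrative choices only. The same expression is evaluated once on an in-memory NumPy array and once on a chunked Dask array.

```python
import numpy as np
import dask.array as da

# In-memory NumPy version: the whole array is held in RAM at once.
x_np = np.arange(1_000_000, dtype=np.float64)
mean_np = (x_np ** 2).mean()

# Dask version of the same expression. The array is split into
# 10 chunks of 100,000 elements (an arbitrary, illustrative size).
x_da = da.arange(1_000_000, chunks=100_000, dtype=np.float64)

# This line only builds a task graph; no arithmetic has run yet.
mean_lazy = (x_da ** 2).mean()

# compute() executes the graph, processing chunks (possibly in parallel).
mean_da = mean_lazy.compute()

# The two results should agree up to floating-point rounding.
print(np.isclose(mean_np, mean_da))
```

Because each chunk can be processed independently, the same pattern can in principle be applied to arrays that do not fit in memory, provided the chunks are loaded from disk rather than generated as they are here.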
