Dask: Scaling Data Science in Python

Dr. Dhavide Aruliah (Quansight LLC)

Start

March 3, 2020 - 12:30 pm

End

March 3, 2020 - 1:30 pm

Address

OnTechU, North Oshawa campus, UA 3230

Speaker: Dr. Dhavide Aruliah (Quansight LLC)

Abstract: Data engineers and data scientists have benefited significantly from the proliferation of open-source packages supporting exploratory data analysis and data visualization in languages like Python or R. These tools permit practitioners to rapidly develop sophisticated models and algorithms through interactive data exploration. At the same time, storage costs have dropped and rates of data accumulation have increased, so practitioners typically have to master highly specialized tools to manage larger and larger data sets, especially when moving models into production. This often involves patching together frameworks with distinct interfaces, which creates many debugging and maintenance challenges.
Following some historical context, we’ll present Dask as a simple Python package to help with moving data science into production. Dask uses familiar Pythonic interfaces (notably NumPy and Pandas) to provide simple API extensions that work easily with out-of-core data sets and that capitalize on available parallelism. We’ll give a straightforward introduction to using Dask, with examples illustrating how existing code that manipulates in-memory NumPy arrays and Pandas DataFrames can be extended to work cleanly on massive data sets.
Familiarity with the Python data science stack (e.g., NumPy and Pandas) will be useful but not mandatory.
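
As a rough illustration of the idea in the abstract (not taken from the talk itself), the sketch below runs the same NumPy-style expression first on an in-memory array and then on a chunked Dask array. The array sizes, the chunk shape, and the variable names are assumptions chosen for the example; in practice the Dask array would typically be backed by data too large to fit in memory.

```python
import numpy as np
import dask.array as da

# In-memory NumPy version: the whole array must fit in RAM.
rng = np.random.default_rng(0)
x_np = rng.random((2_000, 1_000))
np_result = (x_np - x_np.mean(axis=0)).std(axis=0)

# Dask version: the same expression, but the array is split into
# four row blocks (chunk shape is an arbitrary choice for this sketch).
x_da = da.from_array(x_np, chunks=(500, 1_000))

# This line only builds a task graph; no numerical work happens yet.
lazy = (x_da - x_da.mean(axis=0)).std(axis=0)

# compute() executes the graph, possibly in parallel across chunks,
# and returns an ordinary NumPy array.
da_result = lazy.compute()

print(x_da.numblocks)
print(np.allclose(np_result, da_result))
```

The point of the sketch is that the analysis code itself is unchanged between the two versions; only the array construction and the final call to `compute()` differ.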

About

The Modelling and Computational Science graduate program offers MSc and PhD projects in applied mathematics, physics, computational chemistry, nuclear engineering, and marketing and logistics.

CONTACT

Email: gradsecretary@science.uoit.ca

Address:

Ontario Tech University
2000 Simcoe Street North
Oshawa, Ontario L1G 0C5
Canada
