Data Science Workshop: Python for Big Data
February 18, 2022 @ 2:00 pm – 5:00 pm
Python for Big Data Analytics
In recent years, Python has become one of the leading programming languages for data analysis because of advantages such as simplicity, readability, and portability. However, Python is slower than compiled languages such as C or Fortran, and it manages memory less efficiently. These speed and memory limitations may not matter when analyzing small datasets, but they become bottlenecks when analyzing big datasets.
To address the challenges of big data analytics, the Python community has developed and tested several techniques. In this workshop, we will go through some of them, including vectorization, parallelization, just-in-time (JIT) compilation, and distributed task execution. Hands-on exercises will address the following questions:
How can we speed up data analysis?
What can we do when a dataset exceeds the available physical memory?
How can we distribute the workload when doing machine learning on big datasets?
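As a small taste of the first technique, vectorization, here is a minimal sketch. It assumes NumPy is installed. The function names are hypothetical and chosen for illustration, and this is not necessarily the exercise used in the workshop. The sketch computes the same sum of squares twice: once with a pure-Python loop, and once with a single NumPy call that runs the loop in compiled code.

```python
import time

import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: every element passes through the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Vectorized: one NumPy call performs the loop in compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


if __name__ == "__main__":
    # One million random floats (seeded so runs are repeatable).
    data = np.random.default_rng(0).random(1_000_000)

    start = time.perf_counter()
    slow = sum_of_squares_loop(data)
    loop_seconds = time.perf_counter() - start

    start = time.perf_counter()
    fast = sum_of_squares_vectorized(data)
    vec_seconds = time.perf_counter() - start

    # Results agree up to floating-point rounding; timings vary by machine.
    print(f"loop: {loop_seconds:.3f} s, vectorized: {vec_seconds:.4f} s")
    print("results agree:", abs(slow - fast) <= 1e-9 * abs(fast))
```

On typical hardware the vectorized version is much faster. The exact speedup depends on the machine and the array size, so no specific figure is claimed here.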
What is needed?
Laptop/Desktop with Internet connection
An online computing resource or your laptop. Instructions for accessing the online resources will be given in the workshop.
Basic familiarity with using a laptop. Basic knowledge of Python is helpful for the hands-on session.
Slides and materials:
Will be provided in the workshop
After registration, the Zoom link will appear on the registration page as well as in your confirmation email. Register on Eventbrite now!
This workshop is hosted by the Office of Advanced Research Computing (OARC), Rutgers University, organized by Bala Desinghu in collaboration with the Eastern Regional Network (ERN) and the New Jersey Big Data Alliance (NJBDA).
Participants are encouraged to attend with campus partners representing a variety of stakeholders in campus research and research computing (e.g., researchers, research computing professionals, students, staff, faculty, and practitioners).