Parth Shrivastava

Senior Data Engineer
Pune, IN.

About

Senior Data Engineer with over 4 years of experience designing and optimizing scalable data pipelines in cloud-native environments. Builds real-time and batch ETL frameworks with PySpark, SQL, and Delta Lake on Azure Databricks and AWS, processing 1 TB of data daily and cutting execution times by up to 60%. Focused on performance tuning, cost optimization, and reliable data delivery for analytics and business intelligence.

Work

Mediaocean

Software Engineer (Data Engineering)

Pune, Maharashtra, India

Summary

Designed and optimized end-to-end ETL pipelines across Azure and AWS, improving processing efficiency and reducing storage and compute costs.

Highlights

Designed and optimized end-to-end ETL pipelines on Databricks using PySpark and Delta Lake, processing 1 TB of data daily across Azure Data Lake Storage (ADLS) and AWS S3.

Optimized complex SQL queries and Spark jobs through partitioning, broadcast joins, and skew mitigation, cutting execution times by up to 60%.

Reduced storage and compute costs by implementing partitioning, bucketing, and Z-Ordering in Databricks and by applying lifecycle policies in AWS S3 and Azure Blob Storage.

Collaborated with cross-functional teams to migrate critical datasets from on-premises systems to a hybrid cloud environment (Azure), ensuring compliance with data governance and security standards.

Partnered with business stakeholders, analysts, and data scientists to translate reporting requirements into efficient data models and pipelines.

Connection Loops

Software Engineering Intern

Summary

Contributed to the development of AI-powered solutions for biomedical signal processing, focusing on arrhythmia detection and classification.

Highlights

Preprocessed and segmented over 50K ECG signals using NumPy and SciPy to prepare them for arrhythmia detection.

Built deep learning models (1D CNN, Temporal Convolutional Network) that achieved a 94% F1 score on arrhythmia classification.

Deployed batch scoring pipelines to automate arrhythmia detection on new ECG data, improving diagnostic efficiency.

Education

Savitribai Phule Pune University
Pune, Maharashtra, India

Bachelor of Engineering

Computer Engineering

Grade: 9.57/10 CGPA

Courses

Advanced Data Structures

Embedded Systems and IoT

Artificial Intelligence and Robotics

Awards

Rising Star Award

Awarded By

Mediaocean Pvt. Ltd.

Recognized for outstanding performance and significant contributions at Mediaocean Pvt. Ltd.

All India Rank 12, Robocon 2019

Awarded By

IIT Delhi

Ranked 12th in India in the national robotics competition Robocon 2019 at IIT Delhi, competing with two robots as part of Team Automatons.

Publications

Optimization of Multi Wavelength Drone Images Using Geo Reference Model

Published by

India - Intellectual Property

Summary

Secured a patent for an innovative method to optimize multi-wavelength drone images using a geo-reference model.

Disruptive Developments in Biomedical Applications

Published by

Taylor & Francis

Summary

Authored a chapter in the book 'Disruptive Developments in Biomedical Applications'.

Skills

Programming Languages

Python, SQL, Java.

Data Frameworks & Tools

Databricks, Spark (PySpark), Pandas, Delta Lake, Apache Airflow.

Cloud & Infrastructure

AWS (S3), Azure (Data Lake Storage Gen2, Event Hubs, Blob Storage).

Operating Systems

Windows, Linux.

Data & Analytics

ETL Pipelines, Data Lineage, Data Governance, Data Quality, Machine Learning, Automation, Data Testing.

DevOps & CI/CD

Git, Docker, Jenkins, CI/CD Pipelines.

Projects

AI-Powered Job Aggregator & Resume Tailoring Platform

Summary

Developed an end-to-end data pipeline for a job aggregator and resume tailoring platform, leveraging NLP and cloud technologies to provide real-time job discovery and resume optimization.