Hi,
I'm Harshit Hemant Gupta

I'm a

Data Engineer Intern @ Bloom | MS Computer Science @ IUB | Expertise in Software Development, Data Science, and Data Engineering.

About Me

Hi, I'm Harshit Gupta, a Computer Science professional and data enthusiast with a strong background in software development. I bring a diverse skill set and deep industry understanding. Committed to excellence and driven by a passion for technology, I seek collaborations with like-minded professionals and organizations to bring technical expertise and creativity to projects. Let's connect and explore new horizons together!

Professional Experience

May, 2024 - Present

Data Engineer Intern - Bloom Insurance Agency

Bloomington, Indiana

Bitbucket, NLP, Power BI, Python, SQL, SSIS, SSMS

  • Implemented an ETL pipeline in Python and SSIS to process ~500 GB of data and extract insights on insurance policies.
  • Integrated SSMS and NLP techniques into data applications, enhancing policy analysis capability by ~10%.
  • Conducted over 25 analyses and designed 10+ interactive Power BI dashboards, using DAX to visualize AQE reports and drive revenue growth by 20%.
  • Debugged code for the Broker product and implemented unit and integration test automation to enhance operational resilience.

January, 2024 - Present

Software Data Engineer - Part-time

Indiana University Bloomington - Bloomington, Indiana

Jetstream, PySpark, React, JavaScript, Java, Spring Boot, ETL

  • Enhanced campus-wide operational efficiency by designing microservices and automated scripts and using ServiceNow to resolve technical challenges, reducing response times by 40%.
  • Developed scalable system APIs in Java and Spring Boot for Student Technology Centers, resulting in a 40% increase in multi-tier web application uptime and seamless functionality for over 45,000 monthly users.
  • Developed and optimized Apache Spark ELT pipelines that consolidate data for Technology Centers using Jetstream2, enhancing scalability and improving data analysis performance and uptime for over 60,000 daily users.

December, 2022 - July, 2023

Software Developer - Data Team

Mahindra & Mahindra Financial Services - Pune, India

  • Programmed a customer risk assessment scorecard model to support credit underwriters for loan products, reducing approval delays by ~65% while ensuring data privacy and compliance with RBI and ISO/IEC 27001.
  • Scripted a chatbot integrating LLMs through the GPT-3.5-turbo API to answer questions about company policies; deployed it using Lambda, with Snowflake for data management and API Gateway for managing low-latency API requests.
  • Collaborated with stakeholders to develop ETL pipelines in Python and AWS Glue from Salesforce, S3, and APIs to Snowflake, feeding a customer leads management dashboard used by 5,000+ executives.
  • Developed a finance data mart on Snowflake integrating LMS and LOS with 150 features, improving regulatory reporting efficiency by 90% and lowering operational costs via Lambda and S3 automation.
  • Engineered a PySpark ETL pipeline leveraging Debezium and AWS to process 2 million records for Change Data Capture (CDC), reducing processing time by 15% and improving reliability.
  • Containerized applications using Docker and automated deployments with Jenkins (CI/CD), reducing downtime by 30-35%.

March, 2021 - December, 2022

Software Developer / Data Engineer

Piramal Capital & Housing Finance - Mumbai, India

  • Designed an automated system using Python and Excel macros for monthly masking of information for 45,000+ customers.
  • Constructed an anomaly detection workflow with Elasticsearch and Kafka to monitor API performance and generate real-time alerts for faulty APIs with an accuracy of ~80%.
  • Constructed an anomaly detection system that calculates error thresholds for the APIs and sends real-time email notifications about faulty APIs with an accuracy of ~80%.
  • Streamlined API management and monitoring with Elasticsearch and Logstash, reducing API downtime to ~10 minutes and improving overall system reliability.
  • Led a team of 4 interns, delivering key milestones on time by managing tasks in JIRA under an Agile framework.
  • Reviewed code with SonarQube to rectify code smells and vulnerabilities, integrated it with Jenkins for automated deployment, and wrote unit and integration tests, reducing post-release defects by 20%.

July, 2020 - March, 2021

Software (Python) Developer

Cogno AI - Mumbai, India

  • Enhanced customer engagement by 20% by building 10+ chatbots for clients using the Rasa framework with natural language processing (NLP) and dialogue management in an Agile setting.
  • Remodeled client systems using JavaScript and Postman, following Agile methodology.
  • Administered clients' requirements to prepare flows and technical documentation for Agile application development.

Education

August, 2023 - May, 2025 (Expected)

Master of Science in Computer Science - Indiana University, Bloomington, Indiana

GPA: 3.69/4

July, 2020 - July, 2023

Gathering industry experience.

August, 2016 - October, 2020

Bachelor of Engineering in Computer Engineering - University of Mumbai, India

GPA: 8.62/10

Achievements

October 2024

Runner Up - Hack-A-House Ideathon 2024

Associated with Harvard University, UC Berkeley, University of Utah

May 2021

Winner - TSEC Hall of Fame

Associated with Thadomal Shahani Engineering College

October 2019

Winner - Smart India Hackathon

Associated with Government of India

Tech Stack

Programming Languages & ETL Tools

  • Java
  • JavaScript
  • Python
  • R
  • React
  • PySpark
  • Git
  • Jenkins
  • Airflow
  • Snowflake
  • Databricks
  • dbt

Databases and Data Management Systems

  • MongoDB
  • MySQL
  • PostgreSQL
  • Oracle DB
  • Redis
  • DynamoDB
  • SSIS
  • SSMS
  • JSON
  • XML
  • Redshift
  • Parquet

Tools and Technologies

  • AWS
  • Azure
  • Django
  • 🌐 Apigee
  • Docker
  • Kubernetes
  • πŸ” Elasticsearch
  • πŸ”„ Logstash
  • πŸ“Š Kibana
  • Flask
  • ⚑ Fast API
  • JIRA
  • Kafka
  • πŸ“¬ Postman
  • 🐧 Linux
  • πŸ“ˆ Power BI
  • β˜• Spring Boot
  • πŸ”— GraphQL

Architectural/Software Skills

  • Database Design
  • DevOps
  • Microservices
  • Generative AI
  • System Design
  • Data Engineering
  • Data Science
  • Data Analysis
  • Data Visualization
  • Business Intelligence

Projects

Health Insurance Market Analysis

Indians Diabetes Analysis

Detection of Credit Card Transactions Fraud

Credit Card Fraud Detection Using SMOTE and ADASYN

Doodle Recognition Using Ensemble Technique

Doodle Recognition Using Ensemble Learning

Comic Analysis and Script Generation

Comic Analysis Using NLP Techniques

Face Detection and Recognition

Face Detection and Recognition Using AdaBoost

Real Time Stream Bank Lending Kafka

Transformer Based Customer Complaint Model

MedChain Leveraging AI for Medical Applications

AWS Based Medical Data Pipeline