I am Shubham Mahalank

Data Scientist

Name: Shubham Mahalank

Profile: Data Scientist

Email: shubhammahalank@gmail.com

About me

I'm a Data Scientist with an insatiable intellectual curiosity and the ability to mine hidden gems from large structured and unstructured datasets. I combine a heavy dose of Statistics and Machine Learning with visualization and a healthy sense of exploration.

Skillset
Languages
Python 85%
SQL 80%
Tableau 70%
Power BI 80%
Tools and Frameworks

  • Python - NumPy, Pandas, SciPy, Scikit-learn, spaCy, NLTK, OpenCV, BeautifulSoup4, PySpark
  • Python (Visualization) - Matplotlib, Seaborn, Plotly, Folium, Streamlit
  • Machine Learning - Linear and Non-Linear Regression, Logistic Regression, KNN, SVM, Decision Tree, Random Forest, AdaBoost, Gradient Boosting, XGBoost, K-Means, CART, Neural Network, Naïve Bayes
  • Deep Learning - TensorFlow, Keras
  • Statistics - Hypothesis Test, T-Test, PCA, Monte Carlo Simulation, A/B Testing, ARIMA, ANOVA, Chi-Square
  • SQL - MySQL, PostgreSQL, Snowflake, SQLite
  • NoSQL - Cassandra, MongoDB
  • Big Data - Spark (SparkSQL, SparkML), Hadoop (HDFS, MapReduce)
  • Tableau
  • Microsoft Power Platform - BI, Apps, Automate
  • AWS - EC2, S3, RDS, DynamoDB, SageMaker
  • Containers - Docker, Kubernetes
  • Project Management - Agile and Scrum, Git, Jira, Jenkins

Operating Systems

  • Linux (Red Hat, CentOS, Ubuntu)
  • macOS
  • Windows

8 YEARS OF EXPERIENCE

11 PROJECTS

8 INTERNATIONAL RESEARCH PUBLICATIONS

184 CITATIONS
Experience

Caterpillar India
Data Scientist II Oct 2022 - Present
Bangalore KA, India

    Responsibilities:
  • Developed ML pipelines (regression, neural networks) to derive insights for Condition Monitoring & perform Customer Value Agreement (CVA) incremental analysis
  • Applied statistics (clustering & anomaly detection) to enhance the development of Automated Fault Code (AFC) Prioritized Service Events (PSE), improving recommendation accuracy & driving $15M in Closed Won revenue
  • Developed custom functions using the Tableau API to strengthen data governance practices within Tableau, ensuring standardized data access, consistency, & security. This initiative improved operational efficiency & reinforced compliance with governance standards, supporting long-term data reliability across reporting workflows
  • Created & optimized Power BI dashboards using advanced DAX calculations & custom semantic models to streamline data flows, improve report accuracy, & enhance decision-making for global teams
  • Developed data-driven solutions in Power BI, leveraging advanced DAX & data modeling techniques to identify inefficiencies, recommend improvements, & automate reporting processes, boosting overall team productivity
  • Collaborated with cross-functional teams to migrate & integrate diverse data sources into Power BI, streamlining reporting processes & improving data accessibility for global stakeholders, resulting in a 20% reduction in report generation time & enhanced decision-making capabilities
  • Gained proficiency in Power Apps & Power Automate, & collaborated with the team to develop & implement the Target Setting Dashboard in Power BI
  • Leveraged Snowflake for data management & optimization, contributing to the consolidation of multiple data pipelines using dbt (data build tool), improving data accessibility, performance, & reporting efficiency across projects


Caterpillar
Data Scientist Mar 2021 - Jul 2022
Peoria IL, USA

    Description:
    The role is on an analytics team in the Caterpillar Electric Power Division. Its objectives include onboarding new Caterpillar products onto the Remote Asset Monitoring (RAM) platform and developing, maintaining, and deploying data pipelines for various data science projects.

    Responsibilities:
  • Leveraged Python (Pandas, SciPy) to streamline data collection & perform advanced analytics, reducing analysis time & enhancing data quality for better decision-making through a Tableau dashboard
  • Applied statistical analysis (ANOVA, regression) & ML models to enhance the RAM Premium offering, streamlining data visualization & event tracking
  • Engineered data-driven solutions to refine templates & optimize server load, improving efficiency & quality in OSI PI AF
  • Collaborated with MWM to integrate advanced analytics like anomaly detection into the RAM Premium platform, enhancing asset monitoring & fault prediction accuracy
  • Developed a data pipeline to extract data from Azure InfluxDB and the OSI PI REST API, helping users track the real-time movement of rental products in a Tableau dashboard.
  • Worked on existing data pipelines to reduce the latency of delivering real-time email notifications.
  • Used OSI Tools like PI System Explorer, PI System Management Tool, and PI Vision to onboard new products onto the CAT Remote Asset Monitoring platform with interactive visualizations.
  • Designed an ML pipeline using an anomaly detection algorithm to predict the product's critical parameters and notify users by email in advance to prevent product damage.


Marlabs, Inc
Data Scientist Feb 2020 - Feb 2021
Piscataway NJ, USA


    Description:
    The project is with Hilton Corporation, a client of Marlabs, Inc. Hilton is an American multinational hospitality company, headquartered in Virginia, that manages a broad portfolio of hotels and resorts. The project is under the Data Science Department and focuses on improving customer experience and maximizing the business model. The main objective of the project is to forecast future customer traffic at various properties from three years of daily historical data for each hotel.

    Responsibilities:
  • Deployed, automated, and maintained ETL/ML pipelines in the client's existing Hadoop ecosystem.
  • Applied time series forecasting models (e.g., ARIMA, Facebook Prophet) to predict customer traffic trends, providing stakeholders with accurate, data-driven insights to optimize maintenance schedules & reduce unplanned downtime
  • Developed a time series forecasting model that accounted for seasonality & other temporal features, utilizing RMSE as the evaluation metric. The model successfully predicted traffic patterns with over 87% accuracy, enabling more efficient resource allocation & operational planning


Quality Theorem
Software Engineer Jul 2019 - Jan 2020
Hamilton NJ, USA


    Description:
    Quality Theorem is an Information Technology Service Company headquartered in New Jersey, USA. The Data Science and Analytics team at Quality Theorem aims to provide smart and innovative models to clients. The main goal of the project was to build a chatbot for customer support and to analyze review text using Natural Language Processing for named-entity analysis, sentiment analysis, and summarization.

    Responsibilities:
  • Developed an automated chatbot for an e-commerce client and used sentiment analysis to interpret product reviews.
  • Improved the data mining process, resulting in a 20% decrease in the time needed to infer insights from customer data.
  • Used predictive analytics and data mining techniques to forecast sales of new products with a 95% accuracy rate.
  • Deployed the chatbot using Oozie, significantly improving the client's customer service satisfaction.


Havenow Foodtech
Data Scientist May 2016 - Jul 2017
Hubballi KA, India


    Description:
    The main objective of the project was to build a recommendation system and predictive models to identify key products based on multiple factors and to offer solutions based on insights from data analysis. Customer data was analyzed and modeled by applying machine learning algorithms and statistical methods.

    Responsibilities:
  • Implemented a real-time recommendation model using applied statistics and ML to increase customer engagement.
  • Designed an ETL pipeline on customer data extracted from a PostgreSQL database & performed feature engineering.
  • The developed model increased customer engagement on the website/app, thus increasing sales by 51%.


KLE Technological University, BVB Campus
Research Assistant – Data Science May 2015 - May 2016
Hubballi KA, India


    Description:
    The main objective of the project was to improve the existing object detection model for the Traffic Department to reduce traffic violations using Computer Vision, and to develop novel algorithms that reduce traffic congestion in urban areas by controlling traffic signals using a traffic density index.

    Responsibilities:
  • Collected, interpreted and analyzed raw video data provided by the Transport Department, Govt. of India.
  • Employed Computer Vision to detect vehicle registration numbers in live video and deployed the application.
  • The application performed 18% better than the existing model, resulting in 25% fewer traffic violations.


Education

New York University
Master of Science in Computer Engineering Sep 2017 - Dec 2019
New York NY, USA

  • Coursework: Applied Matrix Theory, Sensor-Based Robotics, Data Structures and Algorithms, Digital Signal Processing Lab, Systems Engineering, Probability and Stochastic Processes

B. V. B. College of Engineering and Technology
Bachelor of Engineering in Electronics & Communication Sep 2012 - May 2016
Hubli KA, India

  • Coursework: Linear Algebra, Single and Multivariate Calculus, Statistics and Probability,
    Internet of Things (IoT), Control Systems, Automotive Electronics, Operating Systems, Computer Communication Networks

Projects

Movie Recommendation System using Spark
Jul 2019 - Sep 2019 | New York NY, USA

  • Loaded raw data into PostgreSQL with a custom-built ETL application to prepare it for Machine Learning.
  • Formulated a model to recommend movies to a million users depending on their subscription and ratings.
  • Visualized the data using multidimensional scaling (MDS) and created a dissimilarity matrix to observe user clusters.

Life Insurance Assessment and Analysis
Mar 2019 - May 2019 | New York NY, USA

  • Developed a predictive model using XGBoost to classify risk accurately through an automated approach.
  • The model improved the insurance buying rate by easing the application process by 30% through automation.

The Battle of Neighborhoods
Mar 2019 - May 2019 | New York NY, USA

  • Designed a comparative analysis model using location data retrieved from Foursquare via RESTful API calls.
  • Employed K-means clustering to determine the most suitable location to open a restaurant in New York City.
  • Used the Folium library to create maps of the geospatial data for visualization.

Android Application for Neural Artistic Style Transfer
Sep 2018 - Dec 2018 | New York NY, USA

  • Developed a server-based application for Neural Artistic Style Transfer using TensorFlow.
  • Optimized the style transfer implementation deployed on a server using TFLite.
  • Developed an Android application for the style transfer of real-time photos using Android Studio.

Sound-sensing Navigating Robot
Jan 2018 - May 2018 | New York NY, USA

  • Designed and developed a sound-sensing 4WD navigating robot.
  • Implemented an Obstacle Detection module and an Alternate Path Algorithm to navigate around obstacles.

Crop Recommendation System
Aug 2016 - Jul 2017 | Bangalore KA, India

  • Designed a real-time recommendation system for farmers based on multiple linear regression; provided info about the most suitable crops for cultivation depending on forecasted weather and soil condition at a given location.
  • Identified, measured and recommended improved strategies for better performance of the system.
  • Analyzed and processed complex data sets using advanced querying, visualization and analytics tools.
  • The designed model is expected to increase crop yield by at least 27% in southern parts of India.

UAV i.e., Quadcopter with FPV for Agriculture Applications
Aug 2016 - Jul 2017 | Bangalore KA, India

  • Designed a nonlinear controller for a quadrotor and discussed dynamically feasible trajectories.
  • Developed a planning & control stack for a 3D quadrotor with the ability to generate and track trajectories through waypoints.
  • Equipped the quadcopter with First Person View (FPV) technology for live audio-video transmission.
  • Applied real-time image processing to determine crop quality.

Smart Traffic Management System using Raspberry Pi
Jun 2015 - May 2016 | Hubli KA, India

  • Designed and created a model to detect traffic density in real time with the help of local traffic police.
  • Developed a data acquisition model using ultrasonic sensors to sense traffic with the Raspberry Pi board as a controller.
  • Designed and developed a printed circuit board for the data acquisition model using Eagle.
  • Developed a traffic light switching system based on traffic density and implemented it in a real-time scenario.
Achievements

Awards and Accomplishments

  • Awarded the Excellence in Research Certificate by Electronics Association, BVBCET for outstanding research
    work in the field of Embedded Systems for Academic Year 2015-16 May 2016
  • Successfully completed the International Internship on CAN Protocol with ARM Cortex M3 Jul 2015
  • Enrolled and participated in a certified course, Global Immersion in Innovation & Entrepreneurship, as part of a
    student exchange program held at University of Massachusetts, Lowell, aimed at providing participants with
    technology, innovation and business development tools & techniques Jun 2014
  • Second Runner-Up in Indo-US Robo League organised by Robotics & Computer Applications, Institute Of USA
    and Technophilia Systems Mar 2014
  • Awarded the Certificate of Academic Excellence for securing first rank for the institution in the
    State Pre-University Examinations May 2012