Ankit Tripathi

Namaste!

About Me

Hello, I'm Ankit Tripathi 👋, a dedicated data scientist and software engineer with a passion for transforming complex data into actionable insights 🔍. My journey in the world of data began with intense curiosity and has evolved into a career of continuous learning and innovation.

I hold a Master’s in Applied Data Science from USC 🎓 and a Bachelor’s in Electronics and Telecommunication from Dwarkadas J Sanghvi College of Engineering, Mumbai. With hands-on experience at organizations like Easley Dunn Productions, KPMG, and USC’s Muin Research Group, I've honed my skills in deep learning, data mining, and machine learning to solve real-world problems with creative, data-driven solutions.

Beyond my technical pursuits, I'm a cricket lover 🏏, proud Potterhead ⚡, and beach enthusiast 🏖 who thrives on collaboration and new challenges. Thank you for visiting my website and taking the time to learn about me.

SKILLS

Languages:

Python

R

HTML5

CSS3

JavaScript

MATLAB

Databases:

MySQL

SQLite

SQL Server

PostgreSQL

MongoDB

Firebase

Frameworks & Libraries:

LangChain

PyTorch

Keras

TensorFlow

Scikit-learn

D3.js

Cloud/Tools:

Azure

AWS

GCP

Databricks

Git

Airflow

RStudio

Tableau

Power BI

Docker

Experience

Easley Dunn Productions, Inc. – Los Angeles, USA

Software Engineer Intern | December 2024 – Present

  • Collaborated with IT and data analytics teams to execute ETL processes using Microsoft SQL Server on AWS data, enhancing ingestion, accuracy, and efficiency by 50%.
  • Leveraged SQL and Tableau to develop real-time dashboards for in-app analytics that informed targeted ad creation—reducing decision-making time by 40% and increasing player engagement by 60%.
  • Enabled the Robot Race game newsletter by integrating Google Firebase for subscriber storage with Google Analytics 4 for real-time demographic insights, driving a 25% boost in engagement.

KPMG – Los Angeles, USA

Data Scientist | August 2024 – January 2025

  • Engineered a Gen AI-powered Q&A chatbot for financial document retrieval using Python, LangChain, OpenAI GPT, YOLO, and Azure AI Search.
  • Created Databricks pipelines on Azure to process machine- and non-machine-readable PDFs by implementing semantic chunking with Azure Document Layout Model, increasing data retrieval accuracy by 30%.
  • Developed a natural language-to-SQL chatbot (NL2SQL) using LangChain, Python, and Microsoft SQL Server to enable conversational database querying; improved system accuracy by 30% through dynamic few-shot learning.
  • Enhanced call volume forecasting by implementing an ensemble model using BiLSTM and Prophet, boosting accuracy by 30%, reducing MAPE by 14%, and automating daily forecasts to optimize staffing.

Muin Research Group (MRG) – USC, Los Angeles, USA

NLP Research Analyst | May 2023 – August 2024

  • Orchestrated an end-to-end RAG application for university professors using LangChain, OpenAI API, and Streamlit, reducing processing time by 30%.
  • Designed a Streamlit interface and integrated FAISS for custom vector database storage, achieving 40% faster retrieval speeds and reducing teachers’ workload by 25%.
  • Led development of an Apache Airflow workflow to preprocess and store 110K tweets from the Twitter API using Tweepy and Python, saving CSVs on AWS S3 and boosting storage efficiency by 50%.
  • Deployed a RoBERTa-based transfer learning model with PyTorch and Scikit-learn for temporal sentiment analysis, identifying 100 vulnerable communities from post-disaster social media data.
  • Utilized NLTK, boto3, and Pandas for data processing and analysis, reducing processing time by 35% and enhancing decision-making with Tableau visualizations.

Projects

GCP Employee Churn Prediction

Engineered an end-to-end employee churn prediction solution on Google Cloud using BigQuery, Colab with Python, PyCaret for AutoML, and a dashboard in Looker Studio—enhancing data management efficiency by 45% and prediction accuracy by 35%.

GitHub Repository

Fake News Detection

Leveraged the LIAR dataset to validate models using TensorFlow and Keras with architectures including Gated RNN, BiLSTM, BERT, and RoBERTa—achieving 75% binary classification accuracy and a 19.05% improvement over baseline.

GitHub Repository

Smart Choice – Movie Recommendation System

Engineered a Hadoop-like file system for 125K rows from Netflix, Amazon, and Disney, reducing compute time from 100 ms to 23 ms. Implemented adaptive partitioned MapReduce with Firebase/MySQL, boosting efficiency by 70%, and launched a colorblind-friendly Django app for personalized movie recommendations.

GitHub Repository

Starbucks Analysis

Built an interactive dashboard using D3.js and Mapbox to analyze Starbucks store distribution. Identified optimal store locations by correlating population density and median income, aiding strategic decision-making.

GitHub Repository

Data Science Salaries Analysis

Built an interactive Tableau dashboard to analyze global data science salary trends across experience levels, job roles, and employment types. Visualized salary distributions, company sizes, employee locations, and industry patterns, providing insights for professionals and recruiters to make informed career decisions. 📊💼

Dashboard

London Bike Rides Analysis

Created an interactive analysis of London bike ride patterns, examining the impact of weather conditions using Python for data processing and Tableau for visualization. Implemented dynamic filters, heatmaps, and bar charts to uncover trends in cycling behavior based on time and weather. 🚴‍♂️📊

Dashboard

Education

University of Southern California, Los Angeles

Master of Science in Applied Data Science | Aug 2022 – May 2024

GPA: 3.8/4.0

Coursework: Deep Learning, Data Mining, Data Management, Machine Learning, User Research Studies, Data Visualization, Fairness in AI

Dwarkadas J Sanghvi College of Engineering, Mumbai

Bachelor of Engineering in Electronics and Telecommunication | Aug 2018 – May 2022

GPA: 9.39/10

Coursework: Applied Statistics, Data Warehouse & Modelling, Predictive Analytics, Descriptive Analytics & Big Data Analytics
