Applied Data Analytics Graduate
Technical Skills: Python, R, SQL, Bash Scripting, AWS, GCP, Apache Spark, Apache Kafka, Power BI, Linux Programming
Education
- M.S., Applied Data Analytics | Boston University (Jan 2024)
- B.S. (Hons.), Computer Science | Heriot-Watt University (May 2022)
Work Experience
Graduate Data Science Research Assistant (Dec 2022 - Present)
Boston University Henry M. Goldman School of Dental Medicine
- Examined 40,000+ tweets by Americans about vaccines and fluoride to identify trends in beliefs.
- Reduced manual sentiment labeling of tweets by 40% by building an ML pipeline that preprocesses the data and fine-tunes a transformer model to assign sentiments automatically.
- Fine-tuned an existing transformer model to predict the sentiment of more than 40,000 tweets with 85% accuracy.
- Developed correlation matrices and bar plots to assess trends across three time periods: pre-pandemic, during the pandemic, and post-pandemic.
- Skills: Python, Machine Learning, NLP, Predictive Modelling, Research Skills
Information Technology Help Desk Technician (Sep 2022 - Present)
Boston University Metropolitan College
- Resolved 200+ tickets in one year with a client satisfaction rate above 90%.
- Troubleshot hardware and software issues involving Power BI, VMware, and Citrix Virtual Labs via the ServiceNow ticketing system, phone calls, and Bomgar remote desktop assistance.
- Trained a team of 4 new hires to assist clients with troubleshooting tasks and to handle face-to-face client interactions.
- Skills: Team leading, Attention to detail, Customer-facing role, Management
Data Science Intern (Jun 2022 - Aug 2022)
Apparel Group
- Optimized inventory for the Charles & Keith brand by developing a sales prediction algorithm, reducing unused inventory by an average of 15% per store.
- Analyzed diverse data factors, such as customer demographics, store locations, seasonal trends, store sizes, and fashion cycles, to enhance the algorithm’s performance.
- Deployed a sales prediction model with 75% accuracy at the time of deployment, supporting inventory optimization for the brand.
- Skills: Time Series Forecasting, Qualitative Analysis, Natural Language Processing (NLP), MicroStrategy, Microsoft Excel, Power BI
Undergraduate Teaching Assistant (Sep 2021 - Jun 2022)
Heriot-Watt University
- Provided teaching and support to students in introductory computer science courses, including Java, Python and R.
- Graded student assignments and exams, and provided feedback to help students improve their learning.
- Led weekly discussion sections and helped students understand complex concepts and solve problems.
- Skills: Management, Research Skills, Git, Problem Solving, Python
Data Consultant (Nov 2020 - May 2022)
COGOS Technologies
- Analyzed data and created a platform for integrating operations post-acquisition at Cogos.
- Led machine learning projects for route and capacity optimization after the acquisition.
- Demonstrated expertise in data-driven decision-making and machine learning.
- Created a data warehouse and platform pre-acquisition.
- Skills: Project Management, MS Excel, GitHub, Open Source Software
Projects
Generating customer service responses using Hugging Face LLMs
- Automated responses to customer support messages to improve customer satisfaction and reduce the time spent on customer support.
- Designed a framework to connect a front-end chat system to a Hugging Face LLM using Streamlit.
- Fine-tuned an LLM on a large dataset (15 GB) and converted its responses into audio files using AWS Polly.
- Scaled the system with Apache Kafka, AWS S3, and Redis for faster access and reduced latency.
- Fine-tuned the model on Google’s TPUs and hosted the rest of the framework on AWS, GitHub, and Redis Cloud.
Apache Kafka - Real-time Hate Speech analysis on Discord servers
- Designed a framework to listen to messages on subscribed Discord servers and analyze the type of hate speech.
- Scaled the system using Apache Kafka and Streamlit to accommodate multiple Discord servers without changing any backend code.
- Hosted the code on GitHub and Streamlit for ease of access and usage.
Trends in Americans’ Beliefs about Fluoride from Twitter
- Conducted an in-depth analysis of public sentiment regarding water fluoridation, extracting 80,000 relevant tweets with a web scraping tool run via a DigitalOcean VPC.
- Collected a subset of 1,000 tweets for manual labeling, later used to fine-tune a transformer model (RoBERTa).
- Created an ML pipeline to preprocess and normalize the data and remove unnecessary tweets before fine-tuning the transformer model.
- Analyzed the sentiment of 40,000+ tweets with the fine-tuned RoBERTa model, enabling effective predictions on a large volume of unlabeled data.
- Developed correlation matrices by clustering topics derived from unstructured data sources.
Size Profile Optimization
- Forecast size-level demand at each store for any set of store-option pairs.
- Used AI/ML methods to determine true demand and optimize inventory.
- Built the forecast from BI reports and SQL dumps.
- Trained a decision tree regressor on historical sales data to capture trends and predict future sales.
- Developed a Streamlit front end to host the deployed model.
Stock price prediction using past stock prices and tweets
- Collected more than 5 years’ worth of stock prices for the Alphabet stock ticker. Conducted predictive analysis on historical stock prices and public sentiment data to project future stock prices.
- Explored and evaluated the performance of 9 different machine learning models and three word embedding techniques, including TF-IDF and “Bag of Words”.
- Implemented a data pipeline for daily Twitter data processing, normalization, and sentiment analysis.
- Identified “Bag of Words” as the optimal word embedding model and Support Vector Classifier (SVC) as the most effective classifier.
Ensemble Machine Learning Model for sentiment analysis
- Developed an ensemble machine learning model by combining two deep learning models with TensorFlow and OpenAI.
- Used DigitalOcean VPCs to connect to multiple data centers worldwide, allowing simultaneous extraction of terabytes of tweets.
- Built a data pipeline with the spaCy and NLTK libraries to preprocess the extracted data for model training.
- Evaluated the model with accuracy, Matthews correlation coefficient (MCC), and other ML metrics; the ensemble model reached 77% accuracy, compared with approximately 85% for the existing state-of-the-art model.
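For reference, accuracy and MCC can both be computed from confusion counts. The following is a minimal sketch for a binary sentiment task, assuming labels 1 (positive) and 0 (negative); the function names and sample labels are hypothetical and are not taken from the project.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary labeling."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), mcc(y_true, y_pred))
```

Unlike accuracy, MCC uses all four confusion counts, so it is less flattering when the classes are imbalanced.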
Data Mining and Machine Learning - Portfolio
- Implemented various machine learning techniques such as Naive Bayes, k-means clustering, hierarchical clustering, decision trees, and linear classifiers, all evaluated using a 10-fold cross-validation approach.
- Employed a comprehensive set of evaluation metrics, including Accuracy, F-Score, ROC/AUC curve, precision, and recall, to assess the effectiveness of each model.
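The 10-fold evaluation procedure mentioned above can be sketched in plain Python. This is an illustrative outline only: the helper names, the majority-class baseline, and the toy data are assumptions, and a real evaluation would normally shuffle or stratify the data before splitting.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(X, y, fit, predict, k=10):
    """Return the mean accuracy over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical baseline: always predict the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Toy data, for illustration only.
X = list(range(20))
y = [0] * 15 + [1] * 5
print(cross_validate(X, y, fit_majority, predict_majority, k=10))
```

Any of the listed models could replace the baseline by supplying its own `fit` and `predict` callables.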
Volunteering & Student Clubs
HW Tech Club
Design Director (May 2021 - Jun 2022)
- Managed a team of 5 students, providing guidance and mentorship on UI/UX design principles and tools.
- Organized and executed a university-wide competition for the best UI/UX frontend website, promoting creativity and innovation among students.
- Stayed up-to-date on the latest UI/UX trends and technologies, and shared knowledge with the team through regular presentations and discussions.
- Fostered a positive and supportive team environment, encouraging collaboration and creativity.
Cyber Security Analyst (Oct 2020 - Jun 2022)
- Developed and maintained a virus scanner application in Python that checks for potential scam websites via a Chrome extension.
- Hosted and delivered educational YouTube talks to promote awareness of different types of wireless hacking and OSINT technologies.
- Collaborated with other Tech Club members to organize and execute cybersecurity workshops and events.
- Kept up-to-date with the latest cybersecurity trends and technologies, and shared this knowledge with the Tech Club community.
- Provided cybersecurity support and guidance to Tech Club members and other students.
Creatives and Video Editor (Jul 2020 - May 2021)
- Created and edited engaging videos to promote the Tech Club’s events and initiatives.
- Collaborated with other creatives to develop and produce video content that was both informative and entertaining.
- Managed the Tech Club’s social media accounts and used video to create a strong online presence.
- Analyzed video performance data to identify areas for improvement.
- Stayed up-to-date on the latest video editing trends and technologies.
The Uplift Foundation (Jul 2020 - Jun 2021)
Graphic Designer
- Created brand guidelines for posting consistent content on social media accounts.
- Led a team of 4 people to convert text into interactive, eye-catching content.
Blog
- Predicting stock prices — a sentiment analysis approach
- Apache Kafka — Real-time Hate Speech analysis on Discord servers