Quora Question Pairs - Predicting Duplicate Questions

(Code: GitHub)

March 2023

  • Developed a natural language processing model that predicts whether two questions have the same meaning with 90% accuracy, training Word2Vec embeddings and a Long Short-Term Memory (LSTM) model on a dataset of 400,000 Quora question pairs.
  • Conducted exploratory data analysis on the Quora question pairs dataset, using data visualization to identify patterns and correlations in the data and improve the model's accuracy.
  • Pre-processed the data and tuned the hyperparameters of the Word2Vec embeddings and LSTM model using grid search and random search to improve model accuracy.
  • Deployed the web application to the cloud using AWS Elastic Beanstalk and Docker, ensuring scalability and reliability for many concurrent users.
  • Applied machine learning interpretability techniques such as SHAP values and LIME to understand how the model made its predictions and improve its transparency.

Stock Market Sentiment and Time Series Analysis, Northeastern University

(Code: GitHub)

May 2022 - August 2022

  • Extracted the most recent 30 days of stock data from WSJ and historical prices from the Tiingo API by ticker symbol, and stored them in CSV format.
  • Developed a Naïve Bayes model to predict trends and used an LSTM model to predict future prices, with more than 90% accuracy.
  • Created an interactive front end with the Streamlit library to surface insights into any listed stock and reach a broad audience.

Flight Booking Website, Northeastern University

(Code: GitHub)

January 2022 - May 2022

  • Created a flight booking website that fetches live flight ticket information from the Skyscanner API, using React.js for the front end, Node.js for the backend infrastructure containing controllers, and MongoDB to persist data.
  • Built User, Admin, and Airline modules with REST APIs for CRUD operations such as creating and updating user, flight, and airline information, and creating and applying deals to different flights for different users.
  • Deployed the React front end on Netlify and the Node.js backend on Heroku, and dockerized the application to run in a remote environment and support a CI/CD pipeline.

Analytics and Visualization Using R Programming and Tableau, Northeastern University

(Code: GitHub)

September 2021 - December 2021

  • Performed clustering, probabilistic analysis, and text mining to find insights in university, crimes-in-India, and e-commerce datasets.
  • Identified the most influential factors in ranking the top 100 universities and the reasons behind changes in the rankings.
  • Measured the correlation between discounts, sales, and profit for e-commerce websites and found that festive seasons have 35% higher sales, while non-festive seasons have 30% more deals.
  • Built Tableau dashboards to analyze the speed, price, and penetration of the internet across the world, and created an interactive Google Site for a deep dive into happiness across the world.

Hyperspectral Image Classification Using Deep Learning, NIT Surat

(Code: GitHub)

July 2021 - May 2021

  • Collaborated with Dr. Jigish Patel to implement hyperspectral image classification using CNNs, GANs, and PCA.
  • Applied PCA and kernel PCA (k-PCA) to project high-dimensional data into fewer dimensions, reducing time and space complexity by 50%.
  • Compared the existing CNN algorithm with PCA-CNN and achieved at least 30% lower training time and complexity.

GAN-Based Modulation Recognition, NIT Surat

July 2020 - December 2020

  • Drafted a seminar report covering GAN-based semi-supervised techniques for modulation recognition.
  • Delivered strong results with an almost 50% smaller dataset by generating samples and using them for recognition.