Shivam Lalakiya

Data Analyst, Grateful Patient Program - University Advancement
Washington University in St. Louis

Feb 2024 - Present

Associate Data Manager,
Invicro

Sep 2023 - Feb 2024

Developed an automated data pipeline with Apache Airflow for transferring pathology images, reducing data processing time by 20% and ensuring reliable, on-time data delivery for research studies.

Engineered and deployed computer vision models, yielding a 10% improvement in pathology image analysis and biomarker identification, which translated into 15% more study sponsors.

Established data quality control procedures, validation protocols, and data auditing frameworks to improve data precision and ensure conformity with industry benchmarks, resulting in dependable research outcomes and reports.

Collaborated with lab scientists to automate and standardize slide and file naming, reducing errors by 17%.

Data Science co-op,
Ring Therapeutics

May 2022 - Present

Designed, built, validated, and deployed machine learning models and software tools for analyzing and optimizing Anello therapeutics.

Provided analytical insights into binding sites and tissue and cell specificity, based on patient protein sequence data collected by the Discovery team, to support the virology team's work and the development of Anello-backed programmable medicine.

Applied NLP-based models to gene and protein sequences to predict tropism, supporting the design of viral vectors that can safely and effectively deliver therapies to target cells and tissues.

Implemented positionally encoded models to find tissue-specific motifs and to predict tropism along with the important binding-site positions.

Developed a graph neural network to identify essential features of Anello protein sequences from whole-genome sequencing data to improve tropism, using TensorFlow with CUDA in high-performance computing cluster environments.

Reported to the head of the Genomics team and worked closely with the drug discovery and platform teams to analyze biological data from various sources and build machine learning-based tools for drug vector design.

Built Dockerized ETL pipelines, run as Airflow DAGs, to convert experimental data to FASTA format and load it onto the server, eliminating manual data loading.

Engaged with cross-functional project teams and external collaborators to support data-driven biological modeling using Statistics and Data Science.

Created an interactive front end for company-wide use of the ML models and functionality developed by the Data Science team.

Course Assistant,
Northeastern University

January 2022 - May 2022

Collaborated with Prof. Milad Siami to design the assignment for Introduction to Distributed Intelligence.

Guided 50 students through the course and held 5 office hours to ensure learner success and course completion.

Research Assistant,
IIT Madras
(Code: GitHub)

April 2022 - September 2020

Developed and executed the 'Caching using Deep Learning' project, focusing on time-series prediction of user preferences by leveraging RNNs and LSTM-based models.

Preprocessed and filtered a custom 12GB dataset using Pandas, ensuring the effective training of deep learning models to optimize caching policies.

Designed and implemented an LSTM-based caching policy, employing RNNs to accurately predict future user requests and preferences with a success rate of 90%.

Conducted extensive benchmarking and performance analysis, demonstrating a 130% improvement in hit rates compared to traditional caching policies such as LIFO, LRU, and LFU.

Collaborated with team members to integrate the LSTM-based caching policy into the existing system, contributing to significant performance enhancements and improved user experience.

Continuously monitored and refined the deep learning models, ensuring optimal performance and accuracy in predicting user preferences for caching purposes.

Documented and presented the project findings, showcasing the effectiveness of the deep learning approach in outperforming traditional caching policies.

Data Science Intern,
SenseGrass Inc.

January 2020 - April 2020

Conducted extensive crop yield prediction and disease identification research using remote sensing techniques.

Developed and tested machine learning models to predict crop yield and identify diseases using satellite images and vegetation indices.

Leveraged Python and machine learning libraries such as scikit-learn and TensorFlow to develop and train predictive models.

Tested and refined the models using real-world data from farms that implemented computer vision and remote sensing.

Achieved a 90% accuracy rate in predicting crop diseases and yields, significantly improving the efficiency and profitability of the farms that implemented the developed models.

Successfully connected the farm with the machine, enabling real-time monitoring and control of irrigation, fertilization, and other critical processes.

Research Assistant,
NIT Surat

July 2019 - December 2019

Conducted a project entitled "Multimodal Biometric System," combining iris, facial, speech, and fingerprint recognition using convolutional neural networks (CNNs).

Integrated the various biometric factors to achieve an 85% precision rate on multiple datasets and create a reliable biometric system.

Optimized the biometric model by fine-tuning the hyperparameters, resulting in a 10% increase in accuracy.

Developed a data preprocessing pipeline that cleaned and transformed the data before feeding it into the model, resulting in a 20% improvement in its performance.

Conducted exploratory data analysis to understand the data distribution and identify any outliers or anomalies.

Deployed the model on a cloud-based platform for real-time biometric authentication, leveraging AWS and GCP.

Conducted rigorous testing and validation to ensure the biometric system's reliability and accuracy, achieving an F1 score of 0.87 and a precision of 0.91.

Applied the model to the department's attendance system, saving 15 minutes per class each day on taking attendance, and deployed it at the professor's cabin to enhance security.

Research Intern,
DAIICT, Gandhinagar

May 2019 - May 2019

Implemented a Kullback-Leibler divergence-based support vector machine (SVM) for speech spoofing detection, increasing accuracy from 70% to 85%.

Used Mel-frequency cepstral coefficient (MFCC) feature extraction to retrieve essential information from approximately 2 GB of speech samples. The extracted features were mapped into a higher-dimensional space using kernel functions, improving the separation between genuine and spoofed speech samples.

Optimized hyperparameters of the SVM model through a grid search method, further improving model accuracy.

Conducted rigorous data preprocessing, including noise reduction and feature scaling, to improve the quality of speech inputs for machine learning models.

Developed and tested various machine learning models for speech and voice recognition, including hidden Markov models and neural networks.

Collaborated with a team of researchers and engineers to develop innovative speech and voice recognition solutions, including speech-based virtual assistants and speech-enabled devices.

Remained current with the latest advancements in machine learning for speech and voice recognition by attending conferences and reading research papers.