projects

- newer stuff -

Some recent projects that I’ve worked on…

*For a more frequently updated list of projects, see my GitHub repo here.

math to code converter

Math2Code: Engineered the core architecture and led a team of fellow machine learning & data science students in designing a web application that takes a picture of a math equation or notation and automatically converts it into equivalent working Python/NumPy code. Presented a working version and demo to a VC panel, alongside 5 other finalists and their products from our master's program. Code documentation found here.

Used: Optical Character Recognition (OCR), Mathematical Syntax Analysis, Lexical Analysis & Parsing, Code-Generation & Templating, AWS (S3, RDS, Elastic Beanstalk, CodePipeline, Route 53), Flask, Bootstrap 4, Jinja2, Sphinx Documentation, JavaScript, HTML, CSS, Google Analytics, Git Continuous Integration

medical diagnosis

Built a custom reinforcement learning environment in OpenAI Gym, plus a series of models, that predicts and classifies hospital-acquired sepsis (a.k.a. blood poisoning) in patients at each hour, using multivariate time-series data of patient vital signs and lab results (e.g., heart rate, O2 saturation, temperature, mean arterial pressure, serum white blood cell count).

Sepsis is a life-threatening condition that arises when the body’s response to infection causes injury to its tissues and organs. It is the most common cause of death for people who have been hospitalized, and results in a $15.4 billion annual cost in the US. Early detection and treatment are essential for prevention, and a 1-hour delay in antibiotic treatment can lead to a 4% increase in hospital mortality.

Project was built with my multi-talented partner, Zachary Barnes. Some of the algorithms and policies we tried include: Proximal Policy Optimization (PPO) + Multi-Layer Perceptron; PPO + LSTM; the synchronous, deterministic variant of Asynchronous Advantage Actor-Critic (A2C) + LSTM; and Deep Q-Networks + LSTM. See here for our code, analysis, and tutorial on how to set up and run this on your local machine.
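To give a feel for the hour-by-hour framing, here is a minimal sketch of a Gym-style environment written in plain Python (no Gym import). The class name, reward values, data layout, and toy numbers are all assumptions for illustration, not the project's actual environment.

```python
class SepsisEnvSketch:
    """Gym-style environment (reset/step) over one patient's hourly records.

    Hypothetical illustration: names, rewards, and data layout are assumed.
    """

    def __init__(self, vitals, labels, reward_correct=1.0, reward_wrong=-1.0):
        # vitals: one feature list per hour (e.g. heart rate, temperature)
        # labels: one 0/1 sepsis label per hour
        assert len(vitals) == len(labels) and len(vitals) > 0
        self.vitals = vitals
        self.labels = labels
        self.reward_correct = reward_correct
        self.reward_wrong = reward_wrong
        self.t = 0

    def reset(self):
        """Start the episode at the first hour and return its observation."""
        self.t = 0
        return self.vitals[self.t]

    def step(self, action):
        """Score a 0/1 prediction for the current hour, then advance one hour.

        Returns (observation, reward, done, info) in the classic Gym style.
        """
        correct = action == self.labels[self.t]
        reward = self.reward_correct if correct else self.reward_wrong
        self.t += 1
        done = self.t >= len(self.vitals)
        obs = None if done else self.vitals[self.t]
        return obs, reward, done, {"correct": correct}


def run_episode(env, policy):
    """Roll out a policy (function from observation to 0/1) and sum rewards."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


# Made-up toy patient: [heart rate, temperature] per hour, sepsis at hour 3.
toy_env = SepsisEnvSketch([[80, 36.8], [95, 37.9], [118, 39.1]], [0, 0, 1])
toy_return = run_episode(toy_env, lambda obs: int(obs[0] > 110))
```

A learned policy such as PPO or DQN would take the place of the hand-written threshold rule; a realistic reward would probably also weight early detection more heavily than the symmetric +1/-1 used here.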

Used: OpenAI Gym, TensorFlow, Bayesian Optimization, Keras Preprocessing

bioinformatics

As part of our Genomic Data Science Series of tutorials, my friend Nick Parker and I analyzed microbiome genetic sequencing data from simple swabs of surfaces in different cities to infer where the swabs came from without knowing the location beforehand. We investigated the genetic differences between these communities of microorganisms using unsupervised machine learning and visualization methods based on various dimensionality reduction techniques.

Used: Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (t-SNE), Principal Component Analysis (PCA)

oceanography

Project Argo: Built various machine learning models, such as spectral clustering, k-means, & Gaussian mixture models, run on PySpark’s distributed computing framework to cluster temperature profiles from the fleet of Argo floating buoys around the world. This network of floats monitors temperature, salinity, currents, and bio-optical properties of the world’s oceans, providing sensor measurements for climate and oceanographic research. Data set found here. Our team’s initial project slide-deck found here.
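As a rough illustration of the clustering idea only, here is a single-machine k-means sketch in plain Python. It is not the PySpark pipeline used in the project; the function name, parameters, and toy profiles are assumptions made for the example.

```python
import random


def kmeans(profiles, k, iters=20, seed=0):
    """Plain k-means on equal-length profiles (lists of floats).

    Returns (centroids, assignments). Illustrative sketch, not PySpark.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct randomly chosen profiles.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    assignments = [0] * len(profiles)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(profiles):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Made-up temperature profiles (degrees C at three depths): warm vs. cold water.
toy_profiles = [[25, 20, 15], [26, 21, 14], [24, 19, 16],
                [5, 4, 3], [6, 4, 2], [4, 3, 3]]
toy_centroids, toy_labels = kmeans(toy_profiles, k=2)
```

On real float data each profile would first need to be interpolated onto a common set of depths so that all vectors have the same length.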

Used: PySpark, AWS EMR/S3, Spectral Clustering, K-Means, Gaussian-Mixture Models, PCA

music & linguistics

Experimented with different machine learning classifiers for detecting iambic pentameter in song lyrics and sonnets. Iambic pentameter is a type of rhythm or meter in which each line is made up of five small groups of syllables called iambs or “feet”; in English, an iamb is an unstressed syllable followed by a stressed one. Poem and sonnet data from poets such as Shakespeare, Keats, Frost, Shelley, and Jackson were obtained from scraped texts available on the web and then cleaned, while songs from artists such as Taylor Swift and the Backstreet Boys were obtained from Data.World.
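The definition above is easy to express in code. The sketch below assumes each line has already been converted to a list of syllable stresses (0 = unstressed, 1 = stressed), for example with a pronunciation dictionary; the function names and the "iamb fraction" score are hypothetical and are not necessarily features used in the project.

```python
def is_iambic_pentameter(stresses):
    """Return True if a line's syllable stresses form exactly five iambs.

    stresses: sequence of 0 (unstressed) and 1 (stressed), one per syllable.
    """
    return list(stresses) == [0, 1] * 5


def iamb_fraction(stresses):
    """Fraction of consecutive two-syllable feet that are iambs (0 then 1).

    A trailing unpaired syllable is ignored. This soft score is one possible
    input feature for a classifier, since real verse often bends the meter.
    """
    feet = [tuple(stresses[i:i + 2]) for i in range(0, len(stresses) - 1, 2)]
    if not feet:
        return 0.0
    return sum(foot == (0, 1) for foot in feet) / len(feet)


# A perfectly regular line, and one whose third foot is inverted (1 then 0).
regular = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
inverted = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
```

Here `is_iambic_pentameter(regular)` is true, while `inverted` fails the strict check but still has four iambs out of five feet.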

Used: Text-Parsing, Natural Language Processing, Logistic Regression, Step-Wise Regression, Akaike Information Criterion (AIC) + Bayesian Information Criterion (BIC), Wald Test, Deviance Chi-Squared Test, Cross-Validation, DFFITS, Cook’s Distance, Variance Inflation Factor, ROC Curves

charity

Designed a recommendation system on PySpark’s distributed computing framework that matches classroom charity projects (teachers and their classroom requests) to the most probable donors nationwide. DonorsChoose.org, in partnership with Google, made this data set freely available on Kaggle. Our team’s project slide-deck can be found here.

Used: PySpark, AWS EMR/S3, Plotly

dark web markets

Dark Market Cocaine Price Prediction: Used various machine learning models to predict the bitcoin price of dark market cocaine. The data set was scraped by a third party and contains approximately 1,400 standardized product listings from Dream Market’s Cocaine category. Our team’s project slide-deck can be found here.

Used: Text Extraction, Feature Engineering, Simple Linear Regression, LASSO Regression (L1), Ridge Regression (L2), Random Forest Regressor, Cross-Validation, Scikit-learn’s Data Pipeline

- older stuff -

A hodge-podge of former stuff that I’ve been involved in…

(I mainly use this page to connect all the disparate links on the interwebs to one central location)

computational neuroscience

Built and ran simulations of a spiking neural network modeled with endocannabinoid retrograde signaling, GABA-mediated inhibitory, and excitatory glutamate pathways found in the medial prefrontal cortex of mice. Attempted to investigate the changes in synaptic activity and firing patterns associated with cannabis use, brain development, and psychosis. This was done at the Hungarian Academy of Sciences as part of the Theoretical Neuroscience and Complex Systems group run by Dr. Péter Érdi, with mentorship from both Dr. Zoltán Somogyvári and Mihály Bányai. See my undergraduate honors thesis here.

I also took part in the separate BSCS program studying neuroscience, cognitive science, and philosophy (as well as the Hungarian language) at Eötvös Loránd University. This experience allowed me to study, live, and explore Budapest and other European countries (my old travel blog can be found here).

Used: MATLAB, Spiking Neural Networks (from scratch)

electrophysiology

Researched endocannabinoid signaling mechanisms by recording and analyzing the electrical activity of populations of neurons in mouse brain slices as part of the Korzus lab, under the excellent mentorship of Dr. Jonny Lovelace. Also wrote a paper and my undergraduate honors thesis, which is now preserved in the school library’s cold, dark basement.

Used: Population-level Electrophysiological Recording, MATLAB, ANOVA, Immunohistochemistry, Neuropharmacology, Behavioral Experiments

neural networks

Gave talks about the mathematics of neural nets and tapped into the collective hive-mind of people writing proofs. These Pacific Summer Unsolved Math Seminars are run by Dr. Dana Clahane, a kind-hearted and inspiring mathematician who works on cultivating mathematical curiosity in the surrounding Fullerton, CA community. He hosts talks given by students, colleagues, and professional mathematicians. He also heads the local community college Putnam exam math training on a volunteer basis.

Used: LaTeX, Mathematical Proofs, Classical Neural Networks

theoretical chemistry

Explored stable molecular geometries and the steric interactions of sp and sp3 carbons in acyclic alkynes and nitriles as part of a group led by Dr. Thomas Morton. Performed the data analysis and wrote the methods and results sections of the paper, which was then published in the undergraduate research journal and presented in the symposium talks and poster sessions.

Used: Ab Initio Geometric Optimization Methods, sp-sp3 Synclinal + Antiperiplanar Interaction Analysis

neuropharmacology

Studied the evolution of neurotoxin proteins in Agelenopsis aperta (the desert grass / funnel weaving spider) under the guidance of Dr. Michael Adams. Also assisted in the research of Dr. Do Hyoung Kim by dissecting fruit fly brains(!) as part of the effort to analyze the peptidergic neural networks underlying ecdysis. Provided additional help in milking the neurotoxins that the emerald jewel wasp uses to make zombie cockroaches (the wasp reproduces parasitically through the cockroaches, in the style of the classic Alien movies!).

Used: Immunohistochemistry, Western Blotting, Proteomics, Molecular Biology methods, Invertebrate dissection

genetics

Learned common molecular biology and genetics techniques while studying the roles of the agamous transcription factor & wuschel genes in floral stem cell maintenance in Arabidopsis thaliana (mustard plant) in Dr. Xuemei Chen’s lab, under the excellent guidance of Rae Yumul.

Used: PCR, Primer Design, Illumina Sequencing, other basic molecular biology/genetic protocols

behavioral neuroscience

As part of the UCLA Comparative Cognition Lab directed by Dr. Aaron Blaisdell, I learned about reward probability and behavioral variability by building my own operant conditioning chamber (“Skinner box”), feeding rats cocoa puffs, and teaching pigeons video games :)

Used: Behavioral Experimentation, Instrumental + Operant Conditioning Protocols