Projects
Browse selected personal projects. Click a card to view a short summary, visuals, and links.
Computer Vision: Phenophase Image Analysis (ResNet-50 + GANs)

Detect leaf phenophase from PhenoCam images and forecast SOS/EOS across sites with augmentation for rare phases.
- ResNet-50 classifier; GANs for data scarcity
- Cross-site generalization beyond single camera tuning
- Calendar-level SOS/EOS with confidence bands
Data Engineering: YouTube Data Pipeline with Apache Airflow

Config-driven ETL from YouTube API to S3/Snowflake with Airflow orchestration.
- Incremental loads, retries, schema checks, logs
- Idempotent upserts; downstream content analytics
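A minimal sketch of what an idempotent upsert means here, assuming rows are keyed by a hypothetical `video_id` field and using an in-memory dict as a stand-in for the warehouse table. It illustrates the idea only and is not the pipeline's actual load code.

```python
from typing import Dict, Iterable


def upsert(table: Dict[str, dict], rows: Iterable[dict], key: str = "video_id") -> Dict[str, dict]:
    """Insert new rows and overwrite fields of existing rows that share the same key."""
    for row in rows:
        # Merge onto any existing row so a replayed batch lands in the same state.
        table[row[key]] = {**table.get(row[key], {}), **row}
    return table


# Illustrative batch; field names and values are made up for the example.
batch = [
    {"video_id": "a1", "views": 100},
    {"video_id": "b2", "views": 250},
]

table: Dict[str, dict] = {}
upsert(table, batch)
snapshot = {k: dict(v) for k, v in table.items()}

# Replaying the same batch (e.g. after an Airflow retry) leaves the table unchanged.
upsert(table, batch)
```

Because the write is keyed rather than appended, a retried task cannot create duplicate rows.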
Analytics: House Price Profiler on Snowflake

60k+ listings scraped and standardized; modeled price drivers and sensitivity.
- Flattened JSON → ~40% faster queries; full geocoding
- Answered 11 key business questions
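The JSON flattening step can be sketched as below. This is an assumption-laden illustration: the `flatten` helper, the `_` separator, and the listing fields are hypothetical, and the project itself may have done the flattening in Snowflake rather than in Python.

```python
def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Turn nested dicts into a single-level dict with joined column names."""
    flat = {}
    for name, value in record.items():
        column = f"{parent}{sep}{name}" if parent else name
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the prefix along.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Illustrative listing; not real project data.
listing = {
    "id": 1,
    "price": 350000,
    "address": {"city": "Tucson", "geo": {"lat": 32.22, "lon": -110.97}},
}
flat_listing = flatten(listing)
```

Each nested field becomes its own column (for example `address_geo_lat`), so queries can filter on plain columns instead of parsing JSON at read time.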
ML: Breast Cancer Prediction (LR vs. NN)

Compare classical and deep approaches for early detection on tabular features.
- LR 92.9% acc; Keras NN 97.3% acc
- SMOTE for imbalance; calibrated probabilities
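For readers unfamiliar with SMOTE, a rough stdlib-only sketch of its core idea follows: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbors. The function name, parameters, and toy points are assumptions for illustration; the project most likely relied on a library implementation rather than code like this.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen base point (excluding itself).
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority-class points; not real project data.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=5)
```

Synthetic points always lie on a segment between two real minority samples, which is why oversampling should be applied to the training split only.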
DE & BI: Uber Data Analytics (GCS · Mage · BigQuery · Looker)

End-to-end pipeline to BI with KPI queries returning in seconds.
- Stakeholder dashboard for demand peaks & supply gaps
ML: Credit Card Fraud Detection

Imbalanced classification with SMOTE and model comparison (DT, LR, RF, NB).
- Best model ~99% accuracy; ~10% improvement after rebalancing
Regression: Salary Prediction (Gradient Descent)

From baseline to tuned GD with strong MSE reduction and clear diagnostics.
- MSE reduced from 91.2% → 6.3% with scaling & step tuning
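A minimal sketch of gradient descent with feature scaling and a fixed step size, assuming a single-feature linear model. The toy data, learning rate, and step count are illustrative choices, not the project's values.

```python
def standardize(xs):
    """Scale a feature to zero mean and unit variance."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / std for x in xs]


def mse(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.1, steps=200):
    """Fit w and b by batch gradient descent on the MSE loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data: years of experience vs. salary (in thousands); made up for the example.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40.0, 45.0, 50.0, 55.0, 60.0]

xs = standardize(years)
start_mse = mse(0.0, 0.0, xs, salary)
w, b = gradient_descent(xs, salary)
final_mse = mse(w, b, xs, salary)
```

Standardizing the feature keeps the loss surface well conditioned, so one moderate step size converges for both `w` and `b`. On unscaled inputs the same step size can diverge or crawl.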
Research
Work in progress from the ARID Lab at the University of Arizona. Click to view a brief, non-confidential summary.
Causal Inference: Insurance at Birth & Infant Outcomes (EHR, multi-site)

Causal/logistic modeling to assess payer-type effects on infant survival using multi-site EHR.
- Disparities: uninsured/self-pay infants at highest risk; Medicaid associated with a ~10% survival gain; private insurance showed the strongest protection (~70%)
- Reproducible pipelines for subgroup analyses and equity-focused reporting
Health Analytics: Healthcare Utilization & Guideline Adherence

ML/statistical models on 50k+ records to evaluate compliance and long-horizon utilization patterns.
- Identified 3 distinct utilization patterns + top 5 predictors of continuous care
- Analyzed 10-year trajectories across pediatric/adult cohorts; equity-focused insights and scalable sequence models