Projects
Browse selected projects. Click a card to view a short summary, visuals, and links.
ML · Data Engineering
Regulatory Risk Pipeline (Airflow · LLMs · T5 · LSTM)
Automated daily ingestion of GDPR/CCPA updates into a governed dataset with model-driven risk signals for compliance teams.
- Developed Airflow pipeline ingesting daily regulatory updates into BigQuery with data validation, audit logging, and lineage tracking.
- Fine-tuned T5 model to classify policy changes across multiple categories and built LSTM/Prophet models forecasting regulatory trends with under 15% error.
- Reduced manual monitoring effort by 75% through automated risk dashboards, experiment tracking, and CI/CD deployment with rollback capabilities.
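The data validation step mentioned above could look something like the following. This is a minimal sketch, not the pipeline's actual code: the field names (`regulation`, `published`, `summary`), the `validate_update` function, and the idea of returning a list of problems are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical schema: these field names are illustrative, not taken from the pipeline.
REQUIRED_FIELDS = ("regulation", "published", "summary")
KNOWN_REGULATIONS = {"GDPR", "CCPA"}


def validate_update(record: dict) -> list[str]:
    """Return a list of validation problems; an empty list means the record passes."""
    problems = []

    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing field: {field}")

    # The regulation tag must be one the dataset is governed for.
    regulation = record.get("regulation")
    if regulation and regulation not in KNOWN_REGULATIONS:
        problems.append(f"unknown regulation: {regulation}")

    # The publication date must parse as an ISO date (YYYY-MM-DD).
    published = record.get("published")
    if published:
        try:
            date.fromisoformat(published)
        except (TypeError, ValueError):
            problems.append(f"bad date: {published}")

    return problems
```

Returning every problem at once, rather than failing on the first, makes it straightforward to write the full list to an audit log before a record is rejected.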
Computer Vision · Generative AI
Phenophase Image Analysis (ResNet-50 + GANs)
Built a computer vision pipeline to classify plant leaf stages and forecast seasonal timing patterns across field sites.
- Trained deep learning model (ResNet-50) to classify leaf development phases, using data augmentation to address class imbalance.
- Validated model performance across multiple field sites to confirm that predictions held up under different environmental conditions.
- Automated model evaluation and report generation to deliver results and performance metrics to research teams.
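One simple way to use augmentation against class imbalance is to generate more augmented copies for the rarer classes. The sketch below only computes how many copies per original image each class would need to roughly match the largest class. The function name, the class labels, and the counts are hypothetical, and the project may have balanced its classes differently.

```python
def augmentation_factors(class_counts: dict[str, int]) -> dict[str, int]:
    """Augmented copies needed per original image so each class reaches
    at least the size of the largest class."""
    if any(count <= 0 for count in class_counts.values()):
        raise ValueError("every class needs at least one image")

    target = max(class_counts.values())
    factors = {}
    for name, count in class_counts.items():
        # Integer ceiling of target / count, minus the original image itself.
        factors[name] = -(-target // count) - 1
    return factors


# Hypothetical label counts, for illustration only.
counts = {"phase_a": 200, "phase_b": 1000, "phase_c": 450}
factors = augmentation_factors(counts)
```

With these made-up counts, the largest class needs no extra copies while the smallest needs several per image, which is where augmentation effort would be concentrated.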
Research
Work in progress from the ARID Lab at the University of Arizona. Click to view a brief, non-confidential summary.
Healthcare Analytics · Large-Scale EHR Data Analysis · Multi-Site Data Harmonization
Insurance at Birth & Infant Outcomes (EHR, multi-site)
Evaluated how payer type and care access influence infant survival using real-world multi-site EHR data and causal inference modeling.
- Built a large-scale cohort from CDC's CHD STAR surveillance system spanning 7 states, harmonizing EHR, claims, and vital statistics data and applying complex ICD-9/10 coding algorithms to classify 114K infants by CHD severity, insurance status, and 50+ clinical/demographic variables.
- Applied multivariate logistic regression adjusting for confounders (CHD severity, socioeconomic status, geography, clinical factors), finding that uninsured infants had 2.65 times the odds of infant mortality (95% CI: 1.74-3.92, p < 0.001) compared to privately insured infants.
- Identified critical disparities: publicly insured infants had 11-21% higher odds of organ dysfunction morbidities (neurologic, respiratory, sepsis) despite similar survival, while uninsured infants showed a paradoxical pattern consistent with survival bias, with the highest mortality but the lowest documented morbidity rates.
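For readers less familiar with how an odds ratio and its interval relate to a logistic regression coefficient, here is a small worked sketch. It assumes a standard Wald interval (coefficient plus or minus 1.96 standard errors on the log-odds scale), which may not be the exact method used in the study. The function names are illustrative, and the only numbers used are the odds ratio and interval reported above.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta: float, se: float) -> tuple[float, float, float]:
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


def coef_from_reported(odds_ratio: float, lower: float, upper: float) -> tuple[float, float]:
    """Back out an approximate coefficient and standard error from a reported OR and 95% CI."""
    beta = math.log(odds_ratio)
    # The interval is symmetric on the log scale, so its width is 2 * Z_95 * se.
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se


# Values reported above: OR 2.65, 95% CI 1.74-3.92.
beta, se = coef_from_reported(2.65, 1.74, 3.92)
or_point, ci_low, ci_high = odds_ratio_ci(beta, se)
```

Because published figures are rounded, the interval rebuilt this way will be close to, but not exactly, the reported bounds.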
Care Utilization Patterns · Risk Stratification · Longitudinal Analysis
Healthcare Utilization & Guideline Adherence
Evaluated how closely the care received by patients with congenital heart disease aligned with AHA/ACC guidance, and which factors were associated with staying in care or returning after gaps.
- Built a 10-year, multi-site CHD cohort (50K+ records) harmonizing claims, encounters, and EHR data.
- Applied clustering and regression to identify utilization patterns and predictors of adherence.
- Delivered dashboards and policy briefs highlighting three care profiles and high-risk subgroups.