Experience



ARID Lab, DEPARTMENT OF PEDIATRICS
Data Science Manager       March 2024 - Present


1. Health Data Pipeline Management:

Spearheaded the development and maintenance of a robust Health Data Pipeline, overseeing the onboarding of clinical sites and their integration into the system. Additionally, collaborated with the University of Arizona's Center for Biomedical Informatics and Biostatistics (CB2) on various population health projects.

2. Process Optimization:

Established standardized protocols for data processing through CB2, streamlining data extraction, cleaning, and consolidation, and made data-driven decisions to improve pipeline efficiency.

3. Data Transformation and Quality Assurance:

Led data transformation efforts to conform to the OHDSI OMOP Common Data Model (CDM), implementing rigorous quality checks to ensure data integrity. Upheld regulatory compliance and best practices for data security.
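
The flavor of such quality checks can be expressed as a minimal pandas sketch against OMOP CDM tables; the file names and the specific rules shown here are illustrative, not the pipeline's actual rule set:

```python
import pandas as pd

# Hypothetical extracts of two OMOP CDM tables.
person = pd.read_csv("person.csv")
visit_occurrence = pd.read_csv("visit_occurrence.csv")

def run_quality_checks(person: pd.DataFrame, visits: pd.DataFrame) -> dict:
    """Return a mapping of check name -> number of violating rows."""
    return {
        # Required field: every person record needs a year_of_birth.
        "person_missing_year_of_birth": person["year_of_birth"].isna().sum(),
        # Primary key integrity: person_id must be unique.
        "duplicate_person_ids": person["person_id"].duplicated().sum(),
        # Referential integrity: every visit must point to a known person.
        "orphan_visits": (~visits["person_id"].isin(person["person_id"])).sum(),
        # Plausibility: a visit cannot end before it starts.
        "visit_end_before_start": (
            pd.to_datetime(visits["visit_end_date"])
            < pd.to_datetime(visits["visit_start_date"])
        ).sum(),
    }

failures = {k: v for k, v in run_quality_checks(person, visit_occurrence).items() if v}
if failures:
    raise ValueError(f"OMOP quality checks failed: {failures}")
```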

4. Project Coordination and Communication:

Facilitated meetings and maintained clear communication with stakeholders, coordinating activities with internal and external partners. Cultivated collaborative relationships with clinical sites and research partners.

5. Continuous Improvement and Innovation:

Enhanced processes and methodologies by staying abreast of emerging trends in data science and healthcare informatics. Fostered a culture of creativity and innovation within the team.



UNIVERSITY OF ARIZONA, DEPARTMENT OF PEDIATRICS
Graduate Research Assistant       February 2023 - December 2023


1. Thorough Data Extraction and Standardization in REDCap:

Extract data from various sources, transform it into standardized formats, and build analytical project databases in REDCap, improving data accessibility and reducing data preparation time by 15%. This demands a meticulous approach so that data is accurately captured and formatted for further analysis.
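
A minimal sketch of this kind of standardize-and-import step, using the PyCap client for the REDCap API; the source file, column names, project URL, and token below are all hypothetical:

```python
import pandas as pd
from redcap import Project  # PyCap client for the REDCap API

# Illustrative source extract with inconsistent field names and formats.
raw = pd.read_csv("site_export.csv")

# Standardize to the REDCap instrument's field names and formats.
records = (
    raw.rename(columns={"MRN": "record_id", "DOB": "dob", "Sex": "sex"})
       .assign(dob=lambda d: pd.to_datetime(d["dob"]).dt.strftime("%Y-%m-%d"))
       .loc[:, ["record_id", "dob", "sex"]]
)

# Hypothetical project URL and API token, issued by the REDCap admin.
project = Project("https://redcap.example.edu/api/", "API_TOKEN")
response = project.import_records(records.to_dict(orient="records"))
print("REDCap import response:", response)
```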

2. Optimization of Database Programs for Seamless Querying:

Optimize database programs for efficient data querying, improving the performance of database systems so users can retrieve the information they need quickly and without hitting performance constraints.
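
As one illustration of this kind of tuning, the PostgreSQL-flavored sketch below inspects a slow query's plan and adds a matching index; the connection string, table, and columns are hypothetical:

```python
from sqlalchemy import create_engine, text

# Hypothetical connection string; adjust for the actual database.
engine = create_engine("postgresql+psycopg2://user:pass@localhost/studydb")

slow_query = (
    "SELECT * FROM lab_results "
    "WHERE patient_id = :pid AND result_date >= :start"
)

with engine.begin() as conn:
    # Inspect the plan first: a sequential scan on a large table
    # usually signals a missing index.
    plan = conn.execute(
        text("EXPLAIN ANALYZE " + slow_query),
        {"pid": 42, "start": "2023-01-01"},
    )
    for row in plan:
        print(row[0])

    # A composite index matching the filter columns turns the
    # sequential scan into an index lookup.
    conn.execute(text(
        "CREATE INDEX IF NOT EXISTS ix_lab_patient_date "
        "ON lab_results (patient_id, result_date)"
    ))
```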

3. Precise Data Cleaning and Linkage Protocols with Comprehensive Documentation:

Execute data cleaning and data linkage protocols with a high degree of precision, identifying and rectifying inaccuracies, inconsistencies, and missing values within datasets. Also maintain detailed work logs documenting every step of the data refinement process, ensuring reproducibility and bolstering the credibility of subsequent analyses.
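
A condensed sketch of that clean-then-link pattern with an accompanying work log, assuming pandas and deterministic linkage on normalized identifiers; file and column names are illustrative:

```python
import logging
import pandas as pd

# Work log documenting each refinement step, for reproducibility.
logging.basicConfig(filename="cleaning_log.txt", level=logging.INFO,
                    format="%(asctime)s %(message)s")
log = logging.getLogger(__name__)

def clean(df: pd.DataFrame) -> pd.DataFrame:
    before = len(df)
    df = df.drop_duplicates(subset="mrn").copy()
    df["last_name"] = df["last_name"].str.strip().str.upper()
    df["dob"] = pd.to_datetime(df["dob"], errors="coerce")
    df = df.dropna(subset=["mrn", "dob"])
    log.info("clean: %d -> %d rows", before, len(df))
    return df

clinic = clean(pd.read_csv("clinic.csv"))
registry = clean(pd.read_csv("registry.csv"))

# Deterministic linkage on normalized identifiers; the indicator column
# lets linkage coverage be audited after the merge.
linked = clinic.merge(registry, on=["mrn", "dob"], how="left",
                      suffixes=("_clinic", "_registry"), indicator=True)
log.info("linkage: %.1f%% matched", (linked["_merge"] == "both").mean() * 100)
```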

4. Creation of Data Pipelines Using Python and Amazon Athena:

Leverage Python scripting and Amazon Athena to design and implement data pipelines that transfer data from MariaDB to PostgreSQL, transforming and mapping the data to adhere to OMOP (Observational Medical Outcomes Partnership) table structures. Enforce security protocols for data transfers, increasing transfer speed by 30% over previous methods while ensuring system integrity and maintaining 100% compliance with data protection regulations.
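
The MariaDB-to-PostgreSQL leg of such a pipeline might look like the following sketch; connection strings, the source schema, and the concept mapping are illustrative, and the Athena query layer is omitted:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection strings; real credentials come from a secrets
# store, and transfers run over TLS per the project's security protocols.
source = create_engine("mysql+pymysql://user:pass@mariadb-host/clinical")
target = create_engine("postgresql+psycopg2://user:pass@pg-host/omop")

# Illustrative mapping from source codes to OMOP standard concept IDs.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}

# Extract from the source schema in chunks to bound memory use.
for chunk in pd.read_sql("SELECT patient_id, sex, birth_year FROM patients",
                         source, chunksize=50_000):
    # Transform: rename and map columns to the OMOP person table structure.
    person = pd.DataFrame({
        "person_id": chunk["patient_id"],
        "gender_concept_id": chunk["sex"].map(GENDER_CONCEPTS)
                                         .fillna(0).astype(int),  # 0 = unknown
        "year_of_birth": chunk["birth_year"],
    })
    # Load into the OMOP schema on PostgreSQL.
    person.to_sql("person", target, schema="cdm",
                  if_exists="append", index=False)
```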

5. Efficient Integration of MTurk Tasks and REDCap Surveys with Data Analysis:

Manage the integration of MTurk tasks and REDCap surveys into a cohesive workflow. Conduct thorough data analysis in R after survey completion, supporting faster comprehension and informed decision-making and reducing post-survey processing time by 20%.
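
The integration side of that workflow can be sketched in Python with boto3 and PyCap; the survey URL, API token, HIT parameters, and the survey_complete field are hypothetical, and the downstream analysis itself was done in R:

```python
import boto3
from redcap import Project

# Production MTurk endpoint by default; pass the sandbox endpoint_url
# when testing. Region and credentials come from the AWS environment.
mturk = boto3.client("mturk", region_name="us-east-1")
redcap = Project("https://redcap.example.edu/api/", "API_TOKEN")

# Publish a HIT that sends workers to the REDCap survey link.
survey_url = "https://redcap.example.edu/surveys/?s=ABC123"
question_xml = f"""<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>{survey_url}</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Short health survey",
    Description="Complete a brief survey hosted on REDCap.",
    Reward="1.00",
    MaxAssignments=100,
    AssignmentDurationInSeconds=1800,
    LifetimeInSeconds=86400,
    Question=question_xml,
)
print("HIT:", hit["HIT"]["HITId"])

# After the survey closes, pull completed responses for analysis
# (REDCap marks completed instruments with status 2).
completed = redcap.export_records(
    format_type="df", filter_logic="[survey_complete] = '2'"
)
```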

6. Statistical Analysis of Patient Data to Uncover Care Patterns and Demographics:

Conduct statistical tests and logistic regression on patient data to uncover care patterns, demographic trends, and factors affecting patient engagement. These analyses surface correlations and potential causal links, informing healthcare strategies and improving patient care.
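
A minimal logistic-regression sketch in that spirit, using statsmodels; the dataset, outcome, and predictor names are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical patient-level extract.
df = pd.read_csv("patients.csv")

# Binary outcome: 1 if the patient engaged with follow-up care.
y = df["engaged"]

# Predictors: demographics and care factors, with categoricals dummy-coded.
X = pd.get_dummies(
    df[["age", "sex", "insurance_type", "num_prior_visits"]],
    drop_first=True, dtype=float,
)
X = sm.add_constant(X)

model = sm.Logit(y, X).fit()
print(model.summary())

# Odds ratios with 95% CIs are easier to interpret than raw coefficients.
odds = np.exp(model.params).rename("OR")
ci = np.exp(model.conf_int())
print(pd.concat([odds, ci], axis=1))
```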



TATA CONSULTANCY SERVICES
Systems Engineer       March 2018 - July 2022


ETL experience with Informatica PowerCenter 9.x:

1. Utilized Informatica PowerExchange and PowerCenter 9.x for data extraction, transformation, and loading from a range of sources, including mainframes, flat files, Teradata databases, and EDW systems.

2. Developed intricate ETL mappings, worklets, and reusable transformations, including filters, expressions, joiners, and aggregators, ensuring precise and purposeful data transformation with a 25% reduction in deployment time and improved efficiency.

3. Established and maintained dynamic data pipelines to support business intelligence, reporting, and analytics needs, ensuring smooth data integration and processing and delivering a 30% increase in efficiency.

UNIX and shell scripting for validation testing:

1. Developed UNIX shell scripts tailored to specific validation testing requirements and leveraged them to validate data integrity and accuracy within complex datasets. Additionally, customized PL/SQL scripts to further enhance the validation process, ensuring comprehensive and reliable testing outcomes.
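
The original checks were shell and PL/SQL scripts; the core source-versus-target validation logic they implemented can be sketched in Python as follows, with hypothetical connection strings and table names:

```python
from sqlalchemy import create_engine, text

# Hypothetical connections to the staging source and the warehouse target.
source = create_engine("postgresql+psycopg2://user:pass@src-host/stage")
target = create_engine("postgresql+psycopg2://user:pass@tgt-host/warehouse")

CHECKS = [
    # (description, SQL against source, SQL against target)
    ("row counts match",
     "SELECT COUNT(*) FROM stage.orders",
     "SELECT COUNT(*) FROM dw.fact_orders"),
    ("amount totals match",
     "SELECT COALESCE(SUM(amount), 0) FROM stage.orders",
     "SELECT COALESCE(SUM(order_amount), 0) FROM dw.fact_orders"),
]

for name, src_sql, tgt_sql in CHECKS:
    with source.connect() as s, target.connect() as t:
        src_val = s.execute(text(src_sql)).scalar()
        tgt_val = t.execute(text(tgt_sql)).scalar()
    status = "PASS" if src_val == tgt_val else f"FAIL ({src_val} vs {tgt_val})"
    print(f"{name}: {status}")
```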

Performance tuning and optimization:

1. Identified and addressed performance bottlenecks within long-running CI/CD jobs. Employed advanced optimization techniques at various levels—source, target, mapping, and session—ensuring streamlined processes and efficient data flows.

2. Leveraged cloud computing, particularly AWS, to optimize ETL operations, achieving a 50% reduction in processing time alongside improved scalability.

3. Utilized ETL tools to streamline data integration workflows, reducing errors by 28%, improving data quality, and delivering 2.5% cost savings on infrastructure.

4. Engineered a retrofit solution leveraging contextual knowledge, integrating 2.5 million customers for the client while boosting performance and code execution speed by 30%.

Database object migration:

1. Managed the migration of crucial database objects across a range of environments, including Development, Testing, UAT, and Production. Ensured a smooth transition of these objects, maintaining data integrity and consistency throughout the lifecycle. This approach facilitated multi-stage deployment, enhancing robustness and reliability.

Documentation and Project Tracking:

1. Collaborated with onshore and offshore data stewards and application development leads to track projects through JIRA, increasing project visibility and coordination efficiency by 20%. This joint effort enabled effective communication, task handling, and progress oversight, resulting in coordinated project implementation.

2. Managed ETL code repositories using continuous integration practices and carried out comprehensive code reviews to maintain code quality and compliance with guidelines, ensuring the reliability and maintainability of ETL processes and yielding a 15% decrease in defects and a faster development cycle.

3. Articulated findings and insights to stakeholders in a clear, impactful manner, facilitating informed decision-making and fostering a shared understanding of project progress and outcomes.

Leadership:

1. Offered valuable technical guidance and oversaw review processes in a dynamic and fast-paced setting, demonstrating the ability to work effectively with minimal supervision. Ensured high-quality outcomes and maintained governance standards despite the challenging environment.

2. Led a 12-member ETL development team, driving the design and implementation of data transformation processes for precise data integration into the warehouse, while prioritizing data quality and performance enhancements. Additionally, adeptly communicated project status updates to foster transparency and cohesion within the team, contributing to efficient workflows.

3. Provided expert guidance and comprehensive training to junior ETL developers. Covered essential ETL development methodologies, data modeling best practices, and fundamental data integration concepts. This mentoring effort empowered the team with essential skills and knowledge for optimized data workflows.