Projects
Data Science & Machine Learning:
- Identifying Leaf Phenology
- Breast Cancer Prediction
- Credit Card Fraud Detection
- Salary Prediction
- Image Classifier Using CNN
Data Engineering & Data Analytics:
- House Price Profiler using Snowflake Database
- YouTube Data Pipeline using Apache Airflow
- Uber Data Analytics
PHENOPHASE IMAGE ANALYSIS
December 2023
The intricate relationship between vegetation phenology and ecosystem functions shapes key ecological processes, which are influenced by climate-induced asynchrony. Phenological events, known as phenophases, are tied to seasonal transitions and precipitation, and disruptions to their timing can lead to consequences such as food scarcity and insect population growth. Despite existing predictive models, uncertainties persist in aligning phenophases across species.
Leaf growth and dormancy phases for different years:
GAN architecture:
To address this, tools like PhenoCam monitor phenological variations, creating demand for automated analysis methods. This project develops tools to recognize phenophase changes and predictive models for leaf phenology in deciduous broadleaf forests. Unlike single-site methods, it improves prediction accuracy across multiple sites, focusing on the start (SOS) and end (EOS) dates of leaf growth derived from PhenoCam images. The workflow covers data labeling, preprocessing, development of a ResNet50 CNN model, and the incorporation of Generative Adversarial Networks (GANs), with the aim of anticipating key ecological events related to leaf growth.
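For intuition about what SOS and EOS mean numerically, PhenoCam images are commonly summarized by the green chromatic coordinate, GCC = G / (R + G + B), and one common rule marks SOS and EOS where the seasonal GCC curve crosses 50% of its amplitude. The sketch below shows that threshold rule on a daily series. It is a swapped-in baseline (useful, for example, when labeling data), not the project's ResNet50/GAN pipeline; the function names, the 50% fraction, and the toy series are assumptions.

```python
from typing import Optional, Sequence, Tuple


def gcc(mean_r: float, mean_g: float, mean_b: float) -> float:
    """Green chromatic coordinate of an image region: G / (R + G + B)."""
    total = mean_r + mean_g + mean_b
    return mean_g / total if total else 0.0


def sos_eos(daily_gcc: Sequence[float],
            fraction: float = 0.5) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper: return (SOS, EOS) as day indices.

    SOS is the first day GCC reaches min + fraction * (max - min);
    EOS is the last day it is still at or above that threshold.
    """
    lo, hi = min(daily_gcc), max(daily_gcc)
    if hi == lo:
        return None, None  # flat series: no seasonal signal to threshold
    threshold = lo + fraction * (hi - lo)
    above = [day for day, value in enumerate(daily_gcc) if value >= threshold]
    return above[0], above[-1]


# Toy daily series rising into leaf-out and falling into senescence.
series = [0.33, 0.33, 0.34, 0.38, 0.42, 0.42, 0.41, 0.37, 0.34, 0.33]
print(sos_eos(series))  # (3, 6)
```

Raw daily values are noisy (lighting, snow, camera shifts), so a real pipeline would smooth the series before thresholding.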
YOUTUBE DATA PIPELINE USING APACHE AIRFLOW
October 2023
This YouTube Data ETL project with Airflow automates the extraction, transformation, and loading of data for a given set of YouTube channel IDs. It pulls data from the YouTube Data API, transforms it, and stores it in destinations such as Amazon S3. Apache Airflow schedules and orchestrates the ETL process, keeping the data up to date and reliable for analysis.
The project consists of components for data extraction, transformation, loading, Airflow integration, and error handling. Its key objectives are to automate data retrieval, ensure data cleanliness, offer flexibility in storage, provide a dependable Airflow-orchestrated ETL process, and offer a customizable framework. In short, it simplifies ETL for YouTube channel data, benefiting data enthusiasts, analysts, and engineers.
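The transformation step can be sketched as a pure function over the API's JSON, which keeps it testable outside Airflow. This minimal sketch assumes a `channels.list` call with `part=snippet,statistics`; the function names, output columns, and sample channel ID are hypothetical. In a DAG, a function like this would typically run inside a `PythonOperator` task between the extract task and the load (for example, S3 upload) task.

```python
import csv
import io
from typing import Any, Dict, List


def transform_channels(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a channels.list response (part=snippet,statistics) into rows."""
    rows = []
    for item in response.get("items", []):
        stats = item.get("statistics", {})
        rows.append({
            "channel_id": item["id"],
            "title": item.get("snippet", {}).get("title", "").strip(),
            # The API returns counts as strings; cast them so they can be aggregated.
            # Missing fields default to 0.
            "views": int(stats.get("viewCount", 0)),
            "subscribers": int(stats.get("subscriberCount", 0)),
            "videos": int(stats.get("videoCount", 0)),
        })
    return rows


def to_csv(rows: List[Dict[str, Any]]) -> str:
    """Serialize rows as CSV text, e.g. the body of an object uploaded to S3."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample in the shape of an API response.
sample = {"items": [{"id": "UC_example", "snippet": {"title": " Demo Channel "},
                     "statistics": {"viewCount": "1000", "subscriberCount": "50",
                                    "videoCount": "7"}}]}
print(to_csv(transform_channels(sample)))
```

Keeping extraction, transformation, and loading as separate functions is what lets each become its own Airflow task with independent retries.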
HOUSE PRICE PROFILER USING SNOWFLAKE DATABASE
October 2023
Built a House Price Profiler using the Snowflake database, extracting over 60,000 records from the Otodom website. Using Bright Data for scraping, the project achieved 95% data extraction accuracy. The data was stored in Snowflake, where flattening operations reduced query response times by 40%. The dataset was enriched by converting latitude and longitude to physical addresses for all entries and by translating 95% of the Polish text into English using Google Translate.
The project answered 11 key business questions, yielding actionable insights that improved stakeholders' decision-making. It highlights skills in data extraction, transformation, and database management, along with a holistic approach to problem-solving in data analysis.
BREAST CANCER PREDICTION
October 2023
Executed a breast cancer prediction project, building two models: a Logistic Regression model with 92.9% accuracy and a better-performing Neural Network model with 97.3% accuracy. The project was motivated by a commitment to improving early detection. By combining different modeling techniques with careful parameter optimization, it highlights the potential of data-driven solutions in healthcare and the importance of accurate early detection.
For the Logistic Regression model, used historical patient data, selected relevant features, performed thorough data preprocessing, and fine-tuned model parameters, reaching 92.9% prediction accuracy. For the Neural Network model, implemented the network with TensorFlow and Keras, experimented with multiple architectures and activation functions, closely monitored training progress, and adjusted hyperparameters to reach 97.3% accuracy, a clear improvement in predictive performance.
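At its core, the logistic regression model fits weights by minimizing log-loss. The sketch below shows that mechanism in plain Python with batch gradient descent; the toy data, learning rate, and epoch count are assumptions, and a real pipeline would more likely call a library estimator such as scikit-learn's `LogisticRegression` on standardized features.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 1000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # d(log-loss)/dz = sigmoid(z) - y for each sample
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, X: Sequence[Sequence[float]],
            threshold: float = 0.5) -> List[int]:
    """Label 1 when the predicted probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, already-standardized features (0 = benign, 1 = malignant); not real patient data.
X = [[-2.0, -1.0], [-1.5, -2.0], [-1.0, -1.0], [1.0, 1.0], [1.5, 2.0], [2.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(accuracy(y, predict(w, b, X)))
```

In a medical setting, the 0.5 threshold is a design choice: lowering it trades more false positives for fewer missed cases.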
UBER DATA ANALYTICS
August 2023
Executed an end-to-end data engineering project on a real-world Uber dataset to demonstrate my expertise in data handling and analysis. Used Google Cloud Storage to manage data through the extraction and transformation stages, and processed and loaded the data with Mage ETL at an average extraction rate of 500 records per second.
Ran complex analytical queries on the enriched dataset in BigQuery to derive meaningful insights from a large dataset. Developed and optimized queries that consistently returned results within seconds, supporting efficient data-driven decision-making, and answered key business questions such as demand patterns and peak hours.
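A question such as "which hours see the most pickups?" reduces to grouping trips by pickup hour and counting them. In BigQuery that is a `GROUP BY` on `EXTRACT(HOUR FROM ...)`; the plain-Python sketch below shows the same aggregation, assuming pickup timestamps arrive as `YYYY-MM-DD HH:MM:SS` strings. The timestamp format, function names, and sample data are assumptions.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def trips_by_hour(pickup_times: Iterable[str],
                  fmt: str = "%Y-%m-%d %H:%M:%S") -> Counter:
    """Count trips per pickup hour (0-23), like GROUP BY EXTRACT(HOUR FROM ...)."""
    return Counter(datetime.strptime(t, fmt).hour for t in pickup_times)


def peak_hours(counts: Counter, top: int = 3) -> List[Tuple[int, int]]:
    """Return the `top` busiest hours as (hour, trip_count), busiest first."""
    return counts.most_common(top)


# Made-up pickup timestamps for illustration.
pickups = ["2016-03-01 08:05:00", "2016-03-01 08:40:00", "2016-03-01 17:10:00",
           "2016-03-01 17:30:00", "2016-03-01 17:55:00", "2016-03-01 23:00:00"]
print(peak_hours(trips_by_hour(pickups), top=2))  # [(17, 3), (8, 2)]
```

On a large table, pushing this aggregation into the warehouse query rather than pulling rows into Python is what keeps response times in seconds.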
Built an interactive, intuitive Looker dashboard that turns raw data into actionable visualizations. The dashboard supports dynamic exploration of key performance indicators and increased data accessibility for stakeholders by 30%. The project strengthened my technical skills and demonstrated my ability to build user-friendly data visualization tools.
Looker Dashboard
CREDIT CARD FRAUD DETECTION
August 2023
In the realm of digital transactions, security is paramount. My project focuses on creating a robust credit card fraud detection model using advanced machine learning algorithms.
Several machine learning algorithms were employed, including Decision Tree, Logistic Regression, Random Forest, and Naive Bayes.
The class imbalance in the data was then tackled with the SMOTE technique, and each algorithm was fine-tuned for optimal performance, yielding a 10% increase in accuracy; the best model reached an accuracy of 99%.
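SMOTE balances the classes by synthesizing new minority (fraud) samples instead of duplicating existing ones: each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbours. The sketch below implements that core idea in plain Python for illustration; the function name, parameters, and toy points are assumptions. In practice one would use `imblearn.over_sampling.SMOTE` and oversample only the training split, so that no synthetic points leak into evaluation.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate n_synthetic points by interpolating a randomly chosen minority
    sample toward one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x within the minority class (excluding x itself).
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment x -> nn, in [0, 1)
        synthetic.append([xa + gap * (na - xa) for xa, na in zip(x, nn)])
    return synthetic


# Toy minority-class points (e.g. scaled transaction features).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2, seed=42)
```

Because accuracy is dominated by the majority class on imbalanced fraud data, precision, recall, or F1 on the fraud class are usually reported alongside it.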
SALARY PREDICTION
July 2023
The objective of this project is to build a model that predicts salary from years of experience.
Through careful refinement and the use of Gradient Descent optimization, an efficient model was built. The key result is a substantial reduction in the Mean Squared Error (MSE): the model rapidly brought the MSE down from 91.2% to 6.3%.
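Gradient descent here means repeatedly nudging the slope and intercept against the gradient of the MSE, using dMSE/dw = (2/n) Σ (w·x_i + b − y_i)·x_i and dMSE/db = (2/n) Σ (w·x_i + b − y_i). The minimal sketch below assumes a plain linear model, salary ≈ w · years + b; the learning rate, epoch count, and toy data are illustrative assumptions, not the project's settings.

```python
from typing import List, Sequence, Tuple


def mse(w: float, b: float, x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error of the line y = w * x + b."""
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


def fit_gd(x: Sequence[float], y: Sequence[float], lr: float = 0.01,
           epochs: int = 5000) -> Tuple[float, float, List[float]]:
    """Fit y ≈ w * x + b by batch gradient descent; also return the MSE per epoch."""
    n = len(x)
    w = b = 0.0
    history = []
    for _ in range(epochs):
        residuals = [w * xi + b - yi for xi, yi in zip(x, y)]
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2)
        dw = 2.0 / n * sum(r * xi for r, xi in zip(residuals, x))
        db = 2.0 / n * sum(residuals)
        w -= lr * dw
        b -= lr * db
        history.append(mse(w, b, x, y))
    return w, b, history


# Toy data: salary (arbitrary units) = 2 * years + 1, not the project's dataset.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [3.0, 5.0, 7.0, 9.0, 11.0]
w, b, history = fit_gd(years, salary)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

The learning rate has to be small relative to the spread of the feature: on this toy data a step much larger than about 0.08 makes the updates diverge, which is why features are often scaled first.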
DESIGN AND IMPLEMENTATION OF AN IMAGE CLASSIFIER USING CNN
December 2022
This project explored the potential of deep convolutional networks for large-scale image classification, reaching an accuracy of 91.21%.
It was built with Python libraries including NumPy, Pandas, and PyTorch to handle the complexities of image classification.