USDA Health Score Prediction Using Deep Learning

Hariharasudhan T

🥗 USDA Health Score Prediction

This project predicts USDA Health Scores for recipes with a deep learning model, supported by data analysis and visualizations. The pipeline uses PySpark for scalable data preprocessing and TensorFlow/Keras for model training, and it examines how data quality affects prediction accuracy.

📌 Project Overview

🔍 Goal: Predict USDA Health Scores for recipes based on their ingredients and cuisine types.
📊 Approach: Analyze and clean the recipe dataset using PySpark, extract features, and train a Neural Network using TensorFlow/Keras.
📉 Challenge: The dataset (sourced from Harvard Dataverse) was significantly incomplete, especially in the ingredient listings, leading to reduced model accuracy.
🧼 Solution: Custom data cleaning pipelines were developed to handle missing and inconsistent data. Results showed improved prediction accuracy after rigorous preprocessing.
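
To make the cleaning step more concrete, here is a minimal sketch of what such a pipeline might look like. It is not the project's actual code: the column names (`ingredients`, `usda_score`), the assumption that ingredients arrive as a comma-separated string, and the helper names are all hypothetical, since the dataset schema is not shown here.

```python
import re
from typing import List, Optional

# Hypothetical column names; the real dataset schema is not shown in this write-up.
INGREDIENTS_COL = "ingredients"
SCORE_COL = "usda_score"


def normalize_ingredients(raw: Optional[str]) -> List[str]:
    """Split a comma-separated ingredient string into clean, unique tokens."""
    if raw is None:
        return []
    seen = set()
    cleaned = []
    for part in raw.split(","):
        # Lowercase, keep only letters, spaces and hyphens, collapse whitespace.
        token = re.sub(r"[^a-z\s-]", "", part.lower())
        token = re.sub(r"\s+", " ", token).strip()
        if token and token not in seen:
            seen.add(token)
            cleaned.append(token)
    return cleaned


def clean_recipes(df):
    """Apply the normalization to a Spark DataFrame and drop unusable rows."""
    # Imported lazily so the pure-Python helper above can be used without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql import types as T

    normalize_udf = F.udf(normalize_ingredients, T.ArrayType(T.StringType()))
    return (
        df.dropna(subset=[SCORE_COL])  # rows without a label cannot be trained on
        .withColumn(INGREDIENTS_COL, normalize_udf(F.col(INGREDIENTS_COL)))
        .filter(F.size(F.col(INGREDIENTS_COL)) > 0)  # no usable ingredients left
        .dropDuplicates([INGREDIENTS_COL, SCORE_COL])  # identical recipes
    )
```

For example, `normalize_ingredients(" Olive Oil, olive oil ,2 Eggs,,Salt!")` yields `["olive oil", "eggs", "salt"]`. Keeping the normalization in a plain function makes it easy to test on its own before wrapping it in a Spark UDF.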

🧰 Tech Stack

Data Processing: Apache Spark (PySpark)
Machine Learning: TensorFlow + Keras
Visualization: Matplotlib, Seaborn
Notebook Environment: Jupyter Notebooks
Dataset Source: Harvard Dataverse

📈 Key Features

⚙️ Feature engineering from raw recipe data
🧪 Data cleaning and deduplication using Spark
🍽️ Ingredient-level and cuisine-level distribution plots
🧠 Deep Neural Network model to predict USDA scores
📉 Accuracy degradation analysis due to incomplete data
🔄 Pipeline to improve dataset consistency and model performance
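
As an illustration of the feature engineering and model items above, the sketch below encodes each recipe's ingredient list as a multi-hot vector and defines a small Keras network on top of it. Everything here is an assumption made for illustration: the helper names, the layer sizes, and the choice to treat the score as a continuous value (regression) are not taken from the project itself. Cuisine type could be one-hot encoded in the same way and concatenated to the vector.

```python
from typing import Dict, Iterable, List


def build_vocabulary(recipes: Iterable[List[str]]) -> Dict[str, int]:
    """Map each distinct ingredient to a column index, in sorted order."""
    tokens = sorted({token for recipe in recipes for token in recipe})
    return {token: index for index, token in enumerate(tokens)}


def multi_hot(recipe: List[str], vocab: Dict[str, int]) -> List[float]:
    """Encode one recipe as a 0/1 vector; unknown ingredients are ignored."""
    vector = [0.0] * len(vocab)
    for token in recipe:
        index = vocab.get(token)
        if index is not None:
            vector[index] = 1.0
    return vector


def build_model(input_dim: int):
    """Small fully connected regressor (assumed architecture, not the project's)."""
    # Imported lazily so the encoding helpers work without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1),  # single continuous output: the predicted score
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```

A typical use would build the vocabulary from the cleaned training recipes, encode every recipe with `multi_hot`, and call `build_model(len(vocab))` before fitting. If the scores are actually discrete categories, the final layer and loss would need to change accordingly.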

📁 Getting Started

Requirements

Python 3.8+
PySpark
TensorFlow / Keras
NumPy
Matplotlib
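
One quick way to confirm that an environment meets these requirements is a short check script like the one below. This is only a convenience sketch: the function name is hypothetical, and the module list assumes the usual import names, with Keras used through the `tensorflow` package.

```python
import sys
from importlib.util import find_spec

# Import names for the packages listed above (Keras assumed to come via tensorflow).
REQUIRED_MODULES = ["pyspark", "tensorflow", "numpy", "matplotlib"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python 3.8 or newer is required")
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check uses `find_spec`, so it only looks the packages up and does not import them, which keeps it fast even when TensorFlow is installed.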

Posted Apr 23, 2025

Predicted USDA Health Scores using deep learning and data analytics.
