USDA Health Score Prediction Using Deep Learning by Hariharasudhan TUSDA Health Score Prediction Using Deep Learning by Hariharasudhan T

USDA Health Score Prediction Using Deep Learning

Hariharasudhan T

Data Analyst

Data Scientist

ML Engineer

Matplotlib

PySpark

TensorFlow

Food & Beverage

🥗 USDA Health Score Prediction

This project focuses on predicting USDA Health Scores for recipes using deep learning models, alongside comprehensive data analytics and visualizations. Built using PySpark for scalable data preprocessing and TensorFlow/Keras for model training, this pipeline explores the impact of data quality on prediction accuracy.

📌 Project Overview

🔍 Goal: Predict USDA Health Scores for recipes based on their ingredients and cuisine types.

📊 Approach: Analyze and clean the recipe dataset using PySpark, extract features, and train a Neural Network using TensorFlow/Keras.

📉 Challenge: The dataset (sourced from Harvard Dataverse) was significantly incomplete, especially in the ingredient listings, leading to reduced model accuracy.

🧼 Solution: Custom data cleaning pipelines were developed to handle missing and inconsistent data. Results showed improved prediction accuracy after rigorous preprocessing.

🧰 Tech Stack

Data Processing: Apache Spark (PySpark)

Machine Learning: TensorFlow + Keras

Visualization: Matplotlib, Seaborn

Notebook Environment: Jupyter Notebooks

Dataset Source: Harvard Dataverse

📈 Key Features

⚙️ Feature engineering from raw recipe data

🧪 Data cleaning and deduplication using Spark

🍽️ Ingredient-level and cuisine-level distribution plots

🧠 Deep Neural Network model to predict USDA scores

📉 Accuracy degradation analysis due to incomplete data

🔄 Pipeline to improve dataset consistency and model performance

📁 Getting Started

Requirements

Python 3.8+

PySpark

TensorFlow / Keras

NumPy

Matplotlib

Like this project

Posted Apr 23, 2025

Predicted USDA Health Scores using deep learning and data analytics.

Likes

Views