Innovation Challenge:
Data Science in Research & Development
This Challenge is CLOSED

Key Dates
Challenge Starts
October 10, 2019
Solutions Due By
October 27, 2019
Virtual Pitch
November 19, 2019
Winners Announced
November 21, 2019

Data Science in Research & Development
From October 10th to 27th, 2019, over 1200 innovators signed up to create a model that predicts the shelf life of snack products. PepsiCo and the New York Academy of Sciences convened a diverse panel of creative problem-solvers to help identify a specific focus area where we could generate high-impact ideas. We invited solvers from the USA, Ireland and the UK to participate in the challenge for a chance to join the PepsiCo R&D Intern Team in the summer of 2020. The winning solution, created by Pallavi Gupta, was meticulously crafted following data exploration, data carpentry, and model training steps. You can learn more about the winning solution and the solver who designed it.
As studies have shown, snack products change as they age and ultimately taste different than they did when they were packaged. Many factors may influence how a product ages: the base ingredient, how the product is processed, the stability of the ingredients used to make it, and so on. A snack product’s shelf life is determined with a specific test protocol: as samples age, a sensory panel evaluates the aged sample against a fresh sample. The goal of this challenge was to develop a useful shelf-life model that would allow a product developer to predict shelf life from the product, process, and packaging information and from the storage conditions where the product will be sold.
The Challenge
PepsiCo and the New York Academy of Sciences invited solvers to develop a shelf-life model using a sample dataset. You can also learn more about the partnership between PepsiCo and the New York Academy of Sciences.
How It Works
After signing up to participate, solvers received a dataset comprising 81 individual shelf-life studies across a wide variety of snack products. Solvers worked independently to develop a model that predicts product shelf life from the available dataset. The finalists were invited to a Virtual Pitch Event, where they presented their models to a panel of judges.
Key Dates
Challenge Starts
October 10, 2019
Solutions Due By
October 24, 2019
Virtual Pitch
November 4, 2019
Winners Announced
November 21, 2019
Grand Prize Winner
Pallavi Gupta
To address the issues found during data exploration, including unclassified and missing data, categorical features, and the lack of a balanced distribution of data points, data carpentry techniques were applied before the machine learning model was constructed. Binary categorization of the data was used for classification, one-hot encoding was applied to the categorical variables, and a sampling technique that accounts for the data imbalance was used to train the model. Finally, a Random Forest classifier was trained on a balanced fraction of the data. Snack shelf-life predictions for the entire dataset were then made using the trained model.
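The preprocessing steps described above can be illustrated with a minimal sketch. It is not the winning solution's code: the toy records, the column names base and expired, and the helper functions one_hot and balanced_sample are all hypothetical, and random undersampling is only one possible reading of the sampling technique, which the summary does not specify. The Random Forest training step is omitted to keep the example dependency-free.

```python
import random
from collections import defaultdict


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


def balanced_sample(rows, label, seed=0):
    """Randomly undersample every class down to the size of the smallest class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label]].append(row)
    n = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    sample = []
    for cls in sorted(by_class):
        sample.extend(rng.sample(by_class[cls], n))
    rng.shuffle(sample)
    return sample


# Hypothetical toy records; the real dataset's columns are not given in the text.
studies = [
    {"base": "corn", "expired": 0},
    {"base": "potato", "expired": 0},
    {"base": "corn", "expired": 0},
    {"base": "wheat", "expired": 0},
    {"base": "potato", "expired": 1},
    {"base": "corn", "expired": 1},
]

encoded = one_hot(studies, "base")           # adds base=corn, base=potato, base=wheat
train = balanced_sample(encoded, "expired")  # equal number of rows per class
```

The balanced rows in train could then be passed to any classifier; undersampling discards majority-class rows, so on a dataset as small as 81 studies a different balancing strategy may be preferable.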
Finalist
Jhansi Kurma
The target variable was converted into 1’s and 0’s, representing “Not expired” and “Expired” respectively, using 20 as the threshold value. After this transformation, missing values were replaced with “Not Mentioned/Captured” to avoid biased results. The categorical variables were then one-hot encoded so that standardization could be performed. The data set was split into an 80% training set and a 20% testing set, and three classification techniques were compared: Logistic Regression, Logistic Regression with CV, and SVM. Logistic Regression with CV was chosen as the best algorithm for predicting snack shelf life.
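The labeling, missing-value, and splitting steps in this summary can be sketched as follows. This is an illustration under stated assumptions, not the finalist's code: the summary does not say which side of the threshold counts as “Not expired”, so the sketch assumes values at or above 20 do; the field names packaging, process, and score and the toy records are hypothetical. One-hot encoding, standardization, and model fitting are left out.

```python
import random

MISSING = "Not Mentioned/Captured"  # placeholder category named in the summary
THRESHOLD = 20                      # threshold from the summary; its unit is not stated


def binarize(value, threshold=THRESHOLD):
    """Return 1 ("Not expired") or 0 ("Expired").

    ASSUMPTION: values at or above the threshold are "Not expired";
    the summary does not state the direction of the comparison.
    """
    return 1 if value >= threshold else 0


def fill_missing(features):
    """Replace missing categorical values with an explicit placeholder category."""
    return {k: (MISSING if v is None or v == "" else v) for k, v in features.items()}


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy records; the real dataset's fields are not given in the text.
records = [
    {"packaging": "foil", "process": "baked", "score": 25},
    {"packaging": None, "process": "fried", "score": 12},
    {"packaging": "film", "process": "", "score": 20},
    {"packaging": "foil", "process": "fried", "score": 19},
    {"packaging": "film", "process": "baked", "score": 30},
    {"packaging": "foil", "process": "baked", "score": 8},
    {"packaging": None, "process": "baked", "score": 22},
    {"packaging": "film", "process": "fried", "score": 15},
    {"packaging": "foil", "process": "fried", "score": 27},
    {"packaging": "film", "process": "baked", "score": 5},
]

prepared = []
for rec in records:
    row = fill_missing({k: v for k, v in rec.items() if k != "score"})
    row["label"] = binarize(rec["score"])
    prepared.append(row)

train, test = train_test_split(prepared)  # 80% training, 20% testing
```

Treating missing values as their own category keeps those rows in the data instead of dropping them, which matters when only 81 studies are available.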