Innovation Challenge:

Data Science in Research & Development

This Challenge is CLOSED

Key Dates

Challenge Starts

October 10, 2019

Solutions Due By

October 27, 2019

Virtual Pitch

November 19, 2019

Winners Announced

November 21, 2019

From October 10 to 27, 2019, more than 1,200 innovators signed up to create a model that predicts the shelf life of snack products. PepsiCo and the New York Academy of Sciences convened a diverse panel of creative problem-solvers to help identify a specific focus area where we could generate high-impact ideas. We invited solvers from the USA, Ireland and the UK to participate in the challenge for a chance to join the PepsiCo R&D Intern Team in the summer of 2020. The winning solution, created by Pallavi Gupta, was built through successive data exploration, data carpentry and model training steps. You can learn more about the winning solution and the solver who designed it.

As studies have shown, snack products change as they age and ultimately taste different from when they were packaged. Many factors may influence how a product ages: the base ingredient, how the product is processed, the stability of the ingredients used to make it, and so on. A snack product’s shelf life is determined with a specific test protocol: as samples age, a sensory panel evaluates the aged sample against a fresh sample. The goal of this challenge was to develop a useful shelf-life model that would allow a product developer to predict shelf life from the product, process and packaging information and the storage conditions where the product will be sold.

The Challenge

PepsiCo and the New York Academy of Sciences invited solvers to develop a shelf-life model using a sample dataset. You can also learn more about the partnership between PepsiCo and the New York Academy of Sciences.

How It Works

After signing up to participate, solvers received a dataset comprising 81 individual shelf-life studies across a wide variety of snack products. Solvers worked independently to develop a model that predicts product shelf life based on the available dataset. The finalists were invited to a Virtual Pitch Event, where they presented their models to a panel of judges.

Key Dates

Challenge Starts

October 10, 2019

Solutions Due By

October 24, 2019

Virtual Pitch

November 4, 2019

Winners Announced

November 21, 2019

Grand Prize Winner

Pallavi Gupta

To resolve the issues found during data exploration, including unclassified and missing data, categorical features and an unbalanced distribution of data points, data carpentry techniques were applied before building the machine learning model. Binary categorization of the data was implemented for classification, one-hot encoding was applied to the categorical variables, and a sampling technique that accounts for data imbalance was used to select the training data. A Random Forest Classifier was then trained on a balanced fraction of the data. Snack shelf-life predictions for the entire dataset were then made using the trained model.
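The data preparation steps named above (binary categorization, one-hot encoding and balanced sampling) can be sketched with the Python standard library alone. This is a minimal illustration, not the winner's actual code: the records, the column names `base`, `packaging` and `shelf_life`, the cut-off value and the choice of undersampling as the balancing method are all assumptions, since the write-up does not give those details.

```python
import random
from collections import Counter

# Hypothetical toy records; the real challenge columns are not shown in this write-up.
studies = [
    {"base": "corn", "packaging": "foil", "shelf_life": 30},
    {"base": "potato", "packaging": "foil", "shelf_life": 26},
    {"base": "corn", "packaging": "film", "shelf_life": 24},
    {"base": "potato", "packaging": "film", "shelf_life": 22},
    {"base": "corn", "packaging": "film", "shelf_life": 12},
    {"base": "potato", "packaging": "foil", "shelf_life": 15},
]

CUTOFF = 20  # assumed cut-off, for illustration only


def binarize(value, cutoff=CUTOFF):
    """Return 1 when the value is at or above the cut-off, else 0."""
    return 1 if value >= cutoff else 0


def one_hot(rows, column):
    """Replace one categorical column with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


def balanced_sample(rows, label_key, seed=0):
    """Undersample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    n = min(len(group) for group in by_class.values())
    sample = []
    for label in sorted(by_class):
        sample.extend(rng.sample(by_class[label], n))
    rng.shuffle(sample)
    return sample


def prepare(rows):
    """Binarize the target, one-hot encode the categoricals, then balance the classes."""
    labeled = [
        {**{k: v for k, v in r.items() if k != "shelf_life"},
         "label": binarize(r["shelf_life"])}
        for r in rows
    ]
    for column in ("base", "packaging"):
        labeled = one_hot(labeled, column)
    return balanced_sample(labeled, "label")


training_rows = prepare(studies)
print(Counter(row["label"] for row in training_rows))
```

The balanced rows could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`; that step is left out here to keep the sketch dependency-free.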

Finalist

Jhansi Kurma

The target variable was converted into 1’s and 0’s, representing “Not expired” and “Expired” respectively, using 20 as the threshold value. After this transformation, missing values were replaced with “Not Mentioned/Captured” to avoid biased results. The categorical variables were then one-hot encoded so that standardization could be performed. The data set was split into an 80% training set and a 20% testing set and run through three classification techniques: Logistic Regression, Logistic Regression with CV, and SVM. Logistic Regression with CV was chosen as the best algorithm for predicting snack shelf life.
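The preprocessing described above can be sketched as follows, again with the standard library only. The threshold of 20 and the placeholder text come from the write-up; everything else is an assumption made for illustration: the records, the column names `packaging`, `storage_temp_c` and `target`, and the choice to label values at or above the threshold as "Not expired" (the write-up does not say which side of the threshold maps to which class). The model comparison itself is not shown.

```python
import random
import statistics

MISSING = "Not Mentioned/Captured"  # placeholder text quoted in the write-up
THRESHOLD = 20  # threshold quoted in the write-up; its unit is not stated

# Hypothetical records; None marks a missing categorical value.
raw = [
    {"packaging": "foil", "storage_temp_c": 20.0, "target": 26},
    {"packaging": None, "storage_temp_c": 30.0, "target": 14},
    {"packaging": "film", "storage_temp_c": 25.0, "target": 22},
    {"packaging": "foil", "storage_temp_c": 35.0, "target": 10},
    {"packaging": "film", "storage_temp_c": 22.0, "target": 24},
    {"packaging": None, "storage_temp_c": 28.0, "target": 18},
    {"packaging": "foil", "storage_temp_c": 21.0, "target": 30},
    {"packaging": "film", "storage_temp_c": 33.0, "target": 12},
    {"packaging": "foil", "storage_temp_c": 24.0, "target": 21},
    {"packaging": "film", "storage_temp_c": 27.0, "target": 19},
]


def to_label(value, threshold=THRESHOLD):
    """1 = "Not expired", 0 = "Expired" (direction of the comparison is assumed)."""
    return 1 if value >= threshold else 0


def fill_missing(rows, columns):
    """Replace None or empty values in categorical columns with the placeholder."""
    return [{**row, **{c: (row.get(c) or MISSING) for c in columns}} for row in rows]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def standardize(train, test, column):
    """Scale a numeric column to zero mean and unit variance using training statistics only."""
    values = [row[column] for row in train]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against a constant column

    def scale(rows):
        return [{**row, column: (row[column] - mean) / std} for row in rows]

    return scale(train), scale(test)


rows = [
    {**{k: v for k, v in r.items() if k != "target"}, "label": to_label(r["target"])}
    for r in raw
]
rows = fill_missing(rows, ["packaging"])
train, test = train_test_split(rows)
train, test = standardize(train, test, "storage_temp_c")
print(len(train), len(test))
```

Computing the mean and standard deviation from the training rows only keeps information from the held-out rows out of the fitted scaling, which is a common precaution rather than something the write-up specifies.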