
Innovation Challenge: PepsiCo Data Science

This Challenge is CLOSED

Key Dates

Challenge Began

September 25, 2020

Solutions Due

October 14, 2020

Virtual Pitch

November 9, 2020

Winners Announced

November 16, 2020

Create the Unexpected

From September 25 to October 14, 2020, more than 250 innovators from 38 U.S. states signed up for the challenge, which tasked them with developing a model that predicts how growing location, soil type, fertilizer, and crop parameters associated with growth and development affect product assessment. PepsiCo and the New York Academy of Sciences invited solvers to participate for a chance to join the PepsiCo R&D Intern Team in the summer of 2021. The winning solutions, created by Md Taufeeq Uddin and Blake Bullwinkel, were both carefully built through data exploration, data carpentry, and model training steps. You can learn more about the winning solutions here.

The Challenge

Solvers were asked to use the analytics software of their choice (including, but not limited to, R, Python, and MATLAB) to create a predictive model based on the Crop and Grain dataset provided by PepsiCo. Read the full challenge statement, including the question and background, here.

How It Works

Webinar

After signing up, solvers were invited to register for the webinar "A Recruiter's Perspective: Leveraging STEM skills to meet the needs of Industry," in which a panel of PepsiCo R&D, data science, and HR leaders discussed PepsiCo's creative approaches to data science and research and development, and how a STEM-proficient workforce is leading the company's innovation efforts.

Challenge

On September 25, 2020, participants received an exclusive link to the Crop and Grain dataset along with the challenge survey. Solvers worked independently to develop a model predicting how growing location, soil type, fertilizer, and crop parameters associated with growth and development affect product assessment.
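Growing location, soil type, and fertilizer are categorical inputs, so a regression model cannot use them directly. One common approach, and the one a grand prize winner reports using, is one-hot encoding: each category level becomes its own 0/1 indicator column. The Python sketch below assumes the records are plain dictionaries; the field names (`site`, `soil_type`, `fertilizer`, `growth_stage`) and their values are hypothetical and are not taken from the PepsiCo Crop and Grain dataset.

```python
from typing import Dict, List, Sequence


def one_hot(rows: List[Dict[str, object]], columns: Sequence[str]) -> List[Dict[str, object]]:
    """Replace each categorical column with 0/1 indicator columns named '<column>=<level>'."""
    # Collect the sorted set of observed levels for each categorical column.
    levels = {col: sorted({str(row[col]) for row in rows}) for col in columns}
    encoded = []
    for row in rows:
        # Keep all non-categorical fields (e.g. numeric measurements) unchanged.
        new_row = {k: v for k, v in row.items() if k not in columns}
        for col in columns:
            for level in levels[col]:
                new_row[f"{col}={level}"] = int(str(row[col]) == level)
        encoded.append(new_row)
    return encoded


# Hypothetical records: field names and values are illustrative only.
records = [
    {"site": "A", "soil_type": "loam", "fertilizer": "N", "growth_stage": 3},
    {"site": "B", "soil_type": "clay", "fertilizer": "none", "growth_stage": 5},
]
encoded = one_hot(records, ["site", "soil_type", "fertilizer"])
print(encoded[0])
```

This sketch keeps every level of each column. For multiple linear regression with an intercept, one level per column is usually dropped to avoid perfectly collinear dummies; `pandas.get_dummies(..., drop_first=True)` does this directly.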

Virtual Pitch

The top five finalists were invited to a virtual pitch session with the Challenge Judges and members of the Academy's challenge design team. Each finalist had 10 minutes to present their solution and model design, followed by a Q&A with the judges.


Judges

Robert Nolan

Senior Director of Digitization and Data Engineering, PepsiCo

Hua Xu

Data Scientist and Software Engineer, PepsiCo

Jason Parcon

Senior Principal Scientist, PepsiCo

James Yuan

Sr. Director of Data Science & Analytics, PepsiCo

Jingting Hui

Senior Data Scientist, PepsiCo



Grand Prize Winners

Md Taufeeq Uddin, University of South Florida

The raw data was first pre-processed to extract the information relevant to the challenge goal: assessing the quality of agricultural products using product and assessment information, geographic location, and weather data. Second, new features were created from the assessment type, weather, and location data. Finally, the data was fed to a random forest (RF) regressor to predict the assessment score. The model was validated using cross-validation and achieved strong results on normalized RMSE (root mean square error) and the Spearman rank correlation coefficient. In terms of interpretation, the RF model's feature-importance scores showed that the features created from the assessment types contributed most to predicting the target assessment score, while the product growth stage and weather data made a moderate contribution.
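The two evaluation metrics named in this solution can be computed in a few lines. The plain-Python sketch below assumes that normalized RMSE means RMSE divided by the range of the observed scores (other conventions divide by the mean or standard deviation, and the source does not say which was used). It computes the Spearman coefficient as the Pearson correlation of average ranks, so tied scores are handled. The sample scores are made up for illustration.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def normalized_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the range of the observed values (assumed convention)."""
    spread = max(y_true) - min(y_true)
    if spread == 0:
        raise ValueError("observed values have zero range")
    return rmse(y_true, y_pred) / spread


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))


# Hypothetical observed and predicted assessment scores.
y_true = [3.0, 5.0, 2.0, 8.0]
y_pred = [2.5, 5.5, 2.0, 7.0]
print(normalized_rmse(y_true, y_pred), spearman(y_true, y_pred))
```

In practice, `scipy.stats.spearmanr` computes the same rank correlation, and scikit-learn's `RandomForestRegressor` together with `cross_val_score` covers the model and cross-validation steps.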

Blake Bullwinkel, Harvard University

To predict crop assessment scores as accurately as possible, several regression models were considered, including multiple linear regression, decision tree, and random forest models. First, the crop and grain data was combined with site-specific data, and weather data was incorporated using rolling averages. Categorical variables were converted to dummy variables using one-hot encoding. Second, the pre-processed data was split into training (80%) and test (20%) sets, and a baseline multiple linear regression model using all the features achieved an R² of 0.8357 on the test set. Adding interaction terms gave marginal improvements, and applying a Yeo-Johnson transformation to the response increased the test R² to 0.8886. Finally, decision tree and random forest models tuned with 10-fold cross-validation performed better still, with the random forest achieving a test R² of 0.9815. All models indicated that assessment type was the most important feature for predicting assessment score.
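The Yeo-Johnson transformation mentioned in this solution is a power transform that, unlike Box-Cox, also accepts zero and negative values. A minimal sketch follows, with the transform, its inverse (needed to map predictions back to the original score scale), and the R² metric used to compare the models. It assumes a fixed, hypothetical λ (`lam`); in practice λ is usually estimated by maximum likelihood, for example with `scipy.stats.yeojohnson` or scikit-learn's `PowerTransformer(method="yeo-johnson")`.

```python
import math
from typing import Sequence

EPS = 1e-12  # tolerance for the special cases lam == 0 and lam == 2


def yeo_johnson(y: float, lam: float) -> float:
    """Yeo-Johnson transform of a single value for a fixed lambda."""
    if y >= 0:
        if abs(lam) < EPS:
            return math.log1p(y)
        return ((y + 1) ** lam - 1) / lam
    if abs(lam - 2) < EPS:
        return -math.log1p(-y)
    return -((1 - y) ** (2 - lam) - 1) / (2 - lam)


def yeo_johnson_inverse(x: float, lam: float) -> float:
    """Inverse transform; the transform preserves sign, so x >= 0 maps back to y >= 0."""
    if x >= 0:
        if abs(lam) < EPS:
            return math.expm1(x)
        return (x * lam + 1) ** (1 / lam) - 1
    if abs(lam - 2) < EPS:
        return -math.expm1(-x)
    return 1 - (1 - (2 - lam) * x) ** (1 / (2 - lam))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical response values and lambda, for illustration only.
scores = [-1.0, 0.0, 2.0, 5.0]
lam = 0.5
transformed = [yeo_johnson(y, lam) for y in scores]
restored = [yeo_johnson_inverse(x, lam) for x in transformed]
print(transformed, restored)
```

A model would be fit on `transformed`, and its predictions passed through `yeo_johnson_inverse` before being compared with observed scores on the original scale.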

Finalists

  • Christopher Cammilleri, Rensselaer Polytechnic Institute
  • Alexander Shen, University of Michigan
  • Sam Tauke, University of Wisconsin-Madison
