

Key Dates

Solutions Due

October 7, 2020

Virtual Pitch

November 9, 2020

Winners Announced

November 16, 2020

The challenge is closed. 

Background

Cereal-based products rely on raw grain from crops grown by farmers. Growing location and environmental conditions are known to influence the physical and compositional properties of the grain, which in turn can affect processing and final product quality. The goal of this exercise is to create a model that can predict the effect of growing location, soil type, fertilizer, and crop parameters associated with growth and development on product assessment.


Data

The Crop and Grain dataset consists of field trials conducted over several years in 5 different locations identified by the Site ID column. Each of the trials had several replicates on which 28 assessment types were measured. Note that the year appended within the Site ID is the harvest year.

The other columns in the dataset are:

  • Growth Stage: The crop growth stage on the date that the assessments were conducted; values are ordinal.
  • Variety: B and M are the crop varieties grown in the trials.
  • Assessment Type: The types of assessments (or KPIs - key performance indicators) measured on the crop.
  • Assessment Date: The dates when the assessment types were evaluated.
  • Assessment Score: The evaluation scores on each KPI.

Site metadata are also provided and include:

  • Latitude and Elevation of the trial sites
  • Sowing and Harvest dates of the trial crops
  • Relevant soil parameters and fertilizer quantities; Soil Parameter A takes values at the nominal level (unordered categories).
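
The column descriptions above suggest three encoding decisions: the harvest year has to be separated from the Site ID, Growth Stage should keep its order, and nominal values (Variety, Soil Parameter A) should not be given an order. The sketch below is one possible way to do this in plain Python. It assumes that the year is the last four-digit run in the Site ID and that Growth Stage is stored as a numeric code; the example values ("Site3_2016", "clay", "loam") and the helper names are hypothetical and not taken from the dataset.

```python
import re


def split_site_id(site_id):
    """Split a Site ID into (site, harvest_year).

    Assumption: the harvest year is the last four-digit run in the ID.
    The real column layout may differ.
    """
    match = re.search(r"(\d{4})\D*$", site_id)
    if match is None:
        raise ValueError(f"no four-digit year found in {site_id!r}")
    site = site_id[:match.start()].rstrip(" _-")
    return site, int(match.group(1))


def encode_row(row, soil_a, soil_levels, varieties=("B", "M")):
    """Turn one trial row plus its site's Soil Parameter A into features."""
    site, year = split_site_id(row["Site ID"])
    features = {
        "site": site,
        "harvest_year": year,
        # Ordinal: keep a single integer so the order is preserved
        # (assumes the stage is stored as a numeric code).
        "growth_stage": int(row["Growth Stage"]),
    }
    # Nominal: one indicator per level, so no order is implied.
    for level in varieties:
        features[f"variety_{level}"] = int(row["Variety"] == level)
    for level in soil_levels:
        features[f"soil_a_{level}"] = int(soil_a == level)
    return features


# Hypothetical example values, for illustration only.
example = encode_row(
    {"Site ID": "Site3_2016", "Growth Stage": "4", "Variety": "M"},
    soil_a="clay",
    soil_levels=["clay", "loam"],
)
```

Indicator columns are used for the nominal fields because an integer code would imply an ordering that the data description does not support.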

Weather data from weather stations installed at each of the trial sites are also included. There are 6 relevant weather parameters in the data. The dates when the data were gathered are included in the weather worksheet.
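
Because the weather records are dated, one way to connect them to an assessment is to summarize each parameter over the period between the sowing date and the assessment date. The following is a minimal sketch of that idea, not a prescribed method. The record layout (a list of dictionaries with a "date" key) and the parameter name "param_1" are assumptions made for illustration.

```python
from datetime import date


def mean_weather_between(records, start, end, parameter):
    """Average one weather parameter over the window [start, end], inclusive.

    `records` is assumed to be a list of dicts for a single site, each with
    a "date" (datetime.date) and one value per weather parameter.
    Returns None when no usable reading falls inside the window.
    """
    values = [
        record[parameter]
        for record in records
        if start <= record["date"] <= end and record.get(parameter) is not None
    ]
    if not values:
        return None
    return sum(values) / len(values)


# Hypothetical readings, for illustration only.
site_weather = [
    {"date": date(2016, 4, 1), "param_1": 10.0},
    {"date": date(2016, 4, 2), "param_1": 20.0},
    {"date": date(2016, 4, 3), "param_1": None},   # missing reading is skipped
    {"date": date(2016, 5, 1), "param_1": 30.0},   # after the assessment date
]

sowing_date = date(2016, 4, 1)
assessment_date = date(2016, 4, 15)
window_mean = mean_weather_between(site_weather, sowing_date, assessment_date, "param_1")
```

The same function can be called once per weather parameter and per assessment row; other summaries (sums, minima, maxima) would follow the same pattern.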


The Challenge

A model predicting the assessment score would allow growers to assess product quality. Given the provided data sets, derive a model that would predict the assessment scores as accurately as possible using relevant features or predictor variables.


Challenge Sponsors