
Can we predict the quality of a wine?

The report below presents the full analysis of the 7 appellations. We invite you to focus on the 2 appellations with the strongest temperature / rating relationship, for which an optimal temperature can be calculated.

Data :

We simply cross-referenced the temperature history of each appellation with its history of Parker ratings.
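The cross-referencing step amounts to an inner join on the vintage year. Here is a minimal sketch in Python, assuming each history is held as a dictionary keyed by year. The function name and the sample values are illustrative and are not the study's data.

```python
def cross_histories(temps_by_year, ratings_by_year):
    """Keep only the vintages present in both histories.

    Returns a list of (year, temperature, rating) tuples sorted by year.
    """
    common_years = sorted(set(temps_by_year) & set(ratings_by_year))
    return [(y, temps_by_year[y], ratings_by_year[y]) for y in common_years]


# Illustrative values only, not the study's data.
temps = {2001: 17.1, 2002: 16.8, 2003: 18.9}
ratings = {2001: 90, 2003: 88, 2004: 91}

pairs = cross_histories(temps, ratings)
# Vintages 2002 and 2004 are dropped because one of the two values is missing.
print(len(pairs), pairs)
```

The length of the resulting list corresponds to the "Available data (PI-TMSC crossover)" count given for each appellation.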

The objective :

Verify the relationship between the temperature over the year of a vintage and the final score given to the wine.

Difficulties encountered :

Some weather data and some Parker ratings are missing, which prevents us from carrying out more detailed analyses.

Selection of 7 appellations:

  1. Margaux (FRA)
  2. Barsac/Sauternes (FRA)
  3. South-Châteauneuf du Pape (FRA)
  4. Barolo (ITA)
  5. Ribera del Duero (ESP)
  6. Barbaresco (ITA)
  7. Pomerol (FRA)

Motivations in the choice of appellations :

  • They are all located in the northern hemisphere (so they have fairly close characteristics regarding the weather data);
  • We have found a sufficient number of ratings according to the “Parker’s 100-point rating system” for the vintages of these appellations;
  • From a geographical point of view, these appellations are not very extensive. This allows us to select a single credible weather coordinate and generalize it to all the châteaux in the appellation concerned (geographical coordinates calculated via the website: http://www.geomidpoint.com/).
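A single representative coordinate per appellation can be obtained by averaging the positions of its châteaux. The sketch below shows one common approach, averaging unit vectors on the sphere. Whether this matches the exact method used by the website is an assumption, and the coordinates are hypothetical.

```python
import math


def geographic_midpoint(points):
    """Midpoint of (latitude, longitude) pairs in degrees.

    Each point is converted to a unit vector, the vectors are averaged,
    and the mean vector is converted back to latitude / longitude.
    """
    x = y = z = 0.0
    for lat_deg, lon_deg in points:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    mid_lon = math.atan2(y, x)
    mid_lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(mid_lat), math.degrees(mid_lon)


# Hypothetical château coordinates, for illustration only.
chateaux = [(44.90, -0.20), (44.94, -0.18), (44.92, -0.22)]
mid_lat, mid_lon = geographic_midpoint(chateaux)
print(round(mid_lat, 2), round(mid_lon, 2))
```

For an area as small as a single appellation, the result is very close to the plain arithmetic mean of the coordinates.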

Data Sets :

  1. Ratings according to the “Parker’s 100-point rating system” (PI) for each appellation, from the 1970 vintage to the 2013 vintage (depending on availability). Site: https://www.erobertparker.com/entrance.aspx
  2. The average growing season temperature (TMSC) of each appellation (period 1 April to 31 October), from the 1970 vintage to the 2013 vintage (depending on availability). Site: “The Dark Sky Forecast API” (1000 free requests every day), https://developer.forecast.io/
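As an illustration of how a TMSC value can be derived, here is a minimal sketch that averages daily temperatures between 1 April and 31 October of a given year. It assumes the daily mean temperatures have already been retrieved and stored in a dictionary keyed by date. The function name and the readings are hypothetical, and no call to the weather API is shown.

```python
from datetime import date


def growing_season_mean(daily_temps, year):
    """Mean of the daily temperatures from 1 April to 31 October of `year`.

    daily_temps maps datetime.date -> mean daily temperature in °C.
    Returns None when no observation falls inside the season.
    """
    start, end = date(year, 4, 1), date(year, 10, 31)
    season = [t for d, t in daily_temps.items() if start <= d <= end]
    if not season:
        return None
    return sum(season) / len(season)


# Illustrative readings only.
readings = {
    date(2010, 3, 31): 9.0,    # before the season, ignored
    date(2010, 4, 1): 12.0,
    date(2010, 7, 15): 22.0,
    date(2010, 10, 31): 14.0,
    date(2010, 11, 1): 8.0,    # after the season, ignored
}
tmsc_2010 = growing_season_mean(readings, 2010)
print(tmsc_2010)  # 16.0
```

Returning None for a year without data makes the missing vintages explicit, so they can be excluded before the ratings are cross-referenced.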

Methodology – algorithm :

For the methodology, we relied on the excellent work of Gregory V. Jones: “Past and Future Impacts of Climate Change on Wine Quality”, Department of Geography, Southern Oregon University, 2006.

Results :

At the end of this first stage of analysis, we were able to highlight 2 appellations for which the temperature / rating relationship is considered relevant: Barbaresco and Pomerol. The other 5 appellations only reach significance percentages ranging from roughly 2% to 61%, which is insufficient to apply this statistical model with any relevance.

The report below presents the full analysis. We nevertheless invite you to focus on the first 2 appellations presented (Pomerol and Barbaresco), whose temperature / rating relationship is the strongest, to calculate the optimal temperature of each appellation.

Pomerol :

Available data (PI-TMSC crossover): 38

Historical graph “Parker ratings” over the period considered:

Predicted ratings:

For the Pomerol appellation, the data at our disposal show an interesting relationship between TMSC and PI: we can therefore consider estimating the quality of future vintages from a given TMSC.

Our analysis of these data indicates that TMSC is significant enough to be considered relevant for calculating PI. TMSC enters the second-degree equation through the variables “temp” and “temp2”, for which we have :

• “temp” is significant at 85.97%.

• “temp2” is significant at 84.22%.
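To illustrate what fitting a second-degree equation involves, here is a minimal least-squares sketch in Python. It assumes the model has the form PI = a + b·temp + c·temp², with “temp2” being the squared temperature. The function name, the centring step and the sample values are illustrative choices, not the study's actual code or data, and the sketch does not reproduce the significance percentages.

```python
def fit_quadratic(temps, scores):
    """Least-squares fit of score = a + b*temp + c*temp**2; returns (a, b, c).

    Needs at least three distinct temperatures.
    """
    # Centre the temperatures to keep the normal equations well conditioned.
    mean_t = sum(temps) / len(temps)
    u = [t - mean_t for t in temps]
    # Power sums of the centred temperatures and of score * u**k.
    s = [sum(x ** k for x in u) for k in range(5)]
    r = [sum(y * x ** k for x, y in zip(u, scores)) for k in range(3)]
    # Augmented normal-equation matrix: m[i][j] = sum(u**(i+j)), last column r[i].
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    # Back substitution gives the coefficients in the centred variable.
    centred = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * centred[j] for j in range(i + 1, 3))
        centred[i] = (m[i][3] - tail) / m[i][i]
    a0, b0, c0 = centred
    # Expand a0 + b0*(t - mean_t) + c0*(t - mean_t)**2 back to powers of t.
    a = a0 - b0 * mean_t + c0 * mean_t ** 2
    b = b0 - 2 * c0 * mean_t
    return a, b, c0


# Synthetic points lying exactly on 90 - 2*(t - 17)**2, for illustration only.
sample_temps = [15.5, 16.0, 17.0, 18.0, 18.5]
sample_scores = [85.5, 88.0, 90.0, 88.0, 85.5]
a, b, c = fit_quadratic(sample_temps, sample_scores)
print(round(a, 3), round(b, 3), round(c, 3))
```

A statistics package would normally be used for this step, since it also reports the significance of each coefficient. The hand-written solver is only meant to make the underlying calculation visible.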

We observe that, with our data sets, the explanatory variables “temp” and “temp2” (i.e., TMSC) seem significant enough to determine Parker’s scores for this appellation.

We then calculate Parker’s score over the range of TMSC values:

In the graph above, we see that the optimal TMSC predicted for the Pomerol appellation is 17.97 °C.
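Under a second-degree model, the optimal temperature is the vertex of the fitted parabola. A short derivation follows, where a, b and c are assumed names for the intercept and for the coefficients of “temp” and “temp2” (the report does not give their values):

```latex
\[
PI(T) = a + b\,T + c\,T^{2}, \qquad
\frac{dPI}{dT} = b + 2c\,T = 0
\;\Longrightarrow\;
T_{\mathrm{opt}} = -\frac{b}{2c}
\]
```

The vertex is a maximum only when c < 0, which is the shape implied by an optimal temperature of 17.97 °C for Pomerol.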

The importance of this value becomes clear when we compare it with the long-term average of TMSC, TMSCM (1970-2013): TMSCM is 17.02 °C, i.e. about one degree below the predicted optimal temperature. If we calculate TMSCM over the last 10 years only (2004-2013), we find 17.53 °C, a higher value than over 1970-2013 and closer to the optimal TMSC predicted by our model. The average TMSC of the most recent decade in the Pomerol appellation is therefore about half a degree above the long-term average.
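The comparison above can be checked with a few lines of Python. The helper below is a hypothetical sketch of how an average over a period could be computed from yearly TMSC values, and its sample dictionary is illustrative. The three temperatures used for the gaps are the figures quoted in this report.

```python
def period_mean(tmsc_by_year, first_year, last_year):
    """Average TMSC over the vintages available in [first_year, last_year]."""
    values = [t for y, t in tmsc_by_year.items() if first_year <= y <= last_year]
    if not values:
        raise ValueError("no TMSC value in the requested period")
    return sum(values) / len(values)


# Illustrative yearly values only, to show how the helper is called.
sample = {1990: 16.0, 2004: 17.0, 2005: 18.0}
recent_mean = period_mean(sample, 2004, 2013)  # averages 2004 and 2005 only

# Figures quoted in the report for Pomerol, in °C.
optimal, long_term, recent = 17.97, 17.02, 17.53
gap_long_term = round(optimal - long_term, 2)  # optimum vs 1970-2013 average
gap_recent = round(optimal - recent, 2)        # optimum vs 2004-2013 average
warming = round(recent - long_term, 2)         # recent decade vs long-term average
print(gap_long_term, gap_recent, warming)
```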

Barbaresco :

Available data (PI-TMSC crossover): 36

Historical graph “Parker ratings” over the period considered :

Predicted ratings:

For the Barbaresco appellation, the data at our disposal show a good relationship between TMSC and PI: we can therefore consider estimating the quality of future vintages from a given TMSC.

Our analysis of these data indicates that TMSC is significant enough to be considered relevant for calculating PI. TMSC enters the second-degree equation through the variables “temp” and “temp2”, for which we have :

• “temp” is significant at 90.25%.

• “temp2” is significant at 91.04%.

As a result, we see that, with our data sets, the explanatory variables “temp” and “temp2” (i.e., TMSC) seem significant enough to determine Parker’s scores for this appellation.

We then calculate Parker’s score over the range of TMSC values :

However, in the graph above we notice a strange behaviour of our model for the Barbaresco appellation: predicted scores rise as TMSC falls progressively below 16.73 °C, but also as TMSC rises progressively above 16.73 °C. This result is consistent from a mathematical point of view, but from a practical point of view (i.e. considering the quality of the wine as a function of temperature) it seems difficult for our teams to accept it as valid.
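Whether the vertex of a second-degree model is a best or a worst temperature depends on the sign of the squared term. The sketch below illustrates this check, assuming a model of the form score = a + b·temp + c·temp². The coefficients are hypothetical, chosen only so that the vertices land on the two temperatures quoted in this report, and they are not the fitted values.

```python
def describe_vertex(b, c):
    """Locate the vertex of score = a + b*temp + c*temp**2 and classify it.

    Returns (temperature, kind) where kind is "maximum" or "minimum".
    """
    if c == 0:
        raise ValueError("c must be non-zero for a second-degree model")
    vertex = -b / (2 * c)
    kind = "maximum" if c < 0 else "minimum"
    return vertex, kind


# Hypothetical coefficients, not the fitted ones.
pomerol_like = describe_vertex(b=71.88, c=-2.0)     # parabola opens downwards
barbaresco_like = describe_vertex(b=-33.46, c=1.0)  # parabola opens upwards
print(pomerol_like, barbaresco_like)
```

With a positive squared term, the vertex is the temperature with the lowest predicted score, which matches the behaviour described for Barbaresco.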

This behaviour can be explained simply by the data at our disposal. As with all the other appellations, for the Barbaresco appellation we only had a non-exhaustive sample from which to draw conclusions (36 observations in this case). This data set has a particularity that makes it hard to fit our model to the observed data: frequently, two different vintages have the same score but opposite TMSC values, very high in one case and very low in the other (“very high / very low” relative to the range of values seen in this appellation, i.e. Min = 15.29 °C, Max = 18.46 °C). For example:

Vintage 1979 : rating = 89, TMSC = 15.86 °C
Vintage 1986 : rating = 89, TMSC = 18.10 °C

Margaux :

Available data (PI-TMSC crossover): 38

Historical graph “Parker ratings” over the period considered :

Predicted ratings:

For the Margaux appellation, with the data at our disposal, we cannot demonstrate a solid relationship between TMSC and PI: therefore, we cannot accurately calculate the quality of future vintages based on a specific TMSC.

Our analysis of these data indicates that TMSC is not significant enough to be relevant for calculating PI. TMSC enters the second-degree equation through the variables “temp” and “temp2”, for which we have :

• “temp” is significant at 21.2%.

• “temp2” is significant at 25.34%.

Thus, we observe that when considering our data sets, the explanatory variables “temp” and “temp2” (i.e., TMSC) are not significant in determining Parker scores for this appellation and, therefore, will not yield valid results.

Barsac / Sauternes :

Available data (PI-TMSC crossover): 38

Historical graph “Parker ratings” over the period considered :

Predicted ratings:

For the Barsac/Sauternes appellation, with the data at our disposal, we cannot demonstrate a solid relationship between TMSC and PI: therefore, we cannot accurately calculate the quality of future vintages based on a specific TMSC.

Our analysis of these data indicates that TMSC is not significant enough to be relevant for calculating PI. TMSC enters the second-degree equation through the variables “temp” and “temp2”, for which we have :

• “temp” is significant at 28.19%.

• “temp2” is significant at 21.16%.

We note that when considering our data sets, the explanatory variables “temp” and “temp2” (i.e., TMSC) are not significant in determining Parker scores for this appellation and, therefore, will not yield valid results.

South-Châteauneuf du Pape :

Available data (PI-TMSC crossover): 37

Historical graph “Parker ratings” over the period considered :

Predicted ratings:

For the South-Châteauneuf du Pape appellation, with the data at our disposal, we cannot demonstrate a solid relationship between TMSC and PI: therefore, we cannot accurately calculate the quality of future vintages based on a specific TMSC.

Our analysis of these data indicates that TMSC is not significant enough to be relevant for calculating PI. TMSC enters the second-degree equation through the variables “temp” and “temp2”, for which we have :

• “temp” is significant at 47.79%.

• “temp2” is significant at 48.86%.

As a result, we see that, with our data sets, the explanatory variables “temp” and “temp2” (i.e., TMSC) are not significant enough to determine Parker scores for this appellation and, therefore, will not give valid results.

Barolo :

Available data (PI-TMSC crossover): 35

Historical graph “Parker ratings” over the period considered :

Predicted ratings:

For the Barolo appellation, with the data at our disposal, we cannot demonstrate a solid relationship between TMSC and PI: therefore, we cannot accurately calculate the quality of future vintages based on a specific TMSC.

Our analysis of these data indicates that TMSC is not significant enough to be relevant for calculating PI. TMSC enters the second-degree equation through the variables “temp” and “temp2”, for which we have :

• “temp” is significant at 3.73%.

• “temp2” is significant at 2.51%.

As a result, we observe that when considering our data sets, the explanatory variables “temp” and “temp2” (i.e., TMSC) are not at all significant in determining Parker scores for this appellation and, therefore, they will not yield valid results.

Ribera del Duero :

Available data (PI-TMSC crossover): 34

Historical graph “Parker ratings” over the period considered :

Predicted ratings:

For the Ribera del Duero appellation, with the data at our disposal, we cannot demonstrate a solid relationship between TMSC and PI: therefore, we cannot accurately calculate the quality of future vintages based on a specific TMSC.

Our analysis of these data indicates that TMSC is not significant enough to be relevant for calculating PI. TMSC enters the second-degree equation through the variables “temp” and “temp2”, for which we have :

• “temp” is significant at 60.78%.

• “temp2” is significant at 60.90%.

As a result, we observe that when considering our data sets, the explanatory variables “temp” and “temp2” (i.e., TMSC) are not yet significant enough to determine Parker scores for this appellation and, therefore, they will not give valid results.

 

Conclusion of the analysis :

It would be interesting to continue this first stage of analysis by predicting the rating of an individual château within an appellation. Many wines, especially in Pomerol, are sold at very high prices. For buyers looking for rare wines, such a predictive analysis would add a tangible mathematical value to the intangible value that some of the great châteaux cultivate so brilliantly.
