Official Blog
Insights from Googlers into our products, technology, and the Google culture
Google Cloud Platform predicts the World Cup (and so can you!)
July 11, 2014
In 2010, we had Paul the Octopus. This year, there’s Google Cloud Platform. For the past couple of weeks, we’ve been using Cloud Platform to make predictions for the World Cup: analyzing data, building a statistical model and using machine learning to predict the outcome of each match since the group round. So far, we’ve gotten 13 out of 14 games correct. With the final ahead this weekend, we’re not only ready to make our prediction, we’re also doing something a little extra for you data geeks out there. We’re giving you the keys to our prediction model so you can build your own model and run your own predictions.
A little background
Using data from Opta covering multiple seasons of professional soccer leagues as well as the group stage of the World Cup, we examined how activity in previous games predicted performance in subsequent ones. We combined this modeling with a power ranking of relative team strength developed by one of our engineers, as well as a metric to stand in for home-team advantage based on fan enthusiasm and the number of fans who had traveled to Brazil. We used a whole bunch of Google Cloud Platform products to build this model, including Google Cloud Dataflow to import all the data and Google BigQuery to analyze it. So far, we’ve only been wrong on one match (we underestimated Germany when they faced France in the quarterfinals).
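To make the idea of "activity in previous games predicting performance in subsequent ones" concrete, here is a minimal sketch in plain Python. It is not the code we ran: the function name `prior_game_averages`, the record keys (`team`, `shots`, `passes`) and the three-game window are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


def prior_game_averages(games, window=3):
    """Build one feature row per game from the team's earlier games.

    `games` is a list of dicts sorted by date. The keys 'team', 'shots'
    and 'passes' are hypothetical names, not the real dataset schema.
    A team's first game is skipped because it has no history yet.
    """
    # Keep only the most recent `window` games for each team.
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for game in games:
        past = history[game["team"]]
        if past:
            rows.append({
                "team": game["team"],
                "avg_shots": sum(g["shots"] for g in past) / len(past),
                "avg_passes": sum(g["passes"] for g in past) / len(past),
            })
        # Record this game only after its features were computed, so a
        # game never contributes to its own prediction inputs.
        past.append(game)
    return rows
```

Each row describes what a team had done before kickoff, which is the only information a prediction is allowed to use.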
Watch Jordan Tigani and Felipe Hoffa from the BigQuery team talk about the project in this video from Google I/O, or look at our quarterfinals and semifinals blog posts to learn more.
A narrow win for Germany in the final
Drumroll please… Though we think it’s going to be close, Germany has the edge: our model gives them a 55 percent chance of defeating Argentina. Both teams have had excellent tournaments so far, but the model favors Germany for a number of reasons. Thus far in the tournament, they’ve had better passing in the attacking half of the field, more shots (64 vs. 61) and more goals scored (17 vs. 8).
(Oh, and we think Brazil has a tiny advantage in the third-place game. They may have had a disappointing defeat on Tuesday, but their numbers still look good.)
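If you are wondering how a model turns a pile of statistics into something like "55 percent," one common approach is to squash a single score through a logistic function. The sketch below assumes a logistic-style model, which this post does not spell out, and the function name `win_probability` is made up for illustration.

```python
import math


def win_probability(score):
    """Map a real-valued score to a probability between 0 and 1.

    `score` stands in for a weighted sum of feature differences between
    two teams (an assumption, not the actual model). A score of 0 means
    the teams look evenly matched.
    """
    return 1.0 / (1.0 + math.exp(-score))
```

Under this assumption, a score of 0 gives a coin flip, and a modest score of about 0.2 in one team's favor works out to roughly a 55 percent chance, which is why a narrow statistical edge still leaves plenty of room for an upset.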
Channel your inner data nerd
Now it’s your turn. We’ve put together a step-by-step guide (warning: code ahead) showing how we built our model and used it for predictions. You could try different statistical techniques or add in your own data, like player salaries or team travel distance. Even though we’ve been right 92.86 percent of the time, we’re sure there’s room for improvement.
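For the record, that 92.86 percent is just 13 correct picks out of 14 matches. If you want to score your own model the same way, a small helper like this one will do; the name `accuracy_percent` is our own invention for this example.

```python
def accuracy_percent(correct, total):
    """Return the share of correct predictions as a percentage,
    rounded to two decimal places."""
    if total <= 0:
        raise ValueError("total must be a positive number of matches")
    return round(100.0 * correct / total, 2)
```

Calling `accuracy_percent(13, 14)` reproduces the 92.86 figure quoted above.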
The model works for other hypothetical situations, and it includes data going back to the 2006 World Cup, three years of English Barclays Premier League, two seasons of Spanish La Liga, and two seasons of U.S. MLS. So, you could try modeling how the USA would have done against Argentina if their game against Belgium had gone differently, or pit this year’s German team against the unstoppable Spanish team of 2010. The world (er, dataset) is your oyster.
Ready to kick things off? Read our post on the Cloud Platform blog to learn more (or, if you’re familiar with all the technology, you can jump right over to GitHub and start crunching numbers for yourself).
Posted by Benjamin Bechtolsheim, Product Marketing Manager, Google Cloud Platform