Creating a Flowchart of a Goal in Soccer/Football
By Aaron Nielsen
After the recent OPTA Pro forum, I posted some comments, including my proposed paper - Opta Pro Research Paper. I got a decent amount of back and forth on my Twitter account @ENBSports, including links to other work. What I noticed from others is that the MCFC data released in 2011-2012 by OPTA is having an influence, with writers trying to take advantage of its 211 columns of data. Personally, the only thing I really looked at when it came out was free kick shot opportunities - Direct Free Kicks, as data for La Liga was also available through Marca magazine, so I thought it made a good comparison piece of analysis.
My main concern with the detailed set is the amount of cost and time needed to tabulate this depth of data, and what can be gained from it. In tabulating my own data, I worked out over time which statistics I thought were the most valuable, so I'm now able to cover over 60 leagues, and I have been tabulating data for over 20 years. Now that more detailed statistics are becoming a bigger part of the everyday conversation, most recently with Statsbomb.com - Expected Goals, I decided to reopen my MCFC Excel sheet and reexamine the data from the 2011-2012 English Premier League season to see if I can find specific information in it that could be used to understand the game better through statistical data.
My first attempt was to see if I could use the data to work out a proper flowchart to a goal. One reason is that one of my biggest issues in soccer is that most people, including statisticians, see shots (including missed and blocked shots) as a positive stat. It's true that almost all goals come from a shot via a foot or head, but they also come from a shot on target (on average around 30% of all shots), while most shots off target lead to a restart from the goalkeeper and an end to the offensive possession. So when recording less detailed data, I've decided to count only shots on target as shots and ignore other shot attempts, as I feel total shots gives a poor representation of the success this stat is assumed to show for a player. However, since OPTA makes it available, I've decided to use it in this analysis to help break down each offensive possession.
I start the flowchart of a goal with the offensive possession, which can come from Open Play, a Penalty, a Free Kick (Direct or Indirect), a Corner, or a Throw-In. All goals, including own goals, fit under the applicable category, and an opponent's defensive turnover leads to an open play offensive possession even if that possession only includes a shot. OPTA doesn't record each individual offensive possession in the stats, but it does show 1025 goals and 10891 total shots, which led to 10496 Goalkeeper Distributions, and 22555 clearances. Those stats do not include other ways in which we can assume a loss of possession, but one could estimate that in total there were about 40,000 offensive possessions, or a goal in every 40 or so possessions.
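As a rough check on that estimate, here is a minimal Python sketch. It assumes, purely for illustration, that every goal, goalkeeper distribution and clearance ends exactly one possession, which the data cannot confirm. The names are my own and not OPTA columns.

```python
# Figures quoted above for the 2011-2012 season.
GOALS = 1025
GOALKEEPER_DISTRIBUTIONS = 10496
CLEARANCES = 22555

# Assumption for illustration: each of these events ends one possession.
counted_endings = GOALS + GOALKEEPER_DISTRIBUTIONS + CLEARANCES

# The rounded estimate from the text, allowing for untracked turnovers.
ESTIMATED_POSSESSIONS = 40_000


def possessions_per_goal(possessions: int, goals: int) -> float:
    """Average number of possessions needed to produce one goal."""
    return possessions / goals


print(counted_endings)  # 34076
print(round(possessions_per_goal(ESTIMATED_POSSESSIONS, GOALS), 1))  # 39.0
```

The gap between the 34,076 counted endings and the 40,000 estimate is where fouls, wayward passes and balls out of bounds would sit if they were recorded.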
OPTA does a great job of analyzing penalties, free kicks and corners, so I will start there. First, in terms of free kicks, OPTA judged that there were 1884 Free Kicks in a dangerous area, not including penalties. Of these free kick opportunities, 854 led to a shot, including 553 direct attempts on net that led to 29 direct goals, or one in every 19.1 attempts. OPTA did record shot information, but in terms of a flowchart it's difficult to analyze the play because we don't know how the play concluded on a direct attempt that did not lead to a goal. Of the 1291 free kicks that were passed instead of shot directly, 301 led to shots via a key pass and 50 to goals, or a goal in every 25.8 attempts, although again we have no information on plays that didn't lead to a goal. Meanwhile, there were 100 penalties, with 72 converted, 23 saved and 5 off target.
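The attempts-per-goal figures above can be reproduced with a short Python sketch. It uses the counts exactly as quoted; note that 1884 minus 553 gives 1331 rather than the 1291 passed free kicks quoted, and the sketch does not try to resolve that difference. The function and variable names are my own.

```python
def attempts_per_goal(attempts: int, goals: int) -> float:
    """How many attempts it takes, on average, to produce one goal."""
    return attempts / goals


# Free kicks in a dangerous area, counts as quoted in the text.
direct_rate = attempts_per_goal(553, 29)    # direct attempts on net
passed_rate = attempts_per_goal(1291, 50)   # free kicks passed instead of shot

# Shots that came from a passed free kick: all shots minus direct attempts.
key_pass_shots = 854 - 553

# Penalty outcomes should add up to the 100 penalties awarded.
penalties = {"converted": 72, "saved": 23, "off target": 5}
penalty_conversion = penalties["converted"] / sum(penalties.values())

print(round(direct_rate, 1))   # 19.1
print(round(passed_rate, 1))   # 25.8
print(key_pass_shots)          # 301
print(penalty_conversion)      # 0.72
```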
In terms of Corners, OPTA has a total of 4321 corners that led to 129 goals inside the box, or a goal in every 33.5 attempts. Of these corners, 3496 were attempted into the box, with 1163 being "successful", which I assume means they led to a shot, and 377 of those shots were on target. There were 663 short corners, although OPTA gives no detail on any further actions from them. Meanwhile, throw-ins led to 20 goals from inside the box, 67 shots on target and 184 shots off target, although we don't have information on how many throw-ins were attempted in that zone or, again, on what happened to the play on corners and throw-ins other than the ones that led to goals and shots.
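One way to read the corner numbers is as a funnel, where each stage is a subset of the one before it. The Python sketch below assumes that reading, including my assumption that "successful" means the corner led to a shot. The stage labels are my own.

```python
# Corner counts as quoted in the text, ordered from widest to narrowest.
corner_funnel = [
    ("corners taken", 4321),
    ("played into the box", 3496),
    ("successful (assumed: led to a shot)", 1163),
    ("shots on target", 377),
]


def stage_rates(funnel):
    """Share of each stage that carries through to the next stage."""
    return [
        (later_name, later_count / earlier_count)
        for (_, earlier_count), (later_name, later_count) in zip(funnel, funnel[1:])
    ]


for name, rate in stage_rates(corner_funnel):
    print(f"{name}: {rate:.1%}")

corners_per_goal = 4321 / 129
# Corners that are neither played into the box nor recorded as short.
unaccounted_corners = 4321 - 3496 - 663

print(round(corners_per_goal, 1))  # 33.5
print(unaccounted_corners)         # 162
```

The 162 corners left over are another small gap in what the data lets us trace.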
That leads to open play opportunities. Unfortunately, with some of the data missing above, we can't work out all of the open play data. The data shows there were 725 goals not counted above, so we can assume they were from open play; of these goals, 577 were from inside the box and 148 from outside. We can't break down the inside shots due to missing data, but from outside the box, excluding direct free kicks, 1164 shots were on target, 1395 were blocked and 1819 missed the net. Overall, 3522 shots in total were on target, 2902 were blocked and 4467 missed the target, and we can also assume most of the shots that missed the target led to an opposition restart.
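The 725 open play goals come from subtraction, and the shot categories add back up to the season total. A small Python sketch of that bookkeeping, using only figures quoted in this article (the dictionary keys are my own labels):

```python
TOTAL_GOALS = 1025

# Goals already attributed to set plays earlier in the article.
set_play_goals = {
    "direct free kick": 29,
    "passed free kick": 50,
    "penalty": 72,
    "corner": 129,
    "throw-in": 20,
}

# Whatever is left over is assumed to come from open play.
open_play_goals = TOTAL_GOALS - sum(set_play_goals.values())

# All shots for the season, split by outcome.
shots = {"on target": 3522, "blocked": 2902, "missed": 4467}
total_shots = sum(shots.values())
on_target_share = shots["on target"] / total_shots

print(open_play_goals)           # 725
print(open_play_goals == 577 + 148)  # True: inside plus outside the box
print(total_shots)               # 10891
print(f"{on_target_share:.1%}")  # 32.3%
```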
The problem with the flowchart concept is that we're missing key stats needed to build it from the data that OPTA does record. For example, of the Shots on Target, we don't know how many were controlled by the goalkeeper to change over possession. So I would suggest that OPTA start recording rebounds and saves leading to a corner, so we have a better sense of what happens to shots on target that do not count as a goal. We also don't have any information regarding an offensive player's mistake in the offensive zone, for example a foul, a wayward pass, or a loss of possession out of bounds, so we can only infer when the offensive team loses possession by looking at clearances and goalkeeper distribution, where we assume the offensive possession is over since the keeper has control of the ball.
There is also an issue, in my view, with the clearance stat. As a defensive statistic, the assumption, as with shots, is that a clearance is a positive play, although we have no further information regarding the clearances, which affects the flowchart concept and, like total shots, makes it a misleading stat. I'm assuming a clearance includes kicking the ball out of bounds and giving the opponent the ball as a throw-in or corner. If so, then looking at the goal-per-attempt stat, this could be problematic, because with only the data in front of us, the number of attempts per goal is lower for a corner or a throw-in than in open play. So analytically, a clearance could actually increase the opponent's chance of scoring, depending on the clearance.
Ultimately, for a flowchart you would want each play to lead into one of three outcomes: a goal; the offense getting another offensive possession via a rebound, clearance, or set play opportunity; or the defense controlling the ball and putting an end to the offensive possession. Based on this, we can start looking at general views in terms of possession but also at specifics. For example, if a particular player tends to kick the ball out for a corner while under pressure, a team can use that to their advantage in opposition analysis. Alternatively, if a team likes to shoot, maybe allow it, because you know the chance of scoring is poor and a shot off target leads to a loss of possession.
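To make the three outcomes concrete, here is a minimal Python sketch of how each play could be tagged with where the flowchart goes next. The event labels and the sample sequence are hypothetical, invented for illustration; they are not OPTA column names or real match data.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    GOAL = "goal"
    RETAINED = "offense gets another possession"
    LOST = "defense takes over possession"


# Hypothetical event labels mapped to the next step in the flowchart.
NEXT_STEP = {
    "goal": Outcome.GOAL,
    "rebound": Outcome.RETAINED,
    "clearance_to_corner": Outcome.RETAINED,
    "clearance_to_throw_in": Outcome.RETAINED,
    "free_kick_won": Outcome.RETAINED,
    "keeper_controls": Outcome.LOST,
    "shot_off_target": Outcome.LOST,
    "clearance_to_defense": Outcome.LOST,
}


def tally_outcomes(events):
    """Count how often a list of plays ends in each flowchart outcome."""
    return Counter(NEXT_STEP[event] for event in events)


# Made-up sequence of plays, only to show the function in use.
sample_plays = [
    "shot_off_target",
    "clearance_to_corner",
    "rebound",
    "goal",
    "keeper_controls",
]
tally = tally_outcomes(sample_plays)
for outcome in Outcome:
    print(outcome.value, tally[outcome])
```

Splitting clearances by where the ball ends up is the design choice that matters here, since a clearance to a corner keeps the attack alive while a clearance to a teammate ends it.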
I'm impressed with OPTA's work, and with a background in most sports, it amazes me how much effort they put into collecting it, especially when the long history of stats recorded for North American sports is in far less detail. I must say, though, that when breaking down a sport I find traditional North American sports data much more useful. Doing my own soccer stats is the same, because I know what I want to get out of the data before I start to tabulate it. I fight with teams and leagues all the time over the value of stats in soccer, and to dismiss my work they show me huge books of animated drawings that they believe break down every play in soccer. The reality is that what breaks down every play is what really happens over a good number of samples, which statistics can capture; all we need to do now is prove it through our work.