How to Scrape Sports Data with Python

Felicitations, malefactors! This is my first post ever, so thank you for joining me today. The objective of this post is to help you hit the ground running with quick analysis and visualization of whatever sports data interests you. My programming language of choice is Python, because it's easy to learn and extremely useful for data manipulation.

I'm of the firm belief that the best way to learn is by doing. The goal here is to give you a taste of what it takes to accomplish this particular task, and if you're interested enough, hopefully you'll go back and revisit the fundamentals of programming with a passion.

Here's a high-level overview of what we're going to accomplish in this tutorial. We're going to:

1. Choose what type of data we want to collect and investigate. In this tutorial, I'm going to look at fantasy football data from the 2019 NFL season.
2. Scrape that data and store it somewhere.
3. Create a very basic visualization of it.
4. Make some inferences from our dataset.

1. Choose data to gather and collect

In this tutorial, I'm choosing to collect fantasy football data from the 2019 NFL season. I'm going to assume that you already have Python installed on your computer, but for the sake of this tutorial, we'll use a Google Colaboratory notebook. Google Colab is an interactive Python notebook that lets you write and execute Python code in your browser. If you're having trouble installing Python, please use Colab for the time being - it will let you start testing code right away.

Let's start by picking some fantasy football data to play with. I'm choosing to gather data from FantasyPros.com. If we navigate to their Fantasy Leaders page, we can see the top fantasy scorers from the most recent season. The table in the middle of the page is the one we're interested in.

2. Scrape our data

Python is extremely powerful because of the abundance of open source libraries that exist to tackle almost any task. Pandas is a Python library for data manipulation and analysis that is ridiculously useful for this scenario, and Seaborn is a data visualization library that makes displaying charts and graphs extremely easy. There are tons of other Python libraries to leverage, but we'll start with these.

In your Colab notebook, execute the following code in a code cell.
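As a minimal sketch of what such a cell might look like: pandas can turn an HTML table into a dataframe with `pd.read_html`, which returns a list of dataframes, one per table it finds. The HTML snippet, the player rows, the "Position" column name, and the `load_leaders` helper below are all placeholders I made up for illustration; the real page has more rows and columns, and its column names may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for the page's HTML. The rows are made-up placeholders,
# not real 2019 stats, and the column names are assumptions.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Player</th><th>Position</th><th>Points</th></tr>
  </thead>
  <tbody>
    <tr><td>Player A</td><td>QB</td><td>300.5</td></tr>
    <tr><td>Player B</td><td>RB</td><td>410.2</td></tr>
    <tr><td>Player C</td><td>WR</td><td>250.0</td></tr>
  </tbody>
</table>
"""


def load_leaders(html: str) -> pd.DataFrame:
    """Parse the first <table> in the HTML and sort it by points, highest first."""
    # read_html returns a list of DataFrames, one per <table> found.
    tables = pd.read_html(io.StringIO(html))
    df = tables[0]
    return df.sort_values("Points", ascending=False).reset_index(drop=True)


df = load_leaders(SAMPLE_HTML)
print(df.head())
```

`pd.read_html` also accepts a URL string, so in a notebook you could pass the address of the leaders page instead of an inline snippet. It needs an HTML parser such as lxml installed, which Colab already provides.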

The seaborn library provides the barplot() function, which draws a bar plot from a dataframe and the column names you give it. In this scenario, we are only concerned with the top 10 scorers, so calling .head(10) on our dataframe keeps only its first 10 rows. We then specify "Player" and "Points" as the columns to plot.
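Here is a rough sketch of that plotting step. The dataframe is filled with placeholder values standing in for the scraped table (already sorted by points), so only the "Player" and "Points" column names come from the description above; everything else is an assumption.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder data standing in for the scraped table, sorted by points.
df = pd.DataFrame({
    "Player": [f"Player {i}" for i in range(1, 16)],
    "Points": [400 - 10 * i for i in range(1, 16)],
})

# Keep only the first 10 rows, i.e. the top 10 scorers.
top10 = df.head(10)

fig, ax = plt.subplots(figsize=(10, 4))
sns.barplot(data=top10, x="Player", y="Points", ax=ax)

# Rotate the player names so they don't overlap on the x-axis.
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```

In Colab the chart renders when the cell finishes; in a plain script you would call `plt.show()` at the end.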

If you're a relatively knowledgeable FF player, you'll notice that 9 of the top 10 overall fantasy producers are quarterbacks (Christian McCaffrey, a running back, is the lone exception because he's a fucking beast). To show a player's position alongside their name, we can create a new column in our dataframe that combines the two. We can then view our top ten fantasy performers with this additional context.
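A small sketch of that combined column, using placeholder rows rather than the real leaderboard. The "Position" column name and the new "Player (Pos)" column name are my assumptions; adjust them to whatever your scraped table actually uses.

```python
import pandas as pd

# Placeholder rows; the real dataframe comes from the scraped table.
df = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "Position": ["QB", "RB", "QB"],
    "Points": [410.2, 395.0, 380.7],
})

# Build a label like "Player A (QB)" by concatenating the two string columns.
df["Player (Pos)"] = df["Player"] + " (" + df["Position"] + ")"

top = df.head(10)
print(top[["Player (Pos)", "Points"]])

# Count how many of the top scorers play each position.
position_counts = top["Position"].value_counts()
print(position_counts)
```

Passing "Player (Pos)" instead of "Player" to the bar plot would then label each bar with both pieces of information.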
