Sloan Sports Analytics Conference 2013: My Thoughts and Opinions

I attended the Sloan conference last week, and I left with a lot of opinions.

My opinions come from an outsider’s perspective, but an outsider who has done analytics in disciplines other than sports, and who works every day on a team of analytics nerds at your typical Silicon Valley tech company.  Note also that I attended the prior two Sloan conferences, so I knew what to expect to hear at the various panels.  Finally, I’m a terrible networker.  That’s important to know, because…

1. You don’t go to Sloan to learn how to do analytics

You go to Sloan to network.  Despite its reputation as a geek convention, many of its 2,700 attendees are not actually analytics/Bill James-y types of people.  Many are MBA candidates (Sloan is MIT’s business school, after all), many are high school/college students who want to be close to sports people, some are vendors hawking their products, and many are marketing/sales/business operations people who work for teams, want to work for teams, or work elsewhere in the sports industry.  You have a lot of consumers of analytics, but not necessarily many producers.

I would guess that these people comprise the majority of the attendees.  After them, you have the people who do some form of sports analysis: academic researchers, bloggers, ESPN employees, and teams’ own analytics employees.  If I had to guess, the count of sports analysts is certainly no greater than 700, and probably no more than 150-200.

The Sloan conference isn’t as geeky as its reputation implies, and it’s questionable whether it’s a good place to find up-and-coming analytics talent.  I spoke to a few people who volunteered that the level of skill at this conference wasn’t as high as they expected.  My boss wanted me to keep an eye out for good analysts, and probably the best unemployed analyst I heard was Stan Van Gundy during the Basketball Analytics panel.

There’s probably good analysis being done by teams.  But you’re not going to learn that at Sloan, because teams keep it secret.  Why should they share, right?

Funnily enough, the best analytics at the Sloan conference actually happens on the business panels.  An Orlando Magic executive presented good material on sales analytics, in particular the likelihood of selling season tickets as a function of the time of day and the point in the season.  I even heard the words “decision tree” and “linear model” thrown around.
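
The panel didn’t share code, so here’s just a minimal sketch of what that kind of model might look like, assuming a hypothetical log of sales calls with the hour of the call and the week of the season as features (the field layout and numbers are mine, not the Magic’s):

```python
# Sketch of a season-ticket sale-likelihood model.  Everything here is
# hypothetical: invented features and data, not the Magic's actual setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [hour of day the sales call was made, week of the season]
X = np.array([[10, 2], [14, 2], [19, 10], [11, 25], [18, 30], [9, 40]])
# 1 = call led to a season-ticket sale, 0 = no sale
y = np.array([0, 1, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# Estimated probability of a sale for a 7pm call in week 15
print(model.predict_proba([[19, 15]])[0, 1])
```

A decision tree would slot into the same framing; the interesting part is treating ticket sales as a prediction problem, not the particular algorithm.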

In fact, the only piece of analysis I remember from last year’s conference came from NBA marketing, who shared the effectiveness of making the Boston Celtics’ center court ad display shorter and putting two small ad displays in the corners near the players’ bench.  It turns out that television cameras spend more time at each end of the court, so having ad consoles at each end increased exposure time for the advertiser, leading to bigger ad dollars.  It also had the positive side effect that the displays took up lower-priced, less-desired corner seats rather than higher-priced, more-desired courtside seats.

You may not like ads or business-y things, but man that’s some interesting analysis.

In my opinion, if you want to learn how to do analytics or learn new methods, go to the business panels.  They’ll actually share information.  Also, go to the research paper presentations.  Most of the other panels aren’t illuminating.

2. Data analysis isn’t the issue, it’s data management

When people hear “analytics”, I get the impression they think it means fancy algorithms, distribution curves, and lots of Greek letters.  In my opinion, a big chunk of data analysis is just smart counting, and optionally dividing by something so a number makes sense contextually.
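
To make that concrete, here’s a toy sketch of smart counting against an invented play-by-play format: count the thing you care about, then divide by a sensible denominator so the number has context.

```python
# Toy "smart counting": count the events you care about, then divide by
# a contextual denominator.  The event-log format is invented for
# illustration, not any real feed.
events = [
    {"player": "A", "event": "made_3"},
    {"player": "A", "event": "missed_3"},
    {"player": "A", "event": "made_3"},
    {"player": "B", "event": "made_3"},
    {"player": "B", "event": "missed_3"},
]

def three_point_pct(player, events):
    makes = sum(1 for e in events
                if e["player"] == player and e["event"] == "made_3")
    attempts = sum(1 for e in events
                   if e["player"] == player and e["event"].endswith("_3"))
    return makes / attempts  # the "dividing by something" step

print(three_point_pct("A", events))  # 2-of-3 -> 0.667
```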

I think the sports analytics community does this fairly well.  But this smart counting assumes that the data collection piece has been taken care of.  In basketball, the box score has typically been the place where analytics begins.

But in the new world of basketball analytics, the starting point will probably be SportVU’s XYZ coordinate data.  The definition of the word “data” will change.  Where “data” used to mean sanitized, summarized, and tractable, “data” will mean messy, overwhelming, error-prone, and frustrating.  Living in a world of data streaming 25 times per second means you won’t be living in an Excel spreadsheet, but in something much, much bigger.  Databases matter more.  Parallel computing might have to be used.  Data cleansing will really, really matter.
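
Some back-of-the-envelope arithmetic shows why (the 25 Hz capture rate is SportVU’s; the rest is simple counting):

```python
# Back-of-the-envelope: why 25 Hz tracking data outgrows a spreadsheet.
samples_per_sec = 25            # SportVU's published capture rate
tracked_objects = 10 + 1        # ten players plus the ball
game_seconds = 48 * 60          # regulation NBA game

rows_per_game = samples_per_sec * tracked_objects * game_seconds
rows_per_season = rows_per_game * 1230   # 1,230 games per NBA season

print(f"{rows_per_game:,} rows per game")      # 792,000
print(f"{rows_per_season:,} rows per season")  # 974,160,000
```

A single game nearly fills Excel’s 1,048,576-row limit, and a full season approaches a billion rows, before you’ve derived anything from them.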

I believe too much of sports analytics focuses on stats, and not enough on data management.  To me, stats are the by-product of good data management.  Good data management makes for good analytics, and leads to confidence in your final results.

In fact, Vorped is simply an exercise in data warehousing, some components of which I’ve actually plugged into projects at my day job.  About 80% of my effort on Vorped is spent on making sure I have clean data, and even then, I know that parts of my data aren’t 100% accurate.  Despite this, I have confidence in the data I present, because I know the underlying data has been processed pretty rigorously.
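
To give a flavor of that effort, a cleansing pass is mostly unglamorous sanity checks like the sketch below.  This illustrates the category of work, not Vorped’s actual pipeline, and the field names are hypothetical:

```python
# Sketch of a data-cleansing pass over play-by-play rows.  Field names
# are hypothetical; this is not Vorped's actual pipeline.
def validate_row(row):
    """Return a list of problems found in one play-by-play row."""
    problems = []
    if not 0 <= row.get("game_clock", -1) <= 720:  # seconds left in a 12-min quarter
        problems.append("game clock out of range")
    if row.get("period", 0) < 1:
        problems.append("bad period number")
    if row.get("event") == "made_shot" and row.get("points") not in (2, 3):
        problems.append("made field goal with impossible point value")
    return problems

rows = [
    {"game_clock": 512, "period": 1, "event": "made_shot", "points": 2},
    {"game_clock": 9999, "period": 1, "event": "made_shot", "points": 5},
]
for i, row in enumerate(rows):
    for problem in validate_row(row):
        print(f"row {i}: {problem}")
```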

We need more people who are both willing and able to endure the drudgery of data management.  It’s completely unsexy, but absolutely vital to making analytics work.  And in the coming world of streaming sensor data like SportVU’s, having more data won’t make things easier; it’ll make things harder, because you have much more noise to sift through to find the signal, let alone the right signal.

At some point in the future, the XYZ coordinate data will likely lead to awesome findings, especially around screening, defense, and off-the-ball movement.  However, I believe there are simpler methods and data sources that could answer useful basketball questions as well as, and in some cases better than, the current SportVU data.  But if the NBA put cameras in all 30 arenas and disseminated all that data, my opinion would probably change.

3. Communication matters

I found it interesting that so many panels devoted time to discussing how to communicate analytics findings.  Often communication proves to be the most challenging part of analysis, because human beings have emotions and egos that can prevent objectivity from carrying the discussion.  I’ve experienced this countless times myself.  If your listener fundamentally does not believe in data, or in you as an analyst, it usually doesn’t matter how good your models are, because in the end, that knowledge won’t be used.

The big exception is baseball.  Moneyball worked so well because baseball’s rules create situations and data that make statistical analysis very natural.  Assigning credit and blame for an at-bat is relatively straightforward: you have a batter, a pitcher, and sometimes a fielder with an error.  At-bats are also well-defined by the rules of the game, which makes counting events easy and the resulting stats relatively self-explanatory.

Basketball is so much harder.  Assigning credit and blame gets very complicated when you consider non-box score things like screening, cuts to the basket, missed rotations, and bad spacing.  A player can play an effective 30 minutes without registering a single shot attempt or assist.

I think this is why communicating analytics is much harder in non-baseball sports: collecting the right data to build the right model is hard, so we have to make do with simpler data, which limits the depth of actionable knowledge we can gain from the analysis.

Current basketball statistics do a good job of identifying which teams and lineups are good (i.e., efficient).  But they don’t necessarily tell us why they’re efficient.  Is it because the lineup has better ball movement?  Better screening?  Better shot selection?  Questions that start with “who” and “what” can be answered.  Answering “why” is much, much harder.
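
Offensive efficiency makes the point well.  It’s easy to compute from box-score totals, here using the common possessions estimate (FGA - ORB + TOV + 0.44 × FTA), but nothing in the calculation says why the number is high or low:

```python
# Offensive efficiency from box-score totals: it answers "what", not "why".
# Uses the common possessions estimate: FGA - ORB + TOV + 0.44 * FTA.
def offensive_efficiency(pts, fga, orb, tov, fta):
    possessions = fga - orb + tov + 0.44 * fta
    return 100 * pts / possessions  # points per 100 possessions

# A hypothetical lineup's totals over some stretch of games
print(offensive_efficiency(pts=105, fga=85, orb=10, tov=14, fta=25))  # 105.0
# Nothing in that number says whether it came from ball movement,
# screening, or shot selection.
```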

I would guess that communication becomes challenging because basketball analytics has a hard time answering “why” questions.  Decision-makers want actionable insights, and in these cases, stats (or metrics) alone aren’t good enough.  You need interpretation too, which requires contextual knowledge outside of the data.  In my opinion, this is where the next opportunity for sports analysts lies: deftly combining quantitative data with qualitative contextual information to tell a believable and accurate story.

In my experience, I’ve always tried to communicate to decision-makers that data will tell us some things, but won’t explain things fully.  As Nate Silver says, data analysis tends to be probabilistic.  If you can use data to make a CEO or coach 70% confident instead of 50% confident in a particular strategy, that’s a win.  Learnings from data are typically incremental, and I think the goal should be to accumulate as many incremental learnings as possible, instead of searching for the silver-bullet analysis that explains everything.

The prevalence of this topic makes me believe that the statistical movement hasn’t truly taken hold.  To me, the statistical revolution will have happened when teams operate as data-driven organizations, not just organizations that happen to use data.  Being data-driven means questioning assumptions, measuring the right things, and continually testing those assumptions with the data you’ve collected.  Based on the chatter at the conference, I would guess that not many basketball teams meet these criteria.

Too long; didn’t read (TL;DR)

The Sloan conference isn’t as nerdy as its media coverage implies.  Sports analytics is still in its nascent stages, more evolution than revolution, and still behind business analytics, which has been doing this for decades.

While there are plenty of good stats and quality data analysts out there, we need more people involved in the ugly but important work of data collection.  We also need open data, because that’s how we’re going to discover the next generation of sports analysts.

Finally, we need to be comfortable communicating both what data analysis does and doesn’t tell us, and be at peace with the fact that data analysis can’t explain everything.

Overall, the conference was a good experience.  I met many good people doing good things, and yet I didn’t get to meet as many people as I hoped (I’m terrible at networking).  I just wish more actual analytics happened.  It would be awesome if there were a hackathon during next year’s conference.