I've recently been doing a bunch of stuff with sports stats, which involves lots of GIS data. This sort of thing comes up a lot - trying to find a player's 'territory' based on coordinates of their actions in a game, without including outlying events that cause you to overestimate the area.
There's a concept in animal behaviour called a 'home range' which is more or less the same thing - GPS attached to tigers in the wild etc. Some of the algorithms there are quite interesting, from simply drawing a bounding box around the data points, to working out the probability density, to things like LoCoH (local convex hulls), which sort of recursively builds up convex hulls from each point's nearest neighbours.
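To make the LoCoH idea a bit more concrete, here's a rough Python sketch of the k-nearest-neighbour flavour. It's a simplification under my own assumptions, not a reference implementation: the names `local_hulls`, `k` and `coverage` are made up for illustration, and it only counts the points used to build each hull as "covered" rather than testing which points fall inside it. The idea is to build a small hull around every point, then keep the smallest hulls until enough of the data is accounted for, so far-flung outliers never get a say.

```python
from math import dist


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # > 0 when o -> a -> b turns left (counter-clockwise)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; degenerate hulls (fewer than 3 vertices) give 0."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def local_hulls(points, k=5, coverage=0.9):
    """Simplified k-NN local hulls (hypothetical helper, not a library API).

    Builds one hull per point from its k nearest points (itself included),
    then keeps the smallest hulls until `coverage` of the distinct points
    have been used. Returns (selected_hulls, covered_points).
    """
    candidates = []
    for p in points:
        nearest = sorted(points, key=lambda q: dist(p, q))[:k]
        candidates.append((polygon_area(convex_hull(nearest)), nearest))
    candidates.sort(key=lambda c: c[0])  # smallest (densest) hulls first

    target = coverage * len(set(points))
    covered, selected = set(), []
    for _area, members in candidates:
        if len(covered) >= target:
            break
        selected.append(convex_hull(members))
        covered.update(members)
    return selected, covered
```

The union of the returned hulls is the estimated territory. Actually merging them into one polygon needs a geometry library, which I've left out here; a brute-force nearest-neighbour search like this is also O(n²), so it's for illustration rather than 10k points per player.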
All of these things are pretty much possible in SQL to one degree of performance or another. But ultimately I'm fascinated by things like SQL Server's R support - you can get far simpler, more natural implementations of these things in R (or indeed in custom aggregates or functions in other languages). I think in the long term, database engines that offer this sort of extensibility are going to thrive for analytics work, be they SQL based or otherwise.
I hope it works out, but they need to do a lot of work on it between these CTPs and RTM. They've made some critical mistakes in the way they've marketed and documented it:
* They call every user a "data scientist". This is fun and assuredly deserved for R experts but is going to scare a hell of a lot of potential users away.
* The installation notes give you a lot of steps but zero insight into why you're doing them. Install these two packages; okay but what exactly are they doing? We need to know this kind of thing. R people might know...
* There's very little troubleshooting documentation, either for the installation/initial setup (and there are a LOT of bugs in getting it going; trust me, I spent most of last week digging through them) or, once it's operational, for how it's going to be tracked and managed within the context of everything else that's going on in the SQL instance: memory pools, wait states, resource governor pools, etc. We kind of need to know this stuff. Otherwise we see a server that gets reported as slow, we know it uses R, but we don't know where to look to determine whether R is causing it or not and what we can do about it.
* AFAIK it's single-threaded. Considering most places have super-downsized their CPUs and are going for massively threaded servers these days (which is arguable given Microsoft's 2012+ per-core licensing model; but I'm talking about servers in very big enterprises that have unlimited licensing agreements), that's not going to end well. It's very possible this will be fixed before RTM though.
* And finally, because they've bought Revolution and rebranded it as Microsoft R a few days ago, and now it's split into SQL and non-SQL products, there's going to be a lot of confusion and a long wait for appropriate training materials to catch up for all of us non-R users.
* I haven't been impressed with the current tutorial materials. As non-R users we're going to need a lot more to really understand how to use it. I'm sure there are fantastic R resources available, but if you want to understand it with a Microsoft SQL Server background then it's a slightly different story...
We used a custom Power BI visualization (d3 + TypeScript). For dev purposes I just averaged data with some simple SQL queries, but the real thing used (I believe) 10k datapoints for each player over the course of the match.