tl;dr Big data 2.0 – newly enabled vertical applications riding the big data wave
In the same way the open source LAMP stack of Linux, Apache, MySQL and PHP powered the first generation of web applications, the big data stack has now been established and will be the foundation of the next generation of applications. Technology previously only possible at Google has been made accessible through this open source infrastructure, accelerating the pace of innovation.
I was fortunate to attend Structure Data in New York last week. It was a fascinating group of deep technologists, business users and the occasional venture capitalist. In an excellent presentation, Geoff McGrath, managing director at McLaren, spoke about how better instrumentation, predictive analytics and advanced simulation let them design better and faster systems in domains as diverse as cars, bikes and airports. My takeaway from this and other talks: with the big data stack now built, new vertical applications previously only possible at companies like Facebook and Google are at any technologist's fingertips.
More complete overviews of this next generation of big data infrastructure exist but as a frame of reference here is a quick look at the stack.
- AWS – cloud compute and storage
- Hadoop, HDFS, HBase, MongoDB, Cassandra, Riak, Splunk, Amazon Redshift … – data stores
- Scribe, Flume, Sqoop, Kafka – data ingestion
- Map Reduce, Hive, Pig, Cascading – batch data processing
- Spark, Impala, Drill, Presto, Tez / Stinger, Druid, Storm, S4 – real-time data processing
- Lucene, Solr, Elasticsearch – search
- Apache Mahout, Oryx / Myrrix, Kiji – modeling
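To make the batch-processing layer of the stack concrete, here is a minimal MapReduce-style word count sketched in plain Python. It shows the same map/shuffle/reduce pattern that Hadoop distributes across a cluster; the function names are illustrative, not any framework's API.

```python
from collections import defaultdict

def map_phase(docs):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in docs:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Group emitted values by key, as the framework's shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data", "Big Data stack"]
result = reduce_phase(shuffle(map_phase(docs)))
# result == {"big": 2, "data": 2, "stack": 1}
```

On a real cluster, the map and reduce phases run in parallel on many machines and the shuffle moves data between them over the network — but the programming model is exactly this simple, which is a large part of why the stack spread so quickly.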
Companies like Cloudera, Hortonworks, MapR and newcomers like Databricks drive this big data stack innovation. By offering open source and commercially licensed infrastructure, they have created and productized the technology and services that enable big data.
Beyond the Tech – Applications
Web activity streams, mobile events, IoT sensors and more generate exponentially growing data. The first order of business is to make that data accessible and useful. Actionable analytics is a first step, allowing for manual optimizations.
- Analytics – databases and exploration
- Business Intelligence – dashboards and visualization
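As a sketch of what "actionable analytics" means at its simplest, here is a hypothetical group-by over a raw event stream using only the Python standard library — the kind of aggregate a BI dashboard would chart. The event fields are invented for illustration.

```python
from collections import Counter

# A raw activity stream: one record per user event (fields are hypothetical)
events = [
    {"user": "a", "action": "click", "day": "2014-03-20"},
    {"user": "b", "action": "view",  "day": "2014-03-20"},
    {"user": "a", "action": "click", "day": "2014-03-21"},
]

# Roll the stream up into a dashboard metric: clicks per day
clicks_per_day = Counter(e["day"] for e in events if e["action"] == "click")
# clicks_per_day == {"2014-03-20": 1, "2014-03-21": 1}
```

At scale the same roll-up runs in Hive or Redshift over billions of events, but the analyst's question — count this action, grouped by that dimension — is unchanged.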
Manual modeling and offline optimization then give way to algorithmic online models. Here are a handful of machine learning applications.
- Relevance – ranking given a direct user intent (query)
- Recommendation – given indirect user intent
- Personalization – customization on a per user level
- Anomaly Detection – fraud, threat
- Forecasting – predicting future values from historical data
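To make one of these concrete, here is a minimal anomaly-detection sketch using a simple z-score rule — far cruder than what a production fraud-detection service runs, but it shows the shape of the problem. The three-standard-deviation threshold is a common rule of thumb, not taken from any of these products.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

# Twenty ordinary transaction amounts and one outlier
amounts = [10.0] * 20 + [100.0]
flagged = zscore_anomalies(amounts)
# flagged == [100.0]
```

Real systems replace the static threshold with models learned online from labeled fraud data, but the core task — separating signal from an exponentially growing stream of normal events — is the same.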
Companies like RichRelevance, WibiData, Sift Science and Optimizely provide recommendation, personalization and fraud detection as a service.
The result is a proliferation of tools and technology enabling big data applications, and data scientists are using them to create a next generation of vertical applications that is only now possible.
Algorithmic services in a deep vertical represent the next domain for big data innovation. As both the VC panel at Structure Data and Chris Dixon's Full Stack Startup thesis suggest, the next generation of companies will require a combination of deep industry experience and full integration across hardware, software and, now, predictive analytics.
As a venture capitalist, I'm learning to ask: why now? Big data 1.0 was about productizing the enabling technology. Now, with the big data stack built, new vertical applications are at any technologist's fingertips.