Over the next few years, the way we write applications will change dramatically. Instead of writing logic-driven applications, we will start writing data-driven applications. What does that mean?
Software must provide value to an end user. Usually, value means that it can do something better, faster, or cheaper than was possible before, often by replacing human labor. Sometimes, software does things that couldn’t be done before at all, but that’s quite rare, and mostly happens when new enabling technologies like smartphones or sensors are proliferating.
Over the past two decades, most of the value that software provided came from its “logic”, often replacing what people were doing manually before. Better note-taking applications (Evernote), cheaper communication tools (Skype), more efficient task management solutions (Asana), and no-friction photo sharing applications (Instagram) are all examples of that. The value of software was driven by logic, not by data. Of course, all applications need to store and use data, and may integrate with external APIs, but that’s not the point. Data did not drive the core value, but merely enabled it.
This will change.
The next wave of applications will provide value through data instead of logic. Applications will combine streams of data in interesting ways, manage them, and extract meaning. For example, software for marketing departments may consolidate data from the analytics service, the Sales CRM, and the customer service solution to provide a holistic picture of a company’s customers, identify the most promising prospects, and suggest new marketing campaigns. On the consumer side, data produced by wearable tech and smart sensors will drive the emergence of completely new consumer applications. In the industrial world, smart machines are generating more data than ever, which allows for optimization at a micro scale that humans would never be able to achieve. Naturally, all of these applications are enabled by the cloud, which makes the data accessible to everyone.
Applications will differentiate themselves not by providing better logic, but by using large amounts of data, combining disparate sources of data, and extracting meaning from such data. Software will become driven by data, instead of just enabled by data.
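As a toy illustration of what “combining disparate sources” might look like, here is a sketch in Python that merges hypothetical CRM and analytics records into unified customer profiles. All source names, fields, and the “promising prospect” heuristic are made up for illustration:

```python
# Hypothetical records, as they might come back from a CRM API
# and an analytics API.
crm_contacts = [
    {"email": "ada@example.com", "deal_stage": "negotiation"},
    {"email": "bob@example.com", "deal_stage": "lead"},
]
analytics_events = [
    {"email": "ada@example.com", "page_views": 42},
    {"email": "bob@example.com", "page_views": 3},
]

def merge_by_email(*sources):
    """Combine records from several sources into one profile per email."""
    profiles = {}
    for source in sources:
        for record in source:
            profiles.setdefault(record["email"], {}).update(record)
    return profiles

profiles = merge_by_email(crm_contacts, analytics_events)

# A crude "most promising prospect" heuristic: engaged visitors
# who have already moved past the lead stage.
promising = [p for p in profiles.values()
             if p["page_views"] > 10 and p["deal_stage"] != "lead"]
```

The interesting part is not the merge itself but that the combined view (engagement plus deal stage) carries information that neither source holds on its own.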
But this change won’t happen overnight. In fact, I believe we are still years away from seeing the big shift. Why? Because building data-driven applications is still difficult, just like building logic-driven applications was one or two decades ago. Building desktop applications once required you to write tens of thousands of lines of custom C++ code. Today you can drag & drop together beautiful applications in a matter of hours. Building web applications required you to write and manage HTML and CSS, design your own backend, set up your own servers, manage them, and scale the application. Today, you can use an MVC framework such as Ruby on Rails (before you hate on me, I know it’s not really MVC, forgive me ;)) and host your application in the cloud on Heroku in a matter of minutes! We have developed patterns, platforms, and best practices that help us build beautiful and well-designed applications. Building these applications has become so easy that almost everyone can learn to do it, which may be one of the reasons why we see such a huge number of startups and entrepreneurs these days.
However, we don’t have such abstractions or platforms for building data-driven applications. Not yet, at least. Dealing with data is still hard. You may have to integrate with APIs, sensors, databases, public data sources, or, most likely, a combination of these. Each source has a different format and different access semantics. Assuming you wrote your custom integrations, how do you then deal with large volumes and high speeds of data? You may need to learn about the dozens of “big data” storage layers and frameworks out there: HDFS, CFS, Hadoop, HBase, Spark, Kafka, and so on. And to actually make sense of the data you need to learn about statistics and machine learning, most of which wasn’t even developed with large amounts of data in mind in the first place. None of this is impossible, but everyone who wants to build data-driven applications needs that broad range of skills, which is called “Data Science”. It’s hard, but it won’t stay hard for very long, and there exists an interesting parallel with the early days of the web.
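To make the format problem concrete, here is a small hypothetical sketch: the same kind of record arrives once as a JSON API payload and once as a CSV export, and each source needs its own adapter before the data shares a common schema. All formats and field names are invented for illustration:

```python
import csv
import io
import json

# Two hypothetical sources describing the same kind of entity,
# each in its own format and with its own field names.
api_payload = json.loads('{"user": {"id": 7, "signup": "2014-01-02"}}')
csv_export = io.StringIO("id,signup_date\n8,2014-03-04\n")

def from_api(payload):
    """Adapter for the nested JSON shape of the (made-up) API."""
    user = payload["user"]
    return {"id": user["id"], "signed_up": user["signup"]}

def from_csv(fileobj):
    """Adapter for the flat CSV export, with type coercion."""
    return [{"id": int(row["id"]), "signed_up": row["signup_date"]}
            for row in csv.DictReader(fileobj)]

# After normalization, every record shares one schema,
# whatever its origin.
users = [from_api(api_payload)] + from_csv(csv_export)
```

Two sources already need two adapters and type coercion; multiply that by every API, sensor, and database in the stack and the integration burden described above becomes clear.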
Do you remember when every company was looking for “Webmasters”? Webmasters were people who knew a bit about everything. They could design a website, develop it, set up a server, and maintain it. Today we are no longer looking for Webmasters; instead we are looking for “Django backend developers”. Developing patterns and platforms for many of the mundane tasks a Webmaster was doing has allowed us to move to a more specialized workforce.
Well, the “Data Scientist” is the Webmaster of today. We need Data Scientists to work through the challenges of developing data-driven applications, but soon we will start developing the same kinds of patterns and platforms that we once developed for Webmasters. And once that happens, we will no longer be looking for Data Scientists, but for people with strong skills in specific subdomains. Developing data-driven applications will become accessible to almost everyone, which will lead to the proliferation of a new type of software: data-driven software.
| Webmaster | Data Scientist |
| --- | --- |
| PHP/Perl Backend Development | Statistics and Machine Learning |
| Server Administration | Datastores like HDFS, HBase, Cassandra |
| Database Management | Big Data frameworks like Hadoop, Storm, Spark |
| System Integration | Data Integration and Data Munging |