by Ramesh Koovelimadhom
Much has been written about whether data science needs domain knowledge, with arguments for and against. Those in favor say that knowing the domain helps us create the right hypotheses, which data science can then help prove or disprove. Without domain understanding, data science could become a long fishing expedition in the ever-increasing data lake, and the “Iceberg of Business” could start melting long before the needed changes are actually implemented.
Those questioning the need for domain knowledge cite data mining competitions such as Kaggle [1] and KDD [2], which have demonstrated how data science can be successfully outsourced to people without domain expertise. Many companies have run competitions on topics as diverse as optimizing flight routes, predicting ocean health, and detecting diabetic retinopathy. Data scientists with little or no expertise in the domain have responded brilliantly with useful solutions. Some have even won across multiple domains, indicating that data science skills are transferable across domains.
Then there are those who counter the argument from Kaggle’s success: in these competitions, the domain experts have already generated the hypothesis by posing the right business question and preparing the data, so the competitors need only model and test.
With today’s massive data sets, along with the mathematical tools and computing power to crunch these numbers, the old-world paradigm of hypothesizing before modeling is likely to be challenged. With its approach to language learning, Google has shown a whole new way of understanding the world without any a priori models or theories.
So are domain experts necessary? Is domain knowledge necessary? What is “domain knowledge”? How much is enough?
Every field of software engineering talks about the need for domain knowledge. Business Analysis requires domain knowledge. Testing requires domain knowledge. How much domain knowledge does Data Science need?
Let’s try and think through these questions via an example from investment banking.
I posit that Data Science will always remain a team sport. Domain knowledge will never be enough if we choose to operate under the paradigm that a Data Scientist with technology and mathematical skills should also have deep domain knowledge. The key to effective teams is communication. Enough domain knowledge to communicate without hindrance will suffice for small teams to create valuable insight using Data Science. The experts in the business will bridge the remaining knowledge gaps.
Gartner included this new class, Citizen Data Science, in its 2015 Hype Cycle [3], placing it in the innovation trigger region and expecting it to reach the plateau in 2-5 years. Gartner research director Alexander Linden suggests cultivating “citizen data scientists” (people on the business side who may have some data skills, possibly from a math or even social science degree) and putting them to work exploring and analyzing data. We certainly need to do that, noting that the team approach to creating “things of value” is not going to go away any time soon.
1. Kaggle. Retrieved from https://www.kaggle.com/
2. KDnuggets. "Datasets for Data Science, Machine Learning, AI & Analytics". Retrieved from https://www.kdnuggets.com/datasets/index.html
3. KDnuggets. "Gartner 2015 Hype Cycle: Big Data is Out, Machine Learning is in". Retrieved from https://www.kdnuggets.com/2015/08/gartner-2015-hype-cycle-big-data-is-out-machine-learning-is-in.html