What does it take to be a data scientist?


The question is not new, but the answer has changed slightly. The term ‘data science’ was coined in 2001, and serious practice has been under way since 2010. Early articles from 2010 mention three characteristics of a data scientist: IT skills, Math/Stat skills and domain expertise. Possibly, there is nothing more to add to this triad even now.

However, the last four or five years have forced some changes in the underlying make-up of the triad. Several factors are driving this change: the increasing gap between IT and business; rapid changes in computing and storage; the explosion of data, especially unstructured data; the arrival of new algorithms, particularly in deep learning; the proximity of data scientists to top management; the idea of unlocking value through systems thinking; and growing opportunities to create value by understanding how industries are interconnected.

Consequently, data scientists have to involve themselves in the operational side of the business (e.g. CRM systems); handle more unstructured data such as text, voice and image; possess data engineering skills; work more with high-performance clusters and Big Data; move to structural equation modelling rather than simple linear equations; solve more math problems than ever before; discuss opportunities, problems and solutions with senior management using BI and visualisation tools; think in systems; and bring experience from multiple domains.

Therefore, in this post I revisit past thinking, add new points, and make the picture comprehensive and current.

IT Skills:

In the context of a data scientist, IT skills refer to the ability to fully understand the software world that is vital to her performance. This includes knowledge of databases and how to handle them, and of statistical or mathematical software packages. A lot appears to be changing in this area.

In spite of the vast amounts of data already available, data scientists are seeking new data to improve model performance. This calls for skills in data planning and business strategy.

The characteristics of data are changing. Data will increasingly be text, voice or image. In a separate development, untapped machine data from industry and IoT, running into several exabytes (EB), is now available for analysis. All of this calls for astute and robust technology to lift and analyse the data. Every day, new libraries are being added to the body of open source technologies. What are the implications for a data scientist?

A data scientist has to understand how other operational IT systems function, e.g. a campaign management or sales force automation system.

Stats and Math Skills:

Possibly, at the heart of data science lies an improving ability to crunch numbers. New techniques are being uncovered to handle common issues faced by data scientists. For instance, the Support Vector Machine, a classification tool, solves no new problem, but it solves the old one more efficiently, i.e. with the fewest classification errors. Analysing text, voice and image has long been vexing. Advances made by adding layers to neural networks (deep learning) have allowed hitherto unsolved problems to be solved. Consider, for example, ‘Dittory’. It is in the challenging business of helping customers discover similar unbranded apparel on the web using image search. Data scientists used to struggle even to detect a feature (e.g. a mandarin neck) in an image. However, very high processing capabilities and very large datasets have turned the old and ignored Convolutional Neural Network (CNN) into a powerhouse of new capabilities. Almost 30 million apparel images from Indian eCommerce sites were used, and the rest is history.
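To make the idea of "detecting a feature in an image" a little more concrete, here is a minimal sketch of the sliding-window operation at the core of a CNN layer, written in plain Python. The image patch and the kernel are invented for illustration and have nothing to do with Dittory's actual system; a real CNN learns its kernel weights from training images instead of having them written by hand.

```python
# Toy illustration of the sliding-window step inside a CNN layer.
# The image and kernel are made-up values, not data from the article.

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# 4x4 grayscale patch: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written vertical-edge detector (an assumption for this sketch).
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the column where dark meets bright, which is the sense in which a kernel "detects" a feature. Stacking many such layers with learned kernels is what lets a deep network move from edges to shapes such as a collar.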

Updating and continuous learning are critical for a data scientist

The examples of SVM and CNN carry an important message for data scientists: keep a keen eye on the latest developments in a select set of important techniques.

Domain Knowledge:

To be a good data scientist, domain knowledge, systems thinking and cross-industry exposure are important.

1. Domain knowledge is acquired through exposure to industry dynamics. Industries such as BFSI, Telecom, Retail, eCommerce and Education have large numbers of customers and tech-enabled data systems, which generate large (if not Big) data. While the application of IT skills and Math/Stats skills is nearly the same in each of these industries, the business questions may be different. For example, Market Basket Analysis may be more important in the Retail industry, while Survival Analysis may be more important in Insurance.
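As a rough illustration of what Market Basket Analysis measures, the sketch below computes the three usual metrics for a rule such as "bread implies butter": support, confidence and lift. The transactions and the function names `support` and `rule_metrics` are invented for this example; production work would normally rely on a dedicated library and far more data.

```python
# Hypothetical shopping baskets; the items are invented for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence and lift for the rule antecedent -> consequent.

    Assumes both sides of the rule occur in at least one basket;
    otherwise the divisions below would fail.
    """
    sup_both = support(set(antecedent) | set(consequent), baskets)
    sup_antecedent = support(antecedent, baskets)
    sup_consequent = support(consequent, baskets)
    confidence = sup_both / sup_antecedent
    lift = confidence / sup_consequent
    return {"support": sup_both, "confidence": confidence, "lift": lift}

metrics = rule_metrics({"bread"}, {"butter"}, transactions)
print(metrics)
```

A lift above 1 suggests the two items appear together more often than they would by chance. Whether that is worth acting on (shelf placement, bundling, promotions) is exactly the kind of judgement that depends on retail domain knowledge.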

Some questions appear to be universal, for example churn reduction. Yet the approach and the variables that determine churn vary somewhat across industries. Consider, for example, churn modelling in telecom and BFSI. The broad categories of predictor variables in both industries may be Customer Characteristics, Purchase History, Customer Product Usage Data, and Customer Payments or Billing Data.

In telecom, Customer Product Usage Data may cover variables such as Number of Calls (Outgoing, Incoming, Roaming and International), Number of SMS, Total Minutes, Number of VAS Activated or Deactivated, Data Usage, and App Usage. In the BFSI credit card business, the same category takes a different avatar. It may refer to variables such as Number of Transactions, Categories of Purchases, Days of Card Usage, Value of Purchases and Number of Automatic Debit Instructions.

Identifying the specific variables for a good analysis calls for reasonable domain expertise.
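The contrast between the two industries can be sketched in code. Below, the same broad category (Customer Product Usage Data) is derived from raw records in two different ways. The record layouts, field names and values are all hypothetical and chosen only for illustration; real telecom and card systems will look different.

```python
# Hypothetical raw records; field names and values are invented for illustration.
telecom_calls = [
    {"direction": "outgoing", "minutes": 3.0, "roaming": False},
    {"direction": "incoming", "minutes": 5.5, "roaming": False},
    {"direction": "outgoing", "minutes": 1.5, "roaming": True},
]

card_transactions = [
    {"day": "2024-01-03", "category": "grocery", "amount": 40.0},
    {"day": "2024-01-03", "category": "fuel", "amount": 25.0},
    {"day": "2024-01-10", "category": "grocery", "amount": 35.0},
]

def telecom_usage_features(calls):
    """Usage variables for one telecom customer."""
    return {
        "num_calls": len(calls),
        "num_outgoing": sum(1 for c in calls if c["direction"] == "outgoing"),
        "num_incoming": sum(1 for c in calls if c["direction"] == "incoming"),
        "num_roaming": sum(1 for c in calls if c["roaming"]),
        "total_minutes": sum(c["minutes"] for c in calls),
    }

def card_usage_features(txns):
    """Usage variables for one credit card customer."""
    return {
        "num_transactions": len(txns),
        "purchase_categories": sorted({t["category"] for t in txns}),
        "days_of_card_usage": len({t["day"] for t in txns}),
        "value_of_purchases": sum(t["amount"] for t in txns),
    }

telecom_features = telecom_usage_features(telecom_calls)
card_features = card_usage_features(card_transactions)
print(telecom_features)
print(card_features)
```

The churn model downstream may be the same technique in both cases; what differs is which columns are worth building, and that choice comes from knowing the business.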

2. Systems Thinking: Clearly, data science practice calls for an interdisciplinary approach. One cannot reduce churn (marketing analytics) while the same poor product performance continues (marketing analytics). Nor can one reduce warranty (marketing analytics) without appropriate changes in reverse logistics (supply chain analytics), or improve workforce productivity (HR analytics) without changes in production scheduling (production analytics).

A data scientist can no longer solve problems within a silo. Systems thinking is now essential for showing the real value of the data science function within an organisation.

A data scientist has to think holistically. No wonder the function has strategic importance and, in several organisations, reports directly to the CEO.

3. Cross Industry Exposure: I think exposure to the application of data science in two or more industries adds to the effectiveness of the practice, owing to the ‘outside-in innovation’ effect. In fact, there is early evidence that analytics may soon no longer be confined to a single industry; it will call for analysis of data from across industries. We are already witnessing firms aggregating data from industries such as telecom, social media and ecommerce to improve search engine data analytics and the resulting marketing campaigns.

A lack of cross-industry exposure can be compensated for by studying successful applications of IT, Stats or Math in different domains or industries. One can also supplement this by talking to peers in other industries and attending data science application conferences. The picture here shows that the successful application of one technique in a field has spawned similar applications in other fields as well.

The question is whether such cross-industry exposure should occur in the early, mid or late career of a data scientist. While there are no studies to back my hunch, I would avoid such exposure in the early stages of a data science career; focusing on one domain in the early stages has its advantages.

Conclusion:

Have the broad requirements of what it takes to be a data scientist changed? No. The triad still comprises IT skills, Statistics and Math skills, and Domain Knowledge. However, several changes in technology, science and business dynamics are forcing changes in the underlying characteristics of the triad. Data scientists are increasingly expected to spend time on data planning, use different and better technologies for data lifting, refurbish their stats and math armoury with techniques that have never been used before, perform holistic analytics that involve all functions within an organisation, and use data and practices not just from their own industry but from across industries. The change calls for strategic thinking, rapid and intensive learning, and an outcome focus.

Postscript: Reviewers of the above article pointed out that data scientists should have some very important soft skills and abilities such as communication, questioning mindset, problem solving attitude and influencing without authority. I agree and thank the reviewers.

The above article is adapted from the original one found here.

