Is Data Science Dead?

I know the title is clickbait. And I know it’s subject to Betteridge’s law of headlines (“Any headline that ends in a question mark can be answered by the word ‘no’.”). But hear me out, and consider this quote:

“Data scientists spend around 80% of their time on preparing and managing data for analysis.” (Forbes)

The dirty truth of data science is that most of the work is not actually data science at all, but data engineering. Sure, there are some very clever people at Google and the like who do true data science all day, every day. But for the rest of us, it’s data pipelines all the way down.

Now, this is not necessarily a bad thing. Building pipelines is interesting stuff. But the industry needs to recognise that it’s doing boring old software engineering, not sexy science.

Data scientists really must appreciate the full software development life cycle. I was talking to a systems architect who bemoaned the fact that his data scientists engaged with neither the engineers nor the business stakeholders. They wanted to build LLMs. Why? “Well, haven’t you read the news recently? They’re everywhere!” The fact that LLMs are generative models, and that there was no business value in generating text here, meant nothing to them. Just get the data engineers to deploy it and damn the torpedoes.


Such attitudes were common in software engineering in the mid-90s. I have many happy memories of building things and then justifying them to my boss later, with great success, I hasten to add. It was the dotcom bubble, and any one of those playthings could have gone to IPO and made us gazillions, or so he thought. This is where data science is today in most large companies. Data scientists are working on pet projects with little shareholder value. And just as the software industry collapsed in 2001, so will the data science industry if things continue this way.

How do we avoid this? Well, one way is to have data scientists engage more with the rest of the team: the data engineers, the business representatives, the testers, etc. The Agile movement in the software industry showed how developers could be more professional. They were no longer to sit in their basements, hidden away from the business. Instead, they were to have daily stand-ups with all stakeholders. And before every iteration (roughly every fortnight), they were to have planning sessions where suits and techies sat down and discussed what work was to be done next.

It’s interesting that the Agile movement really gained traction shortly after the dotcom bomb. Let’s hope the data science industry learns from this and avoids a bust.