When you fall in love with a model
“The first principle is that you must not fool yourself and you are the easiest person to fool.” - Richard Feynman, Nobel physicist. Carbon-neutral flying cars are an admirable goal, but are they realistic? Be honest with yourself about what you can reasonably achieve with AI. When a model flatters your ego: When you build a machine learning model, how do you measure how good it is? Oh sure, you’re a maths whizz so you know exactly which algorithms to employ.
Testing pipelines
A lot of developers wonder how to test pipelines - see this Discord thread. The best way we’ve found is to create fixed, known data so that, when our transform acts on it, we can make reasonable assertions about what comes out. Synthetic data: We are operating in the healthcare domain. We have a data set of events at hospitals and we want to turn them into a running total of patients who happen to occupy the hospital on any given day.
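As a rough illustration of that idea, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example rather than the article's actual pipeline: the function name `daily_occupancy`, the `(date, kind)` event shape, and the convention that a patient discharged on a given day is not counted as occupying a bed that day.

```python
from collections import Counter
from datetime import date, timedelta


def daily_occupancy(events):
    """Turn (day, kind) events into a running total of patients per day.

    Hypothetical transform: kind is "admission" or "discharge".
    Days with no events inherit the previous day's total.
    """
    deltas = Counter()
    for day, kind in events:
        if kind == "admission":
            deltas[day] += 1
        elif kind == "discharge":
            deltas[day] -= 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    if not deltas:
        return {}

    day, last = min(deltas), max(deltas)
    running, occupancy = 0, {}
    while day <= last:
        running += deltas[day]  # Counter yields 0 for days without events
        occupancy[day] = running
        day += timedelta(days=1)
    return occupancy


# Fixed, known synthetic data: small enough to work out the answer by hand.
synthetic_events = [
    (date(2024, 1, 1), "admission"),
    (date(2024, 1, 1), "admission"),
    (date(2024, 1, 2), "discharge"),
    (date(2024, 1, 4), "admission"),
]

expected = {
    date(2024, 1, 1): 2,
    date(2024, 1, 2): 1,
    date(2024, 1, 3): 1,  # no events, total carries over
    date(2024, 1, 4): 2,
}

assert daily_occupancy(synthetic_events) == expected
```

Because the input is tiny and deterministic, a failing assertion points at the transform rather than at the data. The same pattern should carry over to a Spark transform by building a small DataFrame from the synthetic rows.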
Real Life AI
The hype cycle Gartner identified a phenomenon called the Hype Cycle where a technology dazzles everybody then as reality bites, expectations become more reasonable. Where are we on this cycle for LLMs? Innovation Trigger It’s been a year since ChatGPT was released to near universal acclaim. Now things are calming down, let’s strip away the hype and see if it can actually do something useful other than code snippets. This is a real life problem.
A checklist for a successful project
Requirements. Automated tests. Expectations. CI/CD. You can ignore these, but you face a world of pain if you do. Understand the requirements. Carbon-neutral flying cars are an admirable ambition, but are they realistic? Agree on the acceptance criteria with all stakeholders (this includes developers and QAs, not just the end users). You should be able to answer two questions before work begins: is this feasible and is this what is wanted?
KISS Models
“Keep it Simple, Stupid!” The KISS principle is another delightful insight from the US military. It means systems work best when they are simple. The Agile Manifesto phrases it in a slightly more neutral tone: “Simplicity – the art of maximising the amount of work not done – is essential.” but the message is the same. You may be aware that there are a lot of overly complicated systems out there.
You Need Continuous Deployment!
The Spitfire was the most awesome dog-fighter of the Second World War. Captured German airmen would routinely lie that they were shot down by a Spitfire rather than any other plane as there was no shame in losing to such a formidable foe. When Field Marshal Hermann Göring asked a Luftwaffe general what he could do to help in his battle with the British, the general bitingly replied that he’d like a squadron of their Spitfires.
Right-sizing your team
Shortly after 9/11, the US military conducted war games that pitted American forces against an unnamed Middle Eastern country. The Middle Eastern country in this simulation was commanded by the retired maverick, Paul Van Riper. The US forces were commanded by four-star general Burwell B. Bell. Bell adopted a very conventional strategy - after all, he commanded the most powerful fleet in the world, so what need had he of subterfuge?
Is Data Science Dead?
I know the title is clickbait. And I know it’s subject to Betteridge’s law of headlines (“Any headline that ends in a question mark can be answered by the word ’no’.”). But hear me out and consider this quote: “Data scientists spend around 80% of their time on preparing and managing data for analysis.” (Forbes) The dirty truth of data science is that most of the work is not actually data science at all but data engineering.
Tickets, please
Change is part of life whether we like it or not. Everybody reacts to it differently. There are some who resist it and some who embrace it. It’s the same in the world of development. Some want everything in its right place before work begins. Others accept flexibility. For most machine learning projects, change comes with the territory. Often, what may appear to be a straightforward piece of work actually turns out to be quite tricky and goal posts subtly shift during development.
Tips for effective MLOps
Some miscellaneous tips I’ve discovered after over a year of being hands-on with a clinical ETL pipeline. Technology: Set up a local dev environment (Git, pip, IDE of choice, Python environment etc). Being able to test locally could not be more important. For instance, I’ve been using Palantir’s Foundry but, since it’s just a wrapper around Spark, you can have a full, locally run test suite using PySpark. If you can’t get the data on your laptop (GDPR etc), use simulated data.