What data science looks like in Ravelin’s Detection team

Milly Leadley · Published in Ravelin Tech Blog
6 min read · Jun 1, 2021


Image credit: Scott Graham / Unsplash

“What’s your job like day to day?” — I get asked this all the time by people thinking about applying to Ravelin’s Detection team. It’s such an important question and quite frankly there’s too much to cover without a dedicated blog post, so here we are.

In a nutshell

For context, I’m a Senior Data Scientist at Ravelin and have been here for over 3 years. I started as a Junior, fresh out of a bootcamp. We’ve doubled in size since then: at the time of writing there are 9 Detectionistas and over 100 Ravelinos. Our team is responsible for building and deploying the models that detect fraudulent customers for online merchants like Deliveroo, River Island, and Ola. We work closely with two other client-focused teams: Integrations, who help clients send good-quality data to our API, and Investigations, who are client-facing data scientists and fraud specialists that work with clients to understand their data and analyse new fraud trends.

Ravelin is a machine learning startup, so our model predictions are the product and (I like to think, at least) Detection is the beating heart of the company. Our work is tightly coupled with production; although we deploy a custom model per client, these models are not hand-crafted in a notebook. Each model is built by our fully automated pipeline, which means that we are responsible for engineering improvements into the pipeline itself: new features, data filters, reports for interpreting model behaviour, and so on. In this sense, I’d describe our work as closer to “machine learning engineering” than your average data science role. We follow good software practices and ensure that any work by one Data Scientist (DS) fits into our repository and can be used by others.

Keeping the lights on

So given that this pipeline trains, evaluates and deploys updated models for clients on a weekly basis, haven’t we automated ourselves out of a job? Not completely: although we share the same feature set and training logic across clients, we’re constantly taking on new clients with fraud patterns that we might not have encountered yet. Each DS in our team takes on up to 7 clients and ensures that we understand their fraud and are staying on top of it.

We call this part of our job “keeping the lights on” and it normally takes up the majority of a DS’s first 6–12 months in Detection. At this stage you learn how to run and modify our pipeline (a directed acyclic graph orchestrated by Luigi), write good unit and integration tests (Pytest), engineer new features (Go), and use or build tools to quantify whether a candidate model is worth deploying. Because we’re involved in the full model lifecycle, I personally think this work gives great exposure to the joys and perils of ML models in production: dealing with model drift, feedback loops and counterfactual problems, for example. You also learn a lot about fraud itself and (you’ll have to trust me) it’s more exciting than you might think.
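
To make the “DAG orchestrated by Luigi” idea a little more concrete, here is a minimal sketch of how two Luigi tasks can be wired into a dependency graph. The task names, parameters and file paths are invented for illustration and are not our actual pipeline.

```python
# Minimal sketch of a Luigi DAG: a hypothetical training task that depends on
# a hypothetical feature-extraction task. Names and paths are placeholders.
import luigi


class ExtractFeatures(luigi.Task):
    client = luigi.Parameter()

    def output(self):
        # Luigi uses the existence of this target to decide whether to re-run the task.
        return luigi.LocalTarget(f"data/{self.client}/features.csv")

    def run(self):
        with self.output().open("w") as f:
            f.write("feature_1,feature_2,label\n")  # placeholder feature extraction


class TrainModel(luigi.Task):
    client = luigi.Parameter()

    def requires(self):
        # Declares the upstream dependency, forming an edge in the DAG.
        return ExtractFeatures(client=self.client)

    def output(self):
        return luigi.LocalTarget(f"models/{self.client}/model.txt")

    def run(self):
        with self.input().open("r") as features, self.output().open("w") as model:
            model.write(f"trained on {len(features.readlines())} rows")  # placeholder training


if __name__ == "__main__":
    luigi.build([TrainModel(client="example-client")], local_scheduler=True)
```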

Projects

Once you’ve nailed keeping the lights on and have multiple clients that are live and happy, I’d say this part of your work drops to 15–30% of a working week. We hold a retrospective each quarter and subsequently draw up projects that take up the majority of a Detectionista’s time. These tend to have a primary and a secondary contributor, so you’re paired with someone who understands your project goals, helps design solutions or overcome challenges, and reviews progress.

These are not abstract research projects. Every project aims to either improve our predictions or productivity, and starts with a problem we’re facing. For example:

  • We’ve baked our assumptions about which customers and periods of data we should train on into our code, but have never tested these hypotheses formally. A recent project set up A/B tests, paired with dashboards, to evaluate them.
  • We have a cold start problem whereby we cannot train custom models without a period of historical data. So we experimented with building “generic” models that have been trained on data from all clients, or clients from particular industries. This involved building a new pipeline with its own set of evaluation criteria.
  • We can quantify performance gain from a new feature idea by engineering it in Go, but experimenting with features that rely on cross-customer or cross-client data is much trickier to implement in our infrastructure. In my most recent project I built a framework in a tool called dbt that gives our team the autonomy to test complex feature ideas via SQL queries rather than implementing them in Go.
  • We handle categorical fields with embedding-style features that are a lot of work to develop and maintain. So we implemented Google’s TabNet algorithm to handle categorical fields automatically. It’s worth noting that a deep learning project like this has to work backwards; we have latency constraints (<200 milliseconds response time) meaning that we have to test a new algorithm in production before giving it all the bells and whistles.
  • Our pipeline supports training various algorithms (e.g. Random Forest, XGBoost, TabNet, DevNet). A larger project has involved training and understanding ensembles of these models, and how we might deploy and explain them in production (a rough illustration follows this list).
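
As a rough illustration of the ensembling idea in the last bullet, here is a minimal soft-voting sketch using scikit-learn. The dataset, model choices, hyperparameters and weights are placeholders, not anything from our production pipeline.

```python
# Minimal sketch of a soft-voting ensemble over two tree-based classifiers.
# The synthetic data and model settings are illustrative placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data, loosely mimicking "mostly genuine, a little fraud".
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",  # average predicted probabilities rather than hard votes
)
ensemble.fit(X_train, y_train)

scores = ensemble.predict_proba(X_test)[:, 1]
print(f"Ensemble AUC: {roc_auc_score(y_test, scores):.3f}")
```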

These are examples of projects that improve our flagship solution, which prevents card-not-present fraud. But in the last couple of years we’ve started building solutions for other kinds of fraud faced by our merchants, and each one has had a machine learning component. I’ve been heavily involved in developing a new fraud solution that allows our marketplace clients to assign couriers or drivers to orders in a way that reduces risk. Given the lack of ground truth about which suppliers might be generating income by creating fake orders, abusing incentives or colluding with customers, this is an anomaly detection problem, which we currently solve with isolation forests.
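
For a feel of what unsupervised anomaly scoring looks like, here is a bare-bones isolation forest example with scikit-learn. The supplier-level features (order volume, refund rate) are invented for illustration and are not our real feature set.

```python
# Minimal sketch of anomaly scoring with an isolation forest.
# Features and numbers are invented; the point is the fit/score/rank pattern.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Mostly "normal" suppliers, plus a few with unusually high refund rates.
normal = rng.normal(loc=[50, 0.02], scale=[10, 0.01], size=(500, 2))
suspicious = rng.normal(loc=[55, 0.30], scale=[10, 0.05], size=(5, 2))
X = np.vstack([normal, suspicious])

forest = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
forest.fit(X)

# Lower scores mean "more anomalous"; rank suppliers for manual review.
scores = forest.score_samples(X)
most_anomalous = np.argsort(scores)[:5]
print("Suppliers flagged for review:", most_anomalous)
```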

Another solution stops account takeover fraud, for which we have some labels generated by clients’ manual review teams. We’ve approached this with both isolation forests and random forests. We’re also exploring reinforcement learning for our payment authentication solution, as this will allow us to dynamically route traffic in a way that minimises customer friction. One or two Detectionistas have been involved with each solution from its very conception: proving there was signal in the data (or creating mock data from heuristics), building an ML pipeline, and implementing any new algorithms or features in Go.

An exciting type of “project” worth mentioning is the proofs of concept we’ve run for very large, high-profile clients. These have rolled around every year or so, and involve the client sending labelled historical data along with a portion of unlabelled data that we must predict fraud for. Not dissimilar to a Kaggle competition, the whole team dedicates its time to this problem for 2–4 weeks, working collaboratively on pipeline engineering or feature improvements. Our final results on the unlabelled dataset are normally pitted against those from competitors. I’ve had my most exciting times at Ravelin working on these proofs of concept, not least because of the good number of celebrations that followed each one.

The team itself

When I joined I was told by my manager that I could be “any type of data scientist I wanted”, and this has rung true. Our team has a variety of backgrounds and skill sets, and people naturally fall into the areas they’re interested in. We have team members who boss the ML operations side of things (Kubernetes and Docker) or deep learning and NLP. Some of us are passionate about data quality and debugging client models. Others are fraud and payments fanatics who help us stay on top of the changing landscape. All skills are valued equally, and the variety means we can respond to challenges effectively.

In terms of team ethos, we work pretty autonomously and are encouraged to:

  • challenge the assumptions that are built into our modelling process
  • explore hypotheses or new packages that could help performance or productivity
  • write up “big thinking” ideas for solving problems in the form of RFCs (request for comments) that can be circulated to the wider company

We place a big emphasis on learning: we have monthly “learning days” dedicated to independent learning or knowledge sharing, a fortnightly journal club for discussing research papers or blogs, and a £1000 annual budget for buying books, courses, tickets to conferences etc.

We also have fun whilst we’re at it. We don’t take ourselves too seriously, appreciate memes and puns, and have become a tight-knit group of friends. We make sure to use our whole quarterly budget for team socials and find great excuses for meals or drinks on top of that. One thing to be wary of is that we have a shared passion for Negronis, so one might be forced upon you at some point.

To summarise

If you’re looking for a team that wants to support your growth, actually handles machine learning in production, and whose models make a positive impact on the world, I’d say you’ve found it.
