Machine Learning for Recommendation Systems - Personalization

Sanket Sabharwal, PhD

Data Engineer

Data Scientist

ML Engineer

Apache Spark

Python

PyTorch

Artificial Intelligence

Machine Learning for Recommendation Systems: Real-Time Personalization

The Setup

When a customer opens a retail app or logs into their banking portal, they see a screen. That screen has a finite number of slots where the business can show them something. A product, a financial service, a promotion, a piece of content. Each slot is a decision, and each decision either moves the customer closer to a purchase or wastes their attention on something irrelevant.

At small scale, you can curate those decisions manually. A merchandising team picks the featured products. A banking product manager chooses which credit card offer to show on the homepage. At 14 million active users with different purchase histories, different browsing patterns, different financial profiles, and different levels of engagement, manual curation becomes physically impossible. You cannot hire enough merchandisers to personalize the experience for 14 million individuals. The math simply does not work.

The default solution most companies reach for is segmentation. Split users into 15 or 20 buckets based on demographics and purchase history, then show each segment a curated set of items. The problem with segmentation at this scale is that it treats every person inside a bucket as interchangeable, which they are not. A 35-year-old woman in London who buys running shoes every quarter and a 35-year-old woman in London who buys formal footwear twice a year land in the same segment and see the same recommendations. The system cannot tell them apart. That is like a restaurant with 14 million customers that serves only 20 different meals and assigns you one based on your age and zip code.

Our clients, a major European eCommerce retailer and a European banking group, both came to us with the same core problem. They had massive user bases, rich behavioral data sitting in event logs and transaction systems, and recommendation experiences that were either rule-based, segment-driven, or powered by basic collaborative filtering models that had not been retrained in months. They needed production recommendation engines that could serve personalized results to every individual user in real time, at the throughput their platforms demand, with measurable business impact on revenue and product adoption.

What We Built

We designed and deployed a two-stage recommendation architecture that serves personalized recommendations to over 14 million users across the retail and banking platforms. Each request passes through a candidate generation layer and then a ranking layer, after which a business logic re-ranking stage applies commercial rules before the final recommendations reach the user's screen.

The candidate generation stage runs a multi-retrieval approach where several independent retrieval models each produce a set of 100 to 500 candidate items per request. These retrievers include a collaborative filtering model trained on implicit feedback signals (clicks, views, purchases, and application starts), a content-based retrieval model that matches user preference embeddings against item feature embeddings built from product metadata, category taxonomy, and textual descriptions, and a popularity-based fallback retriever that handles cold start scenarios where a new user has no behavioral history to personalize against. The retrieval models encode users and items into a shared embedding space using a two-tower neural network architecture, and candidate lookup at serving time runs through an approximate nearest neighbor index (built on vector search infrastructure) that returns candidates in under 10 milliseconds per request.
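As a rough illustration of the multi-retrieval step described above, the sketch below merges the output of an embedding retriever and a popularity fallback into one candidate set. It is a minimal standard-library sketch under stated assumptions, not the production system: the item names, vectors, and popularity counts are invented, and the exact cosine search stands in for the approximate nearest neighbor index that a real catalog would need.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def embedding_retriever(user_vec, item_vecs, k):
    """Exact top-k by cosine similarity; a stand-in for an ANN index lookup."""
    u = normalize(user_vec)
    scored = (
        (sum(a * b for a, b in zip(u, normalize(v))), item_id)
        for item_id, v in item_vecs.items()
    )
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


def popularity_retriever(popularity, k):
    """Cold start fallback: the k most popular items, ignoring the user."""
    ranked = sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ranked[:k]]


def merge_candidates(*candidate_lists):
    """Union the retriever outputs, keeping first-seen order and dropping duplicates."""
    seen, merged = set(), []
    for candidates in candidate_lists:
        for item_id in candidates:
            if item_id not in seen:
                seen.add(item_id)
                merged.append(item_id)
    return merged


# Hypothetical two-dimensional embeddings: (sporty, formal).
item_vecs = {
    "trail_shoe": [0.9, 0.1],
    "road_shoe": [0.8, 0.3],
    "oxford": [0.1, 0.9],
    "loafer": [0.2, 0.8],
}
# Hypothetical rolling-window popularity counts.
popularity = {"oxford": 500, "road_shoe": 300, "loafer": 120, "trail_shoe": 80}

runner_vec = [1.0, 0.1]  # a user whose history leans heavily toward running shoes
candidates = merge_candidates(
    embedding_retriever(runner_vec, item_vecs, k=2),
    popularity_retriever(popularity, k=2),
)
print(candidates)  # ['trail_shoe', 'road_shoe', 'oxford']
```

The merged list is what the ranking stage would score. Deduplicating at merge time matters because the same item is often returned by more than one retriever.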

The ranking stage takes the merged candidate set from all retrievers and scores each candidate through a feature-rich ranking model that incorporates user context (device type, time of day, session depth, recency of last purchase), item features (price point, category, margin, inventory status), and cross-features that capture the interaction between user preferences and item attributes. The ranking model is a deep cross network trained on historical conversion data, with careful handling of position bias in the click signal. Items shown in the first slot get clicked more often purely because of their position, so a ranking model that does not correct for that bias will learn to favor items that were already shown prominently, which tend to be the already popular ones, rather than items that are genuinely relevant to the individual user.
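To make the position bias point concrete, here is a small sketch of one common correction, inverse propensity weighting of clicks. The source does not say which correction the ranking model uses, so treat this as an illustration of the idea rather than the method deployed here. The propensity values and the impression log are invented; in practice the propensities would have to be estimated, for example from randomized slot swaps.

```python
from collections import defaultdict

# Hypothetical examination propensities: the chance a user even looks at a slot.
PROPENSITY = {1: 1.0, 2: 0.6, 3: 0.4}


def raw_ctr(impressions):
    """Naive click-through rate per item, ignoring where the item was shown."""
    shown, clicks = defaultdict(int), defaultdict(int)
    for item, _position, clicked in impressions:
        shown[item] += 1
        clicks[item] += clicked
    return {item: clicks[item] / shown[item] for item in shown}


def ipw_relevance(impressions, propensity):
    """Position-debiased relevance: each click is weighted by 1 / propensity.

    Under a position-based click model, P(click) = propensity * relevance,
    so the mean of click / propensity estimates relevance.
    """
    shown, weighted = defaultdict(int), defaultdict(float)
    for item, position, clicked in impressions:
        shown[item] += 1
        weighted[item] += clicked / propensity[position]
    return {item: weighted[item] / shown[item] for item in shown}


# Synthetic log of (item, position, clicked): the bestseller always sits in
# slot 1, the niche item always sits in slot 3.
impressions = (
    [("bestseller", 1, 1)] * 30 + [("bestseller", 1, 0)] * 70
    + [("niche", 3, 1)] * 20 + [("niche", 3, 0)] * 80
)

raw = raw_ctr(impressions)
debiased = ipw_relevance(impressions, PROPENSITY)
print(raw)       # bestseller looks better on raw CTR (0.30 vs 0.20)
print(debiased)  # niche looks better once slot 3's lower visibility is accounted for
```

On the raw numbers the bestseller wins. After weighting, the niche item comes out ahead, because its clicks were earned in a slot that far fewer users look at.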

The re-ranking layer applies business rules after the ML ranking is complete. These rules include diversity injection to prevent the recommendation slate from showing five variations of the same product, commercial boosting for high-margin items or new product launches within controlled bounds, filtering for out-of-stock items and items the user has already purchased or declined, and in the banking context, regulatory compliance filters that ensure product recommendations respect eligibility criteria and responsible lending guidelines.
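The rules above can be sketched as a single pass over the ranked list. This is a simplified illustration, assuming a hypothetical candidate shape (`id`, `category`, `score`, `in_stock`) and hypothetical limits; regulatory eligibility filters in the banking context would be additional predicates of the same kind and are not shown.

```python
def rerank(ranked, slate_size, max_per_category, purchased, boost=None, max_boost=0.10):
    """Apply business rules to model-ranked candidates and return item ids.

    ranked: list of dicts with keys id, category, score, in_stock.
    boost: optional {item_id: fractional uplift}, capped at max_boost so that
           commercial boosting stays within controlled bounds.
    """
    boost = boost or {}

    # Hard filters: out-of-stock items and items the user already bought.
    eligible = [c for c in ranked if c["in_stock"] and c["id"] not in purchased]

    def adjusted(c):
        uplift = min(boost.get(c["id"], 0.0), max_boost)
        return c["score"] * (1.0 + uplift)

    eligible.sort(key=adjusted, reverse=True)

    # Diversity: cap how many items one category may occupy in the slate.
    slate, per_category = [], {}
    for c in eligible:
        count = per_category.get(c["category"], 0)
        if count >= max_per_category:
            continue
        slate.append(c["id"])
        per_category[c["category"]] = count + 1
        if len(slate) == slate_size:
            break
    return slate


# Hypothetical ranked candidates.
ranked = [
    {"id": "G", "category": "shoes", "score": 0.95, "in_stock": True},
    {"id": "A", "category": "shoes", "score": 0.90, "in_stock": True},
    {"id": "B", "category": "shoes", "score": 0.88, "in_stock": True},
    {"id": "C", "category": "shoes", "score": 0.85, "in_stock": True},
    {"id": "D", "category": "socks", "score": 0.70, "in_stock": False},
    {"id": "E", "category": "jacket", "score": 0.60, "in_stock": True},
    {"id": "F", "category": "watch", "score": 0.58, "in_stock": True},
]

slate = rerank(
    ranked,
    slate_size=4,
    max_per_category=2,
    purchased={"G"},
    boost={"F": 0.50},  # requested 50% uplift, capped to 10% inside rerank
)
print(slate)  # ['A', 'B', 'F', 'E']
```

The third pair of shoes is dropped by the category cap, and the boosted watch moves ahead of the jacket without being able to leapfrog the top shoes, because the uplift is bounded.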

The entire serving pipeline runs on low-latency inference infrastructure that returns a personalized recommendation slate to the user's device within the 100-millisecond response budget required to feel instantaneous in a live app or web session.

The Feature Store

Personalization at this scale depends entirely on the quality and freshness of the features the models consume, and managing features for 14 million users across two different business domains is an engineering problem that is just as demanding as the modeling work itself.

We built a centralized feature store that maintains pre-computed user features, item features, and interaction features for both the retail and banking platforms. User features include behavioral aggregates like purchase frequency, category affinity scores, average basket value, session recency, and lifetime engagement metrics. Item features include product attributes, pricing data, inventory levels, and popularity signals computed on rolling windows. Interaction features capture the relationship between specific users and specific items, including view-to-purchase conversion history and time-since-last-interaction signals.

The feature store updates in near real-time as new behavioral events stream in from the client platforms, which means a user who browses three pairs of running shoes at 9 AM sees those preference signals reflected in their recommendations by 9:01 AM. That freshness is what separates a recommendation system that feels responsive and personal from one that feels like it is showing you what you did last month.
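One way to picture a near real-time feature update is a time-decayed category affinity score that is adjusted on every incoming event. The sketch below is an in-memory toy under stated assumptions, not the feature store described here: the class name, event weights, and half-life are hypothetical, and events are assumed to arrive in timestamp order.

```python
import math
from collections import defaultdict

HALF_LIFE_S = 7 * 24 * 3600  # hypothetical: a signal loses half its weight in a week
EVENT_WEIGHT = {"view": 1.0, "click": 2.0, "purchase": 5.0}  # hypothetical weights


class UserFeatureStore:
    """Keeps a decayed affinity score per (user, category), updated per event."""

    def __init__(self, half_life_s=HALF_LIFE_S):
        self.decay = math.log(2) / half_life_s
        self.affinity = defaultdict(dict)  # user -> category -> (score, last_ts)

    def update(self, user_id, category, event_type, ts):
        """Decay the stored score to time ts, then add the new event's weight."""
        score, last_ts = self.affinity[user_id].get(category, (0.0, ts))
        score *= math.exp(-self.decay * (ts - last_ts))
        self.affinity[user_id][category] = (score + EVENT_WEIGHT[event_type], ts)

    def read(self, user_id, now):
        """Return each category's affinity decayed to the read time."""
        stored = self.affinity.get(user_id, {})
        return {
            category: score * math.exp(-self.decay * (now - last_ts))
            for category, (score, last_ts) in stored.items()
        }


store = UserFeatureStore()
# Three running-shoe views within a minute, in seconds since 9:00 AM.
for ts in (0, 20, 40):
    store.update("user_1", "running_shoes", "view", ts)

features = store.read("user_1", now=60)  # what the ranker would see at 9:01 AM
print(features)
```

Because the score is updated in place on every event, a read one minute later already reflects the browsing session. The decay keeps last month's behavior from outweighing this morning's.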

The Results

Across the retail deployment, the recommendation engine lifted average order value by 23 percent compared to the previous segment-based recommendation approach, measured in a controlled A/B test over a 90-day evaluation window and checked for statistical significance.
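For readers who want the arithmetic behind a lift figure like this, the sketch below computes relative AOV lift between two arms and a percentile bootstrap confidence interval. The order values are synthetic and tiny, chosen only to make the example runnable; they are not the experiment's data, and the source does not state which statistical test was used.

```python
import random
from statistics import mean


def relative_lift(control, treatment):
    """Relative change in mean order value: treatment mean / control mean - 1."""
    return mean(treatment) / mean(control) - 1.0


def bootstrap_ci(control, treatment, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the relative lift."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(n_resamples):
        # Resample each arm with replacement, keeping the original arm sizes.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        lifts.append(relative_lift(c, t))
    lifts.sort()
    lower = lifts[int(n_resamples * alpha / 2)]
    upper = lifts[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


# Synthetic order values per arm (illustrative only).
control_orders = [42.0, 55.0, 38.0, 57.0, 47.0, 50.0]
treatment_orders = [58.0, 63.0, 70.0, 66.0, 62.0, 75.0]

lift = relative_lift(control_orders, treatment_orders)
ci_low, ci_high = bootstrap_ci(control_orders, treatment_orders)
print(f"lift={lift:.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

A lift is usually reported as credible only when the interval excludes zero. A real evaluation would also randomize at the user level and fix the evaluation window in advance.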

To put that AOV lift in physical terms, imagine a grocery store where every customer walks in with a basket and fills it based on what they see on the shelves. Under the old system, every customer walking the same aisle saw the same shelf arrangement regardless of what was in their basket. Under the new system, the shelves rearrange themselves for each customer, placing the items most likely to complement what they have already picked up directly at eye level. The customer does not buy things they do not want. They buy things they would have wanted anyway but would not have found without the system surfacing them at the right moment.

Across the banking deployment, cross-sell conversion rates increased by 17 percent, measured by the rate at which existing banking customers applied for and were approved for additional financial products (credit cards, personal loans, savings accounts, insurance products) that the recommendation engine surfaced through the digital banking portal. In banking, a 17 percent lift in cross-sell rates translates directly into increased revenue per customer and higher lifetime customer value, because each additional product a customer holds with the bank deepens the relationship and reduces the probability of attrition.

The system serves over 14 million users across both platforms and handles peak request volumes during promotional events and month-end banking activity periods without degradation in response latency or recommendation quality.

Why Recommendation Systems at This Scale Are a Hard Engineering Problem

Building a recommendation engine that works in a notebook on a sample dataset of 100,000 users is a well-documented exercise with dozens of open-source tutorials available. Building one that serves 14 million users in real time across two different business domains with measurable revenue impact involves a set of engineering and modeling challenges that compound at every layer of the system.

The first challenge is the cold start problem. Every new user who downloads the app or opens a banking account has zero behavioral history. The system needs to produce a reasonable recommendation slate for that user on their very first session, when it knows nothing about them beyond basic profile attributes. Every new item added to the product catalog or financial product shelf has zero interaction history. The system needs to decide where to surface that item without any signal about which users will find it relevant. Cold start is not an edge case at 14 million users. It is a continuous condition, because new users and new items enter the system every single day.

The second challenge is serving latency at scale. A recommendation request needs to travel through candidate generation, ranking, and business-rule re-ranking, and the entire round trip needs to complete in under 100 milliseconds for the experience to feel instantaneous. At peak traffic, the system handles thousands of requests per second. Every additional millisecond of model inference time, every unoptimized feature lookup, and every inefficient data serialization step multiplies across that request volume and pushes the system closer to a latency ceiling where the user experience starts to degrade.

The third challenge is feedback loop management. Recommendation systems influence the data they are trained on, because users can only click on items the system shows them. If the model develops a bias toward a particular category or product type, it shows those items more often, collects more positive signals on those items, and reinforces its own bias in the next training cycle. Left unchecked, this creates a filter bubble where the system narrows its recommendation diversity over time and converges on a small set of safe, popular items rather than surfacing the long-tail inventory where much of the catalog value sits. Breaking that feedback loop requires deliberate diversity injection, controlled exploration strategies, and careful monitoring of recommendation coverage and novelty metrics alongside the standard accuracy and conversion metrics.

How We Solved It

We addressed the cold start problem through the multi-retrieval architecture, where the popularity-based fallback retriever provides reasonable baseline recommendations for new users while the collaborative filtering and content-based retrievers ramp up as behavioral data accumulates. For new items, we use the content-based retriever to surface them to users whose preference embeddings are similar to the new item's feature embedding, which gives new catalog additions immediate exposure without requiring any interaction history.
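The ramp-up described above can be pictured as a candidate budget that shifts from the popularity retriever toward the personalized retrievers as a user accumulates history. The function below is a hypothetical sketch: the linear ramp, the 20-interaction threshold, the 300-candidate total, and the even split between the two personalized retrievers are all assumptions made for illustration.

```python
def retriever_budget(n_interactions, total=300, ramp=20, max_personal_share=0.9):
    """Split a candidate budget across retrievers based on user history depth.

    The personalized share grows linearly from 0 to max_personal_share over
    the first `ramp` interactions; popularity always keeps the remainder.
    """
    share = max_personal_share * min(n_interactions / ramp, 1.0)
    personalized = int(round(total * share))
    collaborative = personalized // 2
    content_based = personalized - collaborative
    return {
        "collaborative": collaborative,
        "content_based": content_based,
        "popularity": total - personalized,
    }


print(retriever_budget(0))   # brand-new user: everything comes from popularity
print(retriever_budget(10))  # partway through the ramp
print(retriever_budget(50))  # established user: popularity keeps a small floor
```

Keeping a small popularity floor even for established users is a design choice in this sketch, not something stated in the source. It guards against an empty slate when the personalized retrievers return too few eligible items.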

We addressed the serving latency challenge by pre-computing user and item embeddings in the feature store and running candidate retrieval through an approximate nearest neighbor index that trades a small amount of retrieval precision for a large reduction in lookup time. The ranking model runs on optimized inference infrastructure with batched prediction and model quantization to minimize per-request compute cost. The entire pipeline is instrumented with latency monitoring at every stage so the engineering team can identify and resolve bottlenecks before they affect the user experience.
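Per-stage latency instrumentation of the kind mentioned above can be as small as a timing context manager around each stage. The sketch below is illustrative only: the class and stage names are hypothetical, the stage bodies are placeholders, and a production system would export these timings to a metrics backend instead of holding them in a dictionary.

```python
import time
from contextlib import contextmanager

BUDGET_MS = 100.0  # the end-to-end response budget from the text


class LatencyTrace:
    """Records wall-clock time per pipeline stage for a single request."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock  # injectable so the trace can be tested deterministically
        self.stages_ms = {}

    @contextmanager
    def stage(self, name):
        start = self.clock()
        try:
            yield
        finally:
            # Recorded even if the stage raises, so failures are still timed.
            self.stages_ms[name] = (self.clock() - start) * 1000.0

    def total_ms(self):
        return sum(self.stages_ms.values())

    def over_budget(self, budget_ms=BUDGET_MS):
        return self.total_ms() > budget_ms

    def slowest(self):
        return max(self.stages_ms, key=self.stages_ms.get)


trace = LatencyTrace()
with trace.stage("candidate_generation"):
    sum(range(10_000))  # placeholder for the ANN lookup
with trace.stage("ranking"):
    sum(range(50_000))  # placeholder for model inference
with trace.stage("re_ranking"):
    sum(range(1_000))   # placeholder for business rules

print(trace.stages_ms, trace.over_budget())
```

Tracking each stage separately is what makes a regression actionable: a slow request can be attributed to retrieval, inference, or rule evaluation instead of showing up only as a slow total.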

We addressed the feedback loop problem through a combination of diversity injection in the re-ranking layer, exploration slots in the recommendation slate where the system deliberately surfaces items outside the user's established preference profile, and ongoing monitoring of coverage metrics (what percentage of the catalog is being recommended), novelty metrics (how often the system surfaces items the user has never interacted with), and serendipity metrics (how often the system surfaces items outside the user's established category preferences that still result in positive engagement). These metrics sit alongside click-through rate, conversion rate, and revenue per user in the system's performance dashboard, ensuring the team optimizes for long-term recommendation health rather than short-term conversion at the expense of catalog diversity.
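Two of the health metrics named above, coverage and novelty, follow directly from their definitions in the text. The sketch below computes them over a batch of served slates. The data shapes and sample values are hypothetical, and serendipity is left out because it also needs engagement outcomes and each user's category preferences.

```python
def catalog_coverage(slates, catalog_size):
    """Share of the catalog that appeared in at least one served slate."""
    recommended = {item for slate in slates.values() for item in slate}
    return len(recommended) / catalog_size


def novelty(slates, history):
    """Share of recommended slots filled by items the user never interacted with."""
    shown = novel = 0
    for user, slate in slates.items():
        seen = history.get(user, set())
        shown += len(slate)
        novel += sum(1 for item in slate if item not in seen)
    return novel / shown if shown else 0.0


# Hypothetical served slates and interaction histories.
slates = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "d", "e"],
}
history = {
    "u1": {"a"},
    "u2": {"d", "e"},
}

coverage = catalog_coverage(slates, catalog_size=10)
novel_share = novelty(slates, history)
print(coverage, novel_share)  # 0.5 0.5
```

A coverage value that falls from one training cycle to the next is an early sign of the narrowing described earlier, often visible before conversion metrics move.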

The Takeaway

This recommendation system serves over 14 million users across a European eCommerce retailer and a European banking group, lifted retail average order value by 23 percent, increased banking cross-sell conversion rates by 17 percent, and operates within a 100-millisecond response budget at peak traffic volumes. The platform runs a two-stage candidate generation and ranking architecture with a centralized feature store, handles cold start for new users and new items continuously, and monitors its own recommendation diversity and feedback loop health alongside standard business metrics. Both clients operate it as a permanent part of their digital product experience, serving personalized results on every screen, for every user, on every visit.

Building something that must work?

Algorithmic is a senior-led software engineering studio that specializes in Full Product Builds, Applied AI & Machine Learning Systems, and Data Science & Analytics. Our team includes PhDs and Masters with patents and peer-reviewed publications, bringing senior-level expertise in data, software, and visual design. We support businesses at every stage of growth.

If you’d like to follow our research, perspectives, and case insights, connect with us on LinkedIn, Instagram, Facebook, or X, or simply write to us at info@algorithmic.co.

Posted Feb 5, 2026

Deployed to 14M+ users across retail and banking in Europe. Lifted eCommerce AOV by 23% and banking cross-sell rates by 17% through real-time personalization.

Timeline

Aug 20, 2024 - Feb 5, 2026

Clients

eCommerce

Private Bank