Machine Learning Based Personalization and Discovery at Postmates

November 24, 2020


By Oren Shklarsky, Veer Lade, Lance Lin, Stefan Bacon, Ali Amiri, Frank Yang


Why Recommendations and Personalization are Important

On the Postmates Applied Machine Learning team, our passion is making it as easy as possible for our customers to find what they’re looking for, or to help them discover something when they don’t yet know what they want.

Postmates provides a three-sided marketplace that connects customers with merchants on our platform, and--when applicable--with independent workers (or “Postmates”) who use our Fleet app to offer their on-demand delivery services. Each side of this marketplace, however, uses our platform differently to meet their respective needs, and this series of posts focuses on our efforts to personalize the relationship between customers and merchants. In a nutshell, our customers shop for food and other products, and place orders from merchants using our platform, while our merchants use the Postmates platform as a sales channel to grow their order volume and find customers.

For us, personalization is all about relevance and discovery. Unlike traditional recommendation systems (YouTube, Netflix, etc.), where personalization is used to reduce a massive candidate set to a manageable size, our set of merchants and items to recommend is smaller, but spread out over large areas and constrained by things like merchant opening hours, fleet availability, and distance to the customer. Further, particularly with food, our customers’ tastes vary wildly both from one customer to another, and for the same customer across different visits to our app, times of the day, and their location (whether at home, in the office, etc.). Finally, we’ve all experienced wanting to have lunch and not having the faintest idea what to get. 

This will be the first in a series of posts describing how we tackle personalization at Postmates, helping our customers decide what they’re having for lunch, or where they’re getting their essentials from.

Our Journey So Far

At the beginning of the journey, recommendations were as simple as getting a list of serviceable restaurants given the customer’s location and sorting them based on a few hand-crafted heuristics. We realized the importance of having a sophisticated recommendation system that can be based on modelling each user’s taste preference, price sensitivity and also semantic understanding of different types of food available on our platform. 

For starters, we needed a baseline model to compare all future models against. This led us to a simple, linear collaborative filtering algorithm, based on the work of Hu, Koren, and Volinsky [1]. For serving, each user’s recommendations were stored as key-value pairs in Google Bigtable and returned every time the user entered the Postmates platform. We soon ran into two limitations of collaborative filtering. The first is the cold start problem, i.e., the difficulty of handling new merchants and merchants that were unseen in the training dataset. The second is the difficulty of adding supplementary features for merchants and customers. For example, accounting for even simple things like the time of day was difficult.

These limitations led us to move from an offline collaborative filtering approach to an online feature-based scoring model powered by gradient boosting. We observed a significant lift in business metrics when we completed this transition, validating the importance of an online recommendation system. Our first version was based on optimizing a single objective: the customer’s probability of ordering from any given merchant.

One important trade-off we made when switching from implicit collaborative filtering to this new model was that negative examples had to be specified explicitly. While positive examples were clear to us (customers’ orders and views), defining dislikes in the absence of feedback proved more challenging. Our first approach was to use merchants that the customer viewed but did not order from, immediately prior to a merchant that the customer did order from. In other words, we were optimizing the pCTRCVR, the conversion rate conditional on a click-through. Our thinking was that, while views do signal interest, this would give us a good proxy while we built a more “impression”-centric dataset. In this second dataset, we consider as negative samples those merchants that were shown to the customer on the feed but which they chose to ignore. This time we are directly optimizing the pCVR, the conversion rate itself.
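The first labeling scheme described above can be sketched as follows. This is a minimal illustration, not the production pipeline: the session format, the event names "view" and "order", and the `lookback` window that stands in for "immediately prior" are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical session format: an ordered list of (merchant_id, event) pairs,
# where event is "view" or "order".
Event = Tuple[str, str]


def label_session(events: List[Event], lookback: int = 3) -> List[Tuple[str, int]]:
    """Return (merchant_id, label) training examples for one session.

    Label 1: a merchant the customer ordered from.
    Label 0: a merchant viewed within `lookback` events before an order,
             from which the customer did not order in this session.
    """
    ordered = {merchant for merchant, event in events if event == "order"}
    examples: List[Tuple[str, int]] = []
    for i, (merchant, event) in enumerate(events):
        if event != "order":
            continue
        examples.append((merchant, 1))
        seen = set()
        # Look only at the events immediately preceding the order.
        for prev_merchant, prev_event in events[max(0, i - lookback):i]:
            if prev_event != "view":
                continue
            if prev_merchant in ordered or prev_merchant in seen:
                continue
            seen.add(prev_merchant)
            examples.append((prev_merchant, 0))
    return examples
```

Sessions without an order produce no examples under this scheme, which is one reason an impression-based dataset gives broader coverage.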

As our first model, we chose a Gradient Boosting Machine, implemented using Microsoft’s LightGBM package. We chose this package because it beat XGBoost in terms of prediction performance and inference speed in our benchmarks. It also has native support for categorical features and an implementation in Golang, which made integration with our infrastructure simpler.

To understand the impact of personalization, we constantly run multiple experiments on our platform. An important experiment that we run every quarter measures the impact of personalization on our conversion rate using a holdback group, i.e., a small subset of our users who receive no personalization in the feed. Results from this experiment show that personalization significantly improves the conversion rate on our platform.
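One common way to pick such a holdback group is deterministic hashing of the user ID, so that a user stays in the same group for the whole experiment. The sketch below is an assumption about how this could be done, not a description of the Postmates experimentation system; the salt string and the 1% default are made up for illustration.

```python
import hashlib


def in_holdback(user_uuid: str,
                salt: str = "personalization-holdback",  # hypothetical experiment salt
                holdback_pct: float = 1.0) -> bool:
    """Return True if the user falls in the holdback group.

    The user ID is hashed together with a per-experiment salt and mapped to
    one of 10,000 buckets; the lowest `holdback_pct` percent of buckets form
    the holdback group. The assignment is stable across calls.
    """
    digest = hashlib.sha256(f"{salt}:{user_uuid}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # bucket in 0..9999
    return bucket < holdback_pct * 100
```

Changing the salt each quarter would reshuffle users, so the same customers are not permanently excluded from personalization.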

Feature engineering

We divided our features into five types:

  1. Customer features: These include things like the customer’s order and view history, their affinity to certain categories, how long they are willing to wait for orders, their price sensitivity etc. 
  2. Merchant features: These describe the merchant’s popularity, quality, and reliability by capturing things like conversion rate, order completion rate, average order cost, etc.
  3. Customer-Merchant interaction features: To capture the relationships between customers and merchants, we collect Customer-Merchant Interaction features such as the number of times a customer has viewed or ordered from a merchant. 
  4. Context features: These are features that are only known to us when the customer comes to the platform, and include the customer’s location and the time of day which, due to the geographical and temporal constraints of our business, as well as the ordering patterns of our customers, are very important features.
  5. Embedding features: We generated embeddings using co-view merchant sequences in a customer session and a Word2Vec approach for both customer and merchant features. A model was trained to map merchants to embeddings, and our customer embeddings were defined as a function of the merchant embeddings based on [4]. We took a similar approach for generating embeddings for different categories of merchants.

We made some modifications to the Word2Vec approach to better suit our use case:

  • Global context for ordered merchants: For sessions that end with the user ordering from a merchant, we adapted the optimization so that the ordered merchant is part of the context in every window calculation. In other words, as the Word2Vec window slides over the session, the ordered merchant always remains within it as global context.
  • Intra-market negative sampling: When performing negative sampling in Word2Vec, we make sure the negative samples come from the same market as the current session, which yields better merchant embedding similarities within a market. For example, for a session that led to an order from Sugarfish in Los Angeles, Jinya ramen from Los Angeles is a better negative sample than the Burger Startup in San Francisco.
  • Oversampling ordered sessions: We oversample the sessions that led to an order in our training data, as this aligns with our goal of optimizing for conversion rate.
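The three modifications above can be illustrated with a small data-preparation sketch. It only builds skip-gram training pairs and negative samples; the embedding training itself is omitted. The function names, the session representation, and the oversampling factor are assumptions for the example.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Session = Tuple[List[str], Optional[str]]  # (viewed merchants, ordered merchant or None)


def training_pairs(viewed: List[str], ordered: Optional[str],
                   window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram (center, context) pairs with the ordered merchant as global context."""
    pairs = []
    for i, center in enumerate(viewed):
        lo, hi = max(0, i - window), min(len(viewed), i + window + 1)
        context = [viewed[j] for j in range(lo, hi) if j != i]
        # Global context: the ordered merchant is added even when it lies
        # outside the sliding window.
        if ordered is not None and ordered != center and ordered not in context:
            context.append(ordered)
        pairs.extend((center, c) for c in context)
    return pairs


def sample_negatives(center: str, positives: Set[str], market: str,
                     merchants_by_market: Dict[str, List[str]],
                     k: int, rng: random.Random) -> List[str]:
    """Draw up to k negatives from the same market as the session."""
    candidates = [m for m in merchants_by_market[market]
                  if m != center and m not in positives]
    return rng.sample(candidates, min(k, len(candidates)))


def oversample(sessions: List[Session], factor: int = 2) -> List[Session]:
    """Repeat sessions that ended in an order `factor` times."""
    out: List[Session] = []
    for viewed, ordered in sessions:
        out.extend([(viewed, ordered)] * (factor if ordered is not None else 1))
    return out
```

Restricting negatives to one market keeps the model from learning the trivial signal that merchants in different cities are never co-viewed.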

We will be writing a separate blog post talking more about different types of embeddings used at Postmates. 

Deep neural networks & Multi-objective optimization

As our customers make up one side of our three-sided marketplace, we began exploring how to optimize multiple objectives whilst still recommending the best restaurants and dishes to our users. All three sides of the marketplace are equally important; hence focusing only on the customer’s probability to order from a merchant might not lead to overall marketplace efficiency and growth.

We started by optimizing a single additional objective on top of the customer’s conversion rate: a hybrid of GMV (Gross Merchandise Value) and revenue (income for the platform). This helps merchants gain more sales from the Postmates channel and boosts marketplace growth. For this approach, we simply used LightGBM’s weight_column to weight our orders according to this objective.
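As a rough illustration of such weighting, the sketch below computes per-row weights from GMV and revenue. The exact blend used in production is not described here, so the convex combination controlled by `alpha` and the normalization to a mean positive weight of 1 are assumptions made for the example.

```python
from typing import List, Sequence


def sample_weights(labels: Sequence[int],
                   gmv: Sequence[float],
                   revenue: Sequence[float],
                   alpha: float = 0.5) -> List[float]:
    """Per-row training weights.

    Positive rows (orders) are weighted by a blend of GMV and revenue,
    normalized so that the average positive weight is 1.0.
    Negative rows keep a weight of 1.0.
    """
    blended = [alpha * g + (1.0 - alpha) * r for g, r in zip(gmv, revenue)]
    positives = [b for b, y in zip(blended, labels) if y == 1]
    mean_pos = sum(positives) / len(positives) if positives else 0.0
    if mean_pos <= 0.0:
        # No positives (or no value signal): fall back to uniform weights.
        return [1.0] * len(labels)
    return [b / mean_pos if y == 1 else 1.0 for b, y in zip(blended, labels)]
```

Weights like these can be supplied to LightGBM either through the `weight` argument of `lightgbm.Dataset` or, for file-based training, through the `weight_column` parameter.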

We then wanted to expand to a multi-objective optimization approach where we optimize not only for GMV and revenue, but also for merchant diversity (to ensure fairness in the marketplace), courier utilization, and customer lifetime value. LightGBM does not easily support multi-objective optimization, nor does it offer the flexibility to use different types of layers such as embedding layers, which led us to move to Deep Neural Networks (DNNs). Deep learning allows us to define different types of modules and have them systematically cooperate with each other. Good support for multi-objective optimization was exactly what we needed; hence the migration from LightGBM to DNNs [5]. (More on this in a later blog post.)

Postmates Online Model Serving Infrastructure (Merv)

As we deploy more and more ML models at Postmates, having a unified model serving infrastructure becomes a crucial need.

Here are some of the requirements:

  • Low latency 
  • Scalability and availability
  • Serving multiple versions of models
  • Online feature transformation

To address these requirements, we developed Merv as a unified ML model serving infrastructure. 

At serving time, Postmates’ frontend sends gRPC requests to Merv. These requests contain user context (such as user_uuid and user_location) and restaurant context (i.e., the restaurant UUID). To pass this information to the ML model, Merv first needs to collect the features that the models request, such as precomputed place and user features (e.g., the number of times a user ordered from a restaurant, or the restaurant embedding). Merv uses a feature store to accomplish this task.

The first generation of Merv stored features in local memory. However, as we scaled, storing all the features in local memory became expensive and difficult to manage efficiently. Hence, later versions of Merv started to use Feast and Redis as the feature store. Moreover, using a feature store helps us prevent possible data leakage in offline tasks (such as model training) and lets us share features among different machine learning projects. The next group of features is computed by applying a functional transformation to the input features, for example the haversine distance between the merchant and the customer, or a one-hot encoding of the categorical features.
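The two transformations mentioned above are simple enough to show in full. This is a generic sketch rather than Merv’s code; the function names and the choice of mean Earth radius are assumptions.

```python
import math
from typing import List, Sequence

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """One-hot encode `value`; values outside the vocabulary map to all zeros."""
    return [1.0 if value == v else 0.0 for v in vocabulary]
```

For example, `haversine_km(customer_lat, customer_lon, merchant_lat, merchant_lon)` can be computed per candidate at request time, since the customer’s location is only known when the request arrives.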

At the moment, Merv serves two types of models:

  • GBM (Gradient Boosting Machines), which is supported by Leaves
  • Deep Neural Network (DNN), which is supported by TensorFlow Serving

Models are stored in a model store, which is backed by a blob store (such as GCS). Merv watches the model store to detect model version updates and loads fresh models as soon as they are available. At serving time, Merv uses an internal request router to send requests to the correct models.

Postmates runs its infrastructure on a Kubernetes cluster, and Merv is no exception. Using Kubernetes enables us to develop scalable and highly available services quickly. Merv uses horizontal pod autoscaling to automatically adjust the number of active pods when traffic changes during the day or during special events such as NFL games. To make Merv highly available and fault tolerant, we deploy it on redundant pods (using anti-affinity scheduling). Another out-of-the-box advantage of deploying Merv on Kubernetes is that we could easily integrate our services with Prometheus and Grafana to collect and visualize metrics and monitor the health of the system.

In online prediction, for each customer feed request, Merv usually receives hundreds of place candidates from the Buyer front-end and is expected to predict, in real time, the probability that the customer will place an order from each candidate. To lower latency, we configured Merv to batch the data samples and send the requests to TensorFlow Serving in parallel. Eventually, we eliminated latency spikes that occurred every few minutes and achieved a 99th percentile latency of 5 to 10 ms.
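The batching idea can be sketched as below. This is not Merv’s implementation: `predict_batch` is a hypothetical stand-in for one call to the model server, and the batch size and worker count are arbitrary defaults.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def chunk(items: Sequence, size: int) -> List[list]:
    """Split `items` into consecutive batches of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def score_in_parallel(candidates: Sequence,
                      predict_batch: Callable[[list], List[float]],
                      batch_size: int = 64,
                      max_workers: int = 8) -> List[float]:
    """Score all candidates by sending batches concurrently.

    `predict_batch` is assumed to return one score per candidate in the
    batch. Scores are returned in the same order as `candidates`.
    """
    batches = chunk(candidates, batch_size)
    if not batches:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so results line up with batches.
        results = pool.map(predict_batch, batches)
        return [score for batch_scores in results for score in batch_scores]
```

With hundreds of candidates per feed request, the batch size trades per-request overhead against how evenly work spreads across model server instances.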

In a distributed system, there is a chance that an instance of a service experiences temporary high-latency episodes. Without appropriate treatment, this might dominate the performance of downstream services. A simple approach to curb the latency is to send the same request simultaneously to multiple service instances and use the first response received. While this simple approach can effectively reduce latency, it multiplies the infrastructure cost. A more efficient way to implement this idea is request hedging: the client sends a request and waits for a few milliseconds; if no response arrives before this first deadline, the client sends a second request, and so on.
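Request hedging as described above can be sketched with asyncio. This is an illustrative client, not the one used in production: `send` is a hypothetical coroutine that issues the request to one replica, and the 5 ms hedge delay is an arbitrary default.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def hedged_call(send: Callable[[str], Awaitable[Any]],
                      replicas: Sequence[str],
                      hedge_delay_s: float = 0.005) -> Any:
    """Send to the first replica; after each `hedge_delay_s` without a
    response, also send to the next replica. Return the first response.

    If the first task to finish raised an exception, it is re-raised.
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    tasks = []
    try:
        for replica in replicas:
            tasks.append(asyncio.ensure_future(send(replica)))
            done, _ = await asyncio.wait(
                tasks, timeout=hedge_delay_s,
                return_when=asyncio.FIRST_COMPLETED)
            if done:
                return next(iter(done)).result()
        # Every replica has been tried: wait for whichever answers first.
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Cancel the requests that lost the race.
        for task in tasks:
            if not task.done():
                task.cancel()
```

In the common case the first replica answers within the delay, so no extra request is sent; the additional load is paid only for the slow tail.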

Load Balancing and fighting with long-tail latency

To provide the best user experience for our customers, we need to show relevant results while keeping the app responsive and fast. Average latency is usually misleading: for example, a service can have a quick response time for 99% of requests but be very slow for the other 1%. In this example, the average latency looks good, while the long-tail latency is not what our users expect.
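A small numeric illustration makes the gap between the average and the tail concrete. The latency values are made up for the example (98% of requests at 5 ms and 2% at 500 ms), and the nearest-rank percentile definition is one of several in common use.

```python
import math
import statistics
from typing import List, Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile for an integer `pct` between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up sample: 980 fast requests and 20 slow ones.
latencies_ms: List[float] = [5.0] * 980 + [500.0] * 20

mean_ms = statistics.mean(latencies_ms)  # pulled up only slightly by the slow requests
p50_ms = percentile(latencies_ms, 50)    # the typical request
p99_ms = percentile(latencies_ms, 99)    # the tail that unlucky users experience
```

Here the mean stays under 15 ms while the 99th percentile sits at the slow value, which is why tail percentiles are the more useful target for a user-facing service.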

To deal with the long tail latency, we used the following trick.

Most gRPC implementations carry messages over HTTP/2 framing. In contrast with HTTP/1.x, HTTP/2 uses long-lived connections. We run more than one Merv instance in production; therefore, if the load balancer is not optimized for sticky HTTP/2 connections, we might see imbalanced CPU utilization across Merv instances (some pods underutilized and others overutilized). This not only increases infrastructure costs but also makes the long-tail latency worse. To address this issue, we used a client-side load balancer that distributes requests across the available pods using a round-robin algorithm.
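The round-robin selection itself is straightforward. The class below is a generic sketch with a hypothetical name and a static backend list; a real client would also need to track pods as they come and go. gRPC clients also offer a built-in `round_robin` load-balancing policy, though which mechanism was used here is not stated.

```python
import threading
from typing import List, Sequence


class RoundRobinBalancer:
    """Thread-safe round-robin selection over a fixed list of backends."""

    def __init__(self, backends: Sequence[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends: List[str] = list(backends)
        self._next = 0
        self._lock = threading.Lock()

    def pick(self) -> str:
        """Return the next backend address, cycling through the list."""
        with self._lock:
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
        return backend
```

Because every request picks a backend individually, load spreads per request rather than per connection, which avoids the imbalance caused by long-lived connections pinned to one pod.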

Future work 

We are excited to tackle challenges unique to Postmates and are continuously working on improving the platform. There are some areas in personalization that we intend to explore in the near future:

  • Real-time impressions and features: We want to feed the model with more real-time data, such as the merchants that the customer has interacted with in the current session, so that the model can personalize the feed as the customer uses the platform.
  • Positional bias: From our data, it is clear that the top-ranked merchants have a higher probability of converting solely because of their position in the feed. We want to account for this variable in our model training phase to try to mitigate the positional bias effect.
  • More embeddings: The merchant and categorical embeddings are important features in our current personalization model. We want to expand to generate embeddings for search terms, items in our catalogue, and customers, and use them to improve personalization.
  • Sequential models and advanced architectures: We want to experiment with sequential models such as seq2seq and CNNs, along with more advanced neural network architectures, to improve our personalization engine.

Personalization at Postmates helps customers find the merchants that they are looking for and helps surface the merchants most similar and relevant to the users’ choices. Overall, we have seen significant gains in conversion rate, click-through rate, and overall marketplace health through several online experiments.

Come join us! If you are passionate about solving challenges in this space and furthering our vision, we are hiring data scientists and machine learning engineers. If you are interested in learning more, check out our listing here.


[1] Collaborative Filtering for Implicit Feedback Datasets

[2] Deep Neural Networks for YouTube Recommendations

[3] Recommending What Video to Watch Next: A Multitask Ranking System

[4] Real-time Personalization using Embeddings for Search Ranking at Airbnb

[5] Improving Deep Learning for Airbnb Search

