In recent years, sequential recommendation systems have become crucial on online platforms like Amazon and Yelp, where a user's historical interactions are used to infer preferences and suggest relevant next items. Traditionally, these systems have relied heavily on classification-based methods. GenRec offers a fresh perspective by leveraging generative models for the recommendation task. In this blog post, we explore the core principles behind GenRec, its methodology, and how it outperforms traditional approaches.
Introduction
GenRec formulates sequential recommendation as a sequence-to-sequence generation task. Inspired by the recent "pretrain, prompt, and predict" paradigm in natural language processing, it uses the Transformer architecture to model user-item interactions and generate personalized recommendations without relying on manually designed prompts. This differs from traditional methods that learn explicit representations of users and items: instead, GenRec adopts a masked item prediction objective, which enables it to learn bidirectional sequential patterns effectively.
Architecture
The model is built upon a sequence-to-sequence Transformer encoder-decoder framework. Here’s a detailed breakdown of its components:
- Input Representation: Each user's interaction history is tokenized into a sequence of item tokens. This sequence is enriched with token embeddings, positional embeddings, and user/item ID embeddings (see the sketch after this list).
- Masked Prediction Objective: During training, a randomly chosen item in the sequence is replaced with a [MASK] token, and the model is trained to recover it. This lets the model learn from both past and future interactions within a sequence.
- Decoder Output: At inference time, the model generates the top-N item recommendations by predicting the next item in the sequence autoregressively.
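Below is a minimal sketch of how such an input might be assembled, using toy PyTorch embeddings. The vocabulary sizes, embedding dimension, the reserved `MASK_ID`, and the `build_input` helper are illustrative assumptions rather than the paper's actual implementation; the point is only that token, positional, and user embeddings are summed, and one item is hidden for the model to recover.

```python
# Toy sketch of GenRec-style input construction (assumed details, not the official code).
import random
import torch
import torch.nn as nn

NUM_ITEMS, NUM_USERS, MAX_LEN, D_MODEL = 1000, 100, 20, 64
MASK_ID = NUM_ITEMS + 1  # hypothetical id reserved for the [MASK] token

item_emb = nn.Embedding(NUM_ITEMS + 2, D_MODEL)  # item ids plus the [MASK] token
pos_emb = nn.Embedding(MAX_LEN, D_MODEL)
user_emb = nn.Embedding(NUM_USERS, D_MODEL)

def build_input(user_id, history):
    """Mask one random item; return (input embeddings, masked position, target item id)."""
    seq = history[:MAX_LEN]
    masked_pos = random.randrange(len(seq))
    target = seq[masked_pos]
    tokens = list(seq)
    tokens[masked_pos] = MASK_ID

    token_ids = torch.tensor(tokens)
    positions = torch.arange(len(tokens))
    user = torch.tensor(user_id)

    # Sum the three embedding types; the user embedding broadcasts over the sequence.
    x = item_emb(token_ids) + pos_emb(positions) + user_emb(user)
    return x, masked_pos, target

x, pos, target = build_input(user_id=7, history=[3, 42, 17, 256, 981])
print(x.shape, pos, target)  # torch.Size([5, 64]) plus the masked position and its true item
```

In the full model, these summed embeddings feed the Transformer encoder, and the decoder predicts the item hidden at the masked position.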
Training and Finetuning
- Pretraining: In the pretraining stage, the model learns bidirectional sequential patterns by predicting masked items within user-item sequences, which helps it capture user behavior patterns comprehensively.
- Finetuning: During finetuning, the pretrained model is adapted to the specific task of next-item prediction: a [MASK] token is appended to the end of each input sequence, and the model learns to generate the held-out next item. The sketch below contrasts how the two stages construct their training examples.
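The two stages differ mainly in how training examples are constructed. The toy helpers below illustrate that difference; the item strings and function names are invented for illustration and are not taken from the GenRec codebase.

```python
# Hypothetical helpers contrasting the two training stages: pretraining masks a random
# in-sequence item, while finetuning appends [MASK] so the model generates the next item.
import random

MASK = "[MASK]"

def pretrain_example(history):
    """Mask a random item; the target is the masked item."""
    pos = random.randrange(len(history))
    masked = list(history)
    target = masked[pos]
    masked[pos] = MASK
    return masked, target

def finetune_example(history):
    """Hold out the last item as the target and append [MASK] to the input."""
    target = history[-1]
    return history[:-1] + [MASK], target

history = ["item_12", "item_7", "item_301", "item_55"]  # made-up item ids
print(pretrain_example(history))  # e.g. (['item_12', '[MASK]', 'item_301', 'item_55'], 'item_7')
print(finetune_example(history))  # (['item_12', 'item_7', 'item_301', '[MASK]'], 'item_55')
```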
Experimental Results
To validate the effectiveness of GenRec, extensive experiments were conducted on public datasets: Amazon Sports, Amazon Beauty, and Yelp. The table below compares GenRec against several baseline models on Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG); a short sketch of how these metrics are computed follows the table. The results show that GenRec consistently achieves state-of-the-art performance across all three datasets.
| Models | Sports (HR@5/NDCG@5) | Beauty (HR@5/NDCG@5) | Yelp (HR@5/NDCG@5) |
|---|---|---|---|
| Caser | 0.0116 / 0.0072 | 0.0205 / 0.0131 | 0.0151 / 0.0096 |
| HGN | 0.0189 / 0.0120 | 0.0325 / 0.0206 | 0.0186 / 0.0115 |
| SASRec | 0.0233 / 0.0154 | 0.0387 / 0.0249 | 0.0162 / 0.0100 |
| P5-S | 0.0272 / 0.0169 | 0.0503 / 0.0370 | 0.0568 / 0.0402 |
| GenRec | 0.0397 / 0.0332 | 0.0515 / 0.0397 | 0.0627 / 0.0475 |
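For readers unfamiliar with the metrics above, the sketch below shows the standard way HR@k and NDCG@k are computed for next-item prediction, where each user has exactly one ground-truth item. This is a generic implementation of the usual definitions, not the evaluation code used in the paper.

```python
# Standard HR@k and NDCG@k for next-item prediction (one relevant item per user).
import math

def hit_ratio_at_k(ranked, truth, k):
    """Fraction of users whose true next item appears in their top-k list."""
    hits = sum(1 for preds, t in zip(ranked, truth) if t in preds[:k])
    return hits / len(truth)

def ndcg_at_k(ranked, truth, k):
    """With one relevant item, per-user DCG is 1/log2(rank+1) if ranked within top-k, else 0."""
    total = 0.0
    for preds, t in zip(ranked, truth):
        if t in preds[:k]:
            rank = preds.index(t) + 1           # 1-based rank
            total += 1.0 / math.log2(rank + 1)  # ideal DCG is 1, so this is already normalized
    return total / len(truth)

ranked = [[5, 9, 2, 7, 1], [3, 8, 6, 4, 0]]  # top-5 predictions for two users
truth = [2, 1]                               # true next item per user
print(hit_ratio_at_k(ranked, truth, 5), ndcg_at_k(ranked, truth, 5))  # 0.5 0.25
```

Since each user has a single relevant item, the ideal DCG equals 1, so per-user NDCG reduces to 1/log2(rank+1) when the true item appears in the top-k list and 0 otherwise.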
Ablation Study
An ablation study was conducted to assess the impact of the masked sequence modeling task in the pretraining phase. The results, shown in the table below, reveal that pretraining plays a critical role in improving model performance across various datasets.
| Dataset | HR@5 | NDCG@5 | HR@10 | NDCG@10 |
|---|---|---|---|---|
| Sports | 0.0397 | 0.0332 | 0.0462 | 0.0353 |
| Sports (w/o pretraining) | 0.0360 | 0.0286 | 0.0431 | 0.0310 |
| Beauty | 0.0515 | 0.0397 | 0.0641 | 0.0439 |
| Beauty (w/o pretraining) | 0.0422 | 0.0313 | 0.0548 | 0.0354 |
| Yelp | 0.0627 | 0.0475 | 0.0724 | 0.0507 |
| Yelp (w/o pretraining) | 0.0626 | 0.0469 | 0.0716 | 0.0499 |
Key Advantages
- Lightweight and Efficient: Unlike other generative models, our model does not require extensive prompt engineering, making it easier to implement and train.
- Adaptable to Low-Resource Settings: GenRec’s training process is efficient, requiring only a few hours to achieve competitive results, which is particularly beneficial in scenarios with limited computational resources.
- Generalization Ability: Through a unified pretraining and finetuning objective, our model generalizes across different datasets, making it suitable for real-world applications.
Conclusion
GenRec offers a new take on sequential recommendation by framing it as a generative sequence-to-sequence task. By combining masked sequence modeling with the Transformer architecture, it achieves state-of-the-art performance while remaining lightweight and efficient, making it a promising choice for future work on recommendation systems. More details, along with the code and datasets, can be found in the paper published at the ACM RecSys Workshop 2024.