FM-Intent: Predicting User Session Intent with Hierarchical Multi-Task Learning | by Netflix Technology Blog | May, 2025

Authors: Sejoon Oh, Moumita Bhattacharya, Yesu Feng, Sudarshan Lamkhede, Ko-Jen Hsiao, and Justin Basilico

Recommender systems have become essential components of digital services across e-commerce, streaming media, and social networks [1, 2]. At Netflix, these systems drive significant product and business impact by connecting members with relevant content at the right time [3, 4]. While our recommendation foundation model (FM) has made substantial progress in understanding user preferences through large-scale learning from interaction histories (please refer to this article about FM @ Netflix), there is an opportunity to further enhance its capabilities. By extending FM to incorporate the prediction of underlying user intents, we aim to enrich its understanding of user sessions beyond next-item prediction, thereby offering a more comprehensive and nuanced recommendation experience.

Recent research has highlighted the importance of understanding user intent in online platforms [5, 6, 7, 8]. As Xia et al. [8] demonstrated at Pinterest, predicting a user’s future intent can lead to more accurate and personalized recommendations. However, existing intent prediction approaches typically employ simple multi-task learning that adds intent prediction heads to next-item prediction models without establishing a hierarchical relationship between these tasks.

To address these limitations, we introduce FM-Intent, a novel recommendation model that enhances our foundation model through hierarchical multi-task learning. FM-Intent captures a user’s latent session intent using both short-term and long-term implicit signals as proxies, then leverages this intent prediction to improve next-item recommendations. Unlike conventional approaches, FM-Intent establishes a clear hierarchy where intent predictions directly inform item recommendations, creating a more coherent and effective recommendation pipeline.

FM-Intent makes three key contributions:

  1. A novel recommendation model that captures user intent on the Netflix platform and enhances next-item prediction using this intent information.
  2. A hierarchical multi-task learning approach that effectively models both short-term and long-term user interests.
  3. Comprehensive experimental validation showing significant performance improvements over state-of-the-art models, including our foundation model.

In the Netflix ecosystem, user intent manifests through various interaction metadata, as illustrated in Figure 1. FM-Intent leverages these implicit signals to predict both user intent and next-item recommendations.

Figure 1: Overview of user engagement data in Netflix. User intent can be associated with several interaction metadata. We leverage various implicit signals to predict user intent and next-item.

In Netflix, there can be multiple types of user intents. For instance,

Action Type: Categories reflecting what users intend to do on Netflix, such as discovering new content versus continuing previously started content. For example, when a member plays a follow-up episode of something they were already watching, this can be categorized as “continue watching” intent.

Genre Preference: The pre-defined genre labels (e.g., Action, Thriller, Comedy) that indicate a user’s content preferences during a session. These preferences can shift significantly between sessions, even for the same user.

Movie/Show Type: Whether a user is looking for a movie (typically a single, longer viewing experience) or a TV show (potentially multiple episodes of shorter duration).

Time-since-release: Whether the user prefers newly released content, recent content (e.g., between a week and a month), or evergreen catalog titles.

These dimensions serve as proxies for the latent user intent, which is often not directly observable but crucial for providing relevant recommendations.
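To make the idea of proxy labels concrete, here is a minimal sketch of how two of the intent dimensions above might be derived from interaction metadata. The function names, label strings, and day thresholds are assumptions chosen for illustration (the thresholds loosely follow the "between a week and a month" example); they are not Netflix's actual definitions.

```python
from datetime import date

def time_since_release_bucket(release_date, play_date, new_days=7, recent_days=30):
    """Bucket a play by title age. Thresholds are illustrative assumptions."""
    age_days = (play_date - release_date).days
    if age_days < new_days:
        return "new"
    if age_days <= recent_days:
        return "recent"
    return "evergreen"

def action_type(title_id, previously_started):
    """Label a play as continuing a started title or discovering a new one."""
    return "continue_watching" if title_id in previously_started else "discovery"

# Hypothetical interaction: a member plays title "t42", released two weeks earlier,
# which they had already started watching.
bucket = time_since_release_bucket(date(2025, 5, 1), date(2025, 5, 15))
action = action_type("t42", previously_started={"t42", "t7"})
print(bucket, action)
```

Labels like these can then serve as ground-truth targets for the intent prediction tasks described in the following sections.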

FM-Intent employs a hierarchical multi-task learning approach with three major components, as illustrated in Figure 2.

Figure 2: An architectural illustration of our hierarchical multi-task learning model FM-Intent for user intent and item predictions. We use ground-truth intent and item-ID labels to optimize predictions.

1. Input Feature Sequence Formation

The first component constructs rich input features by combining interaction metadata. The input feature for each interaction combines categorical embeddings and numerical features, creating a comprehensive representation of user behavior.
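As a rough illustration of this step, the sketch below looks up an embedding for each categorical field and concatenates the results with the numerical features for every interaction in a session. The field names, vocabulary sizes, embedding width, and random embedding tables are all assumptions for demonstration; the article does not specify the actual feature set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical categorical metadata fields and their vocabulary sizes.
VOCAB = {"action_type": 4, "genre": 20, "title_type": 2}
EMB_DIM = 8  # illustrative embedding width per categorical field

# One randomly initialised embedding table per field (learned in a real model).
tables = {name: rng.normal(size=(size, EMB_DIM)) for name, size in VOCAB.items()}

def build_input_features(categorical, numerical):
    """Concatenate per-field embeddings with numerical features.

    categorical: dict of field name -> int array of shape (seq_len,)
    numerical:   float array of shape (seq_len, num_numeric)
    returns:     float array of shape (seq_len, len(VOCAB) * EMB_DIM + num_numeric)
    """
    embedded = [tables[name][categorical[name]] for name in VOCAB]
    return np.concatenate(embedded + [numerical], axis=-1)

# A session of three interactions with two numerical features each.
session = {
    "action_type": np.array([0, 1, 1]),
    "genre": np.array([3, 3, 7]),
    "title_type": np.array([1, 1, 0]),
}
numeric = np.array([[0.5, 1.0], [0.2, 0.1], [0.9, 0.3]])
features = build_input_features(session, numeric)
print(features.shape)  # (3, 26): 3 fields * 8 dims + 2 numerical features
```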

2. User Intent Prediction

The intent prediction component processes the input feature sequence through a Transformer encoder and generates predictions for multiple intent signals.

The Transformer encoder effectively models the long-term interest of users through multi-head attention mechanisms. For each prediction task, the intent encoding is transformed into prediction scores via fully-connected layers.

A key innovation in FM-Intent is the attention-based aggregation of individual intent predictions. This approach generates a comprehensive intent embedding that captures the relative importance of different intent signals for each user, providing valuable insights for personalization and explanation.
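One plausible way to realize this aggregation is sketched below: each intent head turns the encoder output into class probabilities, each prediction is projected into a shared space, and a softmax over query-key scores weights the projected predictions. The article does not give the exact formulation, so the dimensions, the learned `query` vector, the projection matrices, and the scaled dot-product scoring are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

D_ENC, D_INTENT = 16, 8  # illustrative sizes
NUM_CLASSES = {"action_type": 4, "genre": 20, "title_type": 2, "time_since_release": 3}

# Hypothetical parameters: one fully-connected head and one projection per intent.
heads = {k: rng.normal(size=(D_ENC, n)) * 0.1 for k, n in NUM_CLASSES.items()}
proj = {k: rng.normal(size=(n, D_INTENT)) * 0.1 for k, n in NUM_CLASSES.items()}
query = rng.normal(size=D_INTENT)  # assumed learned attention query

def predict_and_aggregate(encoding):
    """Return per-intent probabilities, attention weights, and the intent embedding."""
    preds = {k: softmax(encoding @ heads[k]) for k in NUM_CLASSES}
    # Project every intent prediction into a shared D_INTENT-dimensional space.
    projected = np.stack([preds[k] @ proj[k] for k in NUM_CLASSES])  # (num_intents, D_INTENT)
    # Attention weights: relative importance of each intent signal.
    weights = softmax(projected @ query / np.sqrt(D_INTENT))
    intent_embedding = weights @ projected  # weighted sum, shape (D_INTENT,)
    return preds, weights, intent_embedding

encoding = rng.normal(size=D_ENC)  # stand-in for the Transformer encoder output
preds, weights, intent_embedding = predict_and_aggregate(encoding)
print(dict(zip(NUM_CLASSES, weights.round(3))))
```

In a design like this, the attention weights double as a per-user indication of which intent signals matter most, which is the kind of insight the paragraph above refers to.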

3. Next-Item Prediction with Hierarchical Multi-Task Learning

The final component combines the input features with the user intent embedding to make more accurate next-item recommendations.

FM-Intent employs hierarchical multi-task learning where intent predictions are conducted first, and their results are used as input features for the next-item prediction task. This hierarchical relationship ensures that the next-item recommendations are informed by the predicted user intent, creating a more coherent and effective recommendation model.
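The hierarchy can be sketched as follows: the intent embedding is computed first, concatenated with the input features, and only then scored against the item catalog, while a combined loss supervises both levels with ground-truth labels. The weighted-sum loss, the `intent_weight` parameter, the single linear item layer, and all sizes are assumptions for illustration, not the published training objective.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ITEMS, D_FEAT, D_INTENT = 10, 6, 4  # illustrative sizes
W_item = rng.normal(size=(D_FEAT + D_INTENT, NUM_ITEMS)) * 0.1  # hypothetical item layer

def next_item_logits(last_features, intent_embedding):
    """Score items from input features concatenated with the intent embedding."""
    return np.concatenate([last_features, intent_embedding]) @ W_item

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def hierarchical_loss(intent_logits, intent_labels, item_logits, item_label,
                      intent_weight=1.0):
    """Item loss plus a weighted sum of the per-intent losses (assumed form)."""
    intent_loss = sum(cross_entropy(intent_logits[k], intent_labels[k])
                      for k in intent_logits)
    return cross_entropy(item_logits, item_label) + intent_weight * intent_loss

# Toy example: intent predictions come first, then feed the item prediction.
intent_logits = {"genre": np.zeros(4), "action_type": np.zeros(2)}
intent_labels = {"genre": 1, "action_type": 0}
intent_embedding = np.ones(D_INTENT)  # stand-in for the aggregated intent embedding
item_logits = next_item_logits(np.ones(D_FEAT), intent_embedding)
loss = hierarchical_loss(intent_logits, intent_labels, item_logits, item_label=3)
print(float(loss))
```

Because the item logits depend on the intent embedding, gradients from the item loss also flow back into the intent branch, which is what distinguishes this setup from attaching independent prediction heads.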

We conducted comprehensive offline experiments on sampled Netflix user engagement data to evaluate FM-Intent’s performance. Note that FM-Intent uses a much smaller dataset for training compared to the FM production model due to its complex hierarchical prediction architecture.

Next-Item and Next-Intent Prediction Accuracy

Table 1 compares FM-Intent with several state-of-the-art sequential recommendation models, including our production model (FM-Intent-V0).

Table 1: Next-item and next-intent prediction results of baselines and our proposed method FM-Intent on the Netflix user engagement dataset.

All metrics are represented as relative % improvements compared to the SOTA baseline: TransAct. N/A indicates that a model is not capable of predicting a certain intent. Note that we added additional fully-connected layers to LSTM, GRU, and Transformer baselines in order to predict user intent, while we used original implementations for other baselines. FM-Intent demonstrates statistically significant improvement of 7.4% in next-item prediction accuracy compared to the best baseline (TransAct).

Most baseline models show limited performance as they either cannot predict user intent or cannot incorporate intent predictions into next-item recommendations. Our production model (FM-Intent-V0) performs well but lacks the ability to predict and leverage user intent. Note that FM-Intent-V0 is trained with a smaller dataset for a fair comparison with other models; the actual production model is trained with a much larger dataset.
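For readers unfamiliar with the reporting convention, the short sketch below shows how a relative % improvement over a baseline is computed. The accuracy values are made-up placeholders, not numbers from Table 1, and the function name is an assumption.

```python
def relative_improvement_pct(model_metric, baseline_metric):
    """Relative improvement of a model over a baseline, in percent."""
    return (model_metric - baseline_metric) / baseline_metric * 100.0

# Placeholder values for illustration only (not results from the article).
baseline_accuracy = 0.40
model_accuracy = 0.43
improvement = relative_improvement_pct(model_accuracy, baseline_accuracy)
print(f"{improvement:.1f}%")
```

Under this convention the baseline itself scores 0%, and a negative value means a model underperforms the baseline.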

Figure 3: K-means++ (K=10) clustering of user intent embeddings found by FM-Intent; FM-Intent finds unique clusters of users that share similar intent.

FM-Intent generates meaningful user intent embeddings that can be used for clustering users with similar intents. Figure 3 visualizes 10 distinct clusters identified by K-means++ clustering. These clusters reveal meaningful user segments with distinct viewing patterns:

  • Users who primarily discover new content versus those who continue watching recent/favorite content.
  • Genre enthusiasts (e.g., anime/kids content viewers).
  • Users with specific viewing patterns (e.g., rewatchers versus casual viewers).
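The clustering step can be reproduced in miniature with a small NumPy implementation of K-means with K-means++ seeding. The synthetic embeddings, the choice of K=3 for the toy data (the article uses K=10), and the function names are assumptions; a library implementation would normally be used in practice.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """K-means++ seeding: sample new centers proportionally to squared distance."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diffs ** 2).sum(-1).min(axis=1)  # squared distance to nearest center
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def assign(X, centers):
    """Index of the nearest center for every row of X."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(X, k, rng)
    for _ in range(iters):
        labels = assign(X, centers)
        # Recompute each center; keep the old one if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

# Synthetic stand-in for user intent embeddings: three well-separated groups.
rng = np.random.default_rng(1)
X = np.concatenate([c + 0.1 * rng.normal(size=(20, 8)) for c in (0.0, 10.0, 20.0)])
labels, centers = kmeans(X, k=3)
print(np.bincount(labels))
```

Inspecting which intent predictions dominate within each resulting cluster is one way to arrive at segment descriptions like those listed above.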

FM-Intent has been successfully integrated into Netflix’s recommendation ecosystem and can be leveraged for several downstream applications:

Personalized UI Optimization: The predicted user intent could inform the layout and content selection on the Netflix homepage, emphasizing different rows based on whether users are in discovery mode, continue-watching mode, or exploring specific genres.

Analytics and User Understanding: Intent embeddings and clusters provide valuable insights into viewing patterns and preferences, informing content acquisition and production decisions.

Enhanced Recommendation Signals: Intent predictions serve as features for other recommendation models, improving their accuracy and relevance.

Search Optimization: Real-time intent predictions help prioritize search results based on the user’s current session intent.

FM-Intent represents an advancement in Netflix’s recommendation capabilities by enhancing them with hierarchical multi-task learning for user intent prediction. Our comprehensive experiments demonstrate that FM-Intent significantly outperforms state-of-the-art models, including our prior foundation model that focused solely on next-item prediction. By understanding not just what users might watch next but what underlying intents users have, we can provide more personalized, relevant, and satisfying recommendations.

We thank our stunning colleagues in the Foundation Model team & AIMS org. for their valuable feedback and discussions. We also thank our partner teams for getting this up and running in production.

[1] Amatriain, X., & Basilico, J. (2015). Recommender systems in industry: A Netflix case study. In Recommender Systems Handbook (pp. 385–419). Springer.

[2] Gomez-Uribe, C. A., & Hunt, N. (2015). The Netflix recommender system: Algorithms, business value, and innovation. ACM Transactions on Management Information Systems (TMIS), 6(4), 1–19.

[3] Jannach, D., & Jugovac, M. (2019). Measuring the business value of recommender systems. ACM Transactions on Management Information Systems (TMIS), 10(4), 1–23.

[4] Bhattacharya, M., & Lamkhede, S. (2022). Augmenting Netflix Search with In-Session Adapted Recommendations. In Proceedings of the 16th ACM Conference on Recommender Systems (pp. 542–545).

[5] Chen, Y., Liu, Z., Li, J., McAuley, J., & Xiong, C. (2022). Intent contrastive learning for sequential recommendation. In Proceedings of the ACM Web Conference 2022 (pp. 2172–2182).

[6] Ding, Y., Ma, Y., Wong, W. K., & Chua, T. S. (2021). Modeling instant user intent and content-level transition for sequential fashion recommendation. IEEE Transactions on Multimedia, 24, 2687–2700.

[7] Liu, Z., Chen, H., Sun, F., Xie, X., Gao, J., Ding, B., & Shen, Y. (2021). Intent preference decoupling for user representation on online recommender system. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence (pp. 2575–2582).

[8] Xia, X., Eksombatchai, P., Pancha, N., Badani, D. D., Wang, P. W., Gu, N., Joshi, S. V., Farahpour, N., Zhang, Z., & Zhai, A. (2023). TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 5249–5259).
