Personalizing Audiobooks and Podcasts with graph-based models


May 10, 2024 Published by Andreas Damianou, Marco De Nadai, Francesco Fabbri, Paul Gigioli, Alice Wang, Mounia Lalmas


Spotify’s catalog consists of millions of music tracks and podcasts and has recently expanded to Audiobooks. Personalizing this content for users requires our algorithms to “understand” user preferences as well as content relationships across all content types. We argue that this level of algorithmic understanding can be efficiently achieved with graph-based machine learning, in a way that remains scalable for production. In May 2024, we present two papers on this topic at the Web Conference.

In the first paper, we outline the research that gave rise to a graph-based approach for modeling Audiobooks together with podcast and music signals. This resulted in a system which was productionized, successfully overcoming the cold-start problem related to the lack of historical Audiobook consumption data. In the second paper, we take the idea one step further and equip the system with foundational modeling capabilities, that is, we allow it to distill information from multiple sources; after adaptation, this distilled information (representation) can be used for multiple downstream tasks related to the personalization of multiple content types, including podcasts.

Figure 1: A) Our users’ consumption patterns, which involve audiobooks and podcasts; B) We build a co-listening graph with nodes representing audiobooks or podcasts, and edges connecting nodes whenever at least one user streams both; C) Audiobook IT gets recommended because 2T-HGNN performs non-trivial recommendations using 2-hop distant patterns. Delicious is similar to Taste. Taste is co-listened with Fake Doctors, which is co-listened with IT.

Graph-based personalization for Audiobooks 

Let us consider recommendation as a particular flavor of personalization. The task is to train an algorithm that recommends Audiobooks to users. During the development of the algorithm, in 2023, Audiobooks was a new content type which lacked user interactions to be used as training data. Thus, it was natural to seek to leverage the user’s known historical preferences for music and podcasts, as well as content similarities among Audiobooks and podcasts. For example, an audiobook about medieval history has some similarity with a thematically similar podcast. Representing audiobooks and podcasts as nodes in a graph allows us to achieve the above because: (a) node connectivity is based on co-listening, i.e. audiobook A and podcast P are connected if at least one user has listened to both, thus capturing cross-content type interaction information; and (b) each audiobook and podcast node is associated with a set of features derived from Large Language Models (LLMs) applied to its title and description; content similarities are captured in this way.
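To make point (a) concrete, here is a minimal sketch of how a co-listening graph could be derived from per-user streams. The function name, the toy user and item ids, and the choice to connect every pair of items streamed by the same user are illustrative assumptions, not details taken from the papers.

```python
from collections import defaultdict
from itertools import combinations


def build_co_listening_graph(user_streams):
    """Return an undirected adjacency map: item -> set of co-listened items."""
    graph = defaultdict(set)
    for items in user_streams.values():
        # Every pair of distinct items streamed by the same user gets an edge.
        for a, b in combinations(sorted(set(items)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Hypothetical toy data: user id -> streamed audiobooks (A*) and podcasts (P*).
user_streams = {
    "U1": ["A1", "P1"],
    "U2": ["P1", "A4"],
    "U3": ["A2"],
}
graph = build_co_listening_graph(user_streams)
```

In this toy example P1 ends up connected to both A1 and A4, while A2 has no edges because no user streamed it together with anything else.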

Figure 2: The overall system: Large Language Models (LLMs) perform content understanding from Audiobooks (A) and Podcasts (P). These are used as node features when training the Graph Neural Network on a co-listening graph. A two-tower model (2T) consumes the representations learned by the HGNN and learns the final user and audiobook vectors jointly, in a common space.

Figure 1B illustrates the construction of the co-listening graph from the user-streaming graph shown in Figure 1A. The graph structure allows for learning “multi-hop” patterns; that is, if Audiobook A1 is connected to podcast P1 (1-hop relation), and podcast P1 is connected to Audiobook A4, then the implication is that audiobook A1 and audiobook A4 are somehow related (2-hop relation). This is illustrated in Figure 2A, along with our earlier explanation regarding LLM-derived node features. Once such multi-hop relations are learned, they can be leveraged for making recommendations, even in the absence of Audiobook interaction data. As shown in Figure 1C, the system can then predict the likelihood of a user listening to an Audiobook. This works because the system distills the graph content and interaction signals into user and audiobook/podcast representations. Essentially, representations are sequences of numbers, such that a user representation Repr(U1) that is similar to an audiobook representation Repr(A2) means that user U1 will likely enjoy Audiobook A2.
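The two ideas in this paragraph can be sketched in a few lines. The first function finds 2-hop neighbors in a co-listening graph; the second compares two representations. The post does not say which similarity measure is used, so cosine similarity is an assumption here, and all vectors and ids are made up for illustration.

```python
import math


def two_hop_neighbors(graph, node):
    """Items reachable in exactly two hops, excluding the node and its direct neighbors."""
    one_hop = graph.get(node, set())
    two_hop = set()
    for neighbor in one_hop:
        two_hop |= graph.get(neighbor, set())
    return two_hop - one_hop - {node}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


# Hypothetical co-listening graph: A1 - P1 - A4.
graph = {"A1": {"P1"}, "P1": {"A1", "A4"}, "A4": {"P1"}}
related_to_a1 = two_hop_neighbors(graph, "A1")

# Hypothetical 2-dimensional representations.
repr_u1 = [0.9, 0.1]
repr_a2 = [0.8, 0.2]
repr_a3 = [-0.5, 0.9]
sim_a2 = cosine_similarity(repr_u1, repr_a2)
sim_a3 = cosine_similarity(repr_u1, repr_a3)
```

Here A4 is the only 2-hop neighbor of A1, and Repr(U1) is closer to Repr(A2) than to Repr(A3), so A2 would be ranked higher for U1 under this measure.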

But how do we turn a graph signal into a set of representations for each user and each Audiobook / podcast? We employ a novel combination of Heterogeneous Graph Neural Networks (HGNNs) and a two-tower model (2T):

  • Graph neural network. The HGNN operates on the established paradigm of message-passing: initial representations for each node are “communicated” through aggregation functions to nearby nodes, and gradient-based learning then updates each representation according to the “messages” communicated by its own neighbors. This is repeated for several epochs to obtain the final Audiobook/Podcast representations, as illustrated in Figure 2A.
  • Two-tower model. The HGNN representations are then fed into a 2T model which accounts for additional user signals, such as demographics. This is also the component which accounts for the users’ music preferences. Overall, the 2T model associates user and audiobook vectors, so they can be compared in the same mathematical space. We also make use of weak signals, such as previewing or following an Audiobook.
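The message-passing idea can be illustrated with a deliberately simplified sketch. It uses plain mean aggregation with no learned weights, no node or edge types, and no gradient updates, so it is not the HGNN described in the papers; it only shows how information travels one hop per round. All names and numbers are hypothetical.

```python
def message_passing_round(graph, features):
    """One round: each node averages its own vector with the mean of its neighbors' vectors."""
    updated = {}
    for node, own in features.items():
        neighbors = [features[n] for n in graph.get(node, ()) if n in features]
        if not neighbors:
            # Isolated nodes keep their current representation.
            updated[node] = list(own)
            continue
        dim = len(own)
        mean_msg = [sum(vec[i] for vec in neighbors) / len(neighbors) for i in range(dim)]
        updated[node] = [(own[i] + mean_msg[i]) / 2 for i in range(dim)]
    return updated


# Hypothetical co-listening graph A1 - P1 - A4 with toy 2-dimensional node features.
graph = {"A1": {"P1"}, "P1": {"A1", "A4"}, "A4": {"P1"}}
features = {"A1": [1.0, 0.0], "P1": [0.0, 0.0], "A4": [0.0, 1.0]}

after_one = message_passing_round(graph, features)
after_two = message_passing_round(graph, after_one)
```

After one round, A1 has only seen P1, so its second component is still zero. After two rounds, part of A4's signal has reached A1 through P1, which is the 2-hop effect described earlier.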

Combining the HGNN with the lightweight 2T model also allows for scalability, because it means that we can implement the user-specific side of modeling outside of the HGNN. Therefore, the HGNN is trained on the co-listening graph (Figure 1B) rather than on the user-streaming graph (Figure 1A), which contains a huge number of individual user-content interactions.
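As a rough illustration of this split, the sketch below scores items for a user with two small "towers" that project user features and precomputed item embeddings into a common space, then compares them with a dot product. The linear towers, the hand-set weights, the feature layout, and the dot-product scoring are all assumptions for the sake of the example; in a real system the tower weights would be learned and the towers would be deeper.

```python
def linear(weights, vector):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def two_tower_score(user_features, item_embedding, user_weights, item_weights):
    """Project both sides into a common space and compare them with a dot product."""
    user_vec = linear(user_weights, user_features)
    item_vec = linear(item_weights, item_embedding)
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Hypothetical user features (e.g. demographic and music-taste signals).
user_features = [1.0, 0.0, 0.5]
# Hypothetical item embeddings, standing in for the output of the graph model.
item_embeddings = {"A1": [0.375, 0.125], "A4": [0.125, 0.375]}

# Hand-set weights for illustration only.
user_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # maps 3 user features to 2 dimensions
item_weights = [[1.0, 0.0], [0.0, 1.0]]            # identity on the item side

scores = {
    item: two_tower_score(user_features, emb, user_weights, item_weights)
    for item, emb in item_embeddings.items()
}
best_item = max(scores, key=scores.get)
```

The item embeddings are inputs to this step, so the user-specific part can be retrained or re-evaluated without touching the graph model.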

After successfully testing our model on offline data, we conducted an A/B test involving millions of users. The online test resulted in a significant 23% increase in audiobook stream rates. Remarkably, we observed a 46% surge in the rate of people starting new audiobooks. The model has been in production since then, serving all Spotify users who are eligible for audiobooks.

A unified model for personalization

In our second paper, presented at the Web Conference in the Graph Foundation Models workshop, we take the idea discussed so far one step further. We are motivated by the fact that representations for Audiobooks and Podcasts are already learned jointly within the Graph Neural Network. These representations are generic enough to be seen as a foundation layer, that is, a general, domain-agnostic representation that can be adapted to serve different downstream tasks. Further, this foundation layer is static, in the sense that it only needs to be updated infrequently because the catalog of podcasts/audiobooks changes relatively slowly. To enable the foundational representations to be used in a variety of tasks, we re-purpose the previously discussed 2T model to become an adaptation mechanism. Specifically, the Audiobook tower of Figure 2B now becomes a general “item” tower, which is content-type agnostic and can handle Audiobooks, Podcast shows and Podcast episodes in a unified way. This constitutes a dynamic layer, because it is lightweight and user-specific, so it can be updated frequently and at a low cost.

As discussed above, the Unified Model architecture decouples content representation learning (static layer) from user representation learning (dynamic layer). The benefit of such an approach is that it unifies representation learning across various tasks, enables information sharing, improves the quality of learned representations, and simplifies production pipelines. Furthermore, it is an efficient way to deal with the challenge of representing new episodes while avoiding bias towards recency and paying attention to user interactions in near real time.

The Unified Model provided quantitative gains in offline experiments, such as a 16.6% increase in the HR@10 metric for audiobook recommendations, compared with an Audiobook-specific model. In addition, our offline results showed almost identical performance of the model versus a variant which is re-trained daily. This confirms that the HGNN foundation representation remains stable over time and can be used effectively in the Unified 2T model on a daily basis without the need for frequent retraining. Since the publication of our workshop paper we have also conducted an A/B test which demonstrated that the Unified Model also provides gains in the online setting.
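For readers unfamiliar with HR@10, the sketch below computes hit rate at k in its commonly used form: the fraction of users whose held-out item appears among their top-k recommendations. The post does not spell out the exact evaluation protocol, so this definition and the toy data are assumptions.

```python
def hit_rate_at_k(ranked_lists, held_out, k=10):
    """Fraction of users whose held-out item is in their top-k ranked recommendations."""
    hits = sum(
        1
        for user, target in held_out.items()
        if target in ranked_lists.get(user, [])[:k]
    )
    return hits / len(held_out)


# Hypothetical ranked recommendations and one held-out item per user.
ranked_lists = {
    "U1": ["A1", "A2", "A3"],
    "U2": ["A4", "A5", "A6"],
}
held_out = {"U1": "A2", "U2": "A9"}

hr_at_2 = hit_rate_at_k(ranked_lists, held_out, k=2)
hr_at_1 = hit_rate_at_k(ranked_lists, held_out, k=1)
```

In this toy case only U1's held-out item appears in the top 2, so the hit rate at 2 is one half, and nobody is hit at k=1.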


We leveraged the power of graph-based learning to personalize audiobook recommendations at Spotify. Our modular approach allows us to decouple complex item-item relationships while producing scalable recommendations for all users. In subsequent work we equipped the model with foundational modeling capabilities. The resulting representations can be used for multiple downstream tasks related to the personalization of multiple content types, including podcasts. We consider this to be a step towards the first graph-based foundation model tailored to the domain of personalization. We believe that this work showcases the promise of graph-based foundation models in industrial applications.

For more information please refer to our papers:

Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
Marco De Nadai, Francesco Fabbri, Paul Gigioli, Alice Wang, Ang Li, Fabrizio Silvestri, Laura Kim, Shawn Lin, Vladan Radosavljevic, Sandeep Ghael, David Nyhan, Hugues Bouchard, Mounia Lalmas-Roelleke, Andreas Damianou
The Web Conference, 2024 (Industry Track)

Towards Graph Foundation Models for Personalization
Andreas Damianou, Francesco Fabbri, Paul Gigioli, Marco De Nadai, Alice Wang, Enrico Palumbo, Mounia Lalmas
The Web Conference, 2024 (Graph Foundation Models Workshop)

