{"id":114411,"date":"2023-11-24T10:44:02","date_gmt":"2023-11-24T10:44:02","guid":{"rendered":"https:\/\/showbizztoday.com\/index.php\/2023\/11\/24\/which-witch-artist-name-disambiguation-and-catalog-curation-using-audio-and-metadata\/"},"modified":"2023-11-24T10:44:02","modified_gmt":"2023-11-24T10:44:02","slug":"which-witch-artist-identify-disambiguation-and-catalog-curation-utilizing-audio-and-metadata","status":"publish","type":"post","link":"https:\/\/showbizztoday.com\/index.php\/2023\/11\/24\/which-witch-artist-identify-disambiguation-and-catalog-curation-utilizing-audio-and-metadata\/","title":{"rendered":"Which Witch? Artist name disambiguation and catalog curation using audio and metadata"},"content":{"rendered":"<div>\n<div class=\"published-date\">\n<div class=\"icon-holder\">\n                                                <img decoding=\"async\" src=\"https:\/\/research.atspotify.com\/wp-content\/themes\/spotify\/images\/icon.png\" alt=\"\"\/>\n                                            <\/div>\n<p><span class=\"date\">November 17, 2023<\/span> Published by Brian Regan, Desislava Hristova, Mariano Beguerisse-D\u00edaz<\/p>\n<\/p><\/div>\n<div class=\"img-holder\">\n                                            <img src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/RS058-Which-Witch-Blog-Header-Final-01.png\" class=\"attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"Which Witch? 
Artist name disambiguation and catalog curation using audio and metadata\" decoding=\"async\" fetchpriority=\"high\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/RS058-Which-Witch-Blog-Header-Final-01.png 1200w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/RS058-Which-Witch-Blog-Header-Final-01-250x131.png 250w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/RS058-Which-Witch-Blog-Header-Final-01-700x368.png 700w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/RS058-Which-Witch-Blog-Header-Final-01-768x403.png 768w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/RS058-Which-Witch-Blog-Header-Final-01-120x63.png 120w\" sizes=\"(max-width: 1200px) 100vw, 1200px\"\/><figcaption\/>\n                                        <\/div>\n<h2 class=\"wp-block-heading\">TL;DR <\/h2>\n<p>We developed a <em>Named Entity Disambiguation<\/em> (NED) method to help human curators find and correct rare errors in a music catalog. The system detects misattribution, when releases are incorrectly attributed to an artist\u2019s discography, and predicts suitable relocations for them; it also detects duplication, when an artist\u2019s discography is incorrectly split.<\/p>\n<p>To do this, the system combines audio vector representations with metadata-based features using machine learning (ML). Combining audio and metadata features outperforms models based on audio or metadata alone. 
Through a set of \u201cin-the-wild\u201d experiments with Subject Matter Experts (SMEs), we demonstrate the potential of such proactive curation systems to save time and effort by directing attention where it&#8217;s most needed to ensure that our catalog is free from errors.<\/p>\n<h2 class=\"wp-block-heading\">Named Entity Disambiguation at scale<\/h2>\n<p>Named Entity Disambiguation deals with the problem of mapping ambiguously named entities, such as homonymous music artists, to their correct identifiers. For example, on Spotify there are 11 artists named <em>Witch<\/em> (plus many others with <em>Witch<\/em> in the name). When a new release by a Witch is submitted without a unique artist identifier, we must decide where to place it:<em> Is it by the <\/em><a href=\"https:\/\/open.spotify.com\/artist\/0LMkPoi2xIgpOPUSJMftqM?si=f28JnM-qR5-bEvrIOqp3_A\" target=\"_blank\" rel=\"noopener\"><em>Zambian psychedelic band<\/em><\/a><em>, the <\/em><a href=\"https:\/\/open.spotify.com\/artist\/6uNOBEATMcW8SSunnKy9a3?si=Sq8vUDu8RqiJWkiG7IQvOQ\" target=\"_blank\" rel=\"noopener\"><em>US doom metal band<\/em><\/a><em>, one of the other Witches, or a new Witch?<\/em> Given the extremely large volumes of music content delivered to Spotify every day by providers that range from DIY artists via aggregators all the way to megastars via major labels, it&#8217;s inevitable that a release is occasionally attributed incorrectly.<\/p>\n<p>In Music Information Retrieval (MIR), NED is usually formulated as a multi-class classification problem with known artist classes. This formulation, which relies entirely on audio feature representations, cannot be applied to Spotify-scale catalogs with a large or even unknown number of artists; a number that grows every day. 
State-of-the-art NED research has recently focused on automation; however, a human-in-the-loop (HITL) paradigm is often necessary to resolve highly ambiguous cases, correct automated decisions, and ensure quality.<\/p>\n<h2 class=\"wp-block-heading\">Our solution<\/h2>\n<p>In this paper, we present an ML-based semi-automated proactive curation system to detect and correct attribution errors in large music catalogs. The system consists of two sub-models (which can be standalone systems): one detects misattribution by splitting discographies with releases from multiple artists into single-artist discographies (Figure 1a), and another detects duplication by deciding whether two discographies belong to the same artist and should be merged (Figure 1b). Both systems rely on the music\u2019s metadata and the acoustic similarity between releases, using deep convolutional network embeddings of their mel-spectrograms and random forests.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1509\" height=\"1352\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_1.png\" alt=\"\" class=\"wp-image-5664\" style=\"width:505px;height:auto\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_1.png 1509w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_1-250x224.png 250w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_1-700x627.png 700w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_1-768x688.png 768w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_1-120x108.png 120w\" sizes=\"auto, (max-width: 1509px) 100vw, 1509px\"\/><\/figure>\n<\/div>\n<p class=\"has-text-align-left\">Figure 1: (a) We detect misattribution on each discography\u00a0<em>A<\/em>. 
A misattributed release\u00a0<em>a<sub>3<\/sub><\/em>\u00a0is split out from\u00a0<em>A<sub>1<\/sub><\/em>\u00a0into sub-discography A<sup>*<\/sup><sub>1<\/sub>. (b) We consider all (sub-)discographies for deduplication; we merge A<sup>*<\/sup><sub>1<\/sub>\u00a0into A<sub>2<\/sub>, which relocates any misattributed releases into the correct discography.\u00a0\u00a0<\/p>\n<p>The system\u2019s goals are to ensure:<\/p>\n<ol>\n<li><strong>Correct<\/strong> discographies, where every release within a discography should credit the same artist.<\/li>\n<li><strong>Complete<\/strong> discographies, where an artist\u2019s releases should not be split across multiple discographies.<\/li>\n<\/ol>\n<h2 class=\"wp-block-heading\">Misattribution detection<\/h2>\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"700\" height=\"336\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_2-stacked-1-700x336.png\" alt=\"\" class=\"wp-image-5668\" style=\"width:923px;height:auto\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_2-stacked-1-700x336.png 700w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_2-stacked-1-250x120.png 250w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_2-stacked-1-768x369.png 768w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_2-stacked-1-1536x737.png 1536w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_2-stacked-1-2048x983.png 2048w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/figure_2-stacked-1-120x58.png 120w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\"\/><\/figure>\n<p>Figure 2: Steps to detect misattribution in an artist\u2019s discography.<\/p>\n<p>We trained a random forest using historical corrections of artist misattributions to determine whether two 
releases attributed to the same artist are actually by different artists. We can think of the output of this model as a pairwise distance between all releases attributed to an artist (Figure 2b). To prevent false positives that can arise, for example, when an artist\u2019s sound changes over time, we construct a Minimum Spanning Tree (MST) between all releases (Figure 2c).\u00a0 Applying a threshold to the MST to cut edges with long distances splits the discography into components that correspond to the different artists present in the discography (Figure 2d). If we are unable to cut the MST because there are no long edges, we assume that the discography contains no misattributions.<\/p>\n<h2 class=\"wp-block-heading\">Duplicate detection<\/h2>\n<p>The goal of deduplication is to merge existing discographies or sub-discographies that belong to the same artist (e.g. release <em>a<\/em><em><sub>3<\/sub><\/em> in Figure 1). This process consists of two steps: (1) generating deduplication candidates via a blocking strategy, and (2) determining whether pairs of discographies belong to the same artist.\u00a0<\/p>\n<p>We use <em>Elasticsearch<\/em> to generate candidates; these are often homonyms, or have similar names (e.g. <em>Prince, Princess<\/em> and <em>Prince of Funk<\/em>). We use a random forest trained on historical corrections of duplicate discographies to determine whether two discographies in the block are likely to belong to the same artist.<\/p>\n<h2 class=\"wp-block-heading\">Experiments and evaluations<\/h2>\n<h3 class=\"wp-block-heading\">Audio and metadata feature ablations<\/h3>\n<p>Both the misattribution and the duplicate detection models use a combination of metadata features (such as overlap of collaborators, language of performance or music label) and audio vector representations. 
Our experiments show that a combination of both performs best; audio features dominate the pairwise misattribution model, and adding metadata increases average precision by 2% (Figure 3a). Interestingly, this pattern is reversed in the duplicate detection system, in which metadata features drive the performance of the system, and adding audio features increases average precision by 6% (Figure 3b).<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"700\" height=\"425\" src=\"https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/fig3combinedSTACKED2x2-1-700x425.png\" alt=\"\" class=\"wp-image-5672\" style=\"width:925px;height:auto\" srcset=\"https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/fig3combinedSTACKED2x2-1-700x425.png 700w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/fig3combinedSTACKED2x2-1-250x152.png 250w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/fig3combinedSTACKED2x2-1-768x466.png 768w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/fig3combinedSTACKED2x2-1-1536x932.png 1536w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/fig3combinedSTACKED2x2-1-2048x1242.png 2048w, https:\/\/storage.googleapis.com\/research-production\/1\/2023\/11\/fig3combinedSTACKED2x2-1-120x73.png 120w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\"\/><\/figure>\n<p>Figure 3: Evaluation. (a) \u2013 (b): Precision-Recall curves in offline experiments with combinations of audio and metadata features for misattribution detection (a) and deduplication (b). Average precision (AP) is reported in the legend for each set of features. (c) \u2013 (d): Annotation experiment results for misattribution detection (c) and deduplication (d). 
Precision is calculated for each threshold bucket and reweighted by the distribution of predictions shown on the second y axis.<\/p>\n<h3 class=\"wp-block-heading\">Test driving our system with Subject Matter Experts (SMEs)<\/h3>\n<p>We teamed up with our annotations team and sampled ~1K examples each for the misattribution and deduplication tasks. We asked SMEs to annotate examples of misattribution and duplicate discographies as<em> by the same artist<\/em> or <em>by different artists. <\/em>Figure 3<em> c<\/em> and<em> d <\/em>show the precision at different thresholds of the models; as the threshold goes up, the fraction of samples (and potential error detections) decreases while the precision increases. This trade-off allows SMEs to find a sweet spot that balances precision and recall for catalog curation.<\/p>\n<p>Then we ran our detection tasks in sequence (as described in Figure 1) to automatically predict suitable relocations of misattributed content using the deduplication model. We achieve a maximum precision of 45% when both the misattribution step and the deduplication (relocation) step have a high threshold (representing 17% of the sample). This means that roughly half the time, catalog curation experts don\u2019t have to spend time looking for the right place to put a mismatched release, leading to substantial time savings. The relocation task is notoriously harder because it inherits the uncertainty and performance of each sub-system. Additionally, many misattributed releases will not belong anywhere, and will become standalone discographies because they belong to artists that are new to the catalog.<\/p>\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n<p>Although discography errors are rare, it is important to minimize them as much as possible. 
Systems such as the one we present in our paper, which rely on ML and careful data modeling, are one tool among many that platforms can use to ensure their catalog is correct, and to safeguard the experience of users and artists. The strength of this approach is that it can scan a large catalog efficiently, direct the attention of human reviewers to where it&#8217;s most needed, and suggest corrections. These advantages make our system a key part of effective proactive catalog curation strategies.<\/p>\n<p>For more detail, please see our paper:<br \/><a href=\"https:\/\/research.atspotify.com\/publications\/semi-automated-music-catalog-curation-using-audio-and-metadata\/\" target=\"_blank\" rel=\"noopener\">Semi-automated Music Catalog Curation Using Audio and Metadata<br \/><\/a>Brian Regan, Desi Hristova, Mariano Beguerisse D\u00edaz<br \/>ISMIR 2023<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>November 17, 2023 Published by Brian Regan, Desislava Hristova, Mariano Beguerisse-D\u00edaz TL;DR We developed a Named Entity Disambiguation (NED) method to help human curators find and correct rare errors in a music catalog. 
This system can detect misattribution, when releases which might be incorrectly attributed to an artist discography and predict appropriate relocations, and [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":114413,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38],"tags":[],"class_list":{"0":"post-114411","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-spotify"},"_links":{"self":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts\/114411","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/comments?post=114411"}],"version-history":[{"count":0,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/posts\/114411\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/media\/114413"}],"wp:attachment":[{"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/media?parent=114411"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/categories?post=114411"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/showbizztoday.com\/index.php\/wp-json\/wp\/v2\/tags?post=114411"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}