How machine learning finds your new music

I’m a sucker for technical dives into Spotify’s Discover Weekly, and this is a great one.

In the article, Sophia Ciocca describes three types of recommendation model used to generate the playlists. The first is collaborative filtering: crudely, people with taste similar to yours like this, so you might like it too. Dig deeper and the mathematical modelling sounds fascinating. The third is raw audio models: analysis of the audio tracks themselves. This is why Release Radar works so well, even though its tracks haven’t been played many times.
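To get a feel for collaborative filtering, here’s a minimal sketch in Python. It’s a simplified stand-in, not Spotify’s method (Ciocca describes matrix factorization over an enormous user-track matrix): it scores tracks you haven’t played by how much their listeners overlap with the listeners of tracks you already play. The users, tracks and play counts are made up.

```python
import math
from collections import defaultdict

def item_vectors(plays: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Flip user -> {track: plays} into track -> {user: plays}."""
    vecs: dict[str, dict[str, int]] = defaultdict(dict)
    for user, tracks in plays.items():
        for track, count in tracks.items():
            vecs[track][user] = count
    return vecs

def cosine(u: dict[str, int], v: dict[str, int]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def recommend(plays: dict[str, dict[str, int]], user: str, k: int = 3) -> list[str]:
    """Rank unplayed tracks by similarity to the tracks this user already plays."""
    vecs = item_vectors(plays)
    heard = plays[user]
    scores: dict[str, float] = {}
    for track, vec in vecs.items():
        if track in heard:
            continue  # only recommend tracks the user hasn't played
        # Similarity to each known track, weighted by how often the user plays it.
        scores[track] = sum(n * cosine(vec, vecs[h]) for h, n in heard.items())
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [t for t in ranked if scores[t] > 0][:k]  # drop tracks with no shared listeners

# Hypothetical play counts for four listeners.
plays = {
    "you":   {"track_a": 10, "track_b": 4},
    "user2": {"track_a": 8, "track_b": 5, "track_c": 9},
    "user3": {"track_a": 7, "track_c": 6},
    "user4": {"track_d": 12, "track_e": 3},
}
print(recommend(plays, "you"))  # → ['track_c']
```

track_c wins because the people who play your tracks also play it; track_d and track_e share no listeners with you, so they get no score at all.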

But I didn’t know about the second one, the emphasis Spotify puts on natural language processing, or NLP:

Spotify crawls the web constantly looking for blog posts and other written texts about music, and figures out what people are saying about specific artists and songs — what adjectives and language is frequently used about those songs, and which other artists and songs are also discussed alongside them.

While I don’t know the specifics of how Spotify chooses to then process their scraped data, I can give you an understanding of how the Echo Nest used to work with them. They would bucket them up into what they call “cultural vectors” or “top terms.” Each artist and song had thousands of daily-changing top terms. Each term had a weight associated, which reveals how important the description is (roughly, the probability that someone will describe music as that term.)

Spotify’s Discover Weekly: How machine learning finds your new music
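Here’s a rough sketch of how weighted top terms could be turned into artist similarity. The weighting is my assumption, not the Echo Nest’s: a term’s weight is the share of crawled documents about an artist that use it, which loosely matches “the probability that someone will describe music as that term.” Artists are then compared with cosine similarity over their term weights. The vocabulary and snippets are invented.

```python
import math
import re
from collections import Counter

def top_terms(docs: list[str], vocabulary: set[str]) -> dict[str, float]:
    """Weight each descriptive term by the share of documents that mention it."""
    counts = Counter()
    for doc in docs:
        words = set(re.findall(r"[a-z]+", doc.lower()))
        counts.update(words & vocabulary)  # count each term at most once per document
    return {term: n / len(docs) for term, n in counts.items()}

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical vocabulary and scraped snippets.
vocab = {"jangly", "noisy", "guitar", "orchestral", "melancholy", "synth"}
docs = {
    "artist_a": ["Jangly, noisy guitar pop.", "A noisy guitar record.", "Melancholy and jangly."],
    "artist_b": ["Noisy guitar squall.", "Jangly guitar hooks everywhere."],
    "artist_c": ["Lush orchestral synth pop.", "Melancholy orchestral arrangements."],
}
terms = {artist: top_terms(d, vocab) for artist, d in docs.items()}
print(cosine(terms["artist_a"], terms["artist_b"]))  # high: shared guitar vocabulary
print(cosine(terms["artist_a"], terms["artist_c"]))  # low: only "melancholy" overlaps
```

Because the weights are recomputed from whatever has been crawled, re-running this daily would give the “daily-changing” top terms the quote mentions.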

Unlike many others, I’m a fan of the Apple Music UI and implementation. But I’ve not had terrific results with their recommendation engines. The opposite is true for Spotify. It’d be nice to save some money by cancelling one or other of the services, but they do such different things for me that I can’t see that happening any time soon.

The best anagram in English

Mark Dominus on The Universe of Discourse has exhaustively explored all the anagrams in Webster’s Second International dictionary.

The longest pair (cholecystoduodenostomy and duodenocholecystostomy) isn’t necessarily the most interesting, as both words are made up of three units (cholecysto, duodeno, stomy) in different orders.

So he came up with a way of scoring pairs based on the degree of rearrangement required:

This gave me the idea to score a pair of anagrams according to how many chunks one had to be cut into in order to rearrange it to make the other one. On this plan, the “cholecystoduodenostomy / duodenocholecystostomy” pair would score 3, just barely above the minimum possible score of 2. Something even a tiny bit more interesting, say “abler / blare” would score higher, in this case 4. Even if this strategy didn’t lead me directly to the most interesting anagrams, it would be a big step in the right direction, allowing me to eliminate the least interesting.

By this measure, the most interesting anagram pair is 15 letters long, with only two letters that stay next to each other. Go see what it is.
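Dominus’s scoring idea can be sketched as a brute-force search. This is my own reconstruction, not his code: try every letter-preserving way of mapping positions in one word to positions in the other, count the places where neighbouring letters stop being neighbours, and take the minimum. The number of chunks is that count plus one. The search grows factorially with repeated letters, so it’s fine for dictionary words but not much else.

```python
from collections import defaultdict
from itertools import permutations, product

def chunk_score(a: str, b: str) -> int:
    """Fewest chunks `a` must be cut into so the chunks can be rearranged into `b`."""
    if sorted(a) != sorted(b):
        raise ValueError("not anagrams")
    # Where each letter occurs in each word.
    pos_a, pos_b = defaultdict(list), defaultdict(list)
    for i, ch in enumerate(a):
        pos_a[ch].append(i)
    for j, ch in enumerate(b):
        pos_b[ch].append(j)
    letters = sorted(pos_a)
    best = len(a)  # worst case: every letter is its own chunk
    # For each letter, try every pairing of its occurrences in a with those in b.
    for choice in product(*(permutations(pos_b[ch]) for ch in letters)):
        target = [0] * len(a)
        for ch, perm in zip(letters, choice):
            for i, j in zip(pos_a[ch], perm):
                target[i] = j
        # A break is where consecutive letters of a don't land consecutively in b.
        breaks = sum(1 for i in range(len(a) - 1) if target[i + 1] != target[i] + 1)
        best = min(best, breaks + 1)
    return best

print(chunk_score("abler", "blare"))  # → 4
```

Both of the examples in the quote come out as described: 4 for abler/blare and 3 for cholecystoduodenostomy/duodenocholecystostomy.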

Cooperation against fake news

I’ve spent the past few days reading almost exclusively about the rise, dissemination and impact of fake news.

It’s not a new topic; I’ve enjoyed reading John Herrman, Mike Caulfield, Caitlin Dewey and Jeff Jarvis (among others) for some time. But Trump’s victory has turned it from a curiosity into a dangerous force.

Jarvis has co-written a list of 15 suggestions for platforms to adopt or investigate. This stands out to me as particularly important:

Create a system for media to send metadata about their fact-checking, debunking, confirmation, and reporting on stories and memes to the platforms. It happens now: Mouse over fake news on Facebook and there’s a chance the related content that pops up below can include a news site or Snopes reporting that the item is false. Please systematize this: Give trusted media sources and fact-checking agencies a path to report their findings so that Facebook and other social platforms can surface this information to users when they read these items and — more importantly — as they consider sharing them. Thus we can cut off at least some viral lies at the pass. The platforms need to give users better information and media need to help them. Obviously, the platforms can use such data from both users and media to inform their standards, ranking, and other algorithmic decisions in displaying results to users.

These linked data connections are not difficult to implement, but they won’t happen without us asking for them. Platforms simply aren’t interested.

The same goes for this idea, also on the list:

Make the brands of those sources more visible to users. Media have long worried that the net commoditizes their news such that users learn about events “on Facebook” or “on Twitter” instead of “from the Washington Post.” We urge the platforms, all of them, to more prominently display media brands so users can know and judge the source — for good or bad — when they read and share. Obviously, this also helps the publishers as they struggle to be recognized online.

A key issue that Caulfield has repeatedly noted is that Facebook doesn’t really care whether you read the articles posted there, only whether you react to them, which helps the platform learn more about you and improve its ad targeting:

Facebook, on the other hand, doesn’t think the content is the main dish. Instead, it monetizes other people’s content. The model of Facebook is to try to use other people’s external content to build engagement on its site. So Facebook has a couple of problems.

First, Facebook could include whole articles, except for the most part they can’t, because they don’t own the content they monetize. (Yes, there are some efforts around full story embedding, but again, this is not evident on the stream as you see it today). So we get this weird (and think about it a minute, because it is weird) model where you get the headline and a comment box and if you want to read the story you click it and it opens up in another tab, except you won’t click it, because Facebook has designed the interface to encourage you to skip going off-site altogether and just skip to the comments on the thing you haven’t read.

Second, Facebook wants to keep you on site anyway, so they can serve you ads. Any time you spend somewhere else reading is time someone else is serving you ads instead of them and that is not acceptable.

The more I read about this, the more dispirited I become. Those of us who care about limiting fake news need to gather around a set of ideas and actions, and Jarvis’s list is the best we have so far.
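Jarvis’s suggestion of a system for media to send fact-checking metadata to the platforms really is light on plumbing. As a sketch: below, a fact-check record is built with schema.org’s ClaimReview vocabulary (a real, published type), and a hypothetical platform-side registry accepts records only from trusted publishers, then looks them up when someone goes to share a URL. The registry class, its methods, the publisher name and the URLs are all invented for illustration; no platform exposes this API.

```python
import json
from collections import defaultdict

def claim_review(fact_check_url: str, reviewed_url: str, claim: str,
                 verdict: str, publisher: str) -> dict:
    """Build one fact-check record as schema.org ClaimReview JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": fact_check_url,  # where the debunk itself is published
        "claimReviewed": claim,
        "itemReviewed": {"@type": "CreativeWork", "url": reviewed_url},
        "author": {"@type": "Organization", "name": publisher},
        "reviewRating": {"@type": "Rating", "alternateName": verdict},
    }

class FactCheckRegistry:
    """Hypothetical platform-side store: reviewed URL -> fact-check records."""

    def __init__(self, trusted_publishers: set[str]):
        self.trusted = trusted_publishers
        self.by_url: dict[str, list[dict]] = defaultdict(list)

    def submit(self, record_json: str) -> bool:
        """Accept a record from a media partner; reject anything untrusted."""
        record = json.loads(record_json)
        if record.get("@type") != "ClaimReview":
            return False
        if record.get("author", {}).get("name") not in self.trusted:
            return False
        self.by_url[record["itemReviewed"]["url"]].append(record)
        return True

    def warnings_before_share(self, url: str) -> list[str]:
        """Messages a user could see as they consider sharing `url`."""
        return [
            f'{r["author"]["name"]}: "{r["claimReviewed"]}" rated '
            f'{r["reviewRating"]["alternateName"]} ({r["url"]})'
            for r in self.by_url.get(url, [])  # .get avoids creating empty entries
        ]

registry = FactCheckRegistry(trusted_publishers={"Example Fact Check"})
record = claim_review(
    fact_check_url="https://factcheck.example/debunk-123",
    reviewed_url="https://hoax.example/story",
    claim="A made-up viral claim",
    verdict="False",
    publisher="Example Fact Check",
)
registry.submit(json.dumps(record))
print(registry.warnings_before_share("https://hoax.example/story"))
```

The trust list is the important design choice: the platform decides whose metadata to surface, which is exactly the “trusted media sources and fact-checking agencies” path Jarvis describes.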

Publishers stop using chumboxes

Sapna Maheshwari and John Herrman’s article for the NYT, Publishers Are Rethinking Those ‘Around the Web’ Ads:

Usually grouped together under a label like “Promoted Stories” or “Around the Web,” these links are often advertisements dressed up to look like stories people might want to read. They have long provided much-needed revenue for publishers and given a wide range of advertisers a relatively affordable way to reach large and often premium audiences.

But now, some publishers are wondering about the effect these so-called content ads may be having on their brands and readers. This month, these ads stopped appearing on Slate. And The New Yorker, which restricted placement of such ads to its humor articles, recently removed them from its website altogether.

Among the reasons: The links can lead to questionable websites, run by unknown entities.

Sounds pretty terrible for readers. But listen to Matt Crenshaw, VP of product marketing at Outbrain, one of the companies selling these ads, on how much publishers now rely on them:

“As this space has grown up, this is becoming a very significant percentage-wise revenue source for publishers. We have been told from major, major publishers that we have become their No. 1 revenue provider,” he said, declining to name specific companies.

Herrman’s erstwhile colleague at the Awl, John Mahoney, previously produced an excellent and complete taxonomy of internet chum, the term given to these awful pieces of shit.

Instagram’s feed algorithm

Alex Parker on Medium discussing How Instagram’s algorithm is holding us captive:

Let’s be honest: the algorithm serves advertisers. Instagram is a free service, and it needs to make money. For years, it was free of advertisements. Then it had a few. Now, every few posts is sponsored. To tell the truth, I don’t mind the ads. They aren’t intrusive, they’re easy to scroll past, and I’m all for something I like finding ways to be sustainable. A business has to make money.

But why does it have to be at the expense of users and their enjoyment of a product?

[…]

As a journalist, who has a real-time Twitter feed inches from my face most hours of the day, I know I’m not the typical social media user (I’m also older than the average Instagram user, but age is just a number, right?). But because I use social networks so much, I want them to respond to my needs, rather than treating me like a captive pawn.

Parker is arguing that, as a heavy user, he should have a real-time view of what’s happening on Instagram. I can understand this; I exclusively use Tweetbot for Twitter so that I always see posts in reverse-chronological order.

Meanwhile, BuzzFeed’s Mat Honan and Alex Kantrowitz interviewed Instagram CEO Kevin Systrom. The timeline came up, along with questions about real-time viewing:

Nowhere in our mission is it about being real-time. I don’t think we are focused on making sure you have a news feed of an unfolding event in real-time view. And I think that’s okay. You should still see rainbows, generally, together — especially if they’re good rainbows, in which case the best ones will rise to the top.

That’s OK, I guess, but it would be helpful to have an option to change the order. This wouldn’t need to affect advertising.

He also described other ways the team considered implementing ephemerality in what would become Instagram Stories:

“As we dug into our user studies, I realized very quickly that we had to find a solution that made it so you didn’t have to post your profile,” Systrom explained. “After some tests, we added a check box that said ‘expire from my profile’ or ‘don’t post to my profile.’ But no one understood why they would do that.”

I rarely look at the stories posted by people I follow, which are dominated by a handful of heavy users, and seldom post to my own. I’d be interested to find out usage rates across the 500m active users.
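On the wish for an option to change the feed order: here’s a hypothetical feed builder showing why such a switch needn’t affect advertising. Organic posts are sorted either by a ranking score or by time, and ads are slotted in after every N organic posts either way, so the ad positions are identical under both orders. Post, score and ad_every are my own illustrative names, not anything Instagram has described.

```python
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    timestamp: int  # seconds since the epoch
    score: float    # hypothetical relevance score from a ranking model

def build_feed(posts: list, ads: list, order: str = "ranked", ad_every: int = 4) -> list:
    """Sort organic posts by the user's chosen order, then interleave ads."""
    if order == "ranked":
        organic = sorted(posts, key=lambda p: p.score, reverse=True)
    elif order == "chronological":
        organic = sorted(posts, key=lambda p: p.timestamp, reverse=True)  # newest first
    else:
        raise ValueError(f"unknown order: {order}")
    feed = []
    remaining_ads = list(ads)
    for i, post in enumerate(organic, start=1):
        feed.append(post)
        # Ad slots depend only on the count of organic posts, never on their order.
        if i % ad_every == 0 and remaining_ads:
            feed.append(remaining_ads.pop(0))
    return feed

posts = [Post("a", 100, 0.2), Post("b", 200, 0.9), Post("c", 300, 0.5),
         Post("d", 400, 0.1), Post("e", 500, 0.7)]
ads = ["ad1", "ad2"]
ranked = build_feed(posts, ads, order="ranked", ad_every=2)
chrono = build_feed(posts, ads, order="chronological", ad_every=2)
```

In both feeds the ads land at the same positions; only the organic posts between them move.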

Apple Music’s new personalised playlists

Reggie Ugwu’s Inside Apple Music’s Second Act for BuzzFeed:

The other big change is the addition of two new personalized playlists: My Favorites Mix and My New Music Mix. The playlists are generated by algorithms, a first for the service, which has largely relied on human curation for its playlists up to this point. Revealing how the mixes operate for the first time to BuzzFeed News, Apple claimed a potential advantage over similar algorithmically personalized playlists, including Spotify’s Discover Weekly and Pandora’s Thumbprint Radio: deep historical knowledge of individual users’ tastes and habits, based on years of data carried over from iTunes.

If you gave high ratings to a song or album in your old iTunes library, or just played it a lot more than others, you’ll find that behavior reflected in your My Favorites Mix. Meanwhile, the My New Music Mix algorithm serves recently released songs — as well as songs that Apple Music knows you haven’t played before — that the service’s music experts have flagged as similar to others in your taste profile. Apple Music executives suggested even more personalized playlists will follow in the series; but only after prototypes have been vetted, with all possible outcomes — intentional and otherwise — given careful consideration.

I’m still using Spotify’s Discover Weekly and Release Radar on a daily basis, but the two Apple Music playlists are a good start. Looking forward to seeing more.
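Based only on the description in the quote, the two mixes might be sketched like this. The weighting (an even split between star rating and relative play count), the 30-day window and the similarity threshold are all my assumptions, and `similarity` stands in for whatever signal Apple’s music experts and algorithms actually produce.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Track:
    title: str
    plays: int          # lifetime play count, e.g. carried over from iTunes
    rating: int         # 0-5 stars; 0 means unrated
    released: date
    similarity: float   # hypothetical 0-1 closeness to the user's taste profile

def favourites_mix(library: list[Track], size: int = 25) -> list[Track]:
    """Rank tracks the user knows by a blend of star rating and relative play count."""
    most_played = max((t.plays for t in library), default=0) or 1  # avoid dividing by zero

    def affinity(t: Track) -> float:
        # Assumed even weighting; the article doesn't specify one.
        return 0.5 * (t.rating / 5) + 0.5 * (t.plays / most_played)

    return sorted(library, key=affinity, reverse=True)[:size]

def new_music_mix(catalogue: list[Track], today: date, window_days: int = 30,
                  min_similarity: float = 0.6, size: int = 25) -> list[Track]:
    """Recent releases, plus never-played songs, that resemble the user's taste."""
    cutoff = today - timedelta(days=window_days)
    picks = [
        t for t in catalogue
        if (t.released >= cutoff or t.plays == 0) and t.similarity >= min_similarity
    ]
    return sorted(picks, key=lambda t: t.similarity, reverse=True)[:size]

today = date(2016, 9, 30)
library = [
    Track("A", 50, 5, date(2010, 1, 1), 0.9),
    Track("B", 100, 2, date(2012, 1, 1), 0.5),
    Track("C", 5, 1, date(2014, 1, 1), 0.3),
]
catalogue = library[:1] + [
    Track("New1", 0, 0, date(2016, 9, 23), 0.8),
    Track("Old unheard", 0, 0, date(2001, 1, 1), 0.7),
    Track("New but dissimilar", 0, 0, date(2016, 9, 23), 0.2),
]
print([t.title for t in favourites_mix(library)])
print([t.title for t in new_music_mix(catalogue, today)])
```

Note how the favourites side leans entirely on history: it’s the years of iTunes ratings and play counts that Apple is claiming as its advantage.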

Update: Spotify are rolling out Daily Mix, similar to Apple Music’s My Favorites Mix.

Spotify’s Release Radar

Release Radar is Spotify’s latest personalised playlist. Whereas Discover Weekly updates on Mondays and picks from the entire Spotify catalogue, Release Radar updates on Fridays and focuses solely on the past few weeks’ releases.

Ben Popper, for The Verge, quoting Spotify’s Edward Newell:

When a new album drops, we don’t really have much information about it yet, so we don’t have any streaming data or playlisting data, and those are pretty much the two major components that make Discover Weekly work so well. So some of the innovation happening now for the product is around audio research. We have an audio research team in New York that’s been experimenting with a lot of the newer deep learning techniques where we’re not looking at playlisting and collaborative filtering of users, but instead we’re looking at the actual audio itself.

Discover Weekly is easily my favourite thing about any streaming service, and this appears to be just as good, in spite of the data challenges posed by focusing on new releases.

I got tracks by:

  • Favourite artists that I already knew had new material out (Dinosaur Jr., Father John Misty)
  • Favourite artists that I didn’t know had new stuff (Wilco! Why didn’t anyone tell me about this?)
  • Long-forgotten artists I would likely otherwise never have heard of again (Cotton Mather)
  • Artists I haven’t heard of who seem up my street

It’s brilliant. My only issue is that these great features sit apart from my iTunes library, so Spotify can’t learn from my broader listening habits, but that’s clearly no fault of the product.
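Newell’s point is that looking at the audio itself sidesteps the lack of streaming and playlist data for new releases. A minimal sketch of that idea, assuming each track already has an embedding from some audio model (the vectors below are made up; a real system would learn them with deep learning): average the vectors of tracks you like into a taste profile, keep only releases from the last couple of weeks, and rank them by cosine similarity to the profile. No play counts are involved anywhere.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Track:
    title: str
    released: date
    audio_vec: list[float]  # hypothetical embedding produced by an audio model

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def release_radar(liked: list[Track], catalogue: list[Track], today: date,
                  weeks: int = 2, size: int = 30) -> list[Track]:
    """Rank recent releases by audio similarity to tracks the user already likes."""
    # Taste profile: the average audio vector of the user's liked tracks.
    profile = [sum(dims) / len(liked) for dims in zip(*(t.audio_vec for t in liked))]
    cutoff = today - timedelta(weeks=weeks)
    fresh = [t for t in catalogue if t.released >= cutoff]  # new releases only
    return sorted(fresh, key=lambda t: cosine(t.audio_vec, profile), reverse=True)[:size]

today = date(2016, 8, 5)
liked = [Track("l1", date(2015, 1, 1), [1.0, 0.0, 0.1]),
         Track("l2", date(2015, 6, 1), [0.9, 0.1, 0.0])]
catalogue = [
    Track("new_far", date(2016, 8, 2), [0.0, 1.0, 0.0]),
    Track("new_close", date(2016, 8, 2), [0.8, 0.1, 0.1]),
    Track("old_close", date(2014, 1, 1), [1.0, 0.0, 0.0]),
]
print([t.title for t in release_radar(liked, catalogue, today)])
```

The old track is the closest match of all but is filtered out by date, which is the Release Radar side of the design; the similarity ranking is the part that has to work without any listening history.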

Facebook crack down on clickbait

Alex Peysakhovich and Kristin Hendrix:

We’ve heard from people that they specifically want to see fewer stories with clickbait headlines or link titles. These are headlines that intentionally leave out crucial information, or mislead people, forcing people to click to find out the answer. For example: “When She Looked Under Her Couch Cushions And Saw THIS… I Was SHOCKED!”; “He Put Garlic In His Shoes Before Going To Bed And What Happens Next Is Hard To Believe”; or “The Dog Barked At The Deliveryman And His Reaction Was Priceless.”

To address this feedback from our community, we’re making an update to News Feed ranking to further reduce clickbait headlines in the coming weeks. With this update, people will see fewer clickbait stories and more of the stories they want to see higher up in their feeds.
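Facebook doesn’t spell out the model here. One classic approach to this kind of text classification is a spam-filter-style naive Bayes classifier over words and two-word phrases, so here’s a toy version. It’s trained on the three example headlines quoted above plus three ordinary headlines I made up; treat it as an illustration of the technique, not of Facebook’s system.

```python
import math
import re
from collections import Counter

def tokens(headline: str) -> list[str]:
    """Lower-case words plus adjacent-word pairs, so phrases like 'what happens' count."""
    words = re.findall(r"[a-z']+", headline.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]

class NaiveBayes:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter()

    def train(self, headline: str, label: bool) -> None:
        self.counts[label].update(tokens(headline))
        self.docs[label] += 1

    def log_odds(self, headline: str) -> float:
        """log P(clickbait | headline) - log P(normal | headline)."""
        vocab = set(self.counts[True]) | set(self.counts[False])
        total = {c: sum(self.counts[c].values()) for c in (True, False)}
        score = math.log(self.docs[True] / self.docs[False])  # class prior
        for tok in tokens(headline):
            p_cb = (self.counts[True][tok] + 1) / (total[True] + len(vocab))
            p_ok = (self.counts[False][tok] + 1) / (total[False] + len(vocab))
            score += math.log(p_cb / p_ok)
        return score

    def is_clickbait(self, headline: str) -> bool:
        return self.log_odds(headline) > 0

model = NaiveBayes()
for h in ["When She Looked Under Her Couch Cushions And Saw THIS… I Was SHOCKED!",
          "He Put Garlic In His Shoes Before Going To Bed And What Happens Next Is Hard To Believe",
          "The Dog Barked At The Deliveryman And His Reaction Was Priceless."]:
    model.train(h, True)
# Hypothetical ordinary headlines.
for h in ["Senate Passes Budget Bill After Late Night Vote",
          "Council Approves New Cycle Lanes For City Centre",
          "Apple Adds Two Personalised Playlists To Its Music Service"]:
    model.train(h, False)

print(model.is_clickbait("She Opened The Box And What Happens Next Is Shocking"))
print(model.is_clickbait("Council Approves Budget For New Library"))
```

The bigrams do much of the work: phrases like “what happens” and “happens next” only ever appear in the clickbait examples, so they push a new headline that uses them towards the clickbait side.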