Why backfill cross-platform IDs at all
If your product was built on Spotify, your library is full of Spotify track IDs. That works fine until a user asks “can I play this on Tidal?”, or a label deal requires you to report Beatport-attributed plays, or you want to enrich a recommendation with the MusicBrainz work graph. The moment any second platform shows up, every track in the library needs its other IDs filled in.
Doing this from scratch is awkward. Each DSP exposes a slightly different search API, and string-matching across them produces false positives (live versions, radio edits, regional reissues, remix re-uploads with no "Remix" in the title). The reliable shortcut is to match on ISRC: virtually every commercial recording released since the 90s has one, and Spotify exports include it. SonoVault's ISRC index returns every other platform's ID in a single call.
IDs you can get back: spotify_id, beatport_id, applemusic_id, tidal_id, discogs_id, musicbrainz_id. You get whatever the platforms actually carry; missing platforms are absent from the response, not null.
The shape of the backfill
Two endpoints do all the work:
- /v1/tracks/links: one track per call. Pass isrc (or any platform-specific ID) and get back every other platform's ID for that recording.
- /v1/tracks/resolve: the same thing, in batches of up to 100 per request, at one credit per input. This is what you use for a real backfill.
For tracks that have no ISRC in your source data, a fallback to /v1/tracks/search picks up the canonical ISRC from the best match, which you can then feed back through the same pipeline.
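For live, one-off lookups, a call to /v1/tracks/links might be wrapped as below. This is a sketch under assumptions: it treats the endpoint as a GET that takes a single lookup key as a query parameter and uses the same x-api-key header as the resolve example. The key names come from this article; `buildLinksUrl` and `fetchLinks` are hypothetical helpers, not a SonoVault SDK.

```typescript
// Lookup keys this article mentions for /v1/tracks/links; extend if the API accepts more.
type LinkKey = "isrc" | "spotify_id" | "beatport_id" | "applemusic_id";

// Assumption: /v1/tracks/links is a GET with exactly one key as a query parameter.
function buildLinksUrl(base: string, key: LinkKey, value: string): string {
  const url = new URL(`${base}/tracks/links`);
  url.searchParams.set(key, value); // URL-encodes the value
  return url.toString();
}

// Hypothetical wrapper: authenticates with the same x-api-key header as resolve.
async function fetchLinks(base: string, apiKey: string, key: LinkKey, value: string): Promise<unknown> {
  const res = await fetch(buildLinksUrl(base, key, value), {
    headers: { "x-api-key": apiKey },
  });
  if (!res.ok) throw new Error(`${res.status} ${res.statusText}`);
  return res.json();
}
```

Passing `beatport_id` or `applemusic_id` instead of `isrc` is the reverse-lookup case mentioned under "Going further".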
Send a batch to /v1/tracks/resolve
The request body is two fields: the type of input you're sending, and the items themselves. For ISRC the items are plain strings; for track_name each item is { artist, title }.
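The two input shapes map naturally onto a discriminated union. The sketch below types them and splits an arbitrary list into request bodies that respect the 100-item limit. `ResolveRequest`, `chunk`, `isrcRequests`, and `nameRequests` are illustrative names, not part of an official client.

```typescript
type TrackName = { artist: string; title: string };

// One request body per POST to /v1/tracks/resolve.
type ResolveRequest =
  | { input_type: "isrc"; items: string[] }
  | { input_type: "track_name"; items: TrackName[] };

const MAX_BATCH = 100; // per-request limit stated for /v1/tracks/resolve

// Split a list into consecutive slices of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

function isrcRequests(isrcs: readonly string[]): ResolveRequest[] {
  return chunk(isrcs, MAX_BATCH).map(items => ({ input_type: "isrc" as const, items }));
}

function nameRequests(tracks: readonly TrackName[]): ResolveRequest[] {
  return chunk(tracks, MAX_BATCH).map(items => ({ input_type: "track_name" as const, items }));
}

// Each element is ready to be JSON.stringify'd as a POST body.
console.log(JSON.stringify(isrcRequests(["GBDUW0000053", "USQX91300108", "GBARL1500236"])[0]));
```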
{
"input_type": "isrc",
"items": [
"GBDUW0000053",
"USQX91300108",
"GBARL1500236"
]
}
The response is one entry per input line, in input order. Each entry tells you whether the lookup matched, the canonical SonoVault track if so, and the cross-platform links.
{
"results": [
{
"input": "GBDUW0000053",
"status": "matched",
"track": { "id": 123, "title": "One More Time", /* … */ },
"links": [
{ "source": "spotify", "external_id": "5W3cjX…", "url": "https://open.spotify.com/track/5W3cjX…" },
{ "source": "beatport", "external_id": "1234567", "url": "https://www.beatport.com/track/-/1234567" },
{ "source": "applemusic", "external_id": "1440650", "url": "https://music.apple.com/song/1440650" },
{ "source": "tidal", "external_id": "3789025", "url": "https://tidal.com/browse/track/3789025" }
]
},
{
"input": "USQX91300108",
"status": "not_found",
"track": null,
"links": []
},
/* … */
],
"partial": false,
"processed": 3,
"credits_used": 3,
"credits_remaining": 49997,
"message": null
}
If your credits run out partway through a batch, the request still returns 200 but partial: unresolved lines come back with status: "skipped_no_credits". Always check partial and credits_remaining on the response.
Walk a full library in 100-row batches
Read your Spotify export, chunk by 100, POST each chunk, and write the results out alongside the original Spotify ID so you keep a join key. The whole thing fits in 60 lines:
// Run as an ES module (e.g. a .mts file) so top-level await works.
import fs from "node:fs";

const API_KEY = process.env.SONOVAULT_API_KEY!;
const BASE = "https://api.sonovault.now/v1";
const BATCH = 100; // /v1/tracks/resolve max

interface SpotifyRow { isrc: string; spotify_id: string; title: string; }
interface Link { source: string; external_id: string; url: string; }
interface Result { input: string; status: "matched" | "not_found" | "skipped_no_credits"; links: Link[]; }
interface ResolveResponse { results: Result[]; partial: boolean; credits_remaining: number; }

async function resolveBatch(isrcs: string[]): Promise<ResolveResponse> {
  const res = await fetch(`${BASE}/tracks/resolve`, {
    method: "POST",
    headers: { "x-api-key": API_KEY, "content-type": "application/json" },
    body: JSON.stringify({ input_type: "isrc", items: isrcs }),
  });
  if (!res.ok) throw new Error(`${res.status} ${res.statusText}`);
  return (await res.json()) as ResolveResponse;
}

// Load Spotify export: isrc,spotify_id,title. Title is last, so commas inside it survive.
const rows: SpotifyRow[] = fs.readFileSync("./spotify.csv", "utf-8")
  .trim().split(/\r?\n/).slice(1)
  .map(line => {
    const [isrc = "", spotify_id = "", ...rest] = line.split(",");
    return { isrc: isrc.trim(), spotify_id, title: rest.join(",") };
  });

// Only resolve rows that actually have an ISRC. Stragglers go to fallback.
const resolvable = rows.filter(r => r.isrc);
const missing = rows.filter(r => !r.isrc);

// Quote a CSV field if it contains a comma, quote, or newline.
const csv = (v: string) => (/[",\n]/.test(v) ? `"${v.replace(/"/g, '""')}"` : v);
const out: string[] = ["spotify_id,isrc,title,beatport_id,applemusic_id,tidal_id,discogs_id,musicbrainz_id"];

for (let i = 0; i < resolvable.length; i += BATCH) {
  const chunk = resolvable.slice(i, i + BATCH);
  const { results, partial, credits_remaining } = await resolveBatch(chunk.map(r => r.isrc));
  for (let j = 0; j < chunk.length; j++) {
    const row = chunk[j];
    const result = results[j]; // results come back in input order
    if (!result || result.status === "skipped_no_credits") continue; // retry these later
    const id = (s: string) => result.links.find(l => l.source === s)?.external_id ?? "";
    out.push([
      row.spotify_id, row.isrc, row.title,
      id("beatport"), id("applemusic"), id("tidal"), id("discogs"), id("musicbrainz"),
    ].map(csv).join(","));
  }
  console.log(`Resolved ${Math.min(i + BATCH, resolvable.length)} / ${resolvable.length}`);
  if (partial || credits_remaining <= 0) {
    console.warn(`Out of credits (${credits_remaining} left); skipped rows from index ${i} onward need a re-run.`);
    break;
  }
}

fs.writeFileSync("./library.enriched.csv", out.join("\n"));
console.log(`Wrote ${out.length - 1} rows. ${missing.length} rows have no ISRC; run the fallback next.`);
For a 10K-row library that's 100 requests, takes about a minute, and burns 10K credits — well under the Growth plan's monthly quota. Run it serially; the endpoint is fast enough that parallelism mostly buys you rate-limit errors.
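If you do hit rate limits while running serially, retrying with exponential backoff is the usual remedy. This is a generic sketch: it assumes the API signals rate limiting with HTTP 429, which this article doesn't state, and `withBackoff` is a hypothetical helper that wraps any request function.

```typescript
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Retry while the response status is 429 (assumed rate-limit signal), doubling the
// wait each time: baseDelayMs, 2x, 4x, ... After maxAttempts it returns the last
// response unchanged so the caller decides what to do.
async function withBackoff<T extends { status: number }>(
  send: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    const res = await send();
    if (res.status !== 429 || attempt >= maxAttempts) return res;
    await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
}
```

Since a fetch `Response` has a `status` field, the backfill's `resolveBatch` could call `withBackoff(() => fetch(...))` in place of the bare `fetch` without other changes.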
Handle the rows with no ISRC
Spotify ships ISRC on most rows, but not all — DJ-uploaded mixes and some podcasts come through blank. For those, search by artist + title, take the best match's ISRC, and feed it back through the resolver:
// For rows with no ISRC: fall back to /v1/tracks/search and adopt
// the canonical ISRC from the top result, then re-resolve.
async function findIsrcByName(artist: string, title: string): Promise<string | null> {
  const qs = new URLSearchParams({ artist, title, limit: "1" });
  const res = await fetch(`${BASE}/tracks/search?${qs}`, {
    headers: { "x-api-key": API_KEY },
  });
  if (!res.ok) return null;
  const { results } = await res.json();
  return results?.[0]?.isrc ?? null;
}

const recovered: { spotify_id: string; isrc: string }[] = [];
for (const row of missing) {
  // row.title is "Track — Artist" in this example; split as needed.
  const [title = "", artist = ""] = row.title.split(" — ").map(s => s.trim());
  if (!title || !artist) continue; // nothing reliable to search on
  const isrc = await findIsrcByName(artist, title);
  if (isrc) recovered.push({ spotify_id: row.spotify_id, isrc });
}
console.log(`Recovered ${recovered.length} / ${missing.length} via fuzzy search`);
// Feed the recovered ISRCs back through resolveBatch() the same way as the ISRC path.
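A top-1 search result can still be one of the false positives listed earlier (a live take, a radio edit, a remix), so it may be worth sanity-checking the returned title before adopting its ISRC. This heuristic sketch assumes the search result exposes a title you can compare against; `plausibleMatch` and its keyword list are hypothetical and would need tuning against your own data.

```typescript
// Version markers that usually mean "different recording" (hypothetical list).
const VERSION_WORDS = ["live", "remix", "edit", "acoustic", "remastered", "remaster", "instrumental", "extended mix"];

// Lower-case, strip accents and punctuation, collapse whitespace.
function normalizeTitle(s: string): string {
  return s
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9 ]+/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Whole-word (or whole-phrase) containment on a normalized string.
function hasPhrase(text: string, phrase: string): boolean {
  return ` ${text} `.includes(` ${phrase} `);
}

// Accept only if the base titles agree and the candidate doesn't add a
// version marker (e.g. "Live") that the query title didn't have.
function plausibleMatch(queryTitle: string, candidateTitle: string): boolean {
  const q = normalizeTitle(queryTitle);
  const c = normalizeTitle(candidateTitle);
  const addsVersion = VERSION_WORDS.some(w => hasPhrase(c, w) && !hasPhrase(q, w));
  return !addsVersion && (c === q || c.startsWith(q + " "));
}
```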
Run it incrementally on new signups
Once the historical library is enriched, the same pipeline runs for each new user's import. The first time a brand-new recording is looked up by ISRC, SonoVault asks Spotify, Beatport, Apple Music, and Tidal directly and persists whatever it finds — so the second user to import the same track gets a cache hit.
For a steady-state app, budget about one credit per imported track. Most calls hit the cache; only genuinely new recordings cost an outbound DSP request behind the scenes.
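Because resolve charges one credit per input, an import that repeats tracks, or overlaps with what you've already enriched, can be trimmed before it's sent. A sketch assuming your enriched ISRCs live in some set-like store; `isrcsToResolve` is a hypothetical helper.

```typescript
// Return each ISRC from a new import at most once, skipping blanks and any
// ISRC already present in the enriched store (stand-in: a ReadonlySet).
function isrcsToResolve(importIsrcs: readonly string[], alreadyEnriched: ReadonlySet<string>): string[] {
  const seen = new Set<string>();
  const todo: string[] = [];
  for (const raw of importIsrcs) {
    const isrc = raw.trim().toUpperCase();
    if (!isrc || alreadyEnriched.has(isrc) || seen.has(isrc)) continue;
    seen.add(isrc);
    todo.push(isrc);
  }
  return todo;
}
```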
Going further
- Reverse lookup. The same /v1/tracks/links endpoint accepts spotify_id, beatport_id, applemusic_id, etc. Useful when the original ID is from a platform other than Spotify.
- Dedupe while you're at it. Two rows with different Spotify IDs but the same canonical SonoVault track.id are the same recording in different releases. See ISRC Lookup for the dedup story.
- Pair with new releases. Your enriched library is the input for a tracking feed: the label release tracker and (soon) a genre-filtered release radar both consume the same cross-platform shape.
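To act on the dedupe tip you'd need to keep `track.id` from each matched result (the backfill script above only writes the platform IDs). Assuming a hypothetical row shape that does, grouping by that id surfaces the duplicates:

```typescript
// Hypothetical enriched row: track_id is track.id from a matched result, else null.
interface EnrichedRow { spotify_id: string; track_id: number | null; }

// Group Spotify IDs by canonical SonoVault track id.
function groupByRecording(rows: readonly EnrichedRow[]): Map<number, string[]> {
  const groups = new Map<number, string[]>();
  for (const row of rows) {
    if (row.track_id === null) continue; // unmatched rows can't be grouped this way
    const ids = groups.get(row.track_id) ?? [];
    ids.push(row.spotify_id);
    groups.set(row.track_id, ids);
  }
  return groups;
}

// Groups with more than one Spotify ID are the same recording in different releases.
function duplicateRecordings(rows: readonly EnrichedRow[]): [number, string[]][] {
  return Array.from(groupByRecording(rows).entries()).filter(([, ids]) => ids.length > 1);
}
```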
Frequently asked questions
Do I need an ISRC for every track to do this?
No, but ISRC is the most accurate match key. For tracks without one, fall back to /v1/tracks/search (artist + title) and use the ISRC from the best result going forward. Spotify exports usually include ISRC, so this fallback only fires for the small minority of tracks Spotify itself didn't ship one for.
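Before deciding a row has no ISRC, it can help to normalize what is there: ISRCs are often written in a hyphenated display form such as `GB-DUW-00-00053`. This sketch strips that punctuation and validates the 12-character layout (two-letter country code, three-character registrant code, two-digit year, five-digit designation code); anything that fails would go to the search fallback. `normalizeIsrc` is a hypothetical helper.

```typescript
// ISRC layout: 2-letter country code, 3-character alphanumeric registrant code,
// 2-digit year of reference, 5-digit designation code (12 characters total).
const ISRC_PATTERN = /^[A-Z]{2}[A-Z0-9]{3}\d{7}$/;

// Drop an "ISRC" label and display punctuation, upper-case, then validate.
// Returns null when the value isn't a well-formed ISRC.
function normalizeIsrc(raw: string | null | undefined): string | null {
  if (!raw) return null;
  const compact = raw
    .trim()
    .replace(/^ISRC[\s:]+/i, "") // label only when followed by a separator, e.g. "ISRC: ..."
    .replace(/[\s-]/g, "")       // hyphens/spaces from the display form
    .toUpperCase();
  return ISRC_PATTERN.test(compact) ? compact : null;
}
```

Requiring a separator after the "ISRC" label matters: a genuine code can start with those letters (country `IS`, registrant `RC1`), and it must pass through untouched.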
What's the difference between /v1/tracks/links and /v1/tracks/resolve?
/v1/tracks/links is one track per call: give it any ID and it returns every other platform's ID. /v1/tracks/resolve is the bulk variant, taking up to 100 inputs in a single request. Use links for live UI calls and resolve for batch enrichment.
What happens if a track is on Spotify but not on Apple Music?
The response includes only the platforms that actually carry the recording. Missing platforms aren't an error — just absent from the links array. Check for the source you need before reading its external_id / url.
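In code, that check is a lookup that may come back empty. A small sketch over the links array shape from the resolve response, using the Apple Music and Tidal entries from the example; `linkFor` and `idsBySource` are illustrative names.

```typescript
interface Link { source: string; external_id: string; url: string; }

// The link for one platform, or undefined if that platform doesn't carry the recording.
function linkFor(links: readonly Link[], source: string): Link | undefined {
  return links.find(l => l.source === source);
}

// Flatten into { source: external_id }; absent platforms simply have no key.
function idsBySource(links: readonly Link[]): Partial<Record<string, string>> {
  const out: Partial<Record<string, string>> = {};
  for (const l of links) out[l.source] = l.external_id;
  return out;
}

const links: Link[] = [
  { source: "applemusic", external_id: "1440650", url: "https://music.apple.com/song/1440650" },
  { source: "tidal", external_id: "3789025", url: "https://tidal.com/browse/track/3789025" },
];

const apple = linkFor(links, "applemusic");
if (apple) console.log(apple.url); // only read url/external_id once the link exists
```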
Do new releases need a fresh lookup, or are they pre-indexed?
When you query by ISRC, SonoVault actively asks Spotify, Beatport, Apple Music, and Tidal for that recording if we don't already have an ID. Any new IDs discovered are returned in the same response and persisted, so the second call comes straight from the DB.
How many credits does this cost?
One credit per input on /v1/tracks/resolve — a 10K-row library is 10K credits. The Starter plan (50K/month) covers about five full backfills; Growth (500K) covers ongoing daily incremental runs for a library of any reasonable size.
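The arithmetic is simple enough to keep in a helper when sizing a plan. This sketch uses the two figures stated here (one credit per resolve input, up to 100 inputs per request) and takes the monthly quota as a parameter; `estimateBackfill` is hypothetical.

```typescript
const RESOLVE_BATCH_LIMIT = 100; // max inputs per /v1/tracks/resolve request
const CREDITS_PER_INPUT = 1;

function estimateBackfill(libraryRows: number, monthlyCredits: number) {
  const credits = libraryRows * CREDITS_PER_INPUT;
  const requests = Math.ceil(libraryRows / RESOLVE_BATCH_LIMIT);
  // How many complete passes over the library the monthly quota covers.
  const fullRunsPerMonth = credits > 0 ? Math.floor(monthlyCredits / credits) : Infinity;
  return { credits, requests, fullRunsPerMonth };
}

console.log(estimateBackfill(10000, 50000));  // Starter quota
console.log(estimateBackfill(10000, 500000)); // Growth quota
```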