Normalising Genres Across Spotify, Beatport, and Discogs

Every source labels genres differently. Use SonoVault's canonical genre hierarchy to collapse a mixed-source library — 'tech house', 'Tech House', and 'Electronic/Dance/Tech House' — into one taxonomy.

Why mixed-source genres are a mess

Pull the same recording from three sources and you get three different genre strings:

  • Spotify: ["electronic", "dance pop", "french touch"]
  • Beatport: "Electronic" > "House" > "French House"
  • Discogs: "Electronic / Dance / Tech House / Techno"

Each source uses its own taxonomy, its own casing, its own delimiter, and its own assumptions about hierarchy. None of these are wrong, but they don't compare — a filter for “House” misses the Spotify rows; a filter for “french touch” misses everything else.

SonoVault classifies every track into one canonical hierarchy at ingest time, using ~500 regex patterns over the source-provided strings. The output is two arrays on every track payload — genre (main) and subgenre — that are consistent regardless of where the track came from.
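To make the idea of regex-based classification concrete, here is a minimal sketch of how a handful of patterns could collapse differently formatted source strings into one vocabulary. The four rules, the `Rule` shape, and the `classify` function are all hypothetical and written for illustration; they are not SonoVault's actual patterns or implementation.

```typescript
// Hypothetical rules for illustration only. The real classifier reportedly
// uses ~500 patterns; these four are invented for this sketch.
interface Rule { pattern: RegExp; genre: string; subgenre?: string; }

const RULES: Rule[] = [
  { pattern: /\bfrench (house|touch)\b/i, genre: "House",      subgenre: "French House" },
  { pattern: /\btech house\b/i,           genre: "House",      subgenre: "Tech House" },
  { pattern: /\bhouse\b/i,                genre: "House" },
  { pattern: /\bambient\b/i,              genre: "Electronic", subgenre: "Ambient" },
];

interface Classification { genre: string[]; subgenre: string[]; }

function classify(sourceStrings: string[]): Classification {
  const genre    = new Set<string>();
  const subgenre = new Set<string>();

  for (const raw of sourceStrings) {
    // Flatten source-specific delimiters (">" and "/") and collapse whitespace
    // so the same patterns work on every source's format.
    const text = raw.replace(/[>\/]+/g, " ").replace(/\s+/g, " ").trim();

    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        genre.add(rule.genre);
        if (rule.subgenre) subgenre.add(rule.subgenre);
      }
    }
  }
  return { genre: Array.from(genre), subgenre: Array.from(subgenre) };
}

// Spotify-style array and Beatport-style path for the same recording.
const fromSpotify  = classify(["electronic", "dance pop", "french touch"]);
const fromBeatport = classify(["Electronic > House > French House"]);
```

Under these toy rules both calls yield genre `["House"]` and subgenre `["French House"]`, which is the property the canonical hierarchy is meant to guarantee.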

Build
1. Fetch the canonical hierarchy

GET /v1/genres returns the full canonical tree — every main genre and every subgenre, with the parent main listed by name.

GET /v1/genres (200 OK)
{
  "genres": [
    { "id": 1,  "name": "Electronic", "type": "main",     "parent": null },
    { "id": 2,  "name": "Ambient",    "type": "subgenre", "parent": "Electronic" },
    { "id": 12, "name": "House",      "type": "main",     "parent": null },
    { "id": 47, "name": "French House", "type": "subgenre", "parent": "House" },
    { "id": 48, "name": "Tech House",  "type": "subgenre", "parent": "House" },
    /* … */
  ]
}

You don't need to fetch this on every request — it's stable for weeks at a time. Cache it locally and rebuild your hierarchy from the API once a day:

TypeScript: build-hierarchy.ts
const API_KEY = process.env.SONOVAULT_API_KEY!;
const BASE    = "https://api.sonovault.now/v1";

interface Genre      { id: number; name: string; type: "main" | "subgenre"; parent: string | null; }
interface Hierarchy  { mains: string[]; subsByMain: Record<string, string[]>; }

async function buildHierarchy(): Promise<Hierarchy> {
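  const res = await fetch(`${BASE}/genres`, {
    headers: { "x-api-key": API_KEY },
  });
  // Fail loudly on a non-2xx response instead of trying to parse an error body.
  if (!res.ok) throw new Error(`GET /genres failed with status ${res.status}`);
  const { genres }: { genres: Genre[] } = await res.json();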

  const mains      = genres.filter(g => g.type === "main").map(g => g.name);
  const subsByMain: Record<string, string[]> = {};

  for (const g of genres) {
    if (g.type === "subgenre" && g.parent) {
      (subsByMain[g.parent] ??= []).push(g.name);
    }
  }
  return { mains, subsByMain };
}

const tree = await buildHierarchy();
console.log(`${tree.mains.length} main genres, ${
  Object.values(tree.subsByMain).flat().length
} subgenres`);
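
The once-a-day refresh mentioned above could be handled with a small in-memory cache. This is a sketch under assumptions: `cachedFor` is a hypothetical helper, not part of any SonoVault SDK, and the 24-hour TTL simply mirrors the "once a day" suggestion. It caches the pending promise so concurrent callers share one request, and it drops a failed load so an error is not served for the whole TTL.

```typescript
// Hypothetical helper: wrap any async loader in an in-memory TTL cache.
// `now` is injectable so the expiry logic can be tested without waiting.
function cachedFor<T>(
  loader: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<T> {
  let cached: Promise<T> | null = null;
  let fetchedAt = 0;

  return () => {
    // Serve the cached promise while it is still fresh.
    if (cached && now() - fetchedAt < ttlMs) return cached;

    fetchedAt = now();
    const attempt = loader();
    cached = attempt;
    // If the load fails, forget it so the next call retries immediately.
    attempt.catch(() => { if (cached === attempt) cached = null; });
    return attempt;
  };
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;
```

With the `buildHierarchy` function from the listing above, usage would look like `const getHierarchy = cachedFor(buildHierarchy, ONE_DAY_MS);` followed by `await getHierarchy()` wherever the tree is needed. A long-running service might also persist the result to disk, which this sketch does not attempt.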

2. Read canonical genres straight off the track

Every track-returning endpoint already includes genre and subgenre as arrays of canonical names. You don't need to do the mapping client-side because it's done at ingest. The same track reached via different ingest sources comes out identical:

TypeScript: track-genres.ts
// Same track via two different sources — same canonical taxonomy.

// Spotify-flavoured ISRC:
{
  "id": 123,
  "title": "One More Time",
  "genre":     ["House"],
  "subgenre":  ["French House"]
}

// Beatport-flavoured ISRC:
{
  "id": 123,
  "title": "One More Time",
  "genre":     ["House"],
  "subgenre":  ["French House"]
}
💡 genre and subgenre are arrays because a track can legitimately belong to more than one bucket: a track can be both House and Disco, or Hip-Hop and R&B. Render both, or pick the first if your UI only has room for one.
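
The "render both, or pick the first" choice can be sketched as a small display helper. The `genreLabel` function, the `" / "` separator, and the `"Unclassified"` fallback are all illustrative choices for this sketch, not something the API prescribes.

```typescript
// Minimal shape of the fields this helper reads from a track payload.
interface TrackGenres { genre?: string[]; subgenre?: string[]; }

// Hypothetical helper: build a display label from the canonical main genres.
// maxLabels = 1 picks the first genre; a larger value renders several.
function genreLabel(track: TrackGenres, maxLabels = 2): string {
  const mains = (track.genre ?? []).slice(0, Math.max(1, maxLabels));
  // "Unclassified" is an arbitrary fallback chosen for this sketch.
  return mains.length > 0 ? mains.join(" / ") : "Unclassified";
}

const wide   = genreLabel({ genre: ["House", "Disco"] });    // room for both
const narrow = genreLabel({ genre: ["House", "Disco"] }, 1); // room for one
```

Here `wide` is `"House / Disco"` and `narrow` is `"House"`.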

3. Bulk-normalise an existing library

For a library you already have, resolve every row by ISRC and replace the source-specific genre string with the canonical arrays. This uses the same batching pattern as the cross-platform backfill, with 100 ISRCs per request:

TypeScript: normalise-library.ts
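// Map a mixed-source library to the canonical taxonomy.
// Input rows already have ISRC (see the cross-platform backfill article).
// API_KEY and BASE are the constants defined in build-hierarchy.ts;
// rows is your existing library as a LibraryRow[], loaded however you store it.

interface LibraryRow { id: string; isrc: string; source_genre: string; }
interface NormalisedRow { id: string; main: string[]; sub: string[]; }

async function resolveBatch(isrcs: string[]) {
  const res = await fetch(`${BASE}/tracks/resolve`, {
    method:  "POST",
    headers: { "x-api-key": API_KEY, "content-type": "application/json" },
    body:    JSON.stringify({ input_type: "isrc", items: isrcs }),
  });
  // Stop on a non-2xx response rather than treating an error body as results.
  if (!res.ok) throw new Error(`POST /tracks/resolve failed with status ${res.status}`);
  return (await res.json()).results;
}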

const out: NormalisedRow[] = [];
const BATCH = 100;

for (let i = 0; i < rows.length; i += BATCH) {
  const chunk   = rows.slice(i, i + BATCH);
  const results = await resolveBatch(chunk.map(r => r.isrc));

  chunk.forEach((row, j) => {
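    // Relies on results coming back in the same order as the submitted ISRCs.
    const track = results[j]?.track;
    // Skip rows the API could not resolve.
    if (!track) return;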
    out.push({
      id:   row.id,
      main: track.genre    ?? [],
      sub:  track.subgenre ?? [],
    });
  });
}

console.log(`Normalised ${out.length} / ${rows.length} rows`);

After this runs, every row in out uses the same vocabulary regardless of where it originally came from. You can now filter, group, and chart by genre across the whole library.
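
As an illustration of the filtering and grouping this enables, the sketch below counts rows per main genre and ranks the result. `countByMain` and `topGenres` are hypothetical helper names, and the three sample rows are made up for the example. A row with two main genres is counted once in each bucket.

```typescript
// Same shape as NormalisedRow in normalise-library.ts, repeated so the
// sketch is self-contained.
interface NormalisedRow { id: string; main: string[]; sub: string[]; }

// Count how many rows carry each canonical main genre.
function countByMain(rows: NormalisedRow[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    for (const g of row.main) counts.set(g, (counts.get(g) ?? 0) + 1);
  }
  return counts;
}

// Rank genres by row count, breaking ties alphabetically for stable output.
function topGenres(rows: NormalisedRow[]): [string, number][] {
  return Array.from(countByMain(rows).entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

// Made-up sample data standing in for the `out` array.
const sampleRows: NormalisedRow[] = [
  { id: "a", main: ["House"],          sub: ["French House"] },
  { id: "b", main: ["House", "Disco"], sub: [] },
  { id: "c", main: ["Techno"],         sub: [] },
];

const ranked    = topGenres(sampleRows);
const houseOnly = sampleRows.filter(r => r.main.includes("House"));
```

For the sample rows, `ranked` is `[["House", 2], ["Disco", 1], ["Techno", 1]]` and `houseOnly` contains rows `a` and `b`.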

Going further

  • Filter browse by canonical id. /v1/tracks/browse?genreId=12 (House) returns tracks tagged in the canonical hierarchy, which is much more reliable than free-text genre matching against your library's pre-normalised state.
  • Suggest corrections. POST /v1/tracks/:id/suggestions lets paid-tier users propose a better genre when the classifier got something wrong. Approved suggestions become the top of the genre priority chain.
  • Combine with dedup. Library dedup collapses duplicate rows; normalisation then gives the survivors consistent genre labels. Run dedup first, normalise second.
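
For the browse filter in the first bullet, one way to get from a genre name to a request URL is to look the id up in the cached /v1/genres list. This is a sketch only: `browseUrlForGenre` is a hypothetical helper, the lookup is limited to main genres because the article only shows a main-genre id being passed as `genreId`, and the shape of the browse response is not covered here.

```typescript
// Same Genre shape as in build-hierarchy.ts, repeated for self-containment.
interface Genre { id: number; name: string; type: "main" | "subgenre"; parent: string | null; }

const BROWSE_BASE = "https://api.sonovault.now/v1";

// Hypothetical helper: resolve a main genre name to its browse URL.
// Returns null when the name is not a known main genre.
function browseUrlForGenre(genres: Genre[], name: string): string | null {
  const wanted = name.trim().toLowerCase();
  const match  = genres.find(g => g.type === "main" && g.name.toLowerCase() === wanted);
  if (!match) return null;
  return `${BROWSE_BASE}/tracks/browse?genreId=${encodeURIComponent(String(match.id))}`;
}

// Entries copied from the /v1/genres example earlier in the article.
const knownGenres: Genre[] = [
  { id: 1,  name: "Electronic",   type: "main",     parent: null },
  { id: 12, name: "House",        type: "main",     parent: null },
  { id: 47, name: "French House", type: "subgenre", parent: "House" },
];

const houseUrl = browseUrlForGenre(knownGenres, "house");
```

Here `houseUrl` is `https://api.sonovault.now/v1/tracks/browse?genreId=12`. The request itself would carry the same `x-api-key` header as the other calls.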

Frequently asked questions

Why do I need to normalise genres at all? Can't I just use the source string?

If your library only comes from one source, sure. The moment a second source shows up the genres stop comparing — Spotify says “electronic”; Beatport says “Tech House”; Discogs says “Electronic / Dance / Tech House / Techno”. None of those are wrong, but you can't filter, group, or display them consistently. Mapping to one canonical hierarchy fixes that.

What is the SonoVault canonical hierarchy?

There are 25 main genres at level 0 (Electronic, House, Techno, Pop, Rock, etc.), with subgenres nested under each (Ambient under Electronic, French House under House). Max depth is 3. The full list is at /v1/genres. Track responses already carry canonical names in their genre and subgenre arrays, so you don't need to map them yourself.

Why are House, Techno, and DnB top-level instead of under Electronic?

This reflects how Beatport and DJ-tooling users actually think about genre. “Electronic” is the umbrella for non-dance-floor styles (Ambient, Downtempo, IDM, Synthwave); the dance genres (House, Techno, Trance, DnB, Dubstep, Breakbeat, Garage, Hardstyle, Hardcore) are top-level on their own.

What happens to genres that don't map cleanly?

SonoVault classifies them under the closest parent based on ~500 regex patterns. Anything that genuinely doesn't fit goes into an “other” bucket internally, but those are never returned as a canonical genre — track responses only carry classified genres so your library stays consistent.
