Jensen–Shannon Divergence

(en.wikipedia.org)

58 points | by teleforce 3 days ago

4 comments

vi_sextus_vi 10 minutes ago
Interesting application of JSD: surfacing hidden interestingness
https://www.mdpi.com/2076-3417/15/19/10395
Collective Dynamics in the Awakening of Sleeping Beauty Patents: A BERTopic Approach
(Found it while searching for JSD and counterfactuality)
wilted-iris 2 hours ago
This looks interesting and I'm curious if anyone has more context for why it's on the frontpage today.
- acjohnson55 2 hours ago
  Every now and then, a random math or science concept hits the front page. Usually, people chime in with interesting perspectives on it. Guess we'll see.
  - raddan 1 hour ago
    I’d like to know what the advantage is over KL divergence. It seems like the important idea is symmetry? It's not clear to me why that matters; I’d love to know what applications this is used for.
    - fumeux_fume 1 hour ago
      There are many applications. I mainly see it used for detecting drift in datasets for ML models. It has a nice benefit over the KL divergence when the two distributions you're measuring have no overlap: KL blows up to infinity, but JS stays finite and just returns its maximum (ln 2, or 1 bit with base-2 logs). Also, taking its square root gives you a true distance metric (symmetric, and it satisfies the triangle inequality) rather than a divergence, which lets you compare it against JSD measurements of other distributions.
    - andy99 1 hour ago
      IIRC (and I could be wrong, this is from memory), JS divergence is what gets minimized in GANs (where we simultaneously train a generator and a real/synthetic classifier, each trying to beat the other, so that the generator converges on real-looking synthetic data), at least for some training methods.
      I don’t think GANs are used much now compared to diffusion models, but as recently as a few years ago they were the standard way to make fake data, a la “This Person Does Not Exist”.
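      For reference, a sketch of the result this most likely refers to, from the original GAN analysis (Goodfellow et al., 2014). It assumes the discriminator is trained to optimality for a fixed generator; C(G), V(G, D), p_data, and p_g are that paper's notation, not anything defined in this thread:

      ```latex
      C(G) = \max_D V(G, D) = -\log 4 + 2\,\mathrm{JSD}\left(p_{\text{data}} \,\|\, p_g\right)
      ```

      Since JSD is nonnegative and zero only when the two distributions match, the minimum of -log 4 is reached exactly when p_g = p_data. In practice the discriminator is never fully optimal, so this is an idealized picture.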
mountainriver 28 minutes ago
Why not use this instead of KL in reinforcement learning?
lasermatts 1 hour ago
The Hacker News hive mind is real!
I was just reading about JSD the other day after reading about KL divergence. It seems like a nifty measurement device for things like sim-to-real evaluations in robotics (the reason I was going down this rabbit hole).
I think the appeal over raw KL is that JSD behaves a bit more nicely when the simulated and real distributions don't perfectly overlap, which is basically always true in the real world!
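A minimal sketch of the no-overlap point raised in the thread, using only the Python standard library. The function names (`kl_divergence`, `js_divergence`, `js_distance`) and the toy distributions `p` and `q` are made up for illustration, and natural logs are assumed, so the JSD ceiling is ln 2:

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; infinite when p has mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # by convention, 0 * log(0 / q) contributes 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def js_divergence(p, q):
    """JSD(p, q) in nats: mean KL from p and q to their midpoint mixture."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_distance(p, q):
    """Square root of the JSD, which is a proper metric."""
    return math.sqrt(js_divergence(p, q))


# Two toy distributions over four bins with completely disjoint support.
p = [0.5, 0.5, 0.0, 0.0]
q = [0.0, 0.0, 0.5, 0.5]

print("KL(p || q):", kl_divergence(p, q))   # infinite
print("JSD(p, q): ", js_divergence(p, q))   # the maximum, ln 2 (about 0.693)
print("JSD(p, p): ", js_divergence(p, p))   # zero for identical inputs
print("JS distance:", js_distance(p, q))
```

The mixture `m` is nonzero wherever `p` or `q` is, which is why the two inner KL terms never hit the infinite branch. If SciPy is available, `scipy.spatial.distance.jensenshannon` is probably the better choice in real code; note that it returns the distance (the square root), not the divergence.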