Difference in 2D KDE produced using kde2d (R) and ksdensity2d (Matlab) -
while trying port code matlab r have run problem. gist of code produce 2d kernel density estimate , simple calculations using estimate. in matlab kde calculation done using function ksdensity2d.m. in r kde calculation done kde2d mass package. lets want calculate kde , add values (this not intend do, serves purpose). in r can done by
library(mass) set.seed(1009) x <- sample(seq(1000, 2000), 100, replace=true) y <- sample(seq(-12, 12), 100, replace=true) kk <- kde2d(x, y, h=c(30, 1.5), n=100, lims=c(1000, 2000, -12, 12)) sum(kk$z) which gives answer 0.3932732. when using ksdensity2d in matlab using same exact data , conditions answer 0.3768. looking @ code kde2d noticed bandwidth divided 4
kde2d <- function (x, y, h, n = 25, lims = c(range(x), range(y))) { nx <- length(x) if (length(y) != nx) stop("data vectors must same length") if (any(!is.finite(x)) || any(!is.finite(y))) stop("missing or infinite values in data not allowed") if (any(!is.finite(lims))) stop("only finite values allowed in 'lims'") n <- rep(n, length.out = 2l) gx <- seq.int(lims[1l], lims[2l], length.out = n[1l]) gy <- seq.int(lims[3l], lims[4l], length.out = n[2l]) h <- if (missing(h)) c(bandwidth.nrd(x), bandwidth.nrd(y)) else rep(h, length.out = 2l) if (any(h <= 0)) stop("bandwidths must strictly positive") h <- h/4 ax <- outer(gx, x, "-")/h[1l] ay <- outer(gy, y, "-")/h[2l] z <- tcrossprod(matrix(dnorm(ax), , nx), matrix(dnorm(ay), , nx))/(nx * h[1l] * h[2l]) list(x = gx, y = gy, z = z) } a simple check see if difference in bandwidth reason difference in results then
kk <- kde2d(x, y, h=c(30, 1.5)*4, n=100, lims=c(1000, 2000, -12, 12)) sum(kk$z) which gives 0.3768013 (which same matlab answer).
so question then: why kde2d divide bandwidth four? (or why doesn't ksdensity2d?)
at mirrored github source, lines 31-35:
if (any(h <= 0)) stop("bandwidths must strictly positive") h <- h/4 # s's bandwidth scale ax <- outer(gx, x, "-" )/h[1l] ay <- outer(gy, y, "-" )/h[2l] and file kde2d(), suggests looking @ file bandwidth. says:
...which scaled width argument of density , give answers 4 times large.
but why?
density() says width argument exists sake of compatibility s (the precursor r). comments in source density() read:
## s has width equal length of support of kernel ## except gaussian 4 * sd. ## r has bw multiple of sd. the default gaussian one. when bw argument unspecified , width is, width substituted in, eg.
library(mass) set.seed(1) x <- rnorm(1000, 10, 2) all.equal(density(x, bw = 1), density(x, width = 4)) # call different however, because kde2d() apparently written remain compatible s (and suppose written s, given it's in mass), ends divided four. after flipping relevant section of mass book (around p.126), seems may have picked 4 strike balance between smoothness , fidelity of data.
in conclusion, guess kde2d() divides 4 remain consistent rest of mass (and other things written s), , way you're going things looks fine.
Comments
Post a Comment