Difference in 2D KDE produced using kde2d (R) and ksdensity2d (Matlab) -


while trying port code matlab r have run problem. gist of code produce 2d kernel density estimate , simple calculations using estimate. in matlab kde calculation done using function ksdensity2d.m. in r kde calculation done kde2d mass package. lets want calculate kde , add values (this not intend do, serves purpose). in r can done by

    library(mass)     set.seed(1009)     x <- sample(seq(1000, 2000), 100, replace=true)     y <- sample(seq(-12, 12), 100, replace=true)     kk <- kde2d(x, y, h=c(30, 1.5), n=100, lims=c(1000, 2000, -12, 12))     sum(kk$z) 

which gives answer 0.3932732. when using ksdensity2d in matlab using same exact data , conditions answer 0.3768. looking @ code kde2d noticed bandwidth divided 4

    kde2d <- function (x, y, h, n = 25, lims = c(range(x), range(y)))      {     nx <- length(x)     if (length(y) != nx)       stop("data vectors must same length")     if (any(!is.finite(x)) || any(!is.finite(y)))       stop("missing or infinite values in data not allowed")     if (any(!is.finite(lims)))       stop("only finite values allowed in 'lims'")     n <- rep(n, length.out = 2l)     gx <- seq.int(lims[1l], lims[2l], length.out = n[1l])     gy <- seq.int(lims[3l], lims[4l], length.out = n[2l])     h <- if (missing(h))      c(bandwidth.nrd(x), bandwidth.nrd(y))     else rep(h, length.out = 2l)     if (any(h <= 0))       stop("bandwidths must strictly positive")     h <- h/4     ax <- outer(gx, x, "-")/h[1l]     ay <- outer(gy, y, "-")/h[2l]     z <- tcrossprod(matrix(dnorm(ax), , nx), matrix(dnorm(ay),       , nx))/(nx * h[1l] * h[2l])     list(x = gx, y = gy, z = z)     } 

a simple check see if difference in bandwidth reason difference in results then

    kk <- kde2d(x, y, h=c(30, 1.5)*4, n=100, lims=c(1000, 2000, -12, 12))     sum(kk$z) 

which gives 0.3768013 (which same matlab answer).

so question then: why kde2d divide bandwidth four? (or why doesn't ksdensity2d?)

at mirrored github source, lines 31-35:

if (any(h <= 0))     stop("bandwidths must strictly positive") h <- h/4                            # s's bandwidth scale ax <- outer(gx, x, "-" )/h[1l] ay <- outer(gy, y, "-" )/h[2l] 

and file kde2d(), suggests looking @ file bandwidth. says:

...which scaled width argument of density , give answers 4 times large.

but why?

density() says width argument exists sake of compatibility s (the precursor r). comments in source density() read:

## s has width equal length of support of kernel ## except gaussian 4 * sd. ## r has bw multiple of sd. 

the default gaussian one. when bw argument unspecified , width is, width substituted in, eg.

library(mass)  set.seed(1) x <- rnorm(1000, 10, 2) all.equal(density(x, bw = 1), density(x, width = 4)) # call different 

however, because kde2d() apparently written remain compatible s (and suppose written s, given it's in mass), ends divided four. after flipping relevant section of mass book (around p.126), seems may have picked 4 strike balance between smoothness , fidelity of data.

in conclusion, guess kde2d() divides 4 remain consistent rest of mass (and other things written s), , way you're going things looks fine.


Comments

Popular posts from this blog

javascript - Bootstrap Popover: iOS Safari strange behaviour -

Magento/PHP - Get phones on all members in a customer group -

session - Logging Out Using PHP -