Description
This tool calculates kernel density estimates based on a set of input points. It implements two types of kernel: quartic and Gaussian (bivariate normal). The quartic kernel is an approximation to the Gaussian kernel that is used because it is computationally simpler and faster. However, for most scientific applications there is little justification for using the quartic kernel over the Gaussian kernel. The Gaussian kernel is the default in this tool; the quartic kernel is included so that users can make comparisons with software packages such as ArcGIS Spatial Analyst, which calculate only the quartic kernel.
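As a rough illustration of the difference, the sketch below writes both kernels as functions of the distance d from a single point. It assumes an isotropic bivariate normal with variance h^2 on each axis, and the standard 2D quartic (biweight) form 3/(pi r^2) * (1 - (d/r)^2)^2 inside radius r; that this tool uses exactly these forms is an assumption, and the function names are hypothetical. The Gaussian is positive at every distance, while the quartic is exactly zero beyond r and needs no exponential, so only points within r of a cell have to be visited.

```python
import math

def gaussian_kernel(d, h):
    """Isotropic bivariate normal density at distance d, variance h**2 on each axis."""
    return math.exp(-0.5 * (d / h) ** 2) / (2.0 * math.pi * h * h)

def quartic_kernel(d, r):
    """2D quartic (biweight) kernel: 3/(pi r^2) * (1 - (d/r)^2)^2 inside r, else 0."""
    if d >= r:
        return 0.0
    return 3.0 / (math.pi * r * r) * (1.0 - (d / r) ** 2) ** 2

# Compare the two kernels at increasing distances (h = r = 100 map units).
for d in (0, 50, 100, 150, 300):
    print(d, gaussian_kernel(d, 100), quartic_kernel(d, 100))
```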
The bandwidth you provide depends on the type of kernel used in the calculation. If the kernel is bivariate normal, the bandwidth is the covariance matrix of a bivariate normal distribution. Although this is a 2x2 matrix, you need only provide three parameters because the two off-diagonal entries representing the covariance between x and y are identical. The three parameters needed are thus: the variance of x, the variance of y, and the covariance. Note that some software packages require you to provide a bandwidth parameter, h (a standard deviation, in map units), while others require h^2 (a variance). Although h is usually the smaller number and therefore easier to work with, h^2 is the correct representation for the covariance matrix: a standard deviation of 100 map units corresponds to a variance of 10000. It is important to be aware of how the bandwidth is represented when comparing the output from different software packages.
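The sketch below shows how the three parameters fill the symmetric 2x2 matrix and how a standard deviation h converts to the h^2 form. The helper names are hypothetical, and the density function is the textbook bivariate normal rather than code taken from this tool. With a positive covariance the kernel is stretched along the x = y diagonal, so an offset of (50, 50) receives more weight than an offset of (50, -50).

```python
import math

def bandwidth_matrix(var_x, var_y, cov):
    """Symmetric 2x2 covariance matrix built from the three bandwidth parameters."""
    return [[var_x, cov],
            [cov, var_y]]

def from_std_devs(h_x, h_y, rho=0.0):
    """Convert per-axis standard deviations (h) and a correlation to (h_x^2, h_y^2, cov)."""
    return (h_x * h_x, h_y * h_y, rho * h_x * h_y)

def bivariate_normal(dx, dy, var_x, var_y, cov):
    """Bivariate normal density at offset (dx, dy) from a point."""
    det = var_x * var_y - cov * cov
    if var_x <= 0 or det <= 0:
        raise ValueError("bandwidth must be a positive-definite covariance matrix")
    # Quadratic form [dx dy] S^-1 [dx dy]^T using the closed-form 2x2 inverse.
    q = (var_y * dx * dx - 2.0 * cov * dx * dy + var_x * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

params = from_std_devs(100, 100)  # h = 100 map units on each axis
print(params, bandwidth_matrix(*params))
print(bivariate_normal(50, 50, 10000, 10000, 5000) >
      bivariate_normal(50, -50, 10000, 10000, 5000))
```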
If the kernel is the quartic approximation to the bivariate normal distribution, then you specify only a single value: the radius beyond which the kernel's density estimate is 0. The quartic kernel bandwidth parameter therefore corresponds to a real distance on the ground, unlike the bandwidth for the bivariate normal kernel, which is a covariance matrix. As a result, the two bandwidths do not map directly onto each other. You cannot, for instance, estimate the optimal bandwidth using a bivariate normal kernel algorithm (such as least-squares cross-validation) and then use it in a quartic kernel calculation: the optimal bandwidth for the quartic kernel will be very different.
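One way to see why the two bandwidths do not map onto each other is to compare the spread of the kernels. Assuming the standard 2D quartic kernel, the sketch below numerically computes its per-axis standard deviation for radius r; analytically this is r / sqrt(8), about 0.354 r. A quartic radius of 100 therefore has roughly the spread of a bivariate normal kernel with standard deviation 35 (variance about 1250), not 100. Matching variances is only one simple heuristic for comparing kernels; it is not a way to transfer an optimal bandwidth. The function names are hypothetical.

```python
import math

def quartic_kernel(d, r):
    """2D quartic kernel, nonzero only for d < r."""
    if d >= r:
        return 0.0
    return 3.0 / (math.pi * r * r) * (1.0 - (d / r) ** 2) ** 2

def quartic_axis_sd(r, steps=100_000):
    """Numerically estimate the per-axis standard deviation of a quartic kernel."""
    dr = r / steps
    mass = 0.0
    second_moment = 0.0
    for i in range(steps):
        d = (i + 0.5) * dr                        # midpoint of the i-th thin ring
        ring = quartic_kernel(d, r) * 2.0 * math.pi * d * dr
        mass += ring
        second_moment += d * d * ring             # E[d^2] = E[x^2] + E[y^2]
    return math.sqrt(second_moment / mass / 2.0)  # split equally between x and y

r = 100.0
sd = quartic_axis_sd(r)
print(f"quartic radius {r} ~ Gaussian sd {sd:.2f} (variance {sd * sd:.0f})")
```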
It takes some experience to learn what suitable cell size values are. A cell size that is too large will result in a 'blocky' output raster that is a poor statistical approximation to a continuous surface. A cell size that is too small will result in a very large output raster that takes a long time to calculate. I suggest the following rule of thumb to calculate a reasonable cell size. In the case of a bivariate normal kernel, take the square root of the x or y variance (whichever is smaller) and divide by 10. For a quartic kernel, divide the single bandwidth parameter by 10. (I usually round to a convenient round number, so 36.7 becomes 40.) Before using this rule-of-thumb value, calculate how many cells it will produce in the output: take the width and height of your input points, divide each by the cell size, and multiply the resulting numbers together. If you get a value somewhere between 1 and 50 million, then you have a reasonable value. If the value is much larger than 50 million cells, consider increasing the cell size.
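The rule of thumb and the cell-count check can be sketched as follows. The helper names are hypothetical, rounding to one significant figure is just one reading of "a convenient round number", and partial cells at the edge are counted as whole cells.

```python
import math

def rule_of_thumb_cellsize(bandwidth, kernel="GAUSSIAN"):
    """Starting cell size: sqrt(smaller variance)/10 (Gaussian) or radius/10 (quartic)."""
    if kernel == "GAUSSIAN":
        var_x, var_y, _cov = bandwidth
        return math.sqrt(min(var_x, var_y)) / 10.0
    return float(bandwidth) / 10.0

def round_to_one_sig_fig(x):
    """Hypothetical rounding helper, e.g. 36.7 -> 40."""
    magnitude = 10 ** math.floor(math.log10(x))
    return round(x / magnitude) * magnitude

def cell_count(xmin, xmax, ymin, ymax, cellsize):
    """Number of output cells: columns times rows."""
    cols = math.ceil((xmax - xmin) / cellsize)
    rows = math.ceil((ymax - ymin) / cellsize)
    return cols * rows

# Gaussian bandwidth c(10000,10000,0) over points spanning 60 km x 40 km.
cs = round_to_one_sig_fig(rule_of_thumb_cellsize((10000, 10000, 0)))
n = cell_count(0, 60_000, 0, 40_000, cs)
print(cs, n, "ok" if 1_000_000 <= n <= 50_000_000 else "reconsider cell size")
```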
A scaling factor is often used in KDE calculations to prevent a loss of precision in density values. Point density values are often very small numbers, and some raster formats do not support double-precision values (of the output formats this tool supports, the ERDAS Imagine .img format is the only one that does, and for that reason I recommend it as the format for the output raster). The scaling factor is simply a value by which the point density values are multiplied to make them larger. The default is 1000000. Again, scaling factors may vary between software packages, and this must be considered when making comparisons.
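Because the scaling factor is a plain multiplier, putting outputs from different packages on the same footing is simple arithmetic. The sketch below assumes that the unscaled values are densities per square map unit; the helper names and the example density are hypothetical.

```python
DEFAULT_SCALING = 1_000_000  # this tool's documented default scalingfactor

def unscale(value, scalingfactor=DEFAULT_SCALING):
    """Recover the raw density from a scaled output value."""
    return value / scalingfactor

def rescale(value, from_factor, to_factor):
    """Express a value scaled by from_factor as if it had been scaled by to_factor."""
    return value / from_factor * to_factor

raw = 2.5e-9                    # hypothetical raw density per square map unit
stored = raw * DEFAULT_SCALING  # value the output raster would hold
print(stored, unscale(stored), rescale(stored, DEFAULT_SCALING, 1_000))
```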
By default, the output extent is calculated automatically as the extent of the input point dataset plus a suitable buffer distance that ensures the density surface is not unduly truncated at the edges. However, you may override this extent using the 'ext' option, which requires that you supply the minimum x, maximum x, minimum y, and maximum y values of the desired extent, in that order.
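The buffer used for the automatic extent is not stated here, so the sketch below assumes one: three standard deviations (square root of the larger variance) for the Gaussian kernel, where a single point's kernel has fallen to about 1.1% of its peak (exp(-4.5)), and the radius for the quartic kernel, where it reaches zero. It returns the extent in the (xmin, xmax, ymin, ymax) order that the ext option expects, which differs from the (xmin, ymin, xmax, ymax) order used by some other software. The function name auto_extent is hypothetical.

```python
import math

def auto_extent(points, bandwidth, kernel="GAUSSIAN", n_sd=3.0):
    """Point extent plus a buffer, returned as (xmin, xmax, ymin, ymax).

    ASSUMPTION: buffer = n_sd standard deviations (Gaussian) or the radius
    (quartic); this is not necessarily the buffer the tool itself uses.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if kernel == "GAUSSIAN":
        var_x, var_y, _cov = bandwidth
        buf = n_sd * math.sqrt(max(var_x, var_y))
    elif kernel == "QUARTIC":
        buf = float(bandwidth)
    else:
        raise ValueError("kernel must be GAUSSIAN or QUARTIC")
    # x bounds first, then y bounds, matching the ext parameter order.
    return (min(xs) - buf, max(xs) + buf, min(ys) - buf, max(ys) + buf)

print(auto_extent([(0, 0), (1000, 500)], (10000, 10000, 0)))
```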
Syntax
kde(in, out, bandwidth, cellsize, [scalingfactor], [kernel], [ext], [where]);
| in | the input point data source |
| out | the output raster data source |
| bandwidth | the bandwidth (see the help documentation for details) |
| cellsize | the cell size dimension of the output raster |
| [scalingfactor] | multiplies densities by this value; see help for details (default=1000000) |
| [kernel] | kernel type: GAUSSIAN or QUARTIC (default=GAUSSIAN) |
| [ext] | the extent (xmin, xmax, ymin, ymax) of the output raster (default=determine automatically) |
| [where] | the selection statement that will be applied to the feature data source to identify a subset of features to process (see full Help documentation for further details) |
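To make the parameters concrete, here is a minimal, slow, pure-Python sketch of the kind of grid a call with these arguments produces. It is not the tool's implementation. It assumes that the density at each cell centre is the average of the kernel contributions from all points, that the Gaussian bandwidth is (var_x, var_y, cov) and the quartic bandwidth is a radius, and that rows run from ymax downwards. The function name kde_grid is hypothetical; it requires an explicit ext and does not implement the where option.

```python
import math

def kde_grid(points, bandwidth, cellsize, scalingfactor=1_000_000,
             kernel="GAUSSIAN", ext=None):
    """Return rows (top to bottom) of scaled density values at cell centres."""
    if ext is None:
        raise ValueError("this sketch needs an explicit ext=(xmin, xmax, ymin, ymax)")
    if not points:
        raise ValueError("at least one input point is required")
    xmin, xmax, ymin, ymax = ext
    ncols = math.ceil((xmax - xmin) / cellsize)
    nrows = math.ceil((ymax - ymin) / cellsize)

    if kernel == "GAUSSIAN":
        var_x, var_y, cov = bandwidth
        det = var_x * var_y - cov * cov
        if var_x <= 0 or det <= 0:
            raise ValueError("bandwidth must be a positive-definite covariance matrix")
        norm = 1.0 / (2.0 * math.pi * math.sqrt(det))

        def k(dx, dy):
            # Bivariate normal density using the closed-form 2x2 inverse.
            q = (var_y * dx * dx - 2.0 * cov * dx * dy + var_x * dy * dy) / det
            return norm * math.exp(-0.5 * q)
    elif kernel == "QUARTIC":
        r = float(bandwidth)
        if r <= 0:
            raise ValueError("quartic bandwidth (radius) must be positive")

        def k(dx, dy):
            # Quartic kernel: zero at and beyond the radius r.
            u2 = (dx * dx + dy * dy) / (r * r)
            return 3.0 / (math.pi * r * r) * (1.0 - u2) ** 2 if u2 < 1.0 else 0.0
    else:
        raise ValueError("kernel must be GAUSSIAN or QUARTIC")

    rows = []
    for i in range(nrows):
        y = ymax - (i + 0.5) * cellsize
        row = []
        for j in range(ncols):
            x = xmin + (j + 0.5) * cellsize
            # Average the kernel contributions of all points, then apply the scaling.
            dens = sum(k(x - px, y - py) for px, py in points) / len(points)
            row.append(dens * scalingfactor)
        rows.append(row)
    return rows
```

Multiplying the sum of all cells by the cell area and dividing by scalingfactor should give a value close to 1 when the extent covers essentially all of the kernel mass, which is a quick sanity check on a chosen cell size and extent.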
Example
kde(in="C:\data\locs.shp", out="C:\data\kdeloc1.img", bandwidth=c(10000,10000,0), cellsize=20);
kde(in="C:\data\locs.shp", out="C:\data\kdeloc2", bandwidth=10000, cellsize=20);
kde(in="C:\data\locs.shp", out="C:\data\kdeloc3.tif", bandwidth=500, cellsize=50, kernel="QUARTIC");