nn {RANN}R Documentation

Near Neighbour Search

Description

Uses a kd-tree to find the p number of near neighbours for each point in an input/output dataset. The advantage of the kd-tree is that it runs in O(M log M) time.

Usage

nn(data, mask=rep.int(1, times=ncol(data)-1), p=min(10,nrow(data)))

nn2(data, query, k=min(10,nrow(data)),treetype=c("kd","bd"),
	searchtype=c("standard","priority","radius"),radius=0.0,eps=0.0)

Arguments

data

An input-output dataset. FOR nn THE OUTPUT MUST BE IN THE RIGHT MOST COLUMN OF A DATA FRAME OR MATRIX. nn2 uses ALL columns.

query

nn2: A set of points that will be queried against data - must have same number of columns.

mask

nn: A vector of 1's and 0's representing input inclusion/exclusion. The default mask is all 1's (i.e. include all inputs in the test).

p

nn:The maximum number of near neighbours to compute. The default value is set to 10.

k

nn2:The maximum number of near neighbours to compute. The default value is set to 10.

treetype

nn2: Either the standard kd tree or a bd (box-decomposition, AMNSW98) tree which may perform better for larger point sets

searchtype

nn2: See details

radius

nn2: radius of search for searchtype='radius'

eps

nn2: error bound: default of 0.0 implies exact nearest neighbour search

Details

The algorithm itself works by calculating the nearest neighbour distances in input space. This calculation is achieved in O(N log N) time, where N is the number of data points using Bentley's kd-tree. The RANN package utilizes the Approximate Near Neighbor (ANN) C++ library, which can give the exact near neighbours or (as the name suggests) approximate near neighbours to within a specified error bound. We use EXACT near neighbours in nn but nn2 provides options for approximate searches. For more information on the ANN library please visit http://www.cs.umd.edu/~mount/ANN/.

Search types: priority visits cells in increasing order of distance from the query point, and hence, should converge more rapidly on the true nearest neighbour, but standard is usually faster for exact searches. radius only searches for neighbours within a specified radius of the point. If there are no neighbours then nn.idx will contain 0 and nn.dists will contain 1.340781e+154 for that point.

Value

nn.idx

A MxP data.frame returning the near neighbour indexes.

nn.dists

A MxP data.frame returning the near neighbour Euclidean distances.

Author(s)

Samuel E. Kemp. To report any bugs or suggestions please email: sekemp@glam.ac.uk Gregory Jefferis. jefferis@gmail.com

References

Bentley J. L. (1975), Multidimensional binary search trees used for associative search. Communication ACM, 18:309-517.

Arya S. and Mount D. M. (1993), Approximate nearest neighbor searching, Proc. 4th Ann. ACM-SIAM Symposium on Discrete Algorithms (SODA'93), 271-280.

Arya S., Mount D. M., Netanyahu N. S., Silverman R. and Wu A. Y (1998), An optimal algorithm for approximate nearest neighbor searching, Journal of the ACM, 45, 891-923.

Examples

# A noisy sine wave example
x1 <- runif(100, 0, 2*pi)
x2 <- runif(100, 0,3)
e  <- rnorm(100, sd=sqrt(0.075)) 
y <- sin(x1) + 2*cos(x2) + e
DATA <- data.frame(x1, x2, y)		
nearest <- nn(DATA)

[Package RANN version 2.1.1 Index]