// 代码摘自lucene-core-8.2.0, SloppyMath工具类
/**
* Returns the Haversine distance in meters between two points
* given the previous result from {@link #haversinSortKey(double, double, double, double)}
* @return distance in meters.
*/
public static double haversinMeters(double sortKey) {
return TO_METERS * 2 * asin(Math.min(1, Math.sqrt(sortKey * 0.5)));
}
/**
* Returns a sort key for distance. This is less expensive to compute than
* {@link #haversinMeters(double, double, double, double)}, but it always compares the same.
* This can be converted into an actual distance with {@link #haversinMeters(double)}, which
* effectively does the second half of the computation.
*/
public static double haversinSortKey(double lat1, double lon1, double lat2, double lon2) {
double x1 = lat1 * TO_RADIANS;
double x2 = lat2 * TO_RADIANS;
double h1 = 1 - cos(x1 - x2);
double h2 = 1 - cos((lon1 - lon2) * TO_RADIANS);
double h = h1 + cos(x1) * cos(x2) * h2;
// clobber crazy precision so subsequent rounding does not create ties.
return Double.longBitsToDouble(Double.doubleToRawLongBits(h) & 0xFFFFFFFFFFFFFFF8L);
}
// haversin
// TODO: remove these for java 9, they fixed Math.toDegrees()/toRadians() to work just like this.
public static final double TO_RADIANS = Math.PI / 180D;
public static final double TO_DEGREES = 180D / Math.PI;
// Earth's mean radius, in meters and kilometers; see http://earth-info.nga.mil/GandG/publications/tr8350.2/wgs84fin.pdf
private static final double TO_METERS = 6_371_008.7714D; // equatorial radius
private static final double TO_KILOMETERS = 6_371.0087714D; // equatorial radius
/**
* Returns the Haversine distance in meters between two points
* specified in decimal degrees (latitude/longitude). This works correctly
* even if the dateline is between the two points.
* <p>
* Error is at most 4E-1 (40cm) from the actual haversine distance, but is typically
* much smaller for reasonable distances: around 1E-5 (0.01mm) for distances less than
* 1000km.
*
* @param lat1 Latitude of the first point.
* @param lon1 Longitude of the first point.
* @param lat2 Latitude of the second point.
* @param lon2 Longitude of the second point.
* @return distance in meters.
*/
public static double haversinMeters(double lat1, double lon1, double lat2, double lon2) {
return haversinMeters(haversinSortKey(lat1, lon1, lat2, lon2));
}
Added NumericRangeQuery and NumericRangeFilter, a fast alternative to RangeQuery/RangeFilter for numeric searches. They depend on a specific structure of terms in the index that can be created by indexing using the new NumericField or NumericTokenStream classes. NumericField can only be used for indexing and optionally stores the values as string representation in the doc store. Documents returned from IndexReader/IndexSearcher will return only the String value using the standard Fieldable interface. NumericFields can be sorted on and loaded into the FieldCache. (Uwe Schindler, Yonik Seeley, Mike McCandless)
The region quadtree represents a partition of space in two dimensions by decomposing the region into four equal quadrants, subquadrants, and so on with each leaf node containing data corresponding to a specific subregion. Each node in the tree either has exactly four children, or has no children (a leaf node). The height of quadtrees that follow this decomposition strategy (i.e. subdividing subquadrants as long as there is interesting data in the subquadrant for which more refinement is desired) is sensitive to and dependent on the spatial distribution of interesting areas in the space being decomposed. The region quadtree is a type of trie.
3.4 Elasticsearch 5.0 版本
方案优化的探索是没有没有止境的,Lucene的核心工程师 Michael McCandless受到论文《Bkd-Tree: A Dynamic Scalable kd-Tree》启发,基于BKD tree再次升级了地理位置数据索引建模和查询功能。
这个数据结构不仅仅是用于解决地理位置查询问题,更是数值类数据索引建模的通用方案。它可以处理一维的数值,从byte到BigDecimal, IPv6地址等等;它也可以处理二维乃至于N维的数据检索问题。
LUCENE-6825
This can be used for very fast 1D range filtering for numerics, removing the 8 byte (long/double) limit we have today, so e.g. we could efficiently support BigInteger, BigDecimal, IPv6 addresses, etc.
It can also be used for > 1D use cases, like 2D (lat/lon) and 3D (x/y/z with geo3d) geo shape intersection searches.
...
It should give sizable performance gains (smaller index, faster searching) over what we have today, and even over what auto-prefix with efficient numeric terms would do.
3.5 后续发展
Geo查询能力的迭代和变迁,其实也是Elasticsearch作为一个数据库对数值查询能力的升级和优化,扩展产品的适用场景,让使用者打破对Elasticsearch只能做全文检索的偏见。从全文检索数据库扩展到分析型数据库,Elasticsearch还有很长的路要走。
按照 Michael McCandless的设想,当前的多维数据只能是单个点,但是有些场景需要将形状作为一个维度进行索引。在这种需求下,需要通过一种更普适化的k-d tree ,即R-Tree来实现。
路漫漫其修远兮,ES从2.0版本支持geo-spatial开始经历6年的发展,已经走了很远,然而依然有很多值得探索的领域和场景。 参考: