All of these methods rely on variants of hashing to obtain near-constant-time behavior when distributing the data, and it is preferable that the hash functions be as close as possible to distance-preserving. Specifically, spatial objects in proximity should have similar hash values. In particular, it is desirable to be able to estimate how far apart two objects are (within a given error bound) by considering only their hash values. Such hash functions enable performing an approximate range query using simple hash table lookup operations. Other issues involve the parallel processing of spatial queries. Some easy examples are the distance join query, which finds pairs (p,q) of objects (from two different sets) where the distance between p and q is less than a given threshold, and computing the shortest paths from each node to every other node in a road network. More difficult are the spatial problems that cannot be easily decomposed into multiple tasks running in parallel, e.g., the distance semijoin query and network Voronoi diagram construction. These require developing a generic method to traverse a graph or a tree in parallel. Ideally, the method should require little or no communication between parallel tasks, which will be accomplished by allowing the parallel tasks to produce redundant results that can then be pruned.
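The report does not specify a particular hash function, but one classical family of (roughly) distance-preserving spatial hashes interleaves the bits of the coordinates, producing a Z-order (Morton) code. A minimal Python sketch, purely for illustration:

```python
def morton_code(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (a Z-order / Morton code).

    Points that are close in 2-D space tend to receive numerically
    close codes, so the code acts as an approximately
    distance-preserving hash usable for range-query lookups.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # even bit positions: x
        code |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions: y
    return code

# Nearby points map to nearby codes; distant points usually do not.
print(morton_code(3, 5))  # 39
```

An approximate range query can then be answered by scanning a contiguous interval of codes in an ordinary hash table or sorted index, rather than by geometric search.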
The tools that we develop will help improve the robustness and scalability of spatial data management. Results from our investigation of parallel query processing can be useful for query problems that require traversing a tree or a graph, which are often spatially embedded. Having a method to traverse a graph or a tree in parallel that requires little or no communication enables us to process many types of spatial queries using distributed computing resources, where communication can be very costly. Specifically, we expect our tools to enable spatial applications such as online mapping, computer-aided design, online gaming, and scientific simulations to handle terabytes of spatial data, which is impossible or inefficient with current technologies. This is of utility to all organizations that process spatial data, and attempts will be made to deploy it in some government agencies.
NSF Grant IIS-1320791, NSF I-Corps Team 708, previous NSF annual report (2015) is here
Fast computation of the distance oracle ASDO (all-store distance oracle) to achieve a high throughput for shortest distance retrieval.
A framework called SPDO is presented which implements an extremely fast distributed algorithm for computing road-network distance queries on Apache Spark. The approach extends our previous work on the distance oracle, which has now been adapted to use Spark's resilient distributed dataset (RDD).
Describes work on summarization and visualization of streaming local news.
A detailed evaluation of the two architectures for a variety of analytical query processing tasks, such as region, KNN, distance matrix, and trajectory queries, is presented, and the lessons learned are discussed.
Deals with dynamic map labeling, which arises when the location associated with the label is constantly changing.
Evaluation of algorithms for computing a spatio-textual similarity join.
A full version web-based demo can be found here, which is based on our previous year's demo. We participated in the NSF I-Corps program cohort of March to May 2016 to perform customer discovery for the possible commercialization of our distance oracle technology. During our customer discovery we found that shortest distance/path queries in real applications, such as package delivery and local awareness ads, occur primarily within a city-level region, i.e., very few of the shortest distance/path queries span the continental USA. Our experiments show that using geodesic (Euclidean) distance to approximate network distance can produce significant errors. The following graph tabulates the ratio of network distance to geodesic distance over all pairs of vertices on three example city road networks. The ratio is called the Route Directness Index and shows the distortion resulting from approximating network distance with geodesic distance. On average, the value of this index for New York City (NYC) is 1.213, for the San Francisco Bay Area it is 1.384, and for Salt Lake City (SLC) it is 1.475, where NYC is low because most of its streets are laid out on a grid. But even for NYC, 40% of the distance queries have an error of 20% or more when the approximation is used.
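The Route Directness Index for a single vertex pair can be sketched as the ratio of the shortest network path (e.g., via Dijkstra's algorithm) to the straight-line distance. The toy four-node network and coordinates below are illustrative only, not one of the evaluated city networks:

```python
import heapq, math

# Toy road network: node -> (x, y) planar coordinates (illustrative).
coords = {'A': (0, 0), 'B': (1, 0), 'C': (1, 1), 'D': (0, 1)}
# Undirected road segments with their lengths; there is no direct
# A-C road, so the network route must detour around the block.
edges = {('A', 'B'): 1.0, ('B', 'C'): 1.0, ('C', 'D'): 1.0, ('D', 'A'): 1.0}
graph = {}
for (u, v), w in edges.items():
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

def network_distance(src, dst):
    """Dijkstra's shortest-path distance over the road graph."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float('inf')):
            continue
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float('inf')

def route_directness(src, dst):
    """Ratio of network distance to geodesic (Euclidean) distance."""
    (x1, y1), (x2, y2) = coords[src], coords[dst]
    return network_distance(src, dst) / math.hypot(x2 - x1, y2 - y1)

print(route_directness('A', 'C'))  # 2 / sqrt(2) ~ 1.414
```

Averaging this ratio over all vertex pairs of a real road network yields the per-city index values quoted above.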
Notice that the distance computed by ASDO is very close in value to the true distance (the epsilon error tolerance of 0.1 is only an upper bound; the actual error that we observed is much smaller, so using the oracle is quite reasonable). Please also note that the Google-computed network distance is only available if you use the Chrome browser; otherwise, it is missing. Additionally, we provide the SQL query that we executed against the oracle to obtain the network distance.
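The actual SQL and schema appear only in the demo itself; as a hedged illustration of the idea, retrieving a precomputed network distance with a single indexed lookup might look like the following SQLite sketch (the table and column names `oracle`, `src`, `dst`, `dist` and the sample values are assumptions, not the real ASDO layout):

```python
import sqlite3

# Hypothetical precomputed distance-oracle table held in SQLite.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE oracle (src TEXT, dst TEXT, dist REAL)')
con.executemany('INSERT INTO oracle VALUES (?, ?, ?)',
                [('NYC:v1', 'NYC:v2', 1234.5),
                 ('NYC:v1', 'NYC:v3', 987.0)])

def oracle_distance(src, dst):
    """One SELECT per query: the distance is looked up, not computed."""
    row = con.execute('SELECT dist FROM oracle WHERE src = ? AND dst = ?',
                      (src, dst)).fetchone()
    return row[0] if row else None

print(oracle_distance('NYC:v1', 'NYC:v2'))  # 1234.5
```

Because each query is a plain key lookup rather than a graph search, the database can serve many such queries concurrently.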
The second part of the demo consists of the computation of an origin-destination distance matrix for a set of user-defined delivery locations and the computation of a route that visits all of these locations. The first step is to click on the "Clear" button to start afresh. The delivery locations are specified by repeatedly clicking on the "Add More Delivery!" button, while the origin-destination distance matrix is obtained by clicking the "Compute Distance Matrix" button. The route is obtained by clicking the "Solve Delivery" button. Users can also bypass clicking the "Compute Distance Matrix" button and compute the route directly, in which case the time to do so is greater as it also includes the time needed to compute the origin-destination distance matrix. Notice that we also provide the actual times needed for these calculations in the row labeled "Analytics", where we also provide the time needed by the alternative methods that are based on contraction hierarchies (termed the "CH" method). The CH methods do not scale whereas ours do, and this scalability is the result of the research that we have been doing under this grant. This demo reveals a speedup factor of approximately 5.
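The demo's route solver is not described in detail; as a sketch of the two steps, the following uses Euclidean distance standing in for the oracle's network distance and a simple nearest-neighbor heuristic standing in for the actual delivery-route solver (all of these substitutions are illustrative assumptions):

```python
import math

# Illustrative delivery locations (x, y); the demo would use
# road-network distances from the oracle instead.
stops = [(0, 0), (4, 0), (4, 3), (0, 3)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Step 1: origin-destination distance matrix over all stop pairs.
matrix = [[dist(p, q) for q in stops] for p in stops]

# Step 2: greedy nearest-neighbor route over the matrix.
def nn_route(matrix, start=0):
    n = len(matrix)
    route, seen = [start], {start}
    while len(route) < n:
        cur = route[-1]
        nxt = min((j for j in range(n) if j not in seen),
                  key=lambda j: matrix[cur][j])
        route.append(nxt)
        seen.add(nxt)
    return route

print(nn_route(matrix))  # [0, 3, 2, 1]
```

Computing the route directly from the stops, as the demo allows, simply folds Step 1's matrix construction into the solver, which is why the direct path takes longer.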
Observe the speedup of ASDO over computing the true distance with the CH method, which can be as high as 100X. Moreover, ASDO permits simultaneous computation of distances and can handle as many as 65,000 distance lookups per second on a single database server. Adding Y servers increases the number of lookups per second by a factor of Y. On the other hand, the true distance is computed sequentially and cannot be parallelized on one machine, although it can be parallelized over Y machines, in which case it is sped up by a factor of Y; this is still a much lower throughput than what is achievable with ASDO.
In addition, as the size of the region is restricted to a city or several cities, we can easily load all ASDO results into memory instead of a disk-based database. Our evaluations show that the in-memory ASDO can achieve 7 million shortest-distance lookups per second using 8 threads on a regular MacBook Pro laptop. It is 100X faster than the ASDO backed by a disk-based database. It can compute a 10K x 10K distance matrix within 13 seconds, which is desirable for delivery companies such as USPS, UPS, and FedEx. We are exploring new applications that require a very high throughput, and developing solutions using our memory-based ASDO.
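Loading all oracle results into memory turns each distance lookup into a single dictionary access, so building a distance matrix becomes a batch of lookups with no I/O. A minimal sketch of this idea (the vertex names and fake distances below are illustrative assumptions, not the actual ASDO layout):

```python
# Illustrative: precomputed oracle results loaded fully into memory.
# Keys are (src, dst) vertex pairs; values are network distances.
oracle = {}
verts = [f'v{i}' for i in range(100)]
for i, u in enumerate(verts):
    for j, v in enumerate(verts):
        oracle[(u, v)] = abs(i - j) * 1.0  # fake but consistent distances

def distance_matrix(origins, dests):
    """Batch lookups against the in-memory oracle: one dict access per
    cell, so throughput is bounded by memory speed, not disk I/O."""
    return [[oracle[(u, v)] for v in dests] for u in origins]

m = distance_matrix(verts[:3], verts[:3])
print(m[0])  # [0.0, 1.0, 2.0]
```

Since the lookups are independent, the matrix rows can also be computed across threads, which is the source of the multi-threaded throughput figures quoted above.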
[Demo interface: input fields for the 1st and 2nd locations (Lat/Lon); output rows for ASDO Network Distance, Quad Depth, Euclidean Distance, Google Maps Distance, True Network Distance, SQL Query, ASDO Delivery API, and Analytics Runtime.]
Date of last update: Aug 4th, 2016