geohash string length and accuracy - geohashing

A longer geohash string is more accurate. But is there a direct relationship, e.g. does a length of 7 give 100-meter accuracy? That is, if two geohashes (and either of their bounding boxes) have their first 7 characters matching, should the two points be within about 100 meters of each other?
I am using geohashes to find all nearby locations for a given geohash, along with their distances.
Also, is there any direct way to calculate the distance between two geohashes? (One way is to decode them to lat/lng and then calculate the distance.)
Thanks

Saw a lot of confusion around geohashing, so I am posting my understanding so far.
The principle behind geohashing is very simple; you can create your own version.
For instance, consider the following geo-point:
156.34234534, -23.343423345
In the above example, 156 is the whole number of degrees and the digits after the decimal point are decimal fractions of a degree (not minutes and seconds, which use a different notation).
If you remember school geography, the circumference of the earth at the equator is about 40,000 km, and there are 360 degrees around the earth (of latitude or longitude). So at the widest point each degree of latitude and longitude spans about 110 km (40,000/360).
So if you encode the above coordinates as "156-23" (including the negative sign), this gives you a box of about 110 km x 110 km.
You can go on and increase the precision:
Keeping the first decimal digit (156.3, -23.3) gives you an ~11 km x 11 km box, and each extra decimal digit shrinks the box by another factor of 10, so two decimals give ~1.1 km and three decimals ~110 m.
Geohashing is just a way of representing such a truncated figure in an encoded form. You could happily use the above format as well!
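As a minimal sketch of that do-it-yourself encoding in C (the function name and format are just illustrative, and printf rounds rather than strictly truncates, which shifts cell boundaries by half a cell but not the idea):

#include <stdio.h>

/* Encode a lon/lat pair to a fixed number of decimal digits.
   More digits = a smaller box, each digit shrinking it by a factor of 10. */
void truncation_hash(double lon, double lat, int digits, char *out, size_t outlen)
{
    snprintf(out, outlen, "%.*f%+.*f", digits, lon, digits, lat);
}

With digits = 1, the example point encodes as "156.3-23.3", naming an ~11 km x 11 km box.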

I was curious about this myself.
In case it's of any use to anyone, I put together a spreadsheet here.
I'm not 100% sure it's right - feel free to comment if you find a problem.
Judging by the graph below, using 6 to 10 characters gives accuracy of ~1 km to ~1 m at 60 degrees latitude.

Here are the formulas for height and width in degrees of a geohash of length n characters:
First define this function:
parity(n) = 0 if n is even otherwise 1
Then
height = 180 / 2^((5n - parity(n)) / 2) degrees
width = 180 / 2^((5n + parity(n) - 2) / 2) degrees
Note that this is the height and width in degrees only. To convert this to metres requires that you know where on the earth the hash is.
Code for this in java is at http://github.com/davidmoten/geo.
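A small C sketch of those formulas (a convenience only; the library linked above provides the real thing):

#include <math.h>

/* Height and width in degrees of an n-character geohash cell. */
void geohash_cell_size(int n, double *height_deg, double *width_deg)
{
    int parity = n % 2;   /* 0 if n is even, otherwise 1 */
    *height_deg = 180.0 / pow(2.0, (5.0 * n - parity) / 2.0);
    *width_deg  = 180.0 / pow(2.0, (5.0 * n + parity - 2.0) / 2.0);
}

For n = 6 this gives about 0.0055 x 0.011 degrees, i.e. roughly 0.61 km x 1.22 km at the equator.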

Also, is there any direct way to calculate the distance between two geohashes? (One way is to decode them to lat/lng and then calculate the distance.)
That is what you should do. Think of the geohash as just another representation of a latitude and longitude, just as a pair of printed decimal numbers is. If I gave you a pair of lat & lon strings, you would parse them to numbers (in your programming language of choice) and then do the math. It's no different with geohashes -- decode to lat & lon, then do the math.
Be very careful about inferring closeness from the length of the common prefix of a pair of geohashes. If there is a long common prefix, then the points are close, but the converse is not true! Two points with no common prefix at all can be a millimeter apart -- for example, on either side of a boundary between two top-level cells.
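A minimal sketch of "decode, then do the math" in C (the decoder returns the centre of the geohash's bounding box; haversine gives the great-circle distance; input validation omitted):

#include <math.h>
#include <string.h>

/* Decode a geohash to the centre of its bounding box. */
static void geohash_decode(const char *hash, double *lat, double *lon)
{
    static const char base32[] = "0123456789bcdefghjkmnpqrstuvwxyz";
    double latmin = -90.0, latmax = 90.0, lonmin = -180.0, lonmax = 180.0;
    int lon_bit = 1;                       /* bits alternate: lon, lat, lon, ... */
    for (const char *p = hash; *p; p++) {
        int cd = (int)(strchr(base32, *p) - base32);
        for (int bit = 4; bit >= 0; bit--) {
            int b = (cd >> bit) & 1;
            if (lon_bit) { double mid = (lonmin + lonmax) / 2;
                           if (b) lonmin = mid; else lonmax = mid; }
            else         { double mid = (latmin + latmax) / 2;
                           if (b) latmin = mid; else latmax = mid; }
            lon_bit = !lon_bit;
        }
    }
    *lat = (latmin + latmax) / 2;
    *lon = (lonmin + lonmax) / 2;
}

/* Haversine great-circle distance in metres (M_PI from math.h). */
static double haversine_m(double lat1, double lon1, double lat2, double lon2)
{
    const double R = 6371000.0;            /* mean earth radius in metres */
    const double rad = M_PI / 180.0;
    double dlat = (lat2 - lat1) * rad, dlon = (lon2 - lon1) * rad;
    double a = sin(dlat/2)*sin(dlat/2)
             + cos(lat1*rad)*cos(lat2*rad)*sin(dlon/2)*sin(dlon/2);
    return 2.0 * R * atan2(sqrt(a), sqrt(1.0 - a));
}

double geohash_distance_m(const char *h1, const char *h2)
{
    double lat1, lon1, lat2, lon2;
    geohash_decode(h1, &lat1, &lon1);
    geohash_decode(h2, &lat2, &lon2);
    return haversine_m(lat1, lon1, lat2, lon2);
}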

Here is an equation (in pseudocode) that can approximate the optimal Geohash length for a latitude/longitude pair having a certain precision:
geohash_length = FLOOR ( LOG_2(5000000 / precision_in_meters) / 2.5 + 1 )
if geohash_length > 12 then geohash_length = 12
if geohash_length < 1 then geohash_length = 1
I've used it to create the optimal geohash from data received by the gpsd daemon, which also provides precision information via the epx and epy values.
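If it helps, a direct C translation of that pseudocode (the function name is mine):

#include <math.h>

int optimal_geohash_length(double precision_in_meters)
{
    int len = (int)floor(log2(5000000.0 / precision_in_meters) / 2.5 + 1.0);
    if (len > 12) len = 12;
    if (len < 1) len = 1;
    return len;
}

For example, a fix with epx/epy around 10 m yields floor(log2(500000)/2.5 + 1) = floor(8.57) = 8 characters.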


Finding geohashes of certain length within radius from a point

I have points, each with a given lat/long and a distance around it - e.g. { 40.6826048,-74.0288632 : 20 miles, 51.5007825,-0.1258957 : 100 miles }. If I pick a fixed geohash length (say one whose cells are roughly 1 x 1 mile), how can I find all the geohash entries of that length that are within the given radius of each point?
To add some background: the reason I want to do this is so I can save a cache keyed by geohash id, whose value is the list of points for which the given geohash is within the radius (and which also match some custom eligibility rules). Then I can do a quick lookup on a user's location geohash to find all the eligible points around them.
This is how I would try to do it:
Input: Point of interest(lat, long), Query Radius
Step 1: Find the MINIMUM BOUNDING RECTANGLE (MBR) which completely contains the QUERY CIRCLE.
Step 2: To create the minimum bounding rectangle, first calculate its minimum and maximum lat/long using the input parameters. Please refer to sections 3.1 and 3.3 of Computing the Minimum and Maximum Latitude Longitude – the Correct Way.
Step 3: Using (minLat, minLon) and (maxLat, maxLon), calculate the four corners of the MBR: NorthWest(maxLat, minLon), SouthWest(minLat, minLon), SouthEast(minLat, maxLon), NorthEast(maxLat, maxLon)
Step 4: Calculate the GeoHash of all four corners of MBR
Ex: for a point in NYC, say (40.75798, -73.991516), distance: 800 Meters and GeoHash length: 12
NorthWest : dr5ruj4477kd
SouthWest : dr5ru46ne2ux
SouthEast : dr5ru6ryw0cp
NorthEast : dr5rumpfq534
Step 5: From these GeoHashes, take the longest common prefix as the Query Bounding Box (MBR) prefix: dr5ru
This gives you the coarser GeoHash which completely contains the MBR, and hence the query region. In other words, all points of interest are indexed under dr5ru, which expands to the 32 GeoHashes dr5ru0 - dr5ruz.
Final step:
To find the exact grids (GeoHashes) that correspond to our Query Circle (the square MBR, to be precise), pick from these 32 GeoHashes by laying them out as a 4 x 8 matrix (a 2D array).
In our example we get dr5ru + J, M, H, K, 5, 7, 4, 6. All these GeoHashes represent points within 800 meters of the central query point, except for a few cells that cannot be avoided because we used the MBR instead of a perfect circle.
(The original answer illustrates steps 1-5 and the final step with animated GIFs.)
Important: note the use of a 4 x 8 grid here. The grid orientation alternates with each character of GeoHash length: for odd lengths it is 8 x 4, for even lengths its transpose, 4 x 8. In our case we are one level inside dr5ru (5 + 1 = 6th resolution, even), and hence we use 4 x 8.
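A minimal sketch of step 5 in C, using the corner hashes from the example above (producing them in the first place would need a geohash encoder, not shown):

#include <stdio.h>
#include <string.h>

/* Longest common prefix of n geohash strings = the coarsest covering cell. */
static void common_prefix(const char **hashes, int n, char *out, size_t outlen)
{
    size_t i;
    for (i = 0; i + 1 < outlen && hashes[0][i] != '\0'; i++) {
        for (int k = 1; k < n; k++) {
            if (hashes[k][i] != hashes[0][i]) goto done;
        }
    }
done:
    memcpy(out, hashes[0], i);
    out[i] = '\0';
}

int main(void)
{
    const char *corners[] = { "dr5ruj4477kd", "dr5ru46ne2ux",
                              "dr5ru6ryw0cp", "dr5rumpfq534" };
    char prefix[16];
    common_prefix(corners, 4, prefix, sizeof prefix);
    printf("Query MBR prefix: %s\n", prefix);   /* prints dr5ru */
    return 0;
}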
Have a look at this -> ProximityHash.
ProximityHash generates a set of geohashes that cover a circular area, given the center coordinates and the radius. It also has an additional option to use GeoRaptor that creates the best combination of geohashes across various levels to represent the circle, starting from the highest level and iterating till the optimal blend is brewed. Result accuracy remains the same as that of the starting geohash level, but data size reduces considerably, thereby improving speed and performance.

Plotting a frequency response from a biquad filter

This is a hard one, and although I can think of a few kludge methods of doing it, I have a feeling there is a clean mathematical method, although I am having difficulty inventing it myself.
I have a number of parameters which control (software) biquad filters for audio. Essentially there are just 3 parameters: frequency, gain and Q (or bandwidth). In audio terms, the frequency is the center frequency of the filter. The gain is whether this frequency is boosted or cut (a gain of 0 results in no change to the audio passing through the filter). Q represents the width of the filter: a very wide (low-Q) filter might affect frequencies far away from the center frequency, whereas a narrow (high-Q) filter will only affect frequencies close to the center frequency. The filters take the form of a bell curve, or at least that's an approximation; whether it's mathematically accurate I am not sure.
I want to display the characteristics of these filters graphically: a graph of gain against frequency. There are several of these filters applied to the audio channel, and I want to be able to add the individual result graphs to produce an overall graph (i.e. a graph summing all the components of the combined filters). But I also want to be able to access the individual filters' graphs.
I can handle adding the component graphs into a single 'total' graph, but how to produce the original x-y graph from the filter parameters escapes me. I will draw bitmaps, so all I need is to be able to create arrays of the form frequency[x] = y. I'm doing this in C, so I don't have the mathematical tools of MATLAB etc. So I might have a filter with a center frequency of, say, 1000 (Hz), a gain of, say, 20 (dB or linear - I understand how to convert that), and a Q of, say, 3. The Q factor is relative and does not have to be exactly mathematically correct if that causes any complication.
It seems like a quite simple mathematical function, but maths is not my strong point and I don't know enough. I have been messing around with sine functions etc. but it's not working, and I suspect I am probably wasting processing power by over-complicating the maths (although I might be wrong there).
TIA, Pete
I have my doubts about the relationships between biquad filters, Q values, and bell curves. But I'll put those aside and just tell you how to draw a bell curve, since that's what you asked.
From this Wikipedia article, the equation for a bell curve (the Gaussian function) is
f(x) = a * exp( -(x - b)^2 / (2*c^2) ) + d
where for your application
a corresponds to the gain
b determines the center frequency
2c^2 is related to Q (larger values will make the curve wider)
d is the baseline offset (the output level far from the center frequency)
The C code below computes a sample bell curve. For this example, the numbers were chosen based on drawing into a window that is 250 pixels wide by 200 pixels high, with a coordinate system where the origin {0,0} is at the bottom left corner.
#include <math.h>   // for exp()

int width = 250;
int height = 200;
int bellCurve[width];     // the output array that holds the f(x) values (C99 VLA)
double gain = 180;        // the 'a' value: height of the peak above the baseline
double offset = 10;       // the 'd' value: y coordinate of the baseline
double qFactor = 1000;    // the '2c^2' value: larger values make the curve wider
double center = 100;      // the 'b' value: x coordinate of the peak
double dx;
for ( int x = 0; x < width; x++ )
{
    dx = x - center;
    bellCurve[x] = (int)( gain * exp( -( dx * dx ) / qFactor ) + offset );
}
Plotting the curve results in an image like this where the peak is at x=100, y=10+180=190
You could input a unit impulse (an array of all zeros, except one element=1.0) into your digital filters, treating them as black boxes. Then FFT the impulse response output array to get the frequency response of the filter. Plotting the magnitude of the complex frequency samples will give you a pretty picture. Python+numpy+matplotlib would probably be an easier way to go about it. You will need to know the sampling period to get meaningful plots.
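A sketch of that black-box approach in C (the biquad coefficients below are arbitrary placeholders for a stable filter, and a naive O(N^2) DFT stands in for a proper FFT library):

#include <math.h>
#include <stdio.h>

#define N 512   /* impulse response length; longer gives a finer frequency grid */

int main(void)
{
    /* Direct-form I biquad: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2]
                                    - a1*y[n-1] - a2*y[n-2]          */
    double b0 = 1.0, b1 = -1.8, b2 = 0.81, a1 = -1.7, a2 = 0.72; /* placeholders */
    double h[N];
    double xn1 = 0, xn2 = 0, yn1 = 0, yn2 = 0;

    /* Feed in a unit impulse and record the impulse response. */
    for (int n = 0; n < N; n++) {
        double x = (n == 0) ? 1.0 : 0.0;
        double y = b0*x + b1*xn1 + b2*xn2 - a1*yn1 - a2*yn2;
        xn2 = xn1; xn1 = x;
        yn2 = yn1; yn1 = y;
        h[n] = y;
    }

    /* Magnitude of the DFT of the impulse response = frequency response. */
    for (int k = 0; k < N/2; k++) {
        double re = 0, im = 0;
        for (int n = 0; n < N; n++) {
            re += h[n] * cos(2*M_PI*k*n/N);
            im -= h[n] * sin(2*M_PI*k*n/N);
        }
        printf("%d %f\n", k, 20*log10(sqrt(re*re + im*im))); /* bin, gain in dB */
    }
    return 0;
}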
What you really want is the Bode plot of the filter. This is non-trivial to calculate yourself; a cursory search for a C library to do it for you yielded nothing. If accuracy is not important, you can approximate the shape once and stretch it based on the parameters of the particular filter. For example, you might have a normalized array of relative values and construct a new curve (array) from the parameters of the filter and the base curve you generated earlier.
The base curve could be generated from MATLAB, if you can, or Wolfram Alpha or something like that.
Here is one in JavaScript:
http://www.earlevel.com/main/2013/10/13/biquad-calculator-v2/
The filter you describe is the 'peak' filter. Use the log scale to display frequencies.
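For what it's worth, the exact response can also be computed directly from the biquad coefficients, which is presumably what the calculator above does: evaluate the transfer function H(z) = (b0 + b1*z^-1 + b2*z^-2) / (1 + a1*z^-1 + a2*z^-2) at z = e^(j*w). A sketch in C99 (coefficients again arbitrary placeholders):

#include <complex.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    double b0 = 1.0, b1 = -1.8, b2 = 0.81, a1 = -1.7, a2 = 0.72; /* placeholders */
    for (int i = 0; i <= 100; i++) {
        double w = M_PI * i / 100.0;              /* 0 .. Nyquist, rad/sample */
        double complex z1 = cexp(-I * w);         /* z^-1 */
        double complex z2 = cexp(-2.0 * I * w);   /* z^-2 */
        double complex H = (b0 + b1*z1 + b2*z2) / (1.0 + a1*z1 + a2*z2);
        printf("%f %f\n", w, 20*log10(cabs(H)));  /* frequency, gain in dB */
    }
    return 0;
}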
—Tom

search the closest points to a given point among 1 million points

This is an algorithm question.
Given 1 million points, each with x and y coordinates that are floating-point numbers, find the 10 points closest to a given point as fast as possible.
Closeness can be measured as Euclidean distance on a plane or some other kind of distance on a globe. I would prefer a binary-search-style approach because of the large number of points.
My idea:
Save the points in a database.
1. Multiply x by a large integer, e.g. 10^4, cut off the decimal part, and then multiply the integer part by 10^4 again.
2. Multiply y by a smaller integer, e.g. 10^2, and cut off the decimal part.
3. Sum the results of steps 1 and 2; call the sum the associate_value.
4. Repeat 1 to 3 for each point in the database.
E.g.
x = 12.3456789, y = 98.7654321
x times 10^4 = 123456 (truncated), then times 10^4 again gives 1234560000
y times 10^2 = 9876.54321, truncated to 9876
Sum them: 1234560000 + 9876 = 1234569876
In this way, I transform 2-d data into 1-d data. In the database, each point is associated with an integer (associate_value), and the integer column can be indexed for fast search.
For a given point (x, y), I perform steps 1-3 on it and then find the points in the database whose associate_value is close to the given point's associate_value.
E.g.
x = 59.469797, y = 96.4976416
Their associate_value is 5946979649.
Then, in the database, I search for associate_values close to 5946979649 - for example 5946979649 + 50 and 5946979649 - 50, and also 5946979649 + 50000000 and 5946979649 - 50000000. This can be done with an index search in the database.
In this way, I can find a group of points that are close to the given point and reduce the search space greatly. Then I can use the Euclidean (or another) distance formula to find the closest points.
I am not sure about the efficiency of this algorithm, especially the process of generating associate_values.
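For concreteness, here are steps 1-3 in C as I understand them (truncation via casts; note this assumes non-negative coordinates):

/* associate_value: x truncated to 4 decimals, shifted, plus y truncated to 2 decimals */
long long associate_value(double x, double y)
{
    long long xi = (long long)(x * 10000.0);  /* 12.3456789 -> 123456 */
    long long yi = (long long)(y * 100.0);    /* 98.7654321 -> 9876   */
    return xi * 10000LL + yi;                 /* 1234560000 + 9876    */
}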
Does my idea work or not? Any better ideas?
Thanks
Your idea seems like it may work, but I would be concerned about degenerate cases (like if no points fall in your specified ranges, though maybe that's not possible given the constraints). Either way, since you asked for other ideas, here's my stab at it: store all of your points in a quadtree. Then just walk down the quadtree until you have a sufficiently small group to search through. Since the points are fixed, building the quadtree is a one-time cost, and each query should be logarithmic in the number of points you have.
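For illustration, a minimal point-quadtree sketch in C (the capacity and layout are my own choices; note that the walk-down alone can miss true nearest neighbours sitting just across a cell boundary, so a real search must also examine neighbouring cells):

#include <stdlib.h>

typedef struct { double x, y; } Point;

#define NODE_CAPACITY 16   /* split a cell when it holds more than this */

typedef struct QuadNode {
    double cx, cy, half;            /* centre and half-width of this cell */
    Point pts[NODE_CAPACITY];
    int count;                      /* -1 once the node has been split */
    struct QuadNode *child[4];      /* SW, SE, NW, NE */
} QuadNode;

static QuadNode *node_new(double cx, double cy, double half)
{
    QuadNode *n = calloc(1, sizeof *n);
    n->cx = cx; n->cy = cy; n->half = half;
    return n;
}

static int quadrant(const QuadNode *n, Point p)
{
    return (p.x >= n->cx) + 2 * (p.y >= n->cy);
}

void quad_insert(QuadNode *n, Point p)
{
    if (n->count >= 0) {                           /* still a leaf */
        if (n->count < NODE_CAPACITY) { n->pts[n->count++] = p; return; }
        /* split: create children, push existing points down, then fall through */
        double h = n->half / 2;
        n->child[0] = node_new(n->cx - h, n->cy - h, h);
        n->child[1] = node_new(n->cx + h, n->cy - h, h);
        n->child[2] = node_new(n->cx - h, n->cy + h, h);
        n->child[3] = node_new(n->cx + h, n->cy + h, h);
        for (int i = 0; i < NODE_CAPACITY; i++)
            quad_insert(n->child[quadrant(n, n->pts[i])], n->pts[i]);
        n->count = -1;
    }
    quad_insert(n->child[quadrant(n, p)], p);
}

/* Walk down to the leaf containing q; its points (plus its neighbours')
   are the candidates for the nearest-10 search. */
const QuadNode *quad_locate(const QuadNode *n, Point q)
{
    while (n->count < 0) n = n->child[quadrant(n, q)];
    return n;
}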
You can do better by interleaving the binary values of the x and y coordinates. Instead of ordering the points along a straight line, this orders them along a Z-curve, and you can then compute upper bounds using the most significant bits. The Z-curve is often used in mapping applications: http://msdn.microsoft.com/en-us/library/bb259689.aspx.
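A sketch of that interleaving in C (the standard bit-spreading trick; quantizing the floating-point coordinates to integers beforehand is up to you):

#include <stdint.h>

/* Spread the bits of v apart: bit i of v moves to bit 2i of the result. */
static uint64_t part1by1(uint32_t v)
{
    uint64_t x = v;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2))  & 0x3333333333333333ULL;
    x = (x | (x << 1))  & 0x5555555555555555ULL;
    return x;
}

/* Z-curve (Morton) key for quantized coordinates, e.g. xi = (uint32_t)(x * 10000). */
uint64_t morton_key(uint32_t xi, uint32_t yi)
{
    return (part1by1(xi) << 1) | part1by1(yi);
}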
The way I read your algorithm, you are selecting values along a line with a slope of -1 that are similar to your point; i.e. if your point is (2,2), you would look at points (1,3), (0,4) and (-1,5), and likely miss closer points. Most algorithms to solve this are O(n), which isn't terribly bad.
A simple algorithm to solve this problem is to keep a priority queue of the closest ten, plus the distance of the furthest of those ten points, as you iterate over the set. If the x or y difference alone exceeds that furthest distance, discard the point immediately. Otherwise calculate the distance with whatever measurement you're using and see whether it gets inserted into the queue. If so, update your top-ten threshold and continue iterating.
If your points are pre-sorted on one of the axes, you can further optimize the algorithm by starting at the matching point on that axis and radiating outward until the difference on that axis is greater than the distance to your tenth-closest point. I did not include sorting in the description above because sorting is O(n log n), which is slower than O(n). If you are doing this multiple times on the same set, then it could be beneficial to sort it.
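A sketch of that scan in C (for the 10 closest, a small sorted array does the job of the priority queue):

#include <float.h>
#include <math.h>

typedef struct { double x, y; } Point;

/* Scan all n points, keeping the 10 nearest to q in out[] (nearest first). */
void closest10(const Point *pts, int n, Point q, Point out[10])
{
    double d2[10];                       /* squared distances of current best 10 */
    for (int i = 0; i < 10; i++) d2[i] = DBL_MAX;

    for (int i = 0; i < n; i++) {
        double dx = pts[i].x - q.x, dy = pts[i].y - q.y;
        double worst = sqrt(d2[9]);      /* distance of the current 10th-closest */
        /* cheap rejection: a single axis difference beyond 'worst' disqualifies */
        if (fabs(dx) > worst || fabs(dy) > worst) continue;
        double dist2 = dx*dx + dy*dy;    /* compare squared distances, no sqrt */
        if (dist2 >= d2[9]) continue;
        int j = 9;                       /* insertion step into the sorted array */
        while (j > 0 && d2[j-1] > dist2) {
            d2[j] = d2[j-1]; out[j] = out[j-1]; j--;
        }
        d2[j] = dist2; out[j] = pts[i];
    }
}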

google maps latitude longitude length

We currently use lat/long values stored in our database to display worldwide grocery stores, with 4 digits to the right of the decimal place (e.g. 36.4488). We are in the process of updating all records to be more accurate on Google Maps. Just wondering: should we extend the lat/long to 6 digits to the right of the decimal place as part of this? Our code would have to change to handle this, and I wonder if it is really worth the payoff or whether 4 digits will suffice. Also, I noticed that a marker with position: latlng seems to display in a different place than a marker with position: point (where point is set by point = results[0].geometry.location;). Has anyone seen this before? Thanks for any responses.
If you use 4 digits to the right of the decimal place, the precision of each geolocated point is around 11 meters (one degree of latitude is about 111 km, and 111 km / 10^4 ≈ 11 m). With 6 digits the precision is about 11 centimeters, so if you want the exact location of each store, you should use 6 digits instead of 4.
In response to the second question: if the lat/long of the marker is correct, you shouldn't have problems with it, so review the coordinates of the point.

Increasing the pitch of audio using a varied value

Okay, this is a bit of a maths and DSP question.
Let us say I have 20,000 samples which I want to resample at a different pitch - twice the normal rate, for example. Using the cubic interpolation method found here, I would set my new array index values by multiplying the i variable in an iteration by the new pitch (in this case 2.0). This would also make my new array of samples total 10,000: as the interpolation is going at double the speed, it only needs half as many samples to cover the recording.
But what if I want the pitch to vary throughout the recording? Basically, I would like it to slowly increase from the normal rate to 8 times faster (at the 10,000-sample mark) and then back to 1.0 - it would be an arc. My questions are:
How do I calculate how many samples the final audio track will have?
How do I create an array of pitch values representing this increase from 1.0 to 8.0 and back to 1.0?
Mind you, this is not for live audio output but for transforming recorded sound. I mainly work in C, but I don't know if that is relevant.
I know this is probably complicated, so please feel free to ask for clarifications.
To represent an increase from 1.0 to 8.0 and back, you could use a function of this form:
f(x) = 1 + 7/2*(1 - cos(2*pi*x/y))
Where y is the number of samples in the resulting track.
It will start at 1 for x=0, increase to 8 for x=y/2, then decrease back to 1 for x=y.
Here's what it looks like for y=10:
Now we need to find the value of y depending on z, the original number of samples (20,000 in this case, but let's be general). For this we solve
integral from 0 to y of [1 + 7/2*(1 - cos(2*pi*x/y))] dx = z
The solution is y = 2*z/9 = z/4.5, nice and simple :)
Therefore, for an input with 20,000 samples, you'll get 4,444 samples in the output.
Finally, instead of multiplying the output index by the pitch value, you can access the original samples like this: output[i] = input[g(i)], where g is the integral of the above function f:
g(x) = (9*x)/2-(7*y*sin((2*pi*x)/y))/(4*pi)
For y=4444, it looks like this:
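Putting it together in C (a sketch: linear interpolation stands in for the question's cubic interpolator, purely to keep it short):

#include <math.h>

/* Resample input[0..z-1] into output[0..y-1], where y ~= z/4.5, using
   output[i] = input[g(i)] with g as derived above. */
void resample_arc(const double *input, int z, double *output, int y)
{
    for (int i = 0; i < y; i++) {
        double g = (9.0*i)/2.0 - (7.0*y*sin((2*M_PI*i)/y))/(4.0*M_PI);
        if (g < 0) g = 0;                 /* clamp to a valid index range */
        if (g > z - 2) g = z - 2;
        int n = (int)g;
        double frac = g - n;
        output[i] = input[n]*(1.0 - frac) + input[n+1]*frac;  /* linear interp */
    }
}

For z = 20,000 the caller would allocate y = 4,444 output samples.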
In order not to end up with aliasing in the result, you will also need to low-pass filter before or during interpolation, using either a filter with a variable transition frequency lower than half the local sample rate, or one with a fixed cutoff frequency more than 16x lower than the original sample rate (for an 8x peak pitch increase). This will require a more sophisticated interpolator than a cubic spline. For best results, you might want to try a variable-width windowed-sinc kernel interpolator.
