Date: 16/10/2021 11:21:28
From: mollwollfumble
ID: 1804359
Subject: Nearest point algorithm?

I have two sets of (longitude,latitude) points in Australia.

Set A contains a million points.
Set B contains 10,000 points.

For every point in set A find the nearest point in set B.

A bad way to do this is: for every point in Set A calculate the distance to every point in Set B and take the minimum.
That’s of O(10,000,000,000) operations.

What’s a good nearest-point algorithm?

Reply Quote

Date: 16/10/2021 12:01:20
From: furious
ID: 1804363
Subject: re: Nearest point algorithm?

mollwollfumble said:


I have two sets of (longitude,latitude) points in Australia.

Set A contains a million points.
Set B contains 10,000 points.

For every point in set A find the nearest point in set B.

A bad way to do this is: for every point in Set A calculate the distance to every point in Set B and take the minimum.
That’s of O(10,000,000,000) operations.

What’s a good nearest-point algorithm?

Put them in a list and sort them?

Reply Quote

Date: 16/10/2021 12:33:51
From: dv
ID: 1804369
Subject: re: Nearest point algorithm?

So the first thing: find the square of the distance, not the distance. It’s faster.

One thing that I’ve done previously is to start by breaking the points into a few rectilinear blocks. For a given focus point in A, compute the distance to the four corner points. Do a scan and eliminate the blocks whose near corner is greater than any of the other blocks’ far corners. This is really only worth it if you have hundreds of thousands of points.

Reply Quote

Date: 16/10/2021 21:00:19
From: mollwollfumble
ID: 1804573
Subject: re: Nearest point algorithm?

I was wondering Veronoi polyhedra, or linear programming, or KD tree but think not.

>> Set A contains a million points.
>> Set B contains 10,000 points.

> Put them in a list and sort them?

One of the things is that I might be able to use what what peculiarities I know of this specific data.

Set A and Set B have similar distributions, both heavily concentrated in cities. So I was wondering if whether to sort Set A + B of 1,010,000 items by longitude, then split into say 15 longitude blocks by constant number of items in each block. For each longitude block sort the B Set of by latitude into blocks and use that as a guide for placing Set A items into the same rectangular region.

I was wondering as first step to put them together in a single list

dv said:


So the first thing: find the square of the distance, not the distance. It’s faster.

One thing that I’ve done previously is to start by breaking the points into a few rectilinear blocks. For a given focus point in A, compute the distance to the four corner points. Do a scan and eliminate the blocks whose near corner is greater than any of the other blocks’ far corners. This is really only worth it if you have hundreds of thousands of points.

> So the first thing: find the square of the distance, not the distance. It’s faster.

Yes!

Perhaps 100 blocks, with 100 items from Set B in each block. Search time in order 100 million operations, which is manageable.

I’m thinking now that the blocks don’t have to be rectilinear. I could use radial coordinates for each major city and rectilinear in the remainder of the country away from cities.

Reply Quote

Date: 17/10/2021 15:29:53
From: mollwollfumble
ID: 1804891
Subject: re: Nearest point algorithm?

I think I’ve found a way to avoid the whole problem using lateral thinking. By reduce that 10,000 points to 535 by using local govenment areas instead of postcodes.

Reply Quote

Date: 17/10/2021 15:32:39
From: The Rev Dodgson
ID: 1804893
Subject: re: Nearest point algorithm?

mollwollfumble said:


I think I’ve found a way to avoid the whole problem using lateral thinking. By reduce that 10,000 points to 535 by using local govenment areas instead of postcodes.

Why not go one better and use states?

(Isn’t there a scipy function that will do what you want?)

Reply Quote

Date: 17/10/2021 16:04:45
From: dv
ID: 1804897
Subject: re: Nearest point algorithm?

The Rev Dodgson said:


mollwollfumble said:

I think I’ve found a way to avoid the whole problem using lateral thinking. By reduce that 10,000 points to 535 by using local govenment areas instead of postcodes.

Why not go one better and use states?

Yeah I mean basically you’ve just reduced your requirements. Australia has LGAs the size of Germany.

Reply Quote

Date: 17/10/2021 16:05:34
From: SCIENCE
ID: 1804898
Subject: re: Nearest point algorithm?

mollwollfumble said:

I think I’ve found a way to avoid the whole problem using lateral thinking. By reduce that 10,000 points to 535 by using local govenment areas instead of postcodes.

are we to take by implication that identified points in a given region are necessarily closer to identified points in the same designated region than to identified points in other designated regions

Reply Quote

Date: 17/10/2021 20:43:24
From: mollwollfumble
ID: 1804980
Subject: re: Nearest point algorithm?

dv said:


The Rev Dodgson said:

mollwollfumble said:

I think I’ve found a way to avoid the whole problem using lateral thinking. By reduce that 10,000 points to 535 by using local govenment areas instead of postcodes.

Why not go one better and use states?

Yeah I mean basically you’ve just reduced your requirements. Australia has LGAs the size of Germany.

That’s right.

BTW, I didn’t even know what a LGA was until today.

Using 7,000 or so postcodes ran the risk of no observations in a large number of postcodes, variations from year to year, etc.

Conversely, using 9 states (counting Christmas and Lord Howe Islands together as a separate state) ran the risk of data errors being hidden in a large number of correct observations.

However, saying that, I have already noticed several whopping big errors on bird data on a state level. Little Wattlebirds and Western Wattlebirds disappeared from the data in the year, and only the year, 2016. In another case, Rainbow Lorikeets are absent in the Northern Territory data for years 2017, 2019, 2019 but present in large numbers in 2014-2016 and 2020. Both cases are a sign of faulty species definition, or to put it another way, a sign of over-zealous correction.

LGAs seem to be just about the right size. Fingers crossed I can guarantee at least one observation in each LGA, and if a bird flock moves a few metres from year to year I don’t lose it. The problem with the Rainbow definition spills over to some LGAs in WA, notably Broome.

Not all LGAs are the size of Germany. They can be as small as 20 km^2 in area. Postcodes can be much smaller.

Reply Quote

Date: 17/10/2021 20:46:37
From: mollwollfumble
ID: 1804981
Subject: re: Nearest point algorithm?

dv said:


The Rev Dodgson said:

Why not go one better and use states?

Yeah I mean basically you’ve just reduced your requirements. Australia has LGAs the size of Germany.

That’s right.

BTW, I didn’t even know what a LGA was until today.

Using 7,000 or so postcodes ran the risk of no observations in a large number of postcodes, variations from year to year, etc.

Conversely, using 9 states (counting Christmas and Lord Howe Islands together as a separate state) ran the risk of data errors being hidden in a large number of correct observations.

However, saying that, I have already noticed several whopping big errors on bird data on a state level. Little Wattlebirds and Western Wattlebirds disappeared from the data in the year, and only the year, 2016. In another case, Rainbow Lorikeets are absent in the Northern Territory data for years 2017, 2019, 2019 but present in large numbers in 2014-2016 and 2020. Both cases are a sign of faulty species definition, or to put it another way, a sign of over-zealous correction.

LGAs seem to be just about the right size. Fingers crossed I can guarantee at least one observation in each LGA, and if a bird flock moves a few metres from year to year I don’t lose it. The problem with the Rainbow definition spills over to some LGAs in WA, notably Broome.

Not all LGAs are the size of Germany. They can be as small as 20 km^2 in area. Postcodes can be much smaller.

> are we to take by implication that identified points in a given region are necessarily closer to identified points in the same designated region than to identified points in other designated regions

If I was doing things properly, they wouldn’t be. But because i’m cutting corners, they are.

Reply Quote

Date: 18/10/2021 11:18:30
From: The Rev Dodgson
ID: 1805074
Subject: re: Nearest point algorithm?

Had a look at what scipy has to offer and found:

special.kdtree

which says:
For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.

So not very helpful.

Reply Quote