Applied math / computing question, find in 2-D

Date: 19/08/2018 03:19:55

From: mollwollfumble

ID: 1264841

Subject: Applied math / computing question, find in 2-D

Hypothetically, suppose I have a collection of 10,000 random points in 2-D (or more).

And suppose I want to quickly find which of those is closest to a new random point. Let’s say there are a million (or more) such new random points.

What’s the fastest computer algorithm to do this? Obviously I don’t want to compare every one of the 10,000 with every new random point.

I could pre-sort the 10,000 points on x, and then work outwards from the closest x value until I find a y value close enough. But I think there ought to be a much faster algorithm than that.

Perhaps a quadtree – whatever that is?

I’m sort of toying with the idea of sorting the 10,000 points simultaneously on x and y into a 2-D array P(i,j) such that x of P(i,j) >= x of P(i-1,k) for all values of j and k, and such that y of P(i,j) >= y of P(k,j-1) for all values of i and k.

The idea is that this would allow me to go straight from the x and y of the new random number to the i,j of P(i,j) and work out from there.

It wouldn’t work for 10,000 nonrandom points, such as x(i)=i, y(j)=j for i,j=1 to 10,000, but I think I’m on the right track for 10,000 random points, (or semi-random, or gridded or chaotic). What do you think?

Reply Quote

Date: 19/08/2018 05:21:45

From: mollwollfumble

ID: 1264846

Subject: re: Applied math / computing question, find in 2-D

mollwollfumble said:

Hypothetically, suppose I have a collection of 10,000 random points in 2-D (or more).

And suppose I want to quickly find which of those is closest to a new random point. Let’s say there are a million (or more) such new random points.

What’s the fastest computer algorithm to do this? Obviously I don’t want to compare every one of the 10,000 with every new random point.

I could pre-sort the 10,000 points on x, and then work outwards from the closest x value until I find a y value close enough. But I think there ought to be a much faster algorithm than that.

Perhaps a quadtree – whatever that is?

I’m sort of toying with the idea of sorting the 10,000 points simultaneously on x and y into a 2-D array P(i,j) such that x of P(i,j) >= x of P(i-1,k) for all values of j and k, and such that y of P(i,j) >= y of P(k,j-1) for all values of i and k.

The idea is that this would allow me to go straight from the x and y of the new random number to the i,j of P(i,j) and work out from there.

It wouldn’t work for 10,000 nonrandom points, such as x(i)=i, y(j)=j for i,j=1 to 10,000, but I think I’m on the right track for 10,000 random points, (or semi-random, or gridded or chaotic). What do you think?

A quick test confirmed a speed up in 1-D searching for a value x along the x axis, by fitting a cubic to the inverse relation i(x). Given x, find i from a cubic, then shuffle along the number line to get the closest. That’s faster than a more familiar binary search.

Reply Quote

Date: 19/08/2018 09:44:07

From: The Rev Dodgson

ID: 1264896

Subject: re: Applied math / computing question, find in 2-D

Interesting question.

There must be a lot of research on this sort of question, surely.

My immediate approach would be to divide the search region into 100 × 100 sub-regions and allocate each of the 100,000 points to a sub-region, then work through the new points searching just the sub-region the new point was in, and if necessary its immediate neighbours.

I guess that’s pretty similar to your way, except my regions are fixed size and may have zero or multiple points, whereas yours are irregular but each has exactly one point.

How important is it to find the exact closest point?

Reply Quote

Date: 19/08/2018 09:55:21

From: The Rev Dodgson

ID: 1264899

Subject: re: Applied math / computing question, find in 2-D

The Rev Dodgson said:

whereas yours are irregular but each has exactly one point.

Are your boundaries defined by a cubic?

I guess you’ll have some variation in the number of points per region as well, although much less than for my way.

I think optimisation of finite element analysis (lots of simple elements vs fewer complex elements) is related to this question, although I’m not sure how.

Reply Quote

Date: 19/08/2018 12:03:28

From: mollwollfumble

ID: 1264938

Subject: re: Applied math / computing question, find in 2-D

The Rev Dodgson said:

The Rev Dodgson said:
whereas yours are irregular but each has exactly one point.

Are your boundaries defined by a cubic?

I guess you’ll have some variation in the number of points per region as well, although much less than for my way.

I think optimisation of finite element analysis (lots of simple elements vs fewer complex elements) is related to this question, although I’m not sure how.

> How important is it to find the exact closest point?

You’ve spotted the issue that’s the critical one. The closest point must be exact

> whereas yours are irregular but each has exactly one point.

That’s not bad actually, even if there are ten points in one region and no points in several adjacent regions. It’s another option. I was thinking of exactly one point per region.

> Are your boundaries defined by a cubic?

In the case I’m thinking of, the boundaries are straight lines. But I’d like a boundary that has an infinite domain in case in future I want to try a gaussian distribution.

> lots of simple elements vs fewer complex elements

One similarity is that if the 100,000 points are generated from a probability distribution function (either a known one or an unknown one), then fitting a higher order curve to that pdf could speed up finding the points a lot.

I’m getting the feeling now that the standard way to solve this is by using what is called a k-d tree https://en.wikipedia.org/wiki/K-d_tree. It’s a binary tree that splits the domain on the x axis, then y axis, then x axis again etc. That splits the search time down from circa 100,000 / 2 = 50,000 down to a minimum of circa log _2_(100,000) = 17, depending on how well the tree is balanced. I’ve never used a k-d tree but it looks promising.

Reply Quote

Date: 20/08/2018 11:46:24

From: The Rev Dodgson

ID: 1265267

Subject: re: Applied math / computing question, find in 2-D

mollwollfumble said:

I’m getting the feeling now that the standard way to solve this is by using what is called a k-d tree https://en.wikipedia.org/wiki/K-d_tree. It’s a binary tree that splits the domain on the x axis, then y axis, then x axis again etc. That splits the search time down from circa 100,000 / 2 = 50,000 down to a minimum of circa log _2_(100,000) = 17, depending on how well the tree is balanced. I’ve never used a k-d tree but it looks promising.

K-D trees sort of ring a bell, but I don’t think I’ve ever used one.

Not for anything useful anyway.

I’ll have a look and see if they are in Scipy, if I remember.

Reply Quote

Date: 20/08/2018 11:47:52

From: The Rev Dodgson

ID: 1265268

Subject: re: Applied math / computing question, find in 2-D

I remembered:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html

Reply Quote

Date: 20/08/2018 15:13:47

From: The Rev Dodgson

ID: 1265285

Subject: re: Applied math / computing question, find in 2-D

Not only does Scipy have KD-Tree functions, I have Excel functions to call them direct from a spreadsheet, via xlwings.

I’ll post a download link later.

Reply Quote

Date: 20/08/2018 15:40:15

From: The Rev Dodgson

ID: 1265297

Subject: re: Applied math / computing question, find in 2-D

The Rev Dodgson said:

Not only does Scipy have KD-Tree functions, I have Excel functions to call them direct from a spreadsheet, via xlwings.

I’ll post a download link later.

Just tried it out searching for 100,000 nearest neighbours from a set of 100,000 2D coordinates. The faster version took about 0.8 s, so for 1 million points it would just be a few seconds.

Reply Quote

Date: 20/08/2018 17:12:17

From: The Rev Dodgson

ID: 1265332

Subject: re: Applied math / computing question, find in 2-D

Link:

Python Space Funcs

Reply Quote

Date: 20/08/2018 17:31:54

From: Michael V

ID: 1265333

Subject: re: Applied math / computing question, find in 2-D

The Rev Dodgson said:

Link:

Python Space Funcs

Can that be done in Fortran?

Reply Quote

Date: 20/08/2018 17:37:24

From: sibeen

ID: 1265336

Subject: re: Applied math / computing question, find in 2-D

Michael V said:

The Rev Dodgson said:

Link:

Python Space Funcs

Can that be done in Fortran?

Can you still remember Fortran?

Reply Quote

Date: 20/08/2018 17:40:14

From: Michael V

ID: 1265337

Subject: re: Applied math / computing question, find in 2-D

sibeen said:

Michael V said:

The Rev Dodgson said:

Link:

Python Space Funcs

Can that be done in Fortran?

Can you still remember Fortran?

I was preempting Moll. He still seems to like doing stuff in Fortran.

I never got the hang of it. Pencilled cards that became punch-cards that wouldn’t compile is my memory.

Reply Quote

Date: 20/08/2018 18:12:12

From: mollwollfumble

ID: 1265350

Subject: re: Applied math / computing question, find in 2-D

The Rev Dodgson said:

The Rev Dodgson said:

Not only does Scipy have KD-Tree functions, I have Excel functions to call them direct from a spreadsheet, via xlwings.

I’ll post a download link later.

Just tried it out searching for 100,000 nearest neighbours from a set of 100,000 2D coordinates. The faster version took about 0.8 s, so for 1 million points it would just be a few seconds.

That is absolutely fantastic!

Reply Quote

Date: 20/08/2018 18:13:19

From: The Rev Dodgson

ID: 1265351

Subject: re: Applied math / computing question, find in 2-D

Michael V said:

The Rev Dodgson said:

Link:

Python Space Funcs

Can that be done in Fortran?

It’s probably all Fortran under the surface anyway.

Reply Quote