Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Hi, I'm the author of this article. I'm grateful for the positive comments! I plan to explore FALCONN (pointed to by gajjanag) and the beautiful article of Bartosz Ciechanowski pointed out by lgvld.

My original motivation for writing this was to explain (in a follow-up article) an ANN algorithm I published in 2010: https://epubs.siam.org/doi/pdf/10.1137/1.9781611973075.94

I never wrote that up (as a non-academic article), but the fact that people like this article is a nudge for me to do that.

The above paper was the first LSH algorithm which deterministically returns all close-enough points and guarantees no false positives for distant-enough points. I've heard there are also other algorithms with that property published since. If I have time, I would like to one day see how that algorithm compares against the others in Erik Bernhardsson's benchmarks.



Forgive my lack of knowledge, but I was lost the instant I found the first equation :(

H is the hash function and x are coordinates. But what is R Z?


R is the set of real numbers.

Z is the set of integers {.., -2, -1, 0, 1, 2, ...}.

Z is called "Z" due to influential German mathematicians who chose the letter to stand for "zahlen" ("number" in German).

As a fun fact, the letter "N" stands for "natural numbers," meaning either positive integers, or positive integers and zero. However, that letter (N) is _not_ standardized because mathematicians disagree about whether or not to include zero. When I separately told a couple of my professors this in grad school, they didn't believe me — one of them thinking it was crazy for zero to be excluded, the other thinking it was crazy to include it!

https://en.wikipedia.org/wiki/Natural_number


Thank you!


@author a question if you don't mind. I saw this article and it was very interesting but I get the impression (or heard somewhere) that LSH is actually more suitable to much higher dimensional problems than 2D. Put another way, I think LSH is perhaps inappropriate for a two-dimensional case like this example where you could solve it more simply, but that would not scale to higher dimensions easily or at all, which is where LSH would work. Is this clear, and am I correct in my understanding?

Thanks


This is a really well-done article! Very approachable for a non-math-expert. If it is useful at all, I noticed while reading that there are two misspellings. "moives" for "moves" and "tht" for "that". I am looking forward to more of your articles.


Is there any blog where I can learn more on this fascinating area? Especially new discoveries?

I've read the part in the book of "Mining Massive Datasets and lots of old notes about "The Probabilistic Method".

But new things?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: