
I like the exploration of this method, but I would have liked to see an actual comparison of the false positives. Bad data can be acceptable in statistical analysis, but if you are showing someone a list of their ratings or the actors who were in the latest Kevin Bacon movie, false positives have a much stronger impact.

Is there any chance that the bloom filter could be used as a short-circuit filter, with the m2m join still run as a follow-up to filter out the false positives? If the query optimizer can take advantage of that, then you could likely balance the size and cost of the bloom field.



This is exactly right. Bloom filters used in this way should generate candidates, not results, so that you do less total work when confirming membership in the set is expensive. The best use cases check the opposite, e.g. Kevin Bacon was definitely not in this movie, since there are no false negatives.
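
To make that concrete, here is a rough sketch in Python of the "candidates, then confirm" pattern. It is not what the article does: the BloomFilter class, was_in_movie, and exact_lookup are all names I made up for illustration, the cast list is sample data, and exact_lookup is just a stand-in for the real m2m join.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; the sizes are arbitrary assumptions, not tuned values."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely absent"; True only means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def was_in_movie(actor, bloom, exact_lookup):
    """Short-circuit on the filter, then confirm candidates exactly."""
    if not bloom.might_contain(actor):
        return False  # safe to skip the expensive check: no false negatives
    return exact_lookup(actor)  # weeds out any false positives


# Hypothetical sample data standing in for one movie's cast.
cast = {"Kevin Bacon", "Tom Hanks", "Bill Paxton"}
bloom = BloomFilter()
for name in cast:
    bloom.add(name)

lookups = []  # records which queries reached the expensive path


def exact_lookup(actor):
    lookups.append(actor)
    return actor in cast


print(was_in_movie("Kevin Bacon", bloom, exact_lookup))
print(was_in_movie("Meryl Streep", bloom, exact_lookup))
```

The final answer is always exact, because every candidate is confirmed. The filter only decides how often you pay for the confirmation, which is the size versus cost trade-off mentioned above.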



