
Similarly, Google purchased reCAPTCHA and harnessed its stream of human interaction to, among other things, classify their Street View content (e.g. select the stop lights, bridges, license plates, etc.).


I've always wondered: wouldn't they need to have already classified those captchas for them to determine whether the user has made the right selections? If so, doesn't that defeat its "real" purpose of getting people to do that classification work for them?


IIRC, back when it was text, you were shown two words and could type anything for one of them (typically the easier-to-read word); the other was a word they'd intentionally blurred a bit and used for the actual captcha check.


That's why you have to do multiple tasks in one verification. Some are against a known ground truth and used as verification, but you don't know which one.
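To make that concrete, here is a rough sketch of how such a check could work. It is not Google's actual implementation; `Task`, `known_label`, and `verify` are hypothetical names made up for illustration. The user is graded only on the tasks that have a known answer, and their answers to the rest are kept as candidate labels if they pass.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Task:
    image_id: str
    # Hypothetical field: None means no ground truth exists for this image yet.
    known_label: Optional[bool]


def verify(tasks: List[Task], answers: List[bool]) -> Tuple[bool, Dict[str, bool]]:
    """Grade the user on control tasks only; harvest answers for unknown tasks."""
    pairs = list(zip(tasks, answers))
    control = [(t, a) for t, a in pairs if t.known_label is not None]
    if not control:
        # Nothing to grade against, so the user cannot be verified.
        return False, {}
    passed = all(a == t.known_label for t, a in control)
    # Only trust the unknown-task answers if the control tasks were answered correctly.
    new_labels = (
        {t.image_id: a for t, a in pairs if t.known_label is None} if passed else {}
    )
    return passed, new_labels


tasks = [Task("a", True), Task("b", None), Task("c", False)]
print(verify(tasks, [True, True, False]))
```

Because the user can't tell "b" apart from the control images, they have to answer everything honestly to pass.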


It would appear that asking multiple users and taking the consensus does the trick.
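A minimal sketch of what taking the consensus might look like, assuming each image collects yes/no votes from independent users. The function name and the `min_votes` and `min_agreement` thresholds are my own assumptions, not anything reCAPTCHA has published.

```python
from collections import Counter
from typing import List, Optional


def consensus(
    votes: List[bool], min_votes: int = 5, min_agreement: float = 0.8
) -> Optional[bool]:
    """Return the majority label once enough users agree, otherwise None."""
    if len(votes) < min_votes:
        return None  # not enough independent answers yet
    label, count = Counter(votes).most_common(1)[0]
    # Accept the label only if a large enough share of users chose it.
    return label if count / len(votes) >= min_agreement else None


print(consensus([True, True, True, True, False]))
print(consensus([True, True, True, False, False]))
```

An image that never reaches the agreement threshold just stays unlabeled and keeps getting shown to more users.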


I believe this is the answer.

It works especially well here because there's no easy way for multiple bad actors to collude and generate a false consensus.


They can show you some objects that they already classified and some objects they are unsure about.


Get enough people to do the same thing and take the consensus.


Good podcast on the creation of reCAPTCHA: https://pca.st/zze1abc4


hCaptcha does the same; they offer a data labeling service.



