> The goal was to come up with a good regular expression to validate URLs in user input, and not to match any URL that browsers can handle (as per the URL Standard).
WTF? What is "validation" supposed to be good for if it doesn't actually validate what it claims to? Exactly this mentality of making up your own rules instead of implementing the standard is what causes all these interoperability nightmares! If you claim to accept URLs, then accept URLs, all of them, and reject non-URLs, all of them. There is no reason to do anything else, other than laziness maybe, and even then you are lying if you claim to be validating URLs: you are not. If you say you accept a URL and I paste a URL, your software is broken if it then rejects that URL as invalid.
This does not apply to intentionally selecting only the subset of URLs that is applicable in a given context, of course: if the URL is to be retrieved by an HTTP client, it's perfectly fine to reject non-HTTP URLs. But any kind of "nobody is going to use that anyhow" is not a good reason. In particular, that kind of rejection most certainly should not happen in the parser, as the parser usually works at the wrong level of abstraction and is therefore likely to give inconsistent results.
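To make that concrete, here is a sketch (my own, not from the linked article; `accept_for_http_client` is a made-up name) of subsetting done above the parser: run a general-purpose URL parser first, then apply the context-specific scheme policy as a separate step.

```python
# Sketch: restrict to HTTP(S) URLs by parsing first, then filtering
# by scheme, instead of baking the restriction into a validation regex.
from urllib.parse import urlparse

def accept_for_http_client(url: str) -> bool:
    """Accept any URL the parser understands, then filter by scheme."""
    parts = urlparse(url)
    # The subset decision lives here, above the parser.
    return parts.scheme in ("http", "https") and bool(parts.netloc)

print(accept_for_http_client("https://example.com/path"))  # True
print(accept_for_http_client("ftp://example.com/file"))    # False: wrong scheme, not "invalid URL"
```

The point is that a rejected `ftp:` URL is rejected as out of scope, not mislabeled as "not a URL".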
> The RFC does not reflect reality either (which, ironically, is what you seem to be complaining about).
Well, or reality does not match the RFC?
> If you’re looking for a spec-compliant solution, the spec to follow is http://url.spec.whatwg.org/.
A spec for a formal language that doesn't contain a grammar? The world is getting crazier every day ...
> That doesn’t mean there aren’t any situations in which I need/want to blacklist some technically valid URL constructs.
Yeah, but blocking IPv4 literals from certain address ranges seems like a stupid idea nonetheless. Good software should accept any input that is meaningful to it and not a security problem. And as I said above, such rejection most certainly should not happen in the parser.
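If you really do need to block certain address ranges (say, to guard against SSRF), that check belongs in a policy layer after parsing, not inside the parser. A hedged Python sketch (`host_is_blocked` is a hypothetical helper of mine):

```python
# Sketch: address-range policy applied after parsing, as a separate
# step, rather than wired into the URL parser itself.
import ipaddress
from urllib.parse import urlparse

def host_is_blocked(url: str) -> bool:
    host = urlparse(url).hostname
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; policy for hostnames goes elsewhere
    return addr.is_private or addr.is_loopback

print(host_is_blocked("http://10.0.0.5/"))       # True
print(host_is_blocked("http://93.184.216.34/"))  # False
```

Because the parser stays general, every component sees the same parse result; only the policy layer differs per context.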
> Doesn’t matter – if there’s a discrepancy between what a document says and what implementors do, that document is but a work of fiction.
Yes and no. When there is a de-facto standard that just doesn't happen to match the published standard, yeah, sure. Otherwise, bug compatibility is a terrible idea and should be avoided as much as possible; many security problems have resulted from it.
> This is not a parser.
Well, even worse then. Manually integrating semantics from higher layers into parsing machinery (which is what that regex is, never mind that it doesn't capture any of the syntactic elements within the parsing automaton) is both extremely error-prone and terrible for maintainability.
edit:
For the fun of it, I just had a look at the "winning entry" (diegoperini). Unsurprisingly, it's broken. It was trivial to find URLs that it rejects even though you most certainly don't intend to reject them, for exactly the reasons pointed out above.
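I won't reproduce that regex here, but the failure mode is easy to demonstrate with a toy regex of my own that makes the same kind of hard-coded assumption (here: a host must contain a dot). It rejects URLs that a standards-based parser handles without complaint.

```python
# Toy regex (my own, NOT the linked entry) with a baked-in assumption:
# the host must be dotted. Compare against urllib's parser.
import re
from urllib.parse import urlparse

TOY = re.compile(r"^https?://[^\s/]+\.[^\s/]+\S*$")  # requires a dotted host

for url in ("http://localhost/", "http://example.museum/", "http://[::1]/"):
    # urlparse extracts a hostname from all three; the toy regex
    # rejects "localhost" and the IPv6 literal.
    print(url, bool(TOY.match(url)), urlparse(url).hostname)
```

Any regex built from such assumptions will fail the same way on some class of perfectly valid URLs.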