
> In particular, your approach appears to do 3 (and some change) passes over each buffer: 1) read up to '\n' (just the first line), 2) UTF-8 validation, 3) make lowercase and 4) split on whitespace.

I really should have waited until I had more time to respond, but it should really only be passes 2, 3, and 4.

I tried the unsafe/unchecked route for UTF-8 validation, and yes, it is a modest bump, but I wanted to do it without unsafe. Passes 3 and 4 are pretty fast for what they are: they both work on bytes, and the str::as_bytes transmute is virtually cost-free from what I can tell.
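For readers following along, here is a minimal safe-Rust sketch of passes 2 to 4 on a single buffer. It is not the poster's actual code; the function name count_words and the HashMap-based tally are assumptions for illustration. It validates once with the checked std::str::from_utf8_mut, lowercases ASCII in place (which keeps the buffer valid UTF-8), and splits on ASCII whitespace, all of which operate on the underlying bytes.

```rust
use std::collections::HashMap;
use std::str::Utf8Error;

/// Hypothetical helper: tally lowercase words in one buffer, with no `unsafe`.
fn count_words(buf: &mut [u8], counts: &mut HashMap<String, usize>) -> Result<(), Utf8Error> {
    // Pass 2: checked UTF-8 validation. The unsafe alternative,
    // `std::str::from_utf8_unchecked_mut`, would skip this scan.
    let text: &mut str = std::str::from_utf8_mut(buf)?;

    // Pass 3: lowercase ASCII bytes in place; non-ASCII bytes are left
    // untouched, so the string remains valid UTF-8.
    text.make_ascii_lowercase();

    // Pass 4: split on ASCII whitespace (a byte-level scan) and tally.
    for word in text.split_ascii_whitespace() {
        *counts.entry(word.to_owned()).or_insert(0) += 1;
    }
    Ok(())
}

fn main() {
    let mut counts = HashMap::new();
    let mut buf = b"The quick brown fox\njumps over the lazy dog THE end\n".to_vec();
    count_words(&mut buf, &mut counts).expect("buffer should be valid UTF-8");
    println!("the = {}", counts["the"]); // prints "the = 3"
}
```

Validating once up front is what lets the later passes stay in safe &str APIs without re-checking each word.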
