> In particular, your approach appears to do 3 (and some change) passes over each buffer: 1) read up to '\n' (just the first line), 2) UTF-8 validation, 3) make lowercase and 4) split on whitespace.
I should have waited until I had more time to respond, but it really should only be 2, 3, and 4.
I tried unsafe/unchecked UTF-8 conversion and, yes, it gives a modest bump, but I wanted to do it without unsafe. And 3 and 4 are pretty fast for what they are: both work on bytes, and the str as_bytes transmute is virtually cost-free from what I can tell.
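For anyone following along, here is a rough sketch of what passes 2, 3, and 4 can look like in safe Rust. It is not the actual code under discussion: the function name `lowercase_words` and the choice to return a `Vec<&str>` are made up for illustration, and it assumes ASCII-only lowercasing is acceptable.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: validate, lowercase, and split one buffer
/// using only safe code. Returns the words as slices into `buf`.
fn lowercase_words(buf: &mut [u8]) -> Result<Vec<&str>, Utf8Error> {
    // Pass 2: checked UTF-8 validation (the safe counterpart of
    // `from_utf8_unchecked`); fails on invalid input.
    let text = std::str::from_utf8_mut(buf)?;

    // Pass 3: lowercase in place. Only ASCII letters change, so
    // multi-byte sequences are left untouched and the data stays valid UTF-8.
    text.make_ascii_lowercase();

    // Pass 4: split on ASCII whitespace. Runs of whitespace are
    // collapsed, so no empty words are produced.
    let text: &str = text;
    Ok(text.split_ascii_whitespace().collect())
}
```

Calling it on a buffer holding `The quick  the\tQUICK\nfox` should give `the`, `quick`, `the`, `quick`, `fox`, and a buffer with invalid UTF-8 returns an `Err` instead of panicking. Swapping the first step for the unchecked variant is where the unsafe block would go.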