Hacker News | dotgov's comments

Caching Parquet headers/footers sounds super interesting. Can you say more about how you implemented it?


Currently there's nothing in my headers, but the footer is straightforward. There's the schema, row group metadata, some statistics, byte offsets for each column in a group, page index, etc. It's everything you'd want if you wanted to reject a query outright or, if necessary, query extremely efficiently.

Min/max stats for a column are a huge win because I pre-encode any low-cardinality strings into integers. This means I can skip entire row groups without ever touching S3, just with that footer information, and if I don't have it cached I can read it and skip decoding anything that doesn't have my data.
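
To illustrate the row-group skipping described above, here is a minimal sketch in plain Python. It is not the commenter's implementation: the `RowGroupStats` type, the `row_groups_to_read` helper, and the string-to-integer dictionary are all hypothetical names invented for the example, and a real system would read these values from the parsed Parquet footer.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical pre-parsed footer entry: one column in one row group."""
    row_group: int
    min_value: int
    max_value: int


def row_groups_to_read(stats, target):
    """Return indices of row groups whose [min, max] range could hold target."""
    return [s.row_group for s in stats if s.min_value <= target <= s.max_value]


# Assumed encoding of a low-cardinality string column into integers.
codes = {"error": 0, "info": 1, "warn": 2}

# Assumed cached footer statistics for that column, one entry per row group.
footer_stats = [
    RowGroupStats(row_group=0, min_value=1, max_value=1),
    RowGroupStats(row_group=1, min_value=0, max_value=2),
    RowGroupStats(row_group=2, min_value=1, max_value=2),
]

# Only row groups that might contain "error" need to be fetched and decoded.
survivors = row_groups_to_read(footer_stats, codes["error"])
print(survivors)
```

An empty result means the query can be rejected from cached metadata alone, with no request to object storage.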

Footers can get quite large in one sense: tens to hundreds of KB for a very large file. But that's obviously tiny compared to a multi-GB Parquet file, and the data can compress extremely well for a second- or third-tier cache. You can store thousands of these pre-parsed in memory, no problem, and store tens of thousands more on disk.
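
One way to picture the tiering is a small LRU of parsed footers backed by a compressed second tier. The sketch below is an assumption-heavy illustration, not the commenter's design: `FooterCache` is a hypothetical class, the second tier is a plain dict standing in for files on disk, and footers are modeled as JSON-serializable dicts rather than real Thrift-encoded Parquet metadata.

```python
import json
import zlib
from collections import OrderedDict


class FooterCache:
    """Tier 1: parsed footers in an in-memory LRU. Tier 2: compressed bytes.

    The tier-2 dict is a stand-in for a directory of files on local disk.
    """

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.hot = OrderedDict()  # key -> parsed footer (dict)
        self.cold = {}            # key -> zlib-compressed JSON bytes

    def put(self, key, footer):
        self.hot[key] = footer
        self.hot.move_to_end(key)
        # Demote least recently used entries once the hot tier is full.
        while len(self.hot) > self.max_in_memory:
            old_key, old_footer = self.hot.popitem(last=False)
            self.cold[old_key] = zlib.compress(json.dumps(old_footer).encode())

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        blob = self.cold.pop(key, None)
        if blob is None:
            return None  # caller falls back to reading the footer from S3
        footer = json.loads(zlib.decompress(blob))
        self.put(key, footer)  # promote back into the hot tier
        return footer


cache = FooterCache(max_in_memory=2)
cache.put("a.parquet", {"row_groups": 3})
cache.put("b.parquet", {"row_groups": 5})
cache.put("c.parquet", {"row_groups": 1})  # demotes a.parquet to tier 2
```

Keeping tier 1 pre-parsed avoids repeated decoding on the hot path, while tier 2 trades a decompress-and-parse step for a much smaller footprint.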

I've spent 0 time optimizing my footers currently. They can get smaller than they are, I assume, but I've not put much thought into it. In fact, I don't have to assume: I know that my own custom metadata overlaps with the existing Parquet stats and I just haven't bothered to deal with it. TBH there are a bunch of layout optimizations I've yet to explore. Using headers would obviously have some benefits (streaming), whereas right now I do a sort of "attempt to grab the footer from the end in chunks until we find it lol". But it doesn't come up because... caching. And there are worse things than a few spurious Range requests.
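
For readers unfamiliar with the layout: a Parquet file ends with the footer metadata, then a 4-byte little-endian footer length, then the magic bytes `PAR1`. A reader can therefore grab a speculative chunk from the end and issue one more range read only if the footer turns out to be larger than the guess. The sketch below shows that variant; it is not necessarily the commenter's chunked approach, and `read_range` is a hypothetical callable standing in for an S3 ranged GET.

```python
import struct

MAGIC = b"PAR1"
TRAILER_LEN = 8  # 4-byte little-endian footer length + 4-byte magic


def fetch_footer(read_range, file_size, guess=64 * 1024):
    """Return the raw footer bytes using at most two range reads.

    read_range(start, end) is a hypothetical callable returning bytes
    [start, end) of the object, e.g. a wrapper around an S3 ranged GET.
    """
    start = max(0, file_size - guess)
    tail = read_range(start, file_size)
    if len(tail) < TRAILER_LEN or tail[-4:] != MAGIC:
        raise ValueError("not a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    footer_start = file_size - TRAILER_LEN - footer_len
    if footer_start < 0:
        raise ValueError("corrupt footer length")
    if footer_start >= start:
        # The speculative read already covers the whole footer.
        offset = footer_start - start
        return tail[offset:offset + footer_len]
    # Footer is larger than the guess: fetch only the missing prefix.
    missing = read_range(footer_start, start)
    return missing + tail[:-TRAILER_LEN]


# Demo against an in-memory fake file (contents are made up).
footer = b"F" * 50
data = MAGIC + b"x" * 100 + footer + struct.pack("<I", len(footer)) + MAGIC
calls = []


def read_range(start, end):
    calls.append((start, end))
    return data[start:end]


small_guess = fetch_footer(read_range, len(data), guess=16)
large_guess = fetch_footer(read_range, len(data), guess=1024)
```

With a cached footer neither read happens at all, which is why the occasional extra range request rarely matters in practice.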


Have you tried AWS S3 Tables, which is a managed Iceberg service?


I haven't. I'm sort of aware of it, but I guess I prefer to just have tight control over the protocol/data layout. It's not that hard and it gives me a ton of room to make niche optimizations. I doubt I'd get the same performance if I used it, but I could be wrong. Usually the more you can push your use case into the protocol, the better.


Like most managed services, it is a trade-off of control vs. ease of operation. And like everything with S3, it scales to absurd levels, with 10,000 tables per table bucket.


Makes sense, and tbh there's a very good chance that I'd consider it if I were trying to stay more "standard", but I'd have to learn more.


National Labs are closed over the holidays in the USA too.


There is a word in German: "vogelfrei". It means "free as a bird". Sounds romantic, but what it actually means is that the person who is free as a bird does not enjoy the protections of the law and hence there are no repercussions for killing them.


He lived in extreme poverty, was neglected by his parents, got sexually assaulted by his siblings and was forced to have sex with his stepsister when he was seven. He shot himself twice, once trying to kill himself. The world he lived in was bad.


GP is talking about Mathis Milton. You're quoting a Wikipedia article about Quintin Jones. Obviously, they are not the same person.


Whoops my mistake. I’ll try to make my point with Mathis Milton as well:

“Milton dropped out of school after the 8th grade and worked as a cook, mechanic’s helper, and laborer before his arrest. Milton abused alcohol and drugs as a teenager.”

In another article he was being referred to as having “low IQ”. Combined with the fact that the murders happened in a crack house, I infer that he probably lived in a similar environment and had a comparable upbringing and childhood.


What does that have to do with the killer not taking any responsibility, accusing other people of lacking humanity and saying to a victim that she was just "at the wrong place at the wrong time" with 0 other admission of guilt? I get that we could maybe, maybe stretch the argument that he had a rough upbringing (though dropping out of school isn't that extreme) to "explain" the circumstances of the crime... but not that he was still seemingly remorseless after having decades on death row to reflect on his acts.


He does show signs of remorse: “To Melanie, I never meant to hurt you.”

With regard to the guilt question: I did make the honest mistake of misquoting the history of another inmate before. What stands out to me, and what I wanted to point out, is this: I don’t think it is a coincidence that both of those inmates grew up in an environment of poverty. We do have the specifics of what happened in childhood in the first case; the second I can only infer from the teenage drug use and from growing up in a certain environment, both of which I would expect to correlate highly with childhood trauma.

This is not to say that Milton is not guilty of a crime, in the legal sense. But to expect someone who has been beaten, abused and has gotten the short end of the stick all their life to take full individual responsibility for something he didn’t have a choice in does seem inhumane to me. In a way like Melanie, he was at the wrong place at the wrong time.

And what purpose would such an admission of guilt serve? Would it make it easier to believe that tragedies like this are fully the individual's responsibility and that the environment the individual grows up in has no influence on the outcomes?


I don't dispute that Milton's life growing up was probably tough, but your comment upthread is making specific factual claims (e.g., that he was raped by his sister/siblings), which are to the best of our knowledge not true (because you confused the subject of the thread). We should avoid making false statements.


And? That describes lots of people who don't go out and commit heinous crimes.


Lol now you guys are literally just guessing about “how bad he had it” to justify the sociopathy


That's a different person, although his world was probably not much better.


The dichotomy of a sterile looking list mixed with the very human last words is somehow very touching.

What sticks out to me the most: the column with the title “race” I find incredibly disturbing.


All the strife and bad blood and still his happiness needs to come out of a needle. Maybe what the previous posters wanted to point out is that if he’d cared more about his fellow man, he’d have made connections that are less shallow than the ones based on his excessively pompous prose or his other performative exploits. Maybe those visitors wouldn’t have stayed visitors.

Hard to say, personally my experience is that the junkie lifestyle seems less hollow and more appealing than it is, no matter the quality of the substance.


The DOE Office of Science funding is generating papers as results. If my interpretation is correct, the technology discussed is _potentially_ useful.

To me the downside of marketing results aggressively to justify one’s funding is that the quality signal can become inversely correlated. Is this valuable research or is someone who knows to play the DOE game inflating a metric to secure their next round of funding and/or expand their turf? Does this negatively impact researchers who do not have access to a comparable marketing apparatus?


They clearly describe having built an experimental device and outline expectations for improvement of the process. All technology is "potentially useful," until someone uses it.

As far as "marketing," the labs are contractually required by the government to do this. Average 'impact rating' and so forth are part of the performance evaluations. As far as "playing the DOE game," there are a lot of voices in the critical path of getting significant funding from the Office of Science, many of them generally healthily skeptical. I'm not aware of very many charlatans achieving high-level management positions or controlling significant funding.


This hit the nail on the head for me.

I entered the workforce a bit earlier than 2015 and I would have expected HR to be a neutral arbiter as well. My experience at different employers has proven my expectations to be wrong or at least naive. On every occasion I needed to interact with HR, information (e.g., applicable laws) was selectively filtered to make it look like my only option was the one that was beneficial to the business.

YMMV, but for me this is good advice and a welcome reminder.


Because a tech worker is highly skilled labor that would leave the country and would therefore no longer be available to that country's labor pool, which arguably will have negative long-term economic effects.

This problem occurs precisely because foreign tech workers are being treated differently. You are also glossing over the fact that those tech workers are on a path to becoming citizens.


To my knowledge and much to my chagrin, TSA Pre is not available for H1B holders (only lawful permanent residents, i.e. green card holders). Whether that’s discriminatory or not is in the eye of the beholder.


PreCheck itself bought from TSA is only available to citizens/nationals/LPRs, but nonresident citizens of a handful of other countries can get Global Entry from CBP, which includes PreCheck benefits as well as expedited immigration/customs handling.

https://www.cbp.gov/travel/trusted-traveler-programs/global-...

