Hacker News | ajfriend's comments

h3o-zip is really impressive! I've been wanting to play around with it more, and I've been meaning to ask you if you have any good references for that encoding approach. I understand how it works in h3o-zip, but I'd be interested to know more about where else that approach has been used.


I’m pretty sure the approach isn’t that novel, but I rediscovered it on my own while exploring several compression approaches (generic compression with a tailored dictionary like zstd, integer packing/compression, compressed bitmaps, ...). I probably have my notes about these somewhere.

As such, I don’t have any names or papers to give you, nor can I point you to similar applications. But I would also be interested ^^

But don’t hesitate to reach out if you work on something similar and want to discuss it.


I have a project that's still very much at the experimental stage, where I try to get something similar to this pipe syntax by allowing users to chain "SQL snippets" together. That is, you can use standalone statements like `where col1 > 10` because the `select * from ...` is implied. https://ajfriend.github.io/duckboat/

    import duckboat as uck

    csv = 'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv'

    uck.Table(csv).do(
        "where sex = 'female' ",
        'where year > 2008',
        'select *, cast(body_mass_g as double) as grams',
        'select species, island, avg(grams) as avg_grams group by 1,2',
        'select * replace (round(avg_grams, 1) as avg_grams)',
        'order by avg_grams',
    )
I still can't tell if it's too goofy, or if I really like it. :)

I write a lot of SQL anyway, so this approach is nice: I find I almost never need to look up function syntax like I would with Pandas, since it is just using DuckDB SQL under the hood, minus the need to write `select * from ...` repeatedly. And when you're ready to exit the data exploration phase, it's easy to gradually translate things back to "real SQL".

The whole project is pretty small, essentially just a light wrapper around DuckDB to do this expression chaining and lazy evaluation.
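Roughly, the chaining idea can be sketched like this. It is a minimal illustration, not duckboat's actual code: the `chain_snippets` helper is hypothetical, and it assumes DuckDB's FROM-first syntax (where `from t where ...` implies `select *`) is accepted for each nested subquery. It only builds a SQL string, so nothing is evaluated until the result is handed to DuckDB.

```python
def chain_snippets(source: str, *snippets: str) -> str:
    """Nest SQL snippets into one query string (hypothetical helper).

    Assumes DuckDB's FROM-first syntax, where `from t where ...`
    implies `select *`. Illustration only, not duckboat's real code.
    """
    query = f"from {source}"
    for snippet in snippets:
        # Wrap the query built so far as a subquery, then append the next snippet.
        query = f"from ({query}) {snippet.strip()}"
    return query


sql = chain_snippets(
    "penguins",  # assumed: a table or view that already exists
    "where year > 2008",
    "select species, count(*) as n group by 1",
    "order by n",
)
print(sql)
```

Because each step only wraps a string, the whole chain stays lazy, and the final string could be passed to something like `duckdb.sql(sql)` once a `penguins` table exists.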


They also only have one type of neighbor. Square grids have 2 neighbor types. Triangular grids have 3.
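A quick way to see the neighbor-type difference is to compare center-to-center distances. This is only a sketch on idealized planar grids (not H3 itself), using axial coordinates for the hexagons; the helper names are made up for illustration.

```python
import math

# Offsets to all cells that touch a given square cell (edges and corners).
SQUARE_NEIGHBORS = [
    (dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
]

# Offsets to the six neighbors of a hexagon in axial (q, r) coordinates.
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_center(q, r):
    """Center of a pointy-top hexagon with unit side length, from axial coordinates."""
    return (math.sqrt(3) * (q + r / 2), 1.5 * r)


def distinct_distances(points):
    """Distinct distances from the origin, rounded to absorb float noise."""
    return sorted({round(math.hypot(x, y), 9) for x, y in points})


square_distances = distinct_distances(SQUARE_NEIGHBORS)
hex_distances = distinct_distances([hex_center(q, r) for q, r in HEX_NEIGHBORS])

print(len(square_distances), square_distances)
print(len(hex_distances), hex_distances)
```

The square grid yields two distinct distances (edge neighbors and corner neighbors), while all six hexagon neighbors sit at the same distance.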


Makes perfect sense. Thanks both


It depends on whether you want to model a point or an area. lat/lng gives you a point, but you often want an area to, for example, count how many people are in that area. A spatial index like H3 provides a grid of area units.


But so do lat long ranges.


You can use those if they work for your application. One downside would be that you're storing 4 numbers compared to a single `int64` index with H3.

You also have to decide how you'll do that binning. Can bins overlap? What do you do at the poles? H3 provides some reasonable default choices for you, so you don't have to worry about that part of your solution design.
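To make the storage comparison concrete, here is a small sketch using only the standard library. The bounding-box values are made up, the cell index is just an example value of the kind seen in H3 documentation, and it assumes the range is stored as four 64-bit floats.

```python
import struct

# A lat/lng range needs four numbers: (min_lat, min_lng, max_lat, max_lng).
bbox = (37.77, -122.42, 37.78, -122.41)  # made-up values for illustration
bbox_bytes = struct.pack("<4d", *bbox)  # four 64-bit floats

# An H3 index fits in a single 64-bit integer.
h3_index = 0x8928308280FFFFF  # example index, assumed for illustration
h3_bytes = struct.pack("<q", h3_index)

print(len(bbox_bytes), len(h3_bytes))
```

With 32-bit floats the range would shrink to 16 bytes, at the cost of coordinate precision, so the exact ratio depends on how you choose to store the range.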


...and use H3 instead! https://h3geo.org/


Very different use case -- ZIPs/ZCTAs have some semblance of population normalization


If you care about that and have a data source, you can add, for example, population density per H3 cell as part of your analysis. That has the additional benefit of denoting this quantity of interest explicitly, rather than relying on some implicitly assumed correlation which may not be true.


Hey AJ, this is almost on topic, do you know of a more up to date version of the dataset you used on the blog post release for H3 v4.0.0 [1]? They stopped updating in Oct 2023. Thanks! [1] https://data.humdata.org/dataset/kontur-population-dataset


I don't. And maybe I should have emphasized "and have a data source" more, since it's doing a lot of the heavy lifting in my statement :)


Not necessarily true. The population isn't balanced at all between many of them. Census units are.


Absolutely this. Use other Census areal units if you can and ZCTAs only if you have to.


What H3 do I belong to if my house is split between three different ones, pretty much equally? Any/all of them?


You take a smaller H3 :-) The maximum area of a resolution 15 H3 cell is about 1 square meter, so it's unlikely to split a house in two.
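As a rough sanity check on that scale, the average cell area can be estimated from the number of cells per resolution, which is 2 + 120 * 7^r. This sketch assumes a spherical Earth with an approximate mean radius and gives an average, not the maximum; the function names are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_007.2  # approximate mean Earth radius in meters (assumed)


def num_h3_cells(resolution: int) -> int:
    """Total number of H3 cells at a resolution: 2 + 120 * 7**resolution."""
    return 2 + 120 * 7**resolution


def average_cell_area_m2(resolution: int) -> float:
    """Average cell area if the sphere's surface is split evenly across all cells."""
    sphere_area = 4 * math.pi * EARTH_RADIUS_M**2
    return sphere_area / num_h3_cells(resolution)


for res in (0, 9, 15):
    print(res, average_cell_area_m2(res))
```

At resolution 15 the average comes out a bit under 0.9 square meters. Individual cells vary in size, so this supports the order of magnitude rather than an exact maximum.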


What is the benefit of H3 over a rectangular grid?


That map does seem to be using H3 hexagons: https://h3geo.org/


Oh man, what an exciting opportunity. *clears throat* The Hacker News title seems to mistranslate the original Em dash to an En dash.


We use a submodule in https://github.com/uber/h3-py to wrap the core H3 library, which is written in C. Submodules seemed like a reasonable way to handle the dependency, and, at least for this use case, the approach hasn't given me any problems.


You can compose SQL with https://ibis-project.org/tutorials/ibis-for-sql-users, which is using https://github.com/tobymao/sqlglot to parse the SQL under the hood.

As an alternative to parsing the SQL yourself, DuckDB's https://duckdb.org/docs/api/python/relational_api allows you to compose SQL expressions efficiently and lazily, which I've used when playing around with things like https://gist.github.com/ajfriend/eea0795546c7c44f1c24ab0560a...


One approach I've been enjoying recently in my personal use is to write a light wrapper around DuckDB to enable composable SQL snippets. Essentially like what I have here https://gist.github.com/ajfriend/eea0795546c7c44f1c24ab0560a..., but without the `|` syntax.

You're still writing SQL, so you don't need to learn a new syntax, but I find it more ergonomic for quick data exploration. I also have an easier time writing SQL from memory than I do writing the equivalent Pandas code.

