More

data_ders · 2026-02-23T16:11:59 1771863119

right? like it's a graph and a relational model query and a pipeline and a language and an abstract syntax tree and declarative logical plan

data_ders · 2026-02-23T16:06:45 1771862805

what do you think is the "most bad" thing about SQL?

otabdeveloper4 · 2026-02-23T16:55:51 1771865751

Lack of abstractions like (block-scoped) variables and lambda functions.

SQL is declarative and purely functional anyways, so implementing these is a no-brainer.

data_ders · 2026-02-23T16:06:07 1771862767

TIL about Verse looks cool I'll have to check it out.

> SQL is not a pipeline, it is a graph.

Maybe it's both? and maybe there will always be hard-to-express queries in SQL, and that's ok?

the RDBMS's relational model is certainly a graph and joins accordingly introduce complexity.

For me, just as creators of the internet regret that subdomains come before domains, I really we could go back in time and have `FROM` be the first predicate and not `SELECT`. This is much more intuitive and lends itself to the idea of a pipeline: a table scan (FROM) that is piped to a projection (SELECT).

thesz · 2026-02-23T18:03:21 1771869801

Pipeline is a specific kind of a graph.

Yes, there will always be hard-to-express queries, the question is how far can we go?

data_ders · 2026-02-23T15:53:54 1771862034

I'm as big a SQL stan as the next person and I'm also very skeptical anytime anyone says that SQL needs to be replaced.

At the same time, it's challenging that SQL cannot be iteratively improved and experimented upon.

IMHO, PRQL is a reasonable approach to extending SQL without replacing SQL.

But what I'd love to see is projects like Google's zeta-sql [1] and Substrait [2] get more traction. It would provide a more stable, standardized foundation upon which SQL could be improved, which would make the case for "SQL forever" even more strong.

I've blogged about this before [3].

[1]: https://github.com/google/googlesql [2]: https://substrait.io/ [3]: https://roundup.getdbt.com/p/problem-exists-between-database...

data_ders · 2026-02-23T15:44:19 1771861459

I agree that CTEs help solve the problem of being able to read a SQL query from top to bottom, but I wouldn't say they're a panacea!

Personally, it's weird to me that `FROM` (scan) comes after `SELECT` (projection). IMHO the datasource should come first!

CTEs don't solve this problem they just let you chain multiple SELECTs together.

A real use case is that it would allow intellisense to kick in a lot earlier!

Instead you have to write `SELECT * FROM my_table` and only after can you edit the `*` and get auto-complete suggestions of the columns from `my_table`

bob1029 · 2026-02-23T16:15:08 1771863308

> CTEs don't solve this problem

They kind of do in my head. "WITH" reads to me exactly like the datasource you are looking for.

data_ders · 2026-02-12T15:37:50 1770910670

if I could tell myself in 2015 who had just found the feather library and was using it to power my unhinged topic modeling for power point slides work, and explained what feather would become (arrow) and the impact it would have on the date ecosystem. I would have looked at 2026 me like he was a crazy person.

Yet today I feel it was 2016 dataders who is the crazy one lol

ayhanfuat · 2026-02-12T15:45:46 1770911146

Indeed. feather was a library to exchange data between R and pandas dataframes. People tend to bash pandas but its creator (Wes McKinney) has changed the data ecosystem for the better with the learnings coming from pandas.

jtbaker · 2026-02-12T17:17:27 1770916647

I know pandas has a lot of technical warts and shortcomings, but I'm grateful for how much it empowered me early in my data/software career, and the API still feels more ergonomic to me due to the years of usage - plus GeoPandas layering on top of it.

Really, prefer DuckDB SQL these days for anything that needs to perform well, and feel like SQL is easier to grok than python code most of the time.

bootsmann · 2026-02-13T12:12:58 1770984778

> Really, prefer DuckDB SQL these days for anything that needs to perform well, and feel like SQL is easier to grok than python code most of the time.

I switched to this as well and its mainly because explorations would need to be translated to SQL for production anyways. If I start with pandas I just need to do all the work twice.

sirfz · 2026-02-13T01:14:58 1770945298

chdb's new DataStore API looks really neat (drop in pandas replacement) and exactly how I envisioned a faster pandas could be without sacrificing its ergonomics

0xcafefood · 2026-02-12T15:51:57 1770911517

Do people bash pandas? If so, it reminds me of Bjarne's quip that the two types of programming languages are the ones people complain about and the ones nobody uses.

hodapp · 2026-02-13T02:21:00 1770949260

The creator of Pandas even bashes it: https://wesmckinney.com/blog/apache-arrow-pandas-internals/

imtringued · 2026-02-13T09:57:47 1770976667

He missed talking about the poor extensibility of pandas. It's missing some pretty obvious primitives to implement your own operators without whipping out slow for loops and appending to lists manually.

fud101 · 2026-02-13T06:23:18 1770963798

have these 'improvements' been backported to pandas now? i would expect it to close the gap over time.

benrutter · 2026-02-13T08:21:25 1770970885

Yes (mostly) is the answer. You can use arrow as a backend, and I think with v3 (recently released) it's the default.

The harder thing to overcome is that pandas has historically had a pretty "say yes to things" culture. That's probably a huge part of its success, but it means there are now about 5 ways to add a column to a dataframe.

Adding support for arrow is a really big achievement, but shrinking an oversized api is even more ambitious.

postexitus · 2026-02-12T16:00:37 1770912037

polars people do - although I wouldn't call polars something that nobody uses.

ayhanfuat · 2026-02-12T16:06:03 1770912363

I also use polars in new projects. I think Wes McKinney also uses it. If I remember correctly I saw him commenting on some polars memory related issues on GitHub. But a good chunk of polars' success can be attributed to Arrow which McKinney co-created. All the gripes people have with pandas, he had them too and built something powerful to overcome those.

mistrial9 · 2026-02-12T16:22:09 1770913329

I saw Wes speak in the early days of Pandas, in Berkeley. He solved problems that others just worked around for decades. His solutions are quirky but the work was very solid. His career advanced a lot IMHO for substantial reasons.. Wes personally marched through swamps and reached the other side.. others complain and do what they always have done.. I personally agree with the criticisms of the syntax, but Pandas is real and it was not easy to build it.

rlh2 · 2026-02-13T11:45:58 1770983158

People also love to hate R but data.table is light years better than pandas in my view

data_ders · 2026-02-12T15:35:29 1770910529

yeah not necessarily compute (though it has a kernel)!

it's actually many things IPC protocol wire protocol, database connectivity spec etc etc.

in reality it's about an in-memory tabular (columnar) representation that enables zero copy operations b/w languages and engines.

and, imho, it all really comes down to standard data types for columns!

data_ders · 2026-02-12T12:21:43 1770898903

love the animations! I’ve been dreaming of doing the same to get people from csvs to something like Lance but with stops at page files, parquet, and Iceberg along the way

data_ders · 2026-01-31T04:12:36 1769832756

same! but if you get it inevitably wrong the first time it gives you the pinyin. but i struggled to get it to transcribe the consonants I was making let alone the tones. i'm pretty sure i'm not as bad as that!

data_ders · 2026-01-29T15:14:35 1769699675

I'm a big SQL stan here and I love the concept and if you ever wanna chat about how it might integrate with dbt let me know :)

conceptual questions:

1) why did you pick SQL? to increase the Total Addressable Userbase with the thinking that a SQL API means more people can use it than those who know Python or Typescript?

2) What isn't or will never be supported by this relational model? what are the constraints? Clickhouse comes to mind w/ it's intentionally imposed limitations on JOINs

3) databases are historically the stickiest products, but even today SQL dialects are sticky because of how closely tied they are to the query engine. why do you think users will adopt not only a new dialect but a new engine? Especially given that the major DWH vendors have been relentlessly competing to add AI search vector functionality into their products?

4) mindsdb comes to mind as something similar that's been in the market for a while but I don't hear it come up often. what makes you different?

playground feedback: 1) why are there no examples that: a) use `JOIN` (that `,` is unhinged syntax imho for an implicit join) b) don't use `*` (it's cool that there's actual numbers!)

2) i kinda get why the search results defaults to a UI, but as a SQL person I first wanted to know what columns exist. I was happy to see "raw table" was available but it took me a while to find it. might be have raw table and UI output visible at the same time with clear instructions on what columns the query requires to populate the UI

tullie · 2026-01-29T15:37:11 1769701031

Would love to chat about it, and talk about dbt integration. There's a few use cases that have come up where this would be really helpful. I'll PM you.

1) So we do actually have a python and typescript API, it's just the console web experience is SQL only as it feels the best for that kind of experience. The most important thing though is that it's declarative. This helps keep things relatively simple despite all the configuration complexity, and is also the best for LLMs/agents as they can iterate on the syntax without doc context.

2) Yeah exactly, joins is something we can't do at the moment, and i'm not sure the exact solution their honestly. Under the hood most of Shaped's offline data is built around Clickhouse, and we do want to build a more standard SQL interface just so you can do ad-hoc, analytical queries. We're currently trying to work if we should integrate it more directly with ShapedQL or just keep it as a separate interface (e.g. a ShapedQL tab vs a Clickhouse SQL tab).

3) We didn't really want to create a new SQL dialect, or really a new database. The problem is none of the current databases are well suited for search and recommendations, where you need extremely low latency, scalable, fault-tolerance, but also the ability to query based on a user or session context. One of the big things here is that because Shaped stores the user interactions alongside the item catalog, we can encode real-time vectors based on those interactions all in an embedding query service. I don't think that's possible with any other database.

4) I haven't looked into mindsdb too much, but this is a good reminder for me to deep dive into it later today. From taking a quick pass on it, my guess is the biggest difference is that we're built specifically for real-time search, recommendations and RAG, and that means latency, and ability to integrate click-through-rate models and things becomes vital.

Thanks so much for the playground syntax, have some follow up questions but i'm going to pm you if that's okay. Agreed on the being able to see which columns exist.