There has been a patch to extend the COPY code with pluggable APIs, adding callbacks at start, end, and for each row processed: https://commitfest.postgresql.org/49/4681/.
I'd guess that this may fit your purpose to add a custom format without having to fork upstream.
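To make the callback idea concrete, here is a minimal sketch in Python of a format plugged in through start, per-row, and end callbacks. It is only an illustration of the shape described above, not the patch's actual C API: `CopyToCallbacks`, `copy_to`, and the pipe-separated format are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CopyToCallbacks:
    """Hypothetical set of callbacks a custom output format would provide."""
    start: Callable[[List[str], Sequence[str]], None]       # called once before any row
    one_row: Callable[[List[str], Sequence[object]], None]  # called for each row
    end: Callable[[List[str]], None]                        # called once after the last row


def copy_to(rows, columns, callbacks: CopyToCallbacks) -> List[str]:
    """Generic driver: knows nothing about the format, only calls the callbacks."""
    out: List[str] = []
    callbacks.start(out, columns)
    for row in rows:
        callbacks.one_row(out, row)
    callbacks.end(out)
    return out


# A made-up format: pipe-separated values with a header and a trailer line.
def psv_start(out, columns):
    out.append("|".join(columns))


def psv_one_row(out, row):
    out.append("|".join("" if v is None else str(v) for v in row))


def psv_end(out):
    # Everything except the header line is a data row at this point.
    out.append(f"-- {len(out) - 1} rows")


PSV_FORMAT = CopyToCallbacks(psv_start, psv_one_row, psv_end)

lines = copy_to([(1, "a"), (2, None)], ["id", "val"], PSV_FORMAT)
```

The point of the design is that the driver loop stays in core while the format lives entirely in the callbacks, which is what would let an extension ship a format without a fork.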
I met Simon for the first time in Tokyo in 2009, at a birthday event related to JPUG (Japan PostgreSQL User Group), when hot standby was getting integrated into the upstream project. I last saw him in Prague three months ago, and we joked about a few things while discussing life and how things were going, as I had not been to the Postgres Europe conference for 6 to 7 years.
The community has lost a member, and many people have lost a friend. That’s so sudden. My thoughts go to his family and people who knew him. I’m so sad. RIP, Simon.
There are hooks for the planner and the executor, so you could force their hand quite easily with a plan extracted from a previous EXPLAIN. Being able to pass a plan at the protocol level would be more efficient, for sure. I’ve worked on Postgres-XC and, to some extent, Postgres-XL, and that’s something we wanted to do: generate a plan on one node, then push the compiled plan down to the other nodes to avoid the overhead of recomputing it. These projects did that for snapshots and a few more things through an extension of the protocol.
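As a rough illustration of the hook approach: in Postgres, `planner_hook` is a global function pointer that an extension can set, usually saving the previous value and falling back to `standard_planner`. The Python below only mimics that chaining pattern. The plan strings, `install_forced_plans`, and the lookup by query text are assumptions made for the sketch, not how a real extension would represent plans.

```python
from typing import Callable, Dict, Optional

Plan = str  # stand-in for a real plan tree
PlannerHook = Callable[[str], Plan]

# Mimics the global hook pointer: None means "no extension installed".
planner_hook: Optional[PlannerHook] = None


def standard_planner(query: str) -> Plan:
    """Stand-in for the built-in planner."""
    return f"SeqScan<{query}>"


def planner(query: str) -> Plan:
    """Entry point: call the hook if one is set, else the standard planner."""
    if planner_hook is not None:
        return planner_hook(query)
    return standard_planner(query)


def install_forced_plans(forced: Dict[str, Plan]) -> None:
    """Hypothetical extension init: force stored plans for known queries."""
    global planner_hook
    prev_hook = planner_hook  # keep any previously installed hook in the chain

    def forced_plan_hook(query: str) -> Plan:
        if query in forced:
            return forced[query]          # plan captured earlier, e.g. from EXPLAIN
        if prev_hook is not None:
            return prev_hook(query)       # let the earlier extension decide
        return standard_planner(query)    # otherwise let the planner do the rest

    planner_hook = forced_plan_hook
```

With this in place, `install_forced_plans({"SELECT 1": "Result"})` makes `planner("SELECT 1")` return the stored plan, while any other query still goes through the normal planning path.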
I’ve been doing maintenance and bug fixes for this module for over 18 months now (the last commit on HEAD seems to be mine), managing the last two releases. If you have questions and/or feedback, feel free to ask.
Just wanted to say thank you! This extension was critical to a bunch of my research (and now my lab's research as well). Being able to control fine-grained elements of each plan while letting the PG planner "do the rest" has saved me personally probably 100s of hours of work.
I have been digging into the Postgres source code for the past few weeks. This project seems like a good way to see how plugins integrate. I may reach out later if I have questions.
There are many Postgres internals that people are usually not aware of, with more than one way to develop your module. I have a repo with a set of plugin templates that should be handy for your studies: https://github.com/michaelpq/pg_plugins
Note that this one mentions 借地料 (land lease fee), meaning that you are not the owner of the land; the government is leasing it to you for a yearly fee of 220,000 yen.
- disable for all
- disable only for my session, to check what would happen with the plan, and only then decide to proceed with disabling for all (or to drop it)
ALTER is quite an invasive way to do it, even more so than "UPDATE .. SET indisvalid = false ...". It would be good to do it via SET, as was proposed in the plantuner extension long ago.
Also, Postgres 12 introduced pluggable storage, which might help to implement a shared-nothing architecture without huge changes to vanilla Postgres (I haven't looked at how large their delta is).
Citus Data enables scale out, while also being a pure extension. That means you can upgrade Postgres like normal to the latest point release using whatever normal upgrade process you want (e.g. OS packages).
It has worked for a long time without the need for Postgres 12. However, the new APIs introduced in v12 did enable us to offer columnar compression as an option, which complements a lot of scale-out use cases.
OTOH the PolarDB specific changes seem to be contained enough that if you decide to run it in production, you can probably just apply most of the changes from the v11 branch yourself.
But I agree it's not a very good look to code-drop something on a .2 release when there have been 2.5 years of fixes.
Even if the conflicts are minor, it's going to be annoying to try to work it out. If you are hitting a specific crash, there's a good chance you can backport the fix cleanly, but I doubt you can just pull in all of the fixes proactively without some knowledge of the details of the fork.
I haven't really looked at the details... perhaps PolarDB already has many (or all) of the fixes since 11.2. Also I haven't actually tried a merge, I'm just assuming the difficulty based on the number of diffs (and my experience doing minor version merges in the past).
(Disclaimer: I work for Citus Data. Citus takes the approach of a pure extension, which means it works on unmodified Postgres, and minor upgrades typically don't interfere at all.)