
None of them seem all that conducive to source control or merging. Any good format for that?


Why would we expect data to be amenable to version control systems designed for source code?


I don't _expect_ it but I want it. Surely you can understand the power of storing a config file as tabular data.


There is a git diff driver for CSV that makes the CSV diffing and merging a lot better. https://paulfitz.github.io/2014/07/09/diff-merge-csv.html

Works like a charm!


Looks good, although I don't think I would want the diffs shown inline in the cells. I might just end up writing my own merge tool that passes conflicting cells to a fallback tool.


CSV should be pretty easy to merge, given that it’s line based.


Although it's the easiest, changes to different cells in the same row still conflict unnecessarily, because a line-based merge can't see inside the row.
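That unnecessary conflict is exactly what a cell-level merge avoids. Here's a minimal sketch (a hypothetical helper, not any particular tool) of a three-way, cell-wise CSV merge; it assumes the files have the same number of rows and columns:

```python
import csv
import io

def merge_csv(base, ours, theirs):
    """Cell-wise three-way merge: edits to different cells of the same
    row merge cleanly; a cell changed on both sides is a conflict."""
    rows = lambda s: list(csv.reader(io.StringIO(s)))
    merged, conflicts = [], []
    for b, o, t in zip(rows(base), rows(ours), rows(theirs)):
        out = []
        for i, (bc, oc, tc) in enumerate(zip(b, o, t)):
            if oc == tc or tc == bc:
                out.append(oc)            # both agree, or only ours changed
            elif oc == bc:
                out.append(tc)            # only theirs changed
            else:
                conflicts.append((len(merged), i))  # both changed, differently
                out.append(oc)
        merged.append(out)
    return merged, conflicts

base   = "id,name,qty\n1,apple,3\n"
ours   = "id,name,qty\n1,apple,5\n"   # we changed qty
theirs = "id,name,qty\n1,pear,3\n"    # they changed name
merged, conflicts = merge_csv(base, ours, theirs)
print(merged[1], conflicts)           # ['1', 'pear', '5'] [] -- no conflict
```

A line-based merge would flag that row as a conflict; cell-wise, both edits land.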


Interestingly, tables in XML would be easy to merge, as long as they're formatted with one cell per line. E.g.

    <table>
      <row>
        <cell>1</cell>
        <cell>45</cell>
      </row>
    </table>


We use Subversion to manage CSV files pretty much constantly.


Don't put data in source control; use a database.


Laughs in Lisp

Code is Data would like to have a word


Code may be data but data isn't code.


Data is code that prints/evaluates to itself.
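A Python analogue of that point (a sketch, not a universal rule: it holds for many plain values, not arbitrary objects):

```python
# For plain literals, the printed representation is itself an
# expression that evaluates back to an equal value.
data = {"rows": [1, 45], "name": "cells"}
assert eval(repr(data)) == data  # the repr of the data is code for the data
print(repr(data))
```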


In some programming languages, it is. In PostScript, for example, whether a value evaluates to itself or is executed is a flag separate from the value's type, and it can be changed with the cvlit and cvx operators.


The higher-ups might be unhappy if you use that excuse after committing a gigabyte of customer data.


Executable data was never a good idea.


What if you are doing both, i.e. trying to source-control your database? A lot of tools (Liquibase, for one) can use CSV files as the basis of read-mostly lookup tables. E.g., if I had a list of "product categories" that I wanted to be part of my database when I bootstrap a new environment, Liquibase can turn that CSV file into a set of upserts that run whenever the file hash changes. So you can maintain that file with a regular edit-file > git-commit workflow, and it will ensure the table is always consistent with what you want, while giving you a much easier-to-understand commit history for your database, especially as things inevitably get complicated.
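The CSV-to-upserts pattern described above can be sketched in a few lines. This is a hedged illustration, not Liquibase itself: the table name, schema, and CSV contents are made up, and SQLite's INSERT OR REPLACE stands in for a generic upsert:

```python
import csv
import io
import sqlite3

# A versioned CSV file (hypothetical contents) acting as the source of
# truth for a read-mostly lookup table.
CATEGORIES_CSV = "id,name\n1,Books\n2,Games\n"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS product_categories "
    "(id INTEGER PRIMARY KEY, name TEXT)"
)
for row in csv.DictReader(io.StringIO(CATEGORIES_CSV)):
    # INSERT OR REPLACE acts as an upsert keyed on the primary key,
    # so re-running after a CSV edit converges the table on the file.
    conn.execute(
        "INSERT OR REPLACE INTO product_categories VALUES (?, ?)",
        (row["id"], row["name"]),
    )
conn.commit()
```

Editing the CSV and re-running yields the updated rows, which is why the git history of the file doubles as a history of the table.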


Databases are measured in gigabytes and terabytes. If you put only a portion of one in source control, how do you back up the rest of your database? Furthermore, static data is a minor subset of a database; data by its nature is volatile, and transactions make up 80% of it. A projection or snapshot/summary is the summarization of the daily/hourly/minute transactions. If you want a golden copy to bootstrap new environments, I would argue you are better off backing up that golden copy and restoring it with native database tools. Mashing together two very different paradigms will lead to long-term disaster. There was a thread about a year ago where a DevOps engineer was surprised when his "nonproduction" configuration code deleted his production database. If your database is a terabyte in size, by the time you restore it, you are already fired.


I am way late responding...

> If you put only a portion of it in source control, how do you back up the rest of your database?

I think this is simply a misunderstanding of how these DB change management tools work and what they can do to help you with complex migrations over an application's lifetime.

You still back your DB up like normal.

Your "mostly static" data is managed by running inserts/updates/deletes when data changes (either written manually, or the tool can usually generate them for you). When you actually apply a migration, the software also records that it has been applied in a database changelog table. That way, when you want to update your database with the latest migrations, it will only run what hasn't been applied yet.
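The changelog mechanism described above can be sketched as follows. This is a simplified illustration of the idea (Liquibase's real changelog table tracks more, e.g. checksums and authors); the migration IDs and schema here are hypothetical:

```python
import sqlite3

# Each migration has a stable ID and runs at most once; the fact that
# it ran is recorded in the database itself.
MIGRATIONS = [
    ("001_create_categories",
     "CREATE TABLE product_categories (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002_seed_categories",
     "INSERT INTO product_categories (name) VALUES ('Books'), ('Games')"),
]

def migrate(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS changelog (id TEXT PRIMARY KEY)")
    applied = {r[0] for r in conn.execute("SELECT id FROM changelog")}
    for mig_id, sql in MIGRATIONS:
        if mig_id not in applied:       # only run what hasn't been applied
            conn.execute(sql)
            conn.execute("INSERT INTO changelog VALUES (?)", (mig_id,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)   # idempotent: the second run applies nothing
```

Because the changelog lives in the database, a standard backup/restore carries the applied-migration state along with the data.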

That allows your standard backup/restore procedures to work just fine.

> If you want a golden copy to bootstrap new environments, I would argue you are better off backing up that golden copy and restoring it using native database tools

So this is essentially what we are doing with liquibase... Using database dumps without any data as our baseline.

Any DDL changes are managed through migration scripts.

There are a number of things in our project, though, that are not managed by migration scripts and are instead apply-on-change.

We found it better to have our views stored as individual files in liquibase, and have them apply on change, because of nested dependencies and other issues with not having a good source of truth for the view definition.

Functions/procedures are another thing best treated as code rather than migrations. It allows you to deal with conflicts between branches using your standard merge tools.

Our "static" data, which only changes when there is an application change, is kept in CSV files that Liquibase applies on any change. That data needs to be in sync with the version of the application deployed, so it makes sense to manage it along with the code.


I could use a SQLite file for tabular data/configs but that would still not support merging.


And put that under source control.
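If you do commit a SQLite file, one way to keep diffs readable is to commit a textual SQL dump alongside (or instead of) the binary. A minimal sketch using Python's `sqlite3` `Connection.iterdump()` (the `config` table here is hypothetical):

```python
import sqlite3

# Build a tiny example database in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO config VALUES ('theme', 'dark')")
conn.commit()

# iterdump() yields the database as SQL statements: plain text that
# ordinary line-based diff tools can handle.
dump = "\n".join(conn.iterdump())
print(dump)
```

Merging the dump still has the same per-row limitations as CSV, but at least the history becomes inspectable.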



