
Can Cockroach do the equivalent of a "select ... for update" (e.g., PostgreSQL), where you lock one thing while applying changes elsewhere?

Concrete example: We have an app that has a "documents" table and a "translog" table. The translog is like a series of diff-patches, representing changes to the documents. When we write to the translog, we first lock the document with a "select ... for update", so that no intervening translog entries can be written concurrently against the same document; then we patch the document, and then we write the translog entry and commit.
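In Postgres, that pattern looks roughly like this — a minimal sketch, assuming a "documents" table with id/body columns and a "translog" table with document_id/patch columns (all names hypothetical):

```sql
BEGIN;

-- Lock the document row so no concurrent transaction can
-- interleave a translog entry for the same document.
SELECT body FROM documents WHERE id = 42 FOR UPDATE;

-- Apply the diff-patch to the document.
UPDATE documents SET body = '...patched body...' WHERE id = 42;

-- Record the change in the translog.
INSERT INTO translog (document_id, patch) VALUES (42, '...diff...');

COMMIT;  -- commit releases the row lock
```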

We do this with Postgres, and we can do the same thing with Redis' MULTI since Redis is completely single-threaded. I can't think of any other NoSQL data store that allows a similar "lock A, update A, insert B, unlock A"; for example, Cassandra's "lightweight transactions" are only transactional in the context of a single row.

(By "lock" I'd also accept optimistic locking, where you can retry on failure.)



CockroachDB is optimistically concurrent, so there is no locking. However, your use case is definitely possible.

The transaction would:

1. Read the current document (I'm assuming this is needed to compute the translog entry).

2. Read the latest ID in the translog table.

3. Write a new entry to the translog with ID+1.

4. Write the patched document.

If any other transaction interleaves with this process (by either reading or writing one of the same keys in a way that would violate isolation), one of the two transactions will be aborted.
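The four steps above could be written as one transaction, roughly as follows (schema and values are illustrative, not from the original post):

```sql
BEGIN;

-- 1. Read the current document.
SELECT body FROM documents WHERE id = 42;

-- 2. Read the latest translog ID for this document.
SELECT max(seq) FROM translog WHERE document_id = 42;

-- 3. Write a new translog entry with seq = max + 1,
--    computed client-side from the value read in step 2.
INSERT INTO translog (document_id, seq, patch) VALUES (42, 7, '...diff...');

-- 4. Write the patched document.
UPDATE documents SET body = '...patched body...' WHERE id = 42;

COMMIT;  -- aborts with a retryable error if a conflicting transaction interleaved
```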


(employee here)

It seems to me that your use case does not require locking specifically - you just want to make sure no concurrent transactions can clobber your "update A".

As mrtracy explained, such overlapping transactions are linearizable in CockroachDB, so this invariant is preserved without the need for explicit locking.


Hi, thanks for responding.

What I need is for our translog to reflect the order of updates. So if diff A was applied before B, then the translog order also needs to be A, B. (The order only needs to be consistent per document.)

This is because we have listeners — through APIs — that play the translog as it happens and maintain various state based on it.

Currently, the translog is ordered by a sequential number (because that's cheap in Postgres), but every entry also records the ID of the previous entry (so B will point at A). One could sort by time and then reorder by causality before emitting the linear log to consumers, but that would of course be more complicated than consuming a log that is already linear.
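That structure might look something like this (hypothetical DDL, not the actual schema):

```sql
CREATE TABLE translog (
    seq          bigserial PRIMARY KEY,  -- cheap sequential ordering in Postgres
    document_id  bigint NOT NULL,
    prev_seq     bigint REFERENCES translog (seq),  -- causal link: B points at A
    patch        text   NOT NULL
);
```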


I think it does require locking, because in Postgres (or Oracle) readers do not block writers and writers do not block readers. So to be sure you update the same version you read, you have to select ... for update.


Having serializable transactions is equivalent to adding "FOR UPDATE" to every SELECT statement, so it sounds like CockroachDB already does what you want.

A typical RDBMS will prevent conflicts by forcing queries to block until they can be executed in a conflict-free ordering. CockroachDB instead detects conflicts after the fact and prevents inconsistent transactions from committing, forcing them to retry. The end result -- that is, the set of possible outcomes of a series of transactions -- is the same, but the performance characteristics will be different.


GP said that the use case does not require locking, but in PgSQL (which was mentioned) or Oracle, it does. The default transaction isolation level is not serializable. You don't read uncommitted updates, but reads are not repeatable unless you explicitly ask for that. If you do something like this (in a transaction):

   select ... from T where <condition>;
   ...
   update T ... where <condition>;
   commit;
there is no guarantee that the row you are updating is the same as the one you selected, unless you add "for update" to the select.
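Adding FOR UPDATE closes that gap: the matching rows are locked from the moment they are read until the transaction commits. A sketch of the corrected version:

```sql
BEGIN;
-- Lock the matching rows; concurrent writers to these rows
-- now block until this transaction commits or rolls back.
SELECT ... FROM T WHERE <condition> FOR UPDATE;
...
UPDATE T ... WHERE <condition>;
COMMIT;
```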

Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.

-- http://www.postgresql.org/docs/current/static/transaction-is...

A query acquires no data locks. Therefore, other transactions can query and update a table being queried, including the specific rows being queried. Because queries lacking FOR UPDATE clauses do not acquire any data locks to block other operations, such queries are often referred to in Oracle as nonblocking queries.

-- http://docs.oracle.com/cd/B19306_01/server.102/b14220/consis...


Sorry, I don't understand this comment because I can't tell if you're disagreeing with me about anything.

> GP said that the use case does not require locking, but in PgSQL (which was mentioned) or Oracle, it does.

Right, it does in a typical RDBMS, but not in CockroachDB. An isolation level is defined in terms of what interactions are possible between concurrent successful transactions. Locks, or the lack of locks, are an implementation detail.

> The default transaction isolation level is not serializable. ... Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.

I agree. If you set the isolation level to "serializable", such anomalies aren't possible, even if you don't use FOR UPDATE.
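In Postgres this can be requested per transaction; a sketch (the retry handling is up to the client):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT ... FROM T WHERE <condition>;   -- no FOR UPDATE needed
UPDATE T ... WHERE <condition>;
COMMIT;  -- may fail with SQLSTATE 40001 (serialization_failure); retry the transaction
```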



