BlueOwl LLC | Privacy, Backend, Data Engineering, Front End, SDET, Analytics | San Francisco, CA or Portland, OR | ONSITE (hiring distributed during the pandemic)

We're here to create a safer, happier, and more mindful future for all with the help of data science, engineering, design, and mobile technology. We're starting by reinventing insurance and rethinking the technologies that enable it, but our true goal is to build a platform that rewards people for driving well, creating safer roads with fewer accidents in the process.

Apply online - https://jobs.lever.co/blueowl/


All of these seem to be in CA and RI. Is there actually something in Portland?


8tracks could have a backup system that uses GitHub as the identity provider; the backup data wouldn't have to reside in a repo on GitHub. As a PSA to others using GitHub organizations and allowing login via GitHub: there is a setting you can turn on to require MFA for all members of the org.

https://help.github.com/articles/requiring-two-factor-authen...


How does this work if the clocks drift between the nodes? Does this allow incorrect behavior because one transaction looks like it happened before another?


(blog author here)

Interestingly, clock drift does not affect the serializability of the transaction history: the system guarantees that the history is serializable regardless of drift.

However, "serializable" only means that the history is equivalent to some serial ordering of transactions - it makes no guarantee that the equivalent serial ordering is consistent with the real-time ordering of the involved transactions. A history with that property (agrees with real-time) is termed "linearizable", and requires additional rules to guarantee in an environment with clock drift.

As mentioned by knz42, there was another Cockroach Labs blog post (written by Spencer Kimball) that addressed this in some detail; that blog post contrasted our strategy for dealing with drift with that of Google's Spanner.

A quick overview of CockroachDB's properties re linearizability: it guarantees that access to any individual key is linearizable, and by composition any two transactions which share a key (that one of the transactions modifies) will be linearizable with respect to each other. However, if two transactions do not have any overlap in modified keys, Cockroach does not (by default) guarantee the resulting commit history is linearizable. CockroachDB's underlying KV layer does have a "linearizable" flag on transactions that can guarantee this, but it requires that transactions be slowed down considerably; Spencer's blog post addresses some other strategies that CockroachDB is considering to address the issue.


Can Cockroach do the equivalent of a "select ... for update" (e.g., PostgreSQL), where you lock one thing while applying changes elsewhere?

Concrete example: We have an app that has a "documents" table and a "translog" table. The translog is like a series of diff-patches, representing changes to the documents. When we write to the translog, we first lock the document with a "select ... for update", so that no intervening translog entries can be written concurrently against the same document, then we patch the document, and then we write the translog entry and commit.
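
Roughly, with made-up table and column names, the pattern looks like:

    begin;
    -- lock the document row so concurrent writers on the same document wait
    select * from documents where id = $1 for update;
    -- apply the patch to the document
    update documents set body = $2 where id = $1;
    -- append the translog entry
    insert into translog (doc_id, patch) values ($1, $3);
    commit;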

We do this with Postgres, and we can do the same thing with Redis' MULTI since Redis is completely single-threaded. I can't think of any other NoSQL data store that allows a similar "lock A, update A, insert B, unlock A"; for example, Cassandra's "lightweight transactions" are only transactional in the context of a single row.

(By "lock" I'd also accept optimistic locking, where you can retry on failure.)


CockroachDB is optimistically concurrent, so there is no locking. However, your use case is definitely possible.

The transaction would:

1. Read the current document (I'm assuming this is needed to compute the translog entry).

2. Read the latest ID in the translog table.

3. Write a new entry to the translog with ID+1.

4. Write the document.

If any other transaction interleaves with this process (by either reading or writing one of the same keys in a way that would violate isolation), one of the two transactions will be aborted.
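
A rough sketch of those steps in SQL (made-up schema; the new ID and the patched document are computed client-side between statements):

    begin;
    -- 1. read the current document (to compute the patch)
    select body from documents where id = $1;
    -- 2. read the latest id in the translog for this document
    select max(id) from translog where doc_id = $1;
    -- 3. write a new translog entry with id+1
    insert into translog (id, doc_id, patch) values ($2, $1, $3);
    -- 4. write the updated document
    update documents set body = $4 where id = $1;
    commit;  -- if another transaction interleaved, this aborts and the client retries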


(employee here)

It seems to me that your use case does not require locking specifically - you just want to make sure no concurrent transaction can clobber your "update A".

As mrtracy explained, such overlapping transactions are linearizable in CockroachDB, so this invariant is preserved without the need for explicit locking.


Hi, thanks for responding.

What I need is for our translog to reflect the order of updates. So if diff A was applied before B, then the translog order also needs to be A, B. (The order only needs to be consistent per document.)

This is because we have listeners — through APIs — that play the translog as it happens and maintain various state based on it.

Currently, the translog is ordered by a sequential number (because it's cheap in Postgres), but every entry also records the ID of the previous entry (so B will point at A). One could sort by time and then reorder by causality before emitting the linear log to consumers, but that would of course be more complicated than consuming a log that is already linear.
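
Concretely, the table looks something like this (a simplified sketch; the column names are made up):

    create table translog (
        id      bigserial primary key,           -- cheap sequential order
        doc_id  bigint not null,                 -- the document this entry patches
        prev_id bigint references translog(id),  -- causal pointer: B points at A
        patch   jsonb not null                   -- the diff itself
    );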


I think it does require locking, because in Postgres (or Oracle) readers do not block writers and writers do not block readers. So to be sure you update the same version you read, you have to select ... for update.


Having serializable transactions is equivalent to adding "FOR UPDATE" to every SELECT statement, so it sounds like CockroachDB already does what you want.

A typical RDBMS will prevent conflicts by forcing queries to block until they can be executed in a conflict-free ordering. CockroachDB instead detects conflicts after the fact and prevents inconsistent transactions from committing, forcing them to retry. The end result -- that is, the set of possible outcomes of a series of transactions -- is the same, but the performance characteristics will be different.
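
In Postgres terms, the translog transaction from upthread could run unchanged under serializable isolation (a sketch with made-up names; the retry loop lives in the client):

    begin isolation level serializable;
    select body from documents where id = $1;       -- no FOR UPDATE needed
    update documents set body = $2 where id = $1;
    insert into translog (doc_id, patch) values ($1, $3);
    commit;  -- on conflict this fails with a serialization error and the client retries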


GP said that the use case does not require locking, but in PgSQL (which was mentioned) or Oracle, it does. The default transaction isolation level is not serializable. You don't read uncommitted updates, but reads are not repeatable unless you explicitly ask for that. If you do something like this (in a transaction):

   begin;
   select ... from T where <condition>;
   ...
   update T ... where <condition>;
   commit;
there is no guarantee that the row you are updating is the same as the one you selected, unless you add "for update" to the select.
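
i.e. the first statement needs to be:

   select ... from T where <condition> for update;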

Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.

-- http://www.postgresql.org/docs/current/static/transaction-is...

A query acquires no data locks. Therefore, other transactions can query and update a table being queried, including the specific rows being queried. Because queries lacking FOR UPDATE clauses do not acquire any data locks to block other operations, such queries are often referred to in Oracle as nonblocking queries.

-- http://docs.oracle.com/cd/B19306_01/server.102/b14220/consis...


Sorry, I don't understand this comment because I can't tell if you're disagreeing with me about anything.

> GP said that the use case does not require locking, but in PgSQL (which was mentioned) or Oracle, it does.

Right, it does in a typical RDBMS, but not in CockroachDB. An isolation level is defined in terms of what interactions are possible between concurrent successful transactions; locks, or the lack of locks, are an implementation detail.

> The default transaction isolation level is not serializable. ... Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.

I agree. If you set the isolation level to "serializable", such anomalies aren't possible, even if you don't use FOR UPDATE.
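
In Postgres, for example:

    -- per transaction:
    begin isolation level serializable;
    -- or as the session default:
    set default_transaction_isolation = 'serializable';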


This was discussed earlier on the same blog: https://www.cockroachlabs.com/blog/living-without-atomic-clo...

Short answer: the DB checks for clock drift and corrects small drifts automatically.



For serializability, all you care about is some sequential order. You get that with hybrid logical clocks (http://www.cse.buffalo.edu/tech-reports/2014-04.pdf), which give you a monotonically increasing timestamp that you can use instead of dumb version numbers.

On the other hand, if you want to ensure linearizability, you do care about the worst-case clock drift, which in CockroachDB is a configurable parameter. One can adopt Google Spanner's approach ("commit wait"), which is to delay the response to the client to ride out the clock uncertainty (typically a few milliseconds inside a data center, but hundreds of milliseconds in the wide area).


It is too bad they decided not to open source Mailbox. I would have liked to run my own Mailbox service, perhaps as a Docker container.


Happy to answer any questions anyone may have here. I am on the team building storyteller :-)


Looks like a very cool concept. Any idea when invites will actually be sent out? Also, do you have any public docs or other content to look over?


No gotchas with Joyent so far. I've been using Solaris for a couple of years, so Matt leaves the sysadmin stuff to me. We also use Webmin, which makes things much, much easier.

The git bundle is pretty basic (add, commit, pull, and push). If you're interested in it, shoot me an email and I'll send it to you.

