There’s been a lot of online discussion about NoSQL this year. Thomas Gideon produced a podcast episode about using NoSQL vs joins. This morning I was pointed to this High Scalability post about Reddit. Quoting:
There are no joins in the database and you must manually enforce consistency. No joins means it’s really easy to distribute data to different machines. You don’t have to worry about foreign keys are doing joins or how to split the data up. Worked out really well. Worries of using a relational database are a thing of the past.
This generalized use of databases really illustrates the zeitgeist of rapid development. NoSQL projects are aiming right there: quick rev time, low schema impedance, built-in replication, and no assumption of join usage, because a document-oriented database model often presumes that the more expensive computation of data subsets happens at the application level anyhow. In a pre-computed or batched-result environment, this removes the need to sweat over high-performance joins.
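To make "computation of data subsets at the application level" concrete, here is a minimal sketch of a join done in application code rather than in the database. The documents, field names, and the `join_comments_to_users` helper are all hypothetical, invented for illustration; they are not taken from Reddit or from any particular NoSQL product.

```python
from collections import defaultdict

# Hypothetical documents, shaped the way a document store might return them.
users = [
    {"_id": "u1", "name": "alice"},
    {"_id": "u2", "name": "bob"},
]
comments = [
    {"_id": "c1", "user_id": "u1", "text": "first"},
    {"_id": "c2", "user_id": "u2", "text": "second"},
    {"_id": "c3", "user_id": "u1", "text": "third"},
]


def join_comments_to_users(users, comments):
    """Attach each user's comment texts in application code (no database join)."""
    # Group comment texts by the user they reference, preserving input order.
    by_user = defaultdict(list)
    for comment in comments:
        by_user[comment["user_id"]].append(comment["text"])
    # Build one combined record per user; users without comments get an empty list.
    return [
        {"name": user["name"], "comments": by_user.get(user["_id"], [])}
        for user in users
    ]


result = join_comments_to_users(users, comments)
```

In a pre-computed setup, a result like this would typically be built once in a batch job and stored as its own document, so reads never pay for the join.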
Gideon believes that SQL databases, properly used, are efficient and performant, but that novice usage and ORM tools (e.g., Hibernate) are commonly over-applied, so performance at scale quickly dips. The comment on cmdln’s NoSQL rant by Mr characterizes the choice succinctly:
It does distribution of data out of the box, that is, it is so simplified, ingrained in the product that you don’t even think twice about them. But with SQL databases, sharding, distribution is an afterthought. Not that you cannot DO these with SQL databases, it’s just that with nosql these tasks are SIMPLER. Included in the product from day one.
I’ve certainly thought a fair amount about partitioning and sharding, given that MySQL 5.0 does not provide these (though 5.1 and later are better for it). But the whole notion of using a document-oriented database is very attractive when the majority of your operations are simply not relational.
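For a sense of why join-free data is easy to spread across machines, here is a rough sketch of hash-based sharding by key. It is only an illustration under my own assumptions: `shard_for` and `ShardedStore` are hypothetical names, in-memory dicts stand in for separate servers, and real products handle rebalancing and replication, which this ignores.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (same key, same shard)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key/document store; each dict stands in for a separate machine."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, doc: dict) -> None:
        # The key alone decides where the document lives.
        self.shards[shard_for(key, len(self.shards))][key] = doc

    def get(self, key: str):
        # Only one shard is consulted; returns None if the key is absent.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
store.put("user:1", {"name": "alice"})
store.put("user:2", {"name": "bob"})
```

Because each document is fetched by key and never joined against another table, a lookup touches exactly one shard. With foreign keys and joins in play, related rows could land on different machines, which is where the hard part of sharding a relational schema begins.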