[ale] WooHoo!! PostgreSQL 9.1!

Sat Sep 17 20:04:40 EDT 2011

On Sat, 2011-09-17 at 18:05 -0400, Michael H. Warfield wrote:
> > MySQL is more like SQLite than most people would like to think.
> Except
> > that even SQLite has better support for data integrity in many
> > situations, and they consider themselves to be not a replacement for
> an
> > RDBMS, but a replacement for the fopen() C library call.
> 
> I don't know if I would quite go that far.  That seems a little
> judgemental against MySQL.  I've seen a number of pages support both
> (and Postgresql) but I've also seen pages support MySQL and Postgresql
> and NOT SQLite.  I don't know the reason for it, if it's authors
> preference, experience, or lacking some feature required.  I wouldn't
> disparage MySQL (other than what's happened to it at the hands of
> Oracle) but I now prefer Postgresql, myself.  You're mileage may
> vary. 

The reason that I said that was because of the default storage engine
for MySQL.  The default storage engine for MySQL---and the most popular
one, at that---does not support transactions at all.  It does not
support multi-version consistency control in any way.  A multi-threaded
or multi-process application has every chance to have concurrent updates
which clash with each other because of that.

SQLite is better than MySQL in this regard.  It doesn't support MVCC,
but it does support transactions and it accomplishes that via a crude
form of locking.  As the authors and maintainers hold, it is not a
replacement for a RDBMS, though it can be used as one in low-concurrency
situations without a great deal of contention.  It can scale as far as
primitive file locks will allow, essentially.

Now, neither MySQL (with the MyISAM storage engine) nor SQLite support
referential integrity.  It is strictly advisory.  In other words, your
application can just puke a bunch of bits all over the place, and your
database software will happily let it.  This necessarily improves
"performance", FSVO "performance", anyway.  Personally, I look at
performance the same way that I look at someone's typing speed:  I will
prefer to use a typist for a transcription job if they type 60 WPM with
99% accuracy over someone who can do 120 WPM at 60% accuracy.  Yes, the
latter's raw throughput is better more often than not, but reliability
is part of the performance equation the way that I see it.

PostgreSQL, OTOH, does both transactions and MVCC.  If transactions
aren't explicitly being used, then every single statement is
automatically wrapped in a transaction.  This is about as strong as you
can get when it comes to the protection of the data and its integrity.
You can have savepoints and rollback points, and you don't have to worry
about strange failure modes like "what happens if my application is
running on table type 'foo' that doesn't support the semantics that it
expects?".

The use of PostgreSQL for production applications simplifies application
design, too, which is (at least as far as I'm concerned) something that
is very worthwhile.  You do not have to build in any overhead in terms
of checking that the database is doing sane things.  You do not have to
worry about whether or not transactions are enabled.  You do not have to
worry about whether or not foreign keys are supported, nor do you have
to worry about whether or not all of the semantics of them (for example
ON DELETE CASCADE or ON DELETE RESTRICT or similar ON UPDATE values) are
supported and/or enforced.

Additionally, applications running on PostgreSQL have a rich set of data
types to choose from.  If the type you need to represent something
doesn't exist (which in practice I have found to be a very rare event),
you can create a data type in one of two ways: you can create a compound
type in SQL, or you can create a truly custom data type using C code
that you load into the PostgreSQL server as a module.  As I said,
though, in practice this is usually not needed; more often than not, if
someone thinks that they need a truly custom data type that is typically
a sign that the data storage needs to be thought about again.  Still,
this makes PostgreSQL truly versatile in ways that other database
systems can only dream of.

And the last thing I'll mention about PostgreSQL is this:  backups are
simple, stupid simple.  With MySQL, if you have even a single MyISAM
table, the only way to perform the backup with any guarantee that it is
consistent is to make the server unavailable to all applications so that
data isn't changed underneath its back.  There is a way to do this using
its backup program by taking out a global lock on all databases; this
has the same effect as shutting the server down if the backup takes more
than a few minutes, though, because all clients will be hung on the DB
server (and there is usually a timeout that will be triggered in such an
event).  So to do that cleanly, you have to put your applications in a
maintenance mode of some sort.

In PostgreSQL, you can simply do the backup.  Each database will be
backed up in the context of a transaction.  The transaction ensures that
the view of data will not change for the duration of the transaction.
Simple, easy, database-wide, and you don't have to bring the machine
down in order to get a clean, consistent backup.  Applications can still
continue operations as if the server weren't being backed up without a
problem.  Plus you can take the full (consistent!) backup and add to it
all streaming updates since the start of the transaction for the backup
process and you can do point-in-time-recovery to even just moments after
the backup started.

In short, I don't see much of a reason to use any database system other
than PostgreSQL for 95% of applications out there.  The remaining 5%
that have truly special requirements, IME, are best suited for a
completely different type of solution anyway.

	--- Mike

-- 
A man who reasons deliberately, manages it better after studying Logic
than he could before, if he is sincere about it and has common sense.
                                  --- Carveth Read, “Logic”