Do Not Pass This Way Again

Considering MySQL? Use something else. Already on MySQL? Migrate. For every successful project built on MySQL, you could uncover a history of time wasted mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense of accomplishment over the effort spent making MySQL behave.

Thesis: databases fill roles ranging from pure storage to complex and interesting data processing; MySQL is differently bad at both tasks. Real apps all fall somewhere between these poles, and suffer variably from both sets of MySQL flaws.

Much of this is inspired by the principles behind PHP: A Fractal of Bad Design. I suggest reading that article too -- it's got a lot of good thought in it even if you already know to stay well away from PHP. (If that article offends you, well, this page probably will too.)

Storage

Storage systems have four properties:

  1. Take and store data they receive from applications.
  2. Keep that data safe against loss or accidental change.
  3. Provide stored data to applications on demand.
  4. Give administrators effective management tools.

In a truly "pure" storage application, data-comprehension features (constraints and relationships, nontrivial functions and aggregates) would go totally unused. There is a time and a place for this: the return of "NoSQL" storage systems attests to that.

Pure storage systems tend to be closely coupled to their "main" application: consider most web/server app databases. "Secondary" clients tend to be read-only (reporting applications, monitoring) or to be utilities in service of the main application (migration tools, documentation tools). If you believe constraints, validity checks, and other comprehension features can be implemented in "the application", you are probably thinking of databases close to this pole.

Storing Data

MySQL has many edge cases which reduce the predictability of its behaviour when storing information. Most of these edge cases are documented, but violate the principle of least surprise (not to mention the expectations of users familiar with other SQL implementations).

Preserving Data

... against unexpected changes: like most disk-backed storage systems, MySQL is as reliable as the disks and filesystems its data lives on. MySQL provides no additional functionality in terms of mirroring or hardware failure tolerance (such as Oracle ASM). However this is a limitation shared with many, many other systems.

When using the InnoDB storage engine (default since MySQL 5.5), MySQL maintains page checksums in order to detect corruption caused by underlying storage. However, many third-party software applications, as sell as users upgrading from earlier versions of MySQL may be using MyISAM, which will frequently corrupt data files on improper shutdown.

The implicit conversion rules that bite when storing data also bite when asking MySQL to modify data - my favourite example being a fat-fingered UPDATE query where a mistyped = (as -, off by a single key) caused 90% of the rows in the table to be affected, instead of one row, because of implicit string-to-integer conversions.

... against loss: hoo boy. MySQL, out of the box, gives you three approaches to backups:

If neither of these are sufficient, you're left with purchasing a backup tool from Oracle or from one of the third-party MySQL vendors.

Like many of MySQL's features, the binary logging feature is too configurable, while still, somehow, defaulting to modes that are hazardous or surprising: the default behaviour is to log SQL statements, rather than logging their side effects. This has lead to numerous bugs over the years; MySQL (now) makes an effort to make common "non-deterministic" cases such as NOW() and RANDOM() act deterministically but these have been addressed using ad-hoc solutions. Restoring binary-log-based backups can easily lead to data that differs from the original system, and by the time you've noticed the problem, it's too late to do anything about it.

(Seriously. The binary log entries for each statement contain the "current" time on the master and the random seed at the start of the statement, just in case. If your non-deterministic query uses any other function, you're still fucked by default.)

Additionally, a number of apparently-harmless features can lead to backups or replicas wandering out of sync with the original database, in the default configuration:

Retrieving Data

This mostly works as expected. Most of the ways MySQL will screw you happen when you store data, not when you retrieve it. However, there are a few things that implicitly transform stored data before returning it:

Efficiency

For purely store-and-retrieve applications, MySQL's query planner (which transforms the miniature program contained in each SQL statement into a tree of disk access and data manipulation steps) is sufficient, but only barely. Queries that retrieve data from one table, or from one table and a small number of one-to-maybe-one related tables, produce relatively efficient plans.

MySQL, however, offers a number of tuning options that can have dramatic and counterintuitive effects, and the documentation provides very little advice for choosing settings. Tuning relies on the administrator's personal experience, blog articles of varying quality, and consultants.

Data Processing

Data processing encompasses tasks that require making decisions about data and tasks that derive new data from existing data. This is a huge range of topics:

The further towards data processing tasks applications move, the more their SQL resembles tiny programs sent to the data. MySQL is totally unprepared for programs, and expects SQL to retrieve or modify simple rows.

Validity

Good constraints are like asserts: in an ideal world, you can't tell if they work, because your code never violates them. Here in the real world, constraint violations happen for all sorts of reasons, ranging from buggy code to buggy human cognition. A good database gives you more places to describe your expectations and more tools for detecting and preventing surprises. MySQL, on the other hand, can't validate your data for you, beyond simple (and fixed) type constraints:

I hope every client with write access to your data is absolutely perfect, because MySQL cannot help you if you make a mistake.

Summarizing and Deriving Data

SQL databases generally provide features for doing "interesting" things with sets of tuples, and MySQL is no exception. However, MySQL's limitations mean that actually processing data in the database is fraught with wasted money, brains, and time:

And now you know why MySQL advocates are such big fans of doing data processing in "the client" or "the app".

Alternate Representations and Derived Tables

Many databases let schema designers and administrators abstract the underlying "physical" table structure from the presentation given to clients, or to some specific clients, for any of a number of reasons. MySQL tries to let you do this, too! And fumbles it quite badly.

Hosting Logic In The Database

Yeah, yeah, the usual reaction to stored procedures and in-DB code is "eww, yuck!" for some not-terrible reasons, but hear me out on two points:

MySQL, as of version 5.0 (released in 2003 -- remember that date, I'll come back to it), has support for in-database code via a procedural SQL-like dialect, like many other SQL databases. This includes server-side procedures (blocks of stored code that are invoked outside of any other statements and return statement-like results), functions (blocks of stored code that compute a result, used in any expression context such as a SELECT list or WHERE clause), and triggers (blocks of stored code that run whenever a row is created, modified, or deleted).

Given the examples of other contemporaneous procedural languages, MySQL's procedural dialect contains some very strange and unfortunate design choices:

The net result is that developing MySQL stored programs is unpleasant, uncomfortable, and far more error-prone than it could have been.

Why Is MySQL The Way It Is?

MySQL's technology and history contain the seeds of all of these flaws.

Pluggable Storage Engines

Very early in MySQL's life, the MySQL dev team realized that MyISAM was not the only way to store data, and opted to support other storage backends within MySQL. This is basically an alright idea; while I personally prefer storage systems that focus their effort on making one backend work very well, supporting multiple backends and letting third-party developers write their own is a pretty good approach too.

Unfortunately, MySQL's storage backend interface puts a very low ceiling on the ways storage backends can make MySQL behave better.

MySQL's data access paths through table engines are very simple: MySQL asks the engine to open a table, asks the engine to iterate through the table returning rows, filters the rows itself (outside of the storage engine), then asks the engine to close the table. Alternately, MySQL asks the engine to open a table, asks the engine to retrieve rows in range or for a single value over a specific index, filters the rows itself, and asks the engine to close the table.

This simplistic interface frees table engines from having to worry about query optimization - in theory. Unfortunately, engine-specific features have a large impact on the performance of various query plans, but the channels back to the query planner provide very little granularity for estimating cost and prevent the planner from making good use of the engine in unusual cases. Conversely, the table engine system is totally isolated from the actual query, and can't make query-dependent performance choices "on its own". There's no third path; the query planner itself is not pluggable.

Similar consequences apply to type checking, support for new types, or even something as "obvious" as multiple automatic TIMESTAMP columns in the same table.

Table manipulation -- creation, structural modification, and so on -- runs into similar problems. MySQL itself parses each CREATE TABLE statement, then hands off a parsed representation to the table engine so that it can manage storage. The parsed representation is lossy: there are plenty of forms MySQL's parser recognizes that aren't representable in a TABLE structure, preventing engines from implementing, say, column or tuple CHECK constraints without MySQL's help.

The sheer number of table engines makes that help very slow in coming. Any change to the table engine interface means perturbing the code to each engine, making progress on new MySQL-level features that interact with storage such as better query planning or new SQL constructs necessarily slow to implement and slow to test.

Poor Priorities

Early on, the MySQL team focused on pure read performance and on "ease of use" (for new users with simple needs, as far as I can tell) over correctness and completeness, violating Knuth's laws of optimization. Many of these decisions locked MySQL into behaviours very early in its life that it still displays now. Features like implicit type conversions legitimately do help streamline development in very simple cases; experience with other languages unfortunately shows that the same behaviours sandbag development and help hide bugs in more sophisticated scenarios.

While the MySQL (and MariaDB, and Percona) teams have matured greatly, MySQL's massive and, frequently, not terribly database-heavy userbase makes it very hard to introduce breaking changes. At the same time, adding optional breaking changes via server and client mode flags (such as sql_mode) increases the cognitive burden involved in understanding MySQL's behaviours -- especially when that behaviour can vary from client to client, or when the server's configuration is out of your control (on a shared host).

A solution similar to Python's from __future__ import pragmas for making breaking changes opt-in some releases in advance of making them mandatory might help, but MySQL doesn't have the kind of highly-invested, highly-skilled user base that would make that effective -- and it still has all of the problems of modal behaviour.

The Inmates are Running The Asylum

What is this shit?

It wouldn't be so frustrating if I could assign poor faith to someone, but no, the MySQL folks are here to help.

Bad Arguments

Inevitably, someone's going to come along and tell me how wrong I am and how MySQL is just fine as a database system. These people are everywhere, and they mean well too, and they are almost all wrong. There are two good reasons to use MySQL:

  1. Some earlier group wrote for it, and we haven't finished porting our code off of MySQL.
  2. We've considered all of these points, and many more, and decided that ___feature_x___ that MySQL offers is worth the hassle.

Unfortunately, these aren't the reasons people do give, generally. The following are much more common:

comments powered by Disqus