MySQL vs. PostgreSQL
Posted: Wed Oct 08, 2008 1:20 pm
As much as it pains me here, I really must ask this question. I'm looking for experienced answers from people in the know, who can back up their stuff with facts. I'd like to ask the moderators here to help me control this from being a trollfest, because I can see it easily becoming one and I do not wish it to be so. I'm going to try and be as academic as possible here.
First and foremost, I love both MySQL and PostgreSQL. I have equal experience on both for several years, although I'm certain many of you smart guys out there know more than me, so please do share.
So here's the issue. I'm starting on a job search site for a niche job market in the UK. We need to determine the best database type for the project, using Linux and free stuff. As you might expect, it will have hundreds of people simultaneously updating CVs and doing job searches on it, so we'll get a healthy dose of both reads and writes, and perhaps more reads than writes most of the time.
We're in the discussions to talk about whether to use MySQL or PostgreSQL on our dedicated server. I hope I have my facts straight, so here's what I'm going to say on this debate:
- PostgreSQL supports row-level locking on its tables, and it's been the default since many years ago. I started with PostgreSQL back in 2002, I believe, and I know that at least then it had row-level locking. MySQL, however, uses MyISAM tables by default, which do not support anything but table-level locking. Now, they introduced InnoDB around, say, 2004?, and it supports row-level locking, but not without its disadvantages. Foremost, InnoDB does not support full-text search. So, one has to either use an add-on called Phoenix (not easily installed with most web hosting providers, or even an option with them), or create a mix of tables of both InnoDB and MyISAM so that you can get the advantages of each when you need them. By default it's probably better to make all tables in InnoDB mode when you start a project, and then to think about what columns need to move into MyISAM tables, linked by a foreign key.
- If you use table-level locking with MySQL, in other words MyISAM, you're looking at the question of what happens when the server crashes for any reason. With row-level locking, you stand to lose a lesser amount of transactions than, say, queued up transactions that were waiting to get their table-level lock with MyISAM. I also thought I read, but have not confirmed, that when a MyISAM database is running down a list of table-locked transactions and is stopped suddenly when a server is powered off, it does not run failed transactions on boot. However, with PostgreSQL, it does run failed transactions at boot, and, at most, you might lose a far lesser amount of transactions than MySQL.
- MySQL's full-text search was a great disappointment. It does not support certain words, and it does not support any search phrase less than 3 characters. As for performance on the latest MySQL, I almost saw the same performance between doing a LIKE query and doing a full-text search query.
- PostgreSQL integrated full-text search starting with 8.3, but had been working with it as an add-on module for about 4 years. I know nothing more about this because I was doing just fine on LIKE queries for awhile. This is something I need to learn the advantages and disadvantages about.
- The last legitimate study I heard was that if you pair up PostgreSQL versus MySQL and run tests, MySQL fairs better only up to 10 concurrent users. If you go to 11+ users, PostgreSQL fairs better and then gets dramatically better upon the more transactions you throw at it, compared to MySQL. I don't know if this still holds true today, or whether if you use a particular table type with MySQL it stands to fair better than PostgreSQL. I also don't know if MySQL or PostgreSQL had their conf files tweaked to utilize the best performance possible on the server.
- PostgreSQL requires tweaks, there is no denying, to make it run well on a given server. Sure, by default it will work, but you'll get weaker performance unless you tweak the postgresql.conf file, which isn't exactly the easiest file to understand, and even then, I didn't find any clear cut algorithm on the web saying anything definitive about the best way to edit that file -- everyone seemed to have an opinion.
- Many people don't mention MySQL needing those kinds of tweaks, but sure enough it has conf files in /etc/mysql (your server may have them elsewhere). However, the tweaks one does have to do is to mix InnoDB and MyISAM tables, and deal with the hassles of that, rather than sticking with one table type.
- MySQL went with a bad strategy of not using file extents with their databases until, I think, recently (2003?) when they introduced InnoDB. Most database platforms in the world use file extents. The two great advantages of using them are that the database server can work against one file extent while another thread can work against another extent, and if the transaction is lucky to not need data from the other extent, it can improve the performance of that transaction. The other problem with not using that strategy was that they relied on how big the file system would let you grow the file. For a good while, Linux only supported file sizes up to 2GB. Now they support very large file sizes and this is not an issue anymore. However, back in 2000 - 2002, it was a tremendously important issue and swayed a lot of serious developers to choose PostgreSQL over MySQL. this is because PostgreSQL, since its inception, used file extents (multiple files for the database) instead of single file or dual-file databases. So, what I'm trying to say is that PostgreSQL has had a longer record of bug fixes and management of file extents than the guys with MySQL.
- By default, you can't connect to PostgreSQL 8.3 remotely from another server on your LAN unless you edit a pg_hba.conf file, which, by the way, isn't the easiest thing to set up and understand, especially if you want to use SSL keys or MD5. (Me? I just use plain text passwords because I don't know the other techniques.) MySQL, on the other hand, does not have this kind of security restriction. Instead, they leave that up to the job of the firewall locally on the server, or the firewall on the LAN, to manage that. I actually like MySQL's strategy better -- it's easier to manage, and mostly all other daemons I know on a given server work in this manner. I think this is a bad dose of marketing for PostgreSQL and hope that they get rid of the pg_hba.conf strategy, and, instead, do things the MySQL way.
- Tech support. There are far more places for PostgreSQL tech support than MySQL, right? I mean one is proprietary, while the other is open source. The problem is that you may not have a definitive form of tech support straight from a company that has its hands on the developers. Well, the workaround is EnterpriseDB.com corp, which I think can be persuaded to support your PostgreSQL database, and can also help you migrate to their version of PostgreSQL that has added on stuff to make it even better. And from there, EnterpriseDB.com will provide tech support to an even greater degree. To boot, two of the core devs on PostgreSQL work at EnterpriseDB.
So, please correct my facts here if I'm wrong, and let me know what your opinion is on MySQL vs. PostgreSQL.
First and foremost, I love both MySQL and PostgreSQL. I have equal experience on both for several years, although I'm certain many of you smart guys out there know more than me, so please do share.
So here's the issue. I'm starting on a job search site for a niche job market in the UK. We need to determine the best database type for the project, using Linux and free stuff. As you might expect, it will have hundreds of people simultaneously updating CVs and doing job searches on it, so we'll get a healthy dose of both reads and writes, and perhaps more reads than writes most of the time.
We're in the discussions to talk about whether to use MySQL or PostgreSQL on our dedicated server. I hope I have my facts straight, so here's what I'm going to say on this debate:
- PostgreSQL supports row-level locking on its tables, and it's been the default since many years ago. I started with PostgreSQL back in 2002, I believe, and I know that at least then it had row-level locking. MySQL, however, uses MyISAM tables by default, which do not support anything but table-level locking. Now, they introduced InnoDB around, say, 2004?, and it supports row-level locking, but not without its disadvantages. Foremost, InnoDB does not support full-text search. So, one has to either use an add-on called Phoenix (not easily installed with most web hosting providers, or even an option with them), or create a mix of tables of both InnoDB and MyISAM so that you can get the advantages of each when you need them. By default it's probably better to make all tables in InnoDB mode when you start a project, and then to think about what columns need to move into MyISAM tables, linked by a foreign key.
- If you use table-level locking with MySQL, in other words MyISAM, you're looking at the question of what happens when the server crashes for any reason. With row-level locking, you stand to lose a lesser amount of transactions than, say, queued up transactions that were waiting to get their table-level lock with MyISAM. I also thought I read, but have not confirmed, that when a MyISAM database is running down a list of table-locked transactions and is stopped suddenly when a server is powered off, it does not run failed transactions on boot. However, with PostgreSQL, it does run failed transactions at boot, and, at most, you might lose a far lesser amount of transactions than MySQL.
- MySQL's full-text search was a great disappointment. It does not support certain words, and it does not support any search phrase less than 3 characters. As for performance on the latest MySQL, I almost saw the same performance between doing a LIKE query and doing a full-text search query.
- PostgreSQL integrated full-text search starting with 8.3, but had been working with it as an add-on module for about 4 years. I know nothing more about this because I was doing just fine on LIKE queries for awhile. This is something I need to learn the advantages and disadvantages about.
- The last legitimate study I heard was that if you pair up PostgreSQL versus MySQL and run tests, MySQL fairs better only up to 10 concurrent users. If you go to 11+ users, PostgreSQL fairs better and then gets dramatically better upon the more transactions you throw at it, compared to MySQL. I don't know if this still holds true today, or whether if you use a particular table type with MySQL it stands to fair better than PostgreSQL. I also don't know if MySQL or PostgreSQL had their conf files tweaked to utilize the best performance possible on the server.
- PostgreSQL requires tweaks, there is no denying, to make it run well on a given server. Sure, by default it will work, but you'll get weaker performance unless you tweak the postgresql.conf file, which isn't exactly the easiest file to understand, and even then, I didn't find any clear cut algorithm on the web saying anything definitive about the best way to edit that file -- everyone seemed to have an opinion.
- Many people don't mention MySQL needing those kinds of tweaks, but sure enough it has conf files in /etc/mysql (your server may have them elsewhere). However, the tweaks one does have to do is to mix InnoDB and MyISAM tables, and deal with the hassles of that, rather than sticking with one table type.
- MySQL went with a bad strategy of not using file extents with their databases until, I think, recently (2003?) when they introduced InnoDB. Most database platforms in the world use file extents. The two great advantages of using them are that the database server can work against one file extent while another thread can work against another extent, and if the transaction is lucky to not need data from the other extent, it can improve the performance of that transaction. The other problem with not using that strategy was that they relied on how big the file system would let you grow the file. For a good while, Linux only supported file sizes up to 2GB. Now they support very large file sizes and this is not an issue anymore. However, back in 2000 - 2002, it was a tremendously important issue and swayed a lot of serious developers to choose PostgreSQL over MySQL. this is because PostgreSQL, since its inception, used file extents (multiple files for the database) instead of single file or dual-file databases. So, what I'm trying to say is that PostgreSQL has had a longer record of bug fixes and management of file extents than the guys with MySQL.
- By default, you can't connect to PostgreSQL 8.3 remotely from another server on your LAN unless you edit a pg_hba.conf file, which, by the way, isn't the easiest thing to set up and understand, especially if you want to use SSL keys or MD5. (Me? I just use plain text passwords because I don't know the other techniques.) MySQL, on the other hand, does not have this kind of security restriction. Instead, they leave that up to the job of the firewall locally on the server, or the firewall on the LAN, to manage that. I actually like MySQL's strategy better -- it's easier to manage, and mostly all other daemons I know on a given server work in this manner. I think this is a bad dose of marketing for PostgreSQL and hope that they get rid of the pg_hba.conf strategy, and, instead, do things the MySQL way.
- Tech support. There are far more places for PostgreSQL tech support than MySQL, right? I mean one is proprietary, while the other is open source. The problem is that you may not have a definitive form of tech support straight from a company that has its hands on the developers. Well, the workaround is EnterpriseDB.com corp, which I think can be persuaded to support your PostgreSQL database, and can also help you migrate to their version of PostgreSQL that has added on stuff to make it even better. And from there, EnterpriseDB.com will provide tech support to an even greater degree. To boot, two of the core devs on PostgreSQL work at EnterpriseDB.
So, please correct my facts here if I'm wrong, and let me know what your opinion is on MySQL vs. PostgreSQL.