Imagine an online car/truck/vehicle classified listing site with lots and lots of entries in it. It starts off small and is quite manageable, but eventually it gets big.
Eventually you need to get serious with PHP and databases and quit "goofing around".
The first thing I realized was that I needed to use PostgreSQL instead of MySQL. But that's an argument for another thread, because that war isn't over, so let's not get into it here.
The second thing I realized was that I need to stop using the pg_* API in PHP5 and use the PDO API instead. It's finally time to do this, and I think the pg_* API will see fewer and fewer bug fixes over time. Do you agree? I only hope PDO is going to be just as fast.
The third thing I realized was that I need to start caching connections. Right now, my pages all get a fresh database connection on every request. Luckily, Linux has superb memory management, so frequently accessed stuff in memory performs pretty well, and PHP5 helps with that too. But eventually I think I'm going to have to start using something like APC or memcached to pool connections. Do you agree? I only hope I won't have a lot of trouble with this.
The fourth thing I realized was that PostgreSQL, out of the box, is tuned way down from its performance capabilities. It has a conf file that can be tweaked in all kinds of ways to provide a lot more punch. Unfortunately it's complex, and not a lot of people seem to understand it compared to MySQL performance tweaks.
The fifth thing I realized was that I need to create a serious transaction log and backup scheme. For instance, in my years with M$ SQL Server, we used to do a full backup of the database every day and then run an incremental transaction log backup on the hour. In the event of a corrupt database, we would kick the users out of the app and pinpoint the point of failure. We would then restore a backup from that timeframe, rerun transaction log restores for the records right before the point of failure, test, and then let our users back in again. With PostgreSQL, I'm still trying to figure out what to do. I decided to do a live full backup every 6 hours and use an SQLite3 database to store all UPDATE, INSERT, and DELETE statements as a kind of pseudo transaction log. As the database grows, we'll change from 6-hour backups to 8, then 12, and then move to a robust solution that does an hourly incremental backup plus a full backup daily. Do you think that's the best approach, or do you have a better idea?
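For what it's worth, the 6-hourly full backup described above can be scripted with pg_dump and cron. This is only a sketch: the database name "classifieds" and the paths are assumptions, so adjust them to your setup.

```shell
#!/bin/sh
# Sketch of a 6-hourly full backup (database name and paths are
# hypothetical -- adjust to your installation).
BACKUP_DIR=/var/backups/pg
STAMP=$(date +%Y%m%d_%H%M)           # e.g. 20080615_0600
mkdir -p "$BACKUP_DIR"
# -Fc writes pg_dump's compressed custom format, restorable with pg_restore
pg_dump -Fc classifieds > "$BACKUP_DIR/classifieds_$STAMP.dump"
# prune dumps older than a week so the directory doesn't grow unbounded
find "$BACKUP_DIR" -name 'classifieds_*.dump' -mtime +7 -delete
```

You'd run it from cron, e.g. `0 */6 * * * /usr/local/bin/pg_backup.sh`.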
Getting Serious With PHP and Databases
-
jack_indigo
- Forum Contributor
- Posts: 186
- Joined: Sun Jun 08, 2008 11:25 pm
Re: Getting Serious With PHP and Databases
In the postgresql.conf file, it appears I can turn on hourly transaction log archiving. I'm still getting used to this, so I'm rough around the edges. For instance, I don't have a process yet that cleans up the archive directory, so I imagine you would clean it up every day before new hourly transaction logs are copied in.
Code:
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp -i %p /var/lib/postgresql/8.3/main/archive/%f </dev/null'
archive_timeout = 1h
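A minimal cleanup sketch for that archive directory might look like the following. The 2-day retention period is an assumption; the one real rule is to never delete segments newer than your last verified base backup, or point-in-time recovery breaks.

```shell
#!/bin/sh
# Prune archived WAL segments older than 2 days (retention period is an
# assumption -- never prune past your last good base backup).
ARCHIVE_DIR=/var/lib/postgresql/8.3/main/archive
find "$ARCHIVE_DIR" -type f -mtime +2 -delete
```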
Re: Getting Serious With PHP and Databases
jack_indigo wrote: The third thing I realized was that I need to start caching connections. Right now, my pages all get a fresh database connection each time. Luckily, Linux has some superb memory management where frequently accessed stuff in memory gets a pretty decent performance. And PHP5, as well, helps that. But eventually I think I'm going to have to start using APC memcache to pool connections. Do you agree? I only hope I won't have a lot of trouble with this.

Why do you say this? Are you experiencing performance issues? Did you rule out badly indexed queries?
jack_indigo wrote: and then use a SQLite3 database to store all UPDATE, INSERT, and DELETE statements as a kind of pseudo transaction log. As time moves on with our database, we'll change from 6 hour backups to 8, then 12, and then move to finding a robust solution that backs up our database with an hourly incremental backup and then a full backup daily. Do you think that's the best approach, or do you have a better idea?

I think that's overkill, frankly; it's pretty rare to have a database destroyed beyond repair. Are these backups offsite? If not, I'd be more worried about the disks themselves failing. I think storing full transaction history is also way overkill. Do you have any real need to view your data's history? If so, that should probably be built into the schema itself.
Re: Getting Serious With PHP and Databases
It sounds like you're talking mainly out of inexperience. For web use, MySQL is a very performant and viable alternative to PostgreSQL. Just ask Yahoo!, Google, Wikipedia and many others. Don't cache connections: each PHP instance needs its own MySQL connection, or you might run into contention between separate PHP instances (which run in parallel). That's part of PHP's shared-nothing architecture; don't go reinventing the wheel where it's not necessary.
Second, as josh said, if you are experiencing bad performance it's likely to be badly designed database schema and queries. MySQL is one of the fastest database servers out there, so switching brands won't help you there.
There are plenty of backup solutions, which are not unique to MySQL but are used often with it. I'd recommend reading about snapshots with LVM and about backing up MySQL in general.
Re: Getting Serious With PHP and Databases
You can use a cron script, rsync, and symlinks to implement incremental / rolling file backups: just dump the full SQL and do the backups at the filesystem level.
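A sketch of that cron/rsync/symlink scheme, assuming the SQL dumps land in /var/backups/pg (all paths hypothetical). rsync's --link-dest hard-links unchanged files against the previous snapshot, so every snapshot is browsable as a full copy but only changed files consume new disk space.

```shell
#!/bin/sh
# Rolling snapshot sketch (paths are assumptions; run from cron).
SRC=/var/backups/pg
DEST=/var/backups/snapshots
STAMP=$(date +%Y%m%d_%H%M)
mkdir -p "$DEST"
# Files identical to the previous snapshot are hard-linked, not copied.
rsync -a --link-dest="$DEST/latest" "$SRC/" "$DEST/$STAMP/"
# Re-point the "latest" symlink at the newest snapshot.
ln -sfn "$DEST/$STAMP" "$DEST/latest"
```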
-
jack_indigo
- Forum Contributor
- Posts: 186
- Joined: Sun Jun 08, 2008 11:25 pm
Re: Getting Serious With PHP and Databases
"It sounds like you're talking mainly out of inexperience."

I'm not an expert on a lot of this stuff and I'm still learning. I've never played with memcache, connection pooling, etc. I just hear other developers on the top PHP blogs talking about them and saying to use them when you want to improve performance. But evidently there's a downside here, I guess. I'll read more about this.
The old PostgreSQL vs. MySQL argument is kind of tiresome. I've worked with both, and from my perspective they both handle things equally well. But from a statistical and academic perspective, for when any of my projects really take off and start to scale, PostgreSQL appears to be the winner on many fronts.
Four items that changed my mind about MySQL:
1. http://www.enterprisedb.com/learning/ar ... ostgres.do
(A little stale. You have to scroll down in the page -- page formatting issue)
2. http://blog.page2rss.com/2007/01/postgr ... mance.html
3. http://use.perl.org/~Smylers/journal/34246
4. Plus, I hate having to fight with table types. It's like MySQL needs a blended table type that gives me the advantages of both MyISAM and InnoDB. There's nothing more aggravating than building a website, choosing MyISAM or InnoDB for a table for various reasons, and then finding, out of feature creep, that you need to switch from one to the other. And sometimes, when the functional spec changes after the fact, you end up needing the advantages of both MyISAM and InnoDB on a single table, which is impossible, so you have to create a second table with the other type and redundant data. With PostgreSQL, I have none of this problem.
P.S. I just found this one: http://sql-info.de/mysql/gotchas.html
-
jack_indigo
- Forum Contributor
- Posts: 186
- Joined: Sun Jun 08, 2008 11:25 pm
Re: Getting Serious With PHP and Databases
BTW, in this post:
viewtopic.php?f=2&t=97824&p=530443#p530021
I got PostgreSQL transaction log backups working. They call it WAL Archiving in the PostgreSQL docs. You archive to a remote server or USB thumb drive. If the drives blow and you do a tape restore from yesterday's backup, you at least can use the WAL archiving to restore the hourly data.
On our dedicated server, we're still starting out and we're cheap, so we paid for at least RAID1 mirroring. The odds of that going down all at once and losing all data are very low, so the remote server and WAL archiving aren't a tremendous need, but a need nonetheless as a just-in-case sort of thing. I mean, I've been in server rooms and seen the impossible. I've seen old disk controllers blow; you buy a new disk controller, and the mirrored drives won't respond, and a single drive won't respond -- all data lost. I've also seen a mirrored drive's bad-drive light kick on, but it was so dim that no one noticed it, and by the time we did, the second drive blew -- again, all data lost. Daily backups are great, but they don't restore you back to where you were. And RAID5 or RAID10 would be awesome, but expensive at RackSpace.com, and kind of overkill when just starting out a website.
I found out, unfortunately, that RackSpace won't let us ship them a USB thumb drive, even if it's free of viruses, very tiny, and loaded only with the EXT3 file system. Instead, they recommend we use scp to copy the WAL archive over to another server on the Internet, or purchase another disk. The disk cost for something so small isn't really worth it. And scp to another server eats bandwidth and has a cost associated with it. So, copying the WAL archive to another disk is something we're going to have to wait on until we grow the site. When we grow the site, we'll use 3 servers initially: 2 web nodes on RAID1 mirroring and one PostgreSQL server on RAID5. In that scenario, we can do WAL archiving from the db server into one of the web nodes.
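If we do go the scp route, the archive_command from my earlier post could point at the remote box instead of a local directory. This is only a sketch: backupuser, backuphost, and the remote path are hypothetical, and it assumes passwordless SSH keys are set up, since archive_command runs unattended and must exit 0 only when the copy actually succeeded.

```
archive_mode = on
archive_command = 'scp %p backupuser@backuphost:/var/backups/wal/%f'
archive_timeout = 1h
```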
Re: Getting Serious With PHP and Databases
Most datacenters offer onsite backup solutions with unlimited bandwidth (between the backup SAN and the hosting company's equipment); check out Steadfast Networks. If the data fits on a thumb drive, though, I don't see how bandwidth is an issue.