Page 1 of 2

optimize php VS optimize the database

Posted: Wed Feb 25, 2004 3:28 pm
by penguinboy
I'm just wondering what 'your' opinion is on
optimizing for php VS optimizing the database.

My example:

I like to store date&time fields as such:

fieldname=`yyyymmddhhii`
data='200402261450'

I like this because it's easy to sort based on date;
as time progresses the numbers will continue to incriment higher.

Also, I only have to search one field.

But when I need to acess the data through php,
I have to use sub_str() anywhere from 1-5 times
per record to seperate the field into manageable info.

@ 1000 records I could be calling sub_str() 5000 times.

There is an alternative to sub_str():

//$var{character #}
$var = 200402281459;

$year = $var{0}.$var{1}.$var{2}.$var{3} // $year = 2004;

I'm not sure about the speed difference between sub_str() & $var{#};
but I would assume $var{#} would be 'somewhat' faster.

Anyway, to get to my point:

Which would you think would be more effecient?

storing the date&time as one field
and using sub_str() or $var{#} to manage the field

or

storing the date&time as six fields:
yyyymmddhhii, yyyy, mm, dd, hh, ii


There would defenitaly be a storage benefit in
1 field vs 6 fields
but would that benifit be at the expense of script speed;
or would querying the extra 5 fields narrow the gap?

Posted: Wed Feb 25, 2004 6:08 pm
by ilovetoast
Very interesting issue. I've been wrangling with date storage mechanisms in my latest project as well.

I'm going to do some poking around and see what I can dig up. I'll post more when I know more.

peace

Posted: Wed Feb 25, 2004 6:18 pm
by pickle
I've found that when it comes to retrieving data, It's best to make the database do as much of the work as possible. There's a good chance the DB is optimized and runs quicker than anything you, as a single person, can code in PHP. If I'm not mistaken, PHP code is parsed by C based code, then run. The DB is probably run in C, so there would be a smaller amount of overhead.

Re: optimize php VS optimize the database

Posted: Wed Feb 25, 2004 7:49 pm
by timvw
penguinboy wrote:I'm just wondering what 'your' opinion is on
optimizing for php VS optimizing the database.

My example:

I like to store date&time fields as such:

fieldname=`yyyymmddhhii`
data='200402261450'

I like this because it's easy to sort based on date;
as time progresses the numbers will continue to incriment higher.

Also, I only have to search one field.

But when I need to acess the data through php,
I have to use sub_str() anywhere from 1-5 times
per record to seperate the field into manageable info.
I prefer to use timestamp. Those values are also easy to sort.
using the mysql functions MONTH, DAY, YEAR etc on that timestamp returns whatever you need from that timestamp. And with DATE_FORMAT you can format the timestamp to whatever you like ;)

Posted: Wed Feb 25, 2004 9:05 pm
by AVATAr
like pickle said.. i think the best approach is to make the DBMS do the work. Thats why there's a date type.

When you use de DB you can SQL specific functions like those of timvw.. so improving performance..

Posted: Wed Feb 25, 2004 9:35 pm
by McGruff
Definitlely not six fields for date. That's "normalisation" taken too far :)

As mentioned above, always do as much as you can with query logic.

Don't forget about column indexes. Omitting indexes would usually outweigh any other "inefficiencies" I think - especially with large tables.

In a php script, speed is very rarely an issue. Much more important to concentrate on writing clear, maintainable code.

http://phplens.com/lens/php-book/optimi ... ng-php.php

Posted: Wed Feb 25, 2004 11:36 pm
by ilovetoast
All things being said, I agree with McGruff 100%. But sometimes, it is nice to know what is the fastest way to accomplish a task, if for nothing more than to establish a personal best practice for coding. And certainly it was better than this evenings other social options I had to chose from unfortunately.

Everything I have read points to the use of a timestamp as being preferrable. But then there is the question of SQLite/PHP5. SQLite doesn't have the MySQL date/time functions, so if I write code optimized around those functions am I painting myself into a corner? One in which code will not work with an out of the box PHP5 install? How well accepted or appreciated are apps/code libs/snippets that require non-standard installs right now?

As I'm migrating to SQLite, I intend to use unix timestamps for data storage. I'll use PHP's date and time functions to manipulate the timestamp as needed. I think that the timestamp is certainly faster than multiple columns and easier to work with. I believe they will offer the best mix of speed and compatibility. Faster than multiple columns, and compatible with any db.

peace

Posted: Thu Feb 26, 2004 12:51 pm
by eletrium
Well, if you are writing for large corporate stuff, where they will have Oracle or DB2 sort of database servers and they are optimized and running on big ass boxes, it's always best to make the big ass servers do the hardest computing.

If you are running MySQL on your own box, and just dinking around, run some tests for timings if you feel the need. "What way is fastest?" really can only be answered by running tests.

However, if you are doing anything professional, fastest is not always your goal. I can write code that is extremely fast using asm, but how much fastER is it then C++ or Delphi? Now, 8 months later, when I have to come back in and modify/fix/add something to that code, is that 0.5% speed difference worth the headache?

Usually no, and usually it is not anywhere near the headache.

NOW, if you claim the extra speed is worth the massive headache, give me a number. You cannot just say it is faster so it is worth the headache. You have to say: previously, this process ran in 5 hours, rendering it useless for the business, so in making this change I was able to crunch it down to 2 hours. Or there were a lot of customers complaining that our webserver had too much latency in it's requests, so by making this change, we found that the latency was cut by 15%.

Point? Code is so complex and computers have so many variables and there is so much of it in a professional environment that you will never be able to pinpoint bottlenecks ahead of time with a high degree of accuracy. On the whole, linear searches suck. On the whole, using a searcheable datastructure such as a Splay Tree, BST, Hash Table will greatly increase your speeds. BUT, they have a higher cost up front, and higher costs to maintain. So, when is it really not an increase in speed to use a Splay Tree? Depends, so you have to run a test to figure that out.

I once had a guy show me a piece of code he spent a week optimizing. He improved the speed by a factor of 100%. Sound good? Yup. Was it good? Nope. He improved a process that took 0.01% of the overall process time by 100%. And he took a week to do it, too, and added in code that would never be able to be changed or modified without spending another week on it.

Posted: Fri Feb 27, 2004 12:44 pm
by eletrium
Another big thing....

Pushing data across a wire is one of your biggest bottlenecks. In many cases, you don't know if people will need to get data using dialup, or even if there are network issues slowing the netwrok down. It is a significantly unstable variable.

The more work you make the database server do, the less actually 1's and 0's you need to push through the wire the better. In your case, yeah, you probably get the same amount of data either way, but it is another reason why in general pushing work to the database server layer is usually the rule.

Posted: Fri Feb 27, 2004 6:33 pm
by ilovetoast
Very well made and well taken points eletrium. I'll clarify my thoughts by adding some added reasoning.

First is, I believe that excessive focus on optimization is a problem for a lot of coders I know. I believe that unless you really know what you're doing, it isn't even worth the time.

That being said, I think their are small things that coders should do/should attempt to pickup as they learn and continue in each successive project. These are the small things that while they may not add up to much in any individual project, they may come in useful at some point in a career.

As an easy example, coders should always use str_replace instead of ereg_replace or preg_replace for simple replaces. Another is that for/while loops should not use a function as the termination condition if possible (as in don't use $i < sizeof($array) as the second expr in for loop or for a while loop condition).

Little things like that aren't going to make much difference in a small or even a medium project. But that doesn't make them useless. Every time I have hired coders as a project manager, I require submission of original code samples as part of a resume. Small details like these indicate to me (and many of my past coworkers/supervisors/instructors) that a coder writes well thought out code. It may be easier to avoid reverse loops and such, but on a large project, I want the best coders possible working with me.

Furthermore, as coders here move to other languages, these elements become even more important. Java and C++ do require more attention to detail than PHP becuase of the demands of the environments they are used in.

Yes, I understand other things are often more important. Yes, I agree that an unneccesary emphasis on optimization is a bad thing. But I also believe that neither of those things is an excuse for sloppy or rudimentary code. Coding is an art, and taking a small amount of time to do the little things makes it better. These little things and an understanding of why they're done is one part of what separates the run-of-the-mill PHP coder from the professional.

Snobbish? Yeah, it is. And it's important to try and not take a snobbish tone. When I worked a a MAME-dev good code was critical. Anyone can stare at the output from the disassembler and reverse engineer an old arcade game. Only a good coder can do it well enough and for lack of a better word, elagantly enough to be included in the MAME project.

I guess the reason I find this idea db vs. php timestamp interesting is that I think that it is one of those things. If I was in an enterprise environment I would use Java + Sybase or Oracle and make the db do all the work. But let's face it, PHP isn't exactly an enterprise tool. It's a web scripting tool, a very good one, but not really suited to the enterprise.

It may be that normalizing the db to store 6 fields is faster than using unix timestamps and PHP proccessing of the same. I don't know and I don't care to test. I feel that this would, as a previous poster noted, be an example of normalization gone crazy. I feel it would create an unneccesary amount of work in reagards to db maintenance. I think that the timestamp would be much simpler to maintain and I don't feel that the time saved is worth the effort to make the comparison.

But the point of not having the db handle the time functions is an important one - not from an optimization standpoint but from a compatibility standpoint. Many of us are moving to SQLite (for good reasons) with PHP5. Time functions don't work with SQLite.

If you force the db to do the functions, then you require the user to use the db you specify. If the choice is yours alone, no big deal. If your code is released to others for use, it is a big deal. Especially considering that SQLite is going to the default db in PHP5. You'll be writing code that requires non-standard installs for any who want to use it.

So, I agree with you completely on the optimization issues. That is why I'm not comparing 6 fields vs. timestamp. But I guess I wasn't clear in that the reason I'm chosing timestamps over db time function is compatibility alone and not optimization. I want the code I write to be portable. I want the code to be useable regardless of the data storage mechanism - flat files, dbms, rdbms, or whatever.

Just some added babbling.

peace

toast

Posted: Mon Mar 01, 2004 1:39 pm
by eletrium
"When I worked a a MAME-dev good code was critical. Anyone can stare at the output from the disassembler and reverse engineer an old arcade game. Only a good coder can do it well enough and for lack of a better word, elagantly enough to be included in the MAME project. "

No need to post your resume toast. Just by reading the content of your post code-wise I can tell you're a true professional. If someone can't tell you are reading the points you made then posting your resume is a lost cause for them lol.

"Furthermore, as coders here move to other languages, these elements become even more important. "

Agree. Strongly.

"First is, I believe that excessive focus on optimization is a problem for a lot of coders I know."

Agree. Very strongly.

"I believe that unless you really know what you're doing, it isn't even worth the time. "

Agree, but I would add in another "really" in there. Unless you can rip something apart with a tool like VTune and get timings on every segment of the proces from the CPU on up it can be fairly useless. My last job the common thought was that our code needed to be written much faster. We optimized and optimized only to find out that 90% of the time issues were in a database driver. We got a 5% gain on our code, only to find out that our code accounted for only a small % of the overall process time. So strongly agree with you Toast.

"That being said, I think their are small things that coders should do/should attempt to pickup as they learn and continue in each successive project. "

Agree strongly. To anyone begging their journey as a programmer, note the phrase "continue in each successive project". Your first program will suck, don't try to make it perfect. Your second will suck less. Don't try to make it the best ever. Just keep doing projects and as you program more and more, you learn what is important to do and what is not important to do. (Can you say: maintainable code?)

"Snobbish? Yeah, it is. "

Disagree strongly. Very very strongly. Very very very very strongly. I've had to be pulled in on too many projects to count to fix some other gimp programmers mistakes. I'd bet 1,000 bucks you and McGruff have too, Toast. Advocating programmers not write code that is a pain in the ass to read and figure out instead of taking the extra 5 minutes to make it readable is not snobbish. It's selfish. And more power to us for feeling that way lol. I hate having to fix someone else's unreadable code.

"PHP isn't exactly an enterprise tool"

Loosely disagree. Loosely. PHP isn't for large insurance companies' corporate wide dataware, but I kind of wonder how many good corporate applications can be done with PHP and MySQL backend for free licensing and low developments costs and still deliver solid ROI. It doesn't have to do everthing the "customer" wants, it just has to make the customer more money then they were making before or allow them to do their job for reduced costs. I'm curious how the 80/20 rule would apply.... get 80% of what you need for 20% of the cost.

Overall great post Toast. :)

Posted: Mon Mar 01, 2004 4:24 pm
by Straterra
Speaking of SQLite..I use it..and I love it! Muuuuch faster than the MySQL database on my server..I know this post has nothing to do with this topic, but SQLite is much more practical for small/medium applications.

Posted: Sat Mar 06, 2004 11:25 am
by lastcraft
Hi...
eletrium wrote:Loosely disagree.
I completely agree with your posts eletrium. One interesting development is the increasing adoption of PHP by Oracle. I think this is a good fit. Oracle has much more capablity to incorporate business models within the database and so it makes sense to have a thinner layer interacting with it. Building Java object models and then using Oracle as a simple persistence store for JDO has never made a lot of sense to me.

On optimisation...I have never once had to code smith a piece of PHP code to optimise. I have had to optimise database queries, file access (too many handles in Unix, too many files in a folder in Windows), network calls (vital), DBM indexes, full text searches and just about every other type of IO imaginable. Never the code.

By way of example. In SimpleTest tere is a method for testing web page contents called WebTestCase::assertWantedPattern(). Some people have suggested I add a substr() version for speed. Now this tool is performance sensitive; we typically run tests in their hundreds and thousands and we want them to complete in seconds for devloper feedback. I measured it (using ab) and simply could not detect the change. The pattern match was usually being used to scan web pages (swamped by latency even on localhost) or short strings (just didn't matter). Result? I can keep the interface leaner, saving me time all over the shop.

yours, Marcus

Posted: Sun Mar 07, 2004 12:31 am
by eletrium
" I measured it (using ab) and simply could not detect the change."

There is a reason for that. THe simgle most common mistake programmers make is they forget that the code they write is not the code that the computer runs.

Intel makes a phenominal add-on compiler for VC++ that does a lot of fancy stuff during the compile. I forget which compiler does what specific optimizations, but this is why optimizing code is so extraordinarily complex. You write code, the compiler turns it into machine code. The MACHINE code is what needs to be optimal. Unless you see what is going on in the machine code itself you won't know if your proposed fix even compiles the code differently.

Take PHP for example now. Unless you break the law and step through the machine code of the PHP Engine, you won't know about the specific optimizations that are actually being run.

Again, the code you write are just instructions for the compiler or interpreter. Nothing more.

PS lastcraft, I know you know this, but just wanted to explain the "why" to people who have not gone through this.

PPS Yes I have spent hours optimizing code only to find out it compiled to the same exact thing in the end, so no I did not noodle this one out by being intelligent. I noodled it out after many hours of saying to myself "wtf?".

Posted: Thu May 13, 2004 6:08 am
by dave420
Kaj Arnö, (then) VP of the mysql training team, was at my company giving us some lessons on optimizing our mysql databases, and he told us that it's not always faster to get the database to do something. In fact, as a database is optimized around the shifting of data it might be faster to have the database spit out a large amount of unformatted data, and then have your PHP script do the actual formatting. PHP is designed for that, and can take it in its stride, whereas a mysql database might encounter some harsh performance issues.

Basically, there are no rules of thumb for a position like that. You have to find out what works for your current situation. There is no all-encompassing database philosophy that has answers to every problem. Most times, rolling your sleeves up and delving under the hood(/bonnet) will give you the answer that works for you.