Page 1 of 1

Reconfiguring your table data

Posted: Thu Sep 28, 2006 12:02 pm
by MrPotatoes
in C++ there are little things that you can do to speed up your system. alot don't work anymore due to compilers getting awesomer and awesomer.

for instance within your header file you would define your class's members from smallest to largest and have a buffer if needed. this can drastically speed up the system. that is just one thing.

well, what about in databases.

i've made a forum and there are categories and subcategories and perhaps they look like this:

Code: Select all

ROOT
	|
	+--	PARENT_CATEGORY_0 (1)
	|		+--- SUB_CATEGORY_0 (1)
	|		+--- SUB_CATEGORY_1 (1)
	|		+--- SUB_CATEGORY_2 (1)
	+--	PARENT_CATEGORY_1 (2)
	|		+--- SUB_CATEGORY_0 (2)
	|		+--- SUB_CATEGORY_1 (2)
	|		+--- SUB_CATEGORY_2 (2)
	+--	PARENT_CATEGORY_2 (3)
			+--- SUB_CATEGORY_0 (3)
			+--- SUB_CATEGORY_1 (3)
			+--- SUB_CATEGORY_2 (3)

we all understand that you can create, modify, remove categories at will cuz otherwise that would be lame to have it hard coded.

therefore the parent category's ID and parent_id would change (this is how i do categories btw) and i put them into the same table as the sub categories. some people don't agree but that's ok. that's what's awesome about programming no?

well, say it's been 5 years using this software and you've got parent category id's that range from 10-300,000. you've added and changed and deleted and moved and switched orders and all that pizzazy crap. well, that means that these things aren't in order.

thusly it would look like this:

Code: Select all

ROOT
	|
	+--	PARENT_CATEGORY_0 (890)
	|		+--- SUB_CATEGORY_0 (890)
	|		+--- SUB_CATEGORY_1 (890)
	|		+--- SUB_CATEGORY_2 (890)
	+--	PARENT_CATEGORY_1 (78)
	|		+--- SUB_CATEGORY_0 (78)
	|		+--- SUB_CATEGORY_1 (78)
	|		+--- SUB_CATEGORY_2 (78)
	+--	PARENT_CATEGORY_2 (300000)
			+--- SUB_CATEGORY_0 (300000)
			+--- SUB_CATEGORY_1 (300000)
			+--- SUB_CATEGORY_2 (300000)
what if perhaps you wrote a script that reorganized your table so that all the ID's (and post's relating to those ID's were changed as well) so that everything is back to order? would that speed things up in a db? i'm guessing it would because it wouldn't have to count as much but still. every little improvement counts IMO

remember, think of this in terms of the categories being tens deep (Children) and and possibly thousands tall (Parents)

Posted: Thu Sep 28, 2006 12:07 pm
by Benjamin
You may want to look into innob tables, which can do things like that automatically. As far as MyISAM is concerned, you won't notice any speed difference.

Another thing to note, is that you can change the order that records are stored.

Code: Select all

ALTER TABLE `Employees` ORDER BY `FirstName` 

Re: Reconfiguring your table data

Posted: Thu Sep 28, 2006 8:46 pm
by alex.barylski
MrPotatoes wrote:in C++ there are little things that you can do to speed up your system. alot don't work anymore due to compilers getting awesomer and awesomer.

for instance within your header file you would define your class's members from smallest to largest and have a buffer if needed. this can drastically speed up the system. that is just one thing.
Mr. Potatoes you and your C++ on a PHP forum, your as bad as me *high fives*

As for your statement: "They don't work anymore"

I think it would be wise to say they don't apply anymore. As the compiler does use those tricks, but ignores when you tell it to do so. Using shift for faster DIV by 2, etc...

Most optmization quirks like that, are best left for the compiler anyways. Any architecture specific optimizations only bind you tighter to a given system. Plus compilers are much better at trivial optmizations as they are set rules which apply to finite situations, meaning...their difficult for a human to remember and apply, but trivial for a computer.

Best optimizations often come from implementation or algorithm choices; more likely the former.

One example I use in an article I wrote a few years back is that of a text editor buffer (originally done in C++ ported to PHP for demo).

Code: Select all

class CBuffer{
  function insert($char, $offset=0)
  {
    $this->m_buffer[$offset] = $char;
  }

  function delete($offset)
  {
    $this->m_buffer[$offset] = NULL; // Assume this deletes
  }

  function update($char, $offset=0)
  {
    $this->m_buffer[$offset] = $char;
  }
}
Using such a trivial example (especially in PHP with builtin GC) it's hard to fully explore the possibilities of optimizations, but one which should jump out is the refactoring of update() to instead just call insert() which would likely yeild slower execution but should produce smaller footprint (both in RAM and HDD).

Another important optimization tip, especially in PHP is writing tight loops or possibly loop unrolling. Having anything that iterates more than once makes sense to keep that code as small as possible.

Storing a cache of a count() result is faster than calling count() inside your FOR loop because PHP doesn't cache it (even in PHP 5 as far as I know) so, even though count() apparently returns a cached value (Vodka and I got into this disscussion and he extracted the actual PHP source and it appears it is cached) count is still executed and the caching logic inside count() is still executed.

Most optimized compilers would (I believe) know to cache the value of count() or any other function called inside the expression of a FOR loop and just reference that variable instead of calling the function. So it is indeed faster to cache variables like that instead of using (possibly cached functions) function calls.

That is a quirk of PHP which is worth taking advantage of.

The second, which I have found to be true from testing under PHP 4...Is setting an array element to FALSE and calling array_filter() on the entire array for cleanup...this was MUCH faster (on my system) compared to iteratively calling unset() on each element.

Single quotes compared to double quotes...the difference in performance is negligble for small sources, but whenever you have more than one script I would suggest capitalizing on that performance boost.

Of course there are likely other PHP quirks, but the most important thing is implementation. Focus on that and let other layer handle the optimization quirks. The same ideology would likely apply to any language, SQL or not.

Cheers :)

Posted: Fri Sep 29, 2006 3:11 am
by Maugrim_The_Reaper
If you analyse a PHP Application, the vast majority of performance issues will be caused by a limited range of design decisions. Often it's far better to identify what to optimise, before doing anything. That's not as hard as it seems either - there a few tools for PHP, including XDebug 2 (see PECL extensions) which are very good at finding places where execution seems to be hitting a performance wall.

Re: Reconfiguring your table data

Posted: Fri Sep 29, 2006 11:53 am
by Christopher
MrPotatoes wrote:well, say it's been 5 years using this software and you've got parent category id's that range from 10-300,000. you've added and changed and deleted and moved and switched orders and all that pizzazy crap. well, that means that these things aren't in order.

what if perhaps you wrote a script that reorganized your table so that all the ID's (and post's relating to those ID's were changed as well) so that everything is back to order? would that speed things up in a db? i'm guessing it would because it wouldn't have to count as much but still. every little improvement counts IMO
I am assuming that you have indexed the ID column (and any others used in WHERE,JOIN,GROUP BY,ORDER BY,etc.). If you have indexed, then the database will keep the index "in order" for you by definition. I don't think the actual record positions will have any predictable effect on performance unless you mainly do sequential reads (and I am not even sure then). The whole point of indexes is that they optimize access regardless of the actual data layout. Verifying that your table is properly indexed will have the most impact on performance.

Posted: Fri Sep 29, 2006 12:22 pm
by alex.barylski
Maugrim_The_Reaper wrote:If you analyse a PHP Application, the vast majority of performance issues will be caused by a limited range of design decisions. Often it's far better to identify what to optimise, before doing anything. That's not as hard as it seems either - there a few tools for PHP, including XDebug 2 (see PECL extensions) which are very good at finding places where execution seems to be hitting a performance wall.
Hmmm...subjective I would say, as many developers argue it's best to get your implementation right first, then later optimize, rather than find optimizations first then implement with that knowledge.

I'm of the latter group myself, mainly because my code is *very* dynamic in the sense I never get the design right first time out...in fact it's a perpetual thing I never get the design right. I constantly refactor to accomodate better design, so optimizing first thing out the gates for me personally, would be a bad idea, I'm sure you could see why. :)

Cheers :)

Posted: Fri Sep 29, 2006 12:55 pm
by Christopher
Hockey wrote:Hmmm...subjective I would say, as many developers argue it's best to get your implementation right first, then later optimize, rather than find optimizations first then implement with that knowledge.
I agree. Optimizations can hinder getting the design right which is much more important as Hockey says -- being able to be "dynamic" with the design. And worse is that sub-system optimizations can change from release to release which can cause your own optimizations to become ineffective or even a negative.

Posted: Fri Sep 29, 2006 2:57 pm
by Maugrim_The_Reaper
Maugrim_The_Reaper wrote:Often it's far better to identify what to optimise, before doing anything.
Sorry, folks - I should have been clearer :). before optimising arbitrarily

I do not promote premature optimisation - Honest! I even have proof! =>Maugrim: We like to avoid premature optimisation around here... :wink:

Posted: Fri Sep 29, 2006 3:03 pm
by Christopher
[quote="Maugrim_The_Reaper"]Sorry, folks - I should have been clearer :). before optimising arbitrarily/quote]
Didn't doubt you for a minute. ;)

I think it is an important point -- don't optimize until you understand what needs to optimized. A few things you might know up front or are best practices. The rest you won't know until you build the thing.

Posted: Fri Sep 29, 2006 4:48 pm
by alex.barylski
Maugrim_The_Reaper wrote:
Maugrim_The_Reaper wrote:Often it's far better to identify what to optimise, before doing anything.
Sorry, folks - I should have been clearer :). before optimising arbitrarily

I do not promote premature optimisation - Honest! I even have proof! =>Maugrim: We like to avoid premature optimisation around here... :wink:
Oh premature optimization...I thought you said...premature ahhhh... :P

Hmmmm *Hockey has left the building*