Reconfiguring your table data

This forum is not for 'how-to' coding questions but for PHP theory: it is here for those of us who wish to learn about the design aspects of programming with PHP.

Moderator: General Moderators

Post Reply
User avatar
MrPotatoes
Forum Regular
Posts: 617
Joined: Wed May 24, 2006 6:42 am

Reconfiguring your table data

Post by MrPotatoes »

In C++ there are little things you can do to speed up your program. A lot of them don't work anymore because compilers keep getting awesomer and awesomer.

For instance, in your header file you would define your class's members from smallest to largest, with padding if needed, so the compiler wastes fewer bytes on alignment. That can drastically speed things up, and it's just one trick.

Well, what about in databases?

I've made a forum with categories and subcategories; they might look like this:

Code: Select all

ROOT
	|
	+--	PARENT_CATEGORY_0 (1)
	|		+--- SUB_CATEGORY_0 (1)
	|		+--- SUB_CATEGORY_1 (1)
	|		+--- SUB_CATEGORY_2 (1)
	+--	PARENT_CATEGORY_1 (2)
	|		+--- SUB_CATEGORY_0 (2)
	|		+--- SUB_CATEGORY_1 (2)
	|		+--- SUB_CATEGORY_2 (2)
	+--	PARENT_CATEGORY_2 (3)
			+--- SUB_CATEGORY_0 (3)
			+--- SUB_CATEGORY_1 (3)
			+--- SUB_CATEGORY_2 (3)

We all understand that you can create, modify, and remove categories at will, because it would be lame to have them hard-coded.

Therefore the parent category's ID and parent_id can change (this is how I do categories, btw), and I put them in the same table as the subcategories. Some people don't agree, but that's OK; that's what's awesome about programming, no?

Well, say you've been using this software for 5 years and your parent category IDs range from 10 to 300,000. You've added, changed, deleted, moved, reordered, and all that pizzazzy crap. That means these IDs are no longer in order.

So it would look like this:

Code: Select all

ROOT
	|
	+--	PARENT_CATEGORY_0 (890)
	|		+--- SUB_CATEGORY_0 (890)
	|		+--- SUB_CATEGORY_1 (890)
	|		+--- SUB_CATEGORY_2 (890)
	+--	PARENT_CATEGORY_1 (78)
	|		+--- SUB_CATEGORY_0 (78)
	|		+--- SUB_CATEGORY_1 (78)
	|		+--- SUB_CATEGORY_2 (78)
	+--	PARENT_CATEGORY_2 (300000)
			+--- SUB_CATEGORY_0 (300000)
			+--- SUB_CATEGORY_1 (300000)
			+--- SUB_CATEGORY_2 (300000)
What if you wrote a script that reorganized your table so that all the IDs (and the posts referencing those IDs) were renumbered back into order? Would that speed things up in the DB? I'm guessing it would, because it wouldn't have to count as much. Every little improvement counts, IMO.

Remember, think of this in terms of the categories being tens deep (children) and possibly thousands tall (parents).
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

You may want to look into InnoDB tables, which can handle things like that automatically. As far as MyISAM is concerned, you won't notice any speed difference.

Another thing to note is that you can change the physical order in which records are stored:

Code: Select all

ALTER TABLE `Employees` ORDER BY `FirstName`;
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Reconfiguring your table data

Post by alex.barylski »

MrPotatoes wrote:In C++ there are little things you can do to speed up your program. A lot of them don't work anymore because compilers keep getting awesomer and awesomer.

For instance, in your header file you would define your class's members from smallest to largest, with padding if needed, so the compiler wastes fewer bytes on alignment. That can drastically speed things up, and it's just one trick.
Mr. Potatoes, you and your C++ on a PHP forum; you're as bad as me *high fives*

As for your statement that they "don't work anymore":

I think it would be more accurate to say they don't apply anymore. The compiler still uses those tricks; it just applies them itself, whether or not you write them by hand, e.g. using a shift for a faster divide by 2.

Most optimization quirks like that are best left to the compiler anyway. Architecture-specific optimizations only bind you more tightly to a given system. Plus, compilers are much better at trivial optimizations, since those are fixed rules that apply to specific situations: difficult for a human to remember and apply, but trivial for a computer.

The best optimizations usually come from implementation or algorithm choices, more often the former.

One example I use in an article I wrote a few years back is a text editor buffer (originally done in C++, ported to PHP for the demo):

Code: Select all

class CBuffer
{
  var $m_buffer = array(); // character buffer, indexed by offset

  function insert($char, $offset = 0)
  {
    $this->m_buffer[$offset] = $char;
  }

  function delete($offset)
  {
    unset($this->m_buffer[$offset]); // actually remove the element
  }

  function update($char, $offset = 0)
  {
    $this->m_buffer[$offset] = $char;
  }
}
Using such a trivial example (especially in PHP, with its built-in GC), it's hard to fully explore the optimization possibilities, but one that should jump out is refactoring update() to just call insert(). That would likely yield slightly slower execution but a smaller footprint (both in RAM and on disk).
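That refactoring might look like this (a sketch against the CBuffer class above; the trade-off is one less duplicated code path in exchange for an extra function call per update):

```php
<?php
class CBuffer
{
    var $m_buffer = array(); // character buffer, indexed by offset

    function insert($char, $offset = 0)
    {
        $this->m_buffer[$offset] = $char;
    }

    // update() now delegates to insert() instead of repeating the
    // assignment: smaller source, slightly slower per call.
    function update($char, $offset = 0)
    {
        $this->insert($char, $offset);
    }
}
```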

Another important optimization tip, especially in PHP, is writing tight loops, and possibly loop unrolling. For anything that iterates more than once, it makes sense to keep the loop body as small as possible.

Storing a cached count() result in a variable is faster than calling count() inside your for loop's condition, because PHP doesn't optimize the call away (even in PHP 5, as far as I know). Even though count() apparently returns a cached value internally (Vodka and I got into this discussion, and he dug into the actual PHP source; it does appear to be cached there), the function call and its caching logic still execute on every iteration.

Most optimizing compilers would (I believe) hoist the value of count(), or any other function called in a for loop's condition, into a temporary and reference that instead of calling the function each time. PHP doesn't, so it is indeed faster to cache the value in a variable yourself rather than rely on (possibly internally cached) function calls.

That is a quirk of PHP which is worth taking advantage of.
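A minimal sketch of the count()-caching point (the loop bodies are placeholders):

```php
<?php
$items = range(1, 10000);

// Slower: the condition re-invokes count() on every iteration.
$sum = 0;
for ($i = 0; $i < count($items); $i++) {
    $sum += $items[$i];
}

// Faster: cache the length once, then reference the variable.
$n = count($items);
$sum = 0;
for ($i = 0; $i < $n; $i++) {
    $sum += $items[$i];
}
```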

The second, which I have found to be true from testing under PHP 4: setting array elements to FALSE and then calling array_filter() once on the entire array for cleanup was MUCH faster (on my system) than calling unset() on each element one at a time.
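A sketch of that pattern (the sample data is made up; note the caveat in the comments about array_filter() with no callback):

```php
<?php
// PHP 4-compatible callback that keeps everything except FALSE.
function not_false($v)
{
    return $v !== false;
}

$rows = array('a', 'b', 0, 'd', 'e');

// Mark unwanted elements FALSE instead of unset()-ing them one by one...
$rows[1] = false;
$rows[3] = false;

// ...then strip them all in a single pass. Caveat: with no callback,
// array_filter() also drops 0, '', null and other "falsy" values, so
// pass a callback like this when those are legitimate data.
$rows = array_filter($rows, 'not_false');
// $rows is now array(0 => 'a', 2 => 0, 4 => 'e')
```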

Single quotes compared to double quotes: the performance difference is negligible for a small script, but once you have more than a trivial amount of code I would suggest capitalizing on that small boost.
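For anyone unfamiliar with the difference being discussed, a quick illustration (variable names are arbitrary):

```php
<?php
$name = 'world';

// Double-quoted strings are scanned for variable interpolation and
// escape sequences; single-quoted strings are taken almost literally.
$a = "hello $name";     // interpolated: "hello world"
$b = 'hello ' . $name;  // concatenation, no interpolation scan
$c = 'hello $name';     // literal: the text "hello $name"
```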

Of course there are likely other PHP quirks, but the most important thing is the implementation. Focus on that and let the other layers handle the optimization quirks. The same philosophy likely applies to any language, SQL or not.

Cheers :)
User avatar
Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland

Post by Maugrim_The_Reaper »

If you analyse a PHP application, the vast majority of performance issues will be caused by a limited range of design decisions. It's often far better to identify what to optimise before doing anything. That's not as hard as it seems, either: there are a few tools for PHP, including Xdebug 2 (see the PECL extensions), which are very good at finding the places where execution hits a performance wall.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Reconfiguring your table data

Post by Christopher »

MrPotatoes wrote:Well, say you've been using this software for 5 years and your parent category IDs range from 10 to 300,000. You've added, changed, deleted, moved, reordered, and all that pizzazzy crap. That means these IDs are no longer in order.

What if you wrote a script that reorganized your table so that all the IDs (and the posts referencing those IDs) were renumbered back into order? Would that speed things up in the DB? I'm guessing it would, because it wouldn't have to count as much. Every little improvement counts, IMO.
I am assuming that you have indexed the ID column (and any other columns used in WHERE, JOIN, GROUP BY, ORDER BY, etc.). If you have, then the database keeps the index "in order" for you by definition. I don't think the actual record positions will have any predictable effect on performance unless you mainly do sequential reads (and I am not even sure about that). The whole point of indexes is that they optimize access regardless of the physical data layout. Verifying that your table is properly indexed will have the most impact on performance.
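For example (a sketch only; the `categories` table and `parent_id` column names are assumptions, not from any real schema):

```sql
-- Index the columns used in WHERE / JOIN / GROUP BY / ORDER BY clauses.
CREATE INDEX idx_categories_parent_id ON categories (parent_id);

-- EXPLAIN shows whether MySQL actually uses the index for a query,
-- regardless of how the rows are physically ordered on disk.
EXPLAIN SELECT * FROM categories WHERE parent_id = 890;
```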
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

Maugrim_The_Reaper wrote:If you analyse a PHP application, the vast majority of performance issues will be caused by a limited range of design decisions. It's often far better to identify what to optimise before doing anything. That's not as hard as it seems, either: there are a few tools for PHP, including Xdebug 2 (see the PECL extensions), which are very good at finding the places where execution hits a performance wall.
Hmmm...subjective, I would say, as many developers argue it's best to get your implementation right first and then optimize, rather than find optimizations first and then implement with that knowledge.

I'm of the former group myself, mainly because my code is *very* dynamic, in the sense that I never get the design right the first time out. In fact it's perpetual: I never get the design right. I constantly refactor to accommodate a better design, so optimizing right out of the gate would, for me personally, be a bad idea. I'm sure you can see why. :)

Cheers :)
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

Hockey wrote:Hmmm...subjective, I would say, as many developers argue it's best to get your implementation right first and then optimize, rather than find optimizations first and then implement with that knowledge.
I agree. Optimizations can hinder getting the design right, which is much more important; as Hockey says, you want to be able to stay "dynamic" with the design. Worse, sub-system optimizations can change from release to release, which can make your own optimizations ineffective or even a net negative.
User avatar
Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland

Post by Maugrim_The_Reaper »

Maugrim_The_Reaper wrote:Often it's far better to identify what to optimise, before doing anything.
Sorry, folks, I should have been clearer :). I meant: before optimising arbitrarily.

I do not promote premature optimisation - honest! I even have proof! => Maugrim: We like to avoid premature optimisation around here... :wink:
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

Maugrim_The_Reaper wrote:Sorry, folks, I should have been clearer :). I meant: before optimising arbitrarily.
Didn't doubt you for a minute. ;)

I think it is an important point: don't optimize until you understand what needs to be optimized. A few things you might know up front, or they're established best practices; the rest you won't know until you build the thing.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

Maugrim_The_Reaper wrote:
Maugrim_The_Reaper wrote:Often it's far better to identify what to optimise, before doing anything.
Sorry, folks, I should have been clearer :). I meant: before optimising arbitrarily.

I do not promote premature optimisation - honest! I even have proof! => Maugrim: We like to avoid premature optimisation around here... :wink:
Oh premature optimization...I thought you said...premature ahhhh... :P

Hmmmm *Hockey has left the building*
Post Reply