Autoload for libraries

Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Autoload for libraries

Post by Ambush Commander »

I'm planning on converting HTML Purifier, an open-source library I've been developing, to use __autoload. I'll use spl_autoload_register, and all my classes are namespaced, so there's little risk of conflict. The library is also PHP 5 only, so compatibility won't be a problem either.

However, I'm worried that this might cause problems for my users in some weird edge cases. Does anybody have experience/wisdom to offer?
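For reference, a minimal sketch of the kind of setup described above, assuming a hypothetical library whose classes all carry a "MyLib_" prefix and whose underscores map to directories. The class name MyLib_Autoloader, the prefix, and the temporary-directory demo are illustrative assumptions, not HTML Purifier's actual code.

```php
<?php
// Hypothetical library "MyLib": every class name starts with "MyLib_"
// (name-mangled namespacing) and underscores map to directory separators.
class MyLib_Autoloader
{
    /** Absolute path of the directory that holds the library's class files. */
    private static $baseDir;

    /** Append this loader to the SPL autoload stack. */
    public static function register($baseDir)
    {
        self::$baseDir = rtrim($baseDir, '/\\');
        spl_autoload_register(array('MyLib_Autoloader', 'load'));
    }

    /** Load a MyLib_* class; leave every other class to other autoloaders. */
    public static function load($class)
    {
        if (strpos($class, 'MyLib_') !== 0) {
            return false; // not one of ours
        }
        $file = self::$baseDir . '/' . str_replace('_', '/', $class) . '.php';
        if (!is_file($file)) {
            return false;
        }
        require $file; // plain require: PHP only calls us for undefined classes
        return true;
    }
}

// Demo: write one class file into a temporary directory, then autoload it.
$dir = sys_get_temp_dir() . '/mylib_demo_' . uniqid();
mkdir($dir . '/MyLib', 0777, true);
file_put_contents($dir . '/MyLib/Lexer.php', "<?php\nclass MyLib_Lexer {}\n");

MyLib_Autoloader::register($dir);
$lexer = new MyLib_Lexer(); // triggers MyLib_Autoloader::load('MyLib_Lexer')
```

Returning early for class names outside the prefix is what keeps a library loader from interfering with whatever loader the host application uses.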
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: Autoload for libraries

Post by Chris Corbyn »

I've seriously gone off autoloaders since I like to be able to open a file and see the require() lines at the top which indicate what the deps are. Perhaps when combined with namespaces (where you'd replace require() with 'use') it'd be a reasonable trade off but then all you've really done is shifted the logic for the include somewhere else.

Is including a 0 KB file a significantly lower performance hit than including a 500 KB file? If the slowdown is in loading and parsing the file contents, then perhaps you could go the opposite direction and gain a performance boost by refactoring into smaller files, so you're more likely to include only the specific units of code you need, rather than those units of code and a bunch of stuff you won't use. I'm not entirely sure of the specifics of the performance hit involved with multiple includes (i.e. parsing source multiple times, or the actual process of loading the file itself).
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Re: Autoload for libraries

Post by Ambush Commander »

Chris Corbyn wrote:I've seriously gone off autoloaders since I like to be able to open a file and see the require() lines at the top which indicate what the deps are. Perhaps when combined with namespaces (where you'd replace require() with 'use') it'd be a reasonable trade off but then all you've really done is shifted the logic for the include somewhere else.
That is a good point. However, I've noticed that unless your components are very loosely coupled (they may be), it makes little sense to use one of the internal classes outside of its context in the overall program. I've found that, in such cases, it's easy to miss a subtle dependency when you're in a coding frenzy. I've only noticed things going wrong when I specifically tested the file with SimpleTest (in which case only that file is included). I suppose with a little more discipline, it could be feasible, but when you've got Doxygen that can automatically cross-reference things for you, well, meh.

Oh, to be clear, I'm using name mangling namespaces, not the PHP 5.3 type. :-)
Chris Corbyn wrote:Is including a 0 KB file a significantly lower performance hit than including a 500 KB file?
I think so. Parsing PHP and then turning it into opcodes is quite an involved process. At the very least, the memory usage is fairly major. For example, HTML Purifier uses up more memory getting its class definitions setup than it does actually processing HTML. That says something. :-)
Chris Corbyn wrote:If the slowdown is in loading and parsing the file contents, then perhaps you could go the opposite direction and gain a performance boost by refactoring into smaller files, so you're more likely to include only the specific units of code you need, rather than those units of code and a bunch of stuff you won't use.
As I said, I usually need everything. There are only a few classes here and there (which are really what I'm targeting autoload towards) which aren't always necessary. Which, come to think of it, doesn't make too much sense, unless you have a centralized list of requires used in conjunction with autoload.
Chris Corbyn wrote:I'm not entirely sure of the specifics of the performance hit involved with multiple includes (i.e. parsing source multiple times, or the actual process of loading the file itself).
From worst to best:
1. Parsing the file
2. Opcode cache with inefficient ops (this usually means there was a conditional include)
3. Opcode cache with efficient ops (everything is explicit, so the opcode cache can save a more finished representation of the classes)

require_once and include_once tend to wreak havoc when you're trying to attain 3; also, their conditional nature means that PHP has to stat the file and check its own list of already included files. When I have 164 files, that's not negligible.

I didn't mean this to be in defense of autoload. I'm just trying to think (besides performance-wise) of any ramifications this might have in terms of usability.
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Autoload for libraries

Post by Christopher »

I think the pendulum swung back toward include_once() in the last round of optimizations to PHP5 that happened during (and in response to) ZF development.

But perhaps there is no definitive answer because there is no definitive question. Obviously most of us are willing, for example, to trade off a little performance for a big gain in clarity and usability. However, you are talking about making (apparently) subjective improvements in clarity and usability -- and then wondering what the possible side effects and performance improvements might be. I don't know if there is an answer...
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Re: Autoload for libraries

Post by Ambush Commander »

Hmm... I should rephrase the question.

Consider a user who is not interested in the inner workings of a library, just that it "works". He uses include_once 'libs/library.php'; to load the library code, or perhaps new library(), because he's got his own autoload and the library is now in his path.

Let us suppose that the library is using autoload too.

Is there anything that could go wrong, that wouldn't go wrong in the case of plain old include_once/require_once?

That's all I need to know. I'm pretty sure that autoload is slower than include_once, so performance really isn't at issue (I'll have other measures in place to compensate).
wei
Forum Contributor
Posts: 140
Joined: Wed Jul 12, 2006 12:18 am

Re: Autoload for libraries

Post by wei »

Another way is to use autoload in addition to combining all the small classes into one large file (or a few logical files). Having a couple hundred class definitions doesn't seem to affect the overall performance much when a bytecode cache is enabled.
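A rough sketch of how such a combined file could be generated, assuming every source file contains only PHP code (no inline HTML) and that the caller lists parent classes before their children. The function name build_rollup and the demo class names are hypothetical.

```php
<?php
/**
 * Concatenate class files into a single rollup file.
 * Hypothetical helper: $files must be ordered so that parent classes and
 * interfaces appear before the classes that depend on them.
 */
function build_rollup(array $files, $target)
{
    $out = "<?php\n";
    foreach ($files as $file) {
        $code = trim(file_get_contents($file));
        // Strip the opening tag and an optional closing tag from each file.
        $code = preg_replace('/^<\?php\s*/', '', $code);
        $code = preg_replace('/\?>$/', '', $code);
        $out .= "\n// ---- " . basename($file) . " ----\n" . trim($code) . "\n";
    }
    file_put_contents($target, $out);
}

// Demo with two tiny class files in a temporary directory.
$dir = sys_get_temp_dir() . '/rollup_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/Token.php", "<?php\nclass Rollup_Token {}\n");
file_put_contents("$dir/TagToken.php", "<?php\nclass Rollup_TagToken extends Rollup_Token {}\n");

build_rollup(array("$dir/Token.php", "$dir/TagToken.php"), "$dir/rollup.php");
require "$dir/rollup.php"; // one include defines both classes
```

An autoloader can stay registered alongside the rollup as a fallback for any class that was left out of it.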
Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland

Re: Autoload for libraries

Post by Maugrim_The_Reaper »

PHPSpec and PHPMock were converted to use autoload some time ago. There were a few simple reasons why. Firstly, this entire argument was hashed out in excruciating detail on the PEAR list some months ago, and the final decision was to completely avoid any use of *_once() since it created some curious results compared to alternatives. That is, it ranged from a 2% to 10% performance hit depending on the system tested. This was deemed intolerable from the perspective of high-traffic sites hosted on multi-processor platforms.

The solution was autoloading. With autoloading, the PEAR2 autoload method becomes essentially optional. It's a good default, but it can be replaced by a custom solution (e.g. one using absolute paths rather than searching the entire sequence of include_path parts for a relative match, or including all classes in one file).

Finally, the above argument is borne out by the Zend Framework. Any xdebug cachegrind output quickly puts *_once() near the top of the list as taking a significant part of the overall execution time for any request (assuming no database interaction).

So my opinion is to switch to autoloading. Presumably classes will continue to use the PEAR convention (maybe later the PEAR2 convention for namespaces), so anyone reading the source can easily translate dependent class names to their file location. If dependency tracking is an issue, it can be documented in the phpdoc comments using the @uses keyword.
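To illustrate both points, here is a small sketch assuming the underscore-to-directory PEAR naming convention. The helper pear_class_to_path and the Report_* classes are made up for the example; Report_Formatter is defined inline only so the snippet runs without an autoloader.

```php
<?php
/**
 * Translate a PEAR-style class name into its relative file path.
 * Hypothetical helper; it only encodes the underscore-to-directory convention.
 */
function pear_class_to_path($class)
{
    return str_replace('_', '/', $class) . '.php';
}

/** Stub standing in for a dependency that an autoloader would normally load. */
class Report_Formatter
{
    public function format(array $rows)
    {
        return implode("\n", $rows);
    }
}

/**
 * Builds plain-text reports.
 *
 * @uses Report_Formatter to turn rows into text (found at Report/Formatter.php)
 */
class Report_Generator
{
    public function render(array $rows)
    {
        $formatter = new Report_Formatter();
        return $formatter->format($rows);
    }
}

echo pear_class_to_path('Report_Formatter'), "\n"; // Report/Formatter.php
$generator = new Report_Generator();
echo $generator->render(array('a', 'b')), "\n";
```

The docblock records the dependency that a require line at the top of the file would otherwise have made visible.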

As for two autoload functions - just use the SPL functions to manage adding yours. You can probably leverage off the PEAR2 version for any potential issues.
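A minimal sketch of two loaders sharing the SPL autoload stack. Both loader functions and the LoadLog class are hypothetical stand-ins, and eval() replaces a real include so the example runs without any files on disk.

```php
<?php
/** Records which loader was asked for which class, in call order. */
class LoadLog
{
    public static $calls = array();
}

// Stand-in for the application's own autoloader.
function app_autoload($class)
{
    LoadLog::$calls[] = "app:$class";
    if ($class === 'App_Controller') {
        eval('class App_Controller {}'); // placeholder for a real include
    }
}

// Stand-in for the library's autoloader.
function lib_autoload($class)
{
    LoadLog::$calls[] = "lib:$class";
    if ($class === 'Lib_Parser') {
        eval('class Lib_Parser {}'); // placeholder for a real include
    }
}

spl_autoload_register('app_autoload'); // the user's loader, registered first
spl_autoload_register('lib_autoload'); // the library appends its own

new Lib_Parser();     // app_autoload is tried first, then lib_autoload
new App_Controller(); // app_autoload resolves it; lib_autoload is never asked

print_r(LoadLog::$calls);
```

One thing to watch for on the PHP 5 releases discussed here: once anything is registered through spl_autoload_register, a plain __autoload() function defined by the user is no longer called unless it is registered on the stack explicitly as well.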
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Re: Autoload for libraries

Post by Jenk »

Chris Corbyn wrote:I've seriously gone off autoloaders since I like to be able to open a file and see the require() lines at the top which indicate what the deps are. Perhaps when combined with namespaces (where you'd replace require() with 'use') it'd be a reasonable trade off but then all you've really done is shifted the logic for the include somewhere else.

Is including a 0 KB file a significantly lower performance hit than including a 500 KB file? If the slowdown is in loading and parsing the file contents, then perhaps you could go the opposite direction and gain a performance boost by refactoring into smaller files, so you're more likely to include only the specific units of code you need, rather than those units of code and a bunch of stuff you won't use. I'm not entirely sure of the specifics of the performance hit involved with multiple includes (i.e. parsing source multiple times, or the actual process of loading the file itself).
I'm halfway between the two. If I have a class that is completely dependent upon another class (hypothetical example, an Array object returns an Iterator object,) then I'll use include/require. Otherwise, autoload.
Oren
DevNet Resident
Posts: 1640
Joined: Fri Apr 07, 2006 5:13 am
Location: Israel

Re: Autoload for libraries

Post by Oren »

Maugrim_The_Reaper, I don't get one thing... how can autoloading be better when we are talking about performance? You do use the *_once() in the __autoload() function anyway.
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Re: Autoload for libraries

Post by Jenk »

There's no point using *_once inside an autoloader, because the __autoload() function is only called when the class requested is not within scope. That breaks down to:

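Code:

// Implicit check: PHP only calls the autoloader when the class is undefined.
if (!class_exists($someClass, false)) {
  // Redundant second check: this is all that *_once would add here.
  if (!class_exists($someClass, false)) {
    include $someClass . '.php';
  }
}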
Oren
DevNet Resident
Posts: 1640
Joined: Fri Apr 07, 2006 5:13 am
Location: Israel

Re: Autoload for libraries

Post by Oren »

I actually meant "require/include" sorry.
Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland

Re: Autoload for libraries

Post by Maugrim_The_Reaper »

The autoloading performance boost is largely coincidental. It becomes relevant when a library has a lot of require_once redundancy. Say you have three drivers in use, and each needs the same parent class. So because each driver is independent, each would have a require_once at the top.

Now, using all three drivers requires one parent class, but that one class gets three require_once calls for it (one per driver file). Using autoloading, you only get one require/include on the first parent class reference. That's two fewer *_once calls overall, which affects performance at some level.
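The scenario can be sketched as follows, assuming three hypothetical driver classes that extend a shared Driver_Base, each stored in its own file in a temporary directory. DriverLoader is an illustrative name, and the counter exists only to make the number of includes visible.

```php
<?php
// Hypothetical layout: one file per class, three drivers sharing one parent.
$dir = sys_get_temp_dir() . '/driver_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/Driver_Base.php", "<?php\nabstract class Driver_Base {}\n");
foreach (array('Mysql', 'Pgsql', 'Sqlite') as $name) {
    file_put_contents(
        "$dir/Driver_$name.php",
        "<?php\nclass Driver_$name extends Driver_Base {}\n"
    );
}

class DriverLoader
{
    public static $dir;
    public static $includes = array(); // class name => number of includes

    public static function load($class)
    {
        // Absolute path, so no include_path search is needed.
        $file = self::$dir . '/' . $class . '.php';
        if (is_file($file)) {
            self::$includes[$class] = isset(self::$includes[$class])
                ? self::$includes[$class] + 1
                : 1;
            include $file;
        }
    }
}

DriverLoader::$dir = $dir;
spl_autoload_register(array('DriverLoader', 'load'));

// Each driver file references Driver_Base, but only the first reference
// reaches the autoloader; afterwards the class is already defined.
new Driver_Mysql();
new Driver_Pgsql();
new Driver_Sqlite();

print_r(DriverLoader::$includes);
```

With require_once at the top of each driver file, the parent would instead be looked up three times, twice only to find it already loaded.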

You can go further by overloading the autoload function (e.g. with a static class method) to use absolute paths, which eliminates the include_path search for a matching relative path and with it another set of file ops. The performance boost (if any at all) is limited by the conditional include within the autoload function, i.e. APC will cache the file, but not the assembled classes/functions. That's the main autoload drawback, but the performance it costs is rarely as significant as the gain from caching the file itself (which is the main benefit since it avoids future file ops).

That's largely why autoload is seen in PEAR2 as the better solution. It's often slightly faster without APC, or with APC (assuming apc.stat is not set to 0). It can be overridden with a custom autoloader, which makes loading subject to personal tweaking for performance. And it doesn't rule out non-autoload solutions, e.g. assembling all classes in one file.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Re: Autoload for libraries

Post by Ambush Commander »

/me realizes that he's inadvertently created two duplicate threads.

I'll look at PEAR2 for any implementation details related to autoload. In the meantime, you guys can duke it out at this thread about the performance implications of autoload.
Oren
DevNet Resident
Posts: 1640
Joined: Fri Apr 07, 2006 5:13 am
Location: Israel

Re: Autoload for libraries

Post by Oren »

Maugrim_The_Reaper wrote:Now, using all three drivers requires one parent class, but that one class gets three require_once calls for it (one per driver file). Using autoloading, you only get one require/include on the first parent class reference. That's two fewer *_once calls overall, which affects performance at some level.
That's right if you simply do *_once() in each one of your drivers, but you can easily eliminate this by using something like Swift's classLoader::load().
Selkirk
Forum Commoner
Posts: 41
Joined: Sat Aug 23, 2003 10:55 am
Location: Michigan

Re: Autoload for libraries

Post by Selkirk »

I like where PEAR2 is going with autoload. I like that it gives a greater degree of freedom to organize your code without worrying about the performance of file loading operations. You can do what you want, then either create a bootstrap file or a rollup file to feed to an opcode cache.

I've started on a project that used autoloading and I think I'm a convert. One problem, though, is that I've observed that a one-class-per-file policy seems to inhibit people from using small classes. I think this policy shapes OO designs in a way I don't like.