
Autoload for libraries

Posted: Mon Jan 14, 2008 7:44 pm
by Ambush Commander
I'm planning on converting HTML Purifier, an open-source library I've been developing, to use __autoload. I'll use spl_autoload_register, and all my classes are namespaced, so there's little risk of conflict. My library is also PHP5-only, so compatibility won't be a problem either.

However, I'm worried about whether or not this will cause problems for my users in any weird edge-cases. Does anybody have experience/wisdom to offer?
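
For reference, a minimal sketch of the setup described above: a loader registered through spl_autoload_register that only answers for its own name-mangled prefix. The MyLib_ prefix, the MyLib_Autoloader class and the directory layout are hypothetical, not HTML Purifier's actual code.

```php
<?php
// Hypothetical library loader; "MyLib_" stands in for the library's class prefix.
final class MyLib_Autoloader
{
    const PREFIX = 'MyLib_';

    // Map a class name such as MyLib_Lexer_DOM to <base>/MyLib/Lexer/DOM.php.
    // Returns null for classes that do not carry the library prefix.
    public static function pathFor($class, $baseDir)
    {
        if (strpos($class, self::PREFIX) !== 0) {
            return null;
        }
        return $baseDir . '/' . str_replace('_', '/', $class) . '.php';
    }

    // Autoload callback: load the file if the class is ours and the file exists.
    public static function load($class)
    {
        $path = self::pathFor($class, dirname(__FILE__));
        if ($path !== null && is_file($path)) {
            require $path;
        }
    }
}

// Adds this loader to the SPL autoload stack alongside any loaders already there.
spl_autoload_register(array('MyLib_Autoloader', 'load'));
```

Returning without loading anything for foreign class names lets the other loaders on the stack take their turn.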

Re: Autoload for libraries

Posted: Mon Jan 14, 2008 8:38 pm
by Chris Corbyn
I've seriously gone off autoloaders since I like to be able to open a file and see the require() lines at the top which indicate what the deps are. Perhaps when combined with namespaces (where you'd replace require() with 'use') it'd be a reasonable trade off but then all you've really done is shifted the logic for the include somewhere else.

Is including a 0 KB file a significantly lower performance hit than including a 500 KB file? If the slowdown is in loading and parsing the file contents, then perhaps you could go the opposite direction and gain a performance boost by refactoring into smaller files, so you're more likely to include only the specific units of code you need rather than those units plus a bunch of stuff you won't use. I'm not entirely sure of the specifics of the performance hit involved with multiple includes (i.e. parsing source multiple times, or the actual process of loading the file itself).

Re: Autoload for libraries

Posted: Mon Jan 14, 2008 8:48 pm
by Ambush Commander
Chris Corbyn wrote:I've seriously gone off autoloaders since I like to be able to open a file and see the require() lines at the top which indicate what the deps are. Perhaps when combined with namespaces (where you'd replace require() with 'use') it'd be a reasonable trade off but then all you've really done is shifted the logic for the include somewhere else.
That is a good point. However, I've noticed that unless your components are very loosely coupled (they may be), it makes little sense to use one of the internal classes outside of its context in the overall program. I've found that, in such cases, it's easy to miss a subtle dependency when you're in a coding frenzy. I've only noticed things going wrong when I specifically test the file with SimpleTest (in which case only that file is included). I suppose with a little more discipline, it could be feasible, but when you've got Doxygen that can automatically cross-reference things for you, well, meh.

Oh, to be clear, I'm using name mangling namespaces, not the PHP 5.3 type. :-)
Chris Corbyn wrote:Is including a 0 KB file a significantly lower performance hit than including a 500 KB file?
I think so. Parsing PHP and then turning it into opcodes is quite an involved process. At the very least, the memory usage is fairly major. For example, HTML Purifier uses up more memory getting its class definitions set up than it does actually processing HTML. That says something. :-)
Chris Corbyn wrote:If the slow down is in loading and parsing the file contents then perhaps you could go the opposite direction and gain a performance boost by refactoring into smaller files so you're more likely to include specific units of code you need, rather than those units of code and a bunch of stuff you won't use.
As I said, I usually need everything. There are only a few classes here and there (which are really what I'm targeting autoload towards) which aren't always necessary. Which, come to think of it, doesn't make too much sense, unless you have a centralized list of require's used in conjunction with autoload.
Chris Corbyn wrote:I'm not entirely sure of the specifics of the performance hit involved with multiple includes (i.e. parsing source multiple times, or the actual process of loading the file itself).
From worst to best:
1. Parsing the file
2. Opcode cache with inefficient ops (this usually means there was a conditional include)
3. Opcode cache with efficient ops (everything is explicit, so the opcode cache can save a more finished representation of the classes)

require_once and include_once tend to wreak havoc when you're trying to attain 3; also, their conditional nature means that PHP has to stat the file and check its own list of already included files. When I have 164 files, that's not negligible.

I didn't mean this to be in defense of autoload. I'm just trying to think (besides performance-wise) of any ramifications this might have in terms of usability.

Re: Autoload for libraries

Posted: Mon Jan 14, 2008 9:16 pm
by Christopher
I think the pendulum swung back toward include_once() in the last round of optimizations to PHP5 that happened during (and in response to) ZF development.

But perhaps there is no definitive answer because there is no definitive question. Obviously most of us are willing, for example, to trade off a little performance for a big gain in clarity and usability. However, you are talking about making (apparently) subjective improvements in clarity and usability -- and then wondering what the possible side-effects and performance improvements might be. I don't know if there is an answer...

Re: Autoload for libraries

Posted: Mon Jan 14, 2008 9:25 pm
by Ambush Commander
Hmm... I should rephrase the question.

Consider a user who is not interested in the inner workings of a library, just that it "works". He uses include_once 'libs/library.php'; to load the library code, or perhaps new library(), because he's got his own autoload and the library is now in his path.

Let us suppose that the library is using autoload too.

Is there anything that could go wrong, that wouldn't go wrong in the case of plain old include_once/require_once?

That's all I need to know. I'm pretty sure that autoload is slower than include_once, so performance really isn't at issue (I'll have other measures in place to compensate).

Re: Autoload for libraries

Posted: Mon Jan 14, 2008 10:34 pm
by wei
Another way is to use autoload in addition to combining all the small classes into one large file (or a few logical files). Having a couple hundred class definitions doesn't seem to affect the overall performance much when a bytecode cache is enabled.

Re: Autoload for libraries

Posted: Tue Jan 15, 2008 3:33 am
by Maugrim_The_Reaper
PHPSpec and PHPMock were converted to use autoload some time ago. There were a few simple reasons why. Firstly, this entire argument was hashed out in excruciating detail on the PEAR list some months ago, and the final decision was to completely avoid any use of *_once() since it created some curious results compared to alternatives. That is, it ranged from a 2% to 10% performance hit depending on the system tested. This was deemed intolerable from the perspective of high-traffic sites hosted on multi-processor platforms.

The solution was autoloading. Using autoloading the PEAR2 autoload method becomes essentially optional. It's a good default, but it can be replaced by a custom solution (e.g. one using absolute paths rather than searching the entire sequence of include_path parts for a relative match, or including all classes in one file).

Finally, the above argument is borne out by the Zend Framework. Any xdebug cachegrind output quickly puts *_once() near the top of the list as taking a significant part of the overall execution time for any request (assuming no database interaction).

So my opinion is to switch to autoloading. Presumably classes will continue to use the PEAR convention (maybe later the PEAR2 convention for namespaces) so anyone reading the source can easily translate dependent classnames to their file location. If dependency tracking is an issue it can be documented in the phpdoc comments using the @uses keyword.

As for two autoload functions - just use the SPL functions to manage adding yours. You can probably leverage off the PEAR2 version for any potential issues.
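
One edge case worth knowing on PHP 5: once anything is registered through spl_autoload_register, the engine stops calling a userland __autoload() function unless that function is itself put on the SPL stack. A minimal sketch of a library guarding against this follows; mylib_autoload, mylib_register_autoload and the MyLib_ prefix are hypothetical names, not taken from PEAR2 or any real library.

```php
<?php
// Hypothetical library loader; only answers for the MyLib_ prefix.
function mylib_autoload($class)
{
    if (strpos($class, 'MyLib_') !== 0) {
        return; // not ours: leave it to the application's own loader
    }
    $file = dirname(__FILE__) . '/' . str_replace('_', '/', $class) . '.php';
    if (is_file($file)) {
        require $file;
    }
}

function mylib_register_autoload()
{
    // False or an empty array means nothing is on the SPL stack yet.
    $registered = spl_autoload_functions();

    // If the application relies on a plain __autoload(), register it first
    // so it keeps being called once the SPL stack is active.
    // (On PHP 8, __autoload() no longer exists and this branch is skipped.)
    if (empty($registered) && function_exists('__autoload')) {
        spl_autoload_register('__autoload');
    }

    spl_autoload_register('mylib_autoload');
}

mylib_register_autoload();
```

With this in place the user's loader and the library's loader are both consulted, in registration order, whenever an undefined class is referenced.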

Re: Autoload for libraries

Posted: Tue Jan 15, 2008 7:24 am
by Jenk
Chris Corbyn wrote:I've seriously gone off autoloaders since I like to be able to open a file and see the require() lines at the top which indicate what the deps are. Perhaps when combined with namespaces (where you'd replace require() with 'use') it'd be a reasonable trade off but then all you've really done is shifted the logic for the include somewhere else.

Is including a 0 KB file a significantly lower performance hit than including a 500 KB file? If the slow down is in loading and parsing the file contents then perhaps you could go the opposite direction and gain a performance boost by refactoring into smaller files so you're more likely to include specific units of code you need, rather than those units of code and a bunch of stuff you won't use. I'm not entirely sure of the specifics of the performance hit involved with multiple includes (i.e. parsing source multiple times, or the actual process of loading the file itself).
I'm halfway between the two. If I have a class that is completely dependent upon another class (hypothetical example, an Array object returns an Iterator object,) then I'll use include/require. Otherwise, autoload.

Re: Autoload for libraries

Posted: Tue Jan 15, 2008 12:35 pm
by Oren
Maugrim_The_Reaper, I don't get one thing... how can autoloading be better when we are talking about performance? You do use the *_once() in the __autoload() function anyway.

Re: Autoload for libraries

Posted: Tue Jan 15, 2008 1:59 pm
by Jenk
There's no point using *_once inside an autoloader, because the __autoload() function is only called when the class requested is not within scope. That breaks down to:

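Code:

// __autoload() is only called when the class is not yet defined,
// so *_once inside it repeats a check that has already been made.
if (!class_exists($someClass, false)) {   // implicit: done by PHP before calling __autoload()
  if (!class_exists($someClass, false)) { // redundant: the "already loaded?" test added by *_once
    include $someClass . '.php';
  }
}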

Re: Autoload for libraries

Posted: Tue Jan 15, 2008 2:09 pm
by Oren
I actually meant "require/include" sorry.

Re: Autoload for libraries

Posted: Tue Jan 15, 2008 2:37 pm
by Maugrim_The_Reaper
The autoloading performance boost is largely coincidental. It becomes relevant when a library has a lot of require_once redundancy. Say you have three drivers in use, and each needs the same parent class. So because each driver is independent, each would have a require_once at the top.

Now, using all three drivers requires one parent class, but that class gets three require_once calls (one per driver file). With autoloading, you only get one require/include, on the first reference to the parent class. That's two fewer *_once calls overall, which affects performance at some level.

You can go further by overloading the autoload function (e.g. static class method) to use absolute paths (which eliminates include_path searches for matching relative paths which is another set of file ops eliminated). The performance boost (if any at all) is restricted by the conditional include within the autoload function - i.e. APC will cache the file, but not the assembled classes/functions. That's the main autoload drawback, but the performance it reduces is rarely as significant as the gain from caching the file itself (which is the main benefit since it avoids future file ops).

That's largely why autoload is seen in PEAR2 as the better solution. It's often slightly faster without APC, or with APC (assuming apc.stat is not set to 0). It can be overridden with a custom autoloader, which makes loading subject to personal tweaking for performance. And it doesn't rule out non-autoload solutions, e.g. assembling all classes in one file.
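
The three-driver scenario above can be sketched as follows, under stated assumptions: Demo_Loader and the Demo_Driver_* classes are hypothetical, and the class files are written to a temporary directory only so the example runs on its own. The loader is a static class method that builds absolute paths and counts how often each file is included.

```php
<?php
// Hypothetical demo loader: static method, absolute paths, include counter.
final class Demo_Loader
{
    public static $baseDir;
    public static $loads = array(); // class name => number of includes

    public static function load($class)
    {
        if (strpos($class, 'Demo_') !== 0) {
            return;
        }
        // Absolute path, so there is no include_path search.
        $file = self::$baseDir . '/' . str_replace('_', '/', $class) . '.php';
        if (is_file($file)) {
            self::$loads[$class] = isset(self::$loads[$class]) ? self::$loads[$class] + 1 : 1;
            include $file; // plain include: we are only called for undefined classes
        }
    }
}

// Throwaway class tree: one parent class and three drivers extending it.
Demo_Loader::$baseDir = sys_get_temp_dir() . '/autoload_demo_' . uniqid();
mkdir(Demo_Loader::$baseDir . '/Demo/Driver', 0777, true);
file_put_contents(
    Demo_Loader::$baseDir . '/Demo/Driver.php',
    '<?php abstract class Demo_Driver {}'
);
foreach (array('A', 'B', 'C') as $name) {
    file_put_contents(
        Demo_Loader::$baseDir . "/Demo/Driver/$name.php",
        "<?php class Demo_Driver_$name extends Demo_Driver {}"
    );
}

spl_autoload_register(array('Demo_Loader', 'load'));

// Using all three drivers pulls in the parent on the first reference only.
$drivers = array(new Demo_Driver_A(), new Demo_Driver_B(), new Demo_Driver_C());
```

After this runs, Demo_Loader::$loads holds one entry per class, and the parent class file has been included a single time even though three drivers depend on it.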

Re: Autoload for libraries

Posted: Tue Jan 15, 2008 2:52 pm
by Ambush Commander
/me realizes that he's inadvertently created two duplicate threads.

I'll look at PEAR2 for any implementation details related to autoload. In the meantime, you guys can duke it out at this thread about the performance implications of autoload.

Re: Autoload for libraries

Posted: Wed Jan 16, 2008 2:03 pm
by Oren
Maugrim_The_Reaper wrote:Now, using all three drivers requires one parent class, but that class gets three require_once calls (one per driver file). With autoloading, you only get one require/include, on the first reference to the parent class. That's two fewer *_once calls overall, which affects performance at some level.
That's right if you simply do *_once() in each one of your drivers, but you can easily eliminate this by using something like Swift's classLoader::load().

Re: Autoload for libraries

Posted: Sat Jan 19, 2008 12:02 pm
by Selkirk
I like where PEAR2 is going with autoload. I like that it gives a greater degree of freedom to organize your code without worrying about the performance of file loading operations. You can do what you want, then either create a bootstrap file or a rollup file to feed to an opcode cache.
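
A rollup file of the kind mentioned above could be produced by something like the following naive sketch. It is not PEAR2's tooling: build_rollup and the Rollup_* classes are hypothetical, and the function assumes every input file is pure PHP starting with an opening tag, contains only class definitions, and is listed in dependency order (parents first).

```php
<?php
// Naive rollup builder: concatenates pure-PHP class files into one file.
// Assumes $files is already in dependency order and contains no inline HTML.
function build_rollup(array $files, $target)
{
    $out = "<?php\n";
    foreach ($files as $file) {
        $code = file_get_contents($file);
        $code = preg_replace('/^\s*<\?php/', '', $code, 1); // strip the opening tag
        $code = preg_replace('/\?>\s*$/', '', $code, 1);    // strip a trailing close tag, if any
        $out .= "\n// ---- " . basename($file) . " ----\n" . trim($code) . "\n";
    }
    return file_put_contents($target, $out) !== false;
}

// Self-contained demo with two throwaway class files.
$dir = sys_get_temp_dir() . '/rollup_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/Base.php", "<?php\nclass Rollup_Base {}\n");
file_put_contents("$dir/Child.php", "<?php\nclass Rollup_Child extends Rollup_Base {}\n?>\n");

build_rollup(array("$dir/Base.php", "$dir/Child.php"), "$dir/rollup.php");

// One unconditional require replaces one file operation per class.
require "$dir/rollup.php";
```

Because the rollup is loaded with a single unconditional require, an autoloader registered alongside it is only consulted for classes that were left out of the rollup.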

I've started on a project that uses autoloading and I think I'm a convert. One problem, though, is that I've observed that a one-class-per-file policy seems to inhibit people from using small classes. I think this policy shapes OO designs in a way I don't like.