Page 2 of 2
Posted: Tue Aug 15, 2006 4:24 pm
by nickvd
Ambush Commander wrote: With a few more tricks, I've managed to slash it down to 12%. HTMLPurifier still is slow, but it's not as slow, and I think I'll now start implementing a few more features. (Unless, of course, Feyd says otherwise).
Just curious, but how slow is slow? Taking this thread's page as an example, how long does it take to get cleaned?
Posted: Tue Aug 15, 2006 4:33 pm
by Ambush Commander
Page one of this forum topic, weighing 120KB, takes 6 seconds to clean (note that the total turnaround is about 20 seconds, because the data has to be sent to the server and back).
Bottom line: for important stuff, you can't just drop it in; you'll also need to add a caching layer. That's the price you pay for power. We should get it fast enough to run on demand for low-traffic sites, because adding caching is a very BIG change that end users would have to make for this to be viable.
Note that DevNetwork HTML is not a good benchmark for the library: it's a lot larger than normal documents would be, and it also has loads and loads of tables (which are somewhat expensive to process).
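A minimal sketch of the kind of caching layer meant here, written in Python purely for illustration (HTMLPurifier itself is PHP): clean each distinct input once, store the result under a hash of the input, and reuse it on later requests. The class and function names are hypothetical, `toy_purify` is a stand-in rather than a real sanitizer, and a real cache would also have to key on the purifier's configuration and persist across requests.

```python
import hashlib
from typing import Callable, Dict


class PurifiedHtmlCache:
    """Memoizes an expensive HTML-cleaning function, keyed by a SHA-256 hash of its input."""

    def __init__(self, purify: Callable[[str], str]) -> None:
        self._purify = purify              # stand-in for the slow cleaner (hypothetical)
        self._store: Dict[str, str] = {}   # in-memory only; a site would persist this
        self.misses = 0                    # how many times the slow path actually ran

    def clean(self, html: str) -> str:
        key = hashlib.sha256(html.encode("utf-8")).hexdigest()
        cached = self._store.get(key)
        if cached is None:
            # Cache miss: pay the full cleaning cost once, then remember the result.
            self.misses += 1
            cached = self._purify(html)
            self._store[key] = cached
        return cached


def toy_purify(html: str) -> str:
    # Toy placeholder, NOT a real sanitizer: it only drops literal <script> tags.
    return html.replace("<script>", "").replace("</script>", "")


cache = PurifiedHtmlCache(toy_purify)
first = cache.clean("<b>hi</b><script>x</script>")
second = cache.clean("<b>hi</b><script>x</script>")  # served from the cache
print(first, cache.misses)
```

The second call never reaches `toy_purify`, which is the whole point: on a site where the same posts are viewed many times, the expensive cleaning cost is paid once per distinct document instead of once per page view.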
Posted: Wed Aug 16, 2006 12:09 am
by nickvd
6 seconds for 120KB... that's not slow at all. It's roughly 20 times the size of my average complete page, so unless you're serving thousands of hits per hour, it seems quite fast...
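That estimate can be made concrete with a quick back-of-envelope calculation, assuming cleaning time grows roughly linearly with document size (an assumption; table-heavy markup makes real costs uneven). A page 1/20th the size of the 120KB benchmark would take about 6 s × 6/120 = 0.3 s, so 1,000 on-demand hits per hour would cost about 300 CPU-seconds, roughly 8% of one core. The names below are illustrative only.

```python
# Back-of-envelope estimate, ASSUMING cleaning time scales linearly with document size.
BENCH_KB = 120            # page one of the thread
BENCH_SECONDS = 6.0       # reported cleaning time for that page
PAGE_KB = BENCH_KB / 20   # "roughly 20 times the size of my average complete page"

# Scale the benchmark time down to an average page: 6 s * 6/120 = 0.3 s.
seconds_per_page = BENCH_SECONDS * PAGE_KB / BENCH_KB


def cpu_share(hits_per_hour: int) -> float:
    """Fraction of one CPU core spent cleaning if every hit is cleaned on demand."""
    return hits_per_hour * seconds_per_page / 3600


for hits in (100, 1000, 5000):
    print(f"{hits} hits/hour -> {cpu_share(hits):.0%} of one core")
```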
Posted: Wed Aug 16, 2006 7:54 am
by Ambush Commander
Well, the trouble starts when you serve moderately large documents. I had to write the SLOW docs to give people ideas on how to speed things up.
Posted: Wed Aug 16, 2006 2:00 pm
by bg
Ambush Commander wrote: Page one of this forum topic, weighing 120KB, takes 6 seconds (note that the actual time spent for the server is about 20 seconds because the data has to be sent there and sent back).
120kb I assume includes images? Correct me if I'm wrong. PHP just isn't made for this kind of data grinding. Something like this is much better suited as a PHP extension that can be written in C.
Posted: Wed Aug 16, 2006 3:35 pm
by Ambush Commander
120kb I assume includes images? Correct me if I'm wrong.
118KB if I save the page source to a file and check the file size, so that's the HTML alone. Still a lot.
PHP just isn't made for this kind of data grinding.
True, however...
Something like this is much better suited as a PHP extension that can be written in C.
Well, first of all, I don't know how to write C.

Second of all, if anyone wants to port this to C and make it a standard PHP extension, be my guest. A pure PHP solution will still achieve maximum portability, especially for people on shared hosting environments.
Posted: Wed Aug 16, 2006 3:41 pm
by feyd
...which is why I made SHA256 purely in PHP too.
Sorry.

Posted: Wed Aug 16, 2006 3:45 pm
by Ambush Commander
No, that's on topic. The price you pay for abstraction and portability is performance.
Posted: Wed Aug 16, 2006 6:54 pm
by bg
Can you post the xdebug profile dump? I'd be interested in seeing it.
Posted: Wed Aug 16, 2006 7:42 pm
by Ambush Commander
Hmm... I'd have to reprofile the older versions of the code for pre-optimization dumps, but I can give you one taken after the optimization.
http://www.thewritingpot.com/media/cach ... 5675463.gz