Suggestion to improve site performance

yacahuma
Forum Regular
Posts: 870
Joined: Sun Jul 01, 2007 7:11 am


Post by yacahuma

Hello,

I have a site where people fill out their taxes, generate a PDF, and then download it. Last year I had everything on one machine and it collapsed. This year I moved the PDF printer to a different machine. Now I have three choices:
1. Send the file back to the originating server so people can download their PDF.
2. Let people pick up the PDF from the print server.
3. Send the files to another machine for pickup.

I am wondering how much of a performance hit I will take when people start downloading. Any suggestions?
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US


Post by Christopher

The best performance comes from letting the print server do only its own job, so that the website and the downloads are kept separate.

You might want to look into something like Gearman as a nice way to queue these jobs. Users could receive an email when the download is ready.
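The queueing idea is a producer/consumer setup: the website only enqueues a job, a fixed pool of workers does the CPU-heavy rendering, and the user is notified when the file is ready. This sketch is not Gearman's API; it uses Python's standard-library `queue` and `threading` as a stand-in to show the pattern, and `generate_pdf`, `notify_user` and the `/var/pdf/` path are hypothetical placeholders.

```python
import queue
import threading
from typing import Iterable, List, Tuple

# Hypothetical stand-in for the real PDF library running on the print server.
def generate_pdf(job_id: str, form_data: dict) -> str:
    """Pretend to render a PDF and return where it would be stored."""
    return f"/var/pdf/{job_id}.pdf"  # hypothetical storage location

# Hypothetical stand-in for "send the user an email when the download is ready".
def notify_user(email: str, path: str, outbox: list) -> None:
    outbox.append((email, path))

def worker(jobs: "queue.Queue", outbox: list) -> None:
    # Each worker handles one job at a time, so the number of workers,
    # not the number of visitors, caps how much rendering runs at once.
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: shut this worker down
            jobs.task_done()
            break
        job_id, email, form_data = job
        path = generate_pdf(job_id, form_data)
        notify_user(email, path, outbox)
        jobs.task_done()

def run(submissions: Iterable[Tuple[str, str, dict]], num_workers: int = 2) -> List[Tuple[str, str]]:
    jobs: "queue.Queue" = queue.Queue()
    outbox: list = []
    threads = [threading.Thread(target=worker, args=(jobs, outbox))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    # The web front end only enqueues the job and returns right away.
    for sub in submissions:
        jobs.put(sub)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for t in threads:
        t.join()
    return outbox
```

With Gearman the queue would live in the job server and the workers could run on the print server, but the shape is the same: a burst of visitors just makes the queue longer instead of overloading the machine.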
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME


Post by Eran

I agree with Chris about having the files downloaded from the print server. Another alternative would be to use an external service like Amazon S3. I have to wonder, though: why is it creating so much load? Is it possible that with a different package, or a C extension or application, you would have gotten much better results and wouldn't have needed two servers?
yacahuma
Forum Regular
Posts: 870
Joined: Sun Jul 01, 2007 7:11 am


Post by yacahuma

For the new implementation I switched to Gearman. Part of the problem was that creating the PDFs takes a LOT of CPU. It would be nice to have the PDF creation available as an extension, but the creator of the library (SetaPDF) just doesn't have the resources to write a C extension. SetaPDF is awesome; I have no complaints about the support or the quality of the library.

So we agree that the best choice is to download from the print server. I guess if I moved the files back, I would also be adding an FTP transfer step to the mix.
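One common way to let people pick up the PDF directly from the print server (option 2), without the print server needing the website's sessions or database, is an expiring signed link. The web server signs the job id and an expiry time with a secret shared by both machines, and the print server checks the signature before sending the file; Amazon S3 pre-signed URLs rest on the same idea. This is a minimal sketch under those assumptions, not something proposed in the thread; `SECRET`, `print.example.com` and the `/download` path are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"change-me"                     # hypothetical secret shared by both servers
PRINT_HOST = "https://print.example.com"  # hypothetical print-server address

def _sign(job_id: str, expires: int) -> str:
    msg = f"{job_id}:{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def make_download_url(job_id: str, ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Runs on the web server: build a link that points straight at the print server."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"job": job_id, "expires": expires, "sig": _sign(job_id, expires)})
    return f"{PRINT_HOST}/download?{query}"

def verify_download(url: str, now: Optional[float] = None) -> Optional[str]:
    """Runs on the print server: return the job id if the link is valid, else None."""
    params = parse_qs(urlparse(url).query)
    try:
        job_id = params["job"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return None
    current = time.time() if now is None else now
    if current > expires:
        return None  # link has expired
    if not hmac.compare_digest(sig, _sign(job_id, expires)):
        return None  # job id or expiry was tampered with
    return job_id
```

The web server never touches the PDF bytes, so download traffic lands only on the print server, and no FTP step is needed to move files back.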


I looked into Amazon a bit, but I am just not ready to add another level of complexity (meaning I have never used their services and would have to learn the ins and outs).

Thank you
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME


Post by Eran

If it's written in pure PHP and has performance problems, that might be reason enough to switch to something else. Did you have a look at PDFlib? http://pecl.php.net/package/pdflib
yacahuma
Forum Regular
Posts: 870
Joined: Sun Jul 01, 2007 7:11 am


Post by yacahuma

I wish it were that simple. The problem is that tax forms are complex and the library does a couple of things. The most basic one is searching and replacing values in a form; it also creates watermarks. I am not sure PDFlib can search and replace form values in a PDF. I might be wrong, but when I first looked into it, it could not do that. I really trust the library. The real problem is that the first time around I had nothing in place for the avalanche of people printing. With Gearman it will be different. I just was not sure which strategy to follow for the downloads, but the more I think about it, the more it makes sense to offload the download part to the print server.


Another thing: I am running the web server and the MySQL server on the same machine. I always thought that having the DB server on the same machine as the web server was the best choice, since there is no network routing involved. If the server fails next year, I guess I will have to rethink that and separate the DB server from the Apache server. Any comments on this?