Very large database export to csv, excel without php timeout



hippypink
Forum Newbie
Posts: 5
Joined: Mon Sep 24, 2007 12:23 pm

Very large database export to csv, excel without php timeout

Post by hippypink »

When exporting very large datasets, let's say to an Excel or CSV file, the script times out; the same happens when importing large files into the database.

Increasing PHP's max_execution_time, or calling set_time_limit(), is an option; however, this does not feel like best practice, for a few reasons.

1. If there is a bad script somewhere within my app, then in an environment with potentially thousands of users, this would wreak havoc. It seems there is a limit there for a reason in the first place.

2. I may be exporting up to 250,000 items, or perhaps importing 100,000 items, where each item represents up to 20 INSERT statements.

3. Database memory limits (maybe mysql_unbuffered_query can help here?)

I have seen systems, such as analytics software, built in a way where you start the report generation, and then can return later when it's finished. I do not know if this is smart, nor how it is done.

One idea I had was to somehow run the script over and over, processing some fixed number of records I define (e.g. 5,000) on each pass. Sort of a loop, or batch/queue. I would of course need to keep track of where it started and stopped between each cycle. Command-line tools or C++ are undesirable here as well, since I want to keep the app as portable and as purely PHP-based as possible.
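The batch idea above can be sketched as a pure PHP helper: process one fixed-size chunk per invocation and carry the offset forward between cycles. The record source is simulated with an array here; in a real app the offset would live in a session, a file, or a progress table (all names in this sketch are hypothetical).

```php
<?php
// next_batch() returns the slice for this cycle, the offset for the
// next cycle, and whether the whole set has been consumed.
function next_batch(array $records, int $offset, int $batchSize): array {
    $chunk = array_slice($records, $offset, $batchSize);
    $nextOffset = $offset + count($chunk);
    $done = $nextOffset >= count($records);
    return [$chunk, $nextOffset, $done];
}

// Simulate the cycles: 12 records with a batch size of 5 takes three
// passes (5 + 5 + 2). Between real cycles the script would exit and be
// re-invoked, so no single run comes near the execution time limit.
$records = range(1, 12);
$offset = 0;
$cycles = 0;
do {
    [$chunk, $offset, $done] = next_batch($records, $offset, 5);
    $cycles++;
} while (!$done);
```

Each cycle stays well under the time limit regardless of the total dataset size, which addresses point 1 without raising any limits.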

However, there is probably already an established design pattern, or best practice, for this sort of situation. Any help here?
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

set_time_limit() only applies to the script it is called from, so that is your best bet. MySQL's memory use can be reduced by fetching what you need in chunks. array_chunk() combined with WHERE `something` IN ($chunked_data) comes to mind. Also use mysql_free_result() to free result sets from MySQL memory if you're running multiple queries.
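A minimal sketch of the chunked-IN approach described above: split the ids with array_chunk() and build one WHERE ... IN (...) query per chunk, so each query returns a small result set. The table and column names are made up for illustration.

```php
<?php
// Build one SELECT per chunk of ids. Assumes integer ids; anything
// else would need quoting/escaping before being interpolated.
function in_clause_queries(array $ids, int $chunkSize): array {
    $queries = [];
    foreach (array_chunk($ids, $chunkSize) as $chunk) {
        $queries[] = 'SELECT * FROM `items` WHERE `id` IN ('
                   . implode(',', $chunk) . ')';
    }
    return $queries;
}

// Seven ids in chunks of three yields three queries:
// ids 1-3, ids 4-6, and id 7 on its own.
$queries = in_clause_queries(range(1, 7), 3);
```

In a real script you would run each query in turn and call mysql_free_result() on each result set before moving to the next chunk.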
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

MySQL has LOAD DATA INFILE. It may be of interest.
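For illustration, here is a sketch of building a LOAD DATA INFILE statement from PHP. The file path, table name, and field/line terminators are placeholders; whether LOCAL is allowed depends on your server configuration.

```php
<?php
// Assemble a LOAD DATA statement for a comma-separated file with a
// header row. addslashes() is a crude escape for the path; it is
// adequate for a sketch like this.
function load_data_sql(string $csvPath, string $table): string {
    return "LOAD DATA LOCAL INFILE '" . addslashes($csvPath) . "'"
         . " INTO TABLE `$table`"
         . " FIELDS TERMINATED BY ',' ENCLOSED BY '\"'"
         . " LINES TERMINATED BY '\\n'"
         . " IGNORE 1 LINES";
}

$sql = load_data_sql('/tmp/import.csv', 'items');
// $sql would then be handed to mysql_query(); the actual load runs
// entirely inside the MySQL server, which is why it is fast.
```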
jimthunderbird
Forum Contributor
Posts: 147
Joined: Tue Jul 04, 2006 3:59 am
Location: San Francisco, CA

Post by jimthunderbird »

Hi hippypink,
I have never taken on exactly the task you mention, but I once had to export around 1000 LONGBLOB records (PDF files stored in a MySQL database) and save them into a folder.

At first I did a straight query and export, but it took too long and I kept getting timeout problems.

Then I came up with this solution:

1. I created two PHP files, let's say A and B: A reads the records in chunks, more like paging, while B does the actual export on the chunk just read.
2. A then calls B every second using AJAX and JavaScript's setTimeout function.

It worked like magic; I successfully exported all the PDFs. You might call it a sort of AJAX/PHP multi-threading solution.
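The "file B" side of this approach boils down to turning the page number passed by each AJAX call into a LIMIT clause for one chunk. A minimal sketch, with a hypothetical table and column names:

```php
<?php
// File B would receive something like ?page=N from file A's AJAX call
// and export only that chunk. Each request is short, so no single
// request hits the execution time limit.
function chunk_query(int $page, int $chunkSize): string {
    $offset = $page * $chunkSize;
    return "SELECT `id`, `pdf_blob` FROM `documents` LIMIT $offset, $chunkSize";
}

// Page 0 covers rows 0-99, page 1 covers rows 100-199, and so on.
$sql = chunk_query(2, 100);
```

On the A side, the JavaScript would request B with page N, and on success schedule the request for page N+1 via setTimeout until B reports there are no rows left.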

Hope this gives you some inspiration.

With my best,
Jim
hippypink
Forum Newbie
Posts: 5
Joined: Mon Sep 24, 2007 12:23 pm

Post by hippypink »

I realized that the answer may depend on the specific task involved.

Exporting:

jimthunderbird has a pretty cool-sounding solution for exporting. My case requires the output to be one big file (instead of many); perhaps there is a way to write a single file and safely/quickly append the chunks to it.
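Appending chunks to one big file is straightforward with fopen() in append mode: each batch's rows land at the end of the same file, and flock() guards against two cycles writing at once. A self-contained sketch using a temp file:

```php
<?php
// Append one batch of rows to a shared CSV file. Mode 'a' creates the
// file on first use and always writes at the end; fputcsv() handles
// quoting for us.
function append_chunk(string $path, array $rows): void {
    $fh = fopen($path, 'a');
    flock($fh, LOCK_EX);              // guard against concurrent writers
    foreach ($rows as $row) {
        fputcsv($fh, $row, ',', '"', '\\');
    }
    flock($fh, LOCK_UN);
    fclose($fh);
}

// Two separate "cycles" append to the same file; the second batch
// extends the file rather than overwriting it.
$path = tempnam(sys_get_temp_dir(), 'export');
append_chunk($path, [['id', 'name'], [1, 'first']]);
append_chunk($path, [[2, 'second']]);
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
```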


Importing:

I am dealing with an actual scenario which is probably quite common. As I run through the CSV file, I am importing records into multiple tables. That changes how I might break the work into blocks, and using "LOAD DATA INFILE" raises the same issues: will it actually be faster, given that PHP has already added the overhead of writing out multiple intermediate files? (Benchmarks, anyone?)
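Breaking the CSV import into blocks can be sketched with fgetcsv() plus a saved line offset, so that each run handles only a fixed number of rows; each row could then drive the INSERTs into the several tables. The column layout and offset bookkeeping here are illustrative only.

```php
<?php
// Read up to $blockSize CSV rows starting at line $startLine, and
// return the rows plus the offset where the next run should resume.
function read_block(string $path, int $startLine, int $blockSize): array {
    $fh = fopen($path, 'r');
    $line = 0;
    $rows = [];
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if ($line >= $startLine) {
            $rows[] = $row;
        }
        $line++;
        if (count($rows) === $blockSize) {
            break;
        }
    }
    fclose($fh);
    return [$rows, $startLine + count($rows)];
}

// Build a small CSV and read it back in blocks of 2.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "a,1\nb,2\nc,3\n");
[$block1, $next] = read_block($path, 0, 2);     // rows a and b
[$block2, $next] = read_block($path, $next, 2); // row c only
```

Each block's rows would be inserted (ideally inside one transaction per block) before the offset is persisted, so a resumed run never double-imports.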


I also found an interesting article from IBM which may be of value:

(Google cached page)
http://64.233.167.104/search?q=cache:2A ... php-batch/
Begby
Forum Regular
Posts: 575
Joined: Wed Dec 13, 2006 10:28 am

Post by Begby »

It might also be possible to schedule these big jobs. When a request is made, write the details of the request to a database. Then have a cron job that runs a Python script, or even another PHP script, which reads the pending jobs, runs them, and outputs the files. After each job is done, it sends an email notifying the person that their export is complete, with a link to download it.
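The scheduled-job flow above can be sketched end to end. To keep the sketch self-contained the database is swapped for a JSON file, and the email step is just a stub; in a real setup the queue would be a table and run_pending_jobs() would be the script cron invokes.

```php
<?php
// Web request side: record the job with a 'pending' status.
function enqueue_job(string $queueFile, array $params): void {
    $jobs = file_exists($queueFile)
        ? json_decode(file_get_contents($queueFile), true)
        : [];
    $jobs[] = ['params' => $params, 'status' => 'pending'];
    file_put_contents($queueFile, json_encode($jobs));
}

// Cron side: pick up pending jobs, run them, mark them done.
function run_pending_jobs(string $queueFile): int {
    $jobs = json_decode(file_get_contents($queueFile), true);
    $ran = 0;
    foreach ($jobs as &$job) {
        if ($job['status'] === 'pending') {
            // Real worker: generate the export file, then email a link.
            $job['status'] = 'done';
            $ran++;
        }
    }
    unset($job);
    file_put_contents($queueFile, json_encode($jobs));
    return $ran;
}

$queue = tempnam(sys_get_temp_dir(), 'jobs');
file_put_contents($queue, '[]');
enqueue_job($queue, ['report' => 'sales', 'format' => 'csv']);
enqueue_job($queue, ['report' => 'users', 'format' => 'xls']);
$ran = run_pending_jobs($queue);
```

Because the worker runs outside the web server, the web-facing PHP never holds a request open for the duration of the export, sidestepping the timeout entirely.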