
Retrieving the contents of large amount of files on FTP

Posted: Tue Apr 15, 2008 2:46 am
by m4rv5
I have to access this file on FTP and store its data in my table, with an additional field for file_contents.

For each filename listed, I also need to get the file's contents and store them in the file_contents field of my table.

Can someone give me expert advice on the fastest, most efficient and most reliable way to do this?

By the way, the file below can contain a large amount of data: anywhere from 300k to 1M lines.

I already have a script, but I'm getting random timeouts and 'Unable to get the files' errors.



Description:           Master Index of EDGAR Dissemination Feed
Last Data Received:    March 25, 2008
Comments:              webmaster@sec.gov
Anonymous FTP:         ftp://ftp.sec.gov/edgar/
 
 
 
 
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000045|NICHOLAS FINANCIAL INC|10-Q|2008-02-11|edgar/data/1000045/0001193125-08-025292.txt
1000045|NICHOLAS FINANCIAL INC|4|2008-02-05|edgar/data/1000045/0001000045-08-000001.txt
1000045|NICHOLAS FINANCIAL INC|4|2008-02-07|edgar/data/1000045/0001000045-08-000002.txt
1000045|NICHOLAS FINANCIAL INC|4|2008-03-18|edgar/data/1000045/0001000045-08-000003.txt
1000045|NICHOLAS FINANCIAL INC|8-K|2008-02-05|edgar/data/1000045/0001193125-08-020320.txt
1000045|NICHOLAS FINANCIAL INC|SC 13G/A|2008-02-08|edgar/data/1000045/0000950135-08-000646.txt
1000045|NICHOLAS FINANCIAL INC|SC 13G/A|2008-02-14|edgar/data/1000045/0000315066-08-001913.txt
1000045|NICHOLAS FINANCIAL INC|SC 13G/A|2008-02-14|edgar/data/1000045/0001362310-08-000904.txt
1000045|NICHOLAS FINANCIAL INC|SC 13G|2008-02-11|edgar/data/1000045/0001362310-08-000627.txt
1000069|EMPIRIC FUNDS, INC|40-17G/A|2008-02-13|edgar/data/1000069/0000894189-08-000447.txt
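The index is pipe-delimited, and its data rows start after the dashed separator line. Here is a minimal Python sketch of the parse-and-store step. The thread's own code is PHP, SQLite stands in for the poster's table, and all table and column names except file_contents are invented for illustration. The sketch reads the index one line at a time, so a 1M-line file never has to sit in memory. It also leaves file_contents NULL until a file has actually been downloaded:

```python
import sqlite3
from typing import Iterable, Iterator, List, NamedTuple


class IndexEntry(NamedTuple):
    cik: str
    company: str
    form_type: str
    date_filed: str
    filename: str


def parse_master_index(lines: Iterable[str]) -> Iterator[IndexEntry]:
    """Yield one entry per data row, reading lazily so huge indexes stay cheap."""
    in_data = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not in_data:
            # Everything up to and including the dashed separator is header text.
            in_data = line.startswith("-----")
            continue
        parts = line.split("|")
        if len(parts) != 5:
            continue  # blank or malformed row: skip it instead of aborting the run
        yield IndexEntry(*parts)


def store_entries(conn: sqlite3.Connection, entries: Iterable[IndexEntry]) -> None:
    """Insert index rows; file_contents stays NULL until the file is fetched."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filings ("
        " cik TEXT, company TEXT, form_type TEXT, date_filed TEXT,"
        " filename TEXT PRIMARY KEY, file_contents TEXT)"
    )
    # INSERT OR IGNORE makes re-running the import harmless.
    conn.executemany(
        "INSERT OR IGNORE INTO filings"
        " (cik, company, form_type, date_filed, filename) VALUES (?, ?, ?, ?, ?)",
        entries,
    )
    conn.commit()


def save_contents(conn: sqlite3.Connection, filename: str, text: str) -> None:
    """Store a downloaded file's text against its index row."""
    conn.execute("UPDATE filings SET file_contents = ? WHERE filename = ?",
                 (text, filename))
    conn.commit()


def pending_filenames(conn: sqlite3.Connection) -> List[str]:
    """Files whose contents have not been stored yet (e.g. after a crash)."""
    rows = conn.execute(
        "SELECT filename FROM filings WHERE file_contents IS NULL ORDER BY filename"
    )
    return [row[0] for row in rows]
```

Because rows that have not been downloaded keep file_contents NULL, a run that dies after a few hours can continue from pending_filenames(conn) instead of starting over.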
 

Re: Retrieving the contents of large amount of files on FTP

Posted: Tue Apr 15, 2008 3:55 am
by Chris Corbyn
Have you got a call to set_time_limit() anywhere in your code? By default, PHP aborts a script after 30 seconds (max_execution_time).

Re: Retrieving the contents of large amount of files on FTP

Posted: Tue Apr 15, 2008 4:20 am
by m4rv5
No, but I have this declared in my .htaccess:

php_value max_execution_time 86400

Re: Retrieving the contents of large amount of files on FTP

Posted: Tue Apr 15, 2008 4:22 am
by Chris Corbyn
We'll have to see the error ;) Can you paste it here?

Re: Retrieving the contents of large amount of files on FTP

Posted: Tue Apr 15, 2008 4:26 am
by m4rv5
lol, I'll try. Like I said, it's random: it can happen 100 minutes into execution, or in the 3rd hour, heheh :D Sometimes it just hangs.

Re: Retrieving the contents of large amount of files on FTP

Posted: Tue Apr 15, 2008 4:30 am
by m4rv5
Ah, I already posted about this last week and no one answered. Anyway, my code is here:

viewtopic.php?f=1&t=81271

The error is in there too.
The problem is that the script randomly keeps getting FTP timeout errors (No transfer timeout (300 seconds): closing control connection) and fails to get a file even though the file is present.

Re: Retrieving the contents of large amount of files on FTP

Posted: Tue Apr 15, 2008 4:56 am
by Chris Corbyn
Ok, that's the FTP server configuration imposing a timeout. As the message says, if the connection goes 300 seconds (5 minutes) without a transfer, the FTP server kills it (to save resources). You'll need to speak to the FTP server admin.
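The error text, "No transfer timeout (300 seconds)", suggests that the server closes the control connection once no data transfer has happened for 300 seconds. ProFTPD's TimeoutNoTransfer directive produces this message. If that reading is right, one workaround needs no help from the admin: reopen the session before it goes stale, for example after the script has spent minutes writing a large file into the database. Below is a Python sketch. It assumes a caller-supplied connect() function and uses a hypothetical 240-second margin. The injectable clock exists only to make the sketch testable:

```python
import ftplib
import time
from typing import Callable, List, Optional


class IdleAwareFTP:
    """Reopen the FTP session before a download if it has been idle too long.

    Assumption: the server drops the control connection after about 300 s
    without a data transfer, so max_idle is set a little below that.
    """

    def __init__(self, connect: Callable[[], ftplib.FTP], max_idle: float = 240.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect  # caller-supplied: returns a logged-in FTP object
        self._max_idle = max_idle
        self._clock = clock
        self._ftp: Optional[ftplib.FTP] = None
        self._last_transfer = 0.0

    def _session(self) -> ftplib.FTP:
        idle = self._clock() - self._last_transfer
        if self._ftp is None or idle > self._max_idle:
            self.close()
            self._ftp = self._connect()
            # Assume the server's no-transfer timer starts counting at login.
            self._last_transfer = self._clock()
        return self._ftp

    def fetch(self, path: str) -> bytes:
        chunks: List[bytes] = []
        self._session().retrbinary(f"RETR {path}", chunks.append)
        self._last_transfer = self._clock()
        return b"".join(chunks)

    def close(self) -> None:
        if self._ftp is not None:
            try:
                self._ftp.quit()
            except ftplib.all_errors:
                self._ftp.close()  # the server may already have hung up
            self._ftp = None
```

A connect() could, for instance, create ftplib.FTP("ftp.sec.gov", timeout=60) and call login() for anonymous access (the host comes from the index header). Each fetch(path) then reuses the session while it is fresh and reconnects transparently when it is not.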

Re: Retrieving the contents of large amount of files on FTP

Posted: Tue Apr 15, 2008 5:15 am
by m4rv5
Ah, I see, just what I expected. But I don't believe the administrators will be willing to change it. I need a way to handle this in case it occurs... any ideas? Like reconnecting, if possible?
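Reconnecting is possible without any change on the server. The approach is to treat a dropped session as a transient error, open a new session, and retry the same file. Below is a Python sketch; the class name, attempt count, and backoff values are assumptions. It keeps one session across many downloads and reconnects only after a failure. It retries 4xx replies such as 421, EOFError, and socket errors. 5xx replies such as 550 propagate immediately:

```python
import ftplib
import time
from typing import Callable, List, Optional, Tuple, Type

# Failures that usually mean "the session died", not "the file is missing".
# ftplib raises error_temp for 4xx replies such as "421 ... closing control connection";
# EOFError and OSError (timeouts, resets) come from a dropped socket.
TRANSIENT: Tuple[Type[BaseException], ...] = (ftplib.error_temp, EOFError, OSError)


class RetryingFetcher:
    def __init__(self, connect: Callable[[], ftplib.FTP], attempts: int = 3,
                 backoff: float = 5.0, sleep: Callable[[float], None] = time.sleep,
                 transient: Tuple[Type[BaseException], ...] = TRANSIENT) -> None:
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        self._connect = connect
        self._attempts = attempts
        self._backoff = backoff
        self._sleep = sleep
        self._transient = transient
        self._ftp: Optional[ftplib.FTP] = None

    def _drop(self) -> None:
        """Discard a broken session without sending QUIT over a dead socket."""
        if self._ftp is not None:
            try:
                self._ftp.close()
            except ftplib.all_errors:
                pass
            self._ftp = None

    def fetch(self, path: str) -> bytes:
        for attempt in range(1, self._attempts + 1):
            try:
                if self._ftp is None:
                    self._ftp = self._connect()  # (re)connect only when needed
                chunks: List[bytes] = []
                self._ftp.retrbinary(f"RETR {path}", chunks.append)
                return b"".join(chunks)
            except self._transient:
                self._drop()
                if attempt == self._attempts:
                    raise
                self._sleep(self._backoff * attempt)  # linear backoff between tries
        raise AssertionError("unreachable")
```

If the server really does answer 550 for files that exist, passing transient=TRANSIENT + (ftplib.error_perm,) would retry those as well, at the cost of also retrying files that are genuinely missing.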

Re: Retrieving the contents of large amount of files on FTP

Posted: Tue Apr 15, 2008 5:31 am
by Chris Corbyn
If you want to take advantage of stop-start (resumable) transfers, you'll need to understand the FTP protocol and write an implementation using fsockopen() ;) I imagine it would actually be pretty simple: although I don't really know the FTP protocol, I believe it's not a complex one.
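Stop-start transfers rely on FTP's REST command, which asks the server to begin the next RETR at a given byte offset. A custom fsockopen() client may not be needed for this. Python's ftplib exposes REST through the rest argument of retrbinary(), and PHP's ftp_get() also accepts a resume offset, though it is worth checking that the server supports REST. Below is a Python sketch that appends to a partially downloaded local file and, after each dropped connection, resumes from the file's current size. The function name, retry count, and delays are assumptions:

```python
import ftplib
import os
import time
from typing import Callable


def resume_download(connect: Callable[[], ftplib.FTP], remote_path: str,
                    local_path: str, attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Download remote_path into local_path, resuming after dropped connections.

    Returns the final size of local_path in bytes.
    """
    for attempt in range(1, attempts + 1):
        # Bytes saved by an earlier, interrupted attempt are kept and skipped.
        offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
        ftp = None
        try:
            ftp = connect()
            with open(local_path, "ab") as out:
                # rest=offset makes ftplib send "REST <offset>" before RETR, so the
                # server starts the binary transfer at that byte instead of byte 0.
                ftp.retrbinary(f"RETR {remote_path}", out.write, rest=offset or None)
            return os.path.getsize(local_path)
        except (ftplib.error_temp, EOFError, OSError):
            if attempt == attempts:
                raise
            sleep(2.0 * attempt)  # brief pause before reconnecting
        finally:
            if ftp is not None:
                try:
                    ftp.quit()
                except ftplib.all_errors:
                    ftp.close()  # connection already dead; just drop the socket
    raise ValueError("attempts must be at least 1")
```

The function is meant for files that are not yet complete. Some servers may return an error, rather than an empty transfer, when asked to resume at an offset equal to the full remote size.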

Re: Retrieving the contents of large amount of files on FTP

Posted: Tue Apr 15, 2008 5:54 am
by m4rv5
Thank you for your time, I'm going to research it more. For now I've set up the script using plain cURL functions; I'll run it and see whether I still get timeouts.

Thanks again.