Retrieving the contents of large amount of files on FTP

m4rv5
Forum Newbie
Posts: 13
Joined: Fri Apr 11, 2008 12:49 am

Retrieving the contents of large amount of files on FTP

Post by m4rv5 »

I need to read the index file below from an FTP server and store its rows in my table, which has an additional file_contents field.

For each filename listed, I also need to fetch that file's contents and store them in the file_contents field.

Can someone give me expert advice on the most efficient and reliable way to do this?

By the way, the index file can be large: anywhere from 300k to 1M lines.

I already have a script, but I'm getting random timeouts and an 'Unable to get the files' error.



Description:           Master Index of EDGAR Dissemination Feed
Last Data Received:    March 25, 2008
Comments:              webmaster@sec.gov
Anonymous FTP:         ftp://ftp.sec.gov/edgar/
 
 
 
 
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000045|NICHOLAS FINANCIAL INC|10-Q|2008-02-11|edgar/data/1000045/0001193125-08-025292.txt
1000045|NICHOLAS FINANCIAL INC|4|2008-02-05|edgar/data/1000045/0001000045-08-000001.txt
1000045|NICHOLAS FINANCIAL INC|4|2008-02-07|edgar/data/1000045/0001000045-08-000002.txt
1000045|NICHOLAS FINANCIAL INC|4|2008-03-18|edgar/data/1000045/0001000045-08-000003.txt
1000045|NICHOLAS FINANCIAL INC|8-K|2008-02-05|edgar/data/1000045/0001193125-08-020320.txt
1000045|NICHOLAS FINANCIAL INC|SC 13G/A|2008-02-08|edgar/data/1000045/0000950135-08-000646.txt
1000045|NICHOLAS FINANCIAL INC|SC 13G/A|2008-02-14|edgar/data/1000045/0000315066-08-001913.txt
1000045|NICHOLAS FINANCIAL INC|SC 13G/A|2008-02-14|edgar/data/1000045/0001362310-08-000904.txt
1000045|NICHOLAS FINANCIAL INC|SC 13G|2008-02-11|edgar/data/1000045/0001362310-08-000627.txt
1000069|EMPIRIC FUNDS, INC|40-17G/A|2008-02-13|edgar/data/1000069/0000894189-08-000447.txt
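
For an index of this size, one option is to stream it line by line rather than load it all into memory. The following is only a sketch under assumptions: the function name `readIndexRows` and the array keys are made up for illustration, and it assumes every data row has exactly five pipe-delimited fields with a numeric CIK, as in the sample above.

```php
<?php
/**
 * Yield one associative array per data row of a pipe-delimited master index.
 * Header lines and the dashed separator are skipped because they do not
 * split into exactly five fields with an all-digit CIK.
 */
function readIndexRows(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open index file: $path");
    }
    try {
        // fgets() reads a single line, so memory use stays flat for 1M lines.
        while (($line = fgets($handle)) !== false) {
            $fields = explode('|', rtrim($line, "\r\n"));
            if (count($fields) !== 5 || !ctype_digit($fields[0])) {
                continue; // description header, column header, or separator
            }
            yield [
                'cik'          => $fields[0],
                'company_name' => $fields[1],
                'form_type'    => $fields[2],
                'date_filed'   => $fields[3],
                'filename'     => $fields[4],
            ];
        }
    } finally {
        fclose($handle);
    }
}
```

Each yielded row can then be inserted into the table, with the `filename` value used later to fetch the file contents.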
 
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: Retrieving the contents of large amount of files on FTP

Post by Chris Corbyn »

Have you got a call to set_time_limit() anywhere in your code? The default is 30 seconds before PHP exits.
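
A minimal sketch of what that call might look like at the top of a long-running import script. It assumes the host has not disabled `set_time_limit()`. Note also that on non-Windows systems PHP does not count time spent waiting on streams or database queries against this limit, so it may not be the cause of a network stall.

```php
<?php
// Lift PHP's execution time limit for this long-running import.
// 0 means "no limit"; a positive value restarts the timer with that many seconds.
set_time_limit(0);

// Confirm the value PHP is actually using, whatever the configuration files say.
$effectiveLimit = (int) ini_get('max_execution_time');
echo "max_execution_time is now {$effectiveLimit} seconds (0 = unlimited)\n";
```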

Re: Retrieving the contents of large amount of files on FTP

Post by m4rv5 »

No, but I have this declared in my .htaccess:

php_value max_execution_time 86400

Re: Retrieving the contents of large amount of files on FTP

Post by Chris Corbyn »

We'll have to see the error ;) Can you paste it here?

Re: Retrieving the contents of large amount of files on FTP

Post by m4rv5 »

lol, I'll try. Like I said, it's random: it can occur at the 100th minute of execution or in the 3rd hour.. heheh :D Sometimes it just hangs.

Re: Retrieving the contents of large amount of files on FTP

Post by m4rv5 »

Ah, I already posted about this last week and no one answered. Anyway, for my code see here:

viewtopic.php?f=1&t=81271

The error is also described there:
The problem is that the script keeps randomly hitting FTP timeout errors (No transfer timeout (300 seconds): closing control connection) and failing to get a file when the file is in fact present.

Re: Retrieving the contents of large amount of files on FTP

Post by Chris Corbyn »

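OK, that message comes from the FTP server's configuration, not from PHP. The wording ("No transfer timeout") suggests the server closes the control connection once a session has gone 300 seconds without a data transfer, to save resources. If so, a long pause between downloads on the same connection (for example while your script is busy with the database) would be enough to trigger it. Changing the limit itself means speaking to the FTP server admin.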

Re: Retrieving the contents of large amount of files on FTP

Post by m4rv5 »

Ah, I see.. just what I expected. But I don't believe the administrators will be willing to adjust it. I need a way to handle this in case it occurs. Any ideas? Like reconnecting, if possible?
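
One way the reconnect idea could look, as a rough sketch rather than a tested solution. `fetchWithReconnect` and `openAnonymousFtp` are hypothetical names, and the host, login e-mail and timeout values are placeholders. It treats any failed fetch as a dead session, drops the connection, and opens a new one on the next attempt.

```php
<?php
/**
 * Call $fetch with a live connection, reconnecting and retrying when it fails.
 * $connection is passed by reference so the caller can reuse it for the next file.
 *
 * @param callable $connect returns a fresh connection, or false on failure
 * @param callable $fetch   receives the connection, returns true on success
 */
function fetchWithReconnect(callable $connect, callable $fetch, &$connection, int $maxAttempts = 3): bool
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        if (!$connection) {
            $connection = $connect();
        }
        if ($connection && $fetch($connection)) {
            return true;
        }
        // Treat any failure as a dead session: drop it so the next attempt reconnects.
        $connection = null;
    }
    return false;
}

/** Hypothetical connector for an anonymous FTP login; host and e-mail are placeholders. */
function openAnonymousFtp(string $host = 'ftp.sec.gov')
{
    $ftp = ftp_connect($host, 21, 30); // 30-second network timeout
    if ($ftp === false) {
        return false;
    }
    if (!@ftp_login($ftp, 'anonymous', 'user@example.com')) {
        ftp_close($ftp);
        return false;
    }
    ftp_pasv($ftp, true); // passive mode usually works better behind firewalls
    return $ftp;
}
```

A caller would keep one `$ftp = null;` variable outside the loop and, for each file, call something like `fetchWithReconnect('openAnonymousFtp', fn($c) => ftp_get($c, $local, $remote, FTP_BINARY), $ftp)`. Downloading first and doing the database inserts afterwards may also help keep the session from sitting idle.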

Re: Retrieving the contents of large amount of files on FTP

Post by Chris Corbyn »

If you want to take advantage of stop-start (resumable) transfers, you'll need to understand the FTP protocol and write your own implementation using fsockopen() ;) I'd imagine it would be fairly simple: I don't really know the FTP protocol, but I believe it's not a complex one.
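
Before writing a socket client by hand, it may be worth trying PHP's bundled ftp extension: `ftp_get()` accepts an optional byte offset as its last argument, which asks the server to restart the transfer from that position. The sketch below assumes the server supports restarting and that the partial local file is intact. `resumeOffset` and `resumeDownload` are hypothetical helper names.

```php
<?php
/** Number of bytes already saved locally, i.e. where a resumed download should restart. */
function resumeOffset(string $localPath): int
{
    clearstatcache(true, $localPath); // avoid a stale cached file size
    return is_file($localPath) ? (int) filesize($localPath) : 0;
}

/**
 * Download $remotePath to $localPath, continuing from any partial local copy.
 * $ftp is an already logged-in connection from ftp_connect()/ftp_login().
 */
function resumeDownload($ftp, string $remotePath, string $localPath): bool
{
    return ftp_get($ftp, $localPath, $remotePath, FTP_BINARY, resumeOffset($localPath));
}
```

If the offset is 0, this behaves like a normal download, so the same call can serve for both the first attempt and a retry after a dropped connection.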

Re: Retrieving the contents of large amount of files on FTP

Post by m4rv5 »

Thank you for your time. I'm going to research this more. For now I've set up a version that uses only cURL functions; I'll run the script and see whether I still get timeouts.
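
Roughly, the cURL version looks something like the sketch below. The helper names `edgarFtpUrl` and `curlFetch` and the timeout numbers are just illustrative guesses, not tuned values. It assumes the cURL extension is built with FTP support.

```php
<?php
/** Build the FTP URL for a Filename column value such as "edgar/data/1000045/....txt". */
function edgarFtpUrl(string $filename, string $host = 'ftp.sec.gov'): string
{
    return 'ftp://' . $host . '/' . ltrim($filename, '/');
}

/** Fetch one URL into a string, returning null on failure. */
function curlFetch(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER  => true, // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT  => 30,   // give up connecting after 30 seconds
        CURLOPT_LOW_SPEED_LIMIT => 1,    // abort if slower than 1 byte/second...
        CURLOPT_LOW_SPEED_TIME  => 60,   // ...for 60 seconds, i.e. a stalled transfer
    ]);
    $body = curl_exec($ch);
    $failed = ($body === false || curl_errno($ch) !== 0);
    return $failed ? null : $body;
}
```

A `null` result can then be retried or logged, instead of the whole script hanging on one stalled file.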

Thanks again.