Accessing files on FTP
Posted: Fri Apr 11, 2008 1:08 am
Hi guys,
I need help or suggestions on the best way to code this:
1. I need to access a file on an FTP server, parse it, and load its contents into my database. Let's call this index file master.idx.
2. master.idx contains a list of tens of thousands of filenames in the form path/to/filename.txt
Sample:
Code:
Description:
Last Data Received:
Comments:
Anonymous FTP:
CCC|CN|FT|Date Filed|Filename
--------------------------------------------------------------------------------
1000045|AAA |10-Q|2008-02-11|folder_ee/folder_dd/folder_zzz/0001193125-08-025292.txt
1000045|AAA |4|2008-02-05|folder_ee/folder_dd/folder_yyy/0001000045-08-000001.txt
1000045|AAA |4|2008-02-07|folder_ee/folder_dd/folder_yyy/0001000045-08-000002.txt
1000045|AAA |4|2008-03-18|folder_ee/folder_dd/folder_xxx/0001000045-08-000003.txt
3. I need to open each of the files on the list and store its contents in my database.
4. So my database table would look something like this:
Code:
ID  FILENAME                                     CONTENT
1   folder_ee/folder_dd/folder_zzz/filename.txt  content of the file
5. All these files are on an FTP server.
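To make the index format in step 2 concrete, here is a minimal sketch of splitting one master.idx line into its fields. The function name parse_index_line and the array keys are hypothetical (the keys just mirror the CCC|CN|FT|Date Filed|Filename header in the sample), and it assumes every data row has exactly five pipe-separated fields with a filename ending in .txt.

```php
<?php
// Hypothetical helper, not part of the existing script.
// Splits one pipe-delimited line of master.idx into named fields.
// Returns null for header, separator, and blank lines.
function parse_index_line(string $line): ?array
{
    $parts = array_map('trim', explode('|', $line));

    // Assumption: data rows have five fields and the last one ends in ".txt".
    // The column header row also has five fields, but its last field is "Filename".
    if (count($parts) !== 5 || substr($parts[4], -4) !== '.txt') {
        return null;
    }

    return [
        'ccc'        => $parts[0],
        'cn'         => $parts[1],
        'ft'         => $parts[2],
        'date_filed' => $parts[3],
        'filename'   => $parts[4],
    ];
}
```

With something like this, the check on the third column becomes $row !== null && $row['ft'] === '4', and $row['filename'] is the value that would go into the FILENAME column.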
I have an existing script that I need to modify because it causes problems:
1. The existing script fetches master.idx with cURL and reads it line by line:
Code:
include "includes";

$masterfilename = 'folder_ee/folder_ff/master.idx';
// Stream handle for master.idx on the FTP server
$buffer = get_file_filings_idx($masterfilename);
$i = 0;
while (!feof($buffer)) {
    $file = new file_filings();
    $record = fgets($buffer);
    $pos = strpos($record, 'folder_ee/folder_dd/');
    $data = explode('|', $record);
    // $data[4] (at position $pos) = folder_ee/folder_dd/folder_zzz/filename.txt
    // strpos() returns false when there is no match, so test with !== false
    if ($pos !== false && trim($data[2]) == '4') {
        $i++;
        $file->file_contents = ftp_get_file_contents(trim($data[4]));
        if ($file->Save()) {
            echo $i . ') record saved successfully - ' . date('h:i:s') . '<br>';
        } else {
            echo $i . ') records updated! - ' . date('h:i:s') . '<br>';
        }
    }
}
Code:
function get_file_filings_idx($masterfilename) {
    // Anonymous login, same credentials as in ftp_get_file_contents()
    $user = 'anonymous';
    $password = '';
    $buffer = '';
    $url = 'ftp://example.com/' . $masterfilename;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    //curl_setopt($ch, CURLOPT_PUT, 1);
    //curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0");
    // HTTP_USER_AGENT is not set when the script runs from the command line
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT'] ?? '');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
    curl_setopt($ch, CURLOPT_USERPWD, $user . ":" . $password);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    $buffer = curl_exec($ch);
    curl_close($ch);
    if (preg_match('/Error 404/', (string) $buffer))
        $buffer = "";
    // The cURL response is discarded here; the caller reads from this stream handle
    $buffer = fopen($url, "r");
    return $buffer;
}
It then opens path/to/filename.txt and grabs the contents with ftp_fget:
Code:
function ftp_get_file_contents($filename) {
    $user = 'anonymous';
    $password = '';
    $dump = 'php://output';
    // $filename looks like folder_ee/folder_dd/folder_zzz/filename.txt
    $patharray = explode('/', $filename);
    // A new connection is opened for every file (900-second timeout)
    $ftp = ftp_connect('ftp.example.com', 21, 900);
    ftp_login($ftp, $user, $password);
    ftp_pasv($ftp, true);
    // need to manually navigate through the folders...
    ftp_chdir($ftp, 'folder_ee');
    ftp_chdir($ftp, 'folder_dd/' . $patharray[2]);
    $file = $patharray[3];
    // Flush any active output buffer, then capture what ftp_fget() writes to php://output
    if (ob_get_level() > 0) {
        ob_end_flush();
    }
    ob_start();
    $out = fopen($dump, 'w');
    if (!ftp_fget($ftp, $out, $file, FTP_ASCII)) die('Unable to get file: ' . $filename);
    fclose($out);
    $data = ob_get_clean();
    ftp_close($ftp);
    return $data;
}
2. The problem is that it randomly gets FTP timeout errors ("No transfer timeout (300 seconds): closing control connection") and fails to get the file even though the file is present.
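One direction that might be worth trying, sketched here under assumptions rather than as a confirmed fix: keep a single FTP connection open across files, and reconnect and retry when a transfer fails, instead of connecting once per file and calling die() on the first error. The names fetch_with_retry, ftp_open_anonymous and ftp_fetch_to_string are hypothetical. The sketch also assumes the server accepts a path such as folder_ee/folder_dd/folder_zzz/filename.txt directly in ftp_fget; if it does not, the ftp_chdir calls from the existing function would still be needed.

```php
<?php
// Hypothetical retry wrapper: calls $fetch($conn) and, on failure,
// drops the connection, reconnects through $connect, and tries again.
// Returns the fetched data, or false after $maxAttempts failures.
function fetch_with_retry(callable $connect, callable $fetch, &$conn, int $maxAttempts = 3)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        if ($conn === null) {
            $conn = $connect();
            if ($conn === null) {
                continue; // could not connect, count it as a failed attempt
            }
        }
        $result = $fetch($conn);
        if ($result !== false) {
            return $result;
        }
        $conn = null; // assume the connection is dead; reconnect next time
    }
    return false;
}

// Opens an anonymous passive-mode connection, or returns null on failure.
// Host, port, and timeout are copied from the existing script.
function ftp_open_anonymous()
{
    $ftp = ftp_connect('ftp.example.com', 21, 900);
    if ($ftp === false || !@ftp_login($ftp, 'anonymous', '')) {
        return null;
    }
    ftp_pasv($ftp, true);
    return $ftp;
}

// Downloads one remote file into a string via a php://temp stream,
// which avoids the output-buffering trick. Returns false on failure.
function ftp_fetch_to_string($ftp, string $remotePath)
{
    $tmp = fopen('php://temp', 'r+');
    if (!@ftp_fget($ftp, $tmp, $remotePath, FTP_ASCII)) {
        fclose($tmp);
        return false;
    }
    rewind($tmp);
    $data = stream_get_contents($tmp);
    fclose($tmp);
    return $data;
}
```

In the main loop this would be used roughly as follows: set $conn = null once before the while loop, then for each file call fetch_with_retry('ftp_open_anonymous', function ($ftp) use ($path) { return ftp_fetch_to_string($ftp, $path); }, $conn) and log the filename when it returns false, so one failed file does not stop the whole run.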
Any input is appreciated.
Thanks.