I need all of the data.
It is a static list that is fairly uniform (a few entries aren't, but I don't mind a few errors).
As I mentioned before, here is one of the pages I'm working with:
http://home.att.net/~jbaugher/1938.html
This is aircraft serial number data, so I want to extract the information for each serial. As an example, let's extract the data for serial number 38-214.
Serial numbers and ranges are listed on the left side. Serial 38-214 is not listed on its own; instead it's part of the range 38-211/223 (that is, serial numbers 38-211 through 38-223). I've written a regular expression that finds all the serials and ranges, and a for loop that expands each range into individual serials.
Once I have serial 38-214, I want its manufacturer and designation. Here the manufacturer for every aircraft in the range is Boeing and the designation is B-17B (note that Fortress is just a nickname, not part of the designation; I don't want that).
Next I want any range-wide data, which is the first line below the range's manufacturer and designation. In this case the range-wide data is c/n 2004/2016. Not every range or serial has this.
Finally I want the data specific to 38-214. Again, 38-214 itself isn't listed, but 214 is, so I want the row within that range that begins with 214. That data is: 214 crashed in Santa Catalina Mts near Davis Monthan AAF after inflight engine fire Apr 6, 1942. 2 bailed out, 6 killed.
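A minimal sketch of the range expansion and the manufacturer/designation split described above could look like this. It's only an illustration: `expand_serials()` and `split_designation()` are hypothetical names, the code assumes serials always look like `YY-NNN` or `YY-NNN/NNN`, and, like the original regex, it treats the last hyphenated word as the designation so a trailing nickname such as "Fortress" is dropped.

```php
<?php
// Hypothetical helper: expand "38-211/223" into ["38-211", ..., "38-223"];
// a single serial such as "38-214" comes back as ["38-214"].
function expand_serials(string $token): array {
    if (!preg_match('/^(\d{2})-(\d+)(?:\/(\d+))?$/', $token, $m)) {
        return [];  // not a serial or a range
    }
    $first = (int) $m[2];
    $last = isset($m[3]) ? (int) $m[3] : $first;
    $width = strlen($m[2]);  // keep any zero padding, e.g. "001"
    $serials = [];
    for ($n = $first; $n <= $last; $n++) {
        $serials[] = $m[1] . '-' . str_pad((string) $n, $width, '0', STR_PAD_LEFT);
    }
    return $serials;
}

// Hypothetical helper: split "Boeing B-17B Fortress" into the manufacturer and
// the designation; the designation is assumed to be the last hyphenated word.
function split_designation(string $text): array {
    if (preg_match('/^(.+)\s(\w+-\w+)/', $text, $m)) {
        return [$m[1], $m[2]];
    }
    return [$text, ''];  // no hyphenated designation found
}
```

For example, `expand_serials("38-211/223")` yields 13 serials, the fourth of which is `38-214`, and `split_designation("North American BT-9C")` gives `North American` and `BT-9C`.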
I would like to retrieve all of that data, but I'm not very good at this. So far I've written some crap that actually works (however inefficiently), except when a serial has more than one line of data or when the range-wide data runs over more than one line; in those cases it only grabs the first line. Also, if a serial has no data, it still takes the next line, which is wrong because that line is actually another serial or range. Please note that I'm using an altered form of the data that I created by hand; I don't know if it's actually easier, but it's easier for me. This format is easy to produce by copying and pasting between a few things. Here is an example of the data I'm using:
38-211/223 Boeing B-17B Fortress<br>
c/n 2004/2016<br>
214 crashed in Santa Catalina Mts near Davis Monthan AAF after inflight engine<br>
fire Apr 6, 1942. 2 bailed out, 6 killed.<br>
215 attached to Cold Weather Testing Detachment at Ladd Field, Alaska 1941-42. <br>
Participated in bomb strikes against Japanese fleet during the Dutch Harbor<br>
operation and was involved in air battle above Umnak Pass June 4, 1942.<br>
Crashed Jul 18, 1942 while returning from weather recon to Kiska. All 6 crew KIA.<br>
217 crashed near Lovelock, NV while enroute to Wright Field Feb 6, 1942. All 8 killed.<br>
38-224/257 North American BT-9C<br>
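One way to handle entries that wrap onto several lines, and serials that have no data at all, is to walk the lines in order and classify each one: a new header, a new numbered row, or a continuation of whatever came before. The sketch below assumes the `<br>`-separated format shown above, that only header lines start with a `YY-NNN` serial, and that a number at the start of a line only counts as a row if it falls inside the current header's range. `parse_block()`, `lookup()` and the array layout they use are hypothetical, not an existing API.

```php
<?php
// Hypothetical parser: header lines start entries, numbered lines inside the
// header's range start rows, and any other line continues the current row
// (or the range-wide text if no row has started yet).
function parse_block(string $text): array {
    $entries = [];
    $cur = null;  // index of the current header in $entries
    $row = null;  // row number currently being extended
    foreach (explode('<br>', $text) as $raw) {
        $line = trim($raw);
        if ($line === '') {
            continue;
        }
        if (preg_match('/^(\d{2})-(\d+)(?:\/(\d+))?\s+(.*)$/', $line, $m)) {
            // Header, e.g. "38-211/223 Boeing B-17B Fortress".
            $entries[] = [
                'prefix' => $m[1],
                'first' => (int) $m[2],
                'last' => $m[3] !== '' ? (int) $m[3] : (int) $m[2],
                'designation' => $m[4],
                'range_data' => '',
                'rows' => [],
            ];
            $cur = count($entries) - 1;
            $row = null;
        } elseif ($cur === null) {
            continue;  // text before the first header is ignored
        } elseif (preg_match('/^(\d+)\s+(.*)$/', $line, $m)
            && (int) $m[1] >= $entries[$cur]['first']
            && (int) $m[1] <= $entries[$cur]['last']) {
            // Row for one serial, e.g. "214 crashed in ...".
            $row = (int) $m[1];
            $entries[$cur]['rows'][$row] = $m[2];
        } elseif ($row !== null) {
            // Wrapped continuation of the current row.
            $entries[$cur]['rows'][$row] .= ' ' . $line;
        } else {
            // Range-wide text such as "c/n 2004/2016", possibly wrapped.
            $entries[$cur]['range_data'] = ltrim($entries[$cur]['range_data'] . ' ' . $line);
        }
    }
    return $entries;
}

// Look up one serial, e.g. "38-214": [header text, range-wide text, row text],
// or null if no header covers it.
function lookup(array $entries, string $serial): ?array {
    [$prefix, $num] = explode('-', $serial, 2);
    foreach ($entries as $e) {
        if ($e['prefix'] === $prefix && (int) $num >= $e['first'] && (int) $num <= $e['last']) {
            return [$e['designation'], $e['range_data'], $e['rows'][(int) $num] ?? ''];
        }
    }
    return null;
}
```

On the sample above, `lookup(parse_block($contents), "38-214")` should give `Boeing B-17B Fortress`, `c/n 2004/2016`, and both lines of the 214 crash note joined into one string, while a serial with no row of its own (say 38-212) gets an empty string instead of the next entry's text.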
Here's the code I'm using:
// Assumes $contents holds the page in the format shown above and $pdo is an
// open PDO connection (mysql_db_query() no longer exists in current PHP).
$insert = $pdo->prepare("INSERT INTO aviation_aircraft_serials2 VALUES (?, '1', ?, '', ?, ?)");

// Trim a details string, capitalise its first letter and end it with a period.
function tidy($details) {
    $details = ucfirst(trim($details));
    if ($details !== "" && substr($details, -1) != ".") $details .= ".";
    return $details;
}

// Find every header line: a serial ("38-214") or a range ("38-211/223").
preg_match_all("/^[0-9]{2}-[0-9]{1,7}(\/[0-9]{1,7})?\s.+?<br>/ms", $contents, $matches);

foreach ($matches[0] as $line) {
    // DESIGNATION: everything after the serial, e.g. "Boeing B-17B Fortress".
    $words = explode(" ", $line);
    $dsgnt = trim(str_replace("<br>", "", implode(" ", array_slice($words, 1))));
    preg_match("/^(.+)\s((\w+)-(\w+))(\s(\w+))?/", $dsgnt, $d_matches);
    $man = $d_matches[1] ?? "";
    $des = $d_matches[2] ?? "";
    $match = str_replace("<br>", "", $words[0]);

    // Text after the header; $lines[1] is the line directly below it.
    $after = explode($match, $contents)[1];
    $lines = explode("<br>", $after);
    $next = trim($lines[1] ?? "");

    $ex = explode("/", $match);
    if (count($ex) > 1) {
        // RANGE: the line below is range-wide data unless it starts with a number.
        list($pre, $first) = explode("-", $ex[0]);
        $first_word = explode(" ", $next)[0];
        $details_pre = ($next === "" || is_numeric($first_word)) ? "" : tidy($next) . "<br />";
        for ($y = $first; $y <= $ex[1]; $y++) {
            // The row for this serial starts on a new line with "$y ", e.g. "214 ".
            $row = explode("<br>\n$y ", $after);
            $details = isset($row[1]) ? tidy(explode("<br>", $row[1])[0]) : "";
            $serial = "$pre-$y";
            print "$serial --- match: $match - y: $y<br />\n";
            $insert->execute([$serial, $man, $des, $details_pre . $details]);
        }
    } else {
        // SINGLE SERIAL: its details are the line below the header.
        print "$match - $dsgnt<br />\n";
        $insert->execute([$match, $man, $des, tidy($next)]);
    }
}
If anyone can help me with this, I'd really appreciate it! I'm sure regular expressions could do this much more efficiently, but I don't really know how.
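For the regular-expression route, one possible approach is to let a single `preg_match_all` cut the page into header blocks, then pull the numbered rows out of each block with a second pattern whose lookahead stops at the next row, so wrapped lines stay attached to their row. This is a sketch assuming the same `<br>`-separated layout as the sample and Unix (`\n`) line endings; `rows_by_serial()` is a hypothetical name, and only serials that have their own numbered row appear in its result.

```php
<?php
// Hypothetical regex-only extraction: serial => header, range-wide text, row text.
function rows_by_serial(string $contents): array {
    $text = str_replace('<br>', '', $contents);  // keep the newlines, drop the tags
    // One block = a header line plus everything up to the next header or the end.
    preg_match_all('/^(\d{2})-\d+(?:\/\d+)? ([^\n]*)(.*?)(?=^\d{2}-\d|\z)/ms', $text, $blocks, PREG_SET_ORDER);
    $result = [];
    foreach ($blocks as $b) {
        [, $prefix, $header, $body] = $b;
        // Range-wide text is whatever precedes the first numbered row.
        $range_data = trim(preg_replace('/\s+/', ' ', preg_split('/^\d+ /m', $body, 2)[0]));
        // A row runs from "NNN " up to the next "NNN " line or the end of the block.
        preg_match_all('/^(\d+) (.*?)(?=^\d+ |\z)/ms', $body, $rows, PREG_SET_ORDER);
        foreach ($rows as $r) {
            $result[$prefix . '-' . $r[1]] = [
                'header' => trim($header),
                'range_data' => $range_data,
                'details' => trim(preg_replace('/\s+/', ' ', $r[2])),
            ];
        }
    }
    return $result;
}
```

One limitation: the row pattern does not check row numbers against the range, so a wrapped line that happens to begin with a number and a space would be read as a new row.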
Thanks for all the help!