List of all the colleges in the US?
Posted: Fri Aug 11, 2006 1:21 am
by matt1019
Hmm,
Hi guys! Looking at one of the posts in this section (a list of cities, states, and zip codes) got me thinking... is there a similar compilation of colleges and universities in the US?
Thanks much!
-Matt
Posted: Fri Aug 11, 2006 4:07 am
by Weirdan
Posted: Fri Aug 11, 2006 11:31 am
by matt1019
Holy cow!!
Just what I needed, except it isn't provided in a database format... and it's a huge list to cut and paste by hand.
I have an idea: a source code parser!
I just learned how to take an .html page in C++ and extract everything between specified tags.
Once I finish, I will post the results here in zip format.
In the meantime, if someone has a PHP-based solution, please share.
Basically, the parser will take the page's HTML source and extract everything between
TARGET="_blank"> and
</A></LI>, since the college names sit between those tags.
Here's an example:
Interesting little project... worth it, though.
-Matt
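For anyone wanting the PHP version of the "take everything between two markers" idea described above, here is a minimal sketch that uses plain string searching rather than a regex. The helper name extractBetween() and the sample markup are made up for illustration; it assumes each college name is followed by the literal closing marker </A></LI> with no whitespace in between, so real pages may need adjusting.

```php
<?php
// Hypothetical helper: collect every substring that sits between $start and $end.
function extractBetween(string $html, string $start, string $end): array
{
    $results = [];
    $offset = 0;
    while (($from = strpos($html, $start, $offset)) !== false) {
        $from += strlen($start);              // skip past the opening marker
        $to = strpos($html, $end, $from);
        if ($to === false) {
            break;                            // opening marker with no closing marker: stop
        }
        $results[] = substr($html, $from, $to - $from);
        $offset = $to + strlen($end);         // resume searching after the closing marker
    }
    return $results;
}

// Made-up sample markup shaped like the college list described above.
$sample = '<UL>'
    . '<LI><A HREF="http://example.edu/" TARGET="_blank">Example College</A></LI>'
    . '<LI><A HREF="http://sample.edu/" TARGET="_blank">Sample University</A></LI>'
    . '</UL>';

foreach (extractBetween($sample, 'TARGET="_blank">', '</A></LI>') as $name) {
    echo $name, "\n";
}
```

Plain strpos() searching avoids regex escaping pitfalls, at the cost of being stricter about exact whitespace and letter case than a case-insensitive regex.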
Posted: Fri Aug 11, 2006 11:55 am
by s.dot
Or how about this:
Code:
$file = file_get_contents('sourcefile.txt');
preg_match_all("#<LI>*+? TARGET=\"_blank\">(*+?)</A>#ism",$file,$matches);
foreach($matches[0] AS $match)
{
echo $match.'<br />';
}
Not tested, of course.
Posted: Fri Aug 11, 2006 12:19 pm
by matt1019
Hi Scottayy,
Nope, it doesn't work... it gives me the following errors:
Warning: preg_match_all() [function.preg-match-all]: Compilation failed: nothing to repeat at offset 6 in getcolleges.php on line 7
Warning: Invalid argument supplied for foreach() in getcolleges.php on line 9
I also tried the following... still no luck.
Code:
<?php
error_reporting(E_ALL);
$file = file_get_contents('sourcefile.html');
preg_match_all("#<LI>*+? TARGET=\"_blank\">,(*+?)</A>#ism",$file,$matches,PREG_PATTERN_ORDER);
foreach($matches[0] AS $match)
{
echo $match.'<br />';
}
?>
The file "sourcefile.html" does exist in the same dir as this script.
-Matt
Posted: Fri Aug 11, 2006 12:25 pm
by s.dot
Ah. My bad.
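Code:
<?php
$file = file_get_contents('sourcefile.txt');
// .+? lazily matches "any characters"; the earlier *+? had nothing for the ? to repeat.
preg_match_all("#<LI>.+? TARGET=\"_blank\">(.+?)</A>#ism", $file, $matches);
// $matches[1] holds only the captured college names ($matches[0] holds the full matches).
foreach ($matches[1] as $match)
{
    echo $match.'<br />';
}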
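To see why the corrected pattern works without needing a source file, here is a self-contained check against a small made-up snippet standing in for sourcefile.txt. It runs the same pattern as the working script, and also the first attempt's pattern, which fails to compile; in that case preg_match_all() returns false (the @ only hides the warning quoted earlier in the thread).

```php
<?php
// Made-up sample markup standing in for sourcefile.txt.
$file = '<UL>' . "\n"
    . '<LI><A HREF="http://example.edu/" TARGET="_blank">Example College</A></LI>' . "\n"
    . '<LI><A HREF="http://sample.edu/" TARGET="_blank">Sample University</A></LI>' . "\n"
    . '</UL>';

// Corrected pattern: .+? lazily skips ahead to " TARGET=...",
// and (.+?) captures the link text up to the first </A>.
$count = preg_match_all("#<LI>.+? TARGET=\"_blank\">(.+?)</A>#ism", $file, $matches);

// $matches[0] holds each full match (tags included); $matches[1] holds only the names.
echo $count, "\n";
foreach ($matches[1] as $name) {
    echo $name, "\n";
}

// The first attempt: *+? does not compile, so preg_match_all() returns false,
// and foreach over the unset result then triggers the "Invalid argument" warning.
$bad = @preg_match_all("#<LI>*+? TARGET=\"_blank\">(*+?)</A>#ism", $file, $unused);
var_dump($bad);
```

The i flag makes <LI> and <li> both match, and the s flag lets .+? cross line breaks if a list item is split over several lines.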
Posted: Fri Aug 11, 2006 12:30 pm
by matt1019
Hahahahahahha!! Yaarrrrrr! Genius!
I'll now tweak this script so that it produces SQL output, so I can just copy and paste it and get a database table filled with these values.
As before, I will post the completed SQL file.
Thanks, scottayy!!
-Matt
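Turning the extracted names into SQL could look something like the sketch below. This is not necessarily how the SQL file in this thread was produced: the function name toInsertStatements(), the table name colleges, and the column name are all hypothetical. It decodes HTML entities, then escapes single quotes by doubling them, which is standard SQL string-literal escaping; when inserting through a live database connection, prepared statements would be the safer choice.

```php
<?php
// Hypothetical helper; the table and column names are placeholders for your own schema.
function toInsertStatements(array $names, string $table = 'colleges'): string
{
    $sql = '';
    foreach ($names as $name) {
        // Decode entities such as &amp; or &#039; and trim whitespace left over from the HTML.
        $clean = trim(html_entity_decode($name, ENT_QUOTES, 'UTF-8'));
        // Standard SQL escaping: double any single quote inside the string literal.
        $escaped = str_replace("'", "''", $clean);
        $sql .= "INSERT INTO $table (name) VALUES ('$escaped');\n";
    }
    return $sql;
}

// Made-up names, one containing an HTML-encoded apostrophe.
echo toInsertStatements(['Example College', 'St. Mary&#039;s College']);
```

Feeding it $matches[1] from the regex script would produce one INSERT statement per college.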
Posted: Fri Aug 11, 2006 1:39 pm
by matt1019
No attachments allowed?
Any suggestions as to where I can upload my RAR file? (It has the complete SQL file.)
-Matt
Posted: Fri Aug 11, 2006 2:12 pm
by matt1019
Ahhh, yes.
I found a good site to host it on.
Here's the link to the download:
http://www.upload2.net/page/download/hA ... 0.rar.html
Please read the comments inside the SQL file; they explain what you need to know.
-Matt