The idea is to have a database of various sites, with a few attributes for each site, which then allows some PHP code to grab the site and scrape it. We are currently working on the idea to harvest movie ratings from a large variety of places and to parse names from online directories (using 2 completely different types of data to test / prove the design).
What I need is a generic way to insert a "post-processor". In other words:
Step 1: retrieve all vars from the db
Step 2: get the web page
Step 3: parse the records (preg_match_all) and place in array
Step 4: convert each record to plain text
Step 5: parse each record for fields we want
Step 6: insert into db
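To make the steps above concrete, here is a rough sketch of how they might hang together. The `$site` array, its keys, the `scrape()` function and the sample markup are all made up for illustration; the db read (Step 1), the page fetch (Step 2) and the db insert (Step 6) are only stubbed so the snippet runs on its own.

```php
<?php
// Hypothetical per-site settings; in the real design these come from the db (Step 1).
$site = [
    'record_pattern' => '~<div class="rec">(.*?)</div>~s', // assumed markup
    'field_pattern'  => '/^(?<name>.+)\n(?<street>.+)\n(?<city>.+)\n(?<phone>.+)$/',
];

// Step 2 would normally be file_get_contents($url) or cURL; a literal stands in here.
$html = '<div class="rec"><b>Peter Carlson</b><br>12345 Main St<br>'
      . 'Mytown, AA, 00000<br>111-222-3333</div>';

function scrape(string $html, array $site): array
{
    // Step 3: split the page into records.
    preg_match_all($site['record_pattern'], $html, $m);

    $rows = [];
    foreach ($m[1] as $raw) {
        // Step 4: convert to plain text, turning <br> into newlines first.
        $text = preg_replace('~<br\s*/?>~i', "\n", $raw);
        $text = trim(html_entity_decode(strip_tags($text)));

        // Step 5a (the post-processor) would go here.

        // Step 5: pull out the fields we want.
        if (preg_match($site['field_pattern'], $text, $f)) {
            $rows[] = [
                'name'   => $f['name'],
                'street' => $f['street'],
                'city'   => $f['city'],
                'phone'  => $f['phone'],
            ];
        }
    }
    return $rows; // Step 6 would insert these rows into the db.
}

$rows = scrape($html, $site);
```

The field pattern only matches records whose lines arrive in the order name, street, city, phone, which is why a re-arranging step is needed for sites that order them differently.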
I need to insert a command as a Step 5a which is more generic in nature and should really be a regex statement. Its main purpose is to "re-arrange" the data into a format that the parser can more easily recognize. For example, consider the following 3 records:
Peter Carlson        Peter Carlson      Peter Carlson
12345 Main St        111-222-3333       Extra Stuff Here
Mytown, AA, 00000    12345 Main St      12345 Main St
111-222-3333         Mytown, AA 0000    Mytown, AA 0000
So, with all that, my 2 questions:
1. I have no idea what the regex could look like to do that. I know it needs to be something like:
/(.*+)\n(.*+)\n(.*+)\n(.*+\n)/$1\n$3\n$4\n$2\n
/(.*+)\n(.*+)\n(.*+)\n(.*+\n)/$1\n$3\n$4\n
2. What PHP function should I be using? preg_match... is not appropriate, and neither is preg_replace, as it requires the search and replace in different variables. I think I need a more generic regex statement, something like regex('s/search/replace/gsi').
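One possible way to get that kind of single-string rule is a thin wrapper that splits it and hands the pieces to preg_replace, which takes the pattern and the replacement as separate arguments. The sketch below is only an illustration: `sed_replace()` is a made-up helper, it assumes "/" is the delimiter (with any literal "/" escaped as "\/"), and it assumes the rule is stored as plain text, so a typed `\n` in the replacement has to be turned into a real newline.

```php
<?php
// Hypothetical helper: apply a sed-style "s/search/replace/flags" rule via preg_replace.
function sed_replace(string $rule, string $subject): string
{
    // Split on unescaped slashes into search, replace and flags.
    $splitter = '~^s/((?:[^/\\\\]|\\\\.)*)/((?:[^/\\\\]|\\\\.)*)/([a-z]*)$~s';
    if (!preg_match($splitter, $rule, $m)) {
        throw new InvalidArgumentException("Not a valid s/// rule: $rule");
    }
    [, $search, $replace, $flags] = $m;

    // "g" is not a PCRE modifier in PHP; it maps to preg_replace's $limit instead.
    $limit = strpos($flags, 'g') !== false ? -1 : 1;
    $flags = str_replace('g', '', $flags);

    // The replacement is not a regex, so expand the typed escapes by hand.
    $replace = strtr($replace, ['\n' => "\n", '\/' => '/']);

    $result = preg_replace('/' . $search . '/' . $flags, $replace, $subject, $limit);
    if ($result === null) {
        throw new RuntimeException("Rule failed to run: $rule");
    }
    return $result;
}

// Record 2 from the example: name, phone, street, city.
$record2 = "Peter Carlson\n111-222-3333\n12345 Main St\nMytown, AA 0000\n";
// Record 3 from the example: name, extra line, street, city.
$record3 = "Peter Carlson\nExtra Stuff Here\n12345 Main St\nMytown, AA 0000\n";

// Single-quoted, so these are the same plain-text rules that would sit in the db.
$fixed2 = sed_replace('s/^(.*)\n(.*)\n(.*)\n(.*)\n/$1\n$3\n$4\n$2\n/', $record2);
$fixed3 = sed_replace('s/^(.*)\n(.*)\n(.*)\n(.*)\n/$1\n$3\n$4\n/', $record3);
```

With these rules, record 2 comes out as name, street, city, phone, and record 3 loses its extra second line. Both rules use the same search pattern, so which replacement to apply would have to be stored per site rather than detected from the record itself.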
Thanks!
Peter