Parsing external documents
Is there any way in PHP to parse an external document for a certain value? For example, say I wanted to parse yahoo.com's main site for the word "News". Can PHP do that?
Parsing documents
Yes!
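Here is the short answer in code. This is a minimal sketch, and it assumes `allow_url_fopen` is enabled so that `file_get_contents()` can open an http:// URL. The helper names `fetchPage()` and `pageContainsWord()` are made up for this example, and the Yahoo URL is just the one from the question:

```php
<?php
// Hypothetical helpers: fetch a page, then check whether a word occurs in it.
// Assumes allow_url_fopen is on so file_get_contents() can read http:// URLs.

function fetchPage(string $url): ?string
{
    $html = file_get_contents($url);   // returns false (and warns) on failure
    return $html === false ? null : $html;
}

function pageContainsWord(string $html, string $word): bool
{
    // stripos() returns the offset of the first match or false.
    // Offset 0 is a valid match, so compare strictly against false.
    return stripos($html, $word) !== false;
}

// Usage (this performs a real HTTP request, so it is left commented out):
// $html = fetchPage('http://www.yahoo.com/');
// if ($html !== null && pageContainsWord($html, 'News')) { echo "Found it\n"; }
```

`stripos()` ignores case. If "News" has to match exactly, use `strpos()` instead.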
lol ok thanks. I looked it up in the PHP dev cookbook and saw how to do it. But I am having trouble understanding the preg_match_all() function. How do I call it so that it returns all the values following a given word?
For example:
I search an HTML page for the word "Points", and I want it to give me the number of points, which follows the word after a colon or something. Any help?
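Here is a minimal `preg_match_all()` sketch for the "Points" question. It assumes the number comes right after the word "Points", an optional space, and a colon, as in `Points: 42`. The helper name `extractPoints()` and the sample HTML are made up for illustration:

```php
<?php
// Hypothetical helper: collect every integer that follows "Points:" in $html.
// Assumed format: "Points", optional whitespace, ":", optional whitespace, digits.
function extractPoints(string $html): array
{
    // Group 1 captures the digits; \s* tolerates spaces around the colon.
    preg_match_all('/Points\s*:\s*(\d+)/', $html, $matches);
    // With the default PREG_PATTERN_ORDER, $matches[0] holds the full matches
    // and $matches[1] holds every group-1 capture (as strings).
    return array_map('intval', $matches[1]);
}

$html = '<td>Points: 42</td><td>Points : 7</td><td>Rank: 3</td>';
echo implode(', ', extractPoints($html)); // 42, 7
```

The pattern is case-sensitive. Add the `i` modifier (`/.../i`) if the page might say "points" or "POINTS".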
preg stuff...
I'm sorry, I was just feeling like being a little stupid and
I worked in a prison from '89 to '97 (dating myself) and with those fools, it was clown or be clowned. You got your jokes in as soon as the opportunity arose!
I'm at work right now, but when I get home and back online this evening, I will also check some code from about a year ago where I did some of that stuff. Hopefully it will help you out some. If I forget (like by tomorrow), just kinda send me a message or something to prod my remembrance.
Later on,
BDKR (TRC)
reg ex
This (regular expressions and stuff) is something I don't do much. I always try to look for another way of doing it before I use it, and most of the time I don't have to. It's one of the ugliest-looking things I've ever seen in programming, but anyone who can read and understand line after line of that stuff gets credit from me.
Anyways, I've had need of preg_replace() and ereg(), and to be honest with you, I'm not sure of the difference between the two. I'm not sure I care.
Anyways, what I was doing in one of those instances was looking for all the data between two points in a document. Here is a code snippet:
```php
// Only try the match if the start marker appears in $story at all.
if (strstr($story, "<!--- start title --->"))
{
    // On a match, $info[1] holds the text captured by (.*) between the two markers.
    $search = ereg("<!--- start title --->(.*)<!--- end title --->", $story, $info);
}
```
Now the var $info is actually an array (php.net/ereg), and you'll need to fiddle with the array just a tad to get the info you want out of it.
The "(.*)" bidness is saying "grab everything between the start title and end title tags." That, if I'm not mistaken, is what gets stored as an element in the $info var. Here it's obviously the title of a story.
Now I'm assuming (hoping, actually) that there is a place in your document that's very similar to what we have above. How do you, and in turn your script, know where to start parsing for the information? Is there a place similar to the above, something like this?
```
<!--- points --->
point information here
<!--- end points --->
```
If it's something like the above, something where you know where the information is going to begin and end, then you can use ereg() as I did above and use "(.*)" to grab all the data between those two points. Did you create the document that is going to be parsed? Was it created in such a way as to make it easy to parse?
This is the kind of thing that XML is great for, but the document needs to be an XML document. If by chance it is, then it's even easier. Let me know.
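The two function families differ in syntax. The preg_*() functions use Perl-compatible (PCRE) regular expressions, while ereg() used POSIX extended regular expressions. ereg() was deprecated in PHP 5.3 and removed in PHP 7.0, so the snippet above will not run on a current PHP.

Below is a minimal sketch of the same "grab everything between two markers" idea using preg_match(). It assumes the same `<!--- start title --->` / `<!--- end title --->` markers, and the helper name `extractBetween()` is made up for illustration. It also makes two deliberate changes from the original `(.*)`:
- The lazy `(.*?)` stops at the first end marker.
- The `s` modifier lets `.` match across newlines.

```php
<?php
// Hypothetical helper: return the trimmed text between two literal markers,
// or null if the pair is not found.
function extractBetween(string $doc, string $start, string $end): ?string
{
    // preg_quote() escapes regex metacharacters in the markers ('/' is the delimiter).
    // (.*?) is lazy so it stops at the first end marker; /s lets . match newlines.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    if (preg_match($pattern, $doc, $info) === 1) {
        return trim($info[1]);   // $info[1] is the captured group, as with ereg()
    }
    return null;
}

$story = "<!--- start title --->\nMy Story\n<!--- end title --->\nBody text";
echo extractBetween($story, '<!--- start title --->', '<!--- end title --->'); // My Story
```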
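If the data did come as XML, PHP's SimpleXML extension (enabled by default) could read it without any regex. The sketch below assumes a document like `<stats><points>42</points></stats>`. That structure and the helper name `pointsFromXml()` are made up for illustration, and a real feed would have its own element names:

```php
<?php
// Hypothetical helper: read a <points> value from a small XML document.
// The <stats>/<points> element names are assumptions for illustration only.
function pointsFromXml(string $xml): ?int
{
    $doc = simplexml_load_string($xml);   // returns false on malformed XML
    if ($doc === false || !isset($doc->points)) {
        return null;
    }
    return (int) $doc->points;            // cast the element's text content to int
}

echo pointsFromXml('<stats><user>bob</user><points>42</points></stats>'); // 42
```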
mas reg ex
In reading your post some more, I paid more attention to the explanation you gave. Would the data be in a form kind of like this?
```
Points "Number of Points" :
```
I'm not sure if I understood that correctly. If that's the case, maybe something like this:
```php
// ereg() returns false if nothing matches; otherwise $points_info[1]
// holds whatever (.*) captured between "Points" and ":".
$points = ereg("Points(.*):", $document_buffer, $points_info);
```
What it's grabbing is all the information after "Points" and before the ":". You may need to use the trim() function on the take.
Let me know how it goes.
Later on,
BDKR (TRC)
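On a current PHP, where ereg() no longer exists, the same "everything after Points and before the colon" idea can be written with preg_match(). This is a minimal sketch, and the helper name `pointsBeforeColon()` is made up for illustration. The `[^:]*` capture is a deliberate change from the original. The greedy `(.*)` runs to the last colon on the line, which in real HTML (think `http://`) can swallow far more than the number:

```php
<?php
// Hypothetical port of ereg("Points(.*):", ...) to preg_match().
// [^:]* stops at the first colon; a greedy (.*) would run to the last one.
function pointsBeforeColon(string $buffer): ?string
{
    if (preg_match('/Points([^:]*):/', $buffer, $m) === 1) {
        return trim($m[1]);   // strip the spaces around the captured value
    }
    return null;
}

$document_buffer = 'Points 42 : see http://example.com/';
echo pointsBeforeColon($document_buffer); // 42
```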