Hi everyone. I'm having a bit of trouble with a new script I am trying to write. Here is what is going on...
I am using cURL to grab the source of an HTML page. I'm trying to write a function that is sent the display text of a link and returns the URL of that link.
For example, somewhere in the source file there is a link <a href="file.html">Click Here</a>
In this case I would want to be able to send the function "Click Here" and have it return "file.html".
I have been trying to use regular expressions all morning, but I usually do database e-commerce stuff and can't figure this stuff out.
Thanks guys for any help you can offer!!
-Bradly
Web Scraping Help Needed. Maybe Regular Expressions?
feyd wrote: Regular expressions are the general way to go about doing this. Would you care to post what you've tried and the results, so we have a baseline to help you from?
LOL. I really wish I had gotten close enough to have anything of use to post. This is my first time working with regular expressions, and boy is it a process to learn!
I am really way out of my league here, but I can't see it being too difficult for a regex pro. I am guessing it would go something like this:
1.) Search for a line containing a link with the title. Something like:
Code:
$line = preg_grep('">' . $link_title . "</a>", $html_source);
2.) Then I would have a regex that looks for the string in between <a href=" and ">$link_title
Does this make sense? Is this the proper way of doing something of this nature? Thanks for any advice/help you can offer!
-Bradly
p.s. I just noticed that there is a Regular Expressions forum here. Can someone with the proper credentials move this topic? I don't want to cross-post.
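For what it's worth, the two steps described above can be made to run roughly as written. The snippet as posted would fail for two reasons: preg_grep() filters an array rather than a string, and a PCRE pattern needs delimiters. Below is a minimal sketch, assuming the page source is already in $html_source (the sample markup here is made up for illustration) and that each link sits on a single line:

```php
<?php
// Hypothetical sample standing in for the cURL response body.
$html_source = "<p>Intro</p>\n"
             . "<a href=\"file.html\">Click Here</a>\n"
             . "<a href=\"other.html\">Elsewhere</a>";
$link_title  = 'Click Here';

// preg_grep() works on an array, so split the source into lines first.
$lines = explode("\n", $html_source);

// '~' is the pattern delimiter, so the '/' in '</a>' needs no escaping.
// preg_quote() escapes any regex metacharacters in the link text.
$quoted = preg_quote($link_title, '~');

// Step 1: keep only the lines that contain the link text.
$candidates = preg_grep('~">' . $quoted . '</a>~', $lines);

// Step 2: capture whatever sits between <a href=" and ">$link_title.
$urls = [];
foreach ($candidates as $line) {
    if (preg_match('~<a\s+href="([^"]*)">' . $quoted . '</a>~', $line, $m)) {
        $urls[] = $m[1];
    }
}
```

With the sample markup, $urls ends up holding the single entry "file.html". A link whose tag is split across several lines would be missed by this line-by-line approach.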
- n00b Saibot
bradly wrote:
Code:
$line = preg_grep('">' . $link_title . "</a>", $html_source);
You should use preg_match and match against the full <a href=""> tag. You will get better results.
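To illustrate the preg_match suggestion, here is a minimal sketch of the function described in the first post. The name get_link_url and its signature are made up for this example. It assumes the link text is not wrapped in further tags (such as <b>) and that the href value is quoted:

```php
<?php
/**
 * Return the href of the first link whose display text equals $link_title,
 * or null when no such link is found.
 * (get_link_url is a hypothetical name used only for this sketch.)
 */
function get_link_url(string $html_source, string $link_title): ?string
{
    // Match the whole tag: <a ... href="URL" ...>TITLE</a>
    //   (["\'])       opening quote, single or double
    //   ([^"\'>]*)    the URL, which may not cross a quote or the end of the tag
    //   \1            the matching closing quote
    // Flag i ignores case; flag s lets \s and . span line breaks.
    $pattern = '~<a\b[^>]*\bhref\s*=\s*(["\'])([^"\'>]*)\1[^>]*>\s*'
             . preg_quote($link_title, '~')
             . '\s*</a>~is';

    if (preg_match($pattern, $html_source, $m)) {
        return $m[2];
    }
    return null;
}

// Usage with made-up markup:
$html = '<a href="a.html">Other</a> <a href="file.html">Click Here</a>';
echo get_link_url($html, 'Click Here'); // prints file.html
```

Matching the complete tag in one pattern avoids splitting the page into lines first, and keeping the URL class from crossing a quote or a ">" stops the match from running into a neighbouring link.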