Regular expressions always give me headaches...
OK, here's the problem:
I have a string which contains a URL, for example:
...
http://news.google.com/news/url?sa=T&ct=us/3-0&fd=R&url=http://www.washingtonpost.com/wp-dyn/co ... 02814.html%3Fhpid%3Dsec-health&cid=12936257725&ei=Rg18SYnNHAZWK8uCvCw&usg=AFZQHAO3t-H7mPEcZUQKAWDxLzkA
...
I need to extract only the bold part, so I need an expression or code that parses the string and returns the URL I have marked in bold.
What I marked in red is always the same and can serve as markers; the rest is dynamic. I am only interested in the bold URL, and everything else can be stripped from the result.
G.
Need help with reg expression matching.
Re: Need help with reg expression matching.
parse_str can do exactly what you need. Just make sure there's only one bit that looks like a URL in that string and you'll be fine.
You can use parse_url first to remove everything before the URL begins, but it won't clean up the text afterwards. That only affects the last key/value pair, and since you don't care about "usg" it shouldn't be a problem.
Code:
$text = <<<TEXT
Reg expressions always give me headaches.....
Ok, here the problem:
i have string which contains a URL, example:
...
http://news.google.com/news/url?sa=T&ct=us/3-0&fd=R&url=http://www.washingtonpost.com/wp-dyn/content/article/2009/01/23/AR2009012302814.html%3Fhpid%3Dsec-health&cid=12936257725&ei=Rg18SYnNHAZWK8uCvCw&usg=AFZQHAO3t-H7mPEcZUQKAWDxLzkA
...
I need to extract only the bold part, so i need an expression/code which parses the string and returns the URL which i have marked in bold.
What i marked in red is always the same and can serve as markers, the other stuff is dynamic. I am only interested in the bold URL, the rest can be stripped in the result.
G.
TEXT;
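// Take the query component (the text after the "?")
$url = parse_url($text, PHP_URL_QUERY);
// Split the query into key/value pairs; values are URL-decoded
parse_str($url, $GET);
print_r($GET);
Re: Need help with reg expression matching.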
Thank you, but I am still having a problem.
Let's say I am pulling a web site, and the site contains arbitrary content, with
...
<A href="http://news.google.com/news/url?sa=T&ct ... &fd=R&url=http://www.washingtonpost.com/wp-dyn/co ... 01062.html&cid=1243573618&ei=xGJ8Sf5325QHAifCIAg&usg=AF643w5AmoH1CMw2_UJ5753S643A">
...
<A href="http://news.otherurl.com/news/url?sa=T& ... &fd=R&url=http://www.blah.com/wp-dyn/content/arti ... 01062.html&cid=1243573618&ei=xGJ2Sf5325QHAifCIAg&usg=AF643w5AmoH1CMw2_UJ5753S643A">
...
and similar URLs embedded all throughout the content.
What I want is to replace each occurrence of such a string throughout the whole page with the URL which is in red, the part after the url=.
<A href="http://news.otherurl.com/news/url?sa=T& ... &fd=R&url=http://www.blah.com/wp-dyn/content/arti ... 01062.html&cid=1243573618&ei=xGJ2Sf5325QHAifCIAg&usg=AF643w5AmoH1CMw2_UJ5753S643A">
becomes ---->
<A href="http://www.blah.com/wp-dyn/content/arti ... 62.html">
So I would need a regexp which finds the links in the site/string, parses each one, extracts the "url=xxxxxxxxx" part and replaces each link with the red part; the rest of the link is not of interest.
Help much appreciated!
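One possible way to do the replacement described above, as a rough sketch rather than a tested solution for the real pages: find each href with preg_replace_callback, then reuse parse_url and parse_str on the matched link to pull out the url= value. The function names extract_target_url and rewrite_links and the example.com / example.org addresses are made up for illustration. The sketch assumes every href is wrapped in double quotes and that the target always sits in a parameter named url.

```php
<?php
// Hypothetical helper: return the value of the "url" query parameter
// of a redirect link, or null if the link has no such parameter.
function extract_target_url(string $link): ?string
{
    // In HTML source, "&" inside an attribute is often written as "&amp;".
    $link = html_entity_decode($link, ENT_QUOTES);

    // Query component of the link (the text after the "?").
    $query = parse_url($link, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    // parse_str() splits on "&" and URL-decodes each value,
    // so "%3Fhpid%3Dsec-health" comes back as "?hpid=sec-health".
    parse_str($query, $params);

    if (isset($params['url']) && is_string($params['url'])) {
        return $params['url'];
    }
    return null;
}

// Hypothetical helper: replace every href="..." that carries a url=
// parameter with its target URL; all other links are left untouched.
function rewrite_links(string $html): string
{
    $result = preg_replace_callback(
        '/href="([^"]*)"/i',
        function (array $m): string {
            $target = extract_target_url($m[1]);
            if ($target === null) {
                return $m[0]; // not a redirect link, keep as-is
            }
            // Re-escape the target so it is safe inside the attribute.
            return 'href="' . htmlspecialchars($target, ENT_QUOTES) . '"';
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error.
    return $result ?? $html;
}

// Usage with a made-up page fragment:
$page = '<A href="http://news.example.com/news/url?sa=T&fd=R'
      . '&url=http://www.example.org/story.html&cid=1">story</A>';
echo rewrite_links($page), "\n";
```

Doing the parsing inside the callback keeps the regex itself trivial: it only has to find the href, and the existing URL functions deal with the parameters and the decoding. Single-quoted or unquoted href values would need a wider pattern or a real HTML parser.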