How to follow meta refresh or other redirect tags

Posted: Fri Apr 25, 2008 8:27 pm
by php_ghost
Hi guys,
I have a script that strips various tags from a webpage, and I'm using PHP's file() function to fetch the page. However, I'm having problems with webpages that redirect. How can I find out the new URL that I have to follow?

TIA,
Arch :D

Re: How to follow meta refresh or other redirect tags

Posted: Tue Apr 29, 2008 4:11 pm
by kendall
I'm not sure if I'm correct here, but I don't think you can, unless there are HTML links on the page from which you can capture the URL... :dubious:

Re: How to follow meta refresh or other redirect tags

Posted: Wed Apr 30, 2008 11:32 am
by Kieran Huggins
This smells of bad design... what problem are you trying to solve?

Re: How to follow meta refresh or other redirect tags

Posted: Wed Apr 30, 2008 12:27 pm
by John Cartwright
I recently solved this issue using cURL and regular expressions. I cannot share the code with you for legal reasons, but it involved recursively requesting the page with

Code:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
to follow header redirects, and a regular expression to detect meta redirects. If one was found, the process was repeated until we landed on the final page.
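
The approach described above could be sketched roughly as follows. This is not the code referred to in the post (which was not shared), only a minimal illustration under some assumptions: the helper names find_meta_refresh() and fetch_final_page() are made up for this example, a loop with a hop limit stands in for the recursion, and the meta refresh target is assumed to be an absolute URL.

```php
<?php
// Hypothetical helper: return the target URL of a
// <meta http-equiv="refresh" content="N; url=..."> tag, or null if none.
function find_meta_refresh(string $html): ?string
{
    // Inspect every <meta ...> tag so attribute order does not matter.
    if (!preg_match_all('/<meta\b[^>]*>/i', $html, $tags)) {
        return null;
    }
    foreach ($tags[0] as $tag) {
        if (!preg_match('/http-equiv\s*=\s*["\']?refresh\b/i', $tag)) {
            continue;
        }
        // content="5; url=http://example.com/" (the URL itself may be quoted)
        if (preg_match('/content\s*=\s*["\']\s*\d+\s*;\s*url\s*=\s*["\']?([^"\'>\s]+)/i', $tag, $m)) {
            return html_entity_decode($m[1]);
        }
    }
    return null;
}

// Hypothetical helper: fetch $url, following header redirects via cURL and
// meta refresh redirects via find_meta_refresh(), up to $maxHops meta hops.
// Returns the final URL and the HTML served from it.
function fetch_final_page(string $url, int $maxHops = 5): array
{
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow header redirects
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        $html = curl_exec($ch);
        // The URL the response actually came from, after header redirects.
        $effective = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

        if ($html === false) {
            throw new RuntimeException("Request failed for $url");
        }

        $next = find_meta_refresh($html);
        // No meta redirect, or a page that merely refreshes itself: done.
        if ($next === null || $next === $effective) {
            return ['url' => $effective, 'html' => $html];
        }
        $url = $next; // assumption: the meta refresh target is absolute
    }
    throw new RuntimeException('Too many meta refresh hops');
}
```

The 'url' element of the returned array is the address the page was finally served from, which is the value to use as the base when rewriting relative links.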

Re: How to follow meta refresh or other redirect tags

Posted: Thu May 01, 2008 6:49 pm
by php_ghost
Uhm, actually I'm trying to recreate something like Phonifier (check out phonifier.com). It is open source, but I just can't figure out which part handles the redirects, so I'm hoping someone can enlighten me on that specific part.

Thanks a lot. :D

Re: How to follow meta refresh or other redirect tags

Posted: Thu May 01, 2008 6:58 pm
by php_ghost
Btw, here's the problem I'm trying to solve. I rewrite the links and image sources on the page based on the page's URL. For example, if I open http://www.website.com/index.html and an image on that page is located at "images/logo.jpg", I rewrite the src to "http://www.website.com/images/logo.jpg". But if http://www.website.com/index.html redirects me to http://www.website.com/main/index.html and an image is located at "images/logo.jpg", my script still rewrites it as "http://www.website.com/images/logo.jpg", because that is based on the initial URL the script captured. It doesn't know that it was redirected to http://www.website.com/main/index.html and should have rewritten the src to "http://www.website.com/main/images/logo.jpg".

Here's how I use my script, btw: http://www.domain.com/myscript.php?url= ... ebsite.com
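
To illustrate the rewriting problem described above: once the final URL after any redirects is known, relative paths can be resolved against it instead of against the URL originally passed in. The function below is a simplified sketch, not a full RFC 3986 resolver. The name resolve_url() is hypothetical, and query-only references such as "?page=2" are not handled correctly.

```php
<?php
// Hypothetical helper: resolve $relative against $base, where $base is the
// URL the page was finally served from (after any redirects).
function resolve_url(string $base, string $relative): string
{
    // Already absolute (has a scheme such as http: or mailto:): keep as is.
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $relative)) {
        return $relative;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '')
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative reference, e.g. //cdn.example.com/x.png
    if (strpos($relative, '//') === 0) {
        return $scheme . ':' . $relative;
    }

    $basePath = $parts['path'] ?? '/';
    if (strpos($relative, '/') === 0) {
        $path = $relative; // root-relative reference
    } else {
        // Keep the directory of the base path, drop its file name.
        $dir  = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $path = $dir . $relative;
    }

    // Set the query string / fragment aside before normalising the path.
    $suffix = '';
    if (preg_match('/[?#].*$/', $path, $m)) {
        $suffix = $m[0];
        $path   = substr($path, 0, -strlen($suffix));
    }

    // Collapse "." and ".." segments, never climbing above the root.
    $out = [];
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }

    return $origin . implode('/', $out) . $suffix;
}

// Using the URLs from the post: resolving against the redirected address
// gives the /main/ path rather than the site root.
echo resolve_url('http://www.website.com/main/index.html', 'images/logo.jpg'), "\n";
```

With the initial URL http://www.website.com/index.html as the base, the same call yields "http://www.website.com/images/logo.jpg", which is the incorrect result described in the post.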