saving sites with question marks and other symbols

Post by jaymoore_299 »

I am trying to write some simple code to display pages within pages. However, I am having trouble with sites whose URLs contain special symbols like ? and =. Here is the code I use.


<?php
ob_start();
include('http://www.somesite.com/index.php?id=1'); 
$inc = ob_get_contents(); 
ob_end_clean(); 
print $inc;
?>
When I do it with this site, all the images show up as red x's and their source URLs are incorrect.

Post by jaymoore_299 »

I have a different problem in addition to the above. For the first problem, if the URL had any characters like = or ?, its output couldn't be saved. But even when it can be saved, some of the pages have images with the wrong source, so they are not displayed.

The problem is that in the page source, the images have relative URLs like this:

<IMG SRC="/images/

The page with relative URLs has to be processed before it is displayed. Is there any way to get a processed version of the page with only absolute URLs in it? Or is there a PHP script that checks for relative URLs and changes them to absolute ones?

I have no control over the pages I want to include, so I can't go in and change the URLs to absolute ones myself.

Post by feyd »

Images are transmitted in separate requests. The browser resolves their URLs relative to the page it is displaying, which here is your page rather than the original site. You'll need to inject HTML or alter the HTML the page outputs to fix that.
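
One possible way to "inject HTML" is to add a <base> tag to the fetched markup, so the browser resolves relative URLs such as /images/... against the original site. The following is only a sketch: the helper name injectBaseHref() and the base URL are assumptions for illustration, not something from the original page, and a <base> tag also changes how every other relative link on the page resolves.

```php
<?php
// Hypothetical helper (name is an assumption): insert a <base> tag so the
// browser resolves relative URLs against $baseUrl instead of your own site.
function injectBaseHref(string $html, string $baseUrl): string
{
    // Escape the URL so it is safe inside an HTML attribute.
    $baseTag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';

    // Insert the tag right after the first opening <head ...> tag, if any.
    $count = 0;
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',
        fn(array $m): string => $m[0] . $baseTag,
        $html,
        1,
        $count
    );

    if ($result !== null && $count > 0) {
        return $result;
    }

    // No <head> found: prepend the tag and rely on the browser's leniency.
    return $baseTag . $html;
}
```

With the page's HTML in a variable, this would be used as `print injectBaseHref($inc, 'http://www.somesite.com/');`. Rewriting every src and href attribute to an absolute URL is the other route, but it needs real HTML parsing to be reliable.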

Using include() is extremely dangerous, as any PHP in the output will be executed by your server. file_get_contents() is usually preferred; cURL can also be used, and it lets you send full user-agent headers, among other things, and even make POST submissions.
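
As a rough sketch of the safer approach, the page can be fetched as plain text with file_get_contents() and an HTTP stream context. The helper name fetchPage(), the user-agent string, and the 10 second timeout are assumptions chosen for illustration, and fetching an http:// URL this way assumes allow_url_fopen is enabled on the server.

```php
<?php
// Hypothetical helper (name is an assumption): fetch a page as a string.
// Unlike include(), nothing in the response is executed as PHP.
function fetchPage(
    string $url,
    string $userAgent = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)'
): ?string {
    // HTTP options only apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds
        ],
    ]);

    // The query string (e.g. ?id=1) is sent as part of the URL, unchanged.
    $html = @file_get_contents($url, false, $context);

    // Return null on failure so callers can tell it apart from an empty page.
    return $html === false ? null : $html;
}

// Example call (commented out because it needs network access):
// $inc = fetchPage('http://www.somesite.com/index.php?id=1');
// print $inc ?? 'Could not fetch the page.';
```

The returned string is just text, so it can be altered (for example, to fix relative URLs) before it is printed.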