problem with automatic scraping when links with /.../ come

Posted: Thu May 14, 2009 6:15 pm
by Mehnaz
Hi,

I am scraping the top ten links from a search engine to grab their contents.

I get this error:

" Warning: file_get_contents(http://www.nlm.nih.gov/.../druginfo/nat ... odine.html) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 File not found in C:\wamp\www\websearch\searchdata.php on line 41"


when a URL containing /.../ comes in. :? (For example, in this case http://www.nlm.nih.gov/.../druginfo/nat ... odine.html)

I am using file_get_contents() with preg_match_all() to get the titles, on WAMP 2.0 with PHP 5.2.6.

Is there any solution that would help with these types of URLs?
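In the meantime, one way to stop the warning from killing the script is to skip any result URL that still contains a literal "..." (the search page's truncation placeholder, which is not a real address) and to suppress the warning from a failed fetch. A minimal sketch — the helper names are mine, not from the original script:

```php
<?php
// Hypothetical helper: a URL containing a literal "..." is the search page's
// shortened display text, not a real address, so fetching it will 404.
function looks_truncated($url) {
    return strpos($url, '...') !== false;
}

// Safe wrapper around file_get_contents(): skips truncated URLs and returns
// false instead of emitting a PHP warning when the HTTP request fails.
function fetch_page($url) {
    if (looks_truncated($url)) {
        return false; // don't even try; the URL is not the real one
    }
    return @file_get_contents($url);
}
```

This only avoids the warning; it does not recover the real URL (see the reply below for that).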

Thanks in advance

Mehnaz

Re: problem with automatic scraping when links with /.../ come

Posted: Thu May 14, 2009 7:26 pm
by requinix
Take a look at your post. See how another ... showed up in that link? It's because the forum took your long URL and made it shorter by cutting out a part.

I bet the search tool you're scraping did the same thing. What does this mean?

You're screwed.

Find another way to do what you want. Perhaps that link shows up somewhere else in the result? (Spoiler: it does)
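A sketch of that idea, assuming the real address appears in the anchor's href attribute while the "..." only shows up in the visible link text — the function name, regex, and example URL here are illustrative, not taken from the actual search page:

```php
<?php
// Hypothetical sketch: match the href attribute of each result anchor instead
// of the visible (possibly "..."-shortened) link text.
function extract_links($html) {
    preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $matches);
    return $matches[1]; // array of the captured href values
}
```

For example, `<a href="http://example.com/druginfo/full-article.html">example.com/.../full ... cle.html</a>` yields the full href, not the shortened display text.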