Help validating submitted URLs
Posted: Thu May 21, 2009 10:29 pm
I am creating a submissions app for a site called Tumblr. The basic premise is that a user submits the URL of a Tumblr blog, and then someone can review the submissions and either post or delete them.
The part I am stuck on is validating the URL. I have already gotten past validating that the URL is an actual URL, but I would like to check that the submitted URL actually points to a blog on Tumblr.
A typical Tumblr URL looks like this: http://sometext.tumblr.com/. I have already written some code that checks URLs like this one and returns true or false depending on whether the blog exists, so I'm good there.
But some people choose to have their own domain name point to their Tumblr blog. These can look like almost anything: http://blog.name.com/, http://blogname.info/, and so on.
If a URL like this is submitted, the function always returns true, whether the URL points to a Tumblr blog or not.
Does anybody know how I can validate these URLs?
One thing that might be helpful is that a blog hosted on Tumblr's servers would return JSON output, so I was thinking I could check whether the response to the URL call is JSON or not.
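A minimal sketch of that JSON check, under some assumptions: the helper name looksLikeJson is made up for illustration, and $body must hold only the response body. If CURLOPT_HEADER is enabled, the headers would have to be stripped first, or decoding will always fail.

```php
<?php
/**
 * Return true if $body parses as a JSON object or array.
 * Hypothetical helper; expects the response body without HTTP headers.
 */
function looksLikeJson(string $body): bool
{
    $body = trim($body);
    // Only accept objects and arrays. A bare number or quoted string
    // would also decode, but is unlikely to be an API response.
    if ($body === '' || ($body[0] !== '{' && $body[0] !== '[')) {
        return false;
    }
    json_decode($body);
    return json_last_error() === JSON_ERROR_NONE;
}
```

An HTML error page fails the first-character test, and a truncated or malformed document fails the decode step.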
Another thing is that I am getting 404 Not Found output on URLs that aren't official Tumblr blogs. Is there a way to capture or check for a 404 error and do something with that?
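One way to capture the status code is to ask cURL for it with curl_getinfo and CURLINFO_HTTP_CODE once the request has run. The sketch below is only an illustration: the names fetchUrl and isNotFound, the 10-second timeout, and the choice to treat a failed request the same as a 404 are all assumptions.

```php
<?php
/**
 * Fetch $url and return its HTTP status code and body,
 * or null if the request itself fails (DNS error, timeout, ...).
 * Hypothetical helper for illustration.
 */
function fetchUrl(string $url): ?array
{
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($c, CURLOPT_TIMEOUT, 10);          // assumed timeout, in seconds
    $body = curl_exec($c);
    if ($body === false) {
        return null;
    }
    // Status code of the last response received on this handle.
    $status = (int) curl_getinfo($c, CURLINFO_HTTP_CODE);
    return ['status' => $status, 'body' => $body];
}

/**
 * Treat a failed request or a 404 response as "no blog here".
 */
function isNotFound(?array $response): bool
{
    return $response === null || $response['status'] === 404;
}
```

Usage would be along the lines of `if (isNotFound(fetchUrl($url))) { /* reject the submission */ }`.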
Thanks in advance for taking a look at this problem.
Here's my basic code with an idea of what I am trying to do:
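Code:
$url = 'http://url.com';
$c = curl_init($url);
// Include the response headers in $output so the status line can be inspected.
curl_setopt($c, CURLOPT_HEADER, true);
// Return the response as a string instead of printing it.
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($c);
if ($output === false) {
    // The request itself failed (DNS failure, timeout, etc.).
    $this->error = 'Sorry...';
    return false;
}
// Tumblr returns the following if the URL points to Tumblr's server
// but is not a registered Tumblr blog.
$check = preg_match('/We couldnt find the page you were looking for\./', $output);
if ($check) {
    $this->error = 'Sorry...';
    return false;
}
// Here I'd like to check $output for either a 404 Not Found
// or JSON output (if my thinking is correct). Any ideas?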
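Because the code above sets CURLOPT_HEADER, $output begins with the HTTP status line, so the status code can also be read straight out of that string. The following is a rough sketch, not a definitive implementation: statusFromRawResponse is a made-up name, and it assumes redirects are not followed (CURLOPT_FOLLOWLOCATION is not set above), so that $output starts with a single header block.

```php
<?php
/**
 * Extract the HTTP status code from a raw response that starts
 * with a status line such as "HTTP/1.1 404 Not Found".
 * Returns null when no status line is present.
 * Hypothetical helper for illustration.
 */
function statusFromRawResponse(string $raw): ?int
{
    // Anchored at the start of the string: protocol version, then a 3-digit code.
    if (!preg_match('/^HTTP\/\S+\s+(\d{3})/', $raw, $m)) {
        return null;
    }
    return (int) $m[1];
}
```

With this in place, the check at the end of the posted code could be something like `if (statusFromRawResponse($output) === 404) { ... }`.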