How do IE and FF understand URLs?
Posted: Fri Jul 25, 2008 10:14 am
I'm writing a simple parser and everything else is ready, but there is one thing I cannot understand.
Let's say I have two types of URLs:
1. http://somesite.com/somefolder
2. http://somesite.com/somefile.php
If you type the first URL, which leads to a folder, without the final slash, the browser WILL ADD the trailing slash automatically. How does it know that it should add it?
But if you type the second URL, which leads to a file, the browser doesn't add a trailing slash. How does it know that?
So the two URLs will end up looking like this in the browser:
1. http://somesite.com/somefolder/
2. http://somesite.com/somefile.php
I thought that browsers look at the extension, e.g. .php. But let's swap things around: add ".php" to the folder's name and remove ".php" from the file name. Will browsers still add the trailing slash correctly?
1. http://somesite.com/somefolder.with.fake.ext.php
2. http://somesite.com/some.file.with.no.ext
The answer: yes, they will! Why? How do they know? Does anyone know?
I really need to understand this because when I parse URLs in my own script I work with the contents of downloaded pages, so every relative link inside them has to be converted to a proper full URL. That's why it matters whether the base URL has a trailing slash or not.
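To show why the slash matters for my parser, here is a minimal sketch in Python. It is only an illustration, not my actual script: the relative link `images/logo.png` is a made-up example, and `urljoin` from the standard library stands in for whatever resolution routine a parser would use. The same relative link resolves to two different full URLs depending on whether the base ends with a slash:

```python
from urllib.parse import urljoin

# Hypothetical relative link found inside a downloaded page.
relative_link = "images/logo.png"

# Same base URL, once with and once without the trailing slash.
with_slash = urljoin("http://somesite.com/somefolder/", relative_link)
without_slash = urljoin("http://somesite.com/somefolder", relative_link)

# With the slash, "somefolder" is kept as a directory.
print(with_slash)     # http://somesite.com/somefolder/images/logo.png

# Without the slash, "somefolder" is treated as a file name and replaced.
print(without_slash)  # http://somesite.com/images/logo.png
```

So if my script guesses wrong about the slash, every relative link on the page ends up pointing to the wrong place.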
If anyone understands what I have described here, please help. Thanks!