Robots and cookies

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

flycast
Forum Commoner
Posts: 37
Joined: Wed Jun 01, 2005 7:33 pm

Robots and cookies

Post by flycast »

My site requires that a person go and select the state they reside in, which is then stored in a cookie. The problem is the paranoid user who blocks cookies, or robots that cannot select a state at all.

Has anyone run into this before? What is the best practice?

My current thinking is to check for common robots in the $_SERVER['HTTP_USER_AGENT'] string and allow them to index the site without redirecting them to the entry form.

Any other ideas? Surely someone has solved this before.
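For what it's worth, the user-agent check I have in mind would look roughly like this. This is only a sketch: the token list is illustrative rather than exhaustive, and since the header is client-supplied, any client can spoof it.

```php
<?php
// Sketch only: a naive bot check based on User-Agent substrings.
// The token list is illustrative, not exhaustive, and the header
// is sent by the client, so it can be spoofed.
function is_probably_bot(string $userAgent): bool
{
    $botTokens = ['googlebot', 'bingbot', 'slurp', 'baiduspider'];
    $ua = strtolower($userAgent);
    foreach ($botTokens as $token) {
        if (strpos($ua, $token) !== false) {
            return true;
        }
    }
    return false;
}

// Usage in the front controller would be something like:
//   if (!is_probably_bot($_SERVER['HTTP_USER_AGENT'] ?? '')
//           && !isset($_COOKIE['state'])) {
//       header('Location: /select-state.php');
//       exit;
//   }
```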
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

Better yet, have your site not rely on cookies, although you can use them as a fallback.

http://php.net/session might be of interest
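A rough sketch of what "cookies as a fallback" could look like, assuming the entry form posts a 'state' field. All the key and function names here are assumptions for illustration, not a drop-in solution:

```php
<?php
// Sketch: resolve the visitor's state from a fresh form post first,
// then the session, then a legacy 'state' cookie as a fallback.
// The 'state' key is an assumption for illustration.
function resolve_state(array $post, array $session, array $cookie): ?string
{
    if (isset($post['state'])) {
        return $post['state'];       // fresh form submission wins
    }
    if (isset($session['state'])) {
        return $session['state'];    // already chosen this session
    }
    return $cookie['state'] ?? null; // fall back on the legacy cookie
}

// In a real page you would call session_start() first, then:
//   $_SESSION['state'] = resolve_state($_POST, $_SESSION, $_COOKIE);
```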
flycast
Forum Commoner
Posts: 37
Joined: Wed Jun 01, 2005 7:33 pm

Post by flycast »

Yes. I have been looking at that. Here is the downside of sessions:

You need a strategy for dealing with the transmission of the session id. Either the user has to accept a cookie containing the session id, or you need to append the session id to the end of the URL. If you put the session id at the end of the URL, it is said that Google does not like that; they believe the site is serving different content. Also, the session ids that were served to Google show up in the indexed links.

Is there a third way to use sessions?
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Are you saying that you refuse to use sessions because of a rumor you heard about Google? How many websites that use sessions do you know of that Google refuses to index?
flycast
Forum Commoner
Posts: 37
Joined: Wed Jun 01, 2005 7:33 pm

Post by flycast »

I can't say that I see a lot of session ids appended to URLs, but then I have cookies enabled and use AdBlock to keep the ad trackers out. Agreed, there are a lot of rumors about Google and their black box. Some of them are pretty reasonable and some are pure wild conjecture from people rather challenged in logic and common sense.

In this case the customer is very concerned that they not lose search engine standings when they upgrade to their new CMS site.

I have seen session ids in URLs on Google in the past. It seems reasonable that session ids could interfere with the Google algorithm, so I am just choosing a conservative approach. Anyway, putting session ids in the URLs would be a hassle, since it would take some kind of custom logic at every URL (a function would do the trick), and session ids make for some very, very ugly URLs.

Another reason is that when I read about session ids I see a lot of warnings that they are risky because of injection attacks and session hijacking. Both are currently low-probability risks on this site, but again: I am being conservative.

Anyway, I think we are getting off topic. The main question at hand is how to make sure robots can index the site when I am checking for the presence of a cookie that comes from a form selection on the entry page. A robot will be unable to select a meaningful form value and will be locked out of the rest of the site. My thought is to check for robots in HTTP_USER_AGENT and let them browse the site (that is, not redirect them just because they have not made a choice).

Is there a better way to do this?
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

flycast wrote:Anyway, I think we are getting off topic.
Hardly. You are refusing to take the optimal solution because of a rumor. Their concern with SEO seems to be a very inexperienced one, but your reasoning seems just as inexperienced. Any search engine worth being indexed by can easily handle query strings, and the session id is no exception to the rule. And as for the 'custom logic': PHP can append the session id to the end of all URLs automatically.
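For reference, that automatic rewriting is controlled by the session ini directives; configuration along these lines (shown here as a php.ini fragment, check the docs for your PHP version) makes PHP append the session id to relative URLs only for clients that did not send the session cookie back:

```ini
; php.ini: rewrite relative URLs in the output to carry the session id
; when the client has not returned the session cookie.
session.use_trans_sid = 1
session.use_only_cookies = 0
```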
flycast wrote:The main question at hand is how to make sure the robots can index the site when I am checking for the presence of a cookie that comes from a form selection on the entry page. The robot will be unable to select a form value that makes sense and will be locked out of the rest of the site. My thought is to check for robots in the http_user_agent and allow them to browse the site (not redirect them because they have not made a choice).

Is there a better way to do this?
Spam bots complete forms... Search engine bots do not. If you require that a form be filled out, you are essentially blocking all search engine bots from entering your website. Trying to hack it through the user agent is a bad idea because that setting is client-side and can be spoofed.
iknownothing
Forum Contributor
Posts: 337
Joined: Sun Dec 17, 2006 11:53 pm
Location: Sunshine Coast, Australia

Post by iknownothing »

I think I know what he's talking about with Google. Variables such as page=whatever in the URL, which is how the session id will be presented, can cause issues with search engines. However, if you Google for a while, perhaps for "GET variable search engine" or similar, you will find there is an easy workaround for your worries.

EDIT: Googled it myself: http://www.zend.com/zend/spotlight/searchengine.php
flycast
Forum Commoner
Posts: 37
Joined: Wed Jun 01, 2005 7:33 pm

Post by flycast »

superdezign wrote:Trying to hack it through the user agent is a bad idea because that setting is client-side and can be spoofed.
That is why I posted. How do I keep from rejecting Google and the other legit robots when the user will get redirected to an entry page if they have not told us what state they live in?
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

flycast wrote:How do I keep from rejecting Google and the other legit robots when the user will get redirected to an entry page if they have not told us what state they live in?
By not making it a requirement. Not everyone is willing to give up personal information, anyway.
Give a link to skip the step.
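A minimal sketch of that skip link. Crawlers follow plain links but do not submit forms, so the link lets them (and cookie-averse users) through without a state choice. The page names, field names, and functions here are all hypothetical, purely for illustration:

```php
<?php
// Sketch of the skip-link idea: the entry page offers the state form
// plus a plain "skip" link. All names below are illustrative.
function entry_page_html(): string
{
    return <<<HTML
<form method="post" action="/select-state.php">
    <select name="state">
        <option value="VT">Vermont</option>
        <option value="NH">New Hampshire</option>
    </select>
    <input type="submit" value="Continue">
</form>
<p><a href="/index.php?skip=1">Skip this step</a></p>
HTML;
}

// A request carrying ?skip=1 falls through to generic, state-neutral
// content instead of being bounced back to the entry form.
function wants_skip(array $get): bool
{
    return isset($get['skip']);
}
```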