Serialize post and store it for later

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

User avatar
julian_lp
Forum Contributor
Posts: 121
Joined: Sun Jul 09, 2006 1:00 am
Location: la plata - argentina

Post by julian_lp »

astions wrote:FYI: Google will not index any pages containing a variable named "id" in the url.
May I know why? Give me a link if you have one, please
User avatar
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Post by Luke »

My guess: generally a URL with an "id" in it has something to do with modifying the database, and Google does not want to index a page that may change the site's database
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

So you're saying/you think that search engines would index messy urls as well as simpler ones?
Nope, I'm saying I don't know. But it would harm them to discriminate by URL. That's why I asked for evidence.
the reason I don't want to use GET is because I never want to turn anyone away from my website for any reason; that's stupid to do. That's why I have to make sure that all the data can be submitted to the server, and if it's too long (which it probably would be if I used GET) then that's a client gone.
Well that's your choice. Personally I don't know what kind of person would be turned away from a website because the search form uses GET, that person wouldn't use any of the major search engines, and you said it yourself:
GET would allow people to link someone else to the search.
I will never force JavaScript on a user either. I don't browse with JavaScript and I think it's poor web design to be forced to use JavaScript. Many websites do this and the only reason, it seems, is because they want to serve me ads and track what I do, screw that.
At the moment it is poor design to force JavaScript. In the future it may well be an acceptable requirement of browsing the web. But you are wrong if you think the only uses for JavaScript are ads and mining. A lot of the time it enhances usability; that and scripted submission is solely what I use it for. We are on the brink of mainstream Web 2.0 and JavaScript is a big part of that.

However, yes, a lot of sites use it and Flash to make things that are really annoying, so I think it is wise to have the facility to turn it on and off at will. But I would never browse with it off permanently. I think you may also be prejudging JS. Have you written any? Do you even know what it is capable of? I certainly didn't at first, and only after weeks of full-time learning did I come to appreciate it as an excellent language that is better than PHP 4 (almost as good as PHP 5).
may have just started a flame war there :D
That's why I am going with POST. My design works oh so well because I can use links for pagination and keep the search option select boxes with the right stuff selected.
You probably know this but you can still do that with GET.
Looking at that form, in your case I think it is a close call between POST and GET because you have so many fields. I would probably still use GET but POST is satisfactory in this case.
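As a rough sketch of the pagination-with-GET point above (the field names `q` and `sort` are illustrative, not from the thread):

```php
<?php
// Sketch: pagination links that carry the search fields in the query
// string, so each page of results stays bookmarkable and shareable.
// Field names ('q', 'sort', 'page') are made up for illustration.
function page_link(array $fields, $page) {
    $fields['page'] = (int) $page;
    // http_build_query() URL-encodes each value for us.
    return 'search.php?' . http_build_query($fields);
}

// Re-render page links while keeping the current criteria:
echo page_link(array('q' => 'widgets', 'sort' => 'date'), 2);
// → search.php?q=widgets&sort=date&page=2
```

Because the criteria live in the URL, re-selecting the right options in the form's select boxes is just a matter of comparing against `$_GET`.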
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

The Ninja Space Goat wrote:my guess: generally a url with an "id" in it will have something to do with modifying the database, and google does not want to index a page that may change the site's database
No, because only POST data should change something in a database. A variable called 'id' in a GET query string is GET-ting something. Google aren't indexing it because they don't want to index every single page in every web application (forums, shops etc.) because this would clutter their own databases esp. when web applications frequently have their own search capabilities.
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

julian_lp wrote:Can I know why? Give me a link if you've one please
While looking for the link to the google page which stated this, I noticed that there are pages indexed with id in the url. I could have sworn that it was id but it might be something else they don't index like sid or userid or something. Sorry I'm not able to find it again.
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

While looking for the link to the google page which stated this, I noticed that there are pages indexed with id in the url. I could have sworn that it was id but it might be something else they don't index like sid or userid or something. Sorry I'm not able to find it again.
lol :lol:

Just as I suspected, it's all hearsay
xD
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

ole wrote:
While looking for the link to the google page which stated this, I noticed that there are pages indexed with id in the url. I could have sworn that it was id but it might be something else they don't index like sid or userid or something. Sorry I'm not able to find it again.
lol :lol:

Just as I suspected, it's all hearsay
xD
No, they are blocking something, I just can't remember what.
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

Here..
Google definitely doesn't index pages containing "&id=" as a parameter in your URLs.
http://stason.org/articles/money/seo/go ... _rank.html

I remember reading this on the actual google site a long time ago. I don't know if this site is credible or not but it's on track with what I remember.
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

No, they are blocking something, I just can't remember what.
Yeah I was just joking. I'm sure they would.

I think it is pretty important to know exactly what they block and what they don't. Anyway, I've emailed the author of that article to find out where he got that information from and whether it is still relevant. Sooooo time will tell :)
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Let the rumours and myths be dispelled from this great forum of souls!
Stason wrote: I don't remember all the source now. But a lot of it came from Google
webmaster guidelines. In particular:
http://www.google.com/support/webmaster ... c=0&type=f
http://www.google.com/support/webmaster ... c=0&type=f

I hope this helps
On those links you can find:
Google wrote:Yes, Google indexes dynamically generated webpages, including .asp pages, .php pages, and pages with question marks in their URLs. However, these pages can cause problems for our crawler and may be ignored. If you're concerned that your dynamically generated pages are being ignored, you may want to consider creating static copies of these pages for our crawler. If you do this, please be sure to include a robots.txt file that disallows the dynamic pages in order to ensure that these pages aren't seen as having duplicate content.
Google wrote:If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.
Google wrote:Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.
Google wrote:Don't use "&id=" as a parameter in your URLs, as we don't include these pages in our index.
So what do we know?
Well, not that much. We know that query strings "can cause problems" and the recommendation is to keep them to a minimum. Presumably there is something particular about them that, when used in a certain way, will break the Google crawler, but they aren't going to tell us what that is because it might indicate how it works, which obviously is a secret, and that knowledge could be abused.

But do you want the results of a search engine to be crawled?
No.
Is it possible for them to be crawled?
No. Why? Because they actually require user input.

In that case SEO is no longer an issue when choosing between POST and GET for a search form method, and I would still recommend GET. Why, you ask?
  • GET because a search is GET-ting data,
  • GET because it doesn't cause the page to expire,
  • GET because you are able to save the search as a bookmark or give it to someone else,
  • GET because it can handle a few thousand characters of data (the exact limit is browser-dependent),
  • GET because nobody, realistically, will be scared off by a long query string resulting from a search request,
  • GET because that is what the W3C would recommend,
  • GET because you don't have to bother writing a class to serialize POST and store it in a session for later,
  • GET because you know it makes sense!
rrrrmmmmgggggggGGHH!
(strained inhalation)


Suck on that.
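For contrast, the "serialize POST and store it in a session for later" approach that the second-to-last bullet dismisses can be sketched roughly like this (function names are illustrative, not from the thread):

```php
<?php
// Minimal sketch of keeping POSTed search criteria in the session so
// later pagination links can reuse them. Names are illustrative.
if (session_status() !== PHP_SESSION_ACTIVE) {
    session_start();
}

// Call from the form handler after a POST submission.
function remember_search(array $post) {
    $_SESSION['search'] = serialize($post);
}

// Call from the results/pagination pages to get the criteria back.
function recall_search() {
    return isset($_SESSION['search'])
        ? unserialize($_SESSION['search'])
        : array();
}
```

Note the downsides this buys you: the search is no longer bookmarkable or linkable, and two browser tabs share one stored search.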
User avatar
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Post by Luke »

^interesting fella
User avatar
shiznatix
DevNet Master
Posts: 2745
Joined: Tue Dec 28, 2004 5:57 pm
Location: Tallinn, Estonia

Post by shiznatix »

Now I am very, very stubborn, and it takes a lot of fighting to get me to stray from what I believe, but...

ole has a very good point. It's so true. The only problem I have is that my site is going to have many more options for the search as time goes on. What I am thinking about doing is the same thing I am doing now, but with GET, so the base64-encoded stuff goes into GET and gets decoded and whatnot. That way the query string won't be as long, you can still link someone to a search, and it will be more difficult for your average idiot script kiddie to start trying SQL injections (saves my bandwidth).

Good? Yes no?
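A sketch of that base64-over-GET idea (function names are made up; note this is obfuscation, not security, since anyone can base64-decode the value):

```php
<?php
// Sketch: pack the whole search into a single base64 GET parameter,
// e.g. search.php?s=<packed>. The decoded fields still need the same
// validation/escaping as plain GET parameters.
function pack_search(array $criteria) {
    return base64_encode(serialize($criteria));
}

function unpack_search($packed) {
    // 'allowed_classes' => false guards against object injection
    // through unserialize() (PHP 7+).
    $data = @unserialize(base64_decode($packed), array('allowed_classes' => false));
    return is_array($data) ? $data : array();
}

$url = 'search.php?s=' . urlencode(pack_search(array('q' => 'laptop')));
```

One catch, as noted later in the thread: base64 inflates the data by about a third, so this makes the query string longer, not shorter.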
User avatar
Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Base64-encoded GET seems good to me. But what worries you so much about normal users having the ability to change those? Because you should be validating/filtering/escaping the data already in case somebody a little more 'pro' gets at them. If someone more pro does a semantic URL attack, your base64-encoded query string ain't gonna do smurf.
shiznatix wrote:Now I am very, very stubborn, and it takes a lot of fighting to get me to stray from what I believe, but...

ole has a very good point
My work here is done. xD
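A minimal whitelist/validation sketch in the spirit of that advice (field names and rules are made up for illustration):

```php
<?php
// Normalize search input before it goes anywhere near SQL: cap free
// text, whitelist enumerated values, coerce numbers. Field names and
// rules here are hypothetical.
function clean_search(array $input) {
    $q = trim((string) (isset($input['q']) ? $input['q'] : ''));
    if (strlen($q) > 100) {
        $q = substr($q, 0, 100); // cap free-text length
    }

    // Only accept known sort orders; fall back to a default.
    $sort = isset($input['sort']) ? $input['sort'] : '';
    if (!in_array($sort, array('date', 'price'), true)) {
        $sort = 'date';
    }

    // Page must be a positive integer.
    $page = max(1, (int) (isset($input['page']) ? $input['page'] : 1));

    return array('q' => $q, 'sort' => $sort, 'page' => $page);
}
```

Run every incoming request (base64-wrapped or not) through something like this, and then escape or bind the values at the SQL layer as well.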
User avatar
shoebappa
Forum Contributor
Posts: 158
Joined: Mon Jul 11, 2005 9:14 pm
Location: Norfolk, VA

Post by shoebappa »

Damnit, I just wrote an app that uses &id= and one of the concerns was Google finding the pages :(

Maybe it's just time for a little mod_rewrite action, but I mean &id= is logical because that's probably what you're passing to pull content out of a database... Seems silly to me to ignore those pages.
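If you do go the mod_rewrite route, a typical rule looks something like this (a hypothetical .htaccess sketch; the path and script name are made up):

```apache
# Expose /article/123 publicly while the script still receives id=123
# internally, keeping "id" out of the crawled URLs.
RewriteEngine On
RewriteRule ^article/([0-9]+)$ article.php?id=$1 [L,QSA]
```

The QSA flag appends any extra query-string parameters, and L stops further rewriting once the rule matches.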
User avatar
shiznatix
DevNet Master
Posts: 2745
Joined: Tue Dec 28, 2004 5:57 pm
Location: Tallinn, Estonia

Post by shiznatix »

Problem.

When using GET my query string is incredibly long and will only grow. Nowhere near the 4*** that it can accept, but it's ugly as sin. I tried it with base64_encode and whatnot, but that just makes it longer.

How can I shorten it so users can give someone a link to a search without it taking a whole page to send?
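One way to keep such a string short (a sketch, not an answer from the thread): give fields short names and drop anything still at its default before building the query. Names and defaults below are illustrative.

```php
<?php
// Sketch: shorten a GET search URL by omitting fields that still hold
// their default value; the receiving page fills the defaults back in.
function short_query(array $fields, array $defaults) {
    $changed = array();
    foreach ($fields as $name => $value) {
        if (!isset($defaults[$name]) || $value !== $defaults[$name]) {
            $changed[$name] = $value; // only serialize what differs
        }
    }
    return http_build_query($changed);
}

$defaults = array('sort' => 'date', 'order' => 'asc', 'page' => 1);
echo short_query(array('q' => 'php', 'sort' => 'date', 'page' => 3), $defaults);
// → q=php&page=3
```

With many optional fields, most searches only change a few of them, so the typical URL stays short even as the form grows.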