Page 1 of 2

"Accidental" Session ID Collision...

Posted: Fri Jul 29, 2005 4:57 pm
by nielsene
This is a spin off from the thread in General Discussions:
session id
That I think could benefit from increase exposure here.

The initial question was, whats the chance of generating the same session id twice.

If we posit that the session id is derived from a 32 bit integer we have about 4E9 possible session identifiers.

Treating this as a birthday collision problem we have an equation like

Code: Select all

x:=2^32 -1
p(n):=1 - ї x! / ((x-n)!(x^n)]
After a few iterations to reduce this to a form that doesnt't overflow I was using: (in python or 'R'... here shown in psudeo-php)

Code: Select all

$x=pow(2,32)-1;
$temp=1.0;
for ($i=0;$i<1000000;$i++) {
  $temp*=($x-$i);
  $temp/=$x;
  $p=1.0-$temp;
  if (($i %10000)==0) echo "$i --- $p";
}
// Code probably has slightly serious accumulation of
// round-off errors, but general shape and ratio should be
// good enough for proof of concept and general "scale"
Found that at about 65-70K the probability approaches 0.5 and about about 200K the probability is asympototic to 1.

Thus if you have ~65K simultaneous sessions, the session collision chance is extremely high and an attacker could guess an id one time in two.

Its very interesting that the percentage of the population required for p(n)=0.5 decreases with increasing population size. Its about 1/15 in the birthday collision (23/365) but ~15/1M in this case (65K/4B).

The idea about manually checking if you've issued a given session id before issueing it is only a slight help if you have a large number of active sessions. It would stop you from accidentally causing a collision, but won't protect from the ease of a random guesss matching.

Changing to using a pair of 32bit tokeen is of course an option. I've run upto 5,000,000 and its still down in the 1E-07 probability range of collision then.

So in short, for "normal" sites the 32 bit session id should be safe; but I would hope that very large sites take some form of additional protection.

Posted: Fri Jul 29, 2005 5:38 pm
by shiflett
The session identifier should be a 128 bit number, represented as 32 characters (bytes). This makes the likelihood of a collision negligible.

In order to explain this better, I need to use smaller numbers. If a session identifier were represented as 4 characters (instead of 32), it would represent a 16 bit number, not a 4 bit one. For example, f3c2 is
1111001111000010 (unless I made a mistake in my haste).

f = 1111
3 = 0011
c = 1100
2 = 0010

Posted: Fri Jul 29, 2005 5:46 pm
by nielsene
OK, yes I was a little surprised by the assertion that it was a 32 bit value, in the original thread referenced. Give that its a 128 bit with a 160bit option in PHP5, its a moot point.

Posted: Fri Jul 29, 2005 5:52 pm
by shiflett
It's 128 bits. :-)

Posted: Fri Jul 29, 2005 5:55 pm
by nielsene
yup just noticed that and corrected my post while you posted...

Posted: Fri Jul 29, 2005 9:13 pm
by feyd
could always use a certain 256-bit one :wink:

Posted: Fri Jul 29, 2005 9:22 pm
by Ambush Commander
Until quantum computers become feasible.

Of course, quantum computers would render all of our "oh, it's big enough" arguments useless.

Posted: Fri Jul 29, 2005 9:27 pm
by nielsene
While the analyis no longer applies, I do think its quite interesting that the percentage of the available population space before the probability of a collision passes 0.5 decreases rapidly with increasing population size.

Ie, increased protection against accidental/random collisions is less than O(n) in population space.

Posted: Sat Jul 30, 2005 10:52 am
by theda
Why not just do a 1024 bit one? :) < My illogical statement for the day.

Posted: Sat Jul 30, 2005 8:58 pm
by feyd
theda wrote:Why not just do a 1024 bit one? :) < My illogical statement for the day.
The longer the ID, the longer it takes to generate a good random one, not to mention that it requires more and more storage space to keep it around and takes longer to compare. Only under certain circumstances should someone even think of generating such a large number.

Posted: Wed Aug 03, 2005 7:28 pm
by x01
I have never tough of that.

Do you think its accurate, to loop trough sessions, 'till we get one thats not used?

After 10 billion ( 32 bytes ^ 36 chars ) unique visits, there would be no more session ID left. I know that a website needs a lot of visits and fame to achieve that, but imagine we get those hits. What would happen then?

Posted: Wed Aug 03, 2005 7:32 pm
by Ambush Commander
Hope that those sessions have expired by the time the ten billion visitors have come.

Posted: Wed Aug 03, 2005 7:41 pm
by feyd
x01 wrote:I have never tough of that.

Do you think its accurate, to loop trough sessions, 'till we get one thats not used?

After 10 billion ( 32 bytes ^ 36 chars ) unique visits, there would be no more session ID left. I know that a website needs a lot of visits and fame to achieve that, but imagine we get those hits. What would happen then?
you're a bit off on your figures x01.. a session ID is a 128-bit key. That's 340,282,366,920,938,463,463,374,607,431,768,211,456 unique values.

That's plenty for most applications.. however, because the ID is actually run (at least partly) off a pseudo-random number, it has far more chances of repeating, however.. if the seed is good enough, and the interval between repeats on the random are long enough (the entropy can be made longer with other filters, however they are not entirely accurate), and the decay of the ID's is quick enough, then you have very very very very very little chance of ever generating the same ID within a usage.

Posted: Wed Aug 03, 2005 8:07 pm
by x01
feyd wrote:you're a bit off on your figures x01.. a session ID is a 128-bit key. That's 340,282,366,920,938,463,463,374,607,431,768,211,456 unique values.
Damn Windows calculator. :twisted:

Well, I just wanted to know if:

a) Session ID auto-expire after $X time. If $X > 0, echo $X; :D

b) If I randomly get an existing Session ID, would it be considered by law, has hacking?


On my site, I register the Session ID on the DB, with maximum 1 hour session time, and compare the PHP-Session ID with the DB, so I think that is a solution that is 99.9% secure?

Posted: Wed Aug 03, 2005 8:15 pm
by feyd
trusting anything coming from a user is not secure.. ever. It's all a trade off, requiring someone to send user and password every page they want to perform a registered users only action on can be more secure, but becomes annoying to the users. Storing their credentials in a cookie can help, but can you really trust it? Do you really care? Is the data on the "secured" area that important? Certain things like changing their email address should require resubmission of their password, along with logging of requests for such actions (including flood control)...

This is just the tip of the iceberg, depending on what your actual end action requires..