kaisellgren wrote: I would like to see images generated with /dev/urandom so can you make one since you are not using Windows, are you?
Sorry to disappoint you, but it appears that it's the same:
Here is a reason why not to use rand().
Re: Here is a reason why not to use rand().
Attachments:
- urandom.png (1.99 KiB)
There are 10 types of people in this world, those who understand binary and those who don't
Re: Here is a reason why not to use rand().
It's also worth noting that none of these measure predictability correctly, not even my example. All we're doing is measuring distributions: since none of these examples take order (and therefore predictability) into account, we're only demonstrating "net discrete distribution"* (not sure if there's a proper term).
For instance: Vlad, in sorted.png you have actually gone through and sorted these, which is not the same as Pearson correlation. Also, your axes are unlabeled, which confuses me. Maybe you can clarify what I am looking at when I see rand() producing values in a different range of bounds and not just a different distribution. Without clarifying what it is we're measuring, it's just confusing to other people. I guess that's why I was making a big deal about the OP having a "bug": I think everyone pretty much figured he was using values straight from rand() for X and Y for each dot.
* Just saw you called this "probability distribution" - is this in fact what your code is measuring? What am I looking at on each axis in unsorted and sorted, and what "distribution" metric are you implementing here exactly? I would have figured: draw a 2D vector of randomly produced digits for each Cartesian coordinate, and then calculate the distribution as { total covered dots } / { total available Cartesian "volume" } for a "single probability" for each function (the probability that calling it N times with 0,N as bounds gives you an even distribution between 0 and N; lower values of probability would imply empty area and therefore repeated pixels - no need to count that 2x). Even this is still misleading, because what does this show? Is a more even distribution less predictable? In that sense, if I observe the first 100/400 dots drawn in the lower left quadrant of the graph, and I had to wager money on where the next dot would be drawn, I would not pick the lower left corner - so is an even distribution any "less random" or "predictable"? In fact, to be truly random, should we not measure the values over repeated test runs, then measure correlations between those "meta" level tests, and test those tests, ad infinitum?
PS - yeah you might consider this off topic, but it's interesting and at least some people find it on topic, so sue us
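For anyone who wants a measurement that does take order into account, one simple option is the Pearson correlation between each value and the one that follows it (lag-1 serial correlation). The sketch below is only an illustration, not something anyone in this thread ran: the function name lag1_correlation is made up, and it assumes PHP 7 or later. A result near 0 only says there is no linear relation between neighbours; it does not prove the sequence is unpredictable.

```php
<?php
// Hypothetical helper: Pearson correlation between x[i] and x[i+1].
function lag1_correlation(array $xs): float
{
    $n = count($xs) - 1;              // number of (x[i], x[i+1]) pairs
    if ($n < 2) {
        throw new InvalidArgumentException('need at least 3 samples');
    }
    $a = array_slice($xs, 0, $n);     // x[0] .. x[n-1]
    $b = array_slice($xs, 1, $n);     // x[1] .. x[n]
    $meanA = array_sum($a) / $n;
    $meanB = array_sum($b) / $n;
    $cov = 0.0;
    $varA = 0.0;
    $varB = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $da = $a[$i] - $meanA;
        $db = $b[$i] - $meanB;
        $cov  += $da * $db;
        $varA += $da * $da;
        $varB += $db * $db;
    }
    if ($varA == 0.0 || $varB == 0.0) {
        // Constant input: the correlation is undefined, 0.0 is returned by convention here.
        return 0.0;
    }
    return $cov / sqrt($varA * $varB);
}
```

To try it on rand() or mt_rand(), fill an array with a few thousand calls in a loop and pass it in; a strictly increasing sequence gives about +1 and an alternating 0,1,0,1 sequence gives about -1.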
Re: Here is a reason why not to use rand().
It's a (normalized) histogram of a 1D signal:
X axis - all of the possible values of X(i);
Y axis - the number of times each value appears in a given signal series.
From http://en.wikipedia.org/wiki/Probabilit ... y_function:
"Informally, a probability density function can be seen as a "smoothed out" version of a histogram"
So, we have a histogram, which gives us the probability density function, which in turn gives us the entropy, which in turn gives us the randomness of the signal.
PS: I sort the histogram, so one can see the variance.
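The histogram-to-entropy step can be sketched as follows. This is a minimal illustration under my own assumptions: the function name shannon_entropy is hypothetical, it assumes PHP 7 or later, and it takes plain bin counts (normalizing them by their sum, so it does not matter whether they were already scaled). Note that the result depends only on the bin counts, not on the order in which the values were drawn.

```php
<?php
// Hypothetical helper: Shannon entropy, in bits, of a histogram of non-negative counts.
function shannon_entropy(array $counts): float
{
    $total = array_sum($counts);
    if ($total <= 0) {
        throw new InvalidArgumentException('histogram is empty');
    }
    $h = 0.0;
    foreach ($counts as $c) {
        if ($c > 0) {                 // empty bins contribute nothing
            $p = $c / $total;         // estimated probability of this bin
            $h -= $p * log($p, 2);
        }
    }
    return $h;
}
```

For M equally likely values the maximum is log2(M) bits, so with M = 500 a perfectly flat histogram would give roughly 8.97 bits; anything lower means some values show up more often than others.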
Re: Here is a reason why not to use rand().
jshpro2 wrote: * Just saw you called this "probability distribution" - is this in fact what your code is measuring? What I am visualizing on each axis in unsorted and sorted, what "distribution" metric are you implementing here exactly? I would have figured draw a 2d vector of randomly produced digits for each Cartesian coordinate, and then calculate the distribution as { total covered dots } / { total available Cartesian "volume" } for a "single probability" for each function ( probability of calling it Nx with 0,N as bounds giving you even distribution between 0 and N ( lower values of probability would imply empty area and therefore repeated pixels - no need to count that 2x ). Even this its still mis-leading because what does this show? Is more even distribution less predictable? In that sense if I observe the first 100/400 dots drawn in the lower left quadrant of the graph, and I had to wager money on where the next dot would be drawn, I would not pick the lower left corner - so is even distribution any "less random" or "predictable". In fact to be truly random should we not measure the values over repeated test runs - and then measure correlations between those "meta" level tests, and test those tests - ad infinitum
Exactly - that's why I expected the PDF to be a straight line.
And measuring the randomness of (mt_)rand() in 2D space will not give a different result than in 1D space - but the latter is easier to implement and check.
Re: Here is a reason why not to use rand().
I did another measurement:
Basically, I've increased the number of rand() calls.
So, you can see that it's close enough to the "ideal" straight line, and one should expect that with an infinite number of calls it would become one.
Code:
<?php
set_time_limit(120);
define('M', 500);      // number of distinct values (also the image width in pixels)
define('U', 100000);   // base number of samples per histogram

// Build a normalized, sorted histogram of U*$coef values returned by $f.
function hystogram($f, $coef = 1)
{
    $a = array_fill(0, M, 0);
    for ($i = 0; $i < U * $coef; $i++) {
        $c = call_user_func($f, $i);
        $a[$c]++;
    }
    // Normalize by the largest count so every bin lies in [0, 1].
    $norm = max($a);
    for ($i = 0; $i < M; $i++) {
        $a[$i] /= $norm;
    }
    // Sort so the spread between the rarest and the most frequent value is visible.
    sort($a);
    return $a;
}

function _rand($x)
{
    // M - 1, not M: rand() bounds are inclusive and there are only M bins (0..M-1).
    return rand(0, M - 1);
}

function _mt_rand($x)
{
    return mt_rand(0, M - 1);
}

header('Content-Type: image/png');
$im = imagecreatetruecolor(M, M + 20);
// Allocate each colour once instead of once per pixel.
$white = imagecolorallocate($im, 255, 255, 255);
$red = imagecolorallocate($im, 255, 0, 0);
$r = hystogram('_rand');        // U samples, drawn in white
$r2 = hystogram('_rand', 100);  // 100*U samples, drawn in red
for ($i = 0; $i < M; $i++) {
    imagesetpixel($im, $i, (int) (M - M * $r[$i] + 10), $white);
    imagesetpixel($im, $i, (int) (M - M * $r2[$i] + 10), $red);
}
imagepng($im);
imagedestroy($im);
?>
Attachments:
- infinity.png (967 Bytes)
Re: Here is a reason why not to use rand().
A little off-topic:
man mt_srand:
"Since 5.2.1 The Mersenne Twister implementation in PHP now uses a new seeding algorithm by Richard Wagner. Identical seeds no longer produce the same sequence of values they did in previous versions. This behavior is not expected to change again, but it is considered unsafe to rely upon it nonetheless."
It's insane! I always rely on an identical seed giving me an identical sequence!
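As I read the manual note, it is about sequences differing from those of earlier PHP versions; within one PHP version, reseeding with the same value should still replay the same sequence. A quick way to check that on a given installation is sketched below. The helper name seeded_sequence and the 0..499 range are my own choices for illustration, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: first $n values of mt_rand(0, 499) after seeding with $seed.
function seeded_sequence(int $seed, int $n): array
{
    mt_srand($seed);                  // reset the generator to a known state
    $out = [];
    for ($i = 0; $i < $n; $i++) {
        $out[] = mt_rand(0, 499);
    }
    return $out;
}

// Same seed twice in the same process: the two arrays should be identical.
var_dump(seeded_sequence(42, 10) === seeded_sequence(42, 10));
```

What this cannot tell you is whether the same seed gives the same numbers on a different PHP version, which is the case the manual warns about; stored sequences or test fixtures that depend on it would need to be regenerated after an upgrade.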
Last edited by VladSun on Fri Jan 02, 2009 7:03 pm, edited 1 time in total.
Re: Here is a reason why not to use rand().
jshpro2 wrote: From this we can conclude mt_rand is less predictable when compared with rand, thats about it.
In fact, it's the opposite in these graphics.
jshpro2 wrote: As a rule of thumb I would use mt_rand, if you're making repeated calls to mt_rand that is slowing down performance then thats indication it's time to downgrade to rand.
Again, it's the opposite:
http://bg2.php.net/manual/tr/function.mt-rand.php
"It uses a random number generator with known characteristics using the » Mersenne Twister, which will produce random numbers four times faster than what the average libc rand() provides."
PS: maybe this "four times faster" leads to less randomness.
Last edited by VladSun on Sat Jan 03, 2009 6:15 pm, edited 1 time in total.
Re: Here is a reason why not to use rand().
It's worth noting the "pattern" when "more random" (less predictable) data is needed: you basically seed it with human-generated input (e.g. PuTTY's SSH key generation asks you to "wiggle" your mouse in a box for 15 seconds while it samples 'random' coordinates). Thanks for elaborating on how we can infer meaning from these charts (it's also worth reiterating, for anyone still following along, that this is "only" one way to measure 'randomness', though a very useful one). I just don't want people assuming the x and y values in these charts are themselves "randomly" generated (like your high school teacher always said: "always label your axes" lol).

VladSun wrote: Again, it's the opposite:
Interesting, thanks for correcting me. One would have assumed it would be the other way around.