Posted: Sun Jul 01, 2007 4:44 pm
by alex.barylski
d11wtq wrote:I personally would use fread() iteratively so you only pull a small amount of data from the file into memory at any one time.
I'm not sure I see the benefit to that approach?
Are you trying to avoid taxing the system's memory limit by consuming memory iteratively instead of all at once?
I can see a few drawbacks to that approach, the most important being memory fragmentation, i.e. allocation of non-contiguous memory blocks. There are times when you might want to read iteratively, like when reading a massive file which you may conditionally stop reading halfway through based on some factor. But when dealing with image files, as the OP suggested, I think it's best to load the whole thing into memory right off the hop.
Either using file_get_contents() or fread() with filesize().
Memory mapping makes additional sense in this case, as the file is being requested a lot, so the overhead of standard system file operations like fread/fwrite would likely cause more problems.
Using system functions, this is how a file is typically read into memory:
1) Process reads file
2) Data moved from disk to buffers in Kernel address space
3) Memory pages are then copied into the process's user space
That switch from user mode to kernel mode 'may' cause a context switch, which is something you really want to avoid, especially if it's happening many times, as in the case of an image request.
So personally, I'd say file_get_contents() is the better choice...but ultimately it boils down to personal choice.

Posted: Sun Jul 01, 2007 4:52 pm
by stereofrog
Hockey wrote:Use file_get_contents(). It uses memory mapping and according to the docs is binary safe!
readfile() uses mmap as well; if it's not available (IIRC on Windows), readfile() copies the source in 8K chunks.
I'm not sure what you mean by "binary-safe" in this context.
Posted: Sun Jul 01, 2007 4:55 pm
by Benjamin
Someone just test it and be done with it.
Posted: Sun Jul 01, 2007 5:31 pm
by Jenk
Any method you choose will involve reading the file into memory anyway... files don't just magically travel from disk to user.
Posted: Sun Jul 01, 2007 6:34 pm
by alex.barylski
stereofrog wrote:Hockey wrote:Use file_get_contents(). It uses memory mapping and according to the docs is binary safe!
readfile() uses mmap as well, if it's not available (IIRC on windows), readfile copies the source in 8K chunks.
I'm not sure what you mean by "binary-safe" in this context.
I just did a quick grep for readfile() in the latest PHP 5.2.3 source and looked at the function. I also looked at file_get_contents().
There is a clear comment and usage of mmap in file_get_contents(), but I couldn't find anything similar in readfile. The docs also don't make any mention of readfile using mmap. How did you come to this conclusion?
I thought maybe the streams API used mmap and that's where you were getting that from, but after some quick searches I couldn't find anything about streams using memory mapping.
Is this not the source for readfile():
Code: Select all
PHP_FUNCTION(readfile)
{
    char *filename;
    int size = 0;
    int filename_len;
    zend_bool use_include_path = 0;
    zval *zcontext = NULL;
    php_stream *stream;
    php_stream_context *context = NULL;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s|br!", &filename, &filename_len, &use_include_path, &zcontext) == FAILURE) {
        RETURN_FALSE;
    }

    context = php_stream_context_from_zval(zcontext, 0);

    stream = php_stream_open_wrapper_ex(filename, "rb", (use_include_path ? USE_PATH : 0) | ENFORCE_SAFE_MODE | REPORT_ERRORS, NULL, context);
    if (stream) {
        size = php_stream_passthru(stream);
        php_stream_close(stream);
        RETURN_LONG(size);
    }
    RETURN_FALSE;
}
I see no mention or indication mmap is used.

Unless I'm looking at the wrong function or something...
If streams by default use mmap, why would file_get_contents() explicitly make use of the mmap function (php_stream_copy_to_mem) if it's already handled by the streams API???
You don't know what I mean by binary-safe in this context? I must admit to being confused by that statement. What does it usually mean to you when an operation is binary-safe?
astions wrote:Someone just test it and be done with it.
Haha... that would be easiest, although not always accurate, as tests are easily skewed. Besides, this is more interesting.

I never bother learning about PHP internals except in instances like this, where I have a hunch, think it's correct, and then try to prove it is, or learn otherwise.
Jenk wrote:Any method you choose will involve reading the file to memory anyway.. files don't just magically travel from disk to user.
Heh? I never said they did. I did say, to the best of my understanding, that memory mapping makes the process slightly faster by avoiding extra steps. I believe the step that is eliminated (because file_get_contents still needs to read the disk) is the copy: instead of the data being read into kernel space and then copied into user space, the file's pages are mapped directly into user space.
Cheers

Posted: Sun Jul 01, 2007 8:19 pm
by Jenk
I didn't say you (nor anyone else) did. I was merely making the point that it is insignificant whichever you choose.
Posted: Sun Jul 01, 2007 8:38 pm
by alex.barylski
Jenk wrote:I didn't say you (nor anyone else) did. I was merely making the point that it is insignificant which ever you choose.
But it is significant. Why don't you read up on memory mapping versus standard system calls and explain to me why it's not? It's insignificant if execution speed is of no concern to you, but scottayy clearly requested the most efficient approach, not the most obvious.
Posted: Sun Jul 01, 2007 8:59 pm
by Jenk
simple testing:
Code: Select all
<?php
$file = 'image.jpeg'; // 1.5meg image file.
$start = microtime();
readfile($file);
echo "TIME: " . microtime() - $start;
?>
'broken' but irrelevant. Average time over 30 attempts: 0.000427.
Code: Select all
<?php
$file = 'image.jpeg'; // 1.5meg image file.
$start = microtime();
echo file_get_contents($file);
echo "TIME: " . microtime() - $start;
?>
Average time over 30 attempts: 0.000394.
.. insignificant.
Posted: Mon Jul 02, 2007 12:29 am
by alex.barylski
Jenk wrote:simple testing:
Code: Select all
<?php
$file = 'image.jpeg'; // 1.5meg image file.
$start = microtime();
readfile($file);
echo "TIME: " . microtime() - $start;
?>
'broken' but irrelevant. Average time over 30 attempts: 0.000427.
Code: Select all
<?php
$file = 'image.jpeg'; // 1.5meg image file.
$start = microtime();
echo file_get_contents($file);
echo "TIME: " . microtime() - $start;
?>
Average time over 30 attempts: 0.000394.
.. insignificant.
Profiling in PHP is not exactly what one could call accurate. It only gives you a very rough idea.
file_get_contents() returns a string, whereas readfile() writes straight to the output buffer (i.e. the screen). To get an accurate reading of whether the difference is "insignificant", you would need to drop some instrumentation into the actual functions.
For instance, profiling file_get_contents() together with the echo statement isn't actually accurate. Ideally, when you profile, you cut out as many outside forces as you can, then profile. What you have done is measure the time it takes to carry out the whole task; the more you abstract, the further off the accuracy of the results (so although you know the entire execution is more or less the same, you have no idea if it's actually file_get_contents or not). The general idea of profiling is to determine where a bottleneck is occurring; here you don't know whether it's file_get_contents being close to readfile, or the echoing of data to the screen. That is like weighing two people who are 500 lbs, but one of them has a backpack on which weighs 300.
Your technique is OK for getting a general idea as to which is *ultimately* faster, but you have no basis for saying the two function calls are insignificant in terms of speed. If it was like that, why would the "docs", by the very people who develop PHP, suggest using file_get_contents()???
That fact alone makes me wonder if your just arguing to spite me.
You even did the test.
file_get_contents = 0.000394
readfile = 0.000427
And your still crying "insignificant", when your own bloody tests prove I was right. Perhaps the difference is minimal, but are you going to use the knowingly SLOWER function just to show me how not smart you are???
So please, enlighten me...what is it your trying to prove???
I said from the get go, OP should use file_get_contents because the docs recommend it. You claim "insignificant" and then do the tests and prove me correct (as I never said the difference would be astonishing but when your a hack like me, every nano-second/clock cycle squeezed is a bonus). So I ask, as a professional developer are you now going to use the knowingly slower function?
By only highlighting the fact the difference is dismal it's like your trying to make the decision seem insignificant, which is just bad practice on behalf of a professional developer. Like not writing standards compliant code, writing purposely inefficient code is equally as bad.
Cheers

Posted: Mon Jul 02, 2007 12:37 am
by Benjamin
Hockey it's all good man, don't take things so personally.
How about you write a test case for it if you don't like that one?
Posted: Mon Jul 02, 2007 1:32 am
by alex.barylski
astions wrote:Hockey it's all good man, don't take things so personally.
How about you write a test case for it if you don't like that one?
Naw man, I'm not taking anything personally. Hakuna Matata

Posted: Mon Jul 02, 2007 2:52 am
by dbevfat
Hockey: the documentation doesn't suggest preferring file_get_contents() to readfile() in general, only in one case. Quote:
Note: If you just want to get the contents of a file into a string, use file_get_contents() as it has much better performance than the code above.
If you're trying to just pass the file data to the client with minimum overhead, readfile() could prove better, because it doesn't allocate memory for the return string. I'm not saying that it is, just that it could be under some circumstances, say if the server is short on RAM and the files are big. Jenk's test didn't measure that.
From that perspective, Jenk's point was that in serving a 1.5MB file, the difference was negligible, as both results were well under a millisecond. I agree with him on that; both are fast enough not to prefer one over the other, at least until you reach the theoretical limit of serving, and by then you'll probably have a much bigger bottleneck elsewhere. So, yes, it's quite insignificant.
Last but not least, do you mind taking a bit more care with your spelling? I've noticed you use a lot of "your" instead of "you're" and you're quite keen on question marks. The former makes your posts harder to read and the latter just shows your impatience.

Posted: Mon Jul 02, 2007 3:16 am
by stereofrog
Hockey wrote:
I see no mention or indication mmap is used.

Unless i'm looking at the wrong function or something...
If you study the code carefully, you may notice that PHP_FUNCTION(readfile) is just a wrapper and the real job is done by php_stream_passthru(). It might be interesting for you to look at it too.
Posted: Mon Jul 02, 2007 8:02 am
by Chris Corbyn
Hockey wrote:d11wtq wrote:I personally would use fread() iteratively so you only pull a small amount of data from the file into memory at any one time.
I'm not sure I see the benefit to that approach?
Are you trying to avoid taxing the systems memory limit by iteratively consuming memory instead of all at once?
But you won't take the entire file into memory if you output each chunk as you see it rather than collecting it into a string first:
Code: Select all
<?php
$fp = fopen("file.txt", "rb");
while (!feof($fp))
{
    // fread() returns "" (not false) at EOF, so test feof() instead
    echo fread($fp, 8192);
    flush();
}
fclose($fp);
Posted: Mon Jul 02, 2007 1:04 pm
by alex.barylski
stereofrog wrote:Hockey wrote:
I see no mention or indication mmap is used.

Unless i'm looking at the wrong function or something...
If you study the code carefully, you may notice that PHP_FUNCTION(readfile) is just a wrapper and the real job is done by php_stream_passthru(). It might be interesting for you to look at it too.
You are correct, after some more searching it does appear as though the streams API implements mmap if available, under *nix anyways.
So if the developers of PHP opted to use mmap if available at every file level, does that confirm my point? In that, memory mapping is typically a faster operation?
dbevfat wrote:Last but not least, do you mind taking a bit more care with your spelling? I've noticed you use a lot of "your" instead of "you're" and you're quite keen on question marks. The former makes your posts harder to read and the latter just shows your impatience.
I really don't have time to spell check on a forum where I am not likely to be hired by another member of the community, sorry.

If I seem impatient, it's because I AM.
d11wtq wrote:But you won't take the entire file into memory if you output each chunk as you see it rather than collecting it into a string first:
Ok, now I follow you.
Admittedly, I disagree with that approach *only* because I once read, in a comment in some old C source I was working on, that "reading the contents of the file right off the hop is more efficient because of minimized memory allocations, fragmentation, etc."
For that reason, I've always just read the entire file into memory ASAP unless there were significant reasons not to. The only time I've seen chunked reading like you have demonstrated was in socket implementations, like FTP servers, etc.