Page 3 of 3

Posted: Mon Jul 02, 2007 2:42 pm
by Chris Corbyn
Hockey wrote:For that reason, I've always just read the entire file into memory ASAP unless there were significant reasons not to. The only time I've seen loading like you have demonstrated was in socket implementations, like FTP servers, etc.
I'm not sure memory fragmentation has any bearing on PHP. In any case, streaming data like that is standard practice in all languages I've used that support IO functions. It's just filestreaming.

I can relate to this with Swift Mailer. It would not be possible to send large attachments without filestreaming. Not only do I stream data through a socket when I come to send it, but I also stream to disk when I encode it.

How would you tackle the problem of base 64 encoding a 100MB file in PHP? Like this:

Code: Select all

$inStream = fopen("to-encode.file", "rb");
$outStream = fopen("encoded.file", "wb");
while (!feof($inStream)) //fread() returns "" (not false) at EOF, so test feof() instead
{
  $bytes = fread($inStream, 8190); //8190 is divisible by 3, so each chunk encodes cleanly
  fwrite($outStream, base64_encode($bytes));
}
fclose($inStream);
fclose($outStream);
You certainly would not want to read the entire file then base 64 encode it purely to avoid "memory fragmentation" ;)
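For what it's worth, PHP's stream layer can do the chunking for you. Here is a minimal sketch of the same idea using the built-in convert.base64-encode stream filter (the filenames are just placeholders, and the sample input is written first so the sketch is self-contained):

```php
<?php
// Sample input so the sketch runs anywhere; in practice this would be
// the large attachment already sitting on disk.
file_put_contents("to-encode.file", str_repeat("Swift Mailer\n", 1000));

$inStream  = fopen("to-encode.file", "rb");
$outStream = fopen("encoded.file", "wb");

// Every byte written to $outStream is base64-encoded on the fly,
// chunk by chunk, so memory use stays constant regardless of file size.
stream_filter_append($outStream, "convert.base64-encode", STREAM_FILTER_WRITE);

// Copies between the streams in internal chunks; no manual loop needed.
stream_copy_to_stream($inStream, $outStream);

fclose($inStream);
fclose($outStream);
```

The filter takes care of the divisible-by-3 buffering internally, which is exactly the detail the hand-rolled loop has to get right with its 8190-byte chunk size.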

Posted: Mon Jul 02, 2007 5:57 pm
by alex.barylski
d11wtq wrote:How would you tackle the problem of base 64 encoding a 100MB file in PHP? [...] You certainly would not want to read the entire file then base 64 encode it purely to avoid "memory fragmentation" ;)
I agree. But this is an example of an exception to the rule. After further investigation, it seems as though all file functions in PHP utilize the stream API, which uses mmap anyway. So I am not sure where this discussion is going anymore. :P

Any time there is latency between a client and server when sending/receiving data, I guess it makes little sense to allocate a buffer to hold the entire contents of a file (something I didn't take into consideration when replying to your post). Your example of sending massive files is entirely justified, to my knowledge, in the context of streaming data over sockets. Admittedly I don't know enough about how and where PHP sits relative to other systems to discuss this, but...

I think the fact that file_get_contents() reads the entire content into a buffer is irrelevant when comparing it to readfile(). What I mean by that is, I think where file_get_contents() is better is the fact that:

a. Memory doesn't get fragmented when you allocate one large block at once; this is a common technique in traditional application development, and I've seen it a million times.

b. Your example of reading blocks of data iteratively likely won't fragment memory either, as it doesn't buffer the data but instead sends it straight to standard output. Streaming that data to STDOUT makes sense in the context of PHP because the client/server connection can only accept so many bytes at a time. However, echo file_get_contents() does the exact same thing at the lowest level, I would think. Just because you read a file into a buffer all at once doesn't mean it's sent to the screen all at once. I am sure echo streams that output in the same way as readfile() does...

My interest is piqued...but I really can't spend any more time on this...haha :)

My basic thoughts are this:

1) Using a loop in PHP is going to be slower than relying on the C implementation of file_get_contents().
2) Regardless of whether readfile() allocates a buffer for each read, calling a function iteratively is going to be slower than calling a single function once and allocating a large buffer to accommodate its data.
3) The main advantage of readfile() and a loop (as your demo shows) is that you do not swallow a big chunk of memory, but you pay for it in clock cycles and, indirectly, RAM. Whereas if you allocate a large buffer right off the hop and echo to STDOUT, PHP should stream those results just as well as readfile(), but without the PHP iteration code using more clock cycles.

In saying that, it appears as though (as usual) it really depends on your situation. Allocating large amounts of memory (100MB, say) could potentially bring a system to a crawl. However, if the files being served are small (less than 10MB; ideally a couple of megs), I think it makes more sense to allocate a single large buffer and echo it to the screen rather than reading 4KB chunks and sending them to STDOUT iteratively.
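That "depends on your situation" conclusion could be sketched as a simple size check. The serveFile() name and the 10MB threshold here are purely illustrative, not anything from PHP itself:

```php
<?php
// Illustrative dispatch: small files in one gulp, big files in chunks.
// The threshold is arbitrary; tune it to your server's available memory.
function serveFile($path, $threshold = 10 * 1024 * 1024)
{
    if (filesize($path) <= $threshold) {
        // One allocation, one C-level call; fastest for small files.
        echo file_get_contents($path);
    } else {
        // Constant memory; extra clock cycles per 8KB chunk.
        $fp = fopen($path, "rb");
        while (!feof($fp)) {
            echo fread($fp, 8192);
        }
        fclose($fp);
    }
}
```

Either branch produces identical output; the check only decides whether you spend memory or iterations to get it.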

I appreciate the non-confrontational conversation ;)

Cheers :)