Ideas to preserve integrity of data in flatfile
Posted: Mon Nov 03, 2008 7:08 am
by hknight
I am forced to use flat files to store important data.
read.php gets data from the flat file
write.php places data in the flat file
I have run into a problem: if read.php is accessed while write.php is in the middle of writing to flatfile.dat, only part of the data is read and corrupted data is displayed.
I have found that flock() is very unreliable and it does NOT protect the integrity of my data. The documentation for flock() warns that it does not work the same on all systems.
I have a few ideas and want some constructive feedback.
- Use while() to read the data in a loop. Don't return data until two consecutive reads are identical. So if the data keeps changing, PHP will keep trying to read it until it gets the same data twice.
- In write.php, first rename flatfile.dat to flatfile.dat.tmp, then write to flatfile.dat.tmp, then rename flatfile.dat.tmp back to flatfile.dat. In read.php, if flatfile.dat does not exist and flatfile.dat.tmp does exist, wait 10 milliseconds and try again.
- In write.php, first rename the directory that flatfile.dat is in, then write to flatfile.dat in the temporary directory. After writing is done, change the directory back to its original name. In read.php, if the directory that flatfile.dat lives in does not exist, wait 10 milliseconds and try again.
A database is not an option for this project. I must find a reliable way to use flat files using PHP 4.
I would greatly appreciate all constructive comments, criticism and ideas.
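For reference, the first idea (re-read until two consecutive reads match) might look roughly like the sketch below. The function name and its parameters are made up for illustration, and file_get_contents() needs PHP 4.3 or later. Note the limitation: a writer that stalls in the middle of a write would produce two identical partial reads, so this lowers the odds of returning corrupted data rather than ruling it out.

```php
<?php
// Hypothetical helper: re-read $file until two consecutive reads are identical.
// Returns the contents, or false if no stable read is seen within $maxTries.
function read_until_stable($file, $maxTries = 10, $delayMicros = 10000)
{
    $previous = @file_get_contents($file);
    for ($i = 0; $i < $maxTries; $i++) {
        usleep($delayMicros);                 // give a writer time to make progress
        $current = @file_get_contents($file);
        if ($current !== false && $current === $previous) {
            return $current;                  // two matching reads in a row
        }
        $previous = $current;                 // data changed or unreadable: try again
    }
    return false;
}
```

A caller would treat a false return the same way as a missing file, for example by showing a maintenance message.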
Re: Ideas to preserve integrity of data in flatfile
Posted: Mon Nov 03, 2008 8:07 am
by josh
Store data incrementally and invoke a garbage collection routine when a threshold for the number of data files has been hit.
Re: Ideas to preserve integrity of data in flatfile
Posted: Mon Nov 03, 2008 8:29 am
by hknight
Thanks for your idea, jshpro2. Sadly I have no idea what you are talking about...
Re: Ideas to preserve integrity of data in flatfile
Posted: Mon Nov 03, 2008 9:04 am
by crazycoders
Something simpler than that would be to create a "cookie" file that says the file is currently locked. I used that for a cache-on-demand mechanism. Some pages were requested so often, and rendering took so long, that when a cached file reached its time to live (TTL) and had to be refreshed, multiple requests accessed the same file concurrently.
So to do that, I simply check the TTL of the file and create an empty "DONOTTOUCHxyz.ext" file. I access my file in read/write mode and, when done, remove the cookie. Your other scripts should check whether the cookie file is there and either return a message to the user or wait in a loop until it is gone.
My two cents? Great way to do it. It is a bit heavy on the disk, since you read and write a lot of small files, but it works and works great.
This technique was used on a server handling 600 requests per second with caching on demand, so believe me, there were plenty of requests to test this method.

Re: Ideas to preserve integrity of data in flatfile
Posted: Mon Nov 03, 2008 9:05 am
by josh
If the file is being written to that often, what I would do is write to file.txt.1, file.txt.2, etc. Have the script find the next available file name and try to open it for writing. Once it knows it has a pointer to that file, it can write to it while other processes concurrently create file.txt.4, etc.
Every so often you would need a script that goes through, cleans up the mess and resets the counter back to 0. This way no script waits on another script to finish writing: each process writes what it needs to, and some background process takes care of merging or analyzing the data (no idea what context you're using this in).
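A rough sketch of the "find the next available file" step, assuming fopen() mode 'x' (PHP 4.3.2 and later), which fails if the file already exists. The function name and the $maxFiles cap are invented for this example. Exclusive creation may not be dependable on some network file systems, so treat this as an illustration rather than a guarantee.

```php
<?php
// Hypothetical helper: store one record as its own numbered file
// ($base.1, $base.2, ...). Mode 'x' creates the file only if it does not
// exist yet, so two processes cannot both claim the same number.
function write_next_chunk($base, $data, $maxFiles = 10000)
{
    for ($n = 1; $n <= $maxFiles; $n++) {
        $fp = @fopen($base . '.' . $n, 'x');   // create-or-fail in one step
        if ($fp === false) {
            continue;                          // number already taken, try the next
        }
        $ok = (fwrite($fp, $data) !== false);
        fclose($fp);
        return $ok ? $n : false;               // chunk number on success
    }
    return false;                              // no free number below the cap
}
```

The cleanup script would still need a way to tell finished chunks from ones that are mid-write, for example by skipping very recent files.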
Re: Ideas to preserve integrity of data in flatfile
Posted: Mon Nov 03, 2008 9:14 am
by Eran
What I would do is write the content to a temporary file. Once finished, rename the temporary file to the original file name. This should make the file unavailable for a much shorter time, and the half-written content is never readable under the original name, so no gibberish will be shown on screen.
Re: Ideas to preserve integrity of data in flatfile
Posted: Mon Nov 03, 2008 10:07 am
by Selkirk
Here is some code for atomic move on unix:
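Code:
// Illustrative wrapper (the function name is not from the original post).
// Note: file_put_contents() and the recursive mkdir() flag require PHP 5.
function atomic_write($filename, $data) {
    $dir = dirname($filename);
    if ((!is_dir($dir) && !@mkdir($dir, 0777, TRUE)) ||
        !($tmpfile = @tempnam($dir, '.tmp')) ||
        @file_put_contents($tmpfile, $data) === FALSE ||  // writing 0 bytes is not a failure
        !@rename($tmpfile, $filename)) {                  // on Unix, rename() replaces the target in one step
        return FALSE;
    }
    return TRUE;
}
Alternatively, lock the file.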
Re: Ideas to preserve integrity of data in flatfile
Posted: Mon Nov 03, 2008 2:16 pm
by josh
He said he tried that, plus you're still going to run into concurrency problems when multiple processes try to get pointers to the tmp file
Re: Ideas to preserve integrity of data in flatfile
Posted: Mon Nov 03, 2008 2:18 pm
by hknight
Based on a combination of your ideas I came up with this. Is the code good or bad? How could it be improved?
Code:
<?php
ignore_user_abort(true); // without the argument this only reads the current setting
set_time_limit(0);
// A directory is used as the lock because mkdir() fails if it already exists.
function lockFile($filename) {
    // preg_replace() stands in for the deprecated ereg_replace()
    $lock = "lock/" . preg_replace("/[^A-Za-z0-9]/", "", $filename);
    while (!@mkdir($lock, 0600)) {
        usleep(200000); // lock is held by someone else; retry in 200 ms
    }
    return true;
}
function unlockFile($filename) {
    $lock = "lock/" . preg_replace("/[^A-Za-z0-9]/", "", $filename);
    // rmdir() takes no mode argument; a lock that is already gone counts as released
    return @rmdir($lock) || !is_dir($lock);
}
// Do this every time before you read a file
if (lockFile('file.txt') == true)
{
    /// read file
    unlockFile('file.txt'); // release the lock, otherwise later requests wait forever
}
// Do this every time before you write to a file
if (lockFile('file.txt') == true)
{
    usleep(400000); // Wait a while to give people currently reading the file a chance to finish
    /// write to file
    unlockFile('file.txt');
}
?>
Re: Ideas to preserve integrity of data in flatfile
Posted: Tue Nov 04, 2008 8:43 am
by hknight
Thank you. I have another idea that I think might solve all issues.
When reading or writing, it verifies that the data is complete; if it is not, it waits a few milliseconds and tries again.
I think that this solves both the atomicity and the race condition issues.
Is my idea good or bad?
Code:
<?php
### To Write ###
if (file_put_verified_contents('data/file1.txt', 'Hello World') == false) {
    echo "<h1>There was a problem saving this data.</h1>";
}
### To Read ###
$page = file_get_verified_contents('data/file1.txt');
if ($page === false) echo "<h1>Website Down for Maintenance</h1>";
else echo '<p>Data: ' . $page . '</p>'; // only print the data if the read succeeded
function file_get_verified_contents ($file, $i=0) {
    ignore_user_abort(true);
    set_time_limit(0);
    $data = @file_get_contents ($file);
    // A partially written file is missing at least one of the two markers
    if (substr($data, 0,7) != '##BOF##' || substr($data, -7) != '##EOF##')
    {
        usleep (200000);
        if ($i++<10) return file_get_verified_contents ($file, $i);
        else return false;
    }
    return substr($data, 8,-8); // strip each marker together with its adjacent newline
}
function file_put_verified_contents ($file, $data, $i=0) {
    ignore_user_abort(true);
    set_time_limit(0);
    $fp = @fopen($file, 'w');
    if ($fp === false) return false; // the file cannot be opened for writing at all
    fwrite($fp, "##BOF##\n");
    fwrite($fp, $data);
    fwrite($fp, "\n##EOF##");
    fclose($fp);
    // Read the file back and check that both markers made it to disk
    $newData = @file_get_contents ($file);
    if (substr($newData, 0,7) != '##BOF##' || substr($newData, -7) != '##EOF##')
    {
        usleep (20000);
        if ($i++<10) return file_put_verified_contents ($file, $data, $i);
        else return false;
    }
    else return true;
}
?>
Re: Ideas to preserve integrity of data in flatfile
Posted: Tue Nov 04, 2008 10:10 am
by josh
The problem you're probably going to have is that if 2 concurrent processes check for a file lock at the same time, both think they hold the exclusive lock and both go on to write. The lock check and the file open have to happen as one atomic step if you're going to try that approach. My suggestion overcame this and did not require the processes to block. Also, 2 seconds is a bit long for a blocking time. Perhaps you could give us more context on the problem and we could come up with a better solution.
Re: Ideas to preserve integrity of data in flatfile
Posted: Tue Nov 04, 2008 11:35 am
by hknight
What about this?
Code:
<?php
### To Write ###
if (file_put_verified_contents('data/file1.txt', 'Hello World') == false) {
    echo "<h1>There was a problem saving this data.</h1>";
}
### To Read ###
$page = file_get_verified_contents('data/file1.txt');
if ($page === false) echo "<h1>Website Down for Maintenance</h1>";
else echo '<p>Data: ' . $page . '</p>'; // only print the data if the read succeeded
function lockFile($file) {
    // preg_replace() stands in for the deprecated ereg_replace()
    $lock = "lock/" . preg_replace("/[^A-Za-z0-9]/", "", $file);
    while (true) {
        clearstatcache();
        // The age check must look at the lock directory, not at the data file.
        // A lock older than 10 seconds is assumed to be left over from a dead script.
        if (is_dir($lock) && time() - @filectime($lock) > 10) @rmdir($lock);
        if (@mkdir($lock, 0600)) return true;
        usleep (200000); // lock is held by someone else; retry in 200 ms
    }
}
function unlockFile($file) {
    $lock = "lock/" . preg_replace("/[^A-Za-z0-9]/", "", $file);
    // rmdir() takes no mode argument; a lock that is already gone counts as released
    return @rmdir($lock) || !is_dir($lock);
}
function file_get_verified_contents ($file, $i=0) {
    ignore_user_abort(true);
    set_time_limit(0);
    lockFile($file);
    $data = @file_get_contents ($file);
    unlockFile($file); // release before any retry, or the retry would wait on our own lock
    if (substr($data, 0,7) != '##BOF##' || substr($data, -7) != '##EOF##')
    {
        usleep (200000);
        if ($i++<10) return file_get_verified_contents ($file, $i);
        else return false;
    }
    return substr($data, 8,-8); // strip each marker together with its adjacent newline
}
function file_put_verified_contents ($file, $data, $i=0) {
    ignore_user_abort(true);
    set_time_limit(0);
    lockFile($file);
    $fp = @fopen($file, 'w');
    if ($fp === false) { unlockFile($file); return false; } // cannot open the file at all
    fwrite($fp, "##BOF##\n");
    fwrite($fp, $data);
    fwrite($fp, "\n##EOF##");
    fclose($fp);
    unlockFile($file);
    // Read the file back and check that both markers made it to disk
    $newData = @file_get_contents ($file);
    if (substr($newData, 0,7) != '##BOF##' || substr($newData, -7) != '##EOF##')
    {
        usleep (20000);
        if ($i++<10) return file_put_verified_contents ($file, $data, $i);
        else return false;
    }
    else
        return true;
}
?>
jshpro2, you asked for more context on the problem.
I have been tasked with developing a PHP content management system that does not require a database and will work on PHP 4, 5 and 6. The objective is for it to work with hosting providers that do not support other content management systems.
From http://www.php.net/flock:
- flock() will not work on NFS and many other networked file systems. Check your operating system documentation for more details.
- On some operating systems flock() is implemented at the process level. When using a multi-threaded server API like ISAPI you may not be able to rely on flock() to protect files against other PHP scripts running in parallel threads of the same server instance!
- flock() is not supported on antiquated filesystems like FAT and its derivates and will therefore always return FALSE under this environments (this is especially true for Windows 98 users).
So flock() does not seem to be a good option.
Re: Ideas to preserve integrity of data in flatfile
Posted: Tue Nov 04, 2008 11:38 am
by Selkirk
jshpro2 wrote:
He said he tried that, plus you're still going to run into concurrency problems when multiple processes try to get pointers to the tmp file
Yeah, I didn't read very well, did I? That's what I get for quick skimming with my morning coffee. Sorry.
However, tempnam creates a different name for each temp file, so no concurrency problems (I think).
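A quick way to check that claim, as a minimal sketch: call tempnam() twice with the same directory and prefix and compare the results. sys_get_temp_dir() (PHP 5.2.1 and later) is used here only to have a writable directory for the demonstration; any writable directory would do.

```php
<?php
// Two calls with the same directory and prefix: each call creates a new
// empty file and returns its path, so concurrent writers get separate files.
$dir = sys_get_temp_dir();
$a = tempnam($dir, '.tmp');
$b = tempnam($dir, '.tmp');

var_dump($a !== $b);                              // do the two paths differ?
var_dump(file_exists($a) && file_exists($b));     // were both files created?

// Clean up the demonstration files
unlink($a);
unlink($b);
```

Because every writer gets its own temporary file, the only shared step left is the final rename() onto the real file name.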
Re: Ideas to preserve integrity of data in flatfile
Posted: Tue Nov 04, 2008 12:36 pm
by josh
hknight wrote: What about this?
Still not atomic. Although this statistically mitigates the problem, it's still possible for 2 processes to both find the lock free, since your processes are running concurrently on the server.
So the computer might see the following commands.
An example of your solution working:
Code:
Process A checks file_exists( $lockFile ) // false
Process A creates( $lockFile )
Process B checks file_exists( $lockFile ) // true, process B blocks for a few seconds ( which may I mention infuriates your users )
# Process A does some stuff, and closes file
Process B checks file_exists( $lockFile ) // false
# Process B does some stuff, and closes file
An example where your solution violates the atomicity constraint:
Code:
Process A checks file_exists( $lockFile ) // false
Process B checks file_exists( $lockFile ) // false
# remember, even though creation always runs the file_exists() in sequence before it creates the lock file, CPUs of today do more than 1 thing at once..
Process A creates( $lockFile )
Process B creates( $lockFile ) // concurrency violation
Re: Ideas to preserve integrity of data in flatfile
Posted: Tue Nov 04, 2008 1:39 pm
by crazycoders
Then why not simply catch the error when creating the lock file and branch on that?
I'm sure a failed fopen() can be detected. If the error occurs, loop and wait until it succeeds, or simply return to the user with a graceful error.
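One way this could look, as a sketch only: create the lock file with fopen() mode 'x' (PHP 4.3.2 and later), which fails if the file already exists, so the existence check and the creation happen in a single call. The function names, retry count and delay are invented for the example. Two caveats apply: exclusive creation may not be dependable on some network file systems, and a script that dies while holding the lock leaves the lock file behind.

```php
<?php
// Hypothetical helpers: the lock is a marker file created with mode 'x',
// so "check that it is absent" and "create it" are one operation.
function acquire_lock($lockFile, $maxTries = 10, $delayMicros = 200000)
{
    for ($i = 0; $i < $maxTries; $i++) {
        $fp = @fopen($lockFile, 'x');   // fails if the marker already exists
        if ($fp !== false) {
            fclose($fp);
            return true;                // we created it, so we hold the lock
        }
        usleep($delayMicros);           // someone else holds it: wait and retry
    }
    return false;                       // give up; caller can show a graceful error
}

function release_lock($lockFile)
{
    return @unlink($lockFile);          // remove the marker so others can proceed
}
```

A caller would wrap its read or write between acquire_lock() and release_lock(), and show an error message if acquire_lock() returns false.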