Hey all. I am racking my brain trying to figure out the best way to handle a certain process my company needs me to do. First I will give a general description of the issue, then I will get into specifics.
Basically I have a file that contains a bunch of data. The data stored in the file is consistant, each entry has X number of elements with a total character size of Y. Element 1 in each entry will always take up the same amount of characters.
So if I have 2 elements in each data entry, each 5 characters long, then each data entry is 10 characters. Whcih means to access the 2nd element of the 3rd entry I would grab characters 15-20.
I would want to be able to search this data for a number of things. Basically id want to be able to say, "give me all results where element 1 = 'pancakes'. Or where element 2 >8. Or more substaintial searches such as element 1>10 AND element 2<5.
Is there a good way to do this?
My first line of thought would be to run through the file, grabbing every element that can be searched on. So lets say if elements 2 and 4 can be searched on, and I have 10 entries, I would run through and grab an array holding the values for entry 2 and 4.
array(
array('index=>0', 'element2'=>10, 'element4'=>12),
array('index=>1', 'element2'=>11, 'element4'=>-2),
array('index=>2', 'element2'=>0, 'element4'=>7)
)
Then, depending on the search, I would reorder the array and search through it. So if I wanted to see if element2 was greater then 8, id reorder by element2 and then search. As if this wasnt difficult enough, the issue would be on complex searches such as 'element1>10 OR (element2=6 AND element3=0)'
I suppose i could store the result array of each individual search result and then merge/union/intersect based on the AND/OR.
To make things ever CRAZIER, the flat files can have other flat files as children. So if I have element1 in file 1 and element1 in file 2, all of file2 data would be part of the corrisponding data in file1.
So, id basically need to be able to "join" files. I have thought of a number of solutions, but im not sure how they will pan out.
Ugh.
In case your wondering WHY I need to do this. My company runs a lot of data off an old database called filepro which uses flat files with index and map files. Right now we have a few systems which weve "synched" with mysql which runs batch updates between the two databases to keep data consistant.
We wont be coming "off" of filepro for a long, long time. But a lot of my php/mysql apps need access to the data on filepro. So, Im trying to find a way to quickly be able to search and pull data from filepro into php. Basically, i want to run search queries on the flat files.
Basically id like your input on a number of things:
Is this even a reasonable attempt?
Is the idea of pulling the searched elements into php to actually run the search, then collecting the correct rows and getting the data directly from them a smart move?
Once I have the searchable elements and index, what algorithms are good for actually running the search?
PHP search on flat files...
Moderator: General Moderators
- AbraCadaver
- DevNet Master
- Posts: 2572
- Joined: Mon Feb 24, 2003 10:12 am
- Location: The Republic of Texas
- Contact:
Re: PHP search on flat files...
Please give an example of the data (a couple of lines) and the naming convention etc. that identifies what record is linked to what other file and what the structure of that file would be.
mysql_function(): WARNING: This extension is deprecated as of PHP 5.5.0, and will be removed in the future. Instead, the MySQLi or PDO_MySQLextension should be used. See also MySQL: choosing an API guide and related FAQ for more information.
-
ajlisowski
- Forum Newbie
- Posts: 22
- Joined: Wed Dec 16, 2009 8:22 pm
Re: PHP search on flat files...
For each file there is a map file and a key file.
Map File Format
The file is 64 bytes long and contains the following information:
The first line of the map is as follows:
map:xxx:yyy
XXXXXXXX:PPPPPPPPPPPPPPPP
Where:
map is the literal "map"
xxx is the record length of the key file (excluding the 20-byte header)
yyy is the record length of the data file
zzz is the number of fields in the file
XXXXXXXX is the 8 digit hexadecimal checksum of the encoded password
PPPPPPPPPPPPPPP is the encoded password
Each additional line is:
Field name:xxx:type:
Where:
Field name is the name of the field
xxx is the length of the field (3 digits, right justified)
type is the edit type
Key
The key is just a text file with each record written with a 20 byte header. The map file defines the record length and column lengths.
There are index files which index certain columns within the record, but to be honest documentation on these files is rough and looking at them seems alien.
However in the files there is no official relationship set between files. Lets say I have a product file and a price file. With a one to many relationship based on sky and region.
Product 1 has a sku of 1234567.
It has two price files, one with sku 1234567 and region 1 and one with sku 1234567 and region 2. Nowhere in any file does it determine a relationship between product file and price file, it is up to the developer within filepro to make these relationships.
So, I was thinking I would write a class which could search these files, which would require the user to define which tables to be searched, what the keys are on that table, which columns to filter by and how the tables are related.
so id do
$search->add_table('product');
$search->add_key('product', 'sku');
$search->add_table('price');
$search->add_key('price', 'id');
$search->add_filter('price', 'region', '=', 'Chicago');
$search->add_filter('price', 'price', '>', 20);
$search->add_relationship('product', 'price', 'sku', 'sku);
$results=$search->run();
Basically the class would then grab all the record ids and skus from products. Grab all the record ids, skus, regions and price from price. Then search through these data sets and grabbing the record IDs and skus that match. Then combine data sets based on skus. Then go back into the files and grab the data i need from each file based on the record ids that I have.
Map File Format
The file is 64 bytes long and contains the following information:
The first line of the map is as follows:
map:xxx:yyy
Where:
map is the literal "map"
xxx is the record length of the key file (excluding the 20-byte header)
yyy is the record length of the data file
zzz is the number of fields in the file
XXXXXXXX is the 8 digit hexadecimal checksum of the encoded password
PPPPPPPPPPPPPPP is the encoded password
Each additional line is:
Field name:xxx:type:
Where:
Field name is the name of the field
xxx is the length of the field (3 digits, right justified)
type is the edit type
Key
The key is just a text file with each record written with a 20 byte header. The map file defines the record length and column lengths.
There are index files which index certain columns within the record, but to be honest documentation on these files is rough and looking at them seems alien.
However in the files there is no official relationship set between files. Lets say I have a product file and a price file. With a one to many relationship based on sky and region.
Product 1 has a sku of 1234567.
It has two price files, one with sku 1234567 and region 1 and one with sku 1234567 and region 2. Nowhere in any file does it determine a relationship between product file and price file, it is up to the developer within filepro to make these relationships.
So, I was thinking I would write a class which could search these files, which would require the user to define which tables to be searched, what the keys are on that table, which columns to filter by and how the tables are related.
so id do
$search->add_table('product');
$search->add_key('product', 'sku');
$search->add_table('price');
$search->add_key('price', 'id');
$search->add_filter('price', 'region', '=', 'Chicago');
$search->add_filter('price', 'price', '>', 20);
$search->add_relationship('product', 'price', 'sku', 'sku);
$results=$search->run();
Basically the class would then grab all the record ids and skus from products. Grab all the record ids, skus, regions and price from price. Then search through these data sets and grabbing the record IDs and skus that match. Then combine data sets based on skus. Then go back into the files and grab the data i need from each file based on the record ids that I have.