Page 1 of 1

Help understanding a tree data type...

Posted: Fri Oct 29, 2010 10:13 am
by ajlisowski
Hey all. Long story short I am working on a project to build a bridge between php and an older flat file based database. The old database has the following:

A map file
a key file
X number of index files.


Map and key are very basic. Map file gives basic information about the database, key file contains the actual data with each entry being N bytes. Ive written a series of classes to read the map and then be able to grab data from the key file.

The index files are created by the old database as a way to quickly retrieve the record of indexed keys. What I need help with is determining what sort of tree the data is stored as given the limited documentation on the site.

Here is what the site says about the index files:
Automatic Index Header

The first 140 bytes of the index are as follows:
Offset Length Contents
0 2 "0xC139" - filePro 4.5 index magic number
2 2 Depth of index tree
4 2 "1" if root node is in block zero, else "0"
6 2 Maximum number of keys per node
8 4 Size of each node, in bytes
12 4 Block number of root node
16 4 Number of records in index
20 2 Length of sort key, in bytes
22 2 Length of record number portion of key in bytes
24 64 Sort information -- see below
88 4 Block number of freechain head
92 48 Index comment (NUL-terminated ASCII)

Sort Information

The sort information is the following 8 bytes repeated 8 times - once for each possible sort key.
Offset Length Contents
0 2 The field number
2 1 Associated field instance
3 1 "0" - Used for output formats only
4 2 Field length
6 1 "0" for ascending, "1" for descending
7 1 Field type

Non-leaf node

Each non-leaf node has the following format:
Offset Length Meaning
0 2 The number of node pointers within this node
2 4 Left node pointer
6 n*m n entries consisting of: Lowest key of node (n bytes), Node pointer (4 bytes). The node count does not include the left node pointer.

Leaf node

Each leaf node has the following format:
Offset Length Meaning
0 4 Backward node pointer link
4 4 Forward node pointer link
8 2 Extended block flag:
" 0" - A normal block
" 1" - This block contains a list of duplicate keys, and continues into the next block
" 2" - This block is the end of a chain of duplicate-key blocks.
10 2 Number of key values in this node
12 2*n Offset within block of start of each key value

Each key value is stored as:

Offset Length Meaning
0 n The key value
n m*o A list of the records for this key

Freechain

Each node on the freechain is stored in a singly-linked list as:
Offset Length Meaning
0 4 Forward link pointer
Reading this, im trying to figure out what data type to store the tree as and how to convert it (or if its already stored as one based on the node descriptions.)

First off, I dont understand what a freechain is.

Then I am unsure of how to interpret the nodes.

What does the left pointer point to on the non-leaf-nodes? Is it pointing to the parent? And all other pointers are pointing to children nodes? Id assume that a non-leaf-node would only contain at most, one other non-leaf-node and any amount of leaf nodes?

Leaf nodes contain the data. I get that. Backwards pointer points to the parent node. Forwards pointer possibly points to other leaf-nodes with a duplicate key?

Re: Help understanding a tree data type...

Posted: Fri Oct 29, 2010 12:22 pm
by Weirdan
There's a PHP extension for read-only access to filepro databases (albeit unmaintained): http://www.php.net/manual/en/book.filepro.php .
Have you considered it?

Re: Help understanding a tree data type...

Posted: Mon Nov 01, 2010 4:25 pm
by ajlisowski
I have. Two problems with it. One the extension isnt compatibale with our linux version. Two, the read-access is record specific. You need to KNOW the record to get it.

Ive written a PHP class which pretty much mimics that extension, allowing me to define a record and key and get the data back. However, the issue is, I need to be able to know which record Id want to grab. Thus, the index files need to be read. That extension does not seem to have any way to do so. Thus, I am writing a way to.

So far Ive determined the type is a b-tree. But I have some issues parsing it. I am unsure of where the index node would be. I assume I would take the node_size and multiply it by the index node block number and add the 140 header to get the starting byte for the index node. But I am unsure.

I am also unsure of what the node pointers will give me. Will it be the block number of the node, or would it be the starting byte of the node?

Im still trying to crack it though. So...good for me :) If anyone can help who has experience with b-trees I would be very excited.