Page 1 of 2

PHP CSV Utilities

Posted: Fri Mar 14, 2008 10:10 pm
by Luke
Tar and zip files are available via attachment (see below)

I am preparing to release version 0.2 of my csv library which I have tentatively named PHP CSV Utilities. I would like to come up with a less boring name, but haven't been able to think of anything. One of the main reasons I wrote the library is to learn the ins and outs of the release cycle. I've never really released a software package before so I picked a relatively easy problem to solve and solved it :)

So, for those of you who are interested, I'd like for you to do any / all of the following:

1) Let me know what version of PHP you are using and whether or not you had any issues. You can verify this by running its unit test in the tests folder. I have only tested it on version 5.2.4 and that is pretty bleeding edge. I know Csv_Reader_String will not work on versions less than 5.1 because it makes use of the php://temp input stream, but that's as much as I know. I don't plan on writing a PHP4 compatible version. PHP4 is for the birds! :P

2) Are there any interface issues you have with it? What do you like about it? What do you dislike about it? Is there anything that is confusing or just plain nonsensical?

3) What would you like to see in the next version?

4) Are there any Csv_Dialect classes you can think of that would be useful (so far I have one for excel, and there are plans for google docs and open office)

5) Are there any implementation improvements I can make? I happen to know that several of the methods can me optimized because I sort of rushed through them. I will eventually go back and optimize, but help from you would make that process that much easier.

6) Test that all of the components (especially Csv_Sniffer) return the right results MOST of the time (since csv files are notoriously malformed, I can't expect 100% accuracy)

What does this library do?

Provides an object-oriented (php5) interface to read and write basically any delimited data. Although I call it a csv library, it is very flexible as far as format goes. To read a csv file, there are several methods you can use. It implements the SPL interface Iterator as well as Countable

Code: Select all

try {
    $reader = new Csv_Reader("./data/orders.csv");
    foreach ($reader as $row) {
        list($orderid, $orderdate, $ordertotal, $customername, $etc) = $row;
        // do something with data
    }
} catch (Csv_Exception_FileNotFound $e) {
    printf("<p class=\"error\">%s</p>", $e->getMessage());
}
 
OR (skipped the exceptions stuff here for brevity)

Code: Select all

$reader = new Csv_Reader("./data/orders.csv");
$reader->getRow(); // gets header (first row)
while ($row = $reader->getRow()) {
    // do something with row
}
OR

Code: Select all

$reader = new Csv_Reader("./data/orders.csv");
while ($row = $reader->current()) {
    // do something with row
    $reader->next();
}
Writing to a csv file is really easy (skipped exception stuff for brevity)

Code: Select all

$writer = new Csv_Writer("./data/orders.csv");
foreach ($data as $row) {
    if (count($row) == 12) {
        $writer->writeRow($row);
    }
}
// if you have data in an array and you simply want to write it all to a csv file, use writeRows() instead
$writer->writeRows($data);
$writer->close(); // writes the file and closes the resource (also gets called in __destruct()
 
If you have a csv file already and you want to append it, simply provide a file handle in append mode

Code: Select all

$writer = new Csv_Writer(fopen("./data/orders.csv", "a"));
// now write to the file
Csv_Reader and Csv_Writer both assume the following:
delimiter - comma
quoting character - double quote
escape character - backslash
line terminator - carriage return + newline
and they assume that only columns with either the quote char or the delim character need to be quoted

If, however you need to use different parameters, it is as easy as providing a Csv_Dialect as the second option. Basically a Csv_Dialect (name borrowed from python's csv module) tells reader & writer the format of the csv file. As of now, I have only written two dialects. Standard (the default) and excel. It is very easy to change any of a dialect's parameters.

Code: Select all

$dialect = new Csv_Dialect_Excel(array("delimiter" => "\t")); // uses excel format, but instead of a comma it uses a tab
$reader = new Csv_Reader('customers.dat', $dialect);
// now reader will read a tab-delimited excel file without issue :)
 
$writer = new Csv_Writer('orders.dat', new Csv_Dialect(array("lineterminator" => "\n", "quoting" => Csv_Dialect::QUOTE_NONE)));
// now writer will use a newline as its line terminator and it won't quote any columns
This library makes use of exceptions where appropriate, so be aware of that.

Let's see... what else? Umm... oh yea! I almost forgot the coolest thing! Csv_Sniffer. Csv_Sniffer is basically a port of python's csv.sniffer class. I haven't thought out the interface particularly well because I was mostly concerned with getting it to work the way it's supposed to, so any interface advice you can give on it would be awesome. Csv_Sniffer::sniff() accepts a sample of the csv file (needs at least ten rows, and doesn't read more than 20 rows even if you provide them) and it returns a Csv_Dialect object. Csv_Sniffer::hasHeader() accepts a sample and returns true if the sample likely has a header and false otherwise.

Code: Select all

$file = 'data/products.csv';
$rows = file($file);
$sample = implode("", array_slice($rows, 0, 20 // grab 20 lines of a csv file as a string
try {
    $dialect = $sniffer->sniff($sample)
    $reader = new Csv_Reader($file, $dialect);
} catch (Csv_Exception_CannotDetermineDialect $e) {
    echo "<p class=\"error\">Sorry, unable to determine csv format</p>";
}
 
if ($sniffer->hasHeader($sample)) {
    echo "<p>File probably has a header</p>";
} else {
    echo "<p>File probably does not have a header</p>";
}
 
Tar and zip files are available via attachment (see below)

Check our blog for more info and updates. The project is hosted here

Re: PHP CSV Utilities

Posted: Sat Mar 15, 2008 3:13 pm
by Christopher
Looks good. I haven't had any problems running it yet. I do have a couple of comments:

- I would like to have exceptions be optional, so you could work with return codes.

- I still don't see an option to deal with reading or writing files with the column names in the first row. That is an essential feature for me.

- I don't like the writeRow(), writeRows(), writeData() interface. Is that how the Python library works? I would rather either only have writeRow() and writeRows(), and have them write immediately. Or have addRow()/addRows() to build the buffer and write()/append() to commit to disk.

- I haven't checked, but often you read a file, modify the data in the array, then write it back. So there should be some symmetry to the interface.

Re: PHP CSV Utilities

Posted: Sat Mar 15, 2008 3:31 pm
by Luke
Good point. I'm going to have to agree with you. I'll change that in the next version. Although this library does follow a lot of the same interface as the python module, there are several places where I have strayed from it. Mainly the Csv_Sniffer class is the one I followed python's module the most.

EDIT: Oh yea, what version of php did you run it on? How can I test it on more versions?

Re: PHP CSV Utilities

Posted: Sat Mar 15, 2008 6:16 pm
by Christopher
I ran it on 5.2.5, but did not throughly test.

Re: PHP CSV Utilities

Posted: Sat Mar 15, 2008 11:28 pm
by Luke
well all you need to do is run the unit tests. I have officially released version 0.2. It can be downloaded here:

http://code.google.com/p/php-csv-utils/downloads/list

I have begun documentation here:

http://code.google.com/p/php-csv-utils/ ... umentation

It's not the greatest documentation in the world, but hey it's the first time I've ever written docs before.

Re: PHP CSV Utilities

Posted: Sun Mar 16, 2008 12:31 am
by Luke
I still don't see an option to deal with reading or writing files with the column names in the first row. That is an essential feature for me.
What do you mean by this?
I haven't checked, but often you read a file, modify the data in the array, then write it back. So there should be some symmetry to the interface
Well you can do this... is this what you mean?

Code: Select all

 
try {
    $filename = './data/tab-200.csv';
    $sample = implode("", array_slice(file($filename), 0, 20)); // grab 20 lines in a string
    $sniffer = new Csv_Sniffer();
    $dialect = $sniffer->sniff($sample); // try to determine dialect (if it can't it will throw an exception)
 
    $reader = new Csv_Reader($filename, new Csv_Dialect(array('delimiter' => "\t"))); 
    if ($sniffer->hasHeader($sample)) {
        $header = $reader->getRow(); // if sniffer detects a header, grab the first row and advance the internal pointer forward
    }
 
    $dialect->quotechar = "'";
    $dialect->lineterminator = "\n";
    $dialect->quoting = Csv_Dialect::QUOTE_NONNUMERIC;
    $dialect->delimiter = "|";
 
    // if you didn't need to change the values in the data, but only the dialect params, you could simply pass the reader to writer::writeRows()
    // $writer->writeRows($reader); // writeRows() accepts a reader or an array of data
 
    $writer = new Csv_Writer($filename, $dialect); // rewrite the file with new line-endings and quote character and  stuff
    unset($header[8], $header[9]); // unset the 8th and 9th columns to stay consistant with changes below
    $writer->writeRow($header);
 
    while ($row = $reader->getRow()) {
        $row[3] = strtoupper($row[3]); // change column three to uppercase
        $row[6] = (bool) $row[6]; // 
        $row[7] = $row[8] . " " . $row[9];
        unset($row[8], $row[9]);
        $writer->writeRow($row);
    }
 
    $writer->close(); // write the data to disk
 
} catch (Csv_Exception $e) {
    // will catch Csv_Exception_FileNotFound or Csv_Exception_CannotDetermineDialect and Csv_Exception_CannotAccessFile
    // which will be called by Csv_Reader, Csv_Sniffer and Csv_Writer respectively
    printf("<p>%s</p>", $e->getMessage());
}
 
It could use some improvement to the interface... Here is an interface I am considering for the next version.

Code: Select all

 
try {
    $filename = './data/orders.csv';
    $sniffer = new Csv_Sniffer($filename); // Csv_Sniffer will accept a filename in its constructor instead of a string of data
    // $sniffer = new Csv_Sniffer_String(); // this will do what Csv_Sniffer used to do, accept a string of csv data
    $dialect = $sniffer->sniff();
 
    $reader = new Csv_Reader($filename, $dialect);
    if ($sniffer->hasHeader()) {
        $header  = $reader->getHeader();
    }
 
    $dialect->quotechar = "'";
    $dialect->lineterminator = "\n";
    $dialect->quoting = Csv_Dialect::QUOTE_NONNUMERIC;
    $dialect->delimiter = "|";
 
    $writer = new Csv_Writer($filename, $dialect); // rewrite the file with new line-endings and quote character 
    $writer->writeRow($header);
    while ($row = $reader->getRow($header)) {
        $row['sku'] = strtoupper($row['sku']); // change column three to uppercase
        $row['is_active'] = (bool) $row['is_active']; // 
        $row['customer_name'] = $row['first_name'] . " " . $row['last_name'];
        unset($row['first_name'], $row['last_name']);
        $writer->writeRow($row);
    }
} catch (Csv_Exception $e) {
    // will catch Csv_Exception_FileNotFound or Csv_Exception_CannotDetermineDialect and Csv_Exception_CannotAccessFile
    // which will be called by Csv_Reader, Csv_Sniffer and Csv_Writer respectively
    printf("<p>%s</p>", $e->getMessage());
}
 

Re: PHP CSV Utilities

Posted: Sun Mar 16, 2008 12:56 am
by Christopher
Maybe I missed it, but I mean supporting reading and writing files that have the column names in the first row. And easily getting those column names.

Re: PHP CSV Utilities

Posted: Sun Mar 16, 2008 7:02 pm
by Luke
Did you look at the example I just posted? :?

Also, am I correct in saying that PHP must have a pecl extension installed to read zipped files?

Re: PHP CSV Utilities

Posted: Sun Mar 16, 2008 8:49 pm
by Christopher
Yes, I missed. It seem like the long way to go about a fairly common situation. I think build it in and have the thing pre-read lines if you want it to try to sort out what the file is.

And the whole $sniffer->sniff() seems to funny to code for me! ;)

Re: PHP CSV Utilities

Posted: Mon Mar 17, 2008 10:29 am
by Luke
Here is a blog post about the newest features:
http://www.mc2design.com/blog/php-csv-u ... n-released

Re: PHP CSV Utilities

Posted: Wed Mar 19, 2008 6:30 am
by matthijs
Great, am definitely going to take a look at this. A library like this could be useful in a web app, in which importing or exporting data as csv is needed.

Re: PHP CSV Utilities

Posted: Wed Mar 19, 2008 9:59 am
by John Cartwright
It seems to me that the "sniffing" should be done internally by CSV_Reader/Writer.

Re: PHP CSV Utilities

Posted: Thu Mar 20, 2008 10:20 am
by Luke
Why? I've always been told classes do one thing and one thing only. Why would reader and writer also detect csv format. I was thinking of changing the syntax to this though:

Code: Select all

try {
    $filename = './data/orders.csv';
    $sniffer = new Csv_Sniffer($filename);
    $reader = new Csv_Reader($filename, $sniffer->sniff());
    if ($sniffer->hasHeader()) {
        $header = $reader->getRow();
    }
    while ($row = $reader->getRow()) {
        // do something with rows
    }
} catch (// bla bla bla) {
}
EDIT: Or did you mean have Csv_Reader and Csv_Writer use the sniffer class internally?

Re: PHP CSV Utilities

Posted: Thu Mar 20, 2008 10:22 am
by John Cartwright
The Ninja Space Goat wrote:EDIT: Or did you mean have Csv_Reader and Csv_Writer use the sniffer class internally?
Bingo.

Re: PHP CSV Utilities

Posted: Thu Mar 20, 2008 10:27 am
by Luke
Interesting... that would definitely simplify things a bit. The only thing is... I don't want the user to HAVE to use the sniffer class... because it's not always accurate. I want it to be a sort of opt-in feature.