PHP CSV Utilities

Coding Critique is the place to post source code for peer review by other members of DevNetwork. Any kind of code can be posted. Code posted does not have to be limited to PHP. All members are invited to contribute constructive criticism with the goal of improving the code. Posted code should include some background information about it and what areas you specifically would like help with.

Popular code excerpts may be moved to "Code Snippets" by the moderators.

Moderator: General Moderators

User avatar
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

PHP CSV Utilities

Post by Luke »

Tar and zip files are available via attachment (see below)

I am preparing to release version 0.2 of my csv library which I have tentatively named PHP CSV Utilities. I would like to come up with a less boring name, but haven't been able to think of anything. One of the main reasons I wrote the library is to learn the ins and outs of the release cycle. I've never really released a software package before so I picked a relatively easy problem to solve and solved it :)

So, for those of you who are interested, I'd like for you to do any / all of the following:

1) Let me know what version of PHP you are using and whether or not you had any issues. You can verify this by running its unit test in the tests folder. I have only tested it on version 5.2.4 and that is pretty bleeding edge. I know Csv_Reader_String will not work on versions less than 5.1 because it makes use of the php://temp input stream, but that's as much as I know. I don't plan on writing a PHP4 compatible version. PHP4 is for the birds! :P

2) Are there any interface issues you have with it? What do you like about it? What do you dislike about it? Is there anything that is confusing or just plain nonsensical?

3) What would you like to see in the next version?

4) Are there any Csv_Dialect classes you can think of that would be useful (so far I have one for excel, and there are plans for google docs and open office)

5) Are there any implementation improvements I can make? I happen to know that several of the methods can me optimized because I sort of rushed through them. I will eventually go back and optimize, but help from you would make that process that much easier.

6) Test that all of the components (especially Csv_Sniffer) return the right results MOST of the time (since csv files are notoriously malformed, I can't expect 100% accuracy)

What does this library do?

Provides an object-oriented (php5) interface to read and write basically any delimited data. Although I call it a csv library, it is very flexible as far as format goes. To read a csv file, there are several methods you can use. It implements the SPL interface Iterator as well as Countable

Code: Select all

try {
    $reader = new Csv_Reader("./data/orders.csv");
    foreach ($reader as $row) {
        list($orderid, $orderdate, $ordertotal, $customername, $etc) = $row;
        // do something with data
    }
} catch (Csv_Exception_FileNotFound $e) {
    printf("<p class=\"error\">%s</p>", $e->getMessage());
}
 
OR (skipped the exceptions stuff here for brevity)

Code: Select all

$reader = new Csv_Reader("./data/orders.csv");
$reader->getRow(); // gets header (first row)
while ($row = $reader->getRow()) {
    // do something with row
}
OR

Code: Select all

$reader = new Csv_Reader("./data/orders.csv");
while ($row = $reader->current()) {
    // do something with row
    $reader->next();
}
Writing to a csv file is really easy (skipped exception stuff for brevity)

Code: Select all

$writer = new Csv_Writer("./data/orders.csv");
foreach ($data as $row) {
    if (count($row) == 12) {
        $writer->writeRow($row);
    }
}
// if you have data in an array and you simply want to write it all to a csv file, use writeRows() instead
$writer->writeRows($data);
$writer->close(); // writes the file and closes the resource (also gets called in __destruct()
 
If you have a csv file already and you want to append it, simply provide a file handle in append mode

Code: Select all

$writer = new Csv_Writer(fopen("./data/orders.csv", "a"));
// now write to the file
Csv_Reader and Csv_Writer both assume the following:
delimiter - comma
quoting character - double quote
escape character - backslash
line terminator - carriage return + newline
and they assume that only columns with either the quote char or the delim character need to be quoted

If, however you need to use different parameters, it is as easy as providing a Csv_Dialect as the second option. Basically a Csv_Dialect (name borrowed from python's csv module) tells reader & writer the format of the csv file. As of now, I have only written two dialects. Standard (the default) and excel. It is very easy to change any of a dialect's parameters.

Code: Select all

$dialect = new Csv_Dialect_Excel(array("delimiter" => "\t")); // uses excel format, but instead of a comma it uses a tab
$reader = new Csv_Reader('customers.dat', $dialect);
// now reader will read a tab-delimited excel file without issue :)
 
$writer = new Csv_Writer('orders.dat', new Csv_Dialect(array("lineterminator" => "\n", "quoting" => Csv_Dialect::QUOTE_NONE)));
// now writer will use a newline as its line terminator and it won't quote any columns
This library makes use of exceptions where appropriate, so be aware of that.

Let's see... what else? Umm... oh yea! I almost forgot the coolest thing! Csv_Sniffer. Csv_Sniffer is basically a port of python's csv.sniffer class. I haven't thought out the interface particularly well because I was mostly concerned with getting it to work the way it's supposed to, so any interface advice you can give on it would be awesome. Csv_Sniffer::sniff() accepts a sample of the csv file (needs at least ten rows, and doesn't read more than 20 rows even if you provide them) and it returns a Csv_Dialect object. Csv_Sniffer::hasHeader() accepts a sample and returns true if the sample likely has a header and false otherwise.

Code: Select all

$file = 'data/products.csv';
$rows = file($file);
$sample = implode("", array_slice($rows, 0, 20 // grab 20 lines of a csv file as a string
try {
    $dialect = $sniffer->sniff($sample)
    $reader = new Csv_Reader($file, $dialect);
} catch (Csv_Exception_CannotDetermineDialect $e) {
    echo "<p class=\"error\">Sorry, unable to determine csv format</p>";
}
 
if ($sniffer->hasHeader($sample)) {
    echo "<p>File probably has a header</p>";
} else {
    echo "<p>File probably does not have a header</p>";
}
 
Tar and zip files are available via attachment (see below)

Check our blog for more info and updates. The project is hosted here
Attachments
csv-files.zip
Zipped source files
(178.54 KiB) Downloaded 339 times
csv-utils.tar
Tarred source files
(688.33 KiB) Downloaded 306 times
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: PHP CSV Utilities

Post by Christopher »

Looks good. I haven't had any problems running it yet. I do have a couple of comments:

- I would like to have exceptions be optional, so you could work with return codes.

- I still don't see an option to deal with reading or writing files with the column names in the first row. That is an essential feature for me.

- I don't like the writeRow(), writeRows(), writeData() interface. Is that how the Python library works? I would rather either only have writeRow() and writeRows(), and have them write immediately. Or have addRow()/addRows() to build the buffer and write()/append() to commit to disk.

- I haven't checked, but often you read a file, modify the data in the array, then write it back. So there should be some symmetry to the interface.
(#10850)
User avatar
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Re: PHP CSV Utilities

Post by Luke »

Good point. I'm going to have to agree with you. I'll change that in the next version. Although this library does follow a lot of the same interface as the python module, there are several places where I have strayed from it. Mainly the Csv_Sniffer class is the one I followed python's module the most.

EDIT: Oh yea, what version of php did you run it on? How can I test it on more versions?
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: PHP CSV Utilities

Post by Christopher »

I ran it on 5.2.5, but did not throughly test.
(#10850)
User avatar
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Re: PHP CSV Utilities

Post by Luke »

well all you need to do is run the unit tests. I have officially released version 0.2. It can be downloaded here:

http://code.google.com/p/php-csv-utils/downloads/list

I have begun documentation here:

http://code.google.com/p/php-csv-utils/ ... umentation

It's not the greatest documentation in the world, but hey it's the first time I've ever written docs before.
User avatar
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Re: PHP CSV Utilities

Post by Luke »

I still don't see an option to deal with reading or writing files with the column names in the first row. That is an essential feature for me.
What do you mean by this?
I haven't checked, but often you read a file, modify the data in the array, then write it back. So there should be some symmetry to the interface
Well you can do this... is this what you mean?

Code: Select all

 
try {
    $filename = './data/tab-200.csv';
    $sample = implode("", array_slice(file($filename), 0, 20)); // grab 20 lines in a string
    $sniffer = new Csv_Sniffer();
    $dialect = $sniffer->sniff($sample); // try to determine dialect (if it can't it will throw an exception)
 
    $reader = new Csv_Reader($filename, new Csv_Dialect(array('delimiter' => "\t"))); 
    if ($sniffer->hasHeader($sample)) {
        $header = $reader->getRow(); // if sniffer detects a header, grab the first row and advance the internal pointer forward
    }
 
    $dialect->quotechar = "'";
    $dialect->lineterminator = "\n";
    $dialect->quoting = Csv_Dialect::QUOTE_NONNUMERIC;
    $dialect->delimiter = "|";
 
    // if you didn't need to change the values in the data, but only the dialect params, you could simply pass the reader to writer::writeRows()
    // $writer->writeRows($reader); // writeRows() accepts a reader or an array of data
 
    $writer = new Csv_Writer($filename, $dialect); // rewrite the file with new line-endings and quote character and  stuff
    unset($header[8], $header[9]); // unset the 8th and 9th columns to stay consistant with changes below
    $writer->writeRow($header);
 
    while ($row = $reader->getRow()) {
        $row[3] = strtoupper($row[3]); // change column three to uppercase
        $row[6] = (bool) $row[6]; // 
        $row[7] = $row[8] . " " . $row[9];
        unset($row[8], $row[9]);
        $writer->writeRow($row);
    }
 
    $writer->close(); // write the data to disk
 
} catch (Csv_Exception $e) {
    // will catch Csv_Exception_FileNotFound or Csv_Exception_CannotDetermineDialect and Csv_Exception_CannotAccessFile
    // which will be called by Csv_Reader, Csv_Sniffer and Csv_Writer respectively
    printf("<p>%s</p>", $e->getMessage());
}
 
It could use some improvement to the interface... Here is an interface I am considering for the next version.

Code: Select all

 
try {
    $filename = './data/orders.csv';
    $sniffer = new Csv_Sniffer($filename); // Csv_Sniffer will accept a filename in its constructor instead of a string of data
    // $sniffer = new Csv_Sniffer_String(); // this will do what Csv_Sniffer used to do, accept a string of csv data
    $dialect = $sniffer->sniff();
 
    $reader = new Csv_Reader($filename, $dialect);
    if ($sniffer->hasHeader()) {
        $header  = $reader->getHeader();
    }
 
    $dialect->quotechar = "'";
    $dialect->lineterminator = "\n";
    $dialect->quoting = Csv_Dialect::QUOTE_NONNUMERIC;
    $dialect->delimiter = "|";
 
    $writer = new Csv_Writer($filename, $dialect); // rewrite the file with new line-endings and quote character 
    $writer->writeRow($header);
    while ($row = $reader->getRow($header)) {
        $row['sku'] = strtoupper($row['sku']); // change column three to uppercase
        $row['is_active'] = (bool) $row['is_active']; // 
        $row['customer_name'] = $row['first_name'] . " " . $row['last_name'];
        unset($row['first_name'], $row['last_name']);
        $writer->writeRow($row);
    }
} catch (Csv_Exception $e) {
    // will catch Csv_Exception_FileNotFound or Csv_Exception_CannotDetermineDialect and Csv_Exception_CannotAccessFile
    // which will be called by Csv_Reader, Csv_Sniffer and Csv_Writer respectively
    printf("<p>%s</p>", $e->getMessage());
}
 
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: PHP CSV Utilities

Post by Christopher »

Maybe I missed it, but I mean supporting reading and writing files that have the column names in the first row. And easily getting those column names.
(#10850)
User avatar
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Re: PHP CSV Utilities

Post by Luke »

Did you look at the example I just posted? :?

Also, am I correct in saying that PHP must have a pecl extension installed to read zipped files?
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: PHP CSV Utilities

Post by Christopher »

Yes, I missed. It seem like the long way to go about a fairly common situation. I think build it in and have the thing pre-read lines if you want it to try to sort out what the file is.

And the whole $sniffer->sniff() seems to funny to code for me! ;)
(#10850)
User avatar
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Re: PHP CSV Utilities

Post by Luke »

Here is a blog post about the newest features:
http://www.mc2design.com/blog/php-csv-u ... n-released
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Re: PHP CSV Utilities

Post by matthijs »

Great, am definitely going to take a look at this. A library like this could be useful in a web app, in which importing or exporting data as csv is needed.
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: PHP CSV Utilities

Post by John Cartwright »

It seems to me that the "sniffing" should be done internally by CSV_Reader/Writer.
User avatar
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Re: PHP CSV Utilities

Post by Luke »

Why? I've always been told classes do one thing and one thing only. Why would reader and writer also detect csv format. I was thinking of changing the syntax to this though:

Code: Select all

try {
    $filename = './data/orders.csv';
    $sniffer = new Csv_Sniffer($filename);
    $reader = new Csv_Reader($filename, $sniffer->sniff());
    if ($sniffer->hasHeader()) {
        $header = $reader->getRow();
    }
    while ($row = $reader->getRow()) {
        // do something with rows
    }
} catch (// bla bla bla) {
}
EDIT: Or did you mean have Csv_Reader and Csv_Writer use the sniffer class internally?
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: PHP CSV Utilities

Post by John Cartwright »

The Ninja Space Goat wrote:EDIT: Or did you mean have Csv_Reader and Csv_Writer use the sniffer class internally?
Bingo.
User avatar
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Re: PHP CSV Utilities

Post by Luke »

Interesting... that would definitely simplify things a bit. The only thing is... I don't want the user to HAVE to use the sniffer class... because it's not always accurate. I want it to be a sort of opt-in feature.
Post Reply