I am preparing to release version 0.2 of my CSV library, which I have tentatively named PHP CSV Utilities. I would like to come up with a less boring name, but haven't been able to think of anything. One of the main reasons I wrote the library is to learn the ins and outs of the release cycle. I've never really released a software package before, so I picked a relatively easy problem to solve and solved it.
So, for those of you who are interested, I'd like for you to do any / all of the following:
1) Let me know what version of PHP you are using and whether or not you had any issues. You can check by running the unit tests in the tests folder. I have only tested it on version 5.2.4, and that is pretty bleeding edge. I know Csv_Reader_String will not work on versions earlier than 5.1 because it makes use of the php://temp stream, but that's as much as I know. I don't plan on writing a PHP4-compatible version. PHP4 is for the birds!
2) Are there any interface issues you have with it? What do you like about it? What do you dislike about it? Is there anything that is confusing or just plain nonsensical?
3) What would you like to see in the next version?
4) Are there any Csv_Dialect classes you can think of that would be useful? So far I have one for Excel, and there are plans for Google Docs and OpenOffice.
5) Are there any implementation improvements I can make? I happen to know that several of the methods can be optimized because I sort of rushed through them. I will eventually go back and optimize, but help from you would make that process that much easier.
6) Test that all of the components (especially Csv_Sniffer) return the right results MOST of the time (since CSV files are notoriously malformed, I can't expect 100% accuracy).
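For anyone wondering about the php://temp requirement in item 1, the general trick looks roughly like the sketch below. This is not the actual Csv_Reader_String code; it only uses PHP's built-in fgetcsv(), and parse_csv_string() is a made-up helper name. It also passes the escape argument to fgetcsv(), which older PHP versions may not accept.

```php
<?php
// Hypothetical helper (not part of the library): parse CSV text held in a string
// by writing it to a php://temp stream and reading it back with fgetcsv().
function parse_csv_string($text, $delimiter = ",", $quote = '"', $escape = "\\")
{
    $handle = fopen("php://temp", "r+"); // in-memory stream that can spill to a temp file
    fwrite($handle, $text);
    rewind($handle);                     // move back to the start before reading

    $rows = array();
    while (($row = fgetcsv($handle, 0, $delimiter, $quote, $escape)) !== false) {
        $rows[] = $row;
    }
    fclose($handle);

    return $rows;
}

$rows = parse_csv_string("id,name\r\n1,\"Smith, John\"\r\n");
print_r($rows);
```

The point is simply that a string can be treated like a file, so the same row-reading code works for both.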
What does this library do?
It provides an object-oriented (PHP5) interface to read and write basically any delimited data. Although I call it a CSV library, it is very flexible as far as format goes. The reader implements the Iterator and Countable interfaces, so there are several ways to read a CSV file:
Code: Select all
try {
    $reader = new Csv_Reader("./data/orders.csv");
    foreach ($reader as $row) {
        list($orderid, $orderdate, $ordertotal, $customername, $etc) = $row;
        // do something with data
    }
} catch (Csv_Exception_FileNotFound $e) {
    printf("<p class=\"error\">%s</p>", $e->getMessage());
}
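In case it helps to see why foreach works on the reader, here is a stripped-down sketch of the two interfaces in action. Sketch_Row_Reader is a made-up class that just wraps an array; it is not how Csv_Reader is implemented, and the return type declarations assume PHP 8.

```php
<?php
// Illustrative sketch only: Sketch_Row_Reader is a made-up class, not part of the library.
// It shows how Iterator and Countable make an object usable with foreach and count().
// The return type declarations assume PHP 8.
class Sketch_Row_Reader implements Iterator, Countable
{
    private $rows;
    private $position = 0;

    public function __construct(array $rows)
    {
        $this->rows = array_values($rows); // reindex so positions run 0..n-1
    }

    public function current(): mixed { return $this->rows[$this->position]; }
    public function key(): mixed     { return $this->position; }
    public function next(): void     { $this->position++; }
    public function rewind(): void   { $this->position = 0; }
    public function valid(): bool    { return isset($this->rows[$this->position]); }
    public function count(): int     { return count($this->rows); }
}

$reader = new Sketch_Row_Reader(array(
    array("1001", "2008-01-15", "19.99"),
    array("1002", "2008-01-16", "5.00"),
));

echo count($reader), " rows\n";       // count() calls Countable::count()
foreach ($reader as $index => $row) { // foreach drives the Iterator methods
    echo $index, ": ", implode(" | ", $row), "\n";
}
```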
Code: Select all
$reader = new Csv_Reader("./data/orders.csv");
$reader->getRow(); // gets header (first row)
while ($row = $reader->getRow()) {
    // do something with row
}
Code: Select all
$reader = new Csv_Reader("./data/orders.csv");
while ($row = $reader->current()) {
    // do something with row
    $reader->next();
}
Code: Select all
$writer = new Csv_Writer("./data/orders.csv");
foreach ($data as $row) {
    if (count($row) == 12) {
        $writer->writeRow($row);
    }
}
// if you have data in an array and you simply want to write it all
// to a csv file, use writeRows() instead
$writer->writeRows($data);
$writer->close(); // writes the file and closes the resource (also gets called in __destruct())
Code: Select all
$writer = new Csv_Writer(fopen("./data/orders.csv", "a"));
// now write to the file
By default, the reader and writer use these parameters:
delimiter - comma
quoting character - double quote
escape character - backslash
line terminator - carriage return + newline
They also assume that only columns containing the quote character or the delimiter need to be quoted.
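To make those defaults concrete, here is a rough sketch of what one row would look like when written with them. format_row_with_defaults() is a made-up helper, not the library's writer, and I'm assuming the escape character is placed directly in front of any quote character inside a quoted column. It ignores embedded newlines and backslashes.

```php
<?php
// Hypothetical helper (not the library's Csv_Writer): format one row using the
// default parameters listed above.
function format_row_with_defaults(array $row)
{
    $delimiter = ",";
    $quote = '"';
    $escape = "\\";
    $lineTerminator = "\r\n";

    $fields = array();
    foreach ($row as $value) {
        $value = (string) $value;
        // Only columns containing the delimiter or the quote character get quoted.
        $needsQuoting = strpos($value, $delimiter) !== false
            || strpos($value, $quote) !== false;
        if ($needsQuoting) {
            // Assumption: embedded quotes are escaped with the escape character.
            $value = $quote . str_replace($quote, $escape . $quote, $value) . $quote;
        }
        $fields[] = $value;
    }

    return implode($delimiter, $fields) . $lineTerminator;
}

echo format_row_with_defaults(array(1001, 'Smith, John', 'said "hi"', 'plain'));
```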
If, however, you need to use different parameters, it is as easy as providing a Csv_Dialect as the second argument. Basically, a Csv_Dialect (name borrowed from Python's csv module) tells the reader and writer the format of the CSV file. As of now, I have only written two dialects: standard (the default) and Excel. It is very easy to change any of a dialect's parameters.
Code: Select all
// uses excel format, but instead of a comma it uses a tab
$dialect = new Csv_Dialect_Excel(array("delimiter" => "\t"));
$reader = new Csv_Reader('customers.dat', $dialect);
// now reader will read a tab-delimited excel file without issue :)
$writer = new Csv_Writer('orders.dat', new Csv_Dialect(array(
    "lineterminator" => "\n",
    "quoting" => Csv_Dialect::QUOTE_NONE,
)));
// now writer will use a newline as its line terminator and it won't quote any columns
Let's see... what else? Umm... oh yeah! I almost forgot the coolest thing: Csv_Sniffer. Csv_Sniffer is basically a port of Python's csv.Sniffer class. I haven't thought out the interface particularly well because I was mostly concerned with getting it to work the way it's supposed to, so any interface advice you can give on it would be awesome. Csv_Sniffer::sniff() accepts a sample of the CSV file (it needs at least ten rows, and doesn't read more than 20 rows even if you provide them) and returns a Csv_Dialect object. Csv_Sniffer::hasHeader() accepts a sample and returns true if the sample likely has a header and false otherwise.
Code: Select all
$file = 'data/products.csv';
$rows = file($file);
$sample = implode("", array_slice($rows, 0, 20)); // grab 20 lines of a csv file as a string
$sniffer = new Csv_Sniffer(); // create the sniffer (assuming no constructor arguments)
try {
    $dialect = $sniffer->sniff($sample);
    $reader = new Csv_Reader($file, $dialect);
} catch (Csv_Exception_CannotDetermineDialect $e) {
    echo "<p class=\"error\">Sorry, unable to determine csv format</p>";
}
if ($sniffer->hasHeader($sample)) {
    echo "<p>File probably has a header</p>";
} else {
    echo "<p>File probably does not have a header</p>";
}
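If you're curious what "sniffing" means in practice, here is a very simplified sketch of the basic idea: pick the candidate delimiter that shows up the same non-zero number of times on every line. This is not the Csv_Sniffer code and not Python's algorithm either; guess_delimiter() is a made-up helper, and it ignores quoting entirely, so a delimiter inside a quoted column will throw it off.

```php
<?php
// Hypothetical helper (not the library's Csv_Sniffer): guess a delimiter by
// finding the candidate that appears the same non-zero number of times per line.
function guess_delimiter($sample, array $candidates = array(",", "\t", ";", "|"))
{
    // Strip only trailing line breaks so leading or trailing tabs are preserved.
    $lines = preg_split('/\r\n|\r|\n/', rtrim($sample, "\r\n"));
    $best = null;
    $bestCount = 0;

    foreach ($candidates as $candidate) {
        $counts = array();
        foreach ($lines as $line) {
            $counts[] = substr_count($line, $candidate);
        }
        // A consistent count on every line is a strong hint; prefer the
        // candidate that yields the most columns.
        if (count(array_unique($counts)) === 1 && $counts[0] > $bestCount) {
            $best = $candidate;
            $bestCount = $counts[0];
        }
    }

    return $best; // null when no candidate is consistent across lines
}

var_dump(guess_delimiter("id;name;price\n1;Widget;9.99\n2;Gadget;19.99\n"));
```

A real sniffer has to deal with quoted columns, ragged rows and ties between candidates, which is why I can only promise the right answer most of the time.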
Check our blog for more info and updates. The project is hosted here