Data Shuffler - A data Mapper

josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Data Shuffler - A data Mapper

Post by josh »

http://datashuffler.org/

Direct download link:
http://code.google.com/p/data-shuffler/downloads/list

I've been working on this for a while, and I've realized I'm going to need the community's help with usability testing, documentation, code reviews, etc. If anyone has time to try this out, I'd appreciate any feedback! Enjoy.
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Data Shuffler - A data Mapper

Post by Eran »

Not really sure why you double posted - I see both threads since I only ever view the "View last posts" and not a particular forum. I think most forum regulars do the same.

Regarding your mapping framework - I think it's a novel idea and you certainly put in the time and effort.
However, I think it might be a little too abstract for most developers, since it's essentially learning another domain language when most developers are quite familiar with SQL. Another problem is flexibility - the SQL needed to retrieve more complex data sets can get very advanced and often has to be custom generated. How do you deal with such instances in your framework? What about table relationships?

My last point is about ignoring the existence of an SQL database as the underlying storage engine - while good in the name of abstraction, it does complicate the process (you can see that in needing to use a framework such as yours to achieve that) and prevents engine-specific optimizations, which can make a lot of difference. The probability of abandoning an SQL database as the storage engine mid-project is so low that I don't think it is worth incurring the overhead of using such abstractions.

I am interested to see how you will handle table relationships, as it is a subject I'm currently tackling.

By the way, notice that your API documentation currently documents your tests instead of the project classes.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Data Shuffler - A data Mapper

Post by josh »

pytrin wrote:Not really sure why you double posted - I see both threads since I only ever view the "View last posts" and not a particular forum. I think most forum regulars do the same.
I thought I remembered other members of the community cross-posting from General to Coding Critique. I was warned by the mods and it won't happen again.
pytrin wrote: Regarding your mapping framework - I think it's a novel idea and you certainly put in the time and effort.
However, I think it might be a little too abstract for most developers, since it's essentially learning another domain language when most developers are quite familiar with SQL. Another problem is flexibility - the SQL needed to retrieve more complex data sets can get very advanced and often has to be custom generated. How do you deal with such instances in your framework? What about table relationships?
Data mappers allow the data schema and the object schema to change independently. It's not a query object implementation; there's a difference. You subclass the mapper and create specialized finders. It handles relationships. I still need to implement association tables and improve some other things, but this is just a preview release.
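To make the pattern concrete, here is a minimal sketch in Python (chosen for brevity; this is not Data Shuffler's PHP API). Every name in it (`Person`, `Mapper`, `PersonMapper`, `find_by_name_prefix`, the `people` table and its columns) is hypothetical. It only illustrates a domain object that knows nothing about the database, and a mapper subclass that adds a specialized finder.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    """Plain domain object: no knowledge of tables, columns, or SQL."""
    id: int
    full_name: str


class Mapper:
    """Hypothetical base mapper: turns rows of one table into domain objects."""
    table = ""
    columns = {}         # column name -> attribute name on the domain object
    domain_class = None

    def __init__(self, conn):
        self.conn = conn

    def _select(self, where="", params=()):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table} {where}"
        rows = self.conn.execute(sql, params).fetchall()
        return [self._load(row) for row in rows]

    def _load(self, row):
        # The column -> attribute translation lives here, not in the model.
        attrs = dict(zip(self.columns.values(), row))
        return self.domain_class(**attrs)

    def find_all(self):
        return self._select()


class PersonMapper(Mapper):
    table = "people"
    columns = {"person_id": "id", "name": "full_name"}
    domain_class = Person

    def find_by_name_prefix(self, prefix):
        # Specialized finder: the SQL stays inside the mapper subclass.
        return self._select("WHERE name LIKE ? ORDER BY person_id", (prefix + "%",))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("Ada Lovelace",), ("Alan Turing",), ("Grace Hopper",)],
)

mapper = PersonMapper(conn)
found = mapper.find_by_name_prefix("A")
```

Renaming the `name` column would only touch `columns` and the finder's SQL in this sketch; `Person` stays as it is, which is the independence between the two schemas.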
pytrin wrote:My last point is about ignoring the existence of an SQL database as the underlying storage engine - while good in the name of abstraction, it does complicate the process (you can see that in needing to use a framework such as yours to achieve that) and prevents engine-specific optimizations, which can make a lot of difference. The probability of abandoning an SQL database as the storage engine mid-project is so low that I don't think it is worth incurring the overhead of using such abstractions.
This does use SQL; it uses Zend_Db_Select. Data Mapper acknowledges the existence of the database, whereas Active Record tends to treat it like it doesn't exist.
pytrin wrote:By the way, notice that your API documentation currently documents your tests instead of the project classes.
Change the package from the default package.
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Data Shuffler - A data Mapper

Post by Eran »

josh wrote:This does use SQL; it uses Zend_Db_Select. Data Mapper acknowledges the existence of the database, whereas Active Record tends to treat it like it doesn't exist.
Of course it uses SQL (and I liked the ZendDb adapter btw, useful :) ), I meant it attempts to hide SQL from the level of the model, which is sometimes impossible. Or maybe I'm not understanding completely how it would work. I think some more advanced examples are in order (such as the use of several joins and some sub-selects).

Also, how do you generate the required query before the script knows everything it would need to retrieve (to avoid what you call ripple-loading)?
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Data Shuffler - A data Mapper

Post by josh »

pytrin wrote:Of course it uses SQL (and I liked the ZendDb adapter btw, useful :) ), I meant it attempts to hide SQL from the level of the model, which is sometimes impossible.
Well, I'm not disputing that other patterns are better in some cases. I don't think it's impossible to keep SQL out of the model. It's definitely impossible to not write any SQL at all, but there's nothing impossible about encapsulating your queries in a different layer than the domain layer, which results in more reusable and lightweight models that can be used even when no database is present.
pytrin wrote: Or maybe I'm not understanding completely how it would work. I think some more advanced examples are in order (such as the use of several joins and some sub-selects).
I'm working on it. If you check out the tests there are some examples, and if you look at the API docs you'll see you can also do ->addSingle for a single valued relation, ->addCollection for a multi-valued relation, and also ->addPlugin to tell the mapper a field should be used as a "discriminator" field for single table inheritance / 'plugins'. Right now it loads aggressively, which means you definitely shouldn't invoke findAll() on a huge table. Ripple loading is only a problem once lazy loading is implemented.
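To show the trade-off between aggressive loading and ripple loading, here is a rough sketch in Python (not Data Shuffler code). The `authors` and `books` tables, the `run` helper, and both finder functions are assumptions made up for illustration. The sketch counts queries: loading a multi-valued relation up front costs a fixed number of queries, while fetching each collection separately costs one extra query per parent row.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books (author_id, title) VALUES (1, 'A1'), (1, 'A2'), (2, 'B1');
""")

query_count = 0


def run(sql, params=()):
    """Execute a query and count it, so the two strategies can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def find_all_aggressive():
    """Load every author with its book collection up front: two queries total."""
    books_by_author = defaultdict(list)
    for author_id, title in run("SELECT author_id, title FROM books ORDER BY id"):
        books_by_author[author_id].append(title)
    return [
        {"name": name, "books": books_by_author[author_id]}
        for author_id, name in run("SELECT id, name FROM authors ORDER BY id")
    ]


def find_all_ripple():
    """Fetch each collection separately: one query plus one per author."""
    authors = []
    for author_id, name in run("SELECT id, name FROM authors ORDER BY id"):
        rows = run(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,),
        )
        authors.append({"name": name, "books": [title for (title,) in rows]})
    return authors


query_count = 0
eager = find_all_aggressive()
eager_queries = query_count

query_count = 0
ripple = find_all_ripple()
ripple_queries = query_count
```

Both functions return the same data; only the query count differs (two versus one plus the number of authors). The aggressive version also pulls the whole table into memory, which is why calling it on a huge table is a bad idea.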
pytrin wrote:Also, how do you generate the required query before the script knows everything it would need to retrieve (to avoid what you call ripple-loading)?
Sounds like two different questions. The query is generated from a combination of the mappings you set up and the mappings auto-detected from the table metadata. When you create your mapper you can override select(), insert(), update(), and delete() if you need your own logic.
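As a rough illustration of combining explicit mappings with auto-detected ones, here is a small Python sketch. It is an assumption about the general technique, not a description of how Data Shuffler does it internally: `detect_columns`, `build_select`, and the `users` table are hypothetical names, and the metadata lookup uses SQLite's `PRAGMA table_info` simply because it is easy to run.

```python
import sqlite3


def detect_columns(conn, table):
    """Auto-detect column names from the table metadata (SQLite-specific)."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def build_select(conn, table, explicit=None):
    """Build a SELECT from detected columns plus explicit column -> property overrides."""
    explicit = explicit or {}
    parts = []
    for column in detect_columns(conn, table):
        # Columns without an explicit mapping default to a same-named property.
        prop = explicit.get(column, column)
        parts.append(f"{column} AS {prop}")
    return f"SELECT {', '.join(parts)} FROM {table}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, usr_nm TEXT, email TEXT)")
conn.execute("INSERT INTO users (usr_nm, email) VALUES ('josh', 'josh@example.com')")

sql = build_select(conn, "users", explicit={"usr_nm": "username"})
cursor = conn.execute(sql)
property_names = [d[0] for d in cursor.description]
```

Here only `usr_nm` needs an explicit mapping; `id` and `email` are picked up from the metadata. A mapper that needs custom logic could replace `build_select` wholesale, which is the role overriding select() plays in the description above.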

I appreciate the feedback, keep it coming. This will help me identify the holes in my documentation.
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Data Shuffler - A data Mapper

Post by Eran »

If I'm understanding correctly, by saying it aggressively loads you mean that it loads everything ahead of time regardless of what will actually be needed in the script? What options exist in your framework to be more specific about what's to be loaded for a specific operation?
josh wrote:and if you look at the API docs you'll see you can also do ->addSingle for a single valued relation, ->addCollection for a multi-valued relation, and also ->addPlugin to tell the mapper a field should be used as a "discriminator" field for single table inheritance / 'plugins'.
I went through the API docs and didn't find relevant examples. I didn't go through everything though, so I might have missed it.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Data Shuffler - A data Mapper

Post by josh »

pytrin wrote:If I'm understanding correctly, by saying it aggressively loads you mean that it loads everything ahead of time regardless of what will actually be needed in the script? What options exist in your framework to be more specific about what's to be loaded for a specific operation?
It's no different from SQL: if you only want a limited number of rows, you'd create a finder that uses a LIMIT clause. I guess I need a few more finders, along with more docs on how to create custom finders and queries.
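A finder of that kind could look roughly like the Python sketch below. `ArticleMapper`, `find_latest`, and the `articles` table are hypothetical names used for illustration only; they are not taken from Data Shuffler.

```python
import sqlite3


class ArticleMapper:
    """Hypothetical mapper with a finder that bounds the result set."""

    def __init__(self, conn):
        self.conn = conn

    def find_latest(self, limit, offset=0):
        # The LIMIT/OFFSET clause keeps the mapper from loading the whole table.
        rows = self.conn.execute(
            "SELECT id, title FROM articles ORDER BY id DESC LIMIT ? OFFSET ?",
            (limit, offset),
        ).fetchall()
        return [{"id": row[0], "title": row[1]} for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [(f"Post {n}",) for n in range(1, 101)],
)

latest = ArticleMapper(conn).find_latest(limit=3)
```

With 100 rows in the table, only the three newest are read and turned into objects.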
pytrin wrote:I went through the API docs and didn't find relevant examples. I didn't go through everything though, so I might have missed it.
They're on the mapper class, Shuffler_Mapper. These are factory methods that create instances of Shuffler_Mapping subclasses internally and add them to the mapper.
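The shape of that design, factory methods that build mapping objects and register them on the mapper, can be sketched in Python as follows. This is an assumed illustration of the pattern only: the class names, method names, and argument lists below are hypothetical and do not reflect the real signatures of addSingle, addCollection, or addPlugin.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    """Base class: describes how one property of the domain object is mapped."""
    property_name: str


@dataclass
class SingleMapping(Mapping):
    """Single-valued relation to one other object."""
    target: str


@dataclass
class CollectionMapping(Mapping):
    """Multi-valued relation to a collection of objects."""
    target: str


@dataclass
class PluginMapping(Mapping):
    """Discriminator field: its value selects the class to instantiate."""
    classes: dict = field(default_factory=dict)


class Mapper:
    def __init__(self):
        self.mappings = []

    def _add(self, mapping):
        self.mappings.append(mapping)
        return self  # returning self allows the calls to be chained

    # Factory methods: callers never construct Mapping subclasses directly.
    def add_single(self, property_name, target):
        return self._add(SingleMapping(property_name, target))

    def add_collection(self, property_name, target):
        return self._add(CollectionMapping(property_name, target))

    def add_plugin(self, property_name, classes):
        return self._add(PluginMapping(property_name, classes))


mapper = (
    Mapper()
    .add_single("author", "Author")
    .add_collection("comments", "Comment")
    .add_plugin("type", {"video": "VideoPost", "text": "TextPost"})
)
mapping_types = [type(m).__name__ for m in mapper.mappings]
```

Keeping construction behind factory methods means the mapping classes can change without touching the code that configures a mapper.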
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Data Shuffler - A data Mapper

Post by Eran »

I saw that. It's a barebones API description without examples, not much use to someone who has no idea how this works in practice.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Data Shuffler - A data Mapper

Post by josh »

API docs aren't for examples. I'm trying to learn DocBook syntax at the same time as I'm writing the docs. I'll be continually updating them, though.