Data Shuffler - A data Mapper

Posted: Sat Feb 14, 2009 1:43 pm
by josh
http://datashuffler.org/

Direct download link:
http://code.google.com/p/data-shuffler/downloads/list

I've been working on this for a while. I've realized I'm going to need the help of the community for usability testing, documentation, code reviews, etc. If anyone has time to try this out, I'd appreciate any feedback! Enjoy.

Re: Data Shuffler - A data Mapper

Posted: Sat Feb 14, 2009 2:08 pm
by Eran
Not really sure why you double posted - I see both threads since I only ever view the "View last posts" and not a particular forum. I think most forum regulars do the same.

Regarding your mapping framework - I think it's a novel idea and you certainly put in the time and effort.
However, I think it might be a little too abstract for most developers, since it's essentially learning another domain language when most developers are quite familiar with SQL. Another problem is flexibility: the SQL needed to retrieve more complex data sets can get very advanced and often has to be custom generated. How do you deal with such instances in your framework? What about table relationships?

My last point is about ignoring the existence of an SQL database as the underlying storage engine. While good in the name of abstraction, it does complicate the process (you can see that in needing a framework such as yours to achieve it) and prevents engine-specific optimizations, which can make a lot of difference. The probability of abandoning an SQL database as the storage engine mid-project is so low that I don't think it is worth incurring the overhead of such abstractions.

I am interested to see how you will handle table relationships, as it is a subject I'm currently tackling.

By the way, notice that you published the documentation of your tests as the API documentation instead of the documentation of the project classes.

Re: Data Shuffler - A data Mapper

Posted: Sat Feb 14, 2009 2:12 pm
by josh
pytrin wrote:Not really sure why you double posted - I see both threads since I only ever view the "View last posts" and not a particular forum. I think most forum regulars do the same.
I thought I remembered other members of the community cross-posting from general to coding critique. I was warned by the mods and it won't happen again.
pytrin wrote: Regarding your mapping framework - I think it's a novel idea and you certainly put in the time and effort.
However, I think it might be a little too abstract for most developers, since it's essentially learning another domain language when most developers are quite familiar with SQL. Another problem is flexibility: the SQL needed to retrieve more complex data sets can get very advanced and often has to be custom generated. How do you deal with such instances in your framework? What about table relationships?
Data mappers allow the data schema and the object schema to change independently. It's not a query object implementation; there's a difference. You subclass the mapper and create specialized finders. It handles relationships. I still need to implement association tables and improve some other things, but this is just a preview release.
pytrin wrote:My last point is about ignoring the existence of an SQL database as the underlying storage engine. While good in the name of abstraction, it does complicate the process (you can see that in needing a framework such as yours to achieve it) and prevents engine-specific optimizations, which can make a lot of difference. The probability of abandoning an SQL database as the storage engine mid-project is so low that I don't think it is worth incurring the overhead of such abstractions.
This does use SQL; it uses Zend_Db_Select. Data Mapper acknowledges the existence of the database, whereas Active Record tends to treat it like it doesn't exist.
pytrin wrote:By the way, notice that you put the documentation of your tests as the API documentation instead of the project classes.
Change the package from the default package.
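
To make the point above about "data schema and object schema changing independently" and "subclass the mapper and create specialized finders" concrete, here is a minimal sketch. It is written in Python rather than the project's PHP, and every name in it (`Person`, `PersonMapper`, `CustomPersonMapper`, the `people` table and its columns) is hypothetical; it is not Data Shuffler's API, only an illustration of the pattern under those assumptions.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """Plain domain object: it knows nothing about tables or SQL."""
    id: int
    last_name: str


class PersonMapper:
    """Hypothetical mapper: the only place that knows the column names."""

    # Object attribute -> table column. Renaming a column only touches this map.
    COLUMNS = {"id": "person_id", "last_name": "last_nm"}

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def _select(self) -> str:
        # Alias each column to its attribute name so rows read like objects.
        cols = ", ".join(f"{col} AS {attr}" for attr, col in self.COLUMNS.items())
        return f"SELECT {cols} FROM people"

    def _load(self, row: sqlite3.Row) -> Person:
        return Person(id=row["id"], last_name=row["last_name"])

    def find(self, person_id: int) -> Optional[Person]:
        sql = self._select() + f" WHERE {self.COLUMNS['id']} = ?"
        row = self.conn.execute(sql, (person_id,)).fetchone()
        return self._load(row) if row else None


class CustomPersonMapper(PersonMapper):
    """Subclass that adds a specialized finder on top of the base mapper."""

    def find_by_last_name(self, last_name: str) -> List[Person]:
        sql = (
            self._select()
            + f" WHERE {self.COLUMNS['last_name']} = ?"
            + f" ORDER BY {self.COLUMNS['id']}"
        )
        return [self._load(r) for r in self.conn.execute(sql, (last_name,))]


def demo() -> List[Person]:
    """Build a throwaway in-memory table and run the specialized finder."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE people (person_id INTEGER PRIMARY KEY, last_nm TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [(1, "Smith"), (2, "Jones"), (3, "Smith")],
    )
    return CustomPersonMapper(conn).find_by_last_name("Smith")
```

Calling `demo()` returns `Person` objects whose attribute names differ from the column names; the SQL stays inside the mapper layer and `Person` can be constructed and used with no database at all.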

Re: Data Shuffler - A data Mapper

Posted: Sat Feb 14, 2009 2:16 pm
by Eran
josh wrote:This does use SQL; it uses Zend_Db_Select. Data Mapper acknowledges the existence of the database, whereas Active Record tends to treat it like it doesn't exist.
Of course it uses SQL (and I liked the ZendDb adapter btw, useful :) ). I meant it attempts to hide SQL from the level of the model, which is sometimes impossible. Or maybe I'm not understanding completely how it would work. I think some more advanced examples are in order (such as the use of several joins and some sub-selects).

Also, how do you generate the required query before the script knows everything it would need to retrieve (to avoid what you call ripple-loading)?

Re: Data Shuffler - A data Mapper

Posted: Sat Feb 14, 2009 2:25 pm
by josh
pytrin wrote:Of course it uses SQL (and I liked the ZendDb adapter btw, useful :) ), I meant it attempts to hide SQL from the level of the model, which is sometimes impossible.
Well, I'm not refuting that other patterns are better in some cases. I don't think it's impossible to keep SQL out of the model. It's definitely impossible to not write any SQL at all, but there's nothing impossible about encapsulating your queries in a different layer than the domain layer, which results in more reusable and lightweight models that can be used even when databases aren't present.
pytrin wrote: Or maybe I'm not understanding completely how it would work. I think some more advanced examples are in order (such as the use of several joins and some sub-selects).
I'm working on it. If you check out the tests there are some examples, and if you look at the API docs you'll see you can also do ->addSingle for a single-valued relation, ->addCollection for a multi-valued relation, and ->addPlugin to tell the mapper a field should be used as a "discriminator" field for single table inheritance / 'plugins'. Right now it loads aggressively, which means you definitely shouldn't invoke findAll() on a huge table; ripple loading only becomes a problem once lazy loading is implemented.
pytrin wrote:Also, how do you generate the required query before the script knows everything it would need to retrieve (to avoid what you call ripple-loading)?
Sounds like two different questions. The query is generated from a combination of the mappings you set up and mappings auto-detected from the table metadata. When you create your mapper you can override select(), insert(), update(), and delete() if you need your own logic.

I appreciate the feedback, keep it coming. This will help me identify the holes in my documentation.
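
For readers unfamiliar with the "discriminator" field mentioned above, the following is a small sketch of single table inheritance: one column in each row decides which class the mapper instantiates. It is in Python, not PHP, and the names (`VehicleMapper`, `Vehicle`, `Car`, `Truck`, the `type` column) are made up for illustration; how ->addPlugin actually configures this in Data Shuffler is not shown in the thread and is not assumed here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Type


@dataclass
class Vehicle:
    id: int
    name: str


class Car(Vehicle):
    """Subclass stored in the same table as Vehicle."""


class Truck(Vehicle):
    """Another subclass sharing the table."""


class VehicleMapper:
    """Hypothetical mapper; not Shuffler_Mapper's real interface."""

    def __init__(self, discriminator: str, classes: Dict[str, Type[Vehicle]]) -> None:
        self.discriminator = discriminator  # column that names the subtype
        self.classes = classes              # discriminator value -> class

    def load(self, row: Dict[str, Any]) -> Vehicle:
        # The discriminator value picks the class to build;
        # unknown values fall back to the base class.
        cls = self.classes.get(row[self.discriminator], Vehicle)
        return cls(id=row["id"], name=row["name"])

    def load_all(self, rows: Iterable[Dict[str, Any]]) -> List[Vehicle]:
        return [self.load(row) for row in rows]
```

A mapper built as `VehicleMapper("type", {"car": Car, "truck": Truck})` would turn a row with `type` set to `"car"` into a `Car` instance, while the domain classes themselves never see the column.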

Re: Data Shuffler - A data Mapper

Posted: Sat Feb 14, 2009 2:36 pm
by Eran
If I'm understanding correctly, by saying it aggressively loads you mean that it loads everything ahead of time regardless of what will actually be needed in the script? What options exist in your framework to be more specific about what's to be loaded for a specific operation?
josh wrote:and if you look at the API docs you'll see you can also do ->addSingle for a single valued relation, ->addCollection for a multi-valued relation, and also ->addPlugin to tell the mapper a field should be used as a "discriminator" field for single table inheritance / 'plugins'.
I went through the API docs and didn't find relevant examples. I didn't go through everything though, so I might have missed it.

Re: Data Shuffler - A data Mapper

Posted: Sat Feb 14, 2009 2:53 pm
by josh
pytrin wrote:If I'm understanding correctly, by saying it aggressively loads you mean that it loads everything ahead of time regardless of what will actually be needed in the script? What options exist in your framework to be more specific about what's to be loaded for a specific operation?
It's no different from SQL: if you only want a limited number of rows, you'd create a finder that applies a LIMIT clause. I guess I need a few more finders, along with more docs on how to create custom finders and queries.
pytrin wrote:I went through the API docs and didn't find relevant examples. I didn't go through everything though, so I might have missed it.
They're on the mapper class, Shuffler_Mapper. These are factory methods that create instances of Shuffler_Mapping subclasses internally and add them to the mapper.
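
As a rough illustration of the "finder that does limit clauses" idea from this post, here is a sketch contrasting an eager find-all with a bounded finder. It uses Python and SQLite instead of PHP and Zend_Db, and `ArticleMapper`, `find_page`, `make_mapper`, and the `articles` table are hypothetical names, not part of Data Shuffler.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str]


class ArticleMapper:
    """Hypothetical mapper with an eager find_all and a bounded finder."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def find_all(self) -> List[Row]:
        # Loads every row up front: fine for small tables, costly for huge ones.
        sql = "SELECT id, title FROM articles ORDER BY id"
        return self.conn.execute(sql).fetchall()

    def find_page(self, limit: int, offset: int = 0) -> List[Row]:
        # Custom finder: the LIMIT/OFFSET lives in the mapper, not in the caller.
        sql = "SELECT id, title FROM articles ORDER BY id LIMIT ? OFFSET ?"
        return self.conn.execute(sql, (limit, offset)).fetchall()


def make_mapper(row_count: int) -> ArticleMapper:
    """Create an in-memory table with ids 1..row_count for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [(i, f"Article {i}") for i in range(1, row_count + 1)],
    )
    return ArticleMapper(conn)
```

With `mapper = make_mapper(100)`, `mapper.find_page(10, 20)` fetches only ten rows, whereas `mapper.find_all()` pulls the whole table into memory.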

Re: Data Shuffler - A data Mapper

Posted: Sat Feb 14, 2009 3:00 pm
by Eran
I saw that. It's a barebones API description without examples, not much use to someone who has no idea how this works in practice.

Re: Data Shuffler - A data Mapper

Posted: Sat Feb 14, 2009 3:03 pm
by josh
API docs aren't for examples. I'm trying to learn DocBook syntax at the same time as I'm writing the docs; I'll be continually updating them, though.