Page 1 of 1

Sophisticated versioning with tables

Posted: Wed Feb 24, 2010 11:08 am
by alex.barylski
I have a table sequences, which looks like:

Code: Select all

id_primary, id_previous, id_task, description
The fields id_previous and id_task are used for a quasi-version control. Essentially records with id_previous of ZERO are orginal records, whereas a user can clone a record and the id_previous of that record points to the original in the table.

There might be three duplicate entires in the table, like so:

Code: Select all

1, 0, 0, "This is a original description"
2, 1, 1, "This is a original description -- modified by a user"
3, 1, 2, "This is a original description -- modified again by a user"
I need a query (somehow) to SELECT all records by default id_task equals ZERO, which is easy.

Here is the challenging caveat: I only wish to show those whom have a id_task of ZERO when a record whose id_previous to match, does not exist.

So essentially the current SQL looks like:

[sql]SELECT * FROM sequences WHERE id_task = $id_task OR (id_task = 0 AND id_previous != id_primary)[/sql]

Obviously this SQL does not work, I assume I probably need a sub-select. I am reading this article provided the other day by pytrin:

http://www.xaprb.com/blog/2006/12/07/ho ... up-in-sql/

However I cannot seem to put pieces togather and figure out a way to essentially remove duplicates using the logic/specifications above.

Any ideas??

In PHP this could be done easily by iterating the list and checking for any records whose id_primary is also referenced in any other records id_previous, which is the cloned copy I need shown. I would rather have this done in SQL not PHP.

Cheers,
Alex

Re: Sophisticated versioning with tables

Posted: Wed Feb 24, 2010 11:15 am
by Eran
when a record whose id_previous to match, does not exist.
do you mean a record with an id_previous that has no matches in the table for id_primary? how come it has a value if it has no matches? (should be null I would think)

Re: Sophisticated versioning with tables

Posted: Wed Feb 24, 2010 11:24 am
by alex.barylski
do you mean a record with an id_previous that has no matches in the table for id_primary? how come it has a value if it has no matches? (should be null I would think)
It has a value only because ZERO is what I always use to represent NULL. Could be NULL I suppose just a force of habut to use ZERO.

The table sequences starts off with original entries. As the demo data shows (hopefully) entries can be cloned and modified but only the cloned version assigned to a given task should ever be shown, unless there is no clone in which case the original should be shown.

Original records never have a valid (in this case ZERO) id_previous nor do they have id_task.

id_task is used to associate cloned entries with a specific individual. When a user picks a task, that ID is stored in SESSION and they are shown all the live records (those which have ZERO as id_previous) and in addition, their own cloned records which may have been modified.

The caveat being the modified records should overwrite the results of the global, perhaps a better description of the problm?

[sql]SELECT * FROM sequences WHERE id_task = 0 OR id_task = $id_task[/sql]

This returns all records of interest but includes duplicates (original and clone) and I must NOT have the original if the clone exists, hopefully via the relationship between id_previous and id_primary of the original.

I wonder if it's possible to somehow select the original records first and then select the cloned versions, and using the id_previous, overwrite/merge the latter results with the former???

Cheers,
Alex

Re: Sophisticated versioning with tables

Posted: Wed Feb 24, 2010 11:34 am
by Eran
It sounds to me like the problem is in your schema and not queries. There are appears to be a one-to-many relationship between records and clones / versions, which you are using one table to store (instead of 2). I am not familiar enough with your domain to make any concrete statement on that.
Basically I don't see a way to reasonably achieve what you are asking for with this structure, as the separate rows (original + clones) are never a part of a group (so you can't use the technique shown in the article I linked you to).

You might want to have a look at this article about versioning with MySQL - http://www.adaniels.nl/articles/versioning-mysql-data/
Personally, I wouldn't use all those triggers, but the the basic idea is sound.

Re: Sophisticated versioning with tables

Posted: Wed Feb 24, 2010 11:54 am
by alex.barylski
It sounds to me like the problem is in your schema and not queries. There are appears to be a one-to-many relationship between records and clones / versions, which you are using one table to store (instead of 2).
Maybe there is, but it's the best solution I could come up with that did not involve re-writting a ton of code.

Not sure how I would use two tables, instead of one. While that might make sense, that now means I have to introduce a duplicate/mirror table with a few extra fields for versioning for another 7-8 tables (with the possibility for adding more).

That to me, seems like more work than using a single table with versioning fields.

Experimenting with UNION looks like I can query the same table twice, once using id_primary and once using id_previous, if I could now just GROUP BY id_primary and ensure the latest version was returned I think I would have my solution.

Cheers,
Alex

Re: Sophisticated versioning with tables

Posted: Wed Feb 24, 2010 11:58 am
by Eran
You can query the same table as many times as you wish (self join), but you can't group it since there is no common field to group by.
that now means I have to introduce a duplicate/mirror table with a few extra fields for versioning for another 7-8 tables (with the possibility for adding more).
That's how it's normally done. I implemented a similar structure for the issue tracker I mentioned (we version issue data). True, it is complicated versioning using MySQL as it's not something it's very good at.

You can try the following query on your structure to remove the originals:
[sql]SELECT * FROM sequences LEFT JOIN (    SELECT COUNT(*) AS count,id_previous FROM sequences GROUP BY id_previous) AS children ON children.id_previous=sequences.id_primaryWHERE  (id_task = 0 AND children.count = 0) OR id_task = $id_task[/sql]
Edit:
I'm not sure the count condition should be = 0 or IS NULL, try both