Determining shortest unique substring

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Re: Determining shortest unique substring

Post by Ambush Commander »

I'm sorry if I caused any confusion. By your implementation, 4th post would be like this:

Code: Select all

a
b
c
cd
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Determining shortest unique substring

Post by VladSun »

Ambush Commander wrote:I'm sorry if I caused any confusion. By your implementation, 4th post would be like this:

Code: Select all

a
b
c
cd
I'ts OK - I figured it out :)
But still your previous post and you 4-th post requirements don't match ... I really can't understand what you are trying to achieve :)

PS: I'm not quite sure but *maybe* the strpos() test function I used in the first example is closer to what you are looking for... It seems to me that you are looking for maximum unique substr ...
Last edited by VladSun on Thu Jan 31, 2008 4:51 pm, edited 2 times in total.
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Re: Determining shortest unique substring

Post by Ambush Commander »

Ok, maybe some "purpose" information would be helpful. I'd like to be able to reduce full SHA-1 hashes down to abbreviated, unique versions, i.e. 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 to 86f7e437f. So, let's suppose we have a database of pre-existing hashes, I'd like to retrieve both the hash and the compact version.

So, this means a number of important conditions:

1. Hashes must be calculated on the fly, or re-calculated with every new hash inserted into the table. This is because any new hash added to the database gives the possibility of a new collision and a short ID that is now invalid because it is ambiguous.

2. We must be able to retrieve the original hash with the shortened hash (ideally, we'd be able to get multiple hashes for this function, so we can present a disambiguation request if a user specifies an ambiguous prefix).

3. Abbreviated hashes should have a minimum length, to prevent excessive collisions.

Does that clarify things?
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Determining shortest unique substring

Post by VladSun »

So, you need to minimize the hashes length, while still keeping them unique?
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Re: Determining shortest unique substring

Post by Ambush Commander »

Yep!
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Determining shortest unique substring

Post by VladSun »

Hm... in order to have all of these conditions satisfied you need to store in the DB both the shortened hashes and the full length original hashes ... So, it doesn't make sense to have the shortened version, while you have the full version of the hash (and it is a must).
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Re: Determining shortest unique substring

Post by Ambush Commander »

Unless I have a really efficient way of calculating the shortened hash. Which is why we have this topic. :)
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Determining shortest unique substring

Post by VladSun »

And what will be the advantages of having this shortened version?
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Re: Determining shortest unique substring

Post by Ambush Commander »

It's more convenient for display. In this particular case, we're referring to a distributed revision control system, where an auto increment ID isn't feasible and changesets are referred to using hashes.
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Determining shortest unique substring

Post by VladSun »

As far as I can understand it (please excuse my English :) ) you need to have this "shortened hash" as a version number?
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Re: Determining shortest unique substring

Post by Ambush Commander »

Yes.

If you want to read more about it, see here: http://www.selenic.com/mercurial/wiki/i ... gesetid%29
Post Reply