Page 2 of 2
Re: Determining shortest unique substring
Posted: Thu Jan 31, 2008 4:35 pm
by Ambush Commander
I'm sorry if I caused any confusion. By your implementation, 4th post would be like this:
Re: Determining shortest unique substring
Posted: Thu Jan 31, 2008 4:38 pm
by VladSun
Ambush Commander wrote:I'm sorry if I caused any confusion. By your implementation, 4th post would be like this:
I'ts OK - I figured it out

But still your previous post and you 4-th post requirements don't match ... I really can't understand what you are trying to achieve
PS: I'm not quite sure but *maybe* the strpos() test function I used in the first example is closer to what you are looking for... It seems to me that you are looking for maximum unique substr ...
Re: Determining shortest unique substring
Posted: Thu Jan 31, 2008 4:49 pm
by Ambush Commander
Ok, maybe some "purpose" information would be helpful. I'd like to be able to reduce full SHA-1 hashes down to abbreviated, unique versions, i.e. 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 to 86f7e437f. So, let's suppose we have a database of pre-existing hashes, I'd like to retrieve both the hash and the compact version.
So, this means a number of important conditions:
1. Hashes must be calculated on the fly, or re-calculated with every new hash inserted into the table. This is because any new hash added to the database gives the possibility of a new collision and a short ID that is now invalid because it is ambiguous.
2. We must be able to retrieve the original hash with the shortened hash (ideally, we'd be able to get multiple hashes for this function, so we can present a disambiguation request if a user specifies an ambiguous prefix).
3. Abbreviated hashes should have a minimum length, to prevent excessive collisions.
Does that clarify things?
Re: Determining shortest unique substring
Posted: Thu Jan 31, 2008 4:57 pm
by VladSun
So, you need to minimize the hashes length, while still keeping them unique?
Re: Determining shortest unique substring
Posted: Thu Jan 31, 2008 5:01 pm
by Ambush Commander
Yep!
Re: Determining shortest unique substring
Posted: Thu Jan 31, 2008 5:39 pm
by VladSun
Hm... in order to have all of these conditions satisfied you need to store in the DB both the shortened hashes and the full length original hashes ... So, it doesn't make sense to have the shortened version, while you have the full version of the hash (and it is a must).
Re: Determining shortest unique substring
Posted: Thu Jan 31, 2008 5:41 pm
by Ambush Commander
Unless I have a really efficient way of calculating the shortened hash. Which is why we have this topic.

Re: Determining shortest unique substring
Posted: Thu Jan 31, 2008 5:43 pm
by VladSun
And what will be the advantages of having this shortened version?
Re: Determining shortest unique substring
Posted: Thu Jan 31, 2008 5:50 pm
by Ambush Commander
It's more convenient for display. In this particular case, we're referring to a distributed revision control system, where an auto increment ID isn't feasible and changesets are referred to using hashes.
Re: Determining shortest unique substring
Posted: Thu Jan 31, 2008 5:55 pm
by VladSun
As far as I can understand it (please excuse my English

) you need to have this "shortened hash" as a version number?
Re: Determining shortest unique substring
Posted: Thu Jan 31, 2008 6:00 pm
by Ambush Commander