Page 1 of 1

Identifying common sequences in two strings

Posted: Tue Oct 23, 2007 12:19 am
by aaronhall
I've got a problem that just surfaced, and I can't immediately see an easy solution... I'm hoping I'm over-thinking this and that someone can see this a different way.

Given two strings:

Code: Select all

$foo = "The name of the cat is Sprinkles. She likes balls of yarn, and has a hairball problem";
$bar = "The old lady knits a cat with a ball of yarn, enjoys sprinkles on her ice cream, and has a hairball problem";
Function f($foo, $bar) returns an array of continuous sequences of 2 or more characters that are common to both strings:

Code: Select all

array('the', 'cat', 'sprinkles', 'of yarn', 'and has a hairball problem') // these would naturally include leading/trailing spaces, but pretend i trimmed them
...in any order. I'm not concerned that it's implemented exactly like this, just that I can identify common sequences of variable length >1. Not certain if this is a regexp solution or not, so I'll post here for now. Can someone shove me in the right direction?

Posted: Tue Oct 23, 2007 1:27 am
by alex.barylski
Sounds like you'd probalby wanna be looking into implementing something like: http://en.wikipedia.org/wiki/Longest_co ... ce_problem

That is the algorithm (or a close derivative) used in diff'ing files, etc...which is what it sounds like you are trying to do...fortunately LCS is probably one of the easiest of many algorithms you learn in computer science, etc...

Well documented, well explained, many times implemented.

Have fun, hope it helps. :)

Posted: Tue Oct 23, 2007 1:31 am
by aaronhall
Hockey, you're awesome... exactly what I was looking for. Never had the privilege of formal CS training, and I had a feeling it was something in 101.

Thanks again... you can has cheezburger :D

Posted: Tue Oct 23, 2007 1:40 am
by alex.barylski
I can has cheezburger...LOL

I Googled funny cats last night...AWESOME...I love cats...those pictures are cute as hell.

Posted: Tue Oct 23, 2007 3:02 am
by Kieran Huggins
It should not be confused with the longest common substring problem (substrings are necessarily contiguous).

So... I can haz potayto chip?
Image

Posted: Tue Oct 23, 2007 3:37 am
by aaronhall
Now we're getting somewhere.

Kieran, you can haz fiftee potayto chip.