Identifying common sequences in two strings

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
User avatar
aaronhall
DevNet Resident
Posts: 1040
Joined: Tue Aug 13, 2002 5:10 pm
Location: Back in Phoenix, missing the microbrews
Contact:

Identifying common sequences in two strings

Post by aaronhall »

I've got a problem that just surfaced, and I can't immediately see an easy solution... I'm hoping I'm over-thinking this and that someone can see this a different way.

Given two strings:

Code: Select all

$foo = "The name of the cat is Sprinkles. She likes balls of yarn, and has a hairball problem";
$bar = "The old lady knits a cat with a ball of yarn, enjoys sprinkles on her ice cream, and has a hairball problem";
Function f($foo, $bar) returns an array of continuous sequences of 2 or more characters that are common to both strings:

Code: Select all

array('the', 'cat', 'sprinkles', 'of yarn', 'and has a hairball problem') // these would naturally include leading/trailing spaces, but pretend i trimmed them
...in any order. I'm not concerned that it's implemented exactly like this, just that I can identify common sequences of variable length >1. Not certain if this is a regexp solution or not, so I'll post here for now. Can someone shove me in the right direction?
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

Sounds like you'd probalby wanna be looking into implementing something like: http://en.wikipedia.org/wiki/Longest_co ... ce_problem

That is the algorithm (or a close derivative) used in diff'ing files, etc...which is what it sounds like you are trying to do...fortunately LCS is probably one of the easiest of many algorithms you learn in computer science, etc...

Well documented, well explained, many times implemented.

Have fun, hope it helps. :)
User avatar
aaronhall
DevNet Resident
Posts: 1040
Joined: Tue Aug 13, 2002 5:10 pm
Location: Back in Phoenix, missing the microbrews
Contact:

Post by aaronhall »

Hockey, you're awesome... exactly what I was looking for. Never had the privilege of formal CS training, and I had a feeling it was something in 101.

Thanks again... you can has cheezburger :D
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

I can has cheezburger...LOL

I Googled funny cats last night...AWESOME...I love cats...those pictures are cute as hell.
User avatar
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada
Contact:

Post by Kieran Huggins »

It should not be confused with the longest common substring problem (substrings are necessarily contiguous).

So... I can haz potayto chip?
Image
User avatar
aaronhall
DevNet Resident
Posts: 1040
Joined: Tue Aug 13, 2002 5:10 pm
Location: Back in Phoenix, missing the microbrews
Contact:

Post by aaronhall »

Now we're getting somewhere.

Kieran, you can haz fiftee potayto chip.
Post Reply