Identifying similarities between 2 strings

visionmaster
Forum Contributor
Posts: 139
Joined: Wed Jul 14, 2004 4:06 am

Post by visionmaster »

Hello everyone,

I'm having problems finding a way to solve this one:

$value = "CMC Concept";
$strFirmenname = "CMC    Conceptagentur fuer Marketing  und Communication GmbH";

// Collapse runs of two or more spaces into a single space
$pattern = "/([ ]{2,})/";
$replacement = " ";

$strFirmenname = preg_replace($pattern, $replacement, $strFirmenname);
$value = preg_replace($pattern, $replacement, $value);        

echo "$value<br>";
echo "$strFirmenname<br>";
    
// preg_quote() escapes any regex metacharacters in $value
if (preg_match("/" . preg_quote($value, "/") . "/i", $strFirmenname)) {
    echo "A match was found.<br>";
}
else {
    echo "A match was _not_ found.";
}
Output:
--------
A match was found.

Another input:
----------------
Now suppose $value holds another string:

$value = "CMC Concept - Testsatz steht hier";
$strFirmenname = "CMC Conceptagentur fuer Marketing und Communication GmbH";

=> Logically, 'A match was _not_ found.' is output.

Question:
-----------
How can I find a string that occurs in both $value and $strFirmenname?
Here it would be, e.g., 'CMC Concept'. I actually have no idea how to solve that. I suppose there is no PHP function that does what I need...

Appreciate any help! Thanks!
AGISB
Forum Contributor
Posts: 422
Joined: Fri Jul 09, 2004 1:23 am

Post by AGISB »

Where do you want to start? A single letter is already a string.

Such a function would have to run through thousands of possible combinations, even for short strings like yours.

If you limit the output to portions of, e.g., 4 letters, you could loop through string 2 and check whether a string of 4 or more letters is found in string 1.

If you know which string you are checking for, this is easy. If not, this can be a function that runs for quite some time.
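
A minimal sketch of that idea, for illustration only: slide a 4-character window over one string and keep every window that also occurs in the other. The function name commonChunks() and its parameters are made up for this example, and the comparison is byte-based and case-insensitive.

```php
<?php
// Hypothetical helper: collect every $minLen-character piece of $needleSource
// that also occurs somewhere in $haystack (case-insensitive, byte-based).
function commonChunks(string $haystack, string $needleSource, int $minLen = 4): array
{
    $found = [];
    $last = strlen($needleSource) - $minLen;
    for ($i = 0; $i <= $last; $i++) {
        $chunk = substr($needleSource, $i, $minLen);
        if (stripos($haystack, $chunk) !== false) {
            $found[$chunk] = true; // array keys de-duplicate repeated chunks
        }
    }
    return array_keys($found);
}

$chunks = commonChunks(
    "CMC Conceptagentur fuer Marketing und Communication GmbH",
    "CMC Concept - Testsatz steht hier"
);
print_r($chunks);
```

This only reports fixed-size pieces, so overlapping pieces such as "CMC " and "MC C" still have to be merged if you want the full shared text.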
Last edited by AGISB on Wed Jan 19, 2005 1:46 am, edited 1 time in total.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »


Post by AGISB »

Like levenshtein() and soundex(), similar_text() gives you a number (or, for soundex(), a short code), not the matching text itself.

The way he described it, he wants to find the substring that the two strings share.

On second thought, I am not even sure the problem makes sense at all if the string you are looking for is not known.

You could get a list of many substrings that are equal but have no connection to each other or to the problem at hand.
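
To illustrate the point about the three built-in functions mentioned above, here is a small sketch of what each one returns. The input strings are only sample values; none of the calls hands back the shared substring itself.

```php
<?php
$a = "CMC Concept";
$b = "CMC Conceptagentur";

// similar_text() returns the number of matching characters and can also
// report a similarity percentage through its third (by-reference) argument.
$common = similar_text($a, $b, $percent);

// levenshtein() returns the number of single-character edits needed
// to turn $a into $b.
$distance = levenshtein($a, $b);

// soundex() returns a short phonetic key; words that sound alike in
// English tend to share the same key.
$sameSound = soundex("Robert") === soundex("Rupert");

echo "$common matching characters, distance $distance<br>";
```

All three tell you how alike the strings are, which is useful for ranking candidates, but finding the common text itself needs a separate step.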

Post by feyd »

yeah.. substring lookup isn't too difficult, even in crazy modes... but covering misspellings and things.. that's a different matter. The sound functions are better at handling that kinda stuff for sure, but will take longer to process.

now, if you could get the strings in phonemes.. then you're talking much easier work with fuzzy matching. :)

Post by visionmaster »

Thanks all for your comments and tips!

I would like to identify the longest string that occurs in both; for my example above, that is 'CMC Concept'.

Another example would be:

$value = strtolower("Laborgeraeteboerse GmbH - Labor- und Analysengeraete mit Garantie zu unschlagbar guenstigen Preisen");
$strFirmenname = strtolower("LABORGERAETEBOERSE Handelsgesellschaft für Analysensysteme mbH");

-> Here the match would be the substring 'Laborgeraeteboerse'

If I know what my substring is, then finding it is easy, but how do I find the longest substring common to both strings?

Thanks for your help!
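
One possible way to get the longest shared substring is the classic dynamic-programming approach, sketched below. This is an illustration, not a built-in PHP feature: the function name longestCommonSubstring() is made up, the comparison is case-insensitive and byte-based (so multi-byte characters like "ü" are compared byte by byte), and if there are several matches of the same length the first one found wins.

```php
<?php
// Hypothetical helper: longest run of characters that appears in both strings.
function longestCommonSubstring(string $a, string $b): string
{
    $la = strtolower($a);
    $lb = strtolower($b);
    $lenA = strlen($la);
    $lenB = strlen($lb);

    // $prev[$j] = length of the common run ending at $a[$i-2] and $b[$j-1]
    $prev = array_fill(0, $lenB + 1, 0);
    $bestLen = 0;
    $bestEnd = 0; // end offset (exclusive) of the best run inside $a

    for ($i = 1; $i <= $lenA; $i++) {
        $curr = array_fill(0, $lenB + 1, 0);
        for ($j = 1; $j <= $lenB; $j++) {
            if ($la[$i - 1] === $lb[$j - 1]) {
                $curr[$j] = $prev[$j - 1] + 1; // extend the diagonal run
                if ($curr[$j] > $bestLen) {
                    $bestLen = $curr[$j];
                    $bestEnd = $i;
                }
            }
        }
        $prev = $curr;
    }

    // Return the match with the original casing of $a
    return substr($a, $bestEnd - $bestLen, $bestLen);
}

$first = longestCommonSubstring(
    "CMC Concept - Testsatz steht hier",
    "CMC Conceptagentur fuer Marketing und Communication GmbH"
);
$second = trim(longestCommonSubstring(
    "Laborgeraeteboerse GmbH - Labor- und Analysengeraete mit Garantie zu unschlagbar guenstigen Preisen",
    "LABORGERAETEBOERSE Handelsgesellschaft für Analysensysteme mbH"
));
echo "$first<br>$second<br>";
```

The trim() in the second call is there because the raw match includes the space that follows the word in both strings. Running time grows with the product of the two string lengths, which is fine for company names but worth keeping in mind for long texts.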

Post by feyd »

I can think of a few ways:

common part
  1. convert to lower case, and replace all extended characters with their major character.. like ü -> u
  2. toss certain characters like dashes and things of that nature

  1. break the sentences up into their words, finding the common words. Or rather, toss the nonmatching ones.
  2. know how they were mapped so you can attempt to display the common parts.

  1. you could potentially build the matching regular expression by using another regular expression to convert the string into a usable pattern
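
The normalise-then-compare-words idea from the list above could look roughly like this. Treat it as a sketch under assumptions: normalizeWords() and commonWords() are invented names, the character map covers only a few German letters and folds them to "ae"/"oe"/"ue"/"ss" (to match the "fuer" spelling used earlier in the thread) rather than to a single base letter, and the source file is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: fold a few German characters, lower-case,
// drop punctuation such as dashes, and split into words.
function normalizeWords(string $s): array
{
    $s = strtr($s, [
        'Ä' => 'ae', 'Ö' => 'oe', 'Ü' => 'ue',
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
    ]);
    $s = strtolower($s);
    $s = preg_replace('/[^a-z0-9]+/', ' ', $s); // anything else becomes a space
    return preg_split('/\s+/', trim($s), -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: keep only the words of $a that also occur in $b.
function commonWords(string $a, string $b): array
{
    return array_values(array_intersect(normalizeWords($a), normalizeWords($b)));
}

$words = commonWords(
    "Laborgeraeteboerse GmbH - Labor- und Analysengeraete mit Garantie zu unschlagbar guenstigen Preisen",
    "LABORGERAETEBOERSE Handelsgesellschaft für Analysensysteme mbH"
);
print_r($words);
```

Because this compares whole words, it would report only "cmc" for the first example in the thread: "Concept" and "Conceptagentur" are different words, so a substring-based comparison is still needed for that case.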