Identifying similarities between 2 strings

Posted: Tue Jan 18, 2005 1:51 pm
by visionmaster
Hello everyone,

I'm having problems finding a way to solve this one:

Code:

$value = "CMC Concept";
$strFirmenname = "CMC    Conceptagentur fuer Marketing  und Communication GmbH";

// Get rid of more than one space
$pattern = "/([ ]{2,})/";
$replacement = " ";

$strFirmenname = preg_replace($pattern, $replacement, $strFirmenname);
$value = preg_replace($pattern, $replacement, $value);        

echo "$value<br>";
echo "$strFirmenname<br>";
    
// preg_quote() escapes regex metacharacters so $value is matched literally
if (preg_match("/" . preg_quote($value, "/") . "/i", $strFirmenname)) {
    echo "A match was found.<br>";
}
else {
    echo "A match was _not_ found.";
}
Output:
--------
A match was found.

Another input:
----------------
Now suppose $value holds another string:

$value = "CMC Concept - Testsatz steht hier";
$strFirmenname = "CMC Conceptagentur fuer Marketing und Communication GmbH";

=> As expected, 'A match was _not_ found.' is output.

Question:
-----------
How can I find a string that occurs in both $value and $strFirmenname?
Here it would be, e.g., 'CMC Concept'. I actually have no idea how to solve that. I suppose there is no built-in PHP function that does what I need...

Appreciate any help! Thanks!

Posted: Tue Jan 18, 2005 2:07 pm
by AGISB
Where do you want to start? One letter is a string.

This function would have to run through thousands of possible combinations, even for short strings like yours.

If you limit the output to portions of, e.g., 4 letters, you could loop through string 2 and check whether a string of 4 or more letters is found in string 1.

If you know which string you are checking for, this is easy. If not, this can be a function that runs for quite some time.
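
A minimal sketch of the loop described above, assuming a minimum length of 4 and a greedy left-to-right scan that reports non-overlapping chunks. The function name commonChunks() and its parameters are made up for illustration; nothing here is a built-in PHP feature beyond strpos(), substr() and strlen().

```php
<?php
// Hypothetical helper: walk through $string2 from left to right and
// collect chunks of at least $minLen characters that also occur in
// $string1. Each hit is extended as far as possible, then the scan
// continues behind it (greedy, non-overlapping).
function commonChunks(string $string1, string $string2, int $minLen = 4): array
{
    $found = [];
    $n = strlen($string2);
    $i = 0;

    while ($i + $minLen <= $n) {
        $len = $minLen;

        // No chunk of the minimum length starts here: move on by one.
        if (strpos($string1, substr($string2, $i, $len)) === false) {
            $i++;
            continue;
        }

        // Extend the chunk while the longer version is still found.
        while ($i + $len < $n
            && strpos($string1, substr($string2, $i, $len + 1)) !== false) {
            $len++;
        }

        $found[] = substr($string2, $i, $len);
        $i += $len;
    }

    return $found;
}

$value = "CMC Concept - Testsatz steht hier";
$strFirmenname = "CMC Conceptagentur fuer Marketing und Communication GmbH";

print_r(commonChunks($strFirmenname, $value));
```

With the two strings from the first post this reports a single chunk, 'CMC Concept'. The repeated strpos() calls make it slow on long strings, which is the running-time concern raised above.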

Posted: Tue Jan 18, 2005 2:12 pm
by feyd

Posted: Wed Jan 19, 2005 1:45 am
by AGISB
Like levenshtein(), similar_text() only gives you a number (and soundex() gives you a short key), not the matching text itself.

The way he described it, he wants to find the substring that the two strings have in common.

On second thought, I am not even sure the problem makes any sense at all if the string you are looking for is not known.

You could get a list of many common substrings that have no connection to each other or to the problem at hand.
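
To illustrate the point, here is a small sketch that calls the three functions on the strings from the first post. It only uses the documented built-ins; the variable names other than $value and $strFirmenname are made up. Each call yields a score or a key, never the shared text.

```php
<?php
$value = "CMC Concept - Testsatz steht hier";
$strFirmenname = "CMC Conceptagentur fuer Marketing und Communication GmbH";

// similar_text() returns the number of matching characters; the optional
// third argument receives the similarity as a percentage.
$matching = similar_text($value, $strFirmenname, $percent);

// levenshtein() returns the minimum number of single-character
// insertions, deletions and replacements between the two strings.
$distance = levenshtein($value, $strFirmenname);

// soundex() returns a four-character key (a letter plus three digits).
$keyValue = soundex($value);
$keyFirma = soundex($strFirmenname);

printf("similar_text: %d matching characters (%.1f%%)\n", $matching, $percent);
printf("levenshtein:  %d edits\n", $distance);
printf("soundex:      %s vs %s\n", $keyValue, $keyFirma);
```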

Posted: Wed Jan 19, 2005 1:50 am
by feyd
Yeah, substring lookup isn't too difficult, even in crazy modes, but covering misspellings and things like that is a different matter. The sound functions are better at handling that kinda stuff for sure, but will take longer to process.

Now, if you could get the strings in phonemes, then you're talking much easier work with fuzzy matching. :)
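
A rough sketch of how the sound functions could be used for misspellings, assuming that "sounds alike" means identical soundex() or metaphone() keys. The helper soundsAlike() and the test words are made up for illustration; comparing word by word like this is one possible reading of the idea, not the only one.

```php
<?php
// Hypothetical helper: two words count as sounding alike when either
// their soundex() keys or their metaphone() keys are identical.
function soundsAlike(string $a, string $b): bool
{
    return soundex($a) === soundex($b)
        || metaphone($a) === metaphone($b);
}

// A correctly spelled word, a misspelled variant, and an unrelated word.
$pairs = [
    ["Communication", "Comunication"],
    ["Communication", "Marketing"],
];

foreach ($pairs as [$left, $right]) {
    printf(
        "%s / %s: soundex %s vs %s, metaphone %s vs %s, alike: %s\n",
        $left,
        $right,
        soundex($left),
        soundex($right),
        metaphone($left),
        metaphone($right),
        soundsAlike($left, $right) ? "yes" : "no"
    );
}
```

Both functions were designed around English pronunciation, so results on German company names should be checked before relying on them.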

Posted: Wed Jan 19, 2005 3:00 am
by visionmaster
Thanks all for your comments and tips!

I would like to identify the longest string that occurs in both strings; in my example above that is 'CMC Concept'.

Another example would be

$value = strtolower("Laborgeraeteboerse GmbH - Labor- und Analysengeraete mit Garantie zu unschlagbar guenstigen Preisen");
$strFirmenname = strtolower("LABORGERAETEBOERSE Handelsgesellschaft für Analysensysteme mbH");

-> Here the match would be the substring 'Laborgeraeteboerse'

If I know what my substring is, then finding it is easy, but how do I find the longest substring that occurs in both strings?

Thanks for your help!
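
One common way to get the longest shared substring is dynamic programming over both strings. The sketch below is a minimal version under a few assumptions: the function name longestCommonSubstring() is made up, comparison is byte-wise and case-sensitive (hence the strtolower() calls from the example above), and when several substrings share the maximum length only the first one found is returned.

```php
<?php
// Hypothetical helper: longest common substring via dynamic programming.
// $curr[$j] holds the length of the common run that ends at $a[$i - 1]
// and $b[$j - 1]. Only the previous row is kept to save memory.
function longestCommonSubstring(string $a, string $b): string
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    $best = 0;   // length of the best run found so far
    $endA = 0;   // index in $a just past the end of that run
    $prev = array_fill(0, $lenB + 1, 0);

    for ($i = 1; $i <= $lenA; $i++) {
        $curr = array_fill(0, $lenB + 1, 0);
        for ($j = 1; $j <= $lenB; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Extend the run that ended one character earlier.
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > $best) {
                    $best = $curr[$j];
                    $endA = $i;
                }
            }
        }
        $prev = $curr;
    }

    return substr($a, $endA - $best, $best);
}

$value = strtolower("Laborgeraeteboerse GmbH - Labor- und Analysengeraete mit Garantie zu unschlagbar guenstigen Preisen");
$strFirmenname = strtolower("LABORGERAETEBOERSE Handelsgesellschaft für Analysensysteme mbH");

// trim() drops the space that follows the name in both strings.
echo trim(longestCommonSubstring($value, $strFirmenname)), "\n";
```

The running time grows with strlen($a) * strlen($b), which is fine for company names but worth keeping in mind for long texts.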

Posted: Wed Jan 19, 2005 3:14 am
by feyd
I can think of a few ways:

common part
  1. convert to lower case, and replace all extended characters with their base character, e.g. ü -> u
  2. toss certain characters like dashes and things of that nature

  1. break the sentences up into their words and find the common words. Or rather, toss the nonmatching ones.
  2. know how they were mapped so you can attempt to display the common parts.

  1. there's a potential to build a regular expression that handles the matching, using a regular expression to convert the string into a usable pattern
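
The ideas in this list could be sketched roughly as follows. All three function names are made up for illustration. Two assumptions differ from a literal reading of the list: umlauts are folded to 'ae', 'oe', 'ue' because the sample data in this thread is spelled that way (rather than ü -> u), and the regular expression only looks for a run of leading words, which is just one way to turn a string into a pattern.

```php
<?php
// Hypothetical helper: fold a few German extended characters, lower-case
// the string, and replace dashes and other punctuation with single spaces.
// Assumes the input is UTF-8 encoded.
function normalizeName(string $s): string
{
    $s = strtr($s, [
        'Ä' => 'ae', 'Ö' => 'oe', 'Ü' => 'ue',
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
    ]);
    $s = strtolower($s);
    $s = preg_replace('/[^a-z0-9]+/', ' ', $s);
    return trim($s);
}

// Hypothetical helper: words of $a that also appear in $b, in $a's order.
function commonWords(string $a, string $b): array
{
    $wordsA = preg_split('/\s+/', normalizeName($a), -1, PREG_SPLIT_NO_EMPTY);
    $wordsB = preg_split('/\s+/', normalizeName($b), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_intersect($wordsA, $wordsB));
}

// Hypothetical helper: turn "w1 w2 w3" into /^w1(?: w2(?: w3)?)?/ so the
// pattern matches as many leading words of the string as possible.
function leadingWordsPattern(string $s): string
{
    $words = preg_split('/\s+/', normalizeName($s), -1, PREG_SPLIT_NO_EMPTY);
    $pattern = '';
    // Build from the last word inwards so each word wraps the rest.
    foreach (array_reverse($words) as $i => $word) {
        $quoted = preg_quote($word, '/');
        $pattern = $i === 0 ? $quoted : $quoted . '(?: ' . $pattern . ')?';
    }
    return '/^' . $pattern . '/';
}

$value = "CMC Concept - Testsatz steht hier";
$strFirmenname = "CMC Conceptagentur fuer Marketing und Communication GmbH";

print_r(commonWords($value, $strFirmenname));

if (preg_match(leadingWordsPattern($value), normalizeName($strFirmenname), $m)) {
    echo $m[0], "\n";
}
```

On these two strings the word comparison only yields 'cmc', because 'concept' is a prefix of 'conceptagentur' rather than a whole word. The pattern is not bound to word ends, so it matches 'cmc concept'.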