PHP Developers Network

A community of PHP developers offering assistance, advice, discussion, and friendship.
 

All times are UTC - 5 hours




Posted: Wed Oct 04, 2017 5:29 pm
Forum Contributor

Joined: Wed Jan 18, 2017 4:43 pm
Posts: 197
Folks!

I am trying to add a banned-words filter to a web proxy.
I want to match banned words only as whole words on the loaded page (meta tags, content), not as parts of other words.

So if I am looking for the word "cock", the word "cockerel" should not trigger the filter.

I just tested this code and, as expected, it works, but it burns through a lot of CPU. One moment the page loads; the next it goes grey and shows signs that the page is taking too long to load. And all this on localhost. I can imagine what my web host would do!
So we need a better solution. Any ideas?
How about not checking the loaded page for every banned word? Instead, the script could halt as soon as one banned word is found and echo which banned word it was and where on the page it was found (meta tags, body content, etc.).
Any code suggestions?

Here is what I have so far:

Code:
<?php

/*
ERROR HANDLING
*/

// 1). $curl is going to be data type curl resource.
$curl = curl_init();

// 2). Set cURL options.
curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-words-you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

// 3). Run cURL (execute http request).
$result = curl_exec($curl);
$response = curl_getinfo($curl);

if ($response['http_code'] == '200') {
    // Set banned words.
    $banned_words = array("Prick", "Dick", "***");

    // Separate each word found on the cURL-fetched page.
    $word = explode(" ", $result);

    //var_dump($word);

    // Use < rather than <= so the loop does not read past the last element.
    for ($i = 0; $i < count($word); $i++) {
        foreach ($banned_words as $ban) {
            if (strtolower($word[$i]) == strtolower($ban)) {
                echo "word: $word[$i]<br />";
                echo "Match: $ban<br>";
            } else {
                echo "word: $word[$i]<br />";
                echo "No Match: $ban<br>";
            }
        }
    }
}

// 4). Close cURL resource.
curl_close($curl);


I am told to do it like this:

**Load the page into a string.
Use preg_match with "word boundaries" on the loaded string and loop through your banned words.**
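
The advice above, combined with the idea of halting at the first hit, can be sketched roughly as follows. This is only a minimal sketch under assumptions: the function name findFirstBannedWord and the sample text are made up for illustration, and it is not miknik's code. It loops over the banned words, runs preg_match with \b word boundaries on the whole string, and returns as soon as one word matches, along with the byte offset of the match.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the first banned word that occurs in $text
 * as a whole word, or null if none does. Stops at the first hit.
 */
function findFirstBannedWord(string $text, array $bannedWords): ?array
{
    foreach ($bannedWords as $ban) {
        // preg_quote() escapes regex metacharacters inside the word;
        // \b anchors the match at word boundaries; "i" ignores case.
        $pattern = '/\b' . preg_quote($ban, '/') . '\b/i';

        if (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE) === 1) {
            return [
                'word'   => $ban,      // the list entry that matched
                'found'  => $m[0][0],  // the text as it appears on the page
                'offset' => $m[0][1],  // byte offset within $text
            ];
        }
    }
    return null;
}

// Sample text (an assumption): "cockerel" must not trigger the "cock" entry.
$page = '<p>The cockerel crowed at dawn. What a Prick!</p>';
$hit  = findFirstBannedWord($page, ['cock', 'Prick']);

if ($hit !== null) {
    echo "Banned word '{$hit['word']}' found at offset {$hit['offset']}\n";
}
```

Note that \b only behaves as expected when the banned word starts and ends with a letter, digit, or underscore.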

UPDATE:
I updated my code inserting miknik's codes. It was working fine until I added this line before the cURL:
$banned_words = array("Prick","Dick","***");

Here's the update:

Code:
    <?php
 
    /*
    ERROR HANDLING
    */


    // 1). Set banned words.
    $banned_words = array("Prick","Dick","***");
 
    // 2). $curl is going to be data type curl resource.
    $curl = curl_init();
 
    // 3). Set cURL options.
    curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-
    words-
    you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X'
);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true );
 
    // 4). Run cURL (execute http request).
    $result = curl_exec($curl);
    $response = curl_getinfo( $curl );
 
    if($response['http_code'] == '200' )
             {
                          $regex = '/\b';      // The beginning of the regex string syntax
                          $regex .= implode('\b|\b', $banned_words);      // joins all the
              banned words to the string with correct regex syntax
                          $regex .= '\b/i';    // Adds ending to regex syntax. Final i makes
              it case insensitive
                          $substitute = '****';
                          $cleanresult = preg_replace($regex, $substitute, $result);
                          echo $cleanresult;
             }

      curl_close($curl);

      ?>
 


Why do I now see a completely blank page?


Posted: Thu Oct 05, 2017 5:01 am
Moderator

Joined: Tue Nov 09, 2010 3:39 pm
Posts: 6424
Location: Montreal, Canada
UniqueIdeaMan wrote:
I updated my code inserting miknik's codes. It was working fine until I added this line before the cURL:
$banned_words = array("Prick","Dick","***");

Are you suggesting that simply creating an array broke functionality? Dubious. What sorts of errors are you seeing?

Posted: Thu Oct 05, 2017 7:01 am
Forum Contributor

Celauran wrote:
UniqueIdeaMan wrote:
I updated my code inserting miknik's codes. It was working fine until I added this line before the cURL:
$banned_words = array("Prick","Dick","***");

Are you suggesting that simply creating an array broke functionality? Dubious. What sorts of errors are you seeing?


I get a complete blank page. No error. Error reporting on.
Update:

Code:
<?php

/*
ERROR HANDLING
*/

declare(strict_types=1);
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);


// 1). Set banned words.
$banned_words = array("Prick","Dick","smurf");

// 2). $curl is going to be data type curl resource.
$curl = curl_init();

// 3). Set cURL options.
curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-
words-
you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X'
);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true );

// 4). Run cURL (execute http request).
$result = curl_exec($curl);
$response = curl_getinfo( $curl );

if($response['http_code'] == '200' )
     {
          $regex = '/\b'; // The beginning of the regex string syntax
          $regex .= implode('\b|\b', $banned_words); // joins all the banned words to the string with correct regex syntax
          $regex .= '\b/i'; // Adds ending to regex syntax. Final i makes it case insensitive
          $substitute = 'd';
          $cleanresult = preg_replace($regex, $substitute, $result);
          echo $cleanresult;
     }

  curl_close($curl);

  ?>
 


Posted: Thu Oct 05, 2017 5:00 pm
Forum Contributor

Celauran,

Why don't you run my code from your Notepad++ and see the blank page for yourself?
This is very, very strange!


Posted: Thu Oct 05, 2017 7:49 pm
Moderator

Your echo statement is inside a conditional. Have you checked the response from cURL? Maybe you're not getting a 200.
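
One way to check this is to report the transfer's outcome before the conditional runs, so a failed request or a non-200 status no longer produces a silent blank page. The following is a minimal sketch, not code from this thread: describeFetch is a hypothetical helper name, and it simply takes the values that curl_exec(), curl_getinfo() and curl_error() return.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: summarise a finished cURL transfer.
 * $body  is the return value of curl_exec() (string, or false on failure),
 * $info  is the array returned by curl_getinfo(),
 * $error is the string returned by curl_error().
 */
function describeFetch($body, array $info, string $error): string
{
    if ($body === false) {
        return 'cURL failed: ' . ($error !== '' ? $error : 'unknown error');
    }

    $code = (int) ($info['http_code'] ?? 0);
    if ($code !== 200) {
        return "Unexpected HTTP status: $code";
    }

    return 'OK, received ' . strlen($body) . ' bytes';
}

// Usage with a real handle (needs network access, so left commented out;
// the URL is a placeholder):
// $curl = curl_init('https://example.com/');
// curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// $body = curl_exec($curl);
// echo describeFetch($body, curl_getinfo($curl), curl_error($curl));
// curl_close($curl);

// Simulated outcomes, so the sketch runs without a network connection.
echo describeFetch(false, ['http_code' => 0], 'Could not resolve host'), "\n";
echo describeFetch('<html></html>', ['http_code' => 404], ''), "\n";
```

If the first line of output reports a cURL failure, the problem lies in the request itself (for example the URL string) rather than in the filtering code.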

Posted: Fri Oct 06, 2017 6:16 am
Forum Contributor

I had a word-wrapping problem in my Notepad++. It is sorted now.
This edited code works.

Code:
<?php
/*
ERROR HANDLING
*/
// 1). Set banned words.
$banned_words = array("blow", "nut", "smurf");
// 2). $curl is going to be data type curl resource.
$curl = curl_init();
// 3). Set cURL options.
curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-words-you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true );
// 4). Run cURL (execute http request).
$result = curl_exec($curl);
if (curl_errno($curl)) {
    echo 'Error:' . curl_error($curl);
}
$response = curl_getinfo( $curl );
if($response['http_code'] == '200' )
{
    $regex = '/\b';     
    $regex .= implode('\b|\b', $banned_words);   
    $regex .= '\b/i';
    $substitute = '****';
    $cleanresult = preg_replace($regex, $substitute, $result);
    echo $cleanresult;
}
curl_close($curl);
?>


Original code newbies can grab:
http://phpfiddle.org/main/code/0trx-6fng
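
The first post also asked where on the page a banned word was found (meta tags or body content). The working code above replaces matches across the raw HTML, so it does not report a location. A possible way to do that, sketched here under assumptions, is to parse the page with PHP's DOMDocument and test the meta tags and the body text separately. The function name locateBannedWord and the sample markup are made up for illustration, and the sketch needs the DOM extension.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: report the first banned word and whether it was
 * found in a <meta content="..."> attribute or in the body text.
 */
function locateBannedWord(string $html, array $bannedWords): ?array
{
    if ($html === '' || $bannedWords === []) {
        return null;
    }

    // Escape each word, then build one pattern: \b(?:word1|word2)\b
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $bannedWords);
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Check the meta tags first.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (preg_match($pattern, $meta->getAttribute('content'), $m) === 1) {
            return ['where' => 'meta', 'word' => $m[0]];
        }
    }

    // Then the body text (this includes any inline script text as well).
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null && preg_match($pattern, $body->textContent, $m) === 1) {
        return ['where' => 'body', 'word' => $m[0]];
    }

    return null;
}

// Sample markup (an assumption, not the fetched page).
$sample = '<html><head><meta name="description" content="A smurf story">'
        . '</head><body><p>Nothing here</p></body></html>';
$found = locateBannedWord($sample, ['nut', 'smurf']);

if ($found !== null) {
    echo "Found '{$found['word']}' in {$found['where']}\n";
}
```

In a proxy, $sample would be the $result string returned by curl_exec().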

