Code: Select all
$sentence = "how are you
doing today?";
$singleWordRegex = '#[A-Za-z]+#sU';
preg_match_all($singleWordRegex, $sentence, $singleWordArray);
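One thing worth checking in the pattern above: the `U` modifier makes quantifiers ungreedy by default, so `[A-Za-z]+` stops after a single letter and `preg_match_all` returns letters rather than words. A minimal sketch comparing the two behaviours follows, assuming the goal is one array entry per word; the variable names `$lazyMatches` and `$greedyMatches` are only illustrative and not from the original post.

```php
<?php
// Same sample sentence as in the snippet above (line break included).
$sentence = "how are you\ndoing today?";

// With the U (ungreedy) modifier, "+" matches as little as possible,
// so every match is a single letter.
preg_match_all('#[A-Za-z]+#sU', $sentence, $lazyMatches);

// Without U, "+" is greedy and each match is a whole run of letters.
preg_match_all('#[A-Za-z]+#', $sentence, $greedyMatches);

echo count($lazyMatches[0]), " matches with U\n";
echo count($greedyMatches[0]), " matches without U\n";
print_r($greedyMatches[0]);
```

For this sentence the first call yields 19 single-letter matches and the second yields the 5 words `how`, `are`, `you`, `doing`, `today`. The `s` modifier has no effect here because the pattern contains no dot.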
Code: Select all
$words = explode(' ', trim(preg_replace('~\W+~', ' ', $sentence)));
Code: Select all
<?php
$text = file_get_contents('test.txt');

// here are regexes to match paragraphs, sentences and words
$re_paragraph = '/\s*+([^\r\n]++)\s*+/';      // Group 1 contains paragraph
$re_sentence  = '/\s*+([^.?!\r\n]++[.?!]?)/'; // Group 1 contains sentence
$re_word      = '/\b(\w\b|\w[\w\']*\w\b)/';   // Group 1 contains word

$paragraphs = array();
$sentences  = array();
$words      = array();

$paragraph_count = preg_match_all($re_paragraph, $text, $p_matches);
printf("The text has %d paragraphs:\n", $paragraph_count);
for ($i = 0; $i < $paragraph_count; $i++) {
    $paragraphs[] = $p_matches[1][$i];
    $sentence_count = preg_match_all($re_sentence, $p_matches[1][$i], $s_matches);
    printf(" Paragraph %d has %d sentences:\n", $i, $sentence_count);
    for ($j = 0; $j < $sentence_count; $j++) {
        $sentences[] = $s_matches[1][$j];
        $word_count = preg_match_all($re_word, $s_matches[1][$j], $w_matches);
        printf(" Sentence %d has %d words.\n", $j, $word_count);
        for ($k = 0; $k < $word_count; $k++) {
            $words[] = $w_matches[1][$k];
        }
    }
}
printf("The text contains a total of %d paragraphs, %d sentences and %d words.\n",
    count($paragraphs), count($sentences), count($words));
?>
Code: Select all
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.
Nam liber tempor cum soluta nobis eleifend option congue nihil imperdiet doming id quod mazim placerat facer possim assum. Typi non habent claritatem insitam; est usus legentis in iis qui facit eorum claritatem. Investigationes demonstraverunt lectores legere me lius quod ii legunt saepius.
Claritas est etiam processus dynamicus, qui sequitur mutationem consuetudium lectorum. Mirum est notare quam littera gothica, quam nunc putamus parum claram, anteposuerit litterarum formas humanitatis per seacula quarta decima et quinta decima. Eodem modo typi, qui nunc nobis videntur parum clari, fiant sollemnes in futurum. Clarita's est etiam processus dynamicus, qui sequitur mutationem consuetudium lector'um.
Code: Select all
The text has 3 paragraphs:
Paragraph 0 has 2 sentences:
Sentence 0 has 21 words.
Sentence 1 has 43 words.
Paragraph 1 has 3 sentences:
Sentence 0 has 19 words.
Sentence 1 has 14 words.
Sentence 2 has 10 words.
Paragraph 2 has 4 sentences:
Sentence 0 has 10 words.
Sentence 1 has 22 words.
Sentence 2 has 13 words.
Sentence 3 has 10 words.
The text contains a total of 3 paragraphs, 9 sentences and 162 words.