Help needed!!
Hi there
I need to download stock quotes at the end of day from a .txt file. Below is a sample of that file:
[text]
TOTAL TRANSACTIONS
A. NO. OF TRADES : 281001
B. VOLUME(Nos.) : 147060609
C. VALUE(Tk) : 17173664225.85
MARKET CAPITALISATION
1. EQUITY : 3009297651665.00
2. MUTUAL FUND : 44922595000.00
3. DEBT SECURITIES : 426562182500.00
TOTAL : 3480782429165.00
PRICES IN PUBLIC TRANSACTIONS : 2010-12-12
==========================================
A Group (Equity)
----------------
Instr Code Open High Low Close %Chg Trade Volume Value(Mn)
ABBANK 1605.00 1639.00 1505.00 1519.25 -6.13 8730 310400 479.608
ACI 380.00 380.00 368.00 372.80 -.16 205 23050 8.649
ACIFORMULA 143.50 146.80 139.00 140.50 -2.29 245 48700 6.896
AFTABAUTO 482.00 522.00 482.00 506.80 6.60 10209 1504600 759.390
AGNISYSL 71.90 71.90 65.90 67.00 -3.31 500 540000 36.615
AGRANINS 1073.00 1073.00 925.00 954.00 -8.59 418 31100 30.854
AL-HAJTEX 70.00 70.00 67.00 67.70 -.87 213 33750 2.310
ALARABANK 67.00 67.00 62.10 62.60 -3.98 2326 1630250 102.832
AMCL(PRAN) 1861.00 1861.00 1801.00 1817.75 -1.83 104 2100 3.860
APEXADELFT 4390.00 4390.00 4181.00 4213.25 -1.95 192 6340 27.058[/text]
I need a regular expression so that only the quotes, as in this case,
[text]ABBANK 1605.00 1639.00 1505.00 1519.25 -6.13 8730 310400 479.608
ACI 380.00 380.00 368.00 372.80 -.16 205 23050 8.649
ACIFORMULA 143.50 146.80 139.00 140.50 -2.29 245 48700 6.896
AFTABAUTO 482.00 522.00 482.00 506.80 6.60 10209 1504600 759.390
AGNISYSL 71.90 71.90 65.90 67.00 -3.31 500 540000 36.615
AGRANINS 1073.00 1073.00 925.00 954.00 -8.59 418 31100 30.854
AL-HAJTEX 70.00 70.00 67.00 67.70 -.87 213 33750 2.310
ALARABANK 67.00 67.00 62.10 62.60 -3.98 2326 1630250 102.832
AMCL(PRAN) 1861.00 1861.00 1801.00 1817.75 -1.83 104 2100 3.860
APEXADELFT 4390.00 4390.00 4181.00 4213.25 -1.95 192 6340 27.058[/text]
so that only these can be separated from the .txt file into a string or an array.
Any help will be appreciated.
I am uploading the full .txt file here.
Thanks in advance.
- Attachments
- mst12-12-10.rar (12.02 KiB)
- ridgerunner
- Forum Contributor
- Posts: 214
- Joined: Sun Jul 05, 2009 10:39 pm
- Location: SLC, UT
Re: Help needed!!
This regex matches all the entries in your test file and captures each column into a separate group:
[text]^([\w\-()&]+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+(\d+)[ ]+(\d+)[ ]+([+\-]?\d*\.\d+)$[/text]
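To make the pattern easier to read, here is a minimal sketch that assembles the same regex from three reusable pieces and tries it on one line from the sample. The variable names ($code, $decimal, $integer, $columns) are illustrative only and are not part of the original answer; the '/' delimiters and the 'm' modifier are added on the assumption that the pattern will be used with PHP's preg_*() functions.

```php
<?php
// Building blocks, one capture group per column.
$code    = '([\w\-()&]+)';      // instrument code: word chars, '-', '(', ')', '&'
$decimal = '([+\-]?\d*\.\d+)';  // signed decimal; leading digits optional (e.g. -.16)
$integer = '(\d+)';             // unsigned integer (Trade, Volume)

// Column order: code, Open, High, Low, Close, %Chg, Trade, Volume, Value(Mn).
$columns = [$code, $decimal, $decimal, $decimal, $decimal, $decimal,
            $integer, $integer, $decimal];

// Columns are separated by one or more spaces; '^' and '$' anchor a whole line.
$re = '/^' . implode('[ ]+', $columns) . '$/m';

$line = 'AMCL(PRAN) 1861.00 1861.00 1801.00 1817.75 -1.83 104 2100 3.860';
if (preg_match($re, $line, $m)) {
    echo $m[1], ' closed at ', $m[5], "\n";
}
```

Because every column must be present and numeric, heading lines such as "Instr Code Open High ..." do not match and are skipped automatically.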
Re: Help needed!!
Thanks ridgerunner for your help.
Regards
Re: Help needed!!
Hi ridgerunner,
I want to extract 6807.15313 and -120.34780 from:
[text]ALL SHARES PRICE INDEX (DSI) 6807.15313 -120.34780 -1.7372469[/text]
I used the following regex:
Code: Select all
preg_match('^ALL\s+SHARES\s+PRICE\s+INDEX\s+\(DSI\)\s+([+\-]?\d*\.\d+)\s+([+\-]?\d*\.\d+)',$data,$match);
echo $match[1];
where $data holds the contents of the .txt file (the subject string). But after running the PHP script, the following warning is shown:
Warning: preg_match() [function.preg-match]: No ending delimiter '^' found in /public_html/extract.php on line 31
Please correct me where I am wrong.
You can find a similar line in the file attached to the first post of this thread.
Re: Help needed!!
One more thing.
Suppose I execute this code:
Code: Select all
if (preg_match("/\bALL SHARES PRICE INDEX (DSI)\b/i", $data)) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
PHP shows "A match was not found".
But if I remove the "(DSI)" from the pattern so that it looks like this:
Code: Select all
if (preg_match("/\bALL SHARES PRICE INDEX\b/i", $data)) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
then PHP shows "A match was found".
So I think the problem is in the "(DSI)", i.e. my pattern is unable to detect (DSI), maybe because of its opening or closing bracket ["(" or ")"]. How do I correctly escape the brackets, or why does the pattern fail to match only with "(DSI)"?
- ridgerunner
Re: Help needed!!
First let me address your questions...
infomamun wrote: ...but after running the php scripts following error is showing:
Warning: preg_match() [function.preg-match]: No ending delimiter '^' found in /public_html/extract.php on line 31
Please correct me where I am wrong. ...
As you probably already figured out, to use the PHP preg_*() functions, a regex pattern must be enclosed within /matching delimiters/. The delimiter can be any character that is not alphanumeric, a backslash, or whitespace; the forward slash '/' is frequently used for this purpose: e.g. '/regex/'. Your regex was not enclosed in delimiters, so PHP took the leading '^' as the opening delimiter and could not find a closing '^'. This is why you got the warning. You can also add pattern modifiers (such as 'i'=ignore-case-mode, 'm'=multi-line-mode, 's'=single-line-mode or 'x'=free-spacing-mode) after the closing delimiter: e.g. '/ReGeX/i'.
infomamun wrote: One more thing.
suppose I execute this code:
Code: Select all
if (preg_match("/\bALL SHARES PRICE INDEX (DSI)\b/i", $data)) { echo "A match was found."; } else { echo "A match was not found."; }
php shows "A match was not found".
but if I remove the "(DSI)" from the pattern so that it looks like this:
Code: Select all
if (preg_match("/\bALL SHARES PRICE INDEX\b/i", $data)) { echo "A match was found."; } else { echo "A match was not found."; }
Then php shows "A match was found".
So I think problem is in the "(DSI)" ,i.e my pattern is unable to detect (DSI) may be due to it's opening or closing bracket["(" or ")"]. How to correctly escape it's bracket or why only for "(DSI)" pattern is not matching?
Yes, the parentheses '()' are metacharacters and need to be escaped. You escape them with a leading backslash like so: '\(DSI\)'. There are about a dozen or so metacharacters which have special meaning in a regular expression and which need to be escaped when you want to match them literally. (And if your regex needs to use the delimiter character, it too needs to be escaped.) Note that there are two sets of regex metacharacters: those that apply within a character class, and those that apply outside a character class.
Regular expressions are quite powerful, but using them does require learning the syntax and a bit of practice. I would highly recommend reading the tutorial at http://www.regular-expressions.info/ to learn the basics and to also read the man-pages for the PHP preg_*() functions (see: http://php.net/manual/en/book.pcre.php).
To illustrate how to extract specific data from your data file, here is a complete script which uses the preg_match_all() function with the regex I provided earlier to extract and print all the columns:
Code: Select all
<?php
// Quote-line regex from the earlier post, wrapped in '/' delimiters
// and given the 'm' (multi-line) modifier.
$re = '/^([\w\-()&]+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+(\d+)[ ]+(\d+)[ ]+([+\-]?\d*\.\d+)$/m';
// Short excerpt of the data file, used as test input.
$data = '
PRICES IN PUBLIC TRANSACTIONS : 2010-12-12
==========================================
A Group (Equity)
----------------
Instr Code Open High Low Close %Chg Trade Volume Value(Mn)
ABBANK 1605.00 1639.00 1505.00 1519.25 -6.13 8730 310400 479.608
ACI 380.00 380.00 368.00 372.80 -.16 205 23050 8.649
ACIFORMULA 143.50 146.80 139.00 140.50 -2.29 245 48700 6.896
AFTABAUTO 482.00 522.00 482.00 506.80 6.60 10209 1504600 759.390
AGNISYSL 71.90 71.90 65.90 67.00 -3.31 500 540000 36.615
';
// With PREG_SET_ORDER, $matches[$i] holds all capture groups of the i-th matched line.
$record_count = preg_match_all($re, $data, $matches, PREG_SET_ORDER);
printf("Record matches count = %d\n", $record_count);
for ($i = 0; $i < $record_count; $i++) {
    printf("Match %d of %d:\n", $i + 1, $record_count);
    printf("\tInstr Code = %12s\n", $matches[$i][1]);
    printf("\tOpen = %12.2f\n", (float)$matches[$i][2]);
    printf("\tHigh = %12.2f\n", (float)$matches[$i][3]);
    printf("\tLow = %12.2f\n", (float)$matches[$i][4]);
    printf("\tClose = %12.2f\n", (float)$matches[$i][5]);
    printf("\t%%Chg = %12.2f\n", (float)$matches[$i][6]);
    printf("\tTrade = %9d\n", (int)$matches[$i][7]);
    printf("\tVolume = %9d\n", (int)$matches[$i][8]);
    printf("\tValue(Mn) = %13.3f\n", (float)$matches[$i][9]);
}
?>
Note that I have used the 'm'=multi-line-mode modifier with the regex. (It won't work without it because the regex uses the '^' beginning-of-line and '$' end-of-line anchor metacharacters. Without the 'm' modifier, '^' and '$' only match at the beginning and end of the entire string, but when 'm' is specified, they match at the beginning and end of every line.) Here is the output from the script:
[text]Record matches count = 5
Match 1 of 5:
Instr Code = ABBANK
Open = 1605.00
High = 1639.00
Low = 1505.00
Close = 1519.25
%Chg = -6.13
Trade = 8730
Volume = 310400
Value(Mn) = 479.608
Match 2 of 5:
Instr Code = ACI
Open = 380.00
High = 380.00
Low = 368.00
Close = 372.80
%Chg = -0.16
Trade = 205
Volume = 23050
Value(Mn) = 8.649
Match 3 of 5:
Instr Code = ACIFORMULA
Open = 143.50
High = 146.80
Low = 139.00
Close = 140.50
%Chg = -2.29
Trade = 245
Volume = 48700
Value(Mn) = 6.896
Match 4 of 5:
Instr Code = AFTABAUTO
Open = 482.00
High = 522.00
Low = 482.00
Close = 506.80
%Chg = 6.60
Trade = 10209
Volume = 1504600
Value(Mn) = 759.390
Match 5 of 5:
Instr Code = AGNISYSL
Open = 71.90
High = 71.90
Low = 65.90
Close = 67.00
%Chg = -3.31
Trade = 500
Volume = 540000
Value(Mn) = 36.615[/text]
Hope this helps!
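As a small illustration of the two points about delimiters and escaping, here is a sketch that runs against the single index line from the question. The variable names are illustrative, and using preg_quote() to do the escaping is a suggestion added here rather than something from the posts above. The third test also shows why a trailing \b after the closing parenthesis still prevents a match.

```php
<?php
// One line copied from the question; a real file would contain many lines.
$data = "ALL SHARES PRICE INDEX (DSI) 6807.15313 -120.34780 -1.7372469\n";

// Unescaped: "(DSI)" is a capturing group that matches the bare text "DSI",
// so the regex looks for "INDEX DSI" and does not match this line.
$unescaped = preg_match('/ALL SHARES PRICE INDEX (DSI)/i', $data);

// Escaped by hand: "\(" and "\)" match literal parentheses.
$escaped = preg_match('/ALL SHARES PRICE INDEX \(DSI\)/i', $data);

// Escaped, but followed by \b: ")" and the space after it are both non-word
// characters, so there is no word boundary between them and the match fails.
$withBoundary = preg_match('/ALL SHARES PRICE INDEX \(DSI\)\b/i', $data);

// preg_quote() escapes the metacharacters (and the '/' delimiter) for us.
$label = preg_quote('ALL SHARES PRICE INDEX (DSI)', '/');
$re = '/^' . $label . '\s+([+\-]?\d*\.\d+)\s+([+\-]?\d*\.\d+)/m';
if (preg_match($re, $data, $match)) {
    echo $match[1], ' ', $match[2], "\n";
}
```

With this input, only the hand-escaped pattern and the preg_quote() pattern match; the last one captures the two requested numbers in $match[1] and $match[2].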
Re: Help needed!!
Many thanks, ridgerunner. You explained everything very clearly and spent your valuable time solving my problem. I am now improving more quickly, and my earlier matching problem has been fully resolved. I had actually been visiting http://www.regular-expressions.info/ to learn regex before you mentioned it, but there were gaps in my knowledge, such as multi-line mode, which is why I did not succeed. Your description helped me there. I knew about escaping metacharacters with a backslash, but in my previous script I used a word boundary (\b) together with the backslash escape, which is why no match was found; the script I posted here was the version without the escaping. That has been resolved too.
I am also grateful to you for providing a full script for extracting my required data. But I am not familiar with %d, %12s, %12.2f, %9d, etc. Would you please describe them, or point me to a source where I can learn how to use these special codes?
Thanks in advance.
Regards
- ridgerunner
Re: Help needed!!
The '%' introduces a printf format specifier. The printf() function (and its siblings sprintf and vsprintf) originated with the C programming language. It is a powerful way to print out data in a very controlled way, and many other languages (like PHP) support it. You can print out many different types of variables (strings, integers and floats), with precise control of field width and decimal precision. The PHP manual page for sprintf describes the various options for the format string: PHP Manual: sprintf (http://php.net/manual/en/function.sprintf.php).
Example: When you need to output a floating point number such as a dollars-and-cents amount with exactly two decimal places, you use the 'f' specifier like so: '%.2f'. If the whole number needs to be padded to a width of 12 chars (including decimal point and optional sign), you would use '%12.2f'.
Very powerful once you learn the syntax. (Just like regex!)
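Here is a short sketch of the specifiers used in the script, using sprintf() so the results can be inspected as strings. The values are borrowed from the sample quotes in this thread and the variable names are only illustrative.

```php
<?php
$close  = sprintf('%.2f', 1519.25);        // two decimals, no padding
$padded = sprintf('%12.2f', -0.16);        // two decimals, right-aligned in 12 characters
$name   = sprintf('%12s', 'ACI');          // string right-aligned in 12 characters
$left   = sprintf('%-12s|', 'ACI');        // the '-' flag left-aligns instead
$trades = sprintf('%9d', 8730);            // integer right-aligned in 9 characters
$label  = sprintf('%%Chg = %.2f', -6.13);  // '%%' prints a literal percent sign

// Brackets make the padding visible.
foreach ([$close, $padded, $name, $left, $trades, $label] as $s) {
    echo '[', $s, "]\n";
}
```

The number before the dot is a minimum field width, so longer values are never truncated; the number after the dot is the count of decimal places.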
Re: Help needed!!
@infomamun: I don't have the intention to hijack your post, but my question is related to the information that ridgerunner posted.
@ridgerunner: I'm also learning regex and I tried your example. Curiously, it doesn't work at all for me (I cut and pasted your exact code, with no modifications): at runtime it shows 0 record matches. What is more curious is that when I replaced the $data content from your post with the original data posted by the OP and ran the code again, it showed only 1 record match, the last one, named "APEXADELFT". My last test was to eliminate only that line, and again the record match count was 0.
I'm very perplexed by that behavior. Any idea why I'm getting those results? I'm running on Windows XP (in case we need to blame someone), Apache 2.2.15, PHP 5.3.2.
Thanks for any ideas on this.
- ridgerunner
Re: Help needed!!
Yes, you are correct!
The script file works fine when the data records are separated with unix style line terminations (i.e. just a linefeed = \n). I just did a test and verified that the script fails with 0 matches when the file is formatted with DOS/Windows style line terminations (i.e. carriage return + linefeeds = \r\n). The end of the regex ('...\d+)$/') is written to not allow any characters between the last digit and the end of line. The preg_match_all() function does not appear to match a \r as end of line $. It sees the \r between the last digit and the \n and thus fails to match.
PHP uses the PCRE regex engine which can be compiled to recognize any combination of carriage return and linefeed combinations as a valid end of line (i.e. \r, \n or \r\n). It appears that this version of PHP was compiled to only match linefeeds as end-of-line.
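The behaviour can be reproduced with a short sketch. It uses a simplified three-column stand-in for the full quote regex and made-up two-line data, and it assumes the PCRE library behind PHP uses its usual default of linefeed-only newlines. It also shows a PCRE feature not mentioned above: the newline convention can be selected inside the pattern itself by starting it with (*ANYCRLF).

```php
<?php
// Simplified stand-in for the quote regex: code, one decimal, one integer.
$re = '/^(\w+) +(\d+\.\d+) +(\d+)$/m';

// The same two records with Unix and with DOS/Windows line endings.
// Neither string has a line terminator after the last record.
$lf   = "ABBANK 1605.00 8730\nACI 380.00 205";
$crlf = "ABBANK 1605.00 8730\r\nACI 380.00 205";

// Unix endings: '$' matches before each "\n" and at the end of the string.
$lfCount = preg_match_all($re, $lf, $m);

// Windows endings: the "\r" sits between the last digit and the "\n", so only
// the final record (which has no "\r" after it) can match.
$crlfCount = preg_match_all($re, $crlf, $m);

// (*ANYCRLF) tells PCRE to treat "\r", "\n" and "\r\n" all as line endings.
$anyCount = preg_match_all('/(*ANYCRLF)^(\w+) +(\d+\.\d+) +(\d+)$/m', $crlf, $m);

printf("LF: %d, CRLF: %d, CRLF with (*ANYCRLF): %d\n", $lfCount, $crlfCount, $anyCount);
```

This would also explain the single "APEXADELFT" match reported above: as the last line of the pasted data, it is the only record not followed by a carriage return.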
A workaround is to allow an optional \r before the $ like so:
Code: Select all
$re = '/^([\w\-()&]+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+(\d+)[ ]+(\d+)[ ]+([+\-]?\d*\.\d+)\r?$/m';
Or the $ can simply be removed like so:
Code: Select all
$re = '/^([\w\-()&]+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+([+\-]?\d*\.\d+)[ ]+(\d+)[ ]+(\d+)[ ]+([+\-]?\d*\.\d+)/m';
Many people (including myself) assume that a '$' matches the end of line for any text file, but now we know that this is not so (at least for PHP version 5.2.14, which is what I'm running). Thank you for pointing this out!
Re: Help needed!!
Hi ridgerunner,
How can I make a preg_match call start where the previous preg_match left off?
For example, please look at the .txt file I attached to my first post.
Suppose I first want to extract the values from:
[text]ALL SHARES PRICE INDEX (DSI) 6881.38381 -235.73029 -3.3121611[/text]
Then I want to extract the value from:
[text]B. VOLUME(Nos.) : 147060609[/text]
If I try to extract both patterns in one PHP script, then after matching the first pattern, preg_match will start searching for the second pattern from the beginning of the .txt file again. But I am quite sure that the second pattern comes after the first one in that file, so I don't want the regex engine to search from the beginning each time. I want the second preg_match to start from the end of the first match. How do I set the starting position of the second preg_match to the end of the first match in this case?
Regards.
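One possible approach, sketched here under assumptions: preg_match() accepts an optional fifth $offset argument giving the byte position at which to start searching, and the PREG_OFFSET_CAPTURE flag makes it report where each match was found. The $data string below is a made-up excerpt with the two lines in the order described in the question, and the variable names are illustrative.

```php
<?php
// Hypothetical excerpt; the order of the two lines is assumed from the question.
$data = "ALL SHARES PRICE INDEX (DSI) 6881.38381 -235.73029 -3.3121611\n"
      . "TOTAL TRANSACTIONS\n"
      . "A. NO. OF TRADES : 281001\n"
      . "B. VOLUME(Nos.) : 147060609\n";

$pos = 0;                           // byte offset where the next search starts
$index = $change = $volume = null;

// First search. With PREG_OFFSET_CAPTURE every entry of $m is [text, offset].
$reIndex = '/ALL SHARES PRICE INDEX \(DSI\)\s+([+\-]?\d*\.\d+)\s+([+\-]?\d*\.\d+)/';
if (preg_match($reIndex, $data, $m, PREG_OFFSET_CAPTURE, $pos)) {
    $index  = $m[1][0];
    $change = $m[2][0];
    // Continue just past the end of the whole match.
    $pos = $m[0][1] + strlen($m[0][0]);
}

// Second search starts at $pos instead of at the beginning of the string.
$reVolume = '/B\. VOLUME\(Nos\.\)\s*:\s*(\d+)/';
if (preg_match($reVolume, $data, $v, 0, $pos)) {
    $volume = $v[1];
}

echo "$index $change $volume\n";
```

The patterns here deliberately avoid '^': searching from an offset is not the same as searching a substring, so a '^' anchor still refers to the real start of the string (or of a line, in 'm' mode) rather than to the offset.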