Hi everyone,
I am a complete newbie at regular expressions. I went through a lot of regular expression tutorials but still can't figure out how to solve this problem. Maybe the regex gurus can help.
What I am trying to do is this: from the following text I first want to find lines matching a particular pattern. The expression "\\S+([ \\t]+-?[0-9.]+){8}" gives me all the lines I am looking for. Out of these lines I want to check whether two lines start with the same word. If such a match is found, I want to add the open, high, low and close values of the 2nd line to the 1st line and then remove the 2nd line from the text. I hope it doesn't sound too complicated. Is it possible? Or is it too difficult and too much to handle with regex?
E.g. "\\S+([ \\t]+-?[0-9.]+){8}" matches all the stocks from the "A Group" stocks to the Spot Transactions. The stock "LEGACYFOOT" is present in two lines (once in Z Group and once in Spot Transactions). I want to add the open, high, low and close of LEGACYFOOT in Spot Transactions to the "LEGACYFOOT" line in Z Group, and after that delete the 2nd "LEGACYFOOT" line from the text.
thanks
omit
the text:(edited to simplify)
DHAKA STOCK EXCHANGE LTD.
TODAY'S SHARE MARKET : 2008-08-21
=================================
(If the page is not updated please press the refresh button)
EQUITY : 745081109873.65
DEBT SECURITIES : 202154936500.00
TOTAL : 947236046373.65
PRICES IN PUBLIC TRANSACTIONS : 2008-08-21
==========================================
A Group
-------
Instr Code Open High Low Close %Chg Trade Volume Value(Lc)
1STBSRS 705.00 710.00 686.00 691.25 -.18 85 5650 39.365
1STICB 5200.00 5250.00 5200.00 5224.75 4.22 6 40 2.090
2NDICB 1650.00 1650.00 1561.00 1583.00 -.07 9 75 1.187
3RDICB 1020.25 1036.00 1020.25 1029.50 -.50 6 85 .875
4THICB 1006.25 1050.00 1006.25 1035.00 1.42 11 160 1.656
MIRACLEIND 26.20 27.00 26.10 26.80 3.87 64 60000 15.965
MITHUNKNIT 184.50 185.00 176.00 180.75 .97 21 960 1.739
QSMDRYCELL 37.50 38.50 37.30 38.00 3.26 191 150500 57.158
RAHIMTEXT 390.00 420.00 390.00 410.00 5.12 2 30 .123
RANFOUNDRY 59.50 62.00 58.90 61.50 5.12 126 81000 49.217
UTTARABANK 2849.00 2956.50 2848.00 2900.25 2.94 2892 48130 1404.152
UTTARAFIN 766.00 825.00 766.00 819.75 5.06 179 15350 124.758
----- -------- ---------
----- -------- ---------
55122 12922983 22780.243
"A Group" Scrips traded in Public Market = 146
B Group
-------
Instr Code Open High Low Close %Chg Trade Volume Value(Lc)
AGRANINS 213.00 239.00 213.00 226.50 8.76 179 17550 39.805
BDAUTOCA 157.00 159.75 153.00 156.00 -.63 23 875 1.366
NITOLINS 332.25 357.00 332.25 340.00 2.10 67 6250 21.441
SONARBAINS 145.00 153.00 143.75 150.50 6.54 99 11800 17.340
----- -------- ---------
----- -------- ---------
741 223380 154.313
"B Group" Scrips traded in Public Market = 12
G Group
-------
"G Group" Scrips traded in Public Market = 0
N Group
-------
Instr Code Open High Low Close %Chg Trade Volume Value(Lc)
CONTININS 228.00 240.00 215.00 231.75 8.29 153 11450 26.258
DBH 1180.00 1249.00 1155.00 1224.75 6.63 94 5150 61.723
MPETROLEUM 131.50 133.00 129.90 130.60 2.03 496 96800 126.857
TITASGAS 354.50 357.75 344.00 350.75 .64 2126 379400 1331.344
----- -------- ---------
----- -------- ---------
3979 742515 1729.643
"N Group" Scrips traded in Public Market = 8
Z Group
-------
Instr Code Open High Low Close %Chg Trade Volume Value(Lc)
ALLTEX 68.75 73.00 68.50 71.75 4.36 18 1500 1.078
ANLIMAYARN 50.25 50.25 50.00 50.00 3.62 2 150 .075
LAFSURCEML 568.00 582.00 567.00 577.50 1.27 206 18550 107.275
LEGACYFOOT 14.80 17.00 14.80 16.50 10.73 77 64000 10.261
LEXCO 122.00 124.00 122.00 122.50 4.25 2 70 .086
SHYAMPSUG 10.90 10.90 10.90 10.90 3.80 6 700 .076
SOCIALINV 365.50 375.00 365.00 371.00 2.77 584 52100 193.397
WATACHEM 305.25 312.25 305.25 311.25 4.01 6 180 .560
WONDERTOYS 60.75 62.50 59.25 61.50 2.50 21 2700 1.662
ZEALBANGLA 14.50 14.90 14.50 14.60 .68 7 3900 .570
----- -------- ---------
----- -------- ---------
2888 467200 962.587
"Z Group" Scrips traded in Public Market = 60
===========================
62730 14356078 25626.792
Total number of scrips traded in Public Market = 226
PRICES IN SPOT TRANSACTIONS : 2008-08-21
==========================================
Instr Code Open High Low Close %Chg Trade Volume Value(Lc)
LEGACYFOOT 14.80 16.80 16.00 16.50 10.73 9 9000 1.461
PUBALIBANK 859.00 872.75 853.00 857.00 1.48 1216 38105 328.444
----- -------- ---------
----- -------- ---------
1225 47105 329.904
Total number of scrips traded in Spot Market = 2
PRICES IN SPOT TRANSACTIONS (BONDs) : 2008-08-21
==================================================
Total number of BONDs traded in Spot Market = 0
PRICES IN ODDLOT TRANSACTIONS : 2008-08-21
============================================
Instr Code Max Price Min Price Trades Quantity Value(In lakhs)
ABBANK 909.00 902.00 2 4 .036
ACI 475.00 475.00 2 30 .143
AGNISYSL 67.00 60.10 4 540 .345
ALARABANK 465.00 395.00 19 354 1.534
APEXADELFT 2600.00 2600.00 3 30 .780
UTTARABANK 2950.00 2950.00 1 1 .030
UTTARAFIN 800.00 800.00 3 62 .496
------ -------- ------------
------ -------- ------------
438 12122 27.815
Total number of scrips traded in Oddlot = 75
PRICES IN BLOCK TRANSACTIONS : 2008-08-21
===========================================
Total number of scrips traded in Block = 0
Head scratching regex problem for a newbie
- prometheuzz
- Forum Regular
- Posts: 779
- Joined: Fri Apr 04, 2008 5:51 am
Re: Head scratching regex problem for a newbie
omit46 wrote:Hi everyone,
I am a complete newbie in regular expression, went thru a lot of regular expression tutorial but still can't figure out how to solve this problem. Maybe regex gurus can help.
...
Parts can be done using regex, but not all of it: comparing strings is not something regex can handle. In what language are you implementing this? Java?
Re: Head scratching regex problem for a newbie
Thanks. I am doing this in C# and .NET. I know how to do the replacement once I have found the duplicate stocks, but I can't figure out how to search for them using regex.
- prometheuzz
Re: Head scratching regex problem for a newbie
omit46 wrote:thanks. I am doing this in c# and .net. I know how to replace it once I find the duplicate stocks. But I can't figure out how to search for it using regex.
Like I said: comparing/searching strings is not something a regex engine can/should do. Matching the lines you're interested in (as you are doing now) is indeed a job for regex. Those matches can then be stored in a map-like collection (in C#, a Dictionary<string, T> should fit; NameValueCollection only holds string values). The name of the stock is your key, and a custom class/object holding those values is the value belonging to that key. It seems like the regex part of your problem has already been done: the rest is up to C#.
Good luck!
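A minimal sketch of the key-value idea described above, written in Python rather than C# purely for brevity. The pattern is the one from the original question (without the C# string escaping); the function name `merge_duplicates`, the sample text, and the choice to sum only the first four columns (Open, High, Low, Close) are assumptions made for illustration, not code anyone in the thread posted.

```python
import re

# Pattern from the original question, anchored to whole lines:
# a name followed by exactly 8 numeric fields.
ROW = re.compile(r"^(\S+)((?:[ \t]+-?[0-9.]+){8})[ \t]*$", re.MULTILINE)


def merge_duplicates(text):
    """Return {name: [8 floats]}. When a name occurs again, its first four
    fields (Open, High, Low, Close) are added to the first occurrence."""
    rows = {}
    for match in ROW.finditer(text):
        name = match.group(1)
        values = [float(v) for v in match.group(2).split()]
        if name in rows:
            # Assumption: only the four price columns are summed.
            for i in range(4):
                rows[name][i] += values[i]
        else:
            rows[name] = values
    return rows


sample = """\
Instr Code Open High Low Close %Chg Trade Volume Value(Lc)
LEGACYFOOT 14.80 17.00 14.80 16.50 10.73 77 64000 10.261
LEXCO 122.00 124.00 122.00 122.50 4.25 2 70 .086
LEGACYFOOT 14.80 16.80 16.00 16.50 10.73 9 9000 1.461
"""

merged = merge_duplicates(sample)
for name, values in merged.items():
    print(name, values)
```

The dictionary keeps one entry per stock name, so the second LEGACYFOOT line never becomes a separate row; writing the merged rows back into the text is left out of this sketch.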
Re: Head scratching regex problem for a newbie
prometheuzz wrote:Like I said: comparing/searching strings is not something a regex engine can/should do. Matching the lines you're interested in (as you are doing now) is indeed a job for regex.
...
Thanks. After the matching I should be using C#. But once I have my matching result, all I have to do is "match two lines that start with the same word". Isn't that regex's job?
Right now this is what I want to do: match lines that start with the same word.
In the following text, two lines start with "2NDICB" and two lines start with "3RDICB". I just want to get the matching lines. Also, is regex or C#'s string library the better choice for extracting the values from the lines?
1STICB 5200.00 5250.00 5200.00 5224.75 4.22 6 40 2.090
2NDICB 1650.00 1650.00 1561.00 1583.00 -.07 9 75 1.187
3RDICB 1020.25 1036.00 1020.25 1029.50 -.50 6 85 .875
4THICB 1006.25 1050.00 1006.25 1035.00 1.42 11 160 1.656
3RDICB 1650.00 1650.00 1561.00 1583.00 -.07 9 75 1.187
2NDICB 5200.00 5250.00 5200.00 5224.75 4.22 6 40 2.090
- prometheuzz
Re: Head scratching regex problem for a newbie
omit46 wrote:...
thanks. After the matching I should be using the c#. But once I get my matching result all I have to do is "match two lines that starts with the same word". Isn't it regex's job?
...
No, the functionality you describe isn't regex' job. You could however (ab)use regex' back references.
Since this is a PHP forum, I'll post an example in PHP:
<?php
$contents_of_file = '
1STICB 5200.00 5250.00 5200.00 5224.75 4.22 6 40 2.090
2NDICB 1650.00 1650.00 1561.00 1583.00 -.07 9 75 1.187
3RDICB 1020.25 1036.00 1020.25 1029.50 -.50 6 85 .875
4THICB 1006.25 1050.00 1006.25 1035.00 1.42 11 160 1.656
3RDICB 50.00 10.00 61.00 13.00 -.07 9 75 1.187
2NDICB 50.00 50.00 50.00 52.75 4.22 6 40 2.090
';
if(preg_match_all(
'/(^\w++)(?:\s++-?(?:(?:\d+)?\.)?\d+)+$(?=.*?(\1[^\n]++))/ms',
$contents_of_file, $matches)) {
print_r($matches);
}
/* the output when running this example:
Array
(
[0] => Array
(
[0] => 2NDICB 1650.00 1650.00 1561.00 1583.00 -.07 9 75 1.187
[1] => 3RDICB 1020.25 1036.00 1020.25 1029.50 -.50 6 85 .875
)
[1] => Array
(
[0] => 2NDICB
[1] => 3RDICB
)
[2] => Array
(
[0] => 2NDICB 50.00 50.00 50.00 52.75 4.22 6 40 2.090
[1] => 3RDICB 50.00 10.00 61.00 13.00 -.07 9 75 1.187
)
)
*/
?>
'(^\w++)(?:\s++-?(?:(?:\d+)?\.)?\d+)+$(?=.*?(\1[^\n]++))'
m = multi-line, so that ^ matches at the beginning of each line and $ at the end of each line; otherwise ^ and $ would only match the beginning and end of the entire string;
s = dot-all, which causes the . (dot) meta character to match all characters. Without it, the dot wouldn't match a new-line character.
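To see the effect of the two modifiers in isolation, here is a small sketch in Python, where the same behaviours are spelled `re.MULTILINE` and `re.DOTALL`. The two-line sample string is made up for illustration.

```python
import re

text = "AAA 1\nBBB 2\n"

# Without MULTILINE, ^ only matches at the very start of the string.
without_m = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches right after each newline.
with_m = re.findall(r"^\w+", text, re.MULTILINE)

# Without DOTALL, the dot stops at a newline, so the pattern cannot span lines.
without_s = re.search(r"AAA.*BBB", text)
# With DOTALL, the dot matches the newline as well.
with_s = re.search(r"AAA.*BBB", text, re.DOTALL)

print(without_m, with_m)
print(without_s is None, with_s is not None)
```

The back-reference pattern needs both: multi-line so every line can start a match, and dot-all so the look-ahead can scan across line breaks.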
A short explanation:
'(^\w++)(?:\s++-?(?:(?:\d+)?\.)?\d+)+$'
// Matches a complete line you're interested in and because of the ( and )
// around the '^\w+' it "remembers" the first word of your match and stores
// it in back reference "\1".
'(?=.*?(\1[^\n]++))'
// (?=X) is called a positive look-ahead: it checks that X can be matched from
// the current position without consuming any text. Here X skips any number of
// characters until it finds the text captured in back reference 1, and then
// keeps matching until it encounters a new-line character. And again:
// because of the ( and ) around '\1[^\n]++' that match is remembered (as you
// can see in the output of my code snippet).
So, my recommendation still stands: find the strings you're interested in using regex, store those matches in a key-value based collection, and take steps when a key is found twice. It will be easier to program, and much, much easier to maintain.
Best of luck.
More information on:
back references: http://www.regular-expressions.info/brackets.html
(positive) look-arounds: http://www.regular-expressions.info/lookaround.html
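For anyone not on PHP, the back-reference trick above can be sketched in Python along these lines. This is an adaptation, not a literal port: it uses plain greedy quantifiers (Python's `re` module only accepts possessive `++` from version 3.11), `[ \t]` instead of `\s` so a match cannot run over a line break, and, as an added assumption, it requires the repeated name to start a line and be followed by a space or tab. The name `find_duplicate_lines` is made up for the example.

```python
import re

DUPLICATE = re.compile(
    r"^(\w+)(?:[ \t]+-?(?:\d*\.)?\d+)+$"  # a full data line; group 1 = first word
    r"(?=.*?^(\1[ \t][^\n]+))",           # look ahead for a later line starting with it
    re.MULTILINE | re.DOTALL,
)


def find_duplicate_lines(text):
    """Return (first_line, later_line) pairs for names that occur again later."""
    return [(m.group(0), m.group(2)) for m in DUPLICATE.finditer(text)]


sample = """\
1STICB 5200.00 5250.00 5200.00 5224.75 4.22 6 40 2.090
2NDICB 1650.00 1650.00 1561.00 1583.00 -.07 9 75 1.187
3RDICB 1020.25 1036.00 1020.25 1029.50 -.50 6 85 .875
4THICB 1006.25 1050.00 1006.25 1035.00 1.42 11 160 1.656
3RDICB 50.00 10.00 61.00 13.00 -.07 9 75 1.187
2NDICB 50.00 50.00 50.00 52.75 4.22 6 40 2.090
"""

pairs = find_duplicate_lines(sample)
for first, later in pairs:
    print(first, "<->", later)
```

Note that a name occurring three times would be reported once for each occurrence that has a later twin, which is one more reason the key-value approach is easier to reason about.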
- prometheuzz
Re: Head scratching regex problem for a newbie
Thanks for letting me know you've moved the discussion here:
http://regexadvice.com/forums/thread/45573.aspx
Bye.
Re: Head scratching regex problem for a newbie
prometheuzz wrote:Thanks for letting me know you've moved the discussion here:
http://regexadvice.com/forums/thread/45573.aspx
Bye.
Finally figured it out. I posted in other forums to get a reply sooner. Your solution is very simple. Thank you for giving me your time.
cheers
omi