Python-parser running Beautiful Soup needs some review
Posted: Sat Dec 11, 2010 4:45 pm
Good day - it's me again, lin,
your new fan - I love this place! First of all, I have to admit: I am very happy that I have found this great community!
I am currently trying to get a new scraper up and running. I want to write it in Python, making use of Beautiful Soup. To be frank: I am new to Python and to Beautiful Soup as well! It is said to be a great tool for parsing and extracting content. So here I am:
I want to take the content of a <td> tag of a table in an HTML document. For example, I have this table:
Code:
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
<td>
This is the second sample text
</td>
</tr>
</table>
How can I use Beautiful Soup to take the text "This is a sample text"?
Should I make use of
Code:
soup.findAll('table', attrs={'class': 'bp_ergebnis_tab_info'})
to get the whole table?
See the target http://www.schulministerium.nrw.de/BP/S ... pDO=142323
Here is my approach:
Well, what do we have to do first?
The first thing is to find the table. I do this with find: using find rather than findAll returns the first match (rather than a list of all matches, in which case we would have to add an extra [0] to take the first element of the list):
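Code:
table = soup.find('table', attrs={'class': 'bp_ergebnis_tab_info'})
Then use find again to find the first td:
Code:
first_td = table.find('td')
Then we have to use renderContents() to extract the textual contents:
Code:
text = first_td.renderContents()
... and the job is done (though we may also want to use strip() to remove leading and trailing spaces):
Code:
trimmed_text = text.strip()
This should give us:
Code:
print(trimmed_text)
This is a sample text
as desired.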
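For anyone who wants to try the steps in one go, here is a minimal sketch of how they might fit together. It rests on a few assumptions that are not part of my snippets above: it uses the newer bs4 package with Python 3 instead of the older BeautifulSoup module, it calls get_text() where I used renderContents(), and it parses the sample table from this post rather than fetching the real page. The names html and first_cell_text are made up for illustration only.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the table in this post; the real page is not fetched.
html = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
<td>
This is the second sample text
</td>
</tr>
</table>
"""


def first_cell_text(markup, css_class="bp_ergebnis_tab_info"):
    """Return the stripped text of the first <td> in the first matching table.

    Hypothetical helper; returns None if the table or the cell is missing.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # find() returns the first match, or None when nothing matches.
    table = soup.find("table", attrs={"class": css_class})
    if table is None:
        return None
    # Search inside the table, not the whole document.
    first_td = table.find("td")
    if first_td is None:
        return None
    # get_text() returns the cell text; strip() trims the surrounding whitespace.
    return first_td.get_text().strip()


# find_all() returns a list of all matches, so the first table needs an explicit [0].
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table", attrs={"class": "bp_ergebnis_tab_info"})
same_table = tables[0]

result = first_cell_text(html)
print(result)
```

The None checks are a design choice on my side: if the page layout changes and the table or cell is missing, the helper returns None instead of raising an AttributeError.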
What do you think about the code? I would love to hear from you!
Greetings,
your lin