Thursday, November 05, 2015

Beautiful Soup: get the contents of a tag

I was merging the body tags of several web pages into one large body tag so that the pages could be printed more efficiently as a single document. I used bash and Python's BeautifulSoup to fetch the web pages and parse them. Actually the whole job could have been done in Python with BeautifulSoup, but because I had already written some sed commands earlier, I did not try to convert them into Python equivalents (that would require libraries like urllib2, which I have not used for a long time).
With code like this
str(soup.find('body'))
BeautifulSoup returns the whole body element, including the opening <body> and closing </body> tags. Because I collect these pieces into a new body, multiple <body> tags end up in the new HTML document. I suppose this might cause some CSS problems when the document is displayed. That's why I needed to get rid of those superfluous tags.
This page from Stack Overflow helped me.
The code I needed looked like this:
 
body=body+''.join(map(str,soup.find('body').contents))
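
Here is a small sketch of the difference between the two calls. It assumes Python 3 with the bs4 package and the built-in html.parser. The pages list and the inner_body helper are made up for illustration; they are not part of my original script, which got its HTML from bash.

```python
from bs4 import BeautifulSoup

# Hypothetical sample pages standing in for the downloaded HTML files.
pages = [
    "<html><body><p>First page</p></body></html>",
    "<html><body><p>Second page</p></body></html>",
]


def inner_body(html):
    """Return the children of <body> as a string, without the <body> tags."""
    soup = BeautifulSoup(html, "html.parser")
    body_tag = soup.find("body")
    # .contents is the list of child nodes; str() serializes each one.
    return "".join(map(str, body_tag.contents))


# str() on the tag itself keeps the surrounding <body>...</body>.
outer = str(BeautifulSoup(pages[0], "html.parser").find("body"))

# Joining the children drops the wrapper, so the pieces can be concatenated.
body = ""
for page in pages:
    body = body + inner_body(page)

combined = "<html><body>" + body + "</body></html>"
print(outer)
print(combined)
```

The combined document ends up with a single body tag, so any CSS that targets body is applied only once.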

I really like BeautifulSoup, just as I like sed.
