Sinanin Yeri: Python mechanize, Beautiful Soup and HTML scraping

Tuesday, May 22, 2012

Python mechanize, Beautiful Soup and HTML scraping

I used to rely on bash, sed, awk, curl and Beautiful Soup for HTML parsing, scraping and task automation. I was aware of mechanize but had never used it. Today I decided to give it a try for an automation task, and I didn't regret it. Learning it was real fun, although I have only learned a little of it so far. It helped me a lot with automating browser requests. I haven't used it for form handling yet, but I know it is powerful there too. I also realised that mechanize works best paired with Beautiful Soup: mechanize does the browsing and Beautiful Soup parses the pages it fetches. They make an awesome combo worth trying. Thanks to all the folks behind them.
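Here is a snippet:

import re

import mechanize
from bs4 import BeautifulSoup  # BeautifulSoup 3 installs use: from BeautifulSoup import BeautifulSoup

br = mechanize.Browser()
br.open("http://www.site.com")

# Collect every link on the page whose URL matches the pattern
all_links = list(br.links(url_regex="pattern"))

for page_link in all_links[5:]:  # skip the first five matches
    br.follow_link(page_link)
    html = br.response().read()
    soup = BeautifulSoup(html, "html.parser")
    # First anchor on the page whose href mentions "mp3"
    link = soup.find('a', href=re.compile("mp3"))
    if link is not None:
        url = link['href']
        filename = url.split("/")[-1]
        print(filename + "----" + url)
        br.retrieve(url, filename)  # download the file into the current directory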
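
The snippet takes the file name from whatever follows the last "/" in the href, which can go wrong if the href is relative or carries a query string. Below is a minimal sketch of a more defensive version using only the standard library. The helper name resolve_download and the fallback name "download.bin" are my own inventions for illustration, not part of mechanize or Beautiful Soup.

```python
import posixpath
from urllib.parse import unquote, urljoin, urlsplit


def resolve_download(page_url, href):
    """Return (absolute_url, filename) for an href found on page_url.

    Hypothetical helper: resolves relative hrefs against the page URL and
    drops any query string or fragment before picking the file name.
    """
    absolute_url = urljoin(page_url, href)
    path = urlsplit(absolute_url).path          # path only, no ?query or #fragment
    filename = unquote(posixpath.basename(path))  # decode %20 and similar escapes
    # Fall back to a placeholder name when the path ends in "/"
    return absolute_url, filename or "download.bin"


if __name__ == "__main__":
    url, name = resolve_download(
        "http://www.site.com/music/page.html", "files/song.mp3?dl=1"
    )
    print(name + "----" + url)
```

Inside the loop, the page URL would presumably come from br.geturl() after follow_link, and the two returned values could be passed to br.retrieve in place of the raw href and the split result.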

Here are some helpful links:
http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/
http://stockrt.github.com/p/handling-html-forms-with-python-mechanize-and-BeautifulSoup/
