
Grab all of the href links from a Page

Posted by Jessica Tue, 06 Feb 2007 17:49:00 GMT

Use Rubyful Soup to get all of the hyperlinks on a page:

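require 'rubyful_soup'

# page_content is the page's HTML; home is the site's base URL.
# Both are assumed to be set before this snippet runs.
soup = BeautifulSoup.new(page_content)
urls = []
soup.find_all('a').each { |tag|
  href = tag['href']
  next if href.nil?   # skip anchors that have no href attribute
  # add first part of url to href link if link is internal
  href = home + href unless href =~ %r{\Ahttps?://}
  urls << href
}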
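
Prepending home by hand works for site-relative links such as /about, but links relative to the current page (contact.html, ../index.html) need real URL resolution. Ruby's standard uri library can do that. Here is a minimal sketch; the absolutize helper and the example.com addresses are made up for illustration and are not part of the original snippet.

```ruby
require 'uri'

# Hypothetical helper: resolve each href against the URL of the page it
# was found on. nil entries (anchors without an href) are dropped.
def absolutize(hrefs, base)
  hrefs.compact.map { |href| URI.join(base, href).to_s }
end

# Sample hrefs as they might come back from tag['href'].
links = absolutize(['/about', 'contact.html', 'http://other.example/x', nil],
                   'http://example.com/blog/')
puts links
```

URI.join leaves links that are already absolute untouched, so no separate check on the start of the string is needed.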