
Tools & Initiatives

Website Text Scraper

This repository contains a Scrapy project named website_bodytext_scraper, designed to scrape the body text from a list of websites. It accepts a list of websites provided through a CSV file, scrapes the body text of each site’s home page and associated sub-pages, and saves the data in a specified output format. The resulting raw body text can be used as input for AI applications, among other uses.
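
To make the idea concrete, here is a minimal sketch of the two core steps described above: reading a list of sites from a CSV file and pulling the visible body text out of a page. It is not the project's Scrapy code. It uses only the Python standard library as a stand-in, and the column name `url` and the function names `read_urls` and `extract_body_text` are assumptions chosen for illustration. Fetching pages, following sub-page links, and writing the output file are left out.

```python
import csv
import io
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects visible text found inside <body>, skipping script/style."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits inside <body> and outside skipped tags.
        if self.in_body and self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_body_text(html):
    """Return the page's body text as a single space-joined string."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def read_urls(csv_file, column="url"):
    """Read site URLs from an open CSV file; the 'url' column name is hypothetical."""
    reader = csv.DictReader(csv_file)
    return [row[column].strip() for row in reader if (row.get(column) or "").strip()]


if __name__ == "__main__":
    # In-memory stand-ins for the input CSV and a downloaded page.
    sites = io.StringIO("url\nhttps://example.org\n")
    page = "<html><body><h1>Welcome</h1><script>var x = 1;</script><p>About us</p></body></html>"
    for url in read_urls(sites):
        print(url, "->", extract_body_text(page))
```

In a Scrapy-based project the fetching, link following, and export steps would normally be handled by the framework itself; the sketch only illustrates the input and the text-extraction logic.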


Author
GivingTuesday

Goals
More efficient workflows
