iCrawl: The Integrated Focused Crawling Toolbox
iCrawl is a web crawling toolbox developed at the L3S Research Center. It provides integrated and focused crawling of Web and Social Media data, i.e., the combined crawling of the Web and Social Media for a specified topic. It allows researchers, journalists, and other domain experts to easily create and run web crawls to collect, analyze, and archive web documents for their specific needs.
You can download the system and give it a go.
The system is described in several forthcoming papers and in the following published ones:
- What Do You Want to Collect from the Web? (2014)
Thomas Risse, Elena Demidova, and Gerhard Gossen. Proc. of the Building Web Observatories Workshop BWOW 2014, pages 1-7.
- The iCrawl Wizard – Supporting Interactive Focused Crawl Specification (2015)
Gerhard Gossen, Elena Demidova, and Thomas Risse. Proc. of the 37th European Conference on Information Retrieval ECIR’15.
- iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling (2015)
Gerhard Gossen, Elena Demidova, and Thomas Risse. Proceedings of the Joint Conference on Digital Libraries 2015.
- Analyzing Web archives through Topic and Event Focused Sub-Collections (2016)
Gerhard Gossen, Elena Demidova, and Thomas Risse. Proceedings of Web Science WebSci 2016.