Our Works on Others

Qorser is a full-service outsourcing web development company utilising Indonesia’s finest web designers and developers. Qorser has offices in Sydney, Australia, and Bandung, Indonesia, and is the largest web development company in Indonesia.

PHP spider / crawler / scraper script

The script is executed from the command line (Linux shell). It mines data from a directory of companies, using a directory page as the starting page.

Features:

  • It finds data across paginated pages, grabbing company information such as company name, address, postal code, website, etc.
  • It grabs data even when the data follows a different pattern in the HTML or the URL.
  • It is extensible: custom spiders can be created for different websites.
  • It has a session / failsafe feature, meaning it can continue where it left off if it is halted.
  • It has a built-in summary tool.
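
The pattern-fallback and failsafe features listed above can be illustrated with a minimal sketch. This is not the original script: every function name, regular expression, and the state-file layout below is a hypothetical stand-in, and it assumes one company record per page for brevity. JSON is used for the state file only to keep the sketch free of external dependencies.

```php
<?php
// Hypothetical sketch: all names, patterns, and the state format are assumptions.

/** Try each regex in order and return the first capture that matches, or null. */
function extractField(string $html, array $patterns): ?string
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $html, $m) === 1) {
            return trim(html_entity_decode(strip_tags($m[1])));
        }
    }
    return null;
}

/** Extract one company record, with fallback patterns for differing markup. */
function extractCompany(string $html): array
{
    $rules = [
        'name' => [
            '~<h2[^>]*class="company"[^>]*>(.*?)</h2>~is',
            '~<strong>(.*?)</strong>~is',
        ],
        'postal_code' => [
            '~Postal code:\s*(\d{4,6})~i',
            '~\b(\d{5})\b~',
        ],
        'website' => [
            '~<a[^>]+href="(https?://[^"]+)"~i',
        ],
    ];
    $record = [];
    foreach ($rules as $field => $patterns) {
        $record[$field] = extractField($html, $patterns);
    }
    return $record;
}

/** Read the last completed page number from the state file (0 if none). */
function loadLastPage(string $stateFile): int
{
    if (!is_file($stateFile)) {
        return 0;
    }
    $state = json_decode((string) file_get_contents($stateFile), true);
    return is_array($state) && isset($state['last_page']) ? (int) $state['last_page'] : 0;
}

/** Record the last completed page so a halted run can resume after it. */
function saveLastPage(string $stateFile, int $page): void
{
    file_put_contents($stateFile, json_encode(['last_page' => $page]), LOCK_EX);
}

/**
 * Walk the paginated listing, resuming after the last saved page.
 * $fetchPage takes a page number and returns HTML, or null when no page exists.
 */
function crawl(callable $fetchPage, string $stateFile, int $maxPages = 1000): array
{
    $records = [];
    for ($page = loadLastPage($stateFile) + 1; $page <= $maxPages; $page++) {
        $html = $fetchPage($page);
        if ($html === null) {
            break; // ran past the last page
        }
        $records[] = extractCompany($html);
        saveLastPage($stateFile, $page);
    }
    return $records;
}
```

Because the page number is saved only after a page has been processed, a run that is halted midway restarts at the first unfinished page. A custom spider for another website would, in this sketch, only need a different rule set and a different `$fetchPage` callable.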

Libraries / technologies used:

  • RegEx (regular expressions)
  • SimpleTest: by extending its SimpleBrowser class, the script is able to act as if it were a web browser.
  • Spyc (a YAML library for PHP)