This article shows you how to configure a crawl job using the settings in the Crawl Manager.
- Click View Management Console in the Project Explorer.
- Scroll down and select Crawl Manager. The Crawl Manager window will open.
- Click Add Job. The Add Crawl Job window will open.
- Enter the URL, customise the crawl settings if desired, then click Save.
Add the URL that is to be crawled in this text field.
Max pages to crawl
Use this text field to specify the maximum number of pages that the crawler should crawl. The default setting is 2000 pages.
Max depth of crawl
Use this text field to limit the depth of the crawl. The default setting is 99999 folder levels.
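As a rough illustration of what a depth limit measured in folder levels means, the sketch below counts the folders in a URL path and compares the result with a limit. It is only an assumption about how depth could be counted, not a description of how the crawler does it, and the names `folder_depth` and `within_depth` are hypothetical.

```python
from urllib.parse import urlparse


def folder_depth(url: str) -> int:
    """Count the folder levels in a URL path (assumed definition of depth)."""
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    # When the path does not end in a slash, treat the last segment as the
    # page itself rather than as a folder.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


def within_depth(url: str, max_depth: int = 99999) -> bool:
    """Return True if the URL sits no deeper than max_depth folder levels."""
    return folder_depth(url) <= max_depth
```

Under this assumption, `https://www.example.com/news/2020/story.html` sits two folder levels deep, so a limit of 1 would skip it while the default of 99999 effectively never would.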
Enter the time you want the crawl to start in the text field, in HH:MM format. When you first enable a job, this time is randomised. We recommend that you set the crawl to run outside of your backup window and working hours, typically at night.
Enter how often the crawl should run, in days. The default setting is 7 days.
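To see how a start time and a frequency in days combine into a schedule, here is a minimal sketch. It assumes the next crawl is simply the previous run date plus the frequency, at the configured HH:MM time; the function name `next_run` is hypothetical and the real scheduler may behave differently.

```python
from datetime import datetime, timedelta


def next_run(last_run: datetime, start_time: str, frequency_days: int = 7) -> datetime:
    """Estimate the next crawl time from an HH:MM start time and a frequency."""
    # strptime raises ValueError if start_time is not a valid HH:MM value.
    parsed = datetime.strptime(start_time, "%H:%M")
    next_day = last_run + timedelta(days=frequency_days)
    return next_day.replace(
        hour=parsed.hour, minute=parsed.minute, second=0, microsecond=0
    )
```

For example, a job that last ran on 1 January with a start time of `02:30` and the default frequency of 7 days would next be expected on 8 January at 02:30.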
Check the box to instruct the crawler to follow links to subdomains of your site. You can exclude specific subdomains in the Exclusions box below if required.
If you have secure areas on your site and want these to be scanned, check the Use Authentication checkbox and set a user ID.
With the Use Authentication checkbox checked, enter a Contensis User ID in this text field to allow the crawler to work as though it were an authenticated user and view secured pages.
Most people enabling this setting use an ID of 1 (the systems administration user). If you want to specify a different user you can find their ID number in the User Management table, within the Management Console. The default user ID is set to 0.
The Simultaneous requests setting specifies the number of requests that are allowed to run in parallel. The default number of requests is 4.
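The effect of a limit on parallel requests can be pictured with a small worker pool. The sketch below is a generic illustration using Python's standard library, not the crawler's own code; `fetch` is a placeholder that performs no network access, and `crawl` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str) -> str:
    """Placeholder for a page request; a real crawler would download the page."""
    return f"fetched {url}"


def crawl(urls, fetch_page=fetch, simultaneous_requests: int = 4):
    """Process URLs with at most simultaneous_requests running at once."""
    with ThreadPoolExecutor(max_workers=simultaneous_requests) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(fetch_page, urls))
```

A higher value finishes a crawl sooner but puts more load on the web server at any one moment, which is the trade-off this setting controls.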
Use the text fields to specify exclusion rules for pages or HTML codes. Each rule takes the form of a regular expression. Exclusion rules can be based on querystrings or full URLs, giving you the flexibility to exclude pages with a broad stroke or in fine detail.
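To give a feel for how regular-expression exclusion rules behave, the sketch below tests a few URLs against three example patterns. The patterns are hypothetical illustrations, not the product defaults, and the regular expression syntax accepted by the Crawl Manager may differ from Python's `re` module used here.

```python
import re

# Hypothetical exclusion rules for illustration only.
EXCLUSION_RULES = [
    r"\?.*\bprint=1\b",            # fine detail: a specific querystring value
    r"^https?://[^/]+/archive/",   # broad stroke: everything under /archive/
    r"\.pdf$",                     # any URL ending in .pdf
]

COMPILED_RULES = [re.compile(rule, re.IGNORECASE) for rule in EXCLUSION_RULES]


def is_excluded(url: str) -> bool:
    """Return True if any exclusion rule matches the URL."""
    return any(rule.search(url) for rule in COMPILED_RULES)
```

With these example rules, `https://www.example.com/news?print=1` and `https://www.example.com/archive/2019/` would be excluded, while `https://www.example.com/news/` would still be crawled.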
The defaults are:
Quality Assurance settings
Use the Enable Quality Assurance Checks checkbox to toggle each scan on or off.
Use the dropdown to select the WCAG 2.0 accessibility standard that you want your website to conform to. The default is AAA.
Maximum page size
Enter a numerical value in the text field to limit the maximum page size. The default is 81920 bytes (80 KB).
Maximum total size
Enter a numerical value in the text field to limit the maximum total size. The default is 358400 bytes (350 KB).
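Both size limits are entered in bytes, so a value in kilobytes has to be multiplied by 1024 first. The sketch below shows that arithmetic for the two defaults and a simple comparison against them. The helper names are hypothetical, and how the Quality Assurance check measures or reports an oversize page is not described here, so the comparison is only an assumption.

```python
BYTES_PER_KB = 1024


def kb_to_bytes(kilobytes: float) -> int:
    """Convert kilobytes to the byte value expected by the size fields."""
    return int(kilobytes * BYTES_PER_KB)


MAX_PAGE_SIZE = kb_to_bytes(80)    # 81920 bytes, the default page limit
MAX_TOTAL_SIZE = kb_to_bytes(350)  # 358400 bytes, the default total limit


def size_warnings(page_bytes: int, total_bytes: int,
                  max_page: int = MAX_PAGE_SIZE,
                  max_total: int = MAX_TOTAL_SIZE) -> list:
    """Return a message for each limit that the given sizes exceed."""
    warnings = []
    if page_bytes > max_page:
        warnings.append(f"page size {page_bytes} exceeds {max_page} bytes")
    if total_bytes > max_total:
        warnings.append(f"total size {total_bytes} exceeds {max_total} bytes")
    return warnings
```

To raise the page limit to 100 KB, for instance, you would enter `kb_to_bytes(100)`, which is 102400.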
Perform Full Scan Now
Click this button to run the scan immediately.
Clear Scan Data
Click this button to delete all historical scan information.