
Crawl Manager

This article will show you how to configure a crawl job using the settings in the Crawl Manager.

  1. Click View Management Console in the Project Explorer.
  2. Scroll down and select Crawl Manager. The Crawl Manager window will open.
  3. Click Add Job. The Add Crawl Job window will open.
  4. Enter the URL, customise the crawl settings if desired, then click Save.

Crawl details

URL

Enter the URL to be crawled in this text field.

Max pages to crawl

Use this text field to specify the maximum number of pages that the crawler should visit. The default setting is 2000 pages.

Max depth of crawl

Use this text field to limit the depth of the crawl. The default setting is 99999 folder levels.
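Together, these two settings bound the crawl: one caps the total number of pages visited, the other caps how many levels deep the crawler follows links. As an illustration only (this is not the Crawl Manager's actual implementation), a breadth-first crawl bounded by page count and depth might look like this, using a hypothetical in-memory link graph:

```python
from collections import deque

# Toy link graph standing in for a real site (hypothetical example).
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": [], "/a/2": [], "/b/1": [],
}

def crawl(start, max_pages=2000, max_depth=99999):
    """Breadth-first crawl bounded by page count and link depth."""
    visited = []
    queue = deque([(start, 0)])           # (url, depth) pairs
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if url in visited or depth > max_depth:
            continue                      # skip revisits and too-deep pages
        visited.append(url)
        for link in LINKS.get(url, []):
            queue.append((link, depth + 1))
    return visited
```

For example, `crawl("/", max_depth=1)` stops after the start page and its direct links, while `crawl("/", max_pages=2)` stops after two pages regardless of depth.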

Scheduled time

Enter the time you want the crawl to start into the text field in HH:MM format. When you first enable a job, this time is randomised. We recommend setting your crawl to run outside of your backup window and working hours: typically at night.

Frequency

Enter how often (in days) the crawl should run. The default setting is 7 days.

Include subdomains

Check the box to instruct the crawler to follow links to subdomains of your site. You can exclude specific subdomains in the Exclusions box below if required.

Advanced settings

Use Authentication

If you have secure areas on your site and want these to be scanned, check the Use Authentication checkbox and set a user ID.

User ID

With the Use Authentication checkbox checked, enter a Contensis User ID in this text field to allow the crawler to work as though it were an authenticated user and view secured pages.

Most people enabling this setting use an ID of 1 (the systems administration user). If you want to specify a different user you can find their ID number in the User Management table, within the Management Console. The default user ID is set to 0.

Simultaneous requests

The Simultaneous requests setting specifies the number of requests that are allowed to run in parallel. The default is 4 requests.
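A cap on parallel requests is commonly enforced with a counting semaphore: each request takes a slot before starting and releases it when done. The sketch below simulates this in Python with threads and a `BoundedSemaphore`; it is illustrative only, not how the Crawl Manager itself is built:

```python
import threading
import time

MAX_SIMULTANEOUS = 4                  # mirrors the default setting
slots = threading.BoundedSemaphore(MAX_SIMULTANEOUS)
lock = threading.Lock()
active = 0                            # requests currently in flight
peak = 0                              # highest concurrency observed

def fetch(url):
    """Simulated page request; the sleep stands in for network time."""
    global active, peak
    with slots:                       # blocks once 4 fetches are in flight
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)              # pretend to download the page
        with lock:
            active -= 1

threads = [threading.Thread(target=fetch, args=(f"/page{i}",))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Even though ten fetches are queued, `peak` never exceeds 4, because the semaphore makes the fifth fetch wait until one of the first four releases its slot.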

Exclusions

Use the text fields to specify exclusion rules for pages or HTML codes; each rule takes the form of a regular expression. Exclusion rules can be based on querystrings or full URLs, giving you the flexibility to exclude pages with a broad stroke or in fine detail.

The defaults are:

  • .*download=DownloadasPDFlink.*
  • .*EventsCalendar__EventDateSpan=ALL.*
  • .*/tags.*
  • .*Logout=true.*
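Because these are standard regular expressions, you can test a candidate rule against sample URLs before saving it. A minimal sketch in Python, using the default rules above (the `is_excluded` helper is hypothetical, not part of the product):

```python
import re

# The default exclusion rules, verbatim from the Crawl Manager.
EXCLUSIONS = [
    r".*download=DownloadasPDFlink.*",
    r".*EventsCalendar__EventDateSpan=ALL.*",
    r".*/tags.*",
    r".*Logout=true.*",
]
patterns = [re.compile(rule) for rule in EXCLUSIONS]

def is_excluded(url):
    """True if any exclusion rule matches the URL."""
    return any(p.search(url) for p in patterns)
```

For example, `is_excluded("/news/tags/sport")` and `is_excluded("/account?Logout=true")` both return `True`, while an ordinary page such as `/news/story-1` is not excluded.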

Quality Assurance settings

Use the Enable Quality Assurance Checks checkbox to toggle each scan on or off.

Accessibility Standard

Use the dropdown to select the WCAG 2.0 accessibility standard that you want your website to conform to. The default is AAA.

Maximum page size

Enter a numerical value in the text field to limit the maximum page size. Default is 81920 bytes (80KB).

Maximum total size

Enter a numerical value in the text field to limit the maximum total size. Default is 358400 bytes (350KB).
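The two limits measure different things: page size applies to the page itself, while total size is the larger overall budget (the article does not spell out exactly what each covers, so treat the helper below as an assumption). A minimal QA-style check against both defaults might look like:

```python
MAX_PAGE_SIZE = 81920      # 80KB default
MAX_TOTAL_SIZE = 358400    # 350KB default

def size_warnings(page_bytes, total_bytes):
    """Return QA warnings for oversized pages (hypothetical helper)."""
    warnings = []
    if page_bytes > MAX_PAGE_SIZE:
        warnings.append("page exceeds maximum page size")
    if total_bytes > MAX_TOTAL_SIZE:
        warnings.append("page exceeds maximum total size")
    return warnings
```

A 100000-byte page would trip the page-size check but stay within the total-size budget; a page weighing 400000 bytes in total would trip both.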

Advanced options

Perform Full Scan Now

Click this button to run the scan immediately.

Clear Scan Data

Click this button to delete all historical scan information.