Link checker
Periodically checks for broken links in content by extracting hyperlinks and evaluating HTTP response codes, displaying results under Administration > Reports > Broken links.
linkchecker
Install
composer require 'drupal/linkchecker:^2.1'
composer require 'drupal/linkchecker:^2.0'
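Composer only downloads the code, so the module still has to be enabled. A minimal sketch, assuming Drush is installed for the site; the function name and the DRUSH override variable are illustrative and not part of the module:

```shell
# DRUSH can be overridden, e.g. DRUSH=vendor/bin/drush, or DRUSH=echo for a dry run.
DRUSH="${DRUSH:-drush}"

# Enable the Link checker module by its machine name.
enable_linkchecker() {
  "$DRUSH" pm:enable linkchecker --yes
}
```

Call enable_linkchecker once after the composer step. Setting DRUSH=echo first prints the command instead of running it.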
Overview
The Link checker module provides comprehensive broken link detection and management for Drupal sites. It extracts links from configured content fields when entities are saved and periodically checks these links by sending HTTP requests to remote servers and evaluating response codes.
The module supports extraction from various HTML elements including hyperlinks, images, audio, video, iframes, and embedded content. It can automatically repair permanently moved links (301 redirects) and unpublish content containing broken links (404 errors) after a configurable threshold of failed checks.
All broken links are displayed in a centralized report accessible at Administration > Reports > Broken links, where administrators can review link status, error messages, failure counts, and related content. The module integrates with Drupal's cron system for background processing and provides Drush commands for command-line operations.
Features
- Extracts links from text fields, link fields, and various HTML tags (a, area, audio, video, img, iframe, embed, object) when content is saved
- Periodically checks link status using HTTP HEAD/GET requests with configurable intervals (1-90 days)
- Supports concurrent HTTP connections (2-128 simultaneous) with per-domain limits to prevent server overload
- Provides broken links report view accessible at /admin/reports/broken-links with filtering and pagination
- Automatically repairs 301 redirected links by updating URLs in content after configurable failure threshold
- Automatically unpublishes content containing 404 broken links after configurable failure threshold
- Configurable User-Agent header for HTTP requests to handle sites that block default Drupal user agent
- Field-level configuration allowing selective scanning per content type field
- Supports both internal and external link checking with configurable URL type filtering
- Dispatches events for customizing HTTP request headers during link checking
- Provides Drush commands for analyzing content and checking links via command line
- Tracks link check history with failure counts, last check timestamps, and error messages
- Supports migration of settings from Drupal 6/7 to current version
- URL blacklist for excluding specific domains from checking (e.g., example.com reserved domains)
Use Cases
Content Quality Audit
Run periodic audits of your site content to identify broken external links that may have become unavailable. Configure the module to scan all text fields and link fields, set an appropriate check interval (e.g., weekly), and regularly review the Broken links report to maintain content quality.
Automated Link Repair
Enable automatic repair of permanently moved links by setting 'Update permanently moved links' to 'After three failed checks'. This ensures that when external sites provide proper 301 redirects, your content is automatically updated to use the new URLs without manual intervention.
Content Unpublishing for Quality Control
Configure automatic unpublishing for content with persistent broken links by setting 'Unpublish content on file not found error' to a threshold like 'After three file not found errors'. This prevents users from seeing content with dead links while you review and fix the issues.
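The threshold behavior in the two use cases above can be pictured as a small decision function. This is a conceptual sketch only, not the module's code: the function name, the action labels, and the use of 0 for "disabled" are assumptions made for illustration.

```shell
# Usage: linkchecker_action STATUS_CODE FAIL_COUNT THRESHOLD
# Prints the action a threshold-based policy would take for one link.
linkchecker_action() {
  status=$1
  fails=$2
  threshold=$3
  case "$status" in
    301) action="update-url" ;;               # permanently moved: rewrite the link
    404) action="unpublish" ;;                # file not found: unpublish the content
    *)   echo "report-only"; return 0 ;;      # anything else only appears in the report
  esac
  # Act only once the failure count reaches the threshold (0 = disabled here).
  if [ "$threshold" -gt 0 ] && [ "$fails" -ge "$threshold" ]; then
    echo "$action"
  else
    echo "wait"
  fi
}
```

For example, linkchecker_action 404 2 3 prints "wait", while linkchecker_action 404 3 3 prints "unpublish". Waiting for several failures is what keeps a temporary outage from changing content.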
Image and Media Link Verification
Enable extraction from <img>, <audio>, and <video> tags to verify that embedded media files are still accessible. This is particularly useful for sites with user-generated content or content imported from external sources.
SEO Maintenance
Use the Broken links report to find and fix dead links. Broken links frustrate visitors and waste crawler requests, both of which can hurt how search engines treat the site, so regular monitoring helps protect search rankings.
Migration Validation
After migrating content from another system, use 'Clear link data and analyze content for links' to extract all links and run 'drush linkchecker:check' to immediately verify that all migrated links are functional.
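The same workflow can be run entirely from the command line. A rough sketch, assuming the Drush commands listed under Drush Commands; the function name, the DRUSH override, and the placeholder site URL are hypothetical:

```shell
# DRUSH can be overridden, e.g. DRUSH=vendor/bin/drush, or DRUSH=echo for a dry run.
DRUSH="${DRUSH:-drush}"
# Placeholder public URL; replace with the real address of your site.
SITE_URI="${SITE_URI:-https://www.example.org}"

validate_migrated_links() {
  # WARNING: linkchecker:clear deletes custom link settings before reanalyzing.
  "$DRUSH" --uri="$SITE_URI" linkchecker:clear &&
  # Check the freshly queued links now instead of waiting for cron.
  "$DRUSH" --uri="$SITE_URI" linkchecker:check
}
```

Run validate_migrated_links after the migration finishes, then review the Broken links report.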
Tips
- Start with a conservative configuration - enable only <a> tag extraction initially and expand as needed
- Use 'After three failed checks' for auto-repair and auto-unpublish to avoid acting on temporary outages
- Per-domain connection limits are set to 2 by default to avoid overwhelming external servers
- The reserved documentation domains (example.com, example.net, example.org) are always preserved in the URL blacklist per RFC 2606
- Enable new revisions in content type settings before enabling auto-repair to maintain edit history
- Run 'drush linkchecker:analyze' after upgrading the module to ensure all links are properly indexed
- For large sites, consider running link checks via Drush cron during off-peak hours to minimize impact
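The off-peak tip above could be implemented with a small script that the system scheduler calls. This is a sketch under assumptions: the function name, the script path, the schedule, and the placeholder site URL are all hypothetical.

```shell
# DRUSH can be overridden, e.g. DRUSH=vendor/bin/drush, or DRUSH=echo for a dry run.
DRUSH="${DRUSH:-drush}"
# Placeholder public URL; replace with the real address of your site.
SITE_URI="${SITE_URI:-https://www.example.org}"

nightly_link_check() {
  # Drupal cron queues links for checking...
  "$DRUSH" --uri="$SITE_URI" core:cron &&
  # ...and linkchecker:check then processes the queued links.
  "$DRUSH" --uri="$SITE_URI" linkchecker:check
}

# Hypothetical crontab entry (02:30 every night) for a script that contains
# the lines above followed by a call to nightly_link_check:
#   30 2 * * * /bin/sh /usr/local/bin/linkchecker-nightly.sh
```

Passing the public URL with --uri keeps the scheduled run from treating the site as localhost.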
Technical Details
Admin Pages 3
/admin/config/content/linkchecker
Configure link extraction, checking behavior, and error handling settings for the Link checker module. This page allows administrators to control which HTML tags are scanned, set check intervals, configure concurrent connections, and define automated actions for broken links.
/admin/reports/broken-links
View and manage all extracted links with their HTTP status codes, error messages, and failure counts. Filter by status code, link type, or content. Click through to view or edit related content.
/admin/config/content/linkcheckerlink/{linkcheckerlink}/edit
Edit settings for individual link entities. Allows changing the request method and enabling/disabling link checking for specific URLs.
Permissions 4
Hooks 5
hook_entity_insert
Triggered when an entity is created. Link checker extracts links from configured fields and creates linkcheckerlink entities.
hook_entity_update
Triggered when an entity is updated. Link checker re-extracts links and updates/creates linkcheckerlink entities, removing orphaned links.
hook_entity_delete
Triggered when an entity is deleted. Link checker removes associated linkcheckerlink entities and cleans up queue entries.
hook_cron
Called during cron runs. Link checker processes unindexed entities for link extraction and queues links for HTTP checking.
hook_form_field_config_form_alter
Alters field configuration forms to add Link checker settings. Adds 'Scan broken links' checkbox and extractor selection.
Drush Commands 3
drush linkchecker:analyze
Reanalyzes content for links by extracting URLs from all configured fields. Recommended after module upgrade or configuration changes.
drush linkchecker:check
Processes queued links and checks their HTTP status. Links are checked based on configured intervals.
drush linkchecker:clear
Clears all link data and reanalyzes content. WARNING: Custom link settings are deleted.
Troubleshooting 6
Some servers block the default Drupal User-Agent. Try changing the User-Agent setting to a browser User-Agent string, such as that of Firefox or Edge.
Ensure cron is called with the site's public URL, not localhost. Configure the Base path setting or pass the --uri option to Drush commands.
Verify that 'Scan broken links' is enabled in field settings under each content type. Check that the appropriate HTML tags are enabled in Link extraction settings.
Links are checked based on the configured interval. New links are queued first. Check Recent log messages for linkchecker activity. You can force a check with 'drush linkchecker:check'.
The 301 repair trusts the redirect location provided by the remote server. If sites provide incorrect redirects, disable the auto-repair feature and manually update links.
Reduce 'Number of simultaneous connections' to a lower value (e.g., 2 or 4) to decrease concurrent HTTP requests and server resource usage.
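Several of the checks above can be done from the command line. A hedged sketch: it assumes the Database Logging module is enabled (so that Drush's watchdog:show command is available) and that the module logs under the type "linkchecker", which is an assumption; the function name and placeholder site URL are hypothetical.

```shell
# DRUSH can be overridden, e.g. DRUSH=vendor/bin/drush, or DRUSH=echo for a dry run.
DRUSH="${DRUSH:-drush}"
# Placeholder public URL; replace with the real address of your site.
SITE_URI="${SITE_URI:-https://www.example.org}"

linkchecker_diagnose() {
  # Show recent log messages; the "linkchecker" log type is assumed.
  "$DRUSH" --uri="$SITE_URI" watchdog:show --type=linkchecker --count=20
  # Force processing of queued links instead of waiting for the next cron run.
  "$DRUSH" --uri="$SITE_URI" linkchecker:check
}
```

If the log output is empty, drop the --type filter to see all recent messages.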
Security Notes 5
- The 'Administer Link checker' permission is marked as restricted - grant only to trusted administrators
- The account chosen in the impersonate account setting should have only the permissions it needs; automatic content modifications are made as that user, so weigh the security implications
- Be cautious with auto-repair feature as it trusts 301 redirects from external sites - a malicious redirect could inject unwanted URLs
- URLs in the blacklist are still extracted but not checked - they remain visible in content
- The module sends HTTP requests to external URLs, so specially crafted URLs in your content could be used to trigger actions on remote servers