Recover mistyped traffic: Redirect .htm to .html

Have you ever copied a URL and left out the last character? This seems to happen a lot to the “l” (L) in “.html”, possibly because it looks like the text insertion cursor “|”. Whatever the cause, I do it all the time, and so do others. So let us fix that.

I noticed that about 8 % of the requests to my sites’ 404 Not Found pages were for addresses missing their last character. Many requests ended in “.htm” instead of “.html”, while exactly zero ended in “.ht”. Servers can be smart given proper configuration. Below are instructions for nginx and Apache servers that redirect requests for .htm addresses to their proper .html destinations.
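To check your own logs for the same pattern, a minimal Python sketch could count 404 responses by how their extension differs from .html. It assumes access logs in the Common or Combined Log Format. The function name classify_404s and the bucket labels are made up for illustration, and other log formats would need a different regex.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common/Combined Log Format line, e.g.
# 203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /post.htm HTTP/1.1" 404 512
REQUEST = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3}) ')


def classify_404s(lines):
    """Count 404 responses by how the requested extension differs from .html."""
    counts = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m is None or m.group("status") != "404":
            continue
        path = m.group("path").split("?", 1)[0].lower()  # ignore the query string
        if path.endswith(".htm"):
            counts[".htm"] += 1
        elif path.endswith(".ht"):
            counts[".ht"] += 1
        elif path.endswith(".htmll"):
            counts[".htmll"] += 1
        else:
            counts["other"] += 1
    return counts
```

Dividing counts[".htm"] by sum(counts.values()) gives the .htm share of all 404 responses in the log.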

If you have any documents that actually end in .htm rather than .html, they will become inaccessible. Rename them to .html before installing the configurations below.
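Renaming by hand works for a few pages. For a larger site, here is a minimal Python sketch; rename_htm_files is a hypothetical helper, not part of any tool. It skips any page.htm that already has a page.html beside it instead of overwriting it, and the *.htm glob is case-sensitive on POSIX systems, so a file named PAGE.HTM is left alone. Once the redirect below is active, old links to the .htm names lead to the renamed files.

```python
from pathlib import Path


def rename_htm_files(root):
    """Rename every *.htm file under `root` to *.html and return the new paths.

    Hypothetical helper: it never overwrites, so if page.html already exists
    next to page.htm, that pair is skipped for manual review.
    """
    renamed = []
    # sorted() collects all matches before any rename happens.
    for path in sorted(Path(root).rglob("*.htm")):
        if not path.is_file():
            continue  # e.g. a directory whose name ends in .htm
        target = path.with_suffix(".html")
        if target.exists():
            continue  # both names exist; leave this one alone
        path.rename(target)
        renamed.append(target)
    return renamed
```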

Configuration example for nginx

Redirect all requests ending in .htm to .html:

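location ~* "\.(htm)$" {
	# Append the missing "l"; $is_args$args keeps any query string.
	# $host follows the requested hostname even if server_name lists several names.
	return 307 "${scheme}://${host}${uri}l${is_args}${args}";
}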

Same as above but integrate with Google Analytics:

location ~* "\.(htm)$" {
	# Append the UTM tags to any existing query string (a leading "&" is harmless).
	set $args "${args}&utm_source=${uri}&utm_medium=traffic_recovery&utm_campaign=redirect_add_trailing_l";
	return 307 "${scheme}://${host}${uri}l?${args}";
}

Configuration example for Apache

Redirect all requests ending in .htm to .html:

RewriteEngine On
RewriteRule "\.(htm)$" "%{REQUEST_SCHEME}://%{SERVER_NAME}%{REQUEST_URI}l" [QSA,R=307,L]

Same as above but integrate with Google Analytics:

RewriteEngine On
RewriteRule "\.(htm)$" "%{REQUEST_SCHEME}://%{SERVER_NAME}%{REQUEST_URI}l?utm_source=%{REQUEST_URI}&utm_medium=traffic_recovery&utm_campaign=redirect_add_trailing_l" [QSA,R=307,L]

I’m using 307 Temporary Redirect (which preserves the request method) rather than 301 Moved Permanently in the examples. Browsers may cache a 301 and skip the original address on later visits, while a 307 is not cached unless the response says otherwise, so the browser asks for the original address every time before following the redirect. This is a good choice if you ever want to place a file at a .htm address instead of a .html one.
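To see what such a response contains, here is a minimal local stand-in built on Python’s standard library. It is not nginx or Apache, and the names HtmRedirectHandler and check_redirect are hypothetical. The handler answers .htm requests with a 307 and a Location header that appends the missing “l” and keeps the query string, and it sends no caching headers. For brevity the Location is relative, whereas the configurations above build an absolute URL.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class HtmRedirectHandler(BaseHTTPRequestHandler):
    """Local stand-in for the .htm rule; not a substitute for nginx or Apache."""

    def do_GET(self):
        parts = urlsplit(self.path)
        if parts.path.lower().endswith(".htm"):  # case-insensitive, like ~*
            location = parts.path + "l"  # append the missing "l"
            if parts.query:
                location += "?" + parts.query  # keep the query string
            self.send_response(307)  # temporary; the method is preserved
            self.send_header("Location", location)
        else:
            self.send_response(404)
        # No Cache-Control or Expires header is sent.
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def check_redirect(path):
    """Start the stand-in on a free local port, request `path`, return (status, Location)."""
    server = HTTPServer(("127.0.0.1", 0), HtmRedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", path)  # http.client never follows redirects
        resp = conn.getresponse()
        resp.read()
        conn.close()
        return resp.status, resp.getheader("Location")
    finally:
        server.shutdown()
        server.server_close()
```

Because http.client does not follow redirects, check_redirect("/post.htm?a=1") sees the 307 itself and returns (307, "/post.html?a=1").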

The Google Analytics integration shown above sets the source to the original (L‐less) URI, the medium to “traffic_recovery”, and the campaign to “redirect_add_trailing_l”. If you want to do similar recovery operations (like redirecting .ht to .html), simply change the campaign name so you can tell which redirect the traffic came through. You can adjust the examples to work with Piwik or other analytics products.
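The same tagging can be sketched in Python with urllib.parse.urlencode. Here urlencode percent-encodes the source URI (each “/” becomes %2F), so characters such as “&” in a path cannot break the query string. The function name, its parameters, and example.com are placeholders.

```python
from urllib.parse import urlencode


def tag_recovered_url(scheme, host, target_path, original_uri, campaign, original_query=""):
    """Build a redirect URL carrying the same UTM parameters as the configurations."""
    utm = urlencode({
        "utm_source": original_uri,  # the mistyped address, e.g. /post.htm
        "utm_medium": "traffic_recovery",
        "utm_campaign": campaign,  # one campaign name per recovery rule
    })
    # Keep the visitor's own query string in front of the tags.
    query = f"{original_query}&{utm}" if original_query else utm
    return f"{scheme}://{host}{target_path}?{query}"
```

For example, tagging /post.htm on example.com with the campaign redirect_add_trailing_l yields https://example.com/post.html?utm_source=%2Fpost.htm&utm_medium=traffic_recovery&utm_campaign=redirect_add_trailing_l.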

Update: .htmll to .html

A slight variation: Control–L focuses the address field in most browsers. It’s easy to imagine a Control–L, l, Control–C sequence resulting in an extra trailing “l”. At least, that is my guess as to what I’m seeing in my logs. I may also have shared one of those broken links myself. Here is a similar solution that redirects .htmll to .html:

Configuration example for nginx

Redirect all requests ending in .htmll to .html:

location ~* "^(/.*\.html)l$" {
	# $1 is the path up to and including .html, without the extra "l".
	set $newuri "$1";
	return 307 "${scheme}://${host}${newuri}${is_args}${args}";
}

Same as above but integrate with Google Analytics:

location ~* "^(/.*\.html)l$" {
	# Copy the capture first, then append the UTM tags to any existing query string.
	set $newuri "$1";
	set $args "${args}&utm_source=${uri}&utm_medium=traffic_recovery&utm_campaign=redirect_remove_trailing_l";
	return 307 "${scheme}://${host}${newuri}?${args}";
}

Configuration example for Apache

Redirect all requests ending in .htmll to .html:

RewriteEngine On
# $1 drops the extra "l"; the leading "/" in the pattern assumes server or <VirtualHost> config, not .htaccess.
RewriteRule "^(/.*\.html)l$" "%{REQUEST_SCHEME}://%{SERVER_NAME}$1" [QSA,R=307,L]

Same as above but integrate with Google Analytics:

RewriteEngine On
RewriteRule "^(/.*\.html)l$" "%{REQUEST_SCHEME}://%{SERVER_NAME}$1?utm_source=%{REQUEST_URI}&utm_medium=traffic_recovery&utm_campaign=redirect_remove_trailing_l" [QSA,R=307,L]
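The capture used by the .htmll rules can be tried out with a Python sketch that mirrors the nginx pattern ^(/.*\.html)l$ with case-insensitive matching, as ~* does. The function name htmll_redirect_target is hypothetical. Only one extra “l” is removed: /post.htmlll does not match at all, because the captured part must end in .html.

```python
import re
from typing import Optional

# Mirrors the nginx match ~* "^(/.*\.html)l$": group 1 is the corrected path.
HTMLL = re.compile(r"^(/.*\.html)l$", re.IGNORECASE)


def htmll_redirect_target(path: str, query: str = "") -> Optional[str]:
    """Return the path (plus query) the .htmll rule redirects to, or None."""
    m = HTMLL.match(path)
    if m is None:
        return None
    target = m.group(1)  # drop the extra trailing "l"
    if query:
        target += "?" + query  # keep the original query string
    return target
```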
