Open library books in the background with the WordPress logo floating on top

Internet Archive plugin for WordPress

The Internet Archive stores snapshots of web pages in an effort to preserve online information and culture for the future. Anyone can see snapshots of old versions of web pages in the Internet Archive Wayback Machine. If anything ever happens to you or your blog ever goes down, your blog posts can still remain accessible in the Wayback Machine. You can now help this effort by making sure your WordPress blog is included in the archive.

By using my shiny new Post Archival in the Internet Archive plugin for WordPress, your WordPress blog will automatically ping the archive 12 hours after you publish a new blog post. The 12-hour window allows you to correct spelling and other mistakes, or even unpublish the blog post entirely, without having it stored in the archive. It helps to keep you honest and serves as motivation to set high quality expectations for yourself.
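The timing rule described above can be pictured roughly as follows. This is a minimal Python sketch of the idea, not the plugin's actual code (the plugin runs inside WordPress), and the names `ARCHIVE_DELAY`, `ping_due_at`, and `should_ping` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 12-hour grace period described in the post.
ARCHIVE_DELAY = timedelta(hours=12)


def ping_due_at(published_at: datetime) -> datetime:
    """Return the earliest moment the archive should be pinged."""
    return published_at + ARCHIVE_DELAY


def should_ping(published_at: datetime, still_published: bool, now: datetime) -> bool:
    """Ping only if the grace period has passed and the post is still public."""
    return still_published and now >= ping_due_at(published_at)


if __name__ == "__main__":
    published = datetime(2016, 1, 1, 8, 0, tzinfo=timezone.utc)
    print(ping_due_at(published).isoformat())
```

A post that is unpublished during the window never satisfies `should_ping`, which mirrors the "unpublish without having it stored" behaviour.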

The first time you activate the plugin on a WordPress site, it will start to send archive requests for existing blog posts. Only one blog post will be archived every 25 minutes, so this can take quite a while depending on the number of posts. You don’t need to do anything; just let the plugin do its thing.
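To get a feel for how long "quite a while" is, here is a small back-of-the-envelope calculation in Python. It assumes exactly one request per 25-minute slot and nothing else; `BACKLOG_INTERVAL` and `backlog_duration` are hypothetical names, and the result is only a rough upper bound.

```python
from datetime import timedelta

# Assumption: one archive request every 25 minutes, as stated in the post.
BACKLOG_INTERVAL = timedelta(minutes=25)


def backlog_duration(post_count: int) -> timedelta:
    """Rough upper bound on how long the initial backlog takes to submit."""
    if post_count < 0:
        raise ValueError("post_count must not be negative")
    return post_count * BACKLOG_INTERVAL


if __name__ == "__main__":
    print(backlog_duration(400))
```

Under these assumptions, a blog with 400 existing posts needs 10,000 minutes, which is a little under a week.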

Download the Post Archival in the Internet Archive plugin from the WordPress Plugin Gallery to start archiving your WordPress blog immediately.

The plugin only archives blog posts and doesn’t archive pages, dated archives, taxonomy pages, or other types of pages. There are no configurable options for the plugin. All archival requests will use your permalink settings as well as shortlink settings (when configured).

The plugin uses the User-Agent “Post-Archival-Plugin/1.0 WordPress/<version>”. You’ll see requests from this User-Agent in your server logs echoed from IP addresses owned by the Internet Archive. This means your save request was successfully processed and that they’re retrieving the page, images, and any other resources required to render the page.
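If you want to confirm this in your own logs, a short script can pick out the matching requests. The sketch below is in Python and assumes the widespread "combined" access log format, where the User-Agent is the last quoted field on each line; your server's format may differ. `PLUGIN_TOKEN` and `archive_fetches` are names invented for this example.

```python
import re

# Assumption: the product token from the User-Agent quoted in the post.
PLUGIN_TOKEN = "Post-Archival-Plugin/"

# Assumption: "combined" log format, where the User-Agent is the last quoted field.
_LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def archive_fetches(log_lines):
    """Yield (client_ip, user_agent) for requests carrying the plugin's User-Agent."""
    for line in log_lines:
        match = _LAST_QUOTED.search(line)
        if match and PLUGIN_TOKEN in match.group(1):
            # In the combined format the client address is the first field.
            client_ip = line.split(" ", 1)[0]
            yield client_ip, match.group(1)
```

Feeding it an open log file, for example `list(archive_fetches(open("access.log")))` with a path of your choosing, gives the addresses that fetched pages with this User-Agent.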

Your blog will not be archived if your robots.txt excludes or accidentally discriminates against the “ia_archiver” robot. The plugin doesn’t test that archival requests will succeed, nor does it guarantee inclusion in the Internet Archive. All it does is ping the archive when your blog has new content. You can test for any potential problems by manually saving a blog post using the Save Page option in the Wayback Machine. If you don’t run into any errors there, you shouldn’t have any problems using this plugin either.
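You can also check your robots.txt rules locally before relying on the plugin. The following is a minimal sketch using Python's standard `urllib.robotparser` module. It only tells you how that parser reads your rules, which may not match the Internet Archive's own interpretation exactly; `archiver_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def archiver_allowed(robots_txt: str, post_url: str, agent: str = "ia_archiver") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `post_url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, post_url)
```

Paste in the contents of your robots.txt and the permalink of any post; a `False` result suggests that the “ia_archiver” robot is being excluded.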

If you found this plugin and the Internet Archive’s services useful, please consider donating to the Internet Archive project. They’re a non-profit trying to preserve the depth of information and culture that is available on the web. Thousands upon thousands of pages and sites disappear off the web every day; the Internet Archive is trying to preserve our digital heritage.

Download the Post Archival in the Internet Archive plugin from the WordPress Plugin Gallery, or grab the source code from WordPress Plugin Hosting (Subversion).

As there are a few Nikola users following this blog, I’d like to remind them that I’ve also made an Internet Archive plugin for Nikola. Nikola is a static site generator written in Python, if WordPress isn’t quite your thing.

The photo of the books in the background of the feature image © 2016 Patrick Tomasso. WordPress and the WordPress circled-W logo are trademarks of WordPress Foundation. The Post Archival in the Internet Archive code is free software licensed under the GPLv3. The Internet Archive are just awesome.

2 thoughts on “Internet Archive plugin for WordPress”

  1. Nice idea in case you’re not sure about the future of your own website 😉

    However, I’m more worried about the case where I link to some other website, and then it disappears… For this case it would be nice to have a script which walks through all links on your site, submits them to a web archive, and adds a “cached version” link next to the main link (like Google does in search results). Do you know of such a script? Preferably compatible with static website generators 😉

    1. The best I can suggest is a reporting tool you could write in a few minutes. Using a scripting language like Python, you can parse the generated documents (don’t use regex, use a parser) and read every link in the document. Then request each link and report on links that don’t return a 2xx or 3xx status code. For each reported link, you can check to see if it’s in the archive by requesting https://*/<broken-url>. I’ve worked on the corresponding section of Nikola (MIT license), and can recommend that you base your link checker script on it.

      For this to work, websites must of course be archived. Which is what my plugins are all about.
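
A minimal sketch of such a report, using only Python's standard library, could look like the block below. It is not Nikola's link checker, and the names `LinkCollector`, `extract_links`, `fetch_status`, and `is_broken` are invented for the example. The archive lookup step is left out.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list:
    """Parse the document with a real parser (no regex) and return its links."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def fetch_status(url: str, timeout: float = 10):
    """Return the HTTP status for `url`, or None if the request failed outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except urllib.error.URLError:
        return None


def is_broken(status) -> bool:
    """Treat a failed request or anything outside 2xx and 3xx as broken."""
    return status is None or not 200 <= status < 400
```

Running `extract_links` over each generated page and passing every absolute link through `fetch_status` and `is_broken` gives you the list of links to look up in the archive.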
