
After moving my blog from DigitalOcean a month ago, I've had Google Search Console send me a few emails about broken links and missing content. While fixing those was easy enough once they were pointed out to me, I wanted to know if there was any missing content that GSC had not found yet. I've used wget before to create an offline archive (mirror) of websites and even experimented with the spider flag, but never put it to any real use. For anyone not aware, the spider flag allows wget to function as an extremely basic web crawler, similar to Google's search/indexing technology. It can be used to follow every link it finds (including links to assets such as stylesheets) and log the results.
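As a minimal illustration of spider mode on its own (example.com is just a placeholder), the following checks whether a page exists without saving anything to disk:

    # report whether the remote file exists; nothing is downloaded
    wget --spider https://example.com/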

Turns out, it's a pretty effective broken link finder. Debug mode is required for the command I'm going to run. On OSX, using a package manager like Homebrew allows for the --with-debug option, but it doesn't appear to be working for me at the moment; luckily, installing from source is still an option. Thankfully cURL is installed by default on OSX, so it's possible to use that to download and install wget.
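For reference, the Homebrew route mentioned above would look something like this; treat it as a sketch only, since newer Homebrew releases have dropped per-formula install options, which may be why it wasn't working:

    # Homebrew build with debug support (option availability depends on your Homebrew version)
    brew install wget --with-debug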


Linux users should be able to use wget with debug mode without any additional work, so feel free to skip this part.
Download the source into /tmp, then configure the build against OpenSSL:

    cd /tmp
    ./configure --with-ssl=openssl --with-libssl-prefix=/usr/local/ssl
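Putting the whole source install together, a minimal sketch might look like the following; the wget version number and the OpenSSL prefix are assumptions, so adjust them for your system:

    # fetch and unpack the source (version number is an assumption)
    cd /tmp
    curl -O https://ftp.gnu.org/gnu/wget/wget-1.21.4.tar.gz
    tar -xzf wget-1.21.4.tar.gz
    cd wget-1.21.4

    # configure against OpenSSL, then build and install
    ./configure --with-ssl=openssl --with-libssl-prefix=/usr/local/ssl
    make
    sudo make install

    # confirm the new binary is the one on your PATH
    which wget && wget --version | head -n 1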
With the installation complete, now it's time to find all the broken things. The command to give wget is as follows (replace https://example.com with your own site's address); this will output the resulting file to your home directory ~/ and it may take a little while depending on the size of your website.

    wget --spider --debug -e robots=off -r -p https://example.com 2>&1 \
      | egrep -A 1 '(^HEAD|^Referer:|^Remote file does not)' > ~/wget.log

Let's break this command down so you can see what wget is being told to do:

--spider, this tells wget not to download anything.
--debug, gives extra information that we need.
-e robots=off, this one tells wget to ignore the robots.txt file.
-r, this means recursive, so wget will keep trying to follow links deeper into your site until it can find no more.
-p, get all page requisites such as images, styles, etc.
2>&1, take stderr and merge it with stdout.
|, this is a pipe; it sends the output of one program to another program for further processing.
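Once the crawl finishes, the broken links are the "Remote file does not exist" entries in ~/wget.log, and the HEAD and Referer: lines captured just before each one tell you which URL is broken and which page links to it. One quick way to pull those entries out (how much leading context to show is a judgment call):

    # show each broken-link hit with a few lines of leading context
    grep -B 6 'Remote file does not' ~/wget.log | less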
