Machine SEO: How to Do a Content Audit [Updated for 2017]

Posted by Everett

This guide provides instructions on how to do a content audit using examples and screenshots from Screaming Frog, URL Profiler, Google Analytics (GA), and Excel, as those seem to be the most widely used and versatile tools for performing content audits.

TABLE OF CONTENTS

What is an SEO content audit?
What is the purpose of a content audit?
- How & why “pruning” works
How to do a content audit
The inventory & audit phase
- Step 1: Crawl all indexable URLs
- Crawling roadblocks & new technologies
- Crawling very large websites
- Crawling dynamic mobile sites
- Crawling and rendering JavaScript
- Step 2: Gather additional metrics
- Things you don’t need when analyzing the data
The analysis & recommendations phase
- Step 3: Put it all into a dashboard
- Step 4: Work the content audit dashboard
The reporting phase
- Step 5: Writing up the report
Content audit resources & further reading

What is a content audit?

A content audit for the purpose of SEO includes a full inventory of all indexable content on a domain, which is then analyzed using performance metrics from a variety of sources to determine which content to keep as-is, which to improve, and which to remove or consolidate.

What is the purpose of a content audit?

A content audit can have many purposes and desired outcomes. In terms of SEO, content audits are often used to determine the following:

How to escape a content-related search engine ranking filter or penalty
Content that requires copywriting/editing for improved quality
Content that needs to be updated and made more current
Content that should be consolidated due to overlapping topics
Content that should be removed from the site
The best way to prioritize the editing or removal of content
Content gap opportunities
Which content is ranking for which keywords
Which content should be ranking for which keywords
The strongest pages on a domain and how to leverage them
Undiscovered content marketing opportunities
Due diligence when buying/selling websites or onboarding new clients

While each of these desired outcomes and insights is a valuable result of a content audit, I would define the overall “purpose” of one as:

The purpose of a content audit for SEO is to improve the perceived trust and quality of a domain, while optimizing crawl budget and the flow of PageRank (PR) and other ranking signals throughout the site.

Often, but not always, a big part of achieving these goals involves the removal of low-quality content from search engine indexes. I’ve been told people hate this word, but I prefer the “pruning” analogy to describe the concept.

How & why “pruning” works

How to do a content audit

As with anything in SEO, from technical and on-page changes to site migrations, content audits can go horribly wrong when they aren’t conducted properly. The most common example would be removing URLs that have external links because link metrics weren’t analyzed as part of the audit. Another common mistake is confusing removal from search engine indexes with removal from the website.

Content audits start with taking an inventory of all content available for indexation by search engines. This content is then analyzed against a variety of metrics and given one of three “Action” determinations. The “Details” of each Action are then expanded upon.

The possible combinations of an “Action” (WHAT to do) and its “Details” (HOW, and sometimes why, to do it) are as varied as the strategies, sites, and tactics themselves.
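To make the Action/Details split concrete, below is a minimal Python sketch of a few dashboard rows. It assumes the three Actions are Keep, Improve, and Remove (the options used throughout this guide); the `AuditRow` class, the example.com URLs, and the Details text are hypothetical illustrations, not recommendations for any real site. Note that the Remove rows differ: one keeps the page on the site and only removes it from the index, while the others delete or consolidate the page itself.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The three allowed Action values (assumed: Keep, Improve, Remove)."""
    KEEP = "Keep"
    IMPROVE = "Improve"
    REMOVE = "Remove"


@dataclass
class AuditRow:
    url: str
    action: Action  # WHAT to do
    details: str    # HOW (and sometimes why) to do it


# Hypothetical rows for illustration only.
rows = [
    AuditRow("https://example.com/guide", Action.KEEP, "No changes needed."),
    AuditRow("https://example.com/product-a", Action.IMPROVE,
             "Rewrite the manufacturer description with unique copy."),
    AuditRow("https://example.com/old-promo", Action.REMOVE,
             "Delete and let it 404; no external links."),
    # Removal from the index only: the page stays on the website.
    AuditRow("https://example.com/tag/widgets", Action.REMOVE,
             "Add a robots noindex meta tag; keep the page on the site."),
    AuditRow("https://example.com/widgets-2", Action.REMOVE,
             "Consolidate into /widgets, then 301 redirect."),
]

for row in rows:
    print(f"{row.action.value:<8} {row.url}  ->  {row.details}")
```

Restricting Action to a fixed set of values is what keeps filtering and sorting easy; the free-form Details column carries everything else.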

You now have a basic overview of how to perform a content audit. More specific instructions can be found below.

The process can be roughly split into three distinct phases:

Inventory & audit
Analysis & recommendations
Summary & reporting

The inventory & audit phase

Taking an inventory of all content, and related metrics, begins with crawling the site.

One difference between crawling for content audits and technical audits:

Technical SEO audit crawls are concerned with all crawlable content (among other things).

Content audit crawls for the purpose of SEO are concerned with all indexable content.

All of this is changing rapidly, though. URLs as the unique identifier in Google’s index are probably going away. Yes, we’ll still have URLs, but not everything requires them. So far, the words “content” and “URL” have been mostly interchangeable. But some URLs contain an entire application’s worth of content. How to do a content audit in that world is something we’ll have to figure out soon, but only after Google figures out how to organize the web’s information in that same world. From the looks of things, we still have a year or two.

Until then, the process below should handle most situations.

Step 1: Crawl all indexable URLs

A good place to start on most websites is a full Screaming Frog crawl. However, some indexable content might be missed this way, so don’t rely on a crawler as your only source of indexable URLs.

In addition to the crawler, collect URLs from Google Analytics, Google Search Console (GSC, formerly Google Webmaster Tools), XML sitemaps, and, if possible, an internal database, such as an export of all product and category URLs on an eCommerce website. These can then be crawled separately in “list mode,” added to your main list of URLs, and deduplicated to produce a more comprehensive list of indexable URLs.
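Merging the lists is mostly string housekeeping. The Python sketch below assumes each source has already been exported as a plain list of absolute URLs; the normalization rules (lowercase scheme and host, drop the `#fragment`, treat an empty path as `/`) are assumptions to adjust for your site, for example if trailing slashes or certain query parameters should also count as duplicates.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Normalize a URL so trivially different spellings deduplicate.

    Assumed rules: lowercase the scheme and host, drop the #fragment, and
    use "/" for an empty path. Query strings are kept because they may
    identify distinct content.
    """
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_url_lists(*sources):
    """Combine several URL lists and deduplicate, preserving first-seen order."""
    seen = {}
    for source in sources:
        for url in source:
            seen.setdefault(normalize_url(url), None)
    return list(seen)


# Hypothetical inputs: a crawl export and an XML sitemap.
crawl = ["https://Example.com/a", "https://example.com/b#top"]
sitemap = ["https://example.com/a", "https://example.com/c"]
print(merge_url_lists(crawl, sitemap))
# ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']
```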

Some URLs found via GA, XML sitemaps, and other non-crawl sources may not actually be “indexable.” These should be excluded. One strategy that works here is to combine and deduplicate all of the URL “lists,” and then perform a crawl in list mode. Once crawled, remove all URLs with a noindex directive in the robots meta tag or the X-Robots-Tag HTTP header, as well as any URL returning an error code or blocked by the robots.txt file, etc. At this point, you can safely add these URLs to the file containing indexable URLs from the crawl. Once again, deduplicate the list.
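The exclusion rules above can be expressed as a single filter. This is a rough Python sketch, not a crawler: it assumes the list-mode crawl has been exported to rows with hypothetical keys `url`, `status`, `meta_robots`, `x_robots_tag`, and `canonical`, and it uses the standard library’s `urllib.robotparser` for the robots.txt check. It also applies the self-referencing canonical rule from the metrics list, treating a URL with no canonical tag as self-referencing.

```python
from urllib.robotparser import RobotFileParser


def is_indexable(row: dict, robots: RobotFileParser, user_agent: str = "Googlebot") -> bool:
    """Return True if a crawled URL passes the exclusion rules.

    `row` uses hypothetical keys; rename them to match your crawler's export.
    """
    if row["status"] != 200:  # error codes, redirects, etc.
        return False
    directives = f"{row.get('meta_robots', '')},{row.get('x_robots_tag', '')}".lower()
    if "noindex" in directives:  # robots meta tag or X-Robots-Tag header
        return False
    if not robots.can_fetch(user_agent, row["url"]):  # blocked by robots.txt
        return False
    # No canonical tag is treated as self-referencing; compared as exact strings.
    canonical = row.get("canonical") or row["url"]
    return canonical == row["url"]


# Hypothetical robots.txt rules and crawl rows.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /cart/"])

crawled = [
    {"url": "https://example.com/a", "status": 200, "meta_robots": "index,follow"},
    {"url": "https://example.com/b", "status": 404},
    {"url": "https://example.com/c", "status": 200, "x_robots_tag": "noindex"},
    {"url": "https://example.com/cart/1", "status": 200},
    {"url": "https://example.com/d?sort=price", "status": 200,
     "canonical": "https://example.com/d"},
]
indexable = [r["url"] for r in crawled if is_indexable(r, robots)]
print(indexable)  # ['https://example.com/a']
```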

Crawling roadblocks & new technologies

Crawling very large websites

First and foremost, you do not need to crawl every URL on the site. Be concerned with indexable content. This is not a technical SEO audit.

Crawling dynamic mobile sites

This refers to a specific type of mobile setup in which there are two code-bases (one for mobile and one for desktop) but only one URL. Thus, the content of a single URL may vary significantly depending on which type of device is visiting it. In such cases, you will essentially be performing two separate content audits: proceed as usual for the desktop version, then crawl the mobile version using a mobile user agent.
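Before committing to a second audit, it can help to confirm that a URL really does vary by device. The Python sketch below fetches a URL twice with different User-Agent headers and compares the two responses word by word with `difflib`. The user-agent strings are placeholders (assumptions), and the check for a `Vary: User-Agent` response header reflects Google’s guidance for dynamically served pages; a low similarity score is only a hint that the mobile version needs its own audit.

```python
import difflib
from urllib.request import Request, urlopen

# Placeholder user-agent strings (assumptions); substitute the real desktop
# and smartphone user agents you want to emulate.
DESKTOP_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
MOBILE_UA = "Mozilla/5.0 (Linux; Android 7.0) Mobile"


def fetch(url: str, user_agent: str):
    """Fetch a URL with a given User-Agent; return (Vary header, body text)."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        return resp.headers.get("Vary", ""), resp.read().decode("utf-8", errors="replace")


def compare_versions(desktop_html: str, mobile_html: str, vary: str) -> dict:
    """Word-level similarity of two versions plus a Vary: User-Agent check."""
    ratio = difflib.SequenceMatcher(None, desktop_html.split(), mobile_html.split()).ratio()
    return {"vary_user_agent": "user-agent" in vary.lower(), "similarity": round(ratio, 2)}


# Usage (needs network access):
# vary, desktop = fetch("https://example.com/", DESKTOP_UA)
# _, mobile = fetch("https://example.com/", MOBILE_UA)
# print(compare_versions(desktop, mobile, vary))
```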

Crawling and rendering JavaScript

One of the many technical issues SEOs have been increasingly dealing with over the last couple of years is the proliferation of websites built on JavaScript frameworks and libraries like React.js, Ember.js, and Angular.js.

Step 2: Gather additional metrics

Most crawlers will give you the URL and various on-page metrics and data, such as the titles, descriptions, meta tags, and word count. In addition to these, you’ll want to know about internal and external links, traffic, content uniqueness, and much more in order to make fully informed recommendations during the analysis portion of the content audit project.

Your process may vary, but we generally try to pull in everything we need using as few sources as possible. URL Profiler is a great resource for this purpose, as it works well with Screaming Frog and integrates easily with all of the APIs we need.

Once the Screaming Frog crawl is complete (only crawling indexable content), export the “Internal All” file, which can then be used as the seed list in URL Profiler (combined with any additional indexable URLs found outside of the crawl via GSC, GA, and elsewhere).

This is what my URL Profiler settings look like for a typical content audit of a small- or medium-sized site. Also, under “Accounts,” I have connected to Moz and SEMrush via API keys.

Once URL Profiler is finished, you should end up with something like this:

Screaming Frog and URL Profiler: Between these two tools and the APIs they connect with, you may not need anything else at all in order to see the metrics below for every indexable URL on the domain.

The risk of getting analytics data from a third-party tool

We've noticed odd data mismatches and sampled data when using the method above on large, high-traffic websites. Our internal process involves exporting these reports directly from Google Analytics, sometimes using Analytics Canvas to get the full, unsampled data from GA. Then VLOOKUPs are used in the spreadsheet to combine the data, with the URL as the unique identifier.
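A VLOOKUP on the URL column only works if both sheets spell URLs the same way, and GA’s landing page reports typically list paths (such as `/blog/post?page=2`) rather than full URLs. The Python sketch below shows the same left join with a normalized key. It assumes a single hostname, and the field names (`url`, `landing_page`, `sessions`, `revenue`) are hypothetical stand-ins for your export’s column headers.

```python
from urllib.parse import urlsplit


def url_key(url_or_path: str) -> str:
    """Join key: path plus query, so full URLs match GA-style paths."""
    parts = urlsplit(url_or_path)
    key = parts.path or "/"
    return f"{key}?{parts.query}" if parts.query else key


def vlookup_join(crawl_rows, ga_rows, fields=("sessions", "revenue")):
    """Left-join GA metrics onto crawl rows, like VLOOKUP on the URL column.

    URLs with no GA match get 0, which is itself a useful signal.
    """
    ga_by_key = {url_key(r["landing_page"]): r for r in ga_rows}
    joined = []
    for row in crawl_rows:
        match = ga_by_key.get(url_key(row["url"]), {})
        joined.append({**row, **{f: match.get(f, 0) for f in fields}})
    return joined


crawl = [{"url": "https://example.com/a"}, {"url": "https://example.com/b?page=2"},
         {"url": "https://example.com/c"}]
ga = [{"landing_page": "/a", "sessions": 120, "revenue": 50.0},
      {"landing_page": "/b?page=2", "sessions": 3, "revenue": 0.0}]
result = vlookup_join(crawl, ga)
```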

Metrics to pull for each URL:

Indexed or not?
- If crawlers are set up properly, all URLs should be “indexable.”
- A non-indexed URL is often a sign of an uncrawled or low-quality page.
Content uniqueness
- Copyscape, Siteliner, and now URL Profiler can provide this data.
Traffic from organic search
- Typically 90 days
- Keep a consistent timeframe across all metrics.
Revenue and/or conversions
- You could view this by “total,” or by segmenting to show only revenue from organic search on a per-page basis.
Publish date
- If you can get this into Google Analytics as a custom dimension prior to fetching the GA data, it will help you discover stale content.
Internal links
- Content audits provide the perfect opportunity to tighten up your internal linking strategy by ensuring the most important pages have the most internal links.
External links
- These can come from Moz, SEMrush, and a variety of other tools, most of which integrate natively or via APIs with URL Profiler.
Landing pages resulting in low time-on-site
- Take this one with a grain of salt. If visitors found what they wanted quickly because the content was good, a short visit isn’t a bad sign. A better proxy for engagement would be scroll depth, but that would probably require setting up a scroll-tracking “event.”
Landing pages resulting in low pages-per-visit
- Just like with time-on-site, sometimes visitors find what they’re looking for on a single page. This is often true for high-quality content.
Response code
- Typically, only URLs that return a 200 (OK) response code are indexable. You may not require this metric in the final data if that's the case on your domain.
Canonical tag
- Typically, only URLs with a self-referencing rel="canonical" tag should be considered “indexable.” You may not require this metric in the final data if that's the case on your domain.
Page speed and mobile-friendliness
- Again, URL Profiler comes through with its Google PageSpeed Insights API integration.
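To act on the internal links point above, one option is to compare each page’s importance with the number of internal links pointing to it. This Python sketch assumes organic sessions as the measure of importance (revenue would work too) and uses hypothetical field names; the “top N by traffic, below the median link count” rule is an illustrative heuristic rather than a standard.

```python
from statistics import median


def underlinked_pages(rows, top_n=10):
    """Return URLs among the top N by organic traffic whose inbound internal
    link count is below the site-wide median."""
    link_median = median(r["internal_links"] for r in rows)
    top = sorted(rows, key=lambda r: r["organic_sessions"], reverse=True)[:top_n]
    return [r["url"] for r in top if r["internal_links"] < link_median]


rows = [
    {"url": "/a", "organic_sessions": 900, "internal_links": 2},
    {"url": "/b", "organic_sessions": 500, "internal_links": 40},
    {"url": "/c", "organic_sessions": 10, "internal_links": 35},
    {"url": "/d", "organic_sessions": 0, "internal_links": 30},
]
print(underlinked_pages(rows, top_n=2))  # ['/a']
```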

Before you begin analyzing the data, be sure to drastically improve your mental health and the performance of your machine by taking the opportunity to get rid of any data you don’t need. Here are a few things you might consider deleting right away (after making a copy of the full data set, of course).
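If the full export is a CSV, the copy-then-trim step can be scripted. This is a minimal Python sketch; the column names in `KEEP_COLUMNS` are assumptions modeled on a Screaming Frog export and should be replaced with the headers in your own file.

```python
import csv
import shutil

# Hypothetical column names; match them to the headers in your own export.
KEEP_COLUMNS = ["Address", "Status Code", "Title 1", "Word Count"]


def trim_export(src: str, dst: str, keep=KEEP_COLUMNS):
    """Back up the full export, then write a smaller file with only `keep`."""
    shutil.copyfile(src, src + ".bak")  # keep a copy of the full data set
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        cols = [c for c in keep if c in reader.fieldnames]  # skip missing columns
        writer = csv.DictWriter(fout, fieldnames=cols, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)  # columns not in `cols` are dropped
    return cols
```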

Things you don’t need when analyzing the data

Hopefully by now you've made a significant dent in both the size of the file and the time it takes to apply formatting and formula changes to the spreadsheet. It’s time to start diving into the data.

The analysis & recommendations phase

Here's where the fun really begins. In a large organization, it's tempting to have a junior SEO do all of the data-gathering up to this point. I find it useful to perform the crawl myself, as the process can be highly informative.

Step 3: Put it all into a dashboard

Even after removing unnecessary data, performance could still be a major issue, especially if working in Google Sheets. I prefer to do all of this in Excel, and only upload into Google Sheets once it's ready for the client. If Excel is running slowly, consider splitting up the URLs by directory or some other factor in order to work with multiple, smaller spreadsheets.
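Splitting by directory is easy to script once the data is in CSV form. The Python sketch below groups rows by the first path segment of the URL and writes one file per group; the `Address` column name and the `audit_` file prefix are assumptions.

```python
import csv
from collections import defaultdict
from urllib.parse import urlsplit


def top_directory(url: str) -> str:
    """First path segment, e.g. https://example.com/blog/post -> 'blog'."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else "_root"


def split_by_directory(rows, url_field="Address"):
    """Group rows by top-level directory so each group can be its own sheet."""
    groups = defaultdict(list)
    for row in rows:
        groups[top_directory(row[url_field])].append(row)
    return dict(groups)


def write_groups(groups, fieldnames, prefix="audit_"):
    """Write one CSV per directory, e.g. audit_blog.csv."""
    for name, rows in groups.items():
        with open(f"{prefix}{name}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
```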

Creating a dashboard can be as easy as adding two columns to the spreadsheet. The first new column, “Action,” should be limited to three options, as shown below. This makes filtering and sorting data much easier. The “Details” column can contain freeform text to provide more detailed instructions for implementation.

Use Data Validation and a drop-down selector to limit Action options.

Step 4: Work the content audit dashboard

All of the data you need should now be right in front of you. This step can’t be turned into a repeatable process for every content audit. From here on, the actual step-by-step process becomes much more open to interpretation and your own experience. You may do some of these steps and not others, or do them a little differently. That's all fine, as long as you're working toward the goal of determining what to do, if anything, for each piece of content on the website.

A good place to start would be to look for any content-related issues that might cause an algorithmic filter or manual penalty to be applied, thereby dragging down your rankings.

Causes of content-related penalties

These typically fall under three major categories: quality, duplication, and relevancy. Each category can be further broken down into a variety of specific issues.
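Tools such as Copyscape and Siteliner handle duplicate-content checks for you, but the underlying idea is easy to sketch. The Python example below compares pages using sets of overlapping five-word “shingles” and Jaccard similarity; it illustrates the technique and is not a reproduction of how any of those tools work. The 0.8 threshold is an assumption, and because it compares every pair of pages, it only suits small or pre-filtered sets (for large sites, an approximate method such as MinHash is the usual choice).

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 5) -> set:
    """Set of k-word shingles from lowercased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def near_duplicates(pages: dict, threshold: float = 0.8, k: int = 5):
    """Yield (url_a, url_b, score) for page pairs at or above `threshold`."""
    sets = {url: shingles(text, k) for url, text in pages.items()}
    for a, b in combinations(sets, 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            yield a, b, round(score, 2)


# Hypothetical product pages reusing a manufacturer description.
desc = ("This stainless steel widget is built to last and ships with "
        "a lifetime warranty from the manufacturer")
pages = {
    "/p/1": desc,
    "/p/2": desc + " Blue.",
    "/p/3": "Our hand-written guide to choosing the right widget for a small kitchen",
}
print(list(near_duplicates(pages)))  # [('/p/1', '/p/2', 0.93)]
```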

It helps to sort the data in various ways to see what’s going on.
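For example, a filter for thin pages with no organic traffic is a common first sort. The Python sketch below assumes hypothetical field names and thresholds; it deliberately skips any page with external links, since removing those without checking link metrics is the most common content audit mistake described earlier. Treat the output as a review list, not a set of automatic removals.

```python
def prune_candidates(rows, max_sessions=0, max_words=250):
    """Pages with little organic traffic and little content, excluding any
    with external links. Thresholds are illustrative assumptions."""
    flagged = [
        r for r in rows
        if r["organic_sessions"] <= max_sessions
        and r["word_count"] < max_words
        and r["external_links"] == 0
    ]
    # Thinnest pages first, then by URL, for manual review.
    return sorted(flagged, key=lambda r: (r["word_count"], r["url"]))


rows = [
    {"url": "/a", "organic_sessions": 0, "word_count": 80, "external_links": 0},
    {"url": "/b", "organic_sessions": 0, "word_count": 40, "external_links": 3},
    {"url": "/c", "organic_sessions": 250, "word_count": 60, "external_links": 0},
    {"url": "/d", "organic_sessions": 0, "word_count": 30, "external_links": 0},
]
print([r["url"] for r in prune_candidates(rows)])  # ['/d', '/a']
```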

Taking the hatchet to bloated websites

For big sites, it's best to use a hatchet-based approach as much as possible, and finish up with a scalpel in the end. Otherwise, you'll spend way too much time on the project, which eats into the ROI.

This is not a process that can be documented step-by-step.

Sorting by “Page Type” can be quite handy when applying the same Action and Details to an entire section of the website.
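In code, a hatchet rule is just a mapping from page type to an Action and Details pair. This Python sketch assumes rows with `Page Type`, `Action`, and `Details` keys, and the two rules in `RULES` are hypothetical; rows that already have an Action are left alone, so decisions made by hand are never overwritten.

```python
# Hypothetical page-type rules: (Action, Details) applied to a whole section.
RULES = {
    "tag": ("Remove", "Add a robots noindex meta tag; keep the page on the site."),
    "product": ("Improve", "Replace the manufacturer description with unique copy."),
}


def apply_hatchet(rows, rules=RULES):
    """Fill Action/Details for every row whose Page Type has a rule,
    skipping rows that already have an Action."""
    for row in rows:
        if not row.get("Action") and row.get("Page Type") in rules:
            row["Action"], row["Details"] = rules[row["Page Type"]]
    return rows


rows = apply_hatchet([
    {"URL": "/tag/red", "Page Type": "tag"},
    {"URL": "/p/1", "Page Type": "product"},
    {"URL": "/p/2", "Page Type": "product", "Action": "Keep",
     "Details": "Top seller; copy already unique."},
    {"URL": "/about", "Page Type": "page"},
])
```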

After all of the tool set-up, data gathering, data cleanup, and analysis across dozens of metrics, what matters in the end is the Action to take and the Details that go with it.

URL, Action, and Details: These are the three columns that whoever implements your recommendations will work from. Be clear and concise in your instructions, and don’t make decisions without reviewing all of the wonderful data points you’ve collected.

Here is a sample content audit spreadsheet to use as a template, or for ideas. It includes a few extra tabs specific to the way we used to do content audits at Inflow.

WARNING!

As Razvan Gavrilas pointed out in his post on Cognitive SEO from 2015, without doing the research above you risk pruning valuable content from search engine indexes. Be bold, but make highly informed decisions:

Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove.

The reporting phase

The content audit dashboard is exactly what we need internally: a spreadsheet crammed with data that can be sliced and diced in so many useful ways that we can always go back to it for more insight and ideas. Some clients appreciate that as well, but most are going to find the greater benefit in our final content audit report, which includes a high-level overview of our recommendations.

Counting actions from Column B

It is useful to count the quantity of each Action along with the total organic search traffic and/or revenue of the URLs in each group. This will help you (and the client) identify important metrics, such as total organic traffic for pages marked to be pruned. It will also make the final report much easier to build.
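In Excel, a pivot table or COUNTIF and SUMIF formulas on the Action column do this job. For a scripted version, here is a Python sketch that assumes each row carries hypothetical `Action`, `organic_sessions`, and `revenue` fields.

```python
from collections import defaultdict


def summarize_actions(rows):
    """Per Action: number of URLs plus summed organic sessions and revenue."""
    summary = defaultdict(lambda: {"urls": 0, "organic_sessions": 0, "revenue": 0.0})
    for row in rows:
        s = summary[row["Action"]]
        s["urls"] += 1
        s["organic_sessions"] += row.get("organic_sessions", 0)
        s["revenue"] += row.get("revenue", 0.0)
    return dict(summary)


rows = [
    {"Action": "Remove", "organic_sessions": 5, "revenue": 0.0},
    {"Action": "Remove", "organic_sessions": 0, "revenue": 0.0},
    {"Action": "Improve", "organic_sessions": 300, "revenue": 120.5},
    {"Action": "Keep", "organic_sessions": 900, "revenue": 410.0},
]
for action, totals in summarize_actions(rows).items():
    print(action, totals)
```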

Step 5: Writing up the report

Your analysis and recommendations should be delivered at the same time as the audit dashboard. The report summarizes the findings, recommendations, and next steps from the audit, and should start with an executive summary.

Here is a real example of an executive summary from one of Inflow's content audit strategies:

As a result of our comprehensive content audit, we are recommending the following, which will be covered in more detail below:

Removal of about 624 pages from Google index by deletion or consolidation:

203 Pages were marked for Removal with a 404 error (no redirect needed)
110 Pages were marked for Removal with a 301 redirect to another page
311 Pages were marked for Consolidation of content into other pages
- Followed by a redirect to the page into which they were consolidated

Rewriting or improving of 668 pages

605 Product Pages are to be rewritten due to use of manufacturer product descriptions (duplicate content), these being prioritized from first to last within the Content Audit.
63 "Other" pages to be rewritten due to low-quality or duplicate content.

Keeping 226 pages as-is

No rewriting or improvements needed

These changes reflect an immediate need to "improve or remove" content in order to avoid an obvious content-based penalty from Google (e.g. Panda) due to thin, low-quality and duplicate content, especially concerning Representative and Dealers pages with some added risk from Style pages.

The content strategy should end with recommended next steps, including action items for the consultant and the client. Below is a real example from one of our documents.

We recommend the following three projects in order of their urgency and/or potential ROI for the site:

Project 1: Remove or consolidate all pages marked as “Remove”. Detailed instructions for each URL can be found in the "Details" column of the Content Audit Dashboard.

Project 2: Copywriting to improve/rewrite content on Style pages. Ensure unique, robust content and proper keyword targeting.

Project 3: Improve/rewrite all remaining pages marked as “Improve” in the Content Audit Dashboard. Detailed instructions for each URL can be found in the "Details" column.

Content audit resources & further reading

Understanding Mobile-First Indexing and the Long-Term Impact on SEO by Cindy Krum
This thought-provoking post raises the question: How will we perform content inventories without URLs? It helps to know Google is dealing with the exact same problem on a much, much larger scale.

Here is a spreadsheet template to help you calculate revenue and traffic changes before and after updating content.

Expanding the Horizons of eCommerce Content Strategy by Dan Kern of Inflow
An epic post about content strategies for eCommerce businesses, which includes several good examples of content on different types of pages targeted toward various stages in the buying cycle.

The Content Inventory is Your Friend by Kristina Halvorson on BrainTraffic
Praise for the life-changing powers of a good content audit inventory.

Everything You Need to Perform Content Audits

Source: Moz Blog

Wednesday, March 22, 2017
