โŒ Don't Do This SEO Auditing Mistake


Use Data Or Be Used By Data!


The May 15 issue of Seotistics is here for you!

It's been a tough week after Google I/O but don't worry, I've got your weekly dose of Analytics!

This time we talk about a very common error I see online. Apparently, not many people address it.

I am talking about cleaning your data, the most important activity out there!

Without it, you end up with a bunch of noisy insights.

Please move this email to your Primary inbox. This prevents Seotistics from going into spam by accident. Gmail users can read this tutorial to do it.

🔑 Key Analytics Concepts

Let's start with some basic definitions:

  • Data cleaning: removing corrupted, irrelevant, or duplicate data. It can also involve fixing incorrectly formatted data, which is often the case.
  • Outliers: data points that lie at an abnormal distance from the others. Imagine a 7-foot person in a dataset where everyone else is below 5 feet.
  • Missing Data: you don't have a value for a certain variable, e.g. a page doesn't have any clicks data (missing).
  • Data Transformation: converting data from one format to another, e.g. from a number to a class.
  • Data Verification: checking that everything is actually correct. This will save you many headaches.
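To make the outlier definition above concrete, here is a minimal sketch in Python. It uses the common 1.5 × IQR rule of thumb, which is my choice for illustration and only one of several ways to flag outliers. The heights are made up to mirror the 7-foot example.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up heights in feet: everyone is below 5 except one 7-foot person.
heights = [4.5, 4.6, 4.7, 4.8, 4.8, 4.9, 4.9, 7.0]
outliers = iqr_outliers(heights)
print(outliers)
```

Whether you drop, cap, or keep a flagged value depends on the analysis. The rule only tells you where to look.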

Many non-technical people take a template online and smash their data into it.

You should always clean your data first!

🧮 Actionable SEO Tip - Spring Cleaning

This issue can't be complete without an actionable SEO tip for you!

Since this topic is by nature practical, let's see what I recommend you filter OUT.

After you pull your data, always start by cleaning it. This is what you may want to exclude:

  • Pages that are not supposed to rank (About Us, Privacy Policy, etc.)
  • Pagination, because it's only noise
  • Foreign queries, like Japanese queries on a US website
  • Non-HTML files (CSS, JS, PDF)
  • Filler pages (/blog/, /tag/, /category/)

Every website has a different URL structure but some common rules apply, depending on the CMS.

โŒ Don't assume that every WordPress website will have the same paths, though.

โœ… Prepare filters in advance and keep a list with what you can filter out. This will save you a lot of time!
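As a rough sketch of such a reusable filter list, the Python snippet below applies the exclusions listed above to a handful of made-up rows. Every pattern, path, and row here is an assumption for illustration (for instance, `/page/3/` as the pagination format), so adapt them to your own URL structure. The query check is a crude heuristic: it only flags non-Latin scripts, so a Spanish query on a US site would pass.

```python
import re
import unicodedata
from urllib.parse import urlsplit

# Hypothetical exclusion rules; adapt them to your own site.
EXCLUDE_PATTERNS = [
    r"^/(about-us|privacy-policy)/?$",  # pages not supposed to rank
    r"/page/\d+/?$",                    # pagination (assumed path format)
    r"\.(css|js|pdf)$",                 # non-HTML files
    r"^/blog/?$",                       # filler: blog index
    r"^/(tag|category)/",               # filler: tag and category archives
]
EXCLUDE_RE = re.compile("|".join(f"(?:{p})" for p in EXCLUDE_PATTERNS), re.IGNORECASE)


def is_non_latin_query(query):
    """Crude check: True if any letter in the query is not from the Latin script."""
    return any(
        ch.isalpha() and "LATIN" not in unicodedata.name(ch, "")
        for ch in query
    )


def keep_row(row):
    """Keep a row only if its URL path and its query pass every filter."""
    path = urlsplit(row["page"]).path
    return not EXCLUDE_RE.search(path) and not is_non_latin_query(row["query"])


# Made-up rows in the shape of a page + query export.
rows = [
    {"page": "https://example.com/blog/seo-audit-guide/", "query": "seo audit", "clicks": 12},
    {"page": "https://example.com/privacy-policy/", "query": "example privacy", "clicks": 0},
    {"page": "https://example.com/blog/page/3/", "query": "seo blog", "clicks": 0},
    {"page": "https://example.com/files/report.pdf", "query": "seo report pdf", "clicks": 1},
    {"page": "https://example.com/tag/analytics/", "query": "analytics", "clicks": 0},
    {"page": "https://example.com/blog/data-cleaning/", "query": "データ クリーニング", "clicks": 0},
    {"page": "https://example.com/blog/data-cleaning/", "query": "data cleaning seo", "clicks": 0},
]

clean_rows = [r for r in rows if keep_row(r)]
for r in clean_rows:
    print(r["page"], "|", r["query"], "|", r["clicks"])
```

Keeping the patterns in one list means you can review and extend them per website instead of rewriting the filter each time.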

💡 Why does this enrich our analyses?

By reducing noise, you surface the actual gold. Imagine you want to find all of your pages that get 0 clicks.

Without filtering, you are very likely to get a LOT of noise, such as pagination.

But that's not all: after cleaning, you should aim to transform your data.

This is where you can be creative and offer a unique perspective with your analysis.

โญ Transforming Data: The Next Step

After all of this talk about cleaning, you should also create new metrics.

Why? Because the metrics you already have aren't tailored to you.

I provide a great example under 📚 Recommended Reads/Resources.

The plot below is a very simple example that shows you how to visualize groups.

The "Page Groups" variable didn't exist before. It is created from Clicks and Impressions (Google Search Console).

Of course, in actual reports you may want to improve its aesthetics and be completely transparent about your classification.
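For illustration, a "Page Groups" style variable could be derived from Clicks and Impressions along these lines. This is a minimal Python sketch: the group names, the thresholds (`min_impressions`, `min_ctr`), and the sample pages are all hypothetical and are not the classification used in the plot.

```python
from collections import Counter


def page_group(clicks, impressions, min_impressions=100, min_ctr=0.02):
    """Assign a page to a group from its clicks and impressions (hypothetical rules)."""
    if impressions == 0:
        return "No visibility"
    if clicks == 0:
        return "Zero clicks"
    if impressions < min_impressions:
        return "Low visibility"
    ctr = clicks / impressions
    return "Performing" if ctr >= min_ctr else "Underperforming"


# Made-up (page, clicks, impressions) rows.
pages = [
    ("/a/", 0, 0),
    ("/b/", 0, 500),
    ("/c/", 3, 40),
    ("/d/", 50, 1000),
    ("/e/", 5, 1000),
]

groups = {page: page_group(clicks, impressions) for page, clicks, impressions in pages}
counts = Counter(groups.values())
for group, n in counts.items():
    print(group, n)
```

Writing the rules as one small function also makes it easy to state your classification openly in the report.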

🛑 What About Missing Data?

Some of you may have noticed I mentioned missing data and didn't cover it.

This topic is often oversimplified, and for bad reasons. After reading this section, you will have more tools to understand data.

There are 3 types of missing data and I will use Screaming Frog data as an example:

  1. Missing Completely at Random (MCAR): there is no logical reason why data is missing. Pure chance.
    E.g. technical problems, glitches.
  2. Missing At Random (MAR): there is a systematic relationship between the missing values and the observed data.
    So here the missing values exist due to another variable in your dataset.
    E.g. if no-indexed pages (Indexability column) are more likely to have an empty meta description, it would fall under MAR.
  3. Missing Not at Random (MNAR): the missingness is not random and is related to unobserved or unknown factors (not in your dataset), or to the missing values themselves.
    E.g. pages with more traffic are less likely to have meta descriptions. This assumes you didn't connect any API with traffic data, so traffic is not in your crawl.
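A quick way to look for the MAR pattern from the example above is to compare the share of missing meta descriptions across Indexability values. The Python sketch below does this on made-up crawl rows. The column names `Indexability` and `Meta Description 1` are assumptions about how your crawl export is labeled, so check them against your own file.

```python
from collections import defaultdict

# Made-up crawl rows; column names are assumptions about the export format.
crawl = [
    {"Address": "/a/", "Indexability": "Indexable", "Meta Description 1": "Guide to audits"},
    {"Address": "/b/", "Indexability": "Indexable", "Meta Description 1": "Data cleaning tips"},
    {"Address": "/c/", "Indexability": "Indexable", "Meta Description 1": ""},
    {"Address": "/d/", "Indexability": "Indexable", "Meta Description 1": "About outliers"},
    {"Address": "/e/", "Indexability": "Non-Indexable", "Meta Description 1": ""},
    {"Address": "/f/", "Indexability": "Non-Indexable", "Meta Description 1": ""},
    {"Address": "/g/", "Indexability": "Non-Indexable", "Meta Description 1": "Old landing page"},
    {"Address": "/h/", "Indexability": "Non-Indexable", "Meta Description 1": ""},
]


def missing_rate_by(rows, group_col, value_col):
    """Share of rows with an empty value_col, computed per value of group_col."""
    totals, missing = defaultdict(int), defaultdict(int)
    for row in rows:
        group = row[group_col]
        totals[group] += 1
        if not (row.get(value_col) or "").strip():
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


rates = missing_rate_by(crawl, "Indexability", "Meta Description 1")
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} missing")
```

A large gap between groups hints at MAR but does not prove it. MNAR cannot be checked this way at all, because the variable that explains the gaps is by definition not in your dataset.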

Fortunately, in SEO it's not that common to face such problems but...

If you dabble with customer data, this is something you must check.

N.B. This topic is quite advanced and hard to explain in one issue. We will come back to it later on!

🧵 My Selection Of Twitter Threads

A quick recap for those who haven't read them all or need a refresher:

👥 Launching a Community (Join The Waitlist)

Some friends and I have decided to launch our own SEO community. It won't be about Analytics only, as we will cover everything about SEO.

We will certainly keep the focus on data skills, because that's the future of SEO!

N.B. We have already reached our minimum threshold. You can still join, but the community will become a reality next week.

🔎 Analytics For SEO Ebook (v2)

This ebook is aimed at SEOs or Business Owners who want to explore the combination of SEO and Analytics.

It will teach you or your employees to:

👉 Avoid common pitfalls that cost you money 💸

👉 Create meaningful analyses that add value 💯

👉 Shorten the learning time of Analytics ⏳

This comes with monthly updates because I want to create the ultimate guide on the topic.

The April update included the following new information:

✅ Categorize Pages

✅ More on Content Audits

✅ Handling Large Files

v3 (20-23 May as an estimate) will feature:

  • Quick And Simple Way Of Detecting Keyword Cannibalization
  • Statistical Inference And Statistics (Update)
  • Update For Use Cases 2 and 5
  • Going Deeper With Analysis (Google Analytics, Screaming Frog, etc.)
  • R Approach To Some Problems

Sorry folks, I got sick, so the release has been postponed to the dates written above.

📚 Recommended Reads/Resources

These resources are extremely good. The Google Sheets template is a must-have to understand Data Transformation.

โ—๏ธ Feedback and Recommendations

If you have ideas/recommendations for the next issues of Seotistics, you can simply reply to this email.

Marco Giordano
SEO Specialist & Data Analyst


Seotistics - Web Analytics + Business + Strategy

The Seotistics newsletter is written by Marco Giordano, a Data/Web Analyst with the goal of combining business and web data. Tired of the usual boring Analytics content without any business impact? Seotistics teaches you how to use Analytics, web data and even content in your workflow while helping you with Strategy.
