💡 How To Generate SEO Data For Testing [Part 1]


Use Data Or Be Used By Data!


The May 29 issue of Seotistics is here for you!

Today we talk about practicing with data, something many of you keep asking me about.

But Marco, what if I don't have any data? Or even worse, "my client won't give me access".

This is quite normal in practice and can pose a threat to your SEO work.

After all, we can't really do SEO without data. And that's why we are going to create fake data.

P.S. This is only Part 1 of this fascinating topic. Next week you'll get to read Part 2!

Please move this email to your Primary inbox. This is to prevent Seotistics from going into spam by accident. Gmail users can read this tutorial to do it.

🔑 Key Concepts

Let's see some basic concepts:

  • Synthetic data: data that don't exist in reality. You generate them to emulate actual data (but they are still fake).
  • API: what allows you to access the features, functions or data of a piece of software. This isn't the complete definition but let's stick with it.
  • Float number: decimal number, e.g. 6.7895
  • Integer number: whole number (positive or negative), e.g. 2 or -2

In SEO we don't have much literature or public datasets to work with.

Once again, you don't need to test on actual datasets, you can generate them yourself!

Of course, you also want to have real datasets if possible but reality isn't really a fairy tale.

🧮 Actionable SEO Tip - Do It Your Way

OK, you don't have any website or access to data...

Well, you still need real data to do actual work, but if you only need to test something, I have a solution.

It's possible to generate synthetic data with code.

But first, let's see what you can get without any 1st party access:

  • Keyword data/Competitor data (Semrush/Ahrefs)
  • Crawl data (Screaming Frog/Sitebulb)

Not much, and I don't recommend relying on Semrush/Ahrefs if you have to make important decisions. ❌

As said before, you may want to test your ideas on Google Search Console/Analytics data instead.

What if you are practicing in your free time and can't use any client data?

Well, you generate them! Let's see a quick example.

I want a dataset with the following columns:

  • query: a string containing one or more words
  • page: a string with a specific format
  • date: date format, any is fine
  • clicks: integer number (you can't have 1.5 clicks, either 1 or 2)
  • impressions: integer number (as above)
  • ctr: clicks/impressions, so you get a float number
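
The columns above can be sketched as a small record type. This is a minimal Python sketch, not the R code from the Colab notebook: the class name `GscRow`, the example URL and the date are placeholders I made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class GscRow:
    """One row of the fake dataset (hypothetical name, not an official type)."""
    query: str        # one or more words
    page: str         # URL string with a fixed format
    date: date        # any date format is fine, here a date object
    clicks: int       # whole number: you can't have 1.5 clicks
    impressions: int  # whole number as well

    @property
    def ctr(self) -> float:
        # ctr = clicks / impressions, so the result is a float
        # (0.0 when there are no impressions, to avoid dividing by zero)
        return self.clicks / self.impressions if self.impressions else 0.0


# Placeholder values, invented for this example
row = GscRow("seo data testing", "https://example.com/blog/seo-data/",
             date(2023, 5, 29), clicks=30, impressions=600)
print(row.ctr)
```

Keeping ctr as a derived value rather than a stored one means it can never disagree with clicks and impressions.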

We have some requirements in place but that's not all!

We know that a page can be tied to multiple queries and dates... with different clicks and impressions.

Page A can rank for query x on a given day and get 50 clicks. The next day (for the same query) you may get 30 clicks.

This fact must be taken into account when generating data.
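
One way to respect this is to create one row per page, query and date, and draw new clicks and impressions for each row. The sketch below is in Python with the standard library only, and every page, query, date and number range in it is an invented placeholder. Uniform random numbers are used only to keep it short, so don't expect realistic values.

```python
import random
from datetime import date, timedelta

random.seed(42)  # fixed seed so the fake data is reproducible

# Placeholder values, invented for illustration only
pages = ["https://example.com/page-a/", "https://example.com/page-b/"]
queries = ["query x", "query y", "query z"]
start = date(2023, 5, 1)
days = [start + timedelta(days=i) for i in range(7)]

rows = []
for page in pages:
    for query in queries:
        for day in days:
            # New numbers for every page/query/date combination
            impressions = random.randint(50, 1000)
            # Cap clicks at 10% of impressions so they never exceed them
            clicks = random.randint(0, impressions // 10)
            rows.append({
                "query": query,
                "page": page,
                "date": day.isoformat(),
                "clicks": clicks,
                "impressions": impressions,
                "ctr": clicks / impressions,
            })

print(len(rows))  # 2 pages x 3 queries x 7 days = 42 rows
```

The same page and query now show up on several dates with different clicks, just like Page A and query x in the example above.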

⚠️ I don't use position except when I need to create pivot tables to count queries over time (check issue #1).

Generating such a metric is a little bit harder but we will see how in the next issues ;)

💡 Using Coding To Solve The Issue

Coding comes to our rescue once more, just as many of you were about to give up.

We have an idea of how we want our data to look, so now we should generate it.

What I've just said can be translated into code you can reuse when needed.

For this specific use case, I prefer R over Python because it makes more sense to me but anything goes.

The hardest part is generating queries that actually make sense when read, so for this you need to do some hard work.

You can create a list of words you want to combine to create different fictional combinations.
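
A minimal sketch of the word-list idea, written in Python rather than the R used in the Colab notebook. The three lists are hypothetical, so swap in terms from your own niche.

```python
from itertools import product

# Hypothetical word lists, invented for illustration
modifiers = ["best", "cheap", "how to choose"]
products = ["running shoes", "hiking boots"]
suffixes = ["", "for beginners", "online"]  # "" means "no suffix"

# Every combination of modifier + product + suffix, skipping empty parts
queries = [
    " ".join(part for part in combo if part)
    for combo in product(modifiers, products, suffixes)
]

print(len(queries))  # 3 x 2 x 3 = 18 combinations
print(queries[:3])
```

Splitting queries into slots (modifier, product, suffix) is what keeps the combinations readable. Fully random word pairs would rarely make sense.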

🔗 Google Colab Link With R Code

If you have doubts, reply to this email. In any case, don't worry, Part 2 will make a lot of things clearer.

💡 The SEO Insights

I have just shown you that there is no excuse to delay learning or testing with data.

You can prepare some scripts or analyses in advance to persuade clients to give you data access.

I think you shouldn't even accept the work in such cases but, you know, sometimes a little persuasion is all it takes.

This practical exercise is also a good way to understand how data may be generated in practice.

Understanding how metrics work and what could be realistic values for them is a must for troubleshooting.

❓ Are there other methods?

Yes, I have only shown you the tip of the iceberg.

Marketing literature has many examples of generating synthetic data and it's not as straightforward as you imagine.

✅ Probability distributions are the best way to think about metrics... and this is consistent with how many professionals generate data.
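
To give a rough idea of what thinking in distributions can look like, here is a small Python sketch using only the standard library. The choice of distributions (log-normal for impressions, Beta for the click probability, one yes/no trial per impression for clicks) and all the parameters are my assumptions for illustration, not an established recipe.

```python
import random

random.seed(7)  # fixed seed so the fake data is reproducible


def fake_day() -> dict:
    # Impressions: skewed, many small days and a few big ones
    # (log-normal with these parameters is an assumption)
    impressions = max(1, round(random.lognormvariate(5, 1)))
    # Underlying click probability between 0 and 1 (Beta is an assumption)
    p_click = random.betavariate(2, 30)
    # Clicks: one yes/no trial per impression, i.e. a binomial draw
    clicks = sum(random.random() < p_click for _ in range(impressions))
    return {"clicks": clicks, "impressions": impressions,
            "ctr": clicks / impressions}


sample = [fake_day() for _ in range(30)]
print(sample[0])
```

Because clicks are drawn from the impressions, they stay whole numbers and never exceed the impressions they come from.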

But again, this topic will be covered in the next issues!

P.S. Thanks for reading! I recommend you check the resources because I mention one great library to generate fake data!

🧵 My Selection Of Twitter Threads

A quick recap for those who haven't read them all or need a refresher:

👥 Launching a Community (Join The Waitlist)

Some friends and I have decided to launch our own SEO community. It won't be about Analytics only, as we will cover everything about SEO.

For sure, we will preserve the focus on data skills because that's the future of SEO!

🔎 Analytics For SEO Ebook (v2)

This ebook is aimed at SEOs or Business Owners who want to explore the combination of SEO and Analytics.

It will teach you or your employees to:

👉 Avoid common pitfalls that cost you money 💸

👉 Create meaningful analyses that add value 💯

👉 Shorten the learning time of Analytics ⏳

This comes with monthly updates because I want to create the Ultimate Guide out there.

The April update includes the following new information:

✅ Categorize Pages

✅ More on Content Audits

✅ Handling Large Files

v3 (coming out in a few days) will feature:

  • Quick And Simple Way Of Detecting Keyword Cannibalization
  • Statistical Inference And Statistics (Update)
  • Update For Use Cases 2 and 5
  • Going Deeper With Analysis (Google Analytics, Screaming Frog, etc.)
  • R Approach To Some Problems

📚 Recommended Reads

This week there are some peak recommendations you don't want to sleep on:

The first 2 reads were recommended by Benjamin Crane and honestly... they're peak quality!

❗️ Feedback and Recommendations

If you have ideas/recommendations for the next issues of Seotistics, you can simply reply to this email.

Marco Giordano
SEO Specialist & Data Analyst

