💡 How To Generate SEO Data For Testing [Part 1]


Use Data Or Be Used By Data!

The May 29 issue of Seotistics is here for you!

Today we talk about practicing with data, something many of you keep asking me about.

But Marco, what if I don't have any data? Or even worse, "my client won't give me access".

This is quite normal in practice and can pose a threat to your SEO work.

After all, we can't really do SEO without data. And that's why we are going to create fake data.

P.S. This is only Part 1 of this fascinating topic. Next week you'll get to read Part 2!


🔑 Key Concepts

Let's see some basic concepts:

  • Synthetic data: data that don't come from real measurements. You generate them to emulate actual data (but they are still fake).
  • API: an interface that lets you access the features, functions or data of a piece of software. This isn't the complete definition, but let's stick with it.
  • Float number: a number with a decimal part, e.g. 6.7895
  • Integer number: whole number (positive or negative), e.g. 2 or -2

In SEO we don't have much literature or public datasets to work with.

Once again, you don't need to test on actual datasets: you can generate them yourself!

Of course, you also want real datasets whenever possible, but reality isn't a fairy tale.

🧮 Actionable SEO Tip - Do It Your Way

OK, you don't have any website or access to data...

Well, you will still need real data for actual work, but if you only need to test something, I have a solution.

It's possible to generate synthetic data with code.

But first, let's see what you can get without any 1st party access:

  • Keyword data/Competitor data (Semrush/Ahrefs)
  • Crawl data (Screaming Frog/Sitebulb)

Not much, and I don't recommend relying on Semrush/Ahrefs if you have to make important decisions. ❌

As said before, you may want to test your ideas on Google Search Console/Analytics data instead.

What if you are practicing in your free time and can't use any client data?

Well, you generate them! Let's see a quick example.

I want a dataset with the following columns:

  • query: a string containing one or more words
  • page: a string with a specific format
  • date: date format, any is fine
  • clicks: integer number (you can't have 1.5 clicks, either 1 or 2)
  • impressions: integer number (as above)
  • ctr: clicks/impressions, so you get a float number
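
The column list above can be sketched as a small record type. This is a minimal Python sketch, not the R code from this issue. The class name `SyntheticRow`, the example query and the example.com URL are all made up for illustration. The only rule taken from the list is that ctr equals clicks divided by impressions.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass
class SyntheticRow:
    """One fake row of search data (hypothetical name, not from any library)."""
    query: str        # one or more words
    page: str         # URL in a fixed format
    date: dt.date     # any date format works, a date object is used here
    clicks: int       # whole number
    impressions: int  # whole number

    @property
    def ctr(self) -> float:
        # ctr = clicks / impressions; return 0.0 when there are no impressions
        return self.clicks / self.impressions if self.impressions else 0.0


row = SyntheticRow(
    query="best running shoes",
    page="https://example.com/blog/running-shoes/",
    date=dt.date(2024, 1, 1),
    clicks=50,
    impressions=1000,
)
print(row.ctr)  # 0.05
```

Deriving ctr from the two counts, instead of generating it separately, keeps the three metrics consistent with each other.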

We have some requirements in place but that's not all!

We know that a page can be tied to multiple queries and dates... with different clicks and impressions.

Page A can rank for query x on a given day and get 50 clicks. The next day, for the same query, it may get only 30 clicks.

This fact must be taken into account when generating data.
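
Here is one way to respect that requirement, as a rough Python sketch under my own assumptions: every page gets a random subset of queries, and every page/query pair gets one row per day with fresh clicks and impressions. The page URLs, the query labels, the 7-day window and the 30% cap on CTR are arbitrary choices for illustration, and the uniform random numbers are not meant to look realistic.

```python
import random
from datetime import date, timedelta

rng = random.Random(42)  # fixed seed so the fake data is reproducible

# Hypothetical pages and queries, only for illustration
pages = ["https://example.com/page-a/", "https://example.com/page-b/"]
queries = ["query x", "query y", "query z"]
start = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range(7)]

rows = []
for page in pages:
    # Each page is tied to a random subset of the queries
    page_queries = rng.sample(queries, k=rng.randint(1, len(queries)))
    for query in page_queries:
        for day in days:
            # New values every day, so the same page/query pair varies over time
            impressions = rng.randint(1, 1000)
            clicks = rng.randint(0, int(impressions * 0.3))  # assumed CTR cap of 30%
            rows.append({
                "query": query,
                "page": page,
                "date": day.isoformat(),
                "clicks": clicks,
                "impressions": impressions,
                "ctr": clicks / impressions,
            })

print(len(rows), rows[0])
```

Drawing clicks only after impressions are known guarantees that a row never has more clicks than impressions.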

⚠️ I don't use position except when I need to create pivot tables to count queries over time (check issue #1).

Generating such a metric is a little bit harder but we will see how in the next issues ;)

💡 Using Coding To Solve The Issue

Coding comes to our rescue once more, just as many of you were about to give up.

We have an idea of what our data should look like, so now we should generate it.

What I've just said can be translated into code you can reuse when needed.

For this specific use case, I prefer R over Python because it makes more sense to me, but anything goes.

The hardest part is generating queries that actually make sense when read, so for this you need to do some hard work.

You can create lists of words and combine them into different fictional queries.
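
The word-combination idea can be sketched in a few lines. I use Python here instead of R, and the three word lists are invented examples, so swap in terms from your own niche. `itertools.product` builds every combination, and empty parts are skipped so that some queries come out shorter than others.

```python
import itertools
import random

# Hypothetical word lists, replace them with words from your own niche
modifiers = ["best", "cheap", "how to choose"]
products = ["running shoes", "hiking boots"]
suffixes = ["", "for beginners", "near me"]  # "" means no suffix

# Every combination of modifier + product + suffix, skipping empty parts
queries = [
    " ".join(part for part in combo if part)
    for combo in itertools.product(modifiers, products, suffixes)
]

rng = random.Random(0)  # fixed seed for reproducibility
sample = rng.sample(queries, k=5)  # pick a few queries to use in the dataset

print(len(queries))  # 18
print(sample)
```

The hard work is in curating the lists: the code only combines them, so queries make sense only if every word list reads well next to the others.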

🔗 Google Colab Link With R Code

If you have doubts, reply to this email. In any case, don't worry, Part 2 will make a lot of things clearer.

💡 The SEO Insights

I have just shown you that you have no excuse to delay learning or testing with data.

You can prepare some scripts or analyses in advance to persuade clients to give you data access.

I think you shouldn't even accept the job in such cases, but you know, sometimes a little persuasion is all you need.

This practical exercise is also a good way to understand how data may be generated in practice.

Understanding how metrics work and what could be realistic values for them is a must for troubleshooting.
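
As a small illustration of what "realistic values" can mean, here is a Python sketch of a sanity check for a single row. The function name `find_problems` and the three rules are my own assumptions, not an official validation: counts can't be negative, clicks can't exceed impressions, and ctr must match clicks / impressions.

```python
def find_problems(row):
    """Return the reasons why a row looks unrealistic (empty list if none)."""
    problems = []
    if row["impressions"] < 0 or row["clicks"] < 0:
        problems.append("negative count")
    if row["clicks"] > row["impressions"]:
        problems.append("more clicks than impressions")
    if row["impressions"] > 0:
        expected = row["clicks"] / row["impressions"]
        # Small tolerance because ctr is a float
        if abs(row["ctr"] - expected) > 1e-9:
            problems.append("ctr does not match clicks / impressions")
    return problems


good = {"clicks": 30, "impressions": 600, "ctr": 0.05}
bad = {"clicks": 80, "impressions": 50, "ctr": 0.2}

print(find_problems(good))  # []
print(find_problems(bad))
```

The same checks work on fake data and on a real export, which makes them handy when a report shows numbers that look off.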

❓ Are there other methods?

Yes, I have only shown you the tip of the iceberg.

Marketing literature has many examples of generating synthetic data and it's not as straightforward as you imagine.

✅ Probability distributions are the best way to think about metrics... and this is consistent with how many professionals generate data.
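
To give a first taste of the idea, here is a rough Python sketch using only the standard library. The choice of distributions is my assumption for illustration, not a recommendation from the literature: impressions come from a log-normal (a few rows get most of the volume), the click probability comes from a Beta(2, 30), which averages about 6%, and clicks are a binomial draw simulated as one yes/no trial per impression.

```python
import random

rng = random.Random(7)  # fixed seed for reproducibility


def fake_metrics(rng):
    """Draw one set of metrics from the assumed distributions."""
    # Impressions: right-skewed, always at least 1 (assumed log-normal)
    impressions = max(1, round(rng.lognormvariate(4.0, 1.0)))
    # Click probability: mostly small values (assumed Beta(2, 30))
    click_prob = rng.betavariate(2, 30)
    # Clicks: one Bernoulli trial per impression, i.e. a binomial draw
    clicks = sum(rng.random() < click_prob for _ in range(impressions))
    return {
        "impressions": impressions,
        "clicks": clicks,
        "ctr": clicks / impressions,
    }


rows = [fake_metrics(rng) for _ in range(1000)]
average_ctr = sum(r["ctr"] for r in rows) / len(rows)
print(rows[0], round(average_ctr, 3))
```

Compared with plain uniform random numbers, this produces many low-traffic rows and a few large ones, which is closer to what a real export tends to look like.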

But again, this topic will be covered in the next issues!

P.S. Thanks for reading! I recommend you check the resources because I mention one great library to generate fake data!

🧵 My Selection Of Twitter Threads

A quick recap for those who haven't read them all or need a refresher:

👥 Launching a Community (Join The Waitlist)

Some friends and I have decided to launch our own SEO community. It won't be about Analytics only, as we will cover everything about SEO.

For sure, we will preserve the focus on data skills because that's the future of SEO!

🔎 Analytics For SEO Ebook (v2)

This ebook is aimed at SEOs or Business Owners who want to explore the combination of SEO and Analytics.

It will teach you or your employees to:

👉 Avoid common pitfalls that cost you money 💸

👉 Create meaningful analyses that add value 💯

👉 Shorten the learning time of Analytics ⏳

This comes with monthly updates because I want to create the most complete guide out there.

The April update includes the following new information:

✅ Categorize Pages

✅ More on Content Audits

✅ Handling Large Files

v3 (coming out in a few days) will feature:

  • Quick And Simple Way Of Detecting Keyword Cannibalization
  • Statistical Inference And Statistics (Update)
  • Update For Use Cases 2 and 5
  • Going Deeper With Analysis (Google Analytics, Screaming Frog, etc.)
  • R Approach To Some Problems

📚 Recommended Reads

This week there are some peak recommendations you don't want to sleep on:

The first 2 reads were recommended by Benjamin Crane and honestly... they're peak quality!

❗️ Feedback and Recommendations

If you have ideas/recommendations for the next issues of Seotistics, you can simply reply to this email.

Marco Giordano
SEO Specialist & Data Analyst


Seotistics - Web Analytics + Business + Strategy

The Seotistics newsletter is written by Marco Giordano, a Data/Web Analyst with the goal of combining business and web data. Tired of the usual boring Analytics content without any business impact? Seotistics teaches you how to use Analytics, web data and even content in your workflow while helping you with Strategy.
