How to Use GPT-3.5 Turbo For SERP Analysis to Identify Low-Competition Keywords at Scale

ChatGPT and generative AI have drastically changed the landscape of SEO. From AI content creation, to optimising title tags to increase CTR, to writing code that solves complex rendering and indexation problems, the SEO community has found some amazing use cases for generative AI across the spectrum.

However, one largely underexplored use case is GPT's potential for advanced types of SERP analysis that would typically require the trained eye of an SEO specialist, or the application of advanced pattern-matching techniques.


In today’s blog post we’ll look at how you can use a combination of Ahrefs SERP data, GPT for Work and the GPT-3.5 Turbo API to unearth extremely low-competition keywords, at scale and cost-effectively!

Using GPT-3.5 Turbo to Identify Irrelevant Results Ranking For Your Keywords at Scale

GPT-3.5 Turbo and ChatGPT (the 3.5 model) are actually adept at determining how relevant a ranking result is to the keyword in question, based on the title tag of the ranking result. And if you have been in SEO for any meaningful amount of time, you know the best keywords to target are the ones that have not been targeted at all.

For audiences that may be unfamiliar with the concept, it is worth taking the time to illustrate it with a real-world example. Essentially, the most strongly weighted ranking signal for Google is “relevance”:


Before Google assesses attributes like quality and backlinks, it looks at whether there are relevant results across the web. If there are no relevant results ranking for a query, you can rank for that keyword fairly easily by creating relevant content. This is typically how low-DR, low-authority websites are able to rank for keywords. To illustrate this, let’s look at a specific example using the keyword “3rd gen 4runner wheels”:





If we look at the results above, we can see that a DR 10 and a DR 11 website are ranking in the top two organic positions below the image pack, ahead of a DR 16 website with page-level links and a DR 46 website. Why is this? Simply because the other results are not relevant enough to the query: lower-authority websites with relevant pages outrank higher-authority websites with less relevant pages. The only real exceptions to this rule occur when a website has been demoted or promoted by Google’s Reviews System, or when a higher-DR website has been subjected to another type of penalty, whether manual or algorithmic.

In our example below, you can see that the title tags for the 3rd and 4th positions are “3rd gen 4Runner Exterior Accessories” and “3rd Gen 4Runner Aftermarket Wheel Thread Page 9” for the keyword "3rd gen 4runner wheels." If I were a user searching for 3rd gen 4Runner wheels, I would view both of these title tags as irrelevant to my query, and so does Google; it just has no other content to serve!



Typically, when a greater number of irrelevant results dominate the top 10 rankings, it indicates lower competition for the keyword in question. The traditional approach to identifying these kinds of keywords is to check whether the keyword appears in the titles of the top 10 ranking results, or to use the “allintitle:” approach.

More advanced versions of this concept use things like regex to account for synonyms. Though these pattern-matching methods work to an extent, they have some inherent limitations (a short sketch after this list shows one failure mode):
  • When using pattern matching it becomes difficult to account for synonyms and natural language within large datasets.

  • Pattern matching doesn’t allow you to account for results that might be relevant to a user’s query despite not using the keyword in the title tag.

  • Pattern matching doesn’t “understand” context.
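
To make that concrete, here is a minimal sketch of a naive allintitle-style check in Python. The function name and the normalisation rules are illustrative assumptions, not a tool used later in this post:

import re

def keyword_in_title(keyword: str, title: str) -> bool:
    # Naive allintitle-style check: does every word of the keyword
    # appear in the title (case-insensitive, whole-word match)?
    title = title.lower()
    return all(
        re.search(r"\b" + re.escape(word) + r"\b", title)
        for word in keyword.lower().split()
    )

# A relevant result the pattern check wrongly rejects: "1996-2002"
# implies "3rd gen" to a human, but no literal words match.
print(keyword_in_title("3rd gen 4runner wheels",
                       "1996-2002 4Runner Aftermarket Wheels"))  # False
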
Impressively, GPT-3.5 Turbo demonstrates real proficiency at judging the relevance of a title tag to a keyword while taking the overall context of the query into account. For example, below you can see that ChatGPT knows the title tag “3rd Gen 4Runner Exterior Accessories” is not a relevant title tag for the query “3rd Gen 4Runner Wheels”:





In addition, GPT-3.5 Turbo does a good job of understanding context, something that pattern matching, and even the most advanced custom regular expressions, fail to accomplish.


In our example below, we see that ChatGPT recognises the second ranking result, with the title tag “1996-2002 4Runner Aftermarket Wheels”, as relevant, because 3rd gen 4Runner wheels are wheels made for 4Runners produced between 1996 and 2002:




When we change the years in the prompt, ChatGPT no longer views the title tag as being relevant:

This type of rich contextual understanding gives GPT an unprecedented ability to identify irrelevant results ranking for any given keyword.


So, what is a user-friendly way to use GPT-3.5 to identify keywords with irrelevant results ranking, at scale? Though there are various, and perhaps technically sounder, ways to do this, we'll walk through a practical, easy-to-understand example to demonstrate the concept in practice.


First, to get started you will need to register for an Ahrefs account (a Standard plan or above is required to export SERP data). You will also need to sign up for OpenAI’s API and generate an OpenAI API key. Finally, you will need to install the GPT for Sheets and Docs add-on from GPT for Work.
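
If you want to sanity-check your API key (and the one-word labelling idea) before wiring up the spreadsheet, here is a minimal Python sketch using the openai package available at the time of writing; the keyword and title are placeholders from our example, not required setup:

import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # paste the key you generated

# The same one-word relevance check we'll later run from Google Sheets.
prompt = (
    'Is this title tag relevant in relation to the keyword in question? '
    'Consider context, synonyms and natural language. '
    'Your output should simply be "yes" or "no". '
    'Keyword: 3rd gen 4runner wheels '
    'Title: 3rd Gen 4Runner Exterior Accessories'
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=1,   # we only want a one-word answer
    temperature=0,  # keep the labelling deterministic
)

print(response["choices"][0]["message"]["content"])  # expected: "No"
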


Step 1: Create a Google Sheet


To get started, you’ll want to create a Google Sheet; for our example we’ll call it “4Runner Wheels.” After creating your sheet, add a keyword column and a column to report back the number of irrelevant results for each keyword. You’ll also want to create an additional sheet (tab) to store your SERP data:




Step 2: Export Your Ahrefs SERP Data For Your Seed Keyword(s)


After typing in your seed keyword(s), export the associated SERP data from the Matching terms report:



Ensure that you include the top positions from the SERP in your export:





After exporting your data, upload the export directly into Google Sheets. Once your data is uploaded, filter for organic listings only to keep the data clean (the data risks getting a bit muddied if you include other result types):



After filtering by organic, you can then copy your data into the SERP Data sheet:



Step 3: Craft Your Prompt For Your SERP Analysis


After importing your SERP data, create an additional prompt column that will let you ask GPT whether each title tag is relevant to the keyword in question. Keep your prompt succinct and limit the answer to a single-word output of “Yes” or “No”: OpenAI’s API usage is billed by tokens, which scale with the word count of your input (the prompt) and output (what the model returns), so more words means higher costs in terms of both money and computational resources.

In our example we’ll be using the following prompt: “Is this title tag relevant in relation to the keyword in question? Consider context, synonyms and natural language. Your output should simply be “yes” or “no”. Keyword: Title:”

You’ll then want to fill the keyword and title fields in the prompt using the values found in your exported Ahrefs SERP data. To do this, you can pull the values directly from the SERP Data sheet. In our example, the prompt formula for each keyword and title is as follows (note the doubled quotation marks, which is how Google Sheets escapes quotes inside a string):

="Is this title tag relevant in relation to the keyword in question? Consider context, synonyms and natural language. Your output should simply be ""yes"" or ""no"". Keyword: "&A2&" Title: "&R2

After crafting the logic for your prompt, you can now launch GPT for Sheets and Docs under Extensions by clicking Extensions > GPT for Sheets and Docs > Launch, and enable the functions:



Step 4: Label Your SERP Data Using the GPT Function and Your Prompt


For this example we’ll simply add a column called “Relevant?” and use the GPT function made available by GPT for Work to label each title tag:
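
Assuming the prompt formula from Step 3 lives in column C next to your SERP data (adjust the reference to wherever you built it), each cell in the “Relevant?” column is simply:

=GPT(C2)

The GPT function sends the prompt text to the OpenAI API and writes the model’s one-word answer into the cell.
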




Next, you can simply drag the fill handle down if you’re working with a smaller dataset:




After that, the GPT-3.5 Turbo API will have analysed every title tag ranking in the top organic results in relation to its keyword. Once we have this categorical data, it becomes pretty easy to translate it into numerical data to identify low-competition keywords.

Step 5: Use a COUNTIF Function to Count the Number of Irrelevant Title Tags For Each Keyword

Now, in order to count the number of irrelevant title tags, all we have to do is use a COUNTIF function:
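
As a sketch, assuming your keywords sit in column A of the main sheet, and the SERP Data tab stores each result’s keyword in column A and the GPT label in column S (adjust these references to your actual layout), a COUNTIFS formula counting the “No” labels per keyword would be:

=COUNTIFS('SERP Data'!A:A, A2, 'SERP Data'!S:S, "No")
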


After that, simply copy the formula down and voilà! You have the number of irrelevant pages ranking for each keyword in your dataset!

Step 6: Sort Keywords in Descending Order to See the Keywords With the Lowest Levels of Competition

Once all of the keywords are labelled, you can either apply filtering conditions or just sort the range in descending order to identify the keywords with the highest number of irrelevant results ranking:


The Results of The Process

Even with a small dataset of only 82 keywords we unearthed some gems! Typically, anything with two or more irrelevant results ranking warrants attention. What’s great about this particular approach is that it identifies keywords DR filtering will not pick up. For a lot of these keywords, high-DR websites actually rank, but it is clear that anyone could rank for the keyword because the ranking results are irrelevant and accompanied by other signs of low competition. DR filters would otherwise miss these keywords because high-DR websites rank in the top 5-10 with outwardly “authoritative” pages; in reality, those pages are irrelevant to the actual query.
 
With that said let’s check out our results below:

Great Collection Pages For Low-competition Keywords

Right away, within our data, we can see patterns for both “bronze 4Runner wheels” and “black 4Runner wheels.” With this directional data, we can then drill into the SERP data a little deeper to see if these keywords are worth targeting.


When we investigate the SERP further, we can see a DR 16 website ranking in the top 5 that targets only “4Runner wheels.” This is a low-DR website AND it does not target “black 4Runner wheels”:


 
We also notice other signs of low competition, like an irrelevant Pinterest link ranking with the title “Black 4Runner with trod wheels":





This means we could simply create a collection page for “black 4Runner wheels” and more than likely rank in the top 5! For “bronze 4Runner wheels” we likewise see signs of low competition, with a DR 1 website ranking in the top 11 and a DR 15 website with a non-targeted title tag ranking 6th:



There you have it! We've identified two great low-competition keywords that can serve as the basis for creating compelling collection pages or engaging blog posts!

Awesome Brand Pages For Low-competition Keywords

In our data we also see a lot of branded searches occurring specific to 4Runner wheels:


When we look at the SERP for "relation race wheels 4runner" we see that the competition is pretty low, with a Pinterest post ranking 3rd with the title tag “Pin on 4Runner Stuff” and a Facebook post ranking 11th:


Additionally, for “Method 4Runner race wheels" we observe lower-DR websites ranking that are not well optimised for this specific topic.


These are both easy brand collection pages, and both present an excellent opportunity to create content and capture traffic.

Summary of Results

There you have it! From just a small dataset of 82 keywords, we were able to identify low-competition keywords for posts and pages that can easily rank in the top 5! You can imagine what the results might look like with much larger datasets.

The type of filtering we employed above also gives you an advantage over techniques such as DR filtering, which would likely miss these kinds of low-competition keywords because high-DR websites outrank with irrelevant results.

Advice and Caveats

Though this type of analysis is definitely helpful, there are some things to keep in mind:
  • If you decide to use GPT for Sheets and Docs, you will only be able to label about 500 rows of data per minute; beyond that you’ll receive an error saying you surpassed your rate limit, so don’t try to label more than 500 rows at a time.

  • If you are working with a dataset of more than 10,000 rows of SERP data and don’t want to get “too codey,” you’ll want to make use of a macro. Otherwise, the labelling process is time-consuming and monotonous.

  • With a single API account you can label about 30,000 to 40,000 rows of SERP data per hour; if you have a really large dataset, take this into consideration.

  • When selecting a model, pick GPT-3.5 Turbo, which is the most cost-effective and fastest option.
  • Avoid trying to include multiple title tags in a single prompt, as it decreases accuracy and, counter-intuitively, costs more tokens.

  • Remember that it costs about $5.00 per 100,000 rows of SERP data you analyse (roughly equal to about 10,000 keywords, at ten ranking results each).

  • Frequently replace formulas with values, or you might lose all of your labels: the GPT formulas will re-run if you leave and come back to the Google Sheet.

  • GPT-3.5 is fairly generous in rewarding relevance to a title tag, so when it does not reward relevance, it’s a really strong indication of a low-competition SERP. Take it seriously!

Working With Larger Datasets (10,000+ Keywords)

The example above is intended to be beginner-friendly and is intentionally crude. If you are working with larger datasets, do not attempt to follow the steps above, as it will take A LONG time and is not at all practical.
Depending on your situation and skill set, there are a variety of ways to apply this concept to larger datasets, but here are a few things to keep in mind:
  • You’ll hit per-minute rate limits fairly quickly while running this type of analysis, so you’ll want to consider using exponential backoff (see the sketch after this list). You might also want to check out OpenAI’s Python notebook, which discusses rate limits and how to deal with them while using Python, and you can request a rate limit increase using the OpenAI API Rate Limit Increase Request Form.
  • For anything beyond 30,000 keywords, the associated SERP data becomes pretty large and you’ll hit Google Sheets’ 10-million-cell limit quickly, so I wouldn’t recommend Google Sheets for any dataset containing more than 30,000 keywords and 500,000 rows of associated SERP data.
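
For illustration, here is a minimal sketch of that exponential backoff pattern using the pre-1.0 openai Python package (the function name, retry count, and delays are my own assumptions, not taken from OpenAI’s notebook):

import random
import time

import openai

openai.api_key = "YOUR_OPENAI_API_KEY"

def label_with_backoff(prompt: str, max_retries: int = 6) -> str:
    """Call the chat API, backing off exponentially when rate limited."""
    delay = 1.0
    for _ in range(max_retries):
        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
                max_tokens=1,    # one-word "yes"/"no" labels only
                temperature=0,   # deterministic labelling
            )
            return response["choices"][0]["message"]["content"]
        except openai.error.RateLimitError:
            # Sleep, then double the delay, adding jitter so parallel
            # workers don't all retry in lockstep.
            time.sleep(delay + random.random())
            delay *= 2
    raise RuntimeError(f"Still rate limited after {max_retries} retries")
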

Using GPT-3.5 Turbo to Identify Forum Links Ranking For Any Given Keyword

LLMs thrive on natural dialogue data, and as such GPT-3.5 has an incredible working knowledge of forum websites, which are rich with dialogue between humans. This makes GPT-3.5 Turbo well suited to identifying the number of forum links ranking for any given keyword, with a VERY high degree of accuracy. The higher the number of forum links ranking, the lower the competition for the keyword tends to be. Why is this? It’s because user-generated content does not target keywords. Users are having conversations about what’s important to them; they are not creating threads to try to rank on Google. So when a forum link ranks, it is ranking for a keyword it does not target, which indicates low competition for that keyword.

Going back to our earlier example, we see two forum links ranking:



Neither of these forum threads is particularly targeted or relevant, indicating that the SERP is filled with non-targeted results and that it will be easy to rank with a more relevant page or resource.

We also see other signs of low competition, such as low-DR websites and completely irrelevant title tags like “3rd gen 4runner accessories” ranking as well. Similar to identifying the number of irrelevant results ranking, we can use the same method to identify the number of forum links ranking for each keyword. Let’s start by adding a new column to our sheet called “# of forum links”.
 
Now, what you need to do is craft a new prompt that asks GPT whether the link is a forum link or from a conversation thread:
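
As a sketch, assuming the ranking URL sits in column Q of your SERP Data tab (adjust to your export’s actual URL column), the prompt formula might look like:

="Is this URL a forum link or from a conversation thread? Your output should simply be yes or no. URL: "&Q2
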



You can then simply drag the fill handle down to check each URL in your SERP data export:

Next, similar to before, we can use a COUNTIF-style formula that counts how many forum links rank for each keyword:
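
Again as a sketch, with the same assumed layout as before and the forum labels in column T of the SERP Data tab, count the “Yes” answers for each keyword:

=COUNTIFS('SERP Data'!A:A, A2, 'SERP Data'!T:T, "Yes")
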



After that, simply drag the formula down and you will have the number of forum links ranking for each keyword! Then sort in descending order to see the keywords with the highest number of forum links ranking:


There ya have it! You now know how many forum links are ranking for each keyword. Typically, if three or more forum links rank in the top 10, it is likely a pretty low-competition keyword you can go after. Another thing to keep an eye out for is BOTH forum links and irrelevant results ranking in the organic results; if that is the case, it is almost a guarantee that the keyword is low-competition and you can rank for it!

Conclusion

In today's post we looked at two unique ways to use GPT-3.5 Turbo to identify low-competition keywords: counting the number of irrelevant results and counting the number of forum links ranking for any given keyword we want to analyse.

This type of analysis is in its infancy, and it will be interesting to see the accuracy improvements that come with future models like GPT-4 (the GPT-4 API is still in beta and expensive as hell) and fine-tuned models designed specifically for this purpose.
 
The examples we discussed were intentionally designed to be beginner-friendly. With that said, I would LOVE to hear from the community about how you might apply this method to larger datasets and how you would go about it!



