Difficulty Level: 3/5
Forums are a great way to get targeted traffic, backlinks, and topic ideas, and to build awareness of your brand. Most niches have an endless number of forums, and it’s really hard to stay on top of them all, let alone find the topics and threads you are knowledgeable in and can contribute to. You will be amazed at the value you can get out of them just by giving genuine advice and help. Don’t ask for anything or push anything on people; just be a cool person and help out.
In a nutshell, we are collecting a list of URLs that GScraper scrapes from the Bing/Google search results for our keywords. It then filters out any site whose source code doesn’t contain one of our custom footprints. Lastly, it removes all the URLs that don’t have a very specific keyword within the HTML (the topic we know a lot about).
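If it helps to see the filtering stages as code, here is a minimal Python sketch of the idea. It is not how GScraper works internally; the function name, the sample pages, and the footprint string are all made up for illustration, and it assumes the page HTML has already been downloaded.

```python
def filter_forum_urls(pages, footprints, topic_keyword):
    """Keep URLs whose HTML has a forum footprint AND the topic keyword.

    pages: dict mapping URL -> HTML source (hypothetical, already downloaded).
    """
    kept = []
    for url, html in pages.items():
        source = html.lower()
        # Footprint filter: the page must contain at least one footprint.
        if not any(fp.lower() in source for fp in footprints):
            continue
        # Keyword filter: the page must mention the topic we know about.
        if topic_keyword.lower() not in source:
            continue
        kept.append(url)
    return kept


# Hypothetical sample data standing in for scraped search results.
sample_pages = {
    "http://example.com/showthread.php?t=1": "<p>Powered by vBulletin</p> WordPress theme help",
    "http://example.org/blog/post": "<p>WordPress tips</p>",
    "http://example.net/forum/topic/2": "<p>Powered by vBulletin</p> cooking",
}

print(filter_forum_urls(sample_pages, ["Powered by vBulletin"], "wordpress"))
```

Only the first URL survives: the second has no footprint, and the third is a forum about the wrong topic.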
Step 1: Data Collection
Load up a file of private proxies, or find a decent list of free ones online that gets updated. If you are planning on doing a lot of scraping with GScraper/ScrapeBox, you might think about buying some semi-private or dedicated proxies. My personal favorite type is called backconnect, or reverse backconnect: it’s an IP address that connects to a server that constantly changes its address to a “residential-looking” IP. This way you have a never-ending, rotating list of fresh proxies to scrape with, without having to update or change anything in the software. It’s not 100% needed, though; 10 semi-private proxies will get a lot of work done for you. You will find this setup under the GScraper Proxy menu tab.
Finding A Free List of Proxies
If you search Google for “txt proxy list” or “inurl:proxy txt”, you should find some page URLs that end with txt. This is important because the list needs to be in that format to work with GScraper. Enter the URL into the Get proxies from own address text box and click Import. If the proxy provider updates their list often during the day, or you are scraping or hosting the proxies yourself, you can select a time frame for GScraper to refresh the page. Each refresh clears all proxies and replaces them with the updated list.
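If you host your own list, it is worth sanity-checking the txt file before pointing GScraper at it. The sketch below assumes the common one-proxy-per-line `host:port` layout (that layout is my assumption, not something GScraper documents here), and `parse_proxy_list` is just a name I picked.

```python
def parse_proxy_list(text):
    """Parse plain text with one host:port entry per line (assumed layout)."""
    proxies = []
    seen = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        host, sep, port = line.rpartition(":")
        # Skip anything that is not host:port with a numeric port.
        if not sep or not host or not port.isdecimal():
            continue
        entry = (host, int(port))
        # Skip out-of-range ports and duplicates.
        if not 0 < entry[1] < 65536 or entry in seen:
            continue
        seen.add(entry)
        proxies.append(entry)
    return proxies


sample_list = "1.2.3.4:8080\n\n# comment\nbad line\n1.2.3.4:8080\n5.6.7.8:3128\n"
print(parse_proxy_list(sample_list))  # [('1.2.3.4', 8080), ('5.6.7.8', 3128)]
```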
Finding A Footprint List
We are looking for forums, and most people are not going to pay for or build a custom-coded forum. They are going to use a web application like bbPress or vBulletin. The easiest way to get a footprint list is to just Google for it; the scraping/SEO community normally shares a lot of good info. I googled “Forum Footprint list” and found this thread on BlackHatWorld.
Text Mechanic is definitely bookmark-worthy; you will use it a lot in many of the scraping/linking processes. Their Find and Replace tool is awesome for cleaning up random URL/keyword lists like this one, since we don’t need the ? in the phrases.
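If you would rather do the same cleanup in a script, a few lines of Python cover it. This is only a sketch of the find-and-replace step: it strips the ? from each phrase, trims whitespace, and drops blanks and duplicates. The function name and the sample footprints are hypothetical.

```python
def clean_footprints(lines):
    """Remove '?' from each footprint, trim it, and drop blanks/duplicates."""
    cleaned = []
    for line in lines:
        phrase = line.replace("?", "").strip()
        if phrase and phrase not in cleaned:
            cleaned.append(phrase)
    return cleaned


raw_footprints = [
    '"Powered by vBulletin"?',
    '  "Powered by phpBB"? ',
    "",
    '"Powered by vBulletin"?',
]
print(clean_footprints(raw_footprints))
```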
Creating A Laser Targeted Topic Keyword List
Rhymezone: I ended up finding this keyword tool to be more helpful for finding forums. We are really looking for very targeted one-word phrases that give us a high chance of niche-related results, and it did the best job at that. Long-tail keywords would likely pull in a lot of false positives.
Step 2: GScraper Configuration
The main thing to remember is to set a txt file location for the list of URLs we are scraping.
Timeout: The amount of time the software waits for the page to load before moving on. 15-30 seconds is normally fine for most people.
Results: If you are looking for recent forum threads to reply to, ones that will likely get a response back, go ahead and set it to pull results from the last 30 days. If you are just looking for any forums related to your niche, any large time frame will work.
Threads: This really comes down to the proxies you are using and how many you have. Dedicated or semi-private proxies are normally limited to 10 threads per proxy, so if you have 10 private proxies, run 100 threads. If you are using a random list from online or scraping fresh ones with another program, 20-50 threads should be fine for most people.
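The thread math for private proxies is simple enough to jot down as a helper. This is just the rule of thumb above (proxies times threads per proxy) written in Python; the default of 10 comes from the typical limit mentioned, and your provider’s limit may differ.

```python
def suggested_threads(proxy_count, threads_per_proxy=10):
    """Rule-of-thumb thread count for dedicated or semi-private proxies."""
    if proxy_count < 0 or threads_per_proxy < 0:
        raise ValueError("counts must not be negative")
    return proxy_count * threads_per_proxy


print(suggested_threads(10))  # 100
```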
Import Keywords and Footprints
Import Footprints – Manually or upload a txt file.
Import Keywords – Manually or upload a txt file.
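To get a feel for how many searches the two lists add up to, you can pair them up yourself. The sketch below assumes every footprint gets combined with every keyword, which is how I understand scrapers of this kind to build their queries; the function name and sample lists are made up.

```python
from itertools import product


def build_queries(footprints, keywords):
    """Pair every footprint with every keyword (assumed scraper behavior)."""
    return [f"{fp} {kw}" for fp, kw in product(footprints, keywords)]


sample_footprints = ['"Powered by vBulletin"', "inurl:showthread"]
sample_keywords = ["wordpress", "css"]

for query in build_queries(sample_footprints, sample_keywords):
    print(query)
```

Two footprints and two keywords already make four searches, so long lists multiply fast. That is worth keeping in mind when you pick a thread count.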
Step 3: Harvesting URLs with GScraper
Start Scrape: click it and let the magic happen. If you are running a large list, you will get a prompt asking if you want to show URLs while scraping; click No to save CPU/memory.
Once GScraper is finished harvesting, you may not see any URLs populated in the GScraper URL list. If so, you will need to use List Import to load the txt file you set up before you started the scraping operation. These are the results I got with forum footprints and WebDev-related keywords. I ended up with a lot of random URLs that were not 100% related, but we can filter the list down further.
Trial and Error: Finding Keywords for Filtering
Under the Filter tab you have multiple options for removing a URL if it doesn’t meet certain criteria. We want forums we can post in that are asking or talking about WordPress-related topics. This way we can most likely respond, help people with their questions, and build up a reputation in that group.
Remove if HTML doesn’t include
These two keywords, along with the forum footprints and main keywords, did end up helping filter the list down to more of what I was looking for. I think digging a bit more into forums and finding better footprints would help. As you will see, the “showthread” keyword was a good one for finding forum posts in the exported URL list.
Cleaning up the Results
Since we are looking for internal links to threads, we can remove all URLs that are just a root domain.
Select Remove URL if only Root and click the DO button.
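If you ever need to do the same cleanup outside GScraper, here is a rough Python equivalent. I am assuming “only root” means a URL with no path beyond / and no query string; GScraper’s exact rule may differ, and the function names are my own.

```python
from urllib.parse import urlsplit


def is_root_url(url):
    """True when the URL has no path beyond '/' and no query string."""
    parts = urlsplit(url)
    return parts.path in ("", "/") and not parts.query


def remove_root_urls(urls):
    """Keep only URLs that point deeper than the site root."""
    return [url for url in urls if not is_root_url(url)]


sample_urls = [
    "http://example.com",
    "http://example.com/",
    "http://example.com/showthread.php?t=42",
    "http://example.com/?page=2",
]
print(remove_root_urls(sample_urls))
```

The first two URLs are dropped as bare roots; the thread URL and the one with a query string are kept.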
List Export – CSV
Make sure you save the file as a CSV so we can open it in Microsoft Excel or OpenOffice Calc.
(Yes, DayZ is awesome )
Manually Reviewing the Results
This is the part we talked about earlier: digging more into filtering the scrape. The list is pretty decent right now; we just have to dig a bit. Something along the lines of inurl:showthread might be a good one to test. In the exported list of URLs, I found this one with a low number of nofollow links. Generally, if the number of nofollows is low, there is a chance the page has dofollows. If you see a ton of nofollow links, it’s not a bad thing; the thread still might be worth replying to if it’s getting traffic/views.
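If you want to check the nofollow count on a page yourself, Python’s built-in HTML parser is enough for a rough tally. This sketch assumes you already have the page’s HTML as a string; `LinkCounter` and `count_links` are hypothetical names, and it simply counts `<a href>` tags and how many of them carry `nofollow` in their `rel` attribute.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Count <a href> links and how many are marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.nofollow = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Ignore anchors without an href (e.g. named anchors).
        if "href" not in attr_map:
            return
        self.total += 1
        # rel can hold several space-separated values, in any letter case.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow += 1


def count_links(html):
    """Return (total links, nofollow links) for an HTML string."""
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return counter.total, counter.nofollow


sample_html = (
    '<a href="/a">x</a>'
    '<a href="/b" rel="nofollow">y</a>'
    '<a rel="NoFollow ugc" href="/c">z</a>'
    '<a name="anchor">n</a>'
)
print(count_links(sample_html))  # (3, 2)
```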
Step 4: Leave a Reply
When we take a look at this page, you can see it’s a recently posted thread related to website design/marketing. Without looking into their backlink profile, and going just on their Domain Authority in Moz, I’d guess this is a good site to register with and leave a helpful response/link on. Tip: If you take a look at the forum’s source code, you can dig for more footprints/filters that other WebDev forums would share. Then you can keep refining the results to get only what you want; it will likely take a few attempts.