The easiest way to find fresh sources of proxies is to filter Google's results to the last 24 hours. This will pull up sites that have recently updated a page or created a new one. We want sites that are constantly updating their lists, not old static blogs from 2015. These are Google-passed proxies intended for web scraping tools, such as GSA Search Engine Ranker, SEnuke, and Scrapebox. The process is the same for finding transparent, anonymous, and elite proxies as well.
When using the 24-hour result filter, the number of pages in the results is very limited, so you should use broad search terms. Below are some examples of search phrases you can enter into Google. Most of these search operators are similar enough to use in Bing, Yahoo, and other search engines as well.
24hr Result Filter Search phrases:
- socks5 list
- free proxy list
- ssl proxy list
- free proxy
- socks4 list
- Socks proxy free
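If you would rather open these searches from a script than click the filter each time, the sketch below builds the search URLs for a few of the phrases above. It assumes the `tbs=qdr:d` query parameter, which is commonly used to limit Google results to the past day but is not an officially documented interface, so treat it as something that may change. The function name is made up for this example.

```python
from urllib.parse import urlencode


def last_24h_search_url(phrase: str) -> str:
    """Build a Google search URL limited to roughly the past 24 hours."""
    params = {
        "q": phrase,
        # Assumption: "qdr:d" is the commonly used past-day filter value.
        "tbs": "qdr:d",
    }
    return "https://www.google.com/search?" + urlencode(params)


# A few of the 24hr filter phrases from the list above.
phrases = ["socks5 list", "free proxy list", "ssl proxy list"]
for phrase in phrases:
    print(last_24h_search_url(phrase))
```

Paste the printed URLs into a browser to check that the past 24 hours filter is actually applied before relying on it.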
No filter Broad Search Phrases:
- socks4 OR socks5 + proxy OR proxies + list OR download OR .txt OR daily
- daily OR fresh + proxies OR proxy + list
- blogspot + socks OR socks4 OR socks5 + Proxy OR proxies
- SSL + proxies OR proxy + list OR server
- Proxy OR proxies + 24 + Free OR list OR download OR daily
- socks-5 OR socks-4 + list OR download OR free OR fresh
- Anonymous Proxies + list
- */*/2017 socks4 OR socks5 + proxy OR proxies + list
List of Common Proxy Ports:
If you want to pull a larger selection of results or scrape for URLs, the broad search phrases above should return around 1 million results each. This is for uncovering harder-to-find proxy server sites.
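If your scraper wants one plain keyword per line instead of OR operators, the broad phrases can be expanded into every combination. This is only a sketch: it assumes " + " separates the groups and " OR " separates the alternatives inside a group, which is how the phrases above are written. The helper name is hypothetical.

```python
from itertools import product


def expand_phrase(pattern: str) -> list[str]:
    """Expand 'a OR b + c OR d' into ['a c', 'a d', 'b c', 'b d']."""
    # Each " + " separated group holds one or more " OR " alternatives.
    groups = [
        [alt.strip() for alt in group.split(" OR ")]
        for group in pattern.split(" + ")
    ]
    # Take one alternative from every group, in every possible combination.
    return [" ".join(combo) for combo in product(*groups)]


variants = expand_phrase("daily OR fresh + proxies OR proxy + list")
for keyword in variants:
    print(keyword)
```

Longer phrases multiply quickly (three groups of four alternatives give 64 keywords), so it may be worth trimming the alternatives first.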
Blog "Label" Footprints:
Fast Proxy Server
Free Proxy Server List
High Anonymous Proxies
New Proxy Server List
Proxy Judge List
Proxy Server List
Google Proxy Servers
Fresh New Proxies
Fresh US Proxies
New Fresh Proxies
US Proxy Servers
Agario Bots Socks 5
Fast Socks 5 Servers
US Socks 5 Servers
Socks 5 List
Socks Proxy List
Socks 4/5 Uncommon Ports
Socks 5 Scanned
Socks 5 Yahoo Voice
US Socks Proxy List
VIP Socks 5
Proxy Scrapers and Testers
Sites to crawl for Socks4 & socks5 proxies:
(Sites with high Google-pass results, from json’s personal list in Scrapebox)
Extracting Proxies from Inner pages Using Scrapebox:
If you are using Scrapebox with a source that creates new pages for its daily proxy updates, you will need to find a footprint within the URLs. Almost all of these sources share one common footprint.
Examples: proxy-list, list.txt, ssl-list, free-ssl, proxylist.txt, etc.
If the source includes multiple lists that use different footprints, such as one list for anonymous proxies and one for SSL proxies, separate the footprints with the | character.
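To get a feel for what a footprint string will catch before you enter it, here is a minimal sketch of the idea: keep a URL if it contains any of the |-separated footprints. This is not Scrapebox's actual code. The case-insensitive matching is an assumption, and the URLs are made-up examples.

```python
def matches_footprint(url: str, footprints: str) -> bool:
    """Return True if the URL contains any of the |-separated footprints."""
    wanted = [fp.strip().lower() for fp in footprints.split("|") if fp.strip()]
    # Assumption: plain substring matching, ignoring case.
    return any(fp in url.lower() for fp in wanted)


# Hypothetical URLs, for illustration only.
urls = [
    "http://example.com/2017/01/anonymous-proxy-list.html",
    "http://example.com/2017/01/free-ssl-proxies.html",
    "http://example.com/about.html",
]

kept = [u for u in urls if matches_footprint(u, "proxy-list|free-ssl")]
for u in kept:
    print(u)
```

Running your candidate footprints against a handful of real post URLs from the source this way shows quickly whether they are too broad or too narrow.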
Consider a blog-style free proxy list. You will want to look at the post URLs and find a common footprint that they share. In this example, it would be one of the following. Any of these combinations would work.
URL must include:
I would recommend going with a semi-targeted footprint such as proxy-list.
Broader footprints such as Socks and list_ might pull in unwanted results.
Socks-proxy-list is a bit too specific and might filter out URLs we do want.
After you enter the footprints, you will want to test them.
Scrapebox will pop up a Test Proxy source menu. It should pull up a list of “links received from page”.
Double-click on a link to make sure it loaded the proxies correctly.
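If you want to double-check a page outside of Scrapebox, the sketch below pulls ip:port pairs out of raw page text. It is a rough approximation and not how Scrapebox parses pages. The sample text uses reserved documentation IP addresses, and the function name is made up.

```python
import re

# Matches dotted IPv4 addresses followed by ":" and a port number.
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def extract_proxies(text: str) -> list[str]:
    """Return unique, plausible ip:port strings in order of appearance."""
    found = []
    for ip, port in PROXY_RE.findall(text):
        octets_ok = all(0 <= int(part) <= 255 for part in ip.split("."))
        port_ok = 1 <= int(port) <= 65535
        proxy = f"{ip}:{port}"
        if octets_ok and port_ok and proxy not in found:
            found.append(proxy)
    return found


# Made-up sample text; the last entry is deliberately invalid.
sample = "Fresh list: 192.0.2.10:1080 and 198.51.100.7:8080, bad 999.1.1.1:80"
print(extract_proxies(sample))
```

If a page that looks full of proxies returns nothing here, the list is probably rendered by a script or split across table cells, and the source may not load cleanly in a scraper either.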
After you have gathered your list of sources to harvest proxies from, go ahead and test the results.
Run the complete test, and open up classify sources.
This will bring up a chart where you can see the results from the sources, showing you the percentage of Google-passed and anonymous-passed proxies. This way you can delete low-reward sites to save time when scraping for fresh proxies on another day.
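The same pruning can be sketched in a few lines if you keep your own notes on each source. Everything below is hypothetical: the class, the 5% cutoff, and the numbers are made up for illustration and are not exported from Scrapebox.

```python
from dataclasses import dataclass


@dataclass
class SourceResult:
    url: str
    tested: int         # proxies tested from this source
    google_passed: int  # how many of them passed the Google test

    @property
    def pass_rate(self) -> float:
        """Percentage of tested proxies that passed Google."""
        if self.tested == 0:
            return 0.0
        return 100.0 * self.google_passed / self.tested


def keep_sources(results, min_rate=5.0):
    """Return the URLs of sources at or above the chosen pass rate."""
    return [r.url for r in results if r.pass_rate >= min_rate]


# Made-up figures, for illustration only.
results = [
    SourceResult("http://example.com/a", tested=200, google_passed=30),
    SourceResult("http://example.com/b", tested=500, google_passed=5),
    SourceResult("http://example.com/c", tested=0, google_passed=0),
]
print(keep_sources(results))
```

Pick the cutoff to suit your own needs. A source with a low pass rate but thousands of proxies can still be worth keeping.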