Friday, October 7, 2016

Analyzing free proxies from public websites


Recently,  I've been scraping free proxies from public websites like hidemyass and incloak.

I added another source to the scraper to pull proxies from my-proxy.com and stored the proxies to my local MySQL database.

Here's the table structure:

+------------------+--------------+------+-----+-------------------+----------------+
| Field            | Type         | Null | Key | Default           | Extra          |
+------------------+--------------+------+-----+-------------------+----------------+
| id               | int(11)      | NO   | PRI | NULL              | auto_increment |
| scrape_timestamp | datetime     | YES  |     | CURRENT_TIMESTAMP |                |
| last_update      | varchar(255) | YES  |     | NULL              |                |
| ip_address       | varchar(255) | YES  |     | NULL              |                |
| port             | varchar(255) | YES  |     | NULL              |                |
| country          | varchar(255) | YES  |     | NULL              |                |
| speed            | int(11)      | YES  |     | NULL              |                |
| connection_time  | int(11)      | YES  |     | NULL              |                |
| type             | varchar(255) | YES  |     | NULL              |                |
| anonymity        | varchar(255) | YES  |     | NULL              |                |
| source           | varchar(255) | YES  |     | NULL              |                |
| tested           | tinyint(1)   | YES  |     | NULL              |                |
| verified         | tinyint(1)   | YES  |     | NULL              |                |
| used             | tinyint(1)   | YES  |     | NULL              |                |
| date_used        | datetime     | YES  |     | NULL              |                |
+------------------+--------------+------+-----+-------------------+----------------+

Here's the data gathered after using whatismyipaddress.com for verifying the list.

SELECT t1.source,
       t2.total,
       t1.verified,
       (SELECT Truncate(( t1.verified / t2.total * 100 ) + 0.06, 0)) AS
       percent_working
FROM   (SELECT source,
               Count(*) AS verified
        FROM   proxies
        WHERE  verified = 1
        GROUP  BY source) AS t1
       LEFT JOIN (SELECT source,
                         Count(*) AS total
                  FROM   proxies
                  GROUP  BY source) AS t2
              ON t1.source = t2.source; 



+-----------+-------+----------+-----------------+
| source    | total | verified | percent_working |
+-----------+-------+----------+-----------------+
| hidemyass |   309 |       74 |              24 |
| incloak   |   436 |       78 |              17 |
| my-proxy  |   900 |       55 |               6 |
+-----------+-------+----------+-----------------+

Graph generated here: https://jsfiddle.net/L3vnuyx6/


my-proxy has the highest number of free proxies. It also has the lowest number of verified working proxies.

hidemyass has the highest percentage (verified / total)

incloak has the highest number of verified proxies in terms of count.

No comments:

Post a Comment