Last edited by 和vvv on 2017-6-25 20:00
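28 Using a User-Agent Pool and an IP Proxy Pool Together
We have already seen how to build and use a User-Agent pool and an IP proxy pool on their own. Now we combine them. The effect is roughly that of crawling pages from someone else's IP address with someone else's browser, which disguises our own machine quite well.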
Straight to the code:

    import random
    import re
    import urllib.request

    # IP proxy pool: replace these proxy IPs with ones that actually work for you
    # (entries are typically written in host:port form).
    def ip():
        ippools = [
            "122.226.168.180",
            "61.191.41.130",
            "115.231.175.68",
        ]
        thisip = random.choice(ippools)
        print(thisip)
        return urllib.request.ProxyHandler({"http": thisip})

    # User-Agent pool: the more User-Agent strings the better (but every one must be valid)
    def ua():
        uapools = [
            "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0",
            "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Maxthon/3.0)",
            "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)",
        ]
        thisua = random.choice(uapools)
        print(thisua)
        return ("User-Agent", thisua)

    # Crawl the site through a random proxy IP with a random User-Agent
    for i in range(0, 30):
        try:
            # Build ONE opener that carries both the proxy and the User-Agent.
            # install_opener() replaces the global opener, so installing a second
            # opener just for the User-Agent would silently discard the proxy.
            opener = urllib.request.build_opener(ip(), urllib.request.HTTPHandler)
            opener.addheaders = [ua()]
            urllib.request.install_opener(opener)
            data = urllib.request.urlopen("http://www.baidu.com").read().decode("utf-8")
            # Extract the page title
            pat = '<title>(.*?)</title>'
            rst = re.compile(pat).findall(data)
            print(rst[0])
        except Exception as err:
            print(err)
The proxy IPs were still usable at the time of writing, so the results look fine:

    61.191.41.130
    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Maxthon/3.0)
    百度一下,你就知道
    122.226.168.180
    Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0
    百度一下,你就知道
    122.226.168.180
    Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0
    百度一下,你就知道
    122.226.168.180
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)
    百度一下,你就知道
    61.191.41.130
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)
    百度一下,你就知道
    115.231.175.68
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)
    百度一下,你就知道
    115.231.175.68
    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Maxthon/3.0)
    百度一下,你就知道
    115.231.175.68
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)
    百度一下,你就知道
    122.226.168.180
    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Maxthon/3.0)
    百度一下,你就知道
    115.231.175.68
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)
    百度一下,你就知道
    122.226.168.180
    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Maxthon/3.0)
    百度一下,你就知道
    115.231.175.68
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)
    百度一下,你就知道
    115.231.175.68
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)
    百度一下,你就知道
    122.226.168.180
    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Maxthon/3.0)
    百度一下,你就知道
    61.191.41.130
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)
    百度一下,你就知道
    61.191.41.130
    Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0
    百度一下,你就知道
    122.226.168.180
    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Maxthon/3.0)
    百度一下,你就知道
    122.226.168.180
    Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0
    百度一下,你就知道
    122.226.168.180
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)
    百度一下,你就知道
    122.226.168.180
    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Maxthon/3.0)
    百度一下,你就知道
    115.231.175.68
    Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0
    百度一下,你就知道
    61.191.41.130
    Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0
    百度一下,你就知道
    61.191.41.130
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)
    百度一下,你就知道
    61.191.41.130
    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Maxthon/3.0)
    百度一下,你就知道
    115.231.175.68
    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)
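The printed lines only show which proxy and User-Agent were picked, not what the opener will actually send. Below is a minimal sketch of an offline check that inspects an opener before any request is made. The helper names `build_opener_for` and `describe`, and the placeholder address `203.0.113.5:8080`, are made up for illustration and are not part of the original post.

```python
import urllib.request


def build_opener_for(proxy, user_agent):
    """Return one opener that carries both the HTTP proxy and the User-Agent."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def describe(opener):
    """Report the proxies and default headers attached to an opener (no network I/O)."""
    proxies = {}
    for h in opener.handlers:
        if isinstance(h, urllib.request.ProxyHandler):
            proxies.update(h.proxies)
    return proxies, dict(opener.addheaders)


if __name__ == "__main__":
    # Placeholder values, assumed for illustration only.
    opener = build_opener_for("203.0.113.5:8080", "Mozilla/5.0 (example)")
    print(describe(opener))
```

If `describe` comes back with an empty proxy dict, the opener is not routing through any explicitly configured proxy, whatever was printed when the IP was chosen.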
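If you plan to use a crawler for real work, User-Agent rotation and IP proxies are pretty much indispensable.
If you found this useful, feel free to take a look at my collection (http://bbs.fishc.com/forum.php?mod=collection&action=view&ctid=742&fromop=my) or leave a rating.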
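In real use, free proxies tend to go stale, so it may help to drop a proxy from the pool as soon as it fails and retry with another one. The following is only a sketch under that assumption: `fetch_with_rotation`, `urllib_fetch`, and the `max_tries` and `timeout` parameters are hypothetical names chosen here, not something from the original post.

```python
import random
import urllib.request


def urllib_fetch(url, proxy, user_agent, timeout=10):
    """Fetch url through one proxy with one User-Agent, using a private opener."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy}))
    opener.addheaders = [("User-Agent", user_agent)]
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_with_rotation(url, proxies, user_agents, fetch=urllib_fetch, max_tries=5):
    """Try random (proxy, User-Agent) pairs; drop a proxy from the pool when it fails."""
    pool = list(proxies)  # work on a copy so the caller's list is left untouched
    for _ in range(max_tries):
        if not pool:
            break
        proxy = random.choice(pool)
        user_agent = random.choice(user_agents)
        try:
            return fetch(url, proxy, user_agent)
        except OSError:
            # urllib.error.URLError and socket timeouts are both OSError subclasses.
            pool.remove(proxy)
    raise RuntimeError("no working proxy found")
```

A call might look like `fetch_with_rotation("http://www.baidu.com", ippools, uapools)`, with the two pool lists taken from the code above. Passing `fetch` as a parameter keeps the retry logic testable without touching the network.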