Original poster |
Posted on 2017-9-29 20:59:09
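My problem is as I described earlier: the code raises an error at runtime. The error message is: TypeError: can't use a string pattern on a bytes-like object
I would like to know what is wrong and how to fix it. Is that clear enough? I would sincerely appreciate any help.
My script imports a page-download module I wrote earlier, download_url.py. Its code is attached below.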
- #!/usr/bin/env python
- # -*- coding:utf-8 -*-
- from urllib import request, error
- import chardet
- # Fetch the content of a web page
- def download_url(url, num_retries=2):
-     print('Downloading page:', url)
-     try:
-         # Set the user agent
-         headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
-                                  'Chrome/52.0.2743.116 Safari/537.36 Edge/15.16193'}
-         req = request.Request(url, headers=headers)
-         html = request.urlopen(req).read()
-         # Detect the page encoding
-         charset = chardet.detect(html)['encoding']
-         if charset == 'utf-8':
-             html = html.decode('utf-8')
-         elif charset == 'gbk' or charset == 'gb2312' or charset == 'GB2312':
-             html = html.decode('GB18030')
-     except error.URLError as err:
-         print('Download error', err)
-         html = None
-         # On a 5xx (server-side) error, retry the download up to two more times
-         if num_retries > 0:
-             if hasattr(err, 'code') and 500 <= err.code < 600:
-                 return download_url(url, num_retries-1)
-     return html
- if __name__ == '__main__':
-     url = 'http://example.com'  # placeholder URL; replace with the page to download
-     download_url(url)
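One possible cause, offered as a guess because the calling script is not shown: if chardet reports anything other than 'utf-8', 'gbk', 'gb2312' or 'GB2312' (for example 'ascii', or None when detection fails), neither branch runs and download_url returns the raw bytes. Applying a str regular expression to that value would then raise this TypeError. The sketch below reproduces the error and shows one way to always return text. The helper name to_text, the 'utf-8' fallback, and the sample bytes are assumptions made for illustration; they are not part of the original code.

```python
import re


def to_text(raw, encoding=None, fallback='utf-8'):
    """Return str for any detector result (hypothetical helper)."""
    # The detector may report None or a name the if/elif chain does not
    # cover; fall back to a default instead of leaving the data as bytes.
    name = (encoding or fallback).lower()
    # GB18030 is a superset of GBK and GB2312, so decode both with it.
    if name in ('gbk', 'gb2312'):
        name = 'gb18030'
    try:
        return raw.decode(name, errors='replace')
    except LookupError:
        # Python does not know this codec name; use the fallback.
        return raw.decode(fallback, errors='replace')


# Made-up sample standing in for the downloaded page.
raw = b'<a href="/index.html">home</a>'

# A str pattern applied to bytes reproduces the reported TypeError.
try:
    re.findall(r'href="(.*?)"', raw)
    message = ''
except TypeError as exc:
    message = str(exc)

# After decoding, the same pattern works.
links = re.findall(r'href="(.*?)"', to_text(raw, 'ascii'))
print(message)
print(links)
```

In download_url, the if/elif chain could then be replaced by something like html = to_text(html, charset). Passing errors='replace' trades strictness for robustness: undecodable bytes become U+FFFD instead of raising, which may or may not suit a given crawler.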