'UCS-2' codec can't encode character......: how to fix it

°蓝鲤歌蓝 · posted 2018-1-1 20:23:11


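I was bored today and decided to scrape Douban. The album at https://www.douban.com/photos/album/1649942160/ is a black-and-white comic that looked interesting, so I wanted to download its images.
That's when I hit this error: 'UCS-2' codec can't encode characters in position 40276-40276: Non-BMP character not supported in Tk
I think the fix is kind of interesting, so I'm posting it for discussion. If anyone knows why it works, please tell me in the comments. Thanks.
Here's the code: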

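import os
import re
import sys

import requests

# Map every character outside the Basic Multilingual Plane (code points
# 0x10000 and above) to U+FFFD, the Unicode replacement character.
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)

url = 'https://www.douban.com/photos/album/1649942160/'


def url_open(url):
    """Fetch a URL with browser-like headers and return the response."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0',
        'Referer': 'https://www.douban.com/',
    }
    response = requests.get(url, headers=headers)
    return response


# Without .translate(non_bmp_map), print(html) is what triggered the error.
html = url_open(url).text.translate(non_bmp_map)
# print(html)

# Capture the src of every 130px-wide thumbnail whose URL ends in .jpg.
p = r'<img width="130" src="([^"]+\.jpg)"'
img_addrs = re.findall(p, html)
print(img_addrs)

# Save the images as douban/1.jpg, douban/2.jpg, ...
os.makedirs("douban", exist_ok=True)  # does not fail if the folder exists
os.chdir("douban")
for x, each in enumerate(img_addrs, start=1):
    file = str(x) + ".jpg"
    with open(file, "wb") as f:
        f.write(url_open(each).content)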
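
My best guess at why the fix works: the error message says Tk cannot handle characters outside the Basic Multilingual Plane (code points above U+FFFF), and IDLE's shell is built on Tk, so printing the page there fails as soon as the text contains something like an emoji. The translate call swaps each of those characters for U+FFFD before anything is printed. Here is a minimal sketch of that idea. The helper names find_non_bmp and bmp_only and the sample string are made up for illustration and are not part of the script above.

```python
import sys

# Same table as in the script: every code point above U+FFFF maps to
# U+FFFD, the Unicode replacement character.
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xFFFD)


def find_non_bmp(text):
    """Return (index, code point in hex) for each character outside the BMP.

    Hypothetical helper: the code point is returned as a hex string so the
    result itself can be printed safely anywhere.
    """
    return [(i, hex(ord(ch))) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


def bmp_only(text):
    """Return a copy of text with every non-BMP character replaced by U+FFFD."""
    return text.translate(non_bmp_map)


# Hypothetical sample: U+1F600 is an emoji, which lies outside the BMP.
sample = "manga \U0001F600 page"

print(find_non_bmp(sample))   # where the problem characters are
print(bmp_only(sample))       # same length, emoji replaced by U+FFFD
```

Since only printing seems to be affected, it should also be enough to keep the original html for the regex and call something like bmp_only(html) just at the print.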
