FishC Forum


[Solved] A problem with a Python web crawler

Posted on 2018-4-1 10:47:33

import requests
from bs4 import BeautifulSoup
header = {
    'User-Agent':' Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}
res = requests.get('https://www.qidian.com/',headers=header)
soup = BeautifulSoup(res.text,'html.parser')
print(soup.prettify())

I'm new to web scraping and wanted to fetch the page source of the Qidian site (qidian.com).
Instead, I got this:
Traceback (most recent call last):
  File "F:/Python/Exercise set/爬虫之旅.1.py", line 6, in <module>
    res = requests.get('https://www.qidian.com/',headers=header)
  File "C:\Users\DELL\venv\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\DELL\venv\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\DELL\venv\lib\site-packages\requests\sessions.py", line 494, in request
    prep = self.prepare_request(req)
  File "C:\Users\DELL\venv\lib\site-packages\requests\sessions.py", line 437, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "C:\Users\DELL\venv\lib\site-packages\requests\models.py", line 306, in prepare
    self.prepare_headers(headers)
  File "C:\Users\DELL\venv\lib\site-packages\requests\models.py", line 440, in prepare_headers
    check_header_validity(header)
  File "C:\Users\DELL\venv\lib\site-packages\requests\utils.py", line 869, in check_header_validity
    raise InvalidHeader("Invalid return character or leading space in header: %s" % name)
requests.exceptions.InvalidHeader: Invalid return character or leading space in header: User-Agent
Can anyone explain what this means?
Best answer
2018-4-1 12:49:54
import requests
from bs4 import BeautifulSoup

# Same code as in the question, except that the User-Agent value
# no longer starts with a space before "Mozilla".
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}
res = requests.get('https://www.qidian.com/', headers=header)
soup = BeautifulSoup(res.text, 'html.parser')
print(soup.prettify())

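The User-Agent header value had an extra leading space. requests checks header values itself before sending anything (see check_header_validity in the traceback) and rejects one that starts with a space, and the HTTP/2 protocol does not allow a header value to start with whitespace either.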
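If the header values are pasted in from browser dev tools, a stray space like this is easy to miss. As a rough sketch, one way to guard against it is to strip the whitespace around every header name and value before building the request. The helper name `clean_headers` below is made up for illustration and is not part of requests.

```python
def clean_headers(headers):
    """Return a copy of `headers` with surrounding whitespace stripped.

    Hypothetical helper for illustration only; requests does not provide it.
    """
    return {name.strip(): value.strip() for name, value in headers.items()}


# The header dict from the question, with the stray space before "Mozilla".
header = {
    'User-Agent': ' Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}

cleaned = clean_headers(header)

# Compare the start of the value before and after cleaning.
print(repr(header['User-Agent'][:12]))
print(repr(cleaned['User-Agent'][:12]))
```

The cleaned dict can then be passed as `headers=cleaned` to `requests.get`, in place of `header` in the code above.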