Python 3.6: Scrapy's crawl command keeps failing, please help me find the problem (code included)

Posted on 2017-12-27 22:04:35

The error is always on the last line.

This is a newly created Scrapy project.
#stocks.py
# -*- coding: utf-8 -*-
import scrapy
import re


class StocksSpider(scrapy.Spider):
    name = 'stocks'
    start_urls = ['http://quote.eastmoney.com/stocklist.html']

    def parse(self, response):
        for href in response.css('a::attr(href)').extract():
            try:
                stock = re.findall(r"[s][hz]\d{6}",href)[0]
                url = 'https://gupiao.baidu.com/stock' + stock + '.html'
                yield scrapy.Resquest(url, callback = self.parse_stock)
            except:
                continue

    def parse_stock(self, response):
        infoDict = {}
        stockInfo = response.css('.stock-bets')
        name = stockInfo.css('.bets-name').extract()[0]
        keyList = stockInfo.css('dt').extract()
        for i in range(len(keyList)):
            key = re.findall(r'>.*</dt>',keyList[i])[0][1:-5]
            try:
                val = re.findall(r'\d+\.?.*</dd>',valueList[i])[0][0:-5]
            except:
                val = '--'
            infoDict[key] = val
        infoDict.update(
            {'股票名称':re.findall('\s.*\(',name)[0].split()[0]+\
             re.findall('\>.*<',name)[0][1:-i]})
        yield infoDict



#pipelines.py
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html


class BaidustocksPipeline(object):
    def process_item(self, item, spider):
        return item

class BaidustocksInfoPipeline(object):
    def open_spider(self, spider):
        self.f = open('BaidustockInfo.txt','w')

    def close_spider(self, spider):
        self.f.close()

    def process_item(self, item, spider):
        try:
            line = str(dict(item)) + '\n'
            self.f.write(line)
        except:
            pass
        return item



#settings.py
# -*- coding: utf-8 -*-

# Scrapy settings for BaiduStocks project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     http://doc.scrapy.org/en/latest/topics/settings.html
#     http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
#     http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'BaiduStocks'

SPIDER_MODULES = ['BaiduStocks.spiders']
NEWSPIDER_MODULE = 'BaiduStocks.spiders'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'BaiduStocks (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See http://scrapy.readthedocs.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'BaiduStocks.middlewares.BaidustocksSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'BaiduStocks.middlewares.MyCustomDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See http://scrapy.readthedocs.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'BaiduStocks.pipelines.BaidustocksInfoPipeline': 300,
#}

# Enable and configure the AutoThrottle extension (disabled by default)
# See http://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
ド゛゜范 posted on 2017-12-28 00:42:32
# Configure item pipelines
# See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'BaiduStocks.pipelines.BaidustocksInfoPipeline': 300,
#}
settings.py, lines 68-70 (the ITEM_PIPELINES block): the closing brace was never uncommented, it is still #}. Bump me up.
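
A minimal sketch of that fix, assuming the rest of the generated settings.py stays exactly as posted:

# Configure item pipelines
# See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'BaiduStocks.pipelines.BaidustocksInfoPipeline': 300,
}   # drop the leading '#' so the dict literal is actually closed

With #} left commented, the dict opened by ITEM_PIPELINES = { is never closed, so Python raises a SyntaxError as soon as Scrapy imports settings.py, which would explain why scrapy crawl fails every single time.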
Original poster | posted on 2017-12-28 11:05:49
ド゛゜范 posted on 2017-12-28 00:42:
# Configure item pipelines
# See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
...

It still doesn't seem to work. That block is there so that the new pipeline class we defined ourselves gets used. (attached screenshot: 333333.jpg)
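
If uncommenting that brace still yields nothing, the posted stocks.py has a couple of other apparent problems that are separate from the pipeline setting (the brace issue is plain Python syntax, not a question of which pipeline class is enabled). Below is a minimal sketch with only those spots changed; it assumes the same target pages as the original and is not a confirmed fix.

#stocks.py -- sketch, only the apparently broken spots are changed
# -*- coding: utf-8 -*-
import re
import scrapy


class StocksSpider(scrapy.Spider):
    name = 'stocks'
    start_urls = ['http://quote.eastmoney.com/stocklist.html']

    def parse(self, response):
        for href in response.css('a::attr(href)').extract():
            try:
                stock = re.findall(r"[s][hz]\d{6}", href)[0]
                # the original builds the URL without a '/' before the stock code;
                # the detail pages may actually live under /stock/<code>.html
                url = 'https://gupiao.baidu.com/stock/' + stock + '.html'
                # scrapy.Resquest is a typo in the original; the class is scrapy.Request
                yield scrapy.Request(url, callback=self.parse_stock)
            except:
                continue

    def parse_stock(self, response):
        infoDict = {}
        stockInfo = response.css('.stock-bets')
        name = stockInfo.css('.bets-name').extract()[0]
        keyList = stockInfo.css('dt').extract()
        # valueList is used below but never defined in the original;
        # presumably it should hold the matching <dd> nodes
        valueList = stockInfo.css('dd').extract()
        for i in range(len(keyList)):
            key = re.findall(r'>.*</dt>', keyList[i])[0][1:-5]
            try:
                val = re.findall(r'\d+\.?.*</dd>', valueList[i])[0][0:-5]
            except:
                val = '--'
            infoDict[key] = val
        infoDict.update(
            # '[1:-i]' in the original looks like a typo for '[1:-1]',
            # i.e. strip the leading '>' and the trailing '<'
            {'股票名称': re.findall(r'\s.*\(', name)[0].split()[0] +
             re.findall(r'\>.*<', name)[0][1:-1]})
        yield infoDict

Note that in the original the scrapy.Resquest typo is silently swallowed by the bare except, so parse never yields a single request, and the missing valueList makes every value fall back to '--'; the selectors themselves still depend on the gupiao.baidu.com page layout and may need further adjusting.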