A simple crawler: could someone help me find what's wrong?

昔日少年郎 · posted on 2018-6-17 16:53:25

from urllib import request
import re

class Spider():
    url = "https://www.panda.tv/cate/lol?pdt=1.24.s1.3.2c6qoma1l34"
    root_pattern = r'<div class="video-info">([\s\S]*?)</div>'
    name_pattern = r'</i>([\s\S*?])</span>'
    number_pattern = r'<span class="video-number"></span>'

    def __fetch_content(self):
        r = request.urlopen(Spider.url)
        htmls = r.read()
        htmls = str(htmls, encoding="utf-8")
        return htmls

    def __analysis(self, htmls):
        root_htmls = re.findall(Spider.root_pattern, htmls)
        anchors = []
        for html in root_htmls:
            name = re.findall(Spider.name_pattern, root_htmls)
            number = re.findall(Spider.number_pattern, root_htmls)
            anchor = {'name': name, 'number': number}
            anchors.append(anchor)
        return anchors

    def __refine(self, anchors):
        l = lambda anchor: {'name': anchor['name'][0].strip(), 'number': anchor['number'][0]}
        return map(l, anchors)

    def go(self):
        htmls = self.__fetch_content()
        anchors = self.__analysis(htmls)
        anchors = list(self.__refine(anchors))
        print(anchors)

s = Spider()
s.go()
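The post does not include the error message. If the page contains at least one `video-info` block, `__analysis` raises a `TypeError`, because `re.findall` is given the whole list `root_htmls` instead of the loop variable `html`. A minimal reproduction, using a made-up HTML fragment in place of the real page:

```python
import re

# Hypothetical fragments; the real panda.tv markup may differ.
items = ['<i></i>Alice</span>', '<i></i>Bob</span>']
pattern = r'</i>([\s\S]*?)</span>'

# This is what __analysis does: pass the list itself. re needs str or bytes.
try:
    re.findall(pattern, items)
    error = None
except TypeError as exc:
    error = exc

# Passing each string, as the loop variable `html` would be, works.
names = [re.findall(pattern, html) for html in items]
print(type(error).__name__)
print(names)
```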

Charles未晞 · posted on 2018-6-17 18:28:41

Convert root_htmls to a str:

str(root_htmls)

I can't tell whether the regular expressions are correct, though.
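Wrapping the list in `str(root_htmls)` removes the `TypeError`, but each iteration then searches the text of the entire list, so every anchor's `name` list contains the names from all blocks and `__refine` keeps only the first one. Passing the loop variable `html` is the more direct fix. The regular expressions have two further problems: `([\s\S*?])` is a character class that matches exactly one character (`\s`, `\S`, `*` or `?`), so it should be `([\s\S]*?)`; and `number_pattern` has no capture group and nothing between the tags, so it only matches an empty span. Below is a standalone sketch of `__analysis` and `__refine` with those fixes, run against a made-up HTML fragment, since the real page structure is only inferred from the original patterns. Skipping blocks where a field is missing (instead of raising `IndexError`) is an addition, not part of the original code.

```python
import re

# The question's patterns with the regex bugs fixed. The markup they target
# is assumed from the original patterns, not verified against the live page.
ROOT_PATTERN = r'<div class="video-info">([\s\S]*?)</div>'
NAME_PATTERN = r'</i>([\s\S]*?)</span>'  # was ([\s\S*?]): a one-char class
NUMBER_PATTERN = r'<span class="video-number">([\s\S]*?)</span>'  # added group


def analysis(htmls):
    """Return one {'name': [...], 'number': [...]} dict per video-info block."""
    anchors = []
    for html in re.findall(ROOT_PATTERN, htmls):
        # Search inside the current block (html), not the whole list.
        name = re.findall(NAME_PATTERN, html)
        number = re.findall(NUMBER_PATTERN, html)
        anchors.append({'name': name, 'number': number})
    return anchors


def refine(anchors):
    """Keep the first match of each field; skip blocks missing either one."""
    return [
        {'name': a['name'][0].strip(), 'number': a['number'][0]}
        for a in anchors
        if a['name'] and a['number']
    ]


# Hypothetical fragment standing in for the downloaded page.
sample = '''
<div class="video-info">
  <span class="video-nickname"><i></i> Alice </span>
  <span class="video-number">12300</span>
</div>
<div class="video-info">
  <span class="video-nickname"><i></i> Bob </span>
  <span class="video-number">4567</span>
</div>
'''
print(refine(analysis(sample)))
```

One caveat: because `ROOT_PATTERN` is lazy, it stops at the first `</div>`, so if the real `video-info` block contains nested `<div>` elements, the captured block will be cut short.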

昔日少年郎 · posted on 2018-6-17 19:15:26

Charles未晞 posted on 2018-6-17 18:28:
Convert root_htmls to a str:

I can't tell whether the regular expressions are correct, though.

[Image attachment, not available]

Charles未晞 · posted on 2018-6-17 22:31:22

昔日少年郎 posted on 2018-6-17 19:15:

I can't see your message.