blackyellow
Posted on 2018-10-31 00:04:26
Impressive.
fyzhangyi
Posted on 2018-10-31 20:37:06
I'd like to learn this.
zay7065346
Posted on 2018-11-1 15:45:42
We wish to know
LOVEE
Posted on 2018-11-4 00:19:18
..
szbin
Posted on 2018-11-4 16:59:29
We wish to know
cclovepython
Posted on 2018-11-5 08:17:18
We wish to know
老司基
Posted on 2018-11-6 15:27:13
We wish to know
斯林
Posted on 2018-11-6 15:46:45
We wish to know
掩耳盗驴
Posted on 2018-11-7 16:48:22
We wish to know
binggod
Posted on 2018-11-7 17:02:57
We wish to know
暗夜之隐
Posted on 2018-11-7 22:04:39
import re
import urllib.request

import openpyxl
from bs4 import BeautifulSoup as bs


def urlopen(url):
    """Fetch a page with browser-like request headers and return the raw bytes."""
    head = {}
    head['Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'
    head['Accept-Language'] = 'zh-CN,zh;q=0.9'
    head['Cache-Control'] = 'no-cache'
    head['Connection'] = 'keep-alive'
    # No Cookie header: the Top 250 list is public. Add your own 'Cookie'
    # header here if the site ever requires a login.
    head['Host'] = 'movie.douban.com'
    head['Pragma'] = 'no-cache'
    head['Upgrade-Insecure-Requests'] = '1'
    head['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
    req = urllib.request.Request(url, headers=head)
    with urllib.request.urlopen(req) as response:
        return response.read()


def xia():
    """Scrape the 10 pages of the Douban Top 250 (25 films each) into 电影.xlsx."""
    wb = openpyxl.Workbook()
    ws = wb.active
    for ye in range(0, 250, 25):  # page offsets 0, 25, ..., 225
        url = 'https://movie.douban.com/top250?start={}&filter='.format(ye)
        html = urlopen(url).decode('utf-8')
        htm = bs(html, 'lxml')
        da = str(htm.ol)  # the <ol> element that holds the film list
        # (link, title) pairs, director lines ("导演: ...") and average ratings;
        # the patterns assume the page markup as of 2018
        url_name = re.findall(r'href="(.*?)"[^>]*>\s*<span class="title">(.*?)<', da)
        dao = [d.strip() for d in re.findall(r'(导演.*?)<br', da)]
        pin = re.findall(r'property="v:average">(.*?)<', da)
        # zip pairs the three lists by position and stops at the shortest one
        for (link, name), director, rating in zip(url_name, dao, pin):
            print('Title: ' + name)
            print('Link: ' + link)
            print('Director: ' + director)
            print('Rating: ' + rating + '\n\n')
            ws.append([name, rating, link, director])
    wb.save('电影.xlsx')


xia()
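To see what the scraper's three patterns (link and title, director line, rating) capture, here is a small offline check using whitespace-tolerant variants of them. The HTML snippet is a hypothetical, simplified imitation of one list entry with placeholder values, not the real Douban page, so treat this as an illustration of the regular expressions only; `parse_entries` is a hypothetical helper name.

```python
import re

# Whitespace-tolerant variants of the scraper's three patterns.
TITLE_LINK = re.compile(r'href="(.*?)"[^>]*>\s*<span class="title">(.*?)<')  # (link, title)
DIRECTOR = re.compile(r'(导演.*?)<br')  # the "导演: ..." (director) line
RATING = re.compile(r'property="v:average">(.*?)<')  # the average rating

# Hypothetical, simplified markup for one list entry; placeholder values, not real data.
SAMPLE = """<ol class="grid_view">
<li>
  <div class="hd">
    <a href="https://example.com/subject/1/" class="">
      <span class="title">Example Movie</span>
    </a>
  </div>
  <p class="">
    导演: Director Name<br>
    Year / Country / Genre
  </p>
  <span class="rating_num" property="v:average">9.0</span>
</li>
</ol>"""


def parse_entries(html):
    """Return (title, rating, link, director) tuples found in the HTML text."""
    links_titles = TITLE_LINK.findall(html)
    directors = [d.strip() for d in DIRECTOR.findall(html)]
    ratings = RATING.findall(html)
    # Fields are paired by position: zip stops at the shortest list instead of
    # raising IndexError, but a missing match would shift the following rows.
    return [
        (title, rating, link, director)
        for (link, title), director, rating in zip(links_titles, directors, ratings)
    ]


for row in parse_entries(SAMPLE):
    print(row)
```

Because `findall` returns a tuple per match when a pattern has two groups, the link/title pattern yields `(link, title)` pairs, which is why the loop unpacks them that way.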
wy6616753
Posted on 2018-11-8 16:10:45
How can I read data from a txt file and then write it into Excel?
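One possible answer, sketched under assumptions: each non-empty line of the text file is one row, fields are separated by a tab (the `sep` parameter), and the file is UTF-8. `read_txt_rows`, `write_rows_to_xlsx` and `txt_to_xlsx` are hypothetical helper names; the Excel side uses the same openpyxl calls as the scraper above (`Workbook()`, `ws.append`, `wb.save`).

```python
def read_txt_rows(path, sep="\t", encoding="utf-8"):
    """Read a text file and split each non-empty line into a list of fields."""
    rows = []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.rstrip("\r\n")
            if line:  # skip blank lines
                rows.append(line.split(sep))
    return rows


def write_rows_to_xlsx(rows, xlsx_path):
    """Write rows (lists of values) into the first worksheet of a new workbook."""
    import openpyxl  # imported here so read_txt_rows works without openpyxl installed

    wb = openpyxl.Workbook()
    ws = wb.active
    for row in rows:
        ws.append(row)
    wb.save(xlsx_path)


def txt_to_xlsx(txt_path, xlsx_path, sep="\t"):
    """Convert a separated text file into an .xlsx file."""
    write_rows_to_xlsx(read_txt_rows(txt_path, sep), xlsx_path)
```

Usage, with hypothetical file names: `txt_to_xlsx('movies.txt', 'movies.xlsx')`. Every cell is written as text; convert a field with `float()` before appending if you need numeric cells.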
水瓶座
Posted on 2018-11-10 20:37:29
Awesome!
sslas
Posted on 2018-11-15 21:00:09
We wish to know
dengwenxzuan
Posted on 2018-11-19 15:09:04
We wish to know
qingfengjd
Posted on 2018-11-19 16:36:36
I love FishC.com!
月满霜华
Posted on 2018-11-20 22:09:01
We wish to know
肖嘉远
Posted on 2018-11-21 09:39:05
We wish to know
xingyunlz
Posted on 2018-11-27 09:20:13
那个小明
Posted on 2018-12-1 19:41:55
z