[Solved] Question: how to parse out the JSON data

payton24 · posted 2018-1-19 10:23:59

Target URL: http://quotes.toscrape.com/js/
Target content: the quotes on the page, e.g.: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” by Albert Einstein
Problem: after stripping the leading string, I can't produce valid JSON data.

import requests
import json
from bs4 import BeautifulSoup
url = "http://quotes.toscrape.com/js/"
r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
print(r.status_code)
a = soup.findAll("script")[1].get_text()
b = a[a.find("{"):a.find("];")-1]
print(b)
json_data = json.loads(b)

The error is as follows:

200
Traceback (most recent call last):
File "F:\01_Python\try.py", line 1878, in <module>
json_data = json.loads(b)
File "C:\Python36\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\Python36\lib\json\decoder.py", line 342, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 14 column 6 (char 450)

Printing b gives the following (several entries removed for brevity):

{
"tags": [
"change",
"deep-thoughts",
"thinking",
"world"
],
"author": {
"name": "Albert Einstein",
"goodreads_link": "/author/show/9810.Albert_Einstein",
"slug": "Albert-Einstein"
},
"text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d"
},
{
"tags": [
"abilities",
"choices"
],
"author": {
"name": "J.K. Rowling",
"goodreads_link": "/author/show/1077326.J_K_Rowling",
"slug": "J-K-Rowling"
},
"text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d"
},
{
"tags": [
"humor",
"obvious",
"simile"
],
"author": {
"name": "Steve Martin",
"goodreads_link": "/author/show/7103.Steve_Martin",
"slug": "Steve-Martin"
},
"text": "\u201cA day without sunshine is like, you know, night.\u201d"
}

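So I'm stuck: I can't turn b into valid JSON data. Could anyone help? Thanks!
I've also tried some simple checks, including inspecting the head and tail of b and replacing \n and spaces, but the error persists.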

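Best answer

sky · posted 2018-1-19 10:30:38

Last edited by sky on 2018-1-19 10:36

The data is a list of dicts.
Your string b starts at the first "{", so the enclosing [ ] is lost and what remains is several dicts separated by commas rather than one list.
json.loads can't load that: it parses the first dict and then reports "Extra data".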

b = a[a.find("["):a.find("];")+1]

Try it with the original format, brackets included.
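
To compare the two slices without hitting the site, here is a minimal sketch. The `script_text` string is a made-up, shortened stand-in for the text of the second script tag (the real one is longer and may differ); only the two slicing expressions come from this thread.

```python
import json

# Hypothetical, shortened stand-in for the <script> text on the page.
# A raw string keeps the \u201c escapes as literal JSON escapes.
script_text = r'''
var data = [
    {
        "tags": ["abilities", "choices"],
        "author": {"name": "J.K. Rowling"},
        "text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d"
    },
    {
        "tags": ["humor", "obvious", "simile"],
        "author": {"name": "Steve Martin"},
        "text": "\u201cA day without sunshine is like, you know, night.\u201d"
    }
];
for (var i in data) {}
'''

end = script_text.find("];")

# Original slice: starts at the first "{" and stops before the "]",
# so the result is "{...},\n{...}" with no enclosing list.
bad = script_text[script_text.find("{"):end - 1]
try:
    json.loads(bad)
    bad_error = None
except json.JSONDecodeError as exc:
    bad_error = exc.msg  # message without the line/column suffix

# Corrected slice: keeps both the opening "[" and the closing "]".
good = script_text[script_text.find("["):end + 1]
quotes = json.loads(good)

print(bad_error)
print(len(quotes), quotes[1]["author"]["name"])
```

Note that both slices rely on the first "[" in the script being the start of the data array and the first "];" being its end, which holds for this sample but is an assumption about the page.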

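payton24 · posted 2018-1-19 11:54:12

sky posted on 2018-1-19 10:30
The data is a list of dicts.
Your string b is several dicts with no enclosing [ ],
so it can't be loaded.

By "the original format", do you mean calling json_data = json.loads(r.text) directly?
That raises an error straight away: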

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

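sky · posted 2018-1-19 13:31:19

payton24 posted on 2018-1-19 11:54
By "the original format", do you mean calling json_data = json.loads(r.text) directly?
That raises an error straight away:

No, that's not what I meant. I meant the [ ] that follows var data = in the HTML source.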

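payton24 · posted 2018-1-19 16:21:04

Last edited by payton24 on 2018-1-19 16:22

sky posted on 2018-1-19 13:31
No, that's not what I meant. I meant the [ ] that follows var data = in the HTML source.

Thank you. So the problem becomes converting that string into a list of dicts.
After reading around quite a bit, I ended up using: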
b = a[a.find("["):a.find("];")+1]
c = eval(b)  # convert the string to a list

To get the first item I use:

c[0]['text']

The result is:

'“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'

The outermost layer is the single quotes '', and inside them are the curly double quotes “”, so I then used:
c[0]['text'][1:-1]

The result is:

'The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.'

Overall it does the job, but is there a more concise way to write it? Any pointers would be appreciated.
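
A side note on the last step. The outer single quotes are only how the interactive prompt displays a str; they are not part of the string itself. The slice `[1:-1]` removes whatever the first and last characters happen to be. A possible alternative, sketched here under the assumption that the quotes are wrapped in the curly marks U+201C and U+201D as in the sample data, is `str.strip`, which removes those characters only when they are actually present:

```python
text = "\u201cA day without sunshine is like, you know, night.\u201d"

# strip() removes any of the listed characters from both ends only.
plain = text.strip("\u201c\u201d")

# A string without curly quotes is left alone by strip(),
# while [1:-1] would cut off its first and last letters.
already_plain = "No curly quotes here"
kept = already_plain.strip("\u201c\u201d")
cut = already_plain[1:-1]

print(plain)
print(kept)
print(cut)
```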

sky · posted 2018-1-19 16:29:19

payton24 posted on 2018-1-19 16:21
Thank you. So the problem becomes converting that string into a list of dicts.
After reading around quite a bit, I ended up using:

data = json.loads(b)
quotes = [i["text"] for i in data]

Why did you give up on json?
Don't use eval, it's too dangerous.
If the string ever contained import os;os.system("xxxx") you'd be done for.
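
A small sketch of the difference, using a harmless made-up payload. Strictly speaking, eval only evaluates expressions, so a literal `import os;...` statement would raise a SyntaxError; the expression form `__import__("os")` is what makes the same point. `ast.literal_eval` is shown only as a further standard-library option for Python literals; it was not proposed in this thread.

```python
import ast
import json

# Hypothetical untrusted input; getcwd() is harmless, but it could be any call.
payload = '__import__("os").getcwd()'

# json.loads only accepts JSON syntax, so it rejects the payload.
try:
    json.loads(payload)
    json_result = "parsed"
except json.JSONDecodeError:
    json_result = "rejected"

# ast.literal_eval accepts Python literals only, so it rejects it as well.
try:
    ast.literal_eval(payload)
    literal_result = "parsed"
except (ValueError, SyntaxError):
    literal_result = "rejected"

# eval actually runs the call and returns the current working directory.
eval_result = eval(payload)

print(json_result, literal_result)
print(type(eval_result).__name__)
```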

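payton24 · posted 2018-1-19 17:19:12

sky posted on 2018-1-19 16:29
Why did you give up on json?
Don't use eval, it's too dangerous.
If the string ever contained import os;os.system("xxxx") you'd be done for.

So the eval() function can compile and execute any JavaScript code; I had no idea it was that dangerous. Thanks, I've learned a lot again.

I also brushed up on the basics of JSON. According to the JSON-to-Python conversion table, json.loads() in this example maps string → unicode.
So if the content of the original string is in list format, it becomes a Python list.
If the content of the original string is in dict format, it becomes a Python dict.
Is that the right way to understand it?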

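sky · posted 2018-1-19 17:27:18

payton24 posted on 2018-1-19 17:19
So the eval() function can compile and execute any JavaScript code; I had no idea it was that dangerous. Thanks, I've learned a lot again.

I also brushed up ...

eval executes Python code given as a string, not JS code.
As for the rest, json.loads is short for "load string": it loads JSON data in string form into Python data types.
Search around on Baidu a bit more and it will sink in; your understanding is broadly right.
I only worked this out recently myself: a list of dicts resembles database records, and it mimics the way a database stores data.
If you've looked at databases, that may help you understand; if not, never mind.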
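
The mapping described above can be checked directly. This is a minimal sketch with made-up input strings. On Python 3, which the traceback path C:\Python36 indicates, a JSON string becomes `str`; the `unicode` entry in the conversion table applies to Python 2.

```python
import json

# A JSON array becomes a Python list, a JSON object becomes a dict.
as_list = json.loads('[{"name": "Albert Einstein"}, {"name": "Steve Martin"}]')
as_dict = json.loads('{"name": "Albert Einstein", "tags": ["change", "world"]}')

# A JSON string becomes str; numbers, true/false and null map as well.
as_str = json.loads('"thinking"')
others = json.loads('[1, 2.5, true, false, null]')

print(type(as_list).__name__, type(as_dict).__name__, type(as_str).__name__)
print(others)

# A list of dicts reads like a set of records.
names = [record["name"] for record in as_list]
print(names)
```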

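payton24 · posted 2018-1-19 17:39:34

sky posted on 2018-1-19 17:27
eval executes Python code given as a string, not JS code.
As for the rest, json.loads is short for "load string": it loads JSON data in string ...

Thanks. It feels like there's more and more to learn; I've only just started on databases.
MongoDB and MySQL: I'll get those two sorted first.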
