[Python Crawler] Fetching a movie site's rankings, mainly an exercise in handling JSON - 吾爱汇编 - 52hb.com

小涩席 posted on 2020-3-15 20:46

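As the title says, this is mainly about collecting and cleaning JSON data with a Python crawler.
It uses lists, dictionaries, key-value pairs, file operations, and checking for (and creating) a directory.
The code is as follows: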

# -*- coding: utf-8 -*-
# Practice: extracting JSON data from Douban
# Author: XSX
# Python 3.8, PyCharm Community Edition 2019.3.3

import csv
import json
import os

import requests


def UrlAdd():
    """Build the 16 paged request URLs (20 movies per page)."""
    URllists = []
    url = "https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&sort=recommend&page_limit=20&page_start={}"
    for i in range(16):
        urls = url.format(i * 20)
        URllists.append(urls)
    print(URllists)
    return URllists


def GetJson(URllists, headers):
    """Request every URL and keep only the needed fields from the JSON."""
    ContentLists = []
    for URllist in URllists:
        r = requests.get(URllist, headers=headers, timeout=10)
        r.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
        r.encoding = r.apparent_encoding
        results = json.loads(r.text)
        for i in results['subjects']:
            contents = {}
            contents['Title'] = i['title']
            contents['Rating'] = i['rate']
            contents['Link'] = i['url']
            contents['Cover image URL'] = i['cover']
            ContentLists.append(contents)
    print("All movies collected!")
    print("Preparing to write the file...")
    return ContentLists


def SaveCVS(ContenLists):
    """Write the collected movies to ./DouBan/MV.csv, replacing any old file."""
    os.makedirs('./DouBan', exist_ok=True)
    # Mode 'w' overwrites an existing file, so no separate os.remove() is needed.
    # csv.writer quotes fields that contain commas; newline='' is what the csv module expects.
    with open('./DouBan/MV.csv', 'w', encoding='utf-8-sig', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Title', 'Rating', 'Link', 'Cover image URL'])
        for ContenList in ContenLists:
            writer.writerow([ContenList['Title'], ContenList['Rating'], ContenList['Link'], ContenList['Cover image URL']])
    print('File written!')


if __name__ == '__main__':
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
        'Cookie': 'paste the cookie from your own browser here'
    }
    SaveCVS(GetJson(UrlAdd(), headers))
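
The JSON handling in the script above can be tried without touching the network. The following is only a minimal sketch: `SAMPLE` is made-up data shaped like the response the script reads (a `subjects` list whose entries carry `title`, `rate`, `url` and `cover`), and the helper names `page_starts`, `extract_rows` and `rows_to_csv` are hypothetical, not part of the original post. It shows how the `page_start` offsets are generated, how the four fields are pulled out of each entry, and why `csv.writer` is a safer choice than joining strings with commas when a title itself contains a comma.

```python
import csv
import io
import json

# Made-up sample, shaped like the response the script expects (an assumption).
SAMPLE = '''
{"subjects": [
  {"title": "Movie A, Part 2", "rate": "8.5",
   "url": "https://example.com/a", "cover": "https://example.com/a.jpg"},
  {"title": "Movie B", "rate": "",
   "url": "https://example.com/b", "cover": "https://example.com/b.jpg"}
]}
'''

FIELDS = ["title", "rate", "url", "cover"]


def page_starts(pages, page_limit=20):
    """Offsets for the page_start query parameter: 0, 20, 40, ..."""
    return [i * page_limit for i in range(pages)]


def extract_rows(text):
    """Parse one JSON response and keep only the four fields, in order."""
    data = json.loads(text)
    rows = []
    for item in data.get("subjects", []):
        # .get() avoids a KeyError if one entry lacks a field
        rows.append([item.get(field, "") for field in FIELDS])
    return rows


def rows_to_csv(rows):
    """Serialise rows with csv.writer so commas inside a field get quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(page_starts(3))
    print(rows_to_csv(extract_rows(SAMPLE)))
```

The first sample title contains a comma on purpose: written with plain string concatenation it would spill into an extra column, while `csv.writer` wraps it in quotes so the row still has four fields.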

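1946010 posted on 2020-3-16 10:39

Thanks for sharing, just taking a look.

wj710000 posted on 2020-4-12 00:15

This is useful, learned something.

zhengchaoit2020 posted on 2020-4-23 15:04

Time to study hard.

水涧无形 posted on 2020-4-28 10:33

I've just gotten hooked on this, so this was good to learn.

aqw729 posted on 2020-8-30 17:42

Showing my support, thanks for sharing.

wjdcq posted on 2020-11-23 10:48

Here to learn.

gesq32957 posted on 2022-3-1 01:22

Thanks for sharing.

EPdkrKb710 posted on 2022-3-3 00:54

Thanks for sharing.

CQPyO618 posted on 2022-3-3 01:04

I've now made the OP my role model for learning!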
