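Crawling Butian's Public-Welfare Vendors
I wrote a Butian crawler script a while ago, but it was pretty sloppy, so I'm rewriting it.
First, why do it this way. As the screenshot shows, clicking "Submit vulnerability" redirects you to a vendor-specific information page (vendors differ only by the cid value), and only on that page can we get the information we want.
That's the whole idea, so let's start writing!
Getting the cid value
In the project hall, click 公益SRC (public-welfare SRC) and look at the request it sends:
It POSTs a few parameters. Trying it out returns this: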
{"status":1,"info":"\u68d2\u68d2\u54d2","data":{"count":171,"current":1,"list":[{"company_id":"61539","company_name":"\u5317\u4eac\u5dc5\u5cf0\u6e05\u5f71\u5546\u8d38\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61535","company_name":"\u5409\u6797\u7701\u59d4\u7ec4\u7ec7\u90e8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61517","company_name":"\u6613\u65b9\u79d1\u8d38\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61514","company_name":"\u5409\u6797\u7701\u6167\u6d77\u79d1\u6280\u4fe1\u606f\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61509","company_name":"\u897f\u5b89\u6559\u80b2\u7535\u89c6\u53f0","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61506","company_name":"\u4e1c\u839e\u5e02\u77f3\u9f99\u6cf0\u5766\u7f51\u7edc\u7ecf\u8425\u90e8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61497","company_name":"\u5c71\u897f\u7701\u53d1\u5c55\u548c\u6539\u9769\u59d4\u5458\u4f1a","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61494","company_name":"\u9655\u897f\u7701\u7a0e\u52a1\u5c40","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61493","company_name":"\u8f66\u597d\u591a\u65e7\u673a\u52a8\u8f66\u7ecf\u7eaa\uff08\u5317\u4eac\uff09\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p0.qhimg.com\/t019b6f6338cec04620.png"},{"company_id":"61491","company_name":"\u5317\u4eac\u4eca\u59cb\u79d1\u6280\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61488","company_name":"\u6d59\u6c5f\u5357\u6e56\u91d1\u878d\u8d44\u4ea7\u4ea4\u6613\u4e2d\u5fc3\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.q
hmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61486","company_name":"\u5e7f\u5dde\u5e7f\u4e4b\u65c5\u56fd\u9645\u65c5\u884c\u793e\u80a1\u4efd\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61485","company_name":"\u73e0\u6d77\u5e02\u5353\u8f69\u79d1\u6280\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61484","company_name":"\u90d1\u5dde\u5e02\u57ce\u4e61\u5efa\u8bbe\u59d4\u5458\u4f1a","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61476","company_name":"\u5e7f\u4e1c\u9f99\u90a6\u7269\u6d41\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61471","company_name":"\u5317\u4eac\u534f\u548c\u533b\u9662","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61469","company_name":"\u8bfa\u8fbe\u6559\u80b2","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61468","company_name":"\u6cb3\u5357\u7701\u4fe1\u606f\u4e2d\u5fc3","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61466","company_name":"\u897f\u5b89\u91d1\u878d\u7535\u5b50\u7ed3\u7b97\u4e2d\u5fc3","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61465","company_name":"\u676d\u5dde\u8d1d\u8d2d\u79d1\u6280\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61463","company_name":"\u5408\u80a5\u5f7c\u5cb8\u4e92\u8054\u4fe1\u606f\u6280\u672f\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p0.qhimg.com\/t019d543937529f2155.jpg"},{"company_id":"61458","company_name":"\u82cf\u5dde\u5e02\u6c11\u5361\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61455","company_name":"\u4f9d\u
6ce2\u7cbe\u54c1\uff08\u6df1\u5733\uff09\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61450","company_name":"\u6b66\u6c49\u6d77\u8baf\u79d1\u6280\u4f1a\u52a1\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61449","company_name":"\u5c71\u897f\u7701\u516c\u5171\u8d44\u6e90\u4ea4\u6613\u4e2d\u5fc3\uff08\u5c71\u897f\u7701\u653f\u52a1\u670d\u52a1\u4e2d\u5fc3\uff09","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61446","company_name":"\u6b66\u6c49\u5c14\u6e7e\u6587\u5316\u4f20\u64ad\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61445","company_name":"\u6210\u90fd\u5e02\u7b2c\u4e00\u4eba\u6c11\u533b\u9662","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61442","company_name":"\u6e29\u5dde\u5e02\u6c34\u5229\u5c40","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"},{"company_id":"61441","company_name":"\u5b9c\u5bb6\u7535\u5b50\u5546\u52a1\uff08\u4e2d\u56fd\uff09\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p0.qhimg.com\/t0114c4c991894ba083.png"},{"company_id":"61436","company_name":"\u5317\u4eac\u91cd\u8f7d\u667a\u5b50\u79d1\u6280\u6709\u9650\u516c\u53f8","avatar":"http:\/\/p1.qhmsg.com\/dm\/150_150_100\/t011655040b3ed000bf.jpg"}]}}
As you can see, the company_id value is what we need.
Sending the request:
import requests

url = "https://butian.360.cn/Reward/pub"
data = {
    "s": "1",
    "p": "1",  # p is the page number
    "token": "",
}
r = requests.post(url, data=data)
print(r.content)
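Since p is the page number, collecting every vendor means repeating the request with p = 1, 2, 3 and so on. The sketch below is one way to do that, under a few assumptions: it relies on the response shape shown above (data, then list, then company_id), it assumes a page past the end comes back with an empty list, and session is any object with a requests-style post() method such as requests.Session(). The names fetch_page, iter_company_ids and max_pages are made up for illustration.

```python
URL = "https://butian.360.cn/Reward/pub"


def fetch_page(session, page):
    """POST the same form as above for one page and return the parsed JSON."""
    form = {"s": "1", "p": str(page), "token": ""}
    response = session.post(URL, data=form)
    return response.json()


def iter_company_ids(session, max_pages=500):
    """Yield company_id values page by page until a page comes back empty.

    max_pages is a hypothetical safety cap so the loop always terminates,
    even if the server never returns an empty page.
    """
    for page in range(1, max_pages + 1):
        payload = fetch_page(session, page)
        companies = (payload.get("data") or {}).get("list") or []
        if not companies:
            break
        for company in companies:
            yield company["company_id"]
```

With requests installed, list(iter_company_ids(requests.Session())) would gather the ids from every page into one list.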
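How do we pull the company_id values out of the returned string? One option is a regular expression; another is to turn the string into a dict and walk it. eval() can do that conversion for this particular response, but it executes whatever the server sends back, so parsing the body as JSON with r.json() is the safer route:
print(type(r.text))
# <class 'str'> (Python 3)
data = r.json()
print(type(data))
# <class 'dict'>
print(data["data"])
# the value under "data" is itself another dict
companies = data["data"]["list"]
print(companies)
# "list" inside that dict is a list with one dict per vendor
for company in companies:
    print(company["company_id"])
# this finally prints every company_id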
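The regular-expression route mentioned above could look like the following minimal sketch. It assumes company_id always appears as a quoted run of digits, exactly as in the sample response; the name extract_company_ids is invented for this example.

```python
import re

# Matches "company_id":"61539" and captures only the digits.
COMPANY_ID_RE = re.compile(r'"company_id"\s*:\s*"(\d+)"')


def extract_company_ids(text):
    """Return every company_id found in the raw response text, in order."""
    return COMPANY_ID_RE.findall(text)


sample = '{"status":1,"data":{"list":[{"company_id":"61539"},{"company_id":"61535"}]}}'
print(extract_company_ids(sample))
# ['61539', '61535']
```

Parsing the JSON holds up better if the response format changes, while the regex is handy for a quick one-off. On Python 3, pass it r.text rather than r.content, since the pattern is a str and r.content is bytes.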
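Getting the target URL
With the cid in hand, build the full URL, request it, and use bs4 to extract the relevant information. Note that the page is only visible when you are logged in; sending the cookie from a logged-in session along with the request is enough.
Redirect URL: https://butian.360.cn/Loo/submit?cid= + the cid value obtained above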
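Building that URL in code is a one-liner, but letting urlencode do the joining keeps the query string well formed. A small sketch, where build_submit_url is a hypothetical helper name:

```python
from urllib.parse import urlencode

SUBMIT_URL = "https://butian.360.cn/Loo/submit"


def build_submit_url(cid):
    """Return the vendor-specific submit URL for one cid."""
    return SUBMIT_URL + "?" + urlencode({"cid": cid})


print(build_submit_url("61485"))
# https://butian.360.cn/Loo/submit?cid=61485
```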
Getting the site's URL and name:
import requests
from bs4 import BeautifulSoup

url = "https://butian.360.cn/Loo/submit?cid=61485"
headers = {
    "Cookie": "Your_cookie"
}
r = requests.get(url, headers=headers)
# print(r.content)
soup = BeautifulSoup(r.content, "lxml")
src_url = soup.select("#tabs > form > div.tabs-con.tabs-con-loo > ul > li:nth-of-type(3) > input")[0]['value']
src_name = soup.select("#inputCompy")[0]['value']
print(src_url)
print(src_name)
1. Put your own cookie in the Cookie header.
2. The two soup.select(...) lines take the first element of the returned list and read its value attribute.
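soup.select() returns an empty list when nothing matches, so indexing it with [0] raises IndexError if the page lacks the expected inputs. Presumably that is what happens when the cookie is missing or expired, though that is an assumption and not something verified here. A small guard along these lines avoids the crash; first_value is a made-up helper name, and it only relies on each match having a .get() method, which bs4 tags do.

```python
def first_value(matches, attr="value"):
    """Return attr of the first match, or None when nothing matched."""
    if not matches:
        return None
    return matches[0].get(attr)


# Plain dicts stand in for bs4 tags here so the example runs without a page.
print(first_value([{"value": "example"}]))
# example
print(first_value([]))
# None
```

In the script it would be used as src_name = first_value(soup.select("#inputCompy")), followed by a check for None before going on.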
Source code
# -*- coding: utf-8 -*-
Here is a first version. For some reason I could never get multiprocessing to work, so it isn't fast, but it will do for now.
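Because the work is mostly waiting on HTTP responses, a thread pool is one possible way to speed things up in place of multiprocessing. This is a sketch of that swapped-in technique, not the script from this post: fetch_all, fetch_one and workers are hypothetical names, and fetch_one stands for whatever function takes a cid and returns the scraped data for it.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(cids, fetch_one, workers=8):
    """Call fetch_one(cid) for every cid on a small thread pool.

    Results come back in the same order as cids. An exception raised
    inside fetch_one is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, cids))
```

Keeping workers small is kinder to the server and makes it less likely that the session gets throttled.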
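Please credit the source when reposting. You are welcome to check the sources cited in this post and to point out anything that is wrong or unclear. Leave a comment below or email sher10cksec@foxmail.com
Title: Crawling Butian's Public-Welfare Vendors
Author: sher10ck
Published: 2019-01-14, 16:36:02
Last updated: 2020-01-13, 12:59:32
Original link: http://sherlocz.github.io/2019/01/14/butian-spider/
Copyright: "Attribution-NonCommercial-ShareAlike 4.0". Please keep the original link and author when reposting.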