Fetching query results with the ZoomEye API

  1. ACCESS_TOKEN
  2. GET_DATA
  3. GET_PAGE
  4. GET_DATA
  5. CODE

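I previously wrote a ZoomEye crawler that used selenium to drive a browser and scrape the data. Later I found that ZoomEye has an API reference manual, so I took a look at it.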

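Getting data through the API involves the following steps:

  • Obtain the access_token used for authentication
  • Fetch the information returned by the query
  • Convert the returned information into the format you want and save it to a file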

ACCESS_TOKEN

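According to the official documentation, to obtain this token you POST your username and password to the login URL, which returns a dictionary. Note that the submitted data must be JSON, so we serialize it with json.dumps: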

import requests
import json

url = "https://api.zoomeye.org/user/login"

data = {
"username":"YOUR_USER",
"password":"YOUR_PASS"
}

r = requests.post(url,data=json.dumps(data))
print(r.content)

The response looks like this; we do a little processing on it to pull out the token:

{"access_token": "eyJhxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
return json.loads(r.text)['access_token']
eyJhxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
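The processing step above can be wrapped in a small helper. This is only a minimal sketch, not the author's code: the name extract_access_token is hypothetical, and it assumes the login response body is a JSON object with an access_token key, as in the response shown above.

```python
import json


def extract_access_token(body):
    """Return the access_token field from a login response body (str or bytes)."""
    payload = json.loads(body)
    try:
        return payload["access_token"]
    except (KeyError, TypeError):
        # Assumption: a failed login returns JSON without an access_token key.
        raise ValueError("login response has no access_token: %r" % (payload,))


# Example with a placeholder body shaped like the response shown above.
sample_body = '{"access_token": "eyJhxxx"}'
print(extract_access_token(sample_body))
```

Raising an explicit error keeps a failed login from surfacing later as a confusing KeyError in unrelated code.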

The documentation also says that we need to set the following HTTP headers:

headers = {
"Authorization" : "JWT <access_token>" ,
"Content-Type": "application/json"
}
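To avoid pasting the token into the header by hand, the dictionary can be built by a function. A minimal sketch, assuming the "JWT " prefix and the two header names shown above; build_headers is a hypothetical name and the token is a placeholder.

```python
def build_headers(access_token):
    """Build the request headers described above for an authenticated call."""
    return {
        # The token replaces the <access_token> placeholder after the "JWT " prefix.
        "Authorization": "JWT " + access_token,
        "Content-Type": "application/json",
    }


headers = build_headers("eyJhxxx")  # placeholder token, not a real one
print(headers["Authorization"])
```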

GET_DATA

Once we have the token, note that there are two kinds of resources we can search:

Host device search: GET /host/search
Web application search: GET /web/search

As an example, we use weblogic, for which a vulnerability was disclosed recently:

https://api.zoomeye.org/web/search?query=weblogic&fact=app,os

Run the query:

import json

import requests

url = "https://api.zoomeye.org/web/search?query=weblogic&fact=app,os"

# Replace <access_token> with the token returned by the login request
headers = {
    "Authorization": "JWT <access_token>",
    "Content-Type": "application/json"
}
r = requests.get(url, headers=headers)
page_content = json.loads(r.content)
print(page_content)

Let's look at the JSON that comes back in page_content:

{
u 'available': 57, u 'matches': [{
u 'description': u 'zcrab.com,alpha matrix,alpha matrix display,display,LCD,LCD display,dot matrix,visual basic,visual basic component,component,ocx,digit,character,source code,application,DisplayedString, alpha matrix display,vionics, LED, digital, instrumentation, software, technology, dial, graduated, microsoft, online, download, version, dial, number, representation, data, development, software, control, instruments, virtual instruments, virtual, GUI, graphic, user interface, interface Component, Components, ActiveX, EXE, DLL, VB, Visual Basic, Add-in, Addin, freeware, shareware,retail, Internet, Graphics, Reporting, VBX, OCX, download, resources,games, controls, In process, Out of Process, Stand alone, email, smtp,database, activex, component, components, internet, graphics, reports,SMTP, FTP, ftp, applications, windows, Windows, Windows 98, Windows 95,32-bit, 16-bit, Visual Basic, utilities, education, libraries, code, compression, source, software, multi-media, active, business, retail, free, trial,programming, audio, InnoVision, innovision, active-x, active x, 0-, -o, -0,animation, browser, bar code, tools, HTML, plug-ins, vbscript, scripting, script, collection, NT, nt,.net,activex,addin,addon,asp,bea,business objects,c#,c++,c++builder,cbd,clx,code,com,com+,com +,component,components,compnnent,component,componentsource,control,controls,corba,custom control,dcom,delphi,development tools,dll,download,ejb,gui,it events,it news,j2ee,java,javabeans,jbuilder,libraries,mts,net,objects,ocx,reusable components,reuse software,saveit,save-it,software,tools,ui,vb,vba,vbx,vcl,visual,visual basic,visual studio,weblogic,weblogic workshop,widgets,windows ce,windowsce,xml,build buy reuse sell components',
u 'language': [u 'PHP'],
u 'title': u 'Zcrab.com - OCX Component - Component software in thailand',
u 'ip': [u '203.151.24.12'],
u 'waf': [],
u 'component': [{
u 'version': u '5.2.17',
u 'name': u 'PHP',
u 'chinese': u 'PHP'
}],
u 'system': [],
u 'site': u 'zcrab.com',
u 'db': [],
u 'headers': u 'HTTP/1.1 200 OK\r\nDate: Mon, 29 Apr 2019 20:30:54 GMT\r\nServer: Apache\r\nX-Powered-By: PHP/5.2.17\r\nConnection: close\r\nTransfer-Encoding: chunked\r\nContent-Type: text/html\r\n',
u 'keywords': u 'zcrab.com,alpha matrix,alpha matrix display,display,LCD,LCD display,dot matrix,visual basic,visual basic component,component,ocx,digit,character,source code,application,DisplayedString, alpha matrix display,vionics, LED, digital, instrumentation, software, technology, dial, graduated, microsoft, online, download, version, dial, number, representation, data, development, software, control, instruments, virtual instruments, virtual, GUI, graphic, user interface, interface Component, Components, ActiveX, EXE, DLL, VB, Visual Basic, Add-in, Addin, freeware, shareware,retail, Internet, Graphics, Reporting, VBX, OCX, download, resources,games, controls, In process, Out of Process, Stand alone, email, smtp,database, activex, component, components, internet, graphics, reports,SMTP, FTP, ftp, applications, windows, Windows, Windows 98, Windows 95,32-bit, 16-bit, Visual Basic, utilities, education, libraries, code, compression, source, software, multi-media, active, business, retail, free, trial,programming, audio, InnoVision, innovision, active-x, active x, 0-, -o, -0,animation, browser, bar code, tools, HTML, plug-ins, vbscript, scripting, script, collection, NT, nt,.net,activex,addin,addon,asp,bea,business objects,c#,c++,c++builder,cbd,clx,code,com,com+,com +,component,components,compnnent,component,componentsource,control,controls,corba,custom control,dcom,delphi,development tools,dll,download,ejb,gui,it events,it news,j2ee,java,javabeans,jbuilder,libraries,mts,net,objects,ocx,reusable components,reuse software,saveit,save-it,software,tools,ui,vb,vba,vbx,vcl,visual,visual basic,visual studio,weblogic,weblogic workshop,widgets,windows ce,windowsce,xml,build buy reuse sell components',
u 'framework': [],
u 'timestamp': u '2019-04-30T04:42:26.614246',
u 'geoinfo': {
u 'city': {
u 'geoname_id': 1609350,
u 'names': {
u 'zh-CN': u '\u66fc\u8c37',
u 'en': u 'Bangkok'
}
},
u 'country': {
u 'geoname_id': 1605651,
u 'code': u 'TH',
u 'names': {
u 'zh-CN': u '\u6cf0\u56fd',
u 'en': u 'Thailand'
}
},
u 'isp': u 'Internet Thailand Company Limited',
u 'continent': {
u 'geoname_id': 6255147,
u 'code': u 'AS',
u 'names': {
u 'zh-CN': u '\u4e9a\u6d32',
u 'en': u 'Asia'
}
},
u 'subdivisions': {
u 'geoname_id': 1609348,
u 'code': u '10',
u 'names': {
u 'zh-CN': u '',
u 'en': u 'Bangkok'
}
},
u 'location': {
u 'lat': 13.75,
u 'lon': 100.5167
},
u 'organization': u 'Internet Thailand Company Limited',
u 'aso': u 'Internet Thailand Company Limited',
u 'asn': 4618
},
u 'webapp': [],
u 'server': [{
u 'version': None,
u 'name': u 'Apache httpd',
u 'chinese': u 'Apache httpd'
}],
u 'domains': []
}, {
u 'description': u '',
......
],
u 'domains': [u 'www.gf-iet.com', u 'down.itsvse.com', u 'www.haocax.com', u 'message', u 'www.v-running.com', u 'tmi.yokogawa.com', u 'www.yibizi.com', u 'blog.ask3.cn', u 'www.51qqt.com', u 'www.xndiguo.com', u 'www.51.la', u 'www.wyzc.com']
}], u 'facets': {}, u 'total': 192

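The response is a dictionary with keys such as available, matches and total. We mainly care about matches: it is a list, and each entry is a dictionary with fields such as geoinfo, language, title and ip. Knowing this, it is clear how to pull out the information we want.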
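As an illustration of that structure, the sketch below picks a few fields out of one entry of matches. The function name summarize_match and the choice of fields are my own; the sample entry is trimmed from the response shown above, and .get is used on the assumption that some fields may be missing or empty.

```python
def summarize_match(match):
    """Pick a few fields out of one entry of page_content['matches']."""
    geoinfo = match.get("geoinfo") or {}
    country = geoinfo.get("country") or {}
    names = country.get("names") or {}
    return {
        "site": match.get("site"),
        "title": match.get("title"),
        # 'ip' is a list in the response above; keep every address.
        "ips": list(match.get("ip") or []),
        "country": names.get("en"),
    }


# Sample entry trimmed from the response shown above.
sample = {
    "site": "zcrab.com",
    "title": "Zcrab.com - OCX Component - Component software in thailand",
    "ip": ["203.151.24.12"],
    "geoinfo": {"country": {"code": "TH", "names": {"en": "Thailand"}}},
}
print(summarize_match(sample))
```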

First, let's see how many entries matches contains, that is, how many results matching our query we actually received:

print(len(page_content['matches']))  # 20

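But more than 20 results are available to us, so why were only 20 returned?
This matches what the ZoomEye web interface shows: the API also returns 20 results per page and starts from page 1 by default, so we need to add a page parameter to the earlier query URL: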

https://api.zoomeye.org/web/search?query=weblogic&page=1
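Rather than concatenating the URL by hand, it can be assembled with the standard library. A minimal sketch: build_search_url and API_BASE are hypothetical names, and only the query and page parameters from the URL above are used.

```python
from urllib.parse import urlencode

API_BASE = "https://api.zoomeye.org"


def build_search_url(resource, query, page=1):
    """Build a search URL for resource 'host' or 'web' with a page number."""
    if resource not in ("host", "web"):
        raise ValueError("resource must be 'host' or 'web'")
    # urlencode escapes characters such as spaces or colons in the query string.
    params = urlencode({"query": query, "page": page})
    return "%s/%s/search?%s" % (API_BASE, resource, params)


print(build_search_url("web", "weblogic", page=1))
```

Letting urlencode do the escaping matters once the query contains anything other than a plain word.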

To retrieve all the results we want, we need to work out the total number of pages.

GET_PAGE

We define a new function, get_page().

First we get the total number of results:

total = int(page_content['total'])
192

Then we work out the number of pages:

total = int(page_content['total'])
page = total // 20  # number of full pages, 20 results each
if total % 20 == 0:
    print(page)
else:
    # a partial last page needs one extra request
    print(page + 1)

With that, the function is done:

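def get_page(page_content):
    """Return the number of pages needed, at 20 results per page."""
    total = int(page_content['total'])
    page = total // 20
    if total % 20 == 0:
        return page
    # a partial last page needs one extra request
    return page + 1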
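The page calculation is a ceiling division, which can also be written without the if branch. A minimal sketch, assuming 20 results per page as observed above; page_count is a hypothetical name, and the loop just shows a few totals, including the 192 from this query.

```python
def page_count(total, per_page=20):
    """Number of pages needed to cover `total` results, per_page results each."""
    if total < 0 or per_page <= 0:
        raise ValueError("total must be >= 0 and per_page must be > 0")
    # Ceiling division using integers only: -(-a // b) == ceil(a / b).
    return -(-total // per_page)


for total in (0, 20, 192, 200, 201):
    print(total, page_count(total))
```

For total = 192 this gives 10 pages: nine full pages of 20 plus one page with the remaining 12 results.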

GET_DATA

With the preparation done, we can pull the information we want out of the returned data. Here I only need the ip; other fields work the same way.

ip = page_content['matches'][0]['ip'][0].encode('utf-8')

Once we have the information, we can write it to a file.
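Putting the pieces together, the sketch below gathers the IPs from every page and writes them out one per line. It is an illustration under stated assumptions, not the code in the repository linked below: collect_ips, save_lines and the output file name are hypothetical, it targets Python 3, and the API call is replaced by a fetch_page callable so the example runs offline with made-up documentation addresses.

```python
import os
import tempfile


def collect_ips(fetch_page, total_pages):
    """Gather every IP from pages 1..total_pages.

    fetch_page(page) must return the decoded JSON dictionary for that page.
    """
    ips = []
    for page in range(1, total_pages + 1):
        content = fetch_page(page)
        for match in content.get("matches", []):
            ips.extend(match.get("ip") or [])
    return ips


def save_lines(lines, path):
    """Write one item per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")


# Stand-in for real API calls: two fake pages with made-up documentation IPs.
fake_pages = {
    1: {"matches": [{"ip": ["192.0.2.1"]}, {"ip": ["192.0.2.2", "192.0.2.3"]}]},
    2: {"matches": [{"ip": ["198.51.100.7"]}]},
}
ips = collect_ips(lambda page: fake_pages[page], total_pages=2)
out_path = os.path.join(tempfile.gettempdir(), "zoomeye_ips.txt")
save_lines(ips, out_path)
print(len(ips), "addresses written to", out_path)
```

Against the real API, fetch_page would be a function that sends the GET request for the given page with the Authorization header and returns the decoded JSON.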

CODE

https://github.com/SherLocZ/zoomeye_api


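Please credit the source when reposting. You are welcome to check the sources cited in this article and to point out any errors or unclear wording, either in the comments section below or by email to sher10cksec@foxmail.com

Article title: Fetching query results with the ZoomEye API

Author: sher10ck

Published: 2019-04-30, 10:11:44

Last updated: 2020-01-13, 13:01:42

Original link: http://sherlocz.github.io/2019/04/30/zoomeye-api/

License: "Attribution-NonCommercial-ShareAlike 4.0". Please keep the original link and author when reposting.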
