Information Gathering Summary

Yatming's Blog, 2025-09-30

External Network Footholds: Information Gathering Summary

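[0] Automated information gathering stage
	1.--Automated collection with tools
[1] Asset discovery stage
	1.--Organization information
	2.--Root domains
	3.--Subdomains
[2] Asset expansion stage
	1.--Ports
	2.--C-segments (/24 ranges)
[3] Asset triage stage
	1.--Liveness checks + fingerprinting
[4] Automated scanning stage
	1.--Vulnerability scanner testing
[5] Focused collection on key targets
	1.--Architecture information
	2.--Source code information
	3.--Basic site information
		(1)----Language
		(2)----Database
		(3)----Web container
		(4)----Operating system
	4.--In-depth site information
		(1)----Front-end source
		(2)----Directories
		(3)----Ports
		(4)----JS endpoints
		(5)----Cached snapshots
		(6)----Plugins
		(7)----Sites on the same server
	5.--Cloud drive information
	6.--Social engineering information
	7.--Mini program information
	8.--App information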

Automated information gathering

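Start by letting automated tools do a quick first pass (with luck, they open a gap right away). While they run, search manually in parallel, then merge and deduplicate the two sets of results.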
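The merge-and-deduplicate step can be sketched roughly as follows. This is a minimal illustration, not part of any tool mentioned here: the function name `merge_unique` and the sample hostnames are made up for the example.

```python
def merge_unique(*sources):
    """Merge several iterables of asset strings, dropping blanks and duplicates.

    Comparison is case-insensitive; the first spelling seen is kept.
    """
    seen = set()
    merged = []
    for source in sources:
        for item in source:
            value = item.strip()
            key = value.lower()
            if value and key not in seen:
                seen.add(key)
                merged.append(value)
    return merged


# Hypothetical sample data standing in for tool output and manual notes
auto_results = ["www.example.com", "api.example.com ", ""]
manual_results = ["API.example.com", "dev.example.com"]
print(merge_unique(auto_results, manual_results))
```

Keeping the original order makes it easier to see which source contributed an entry first.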

ARL (Lighthouse)

Project URL:

https://github.com/ki9mu/ARL-plus-docker

nemo_go

Project URL:

https://github.com/hanc00l/nemo_go

Collecting target company information

Aiqicha: https://www.aiqicha.com/
Xiaolanben: https://www.xiaolanben.com/
Qichacha: https://www.qcc.com/
Tianyancha: https://www.tianyancha.com/
Riskbird: https://www.riskbird.com/

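Why collect the target's organizational structure? Take an HW (HuWang) attack-and-defense exercise as an example and assume the target is a large group. Mapping its structure helps in three ways:

It reveals all of the target's subsidiaries, which yields a more complete set of root domains and therefore a wider attack surface.
It enables a bottom-up approach. If company A is well defended and exposes little, its subsidiaries can serve as the entry point: they are usually less secure than the parent, and some of their assets (office networks, internal business systems, and so on) often share an internal network with the parent. Gaining access to a subsidiary may therefore allow lateral movement across the internal network to the parent, and from there to the whole target.
It also enables a top-down approach. If company A has already been compromised, the structure shows which subsidiaries to go after next. Company A likely holds sensitive subsidiary data and credentials, and the internal networks are probably connected, so lateral movement can extend the result and earn both data points and access points in the exercise scoring.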

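Collection methods

Equity-based collection

Taking Xiaomi as an example, use Aiqicha to view the equity penetration chart and collect subsidiary information (more detailed data requires a paid membership).

You can also search Aiqicha with Xiaomi as the keyword:

All of these are assets that belong to Xiaomi beyond the main Xiaomi company itself, and all of them can be collected as target assets.

This shows where the target sits within the whole group and how it relates to the other companies, and it yields more subsidiary names. Next, manually pick out every subsidiary in which the equity share exceeds 50% and record the names in company.txt. Once that list exists and the target's position in the group is clear, organizational structure collection is essentially done.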

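Key-person collection

You can also start from key people to find more subsidiaries.

For example, start from a key person at Xiaomi and look at the companies he represents:

Or the legal representative:

Well, the legal representative is Lei Jun as well...

The results complement the equity-based method above.

At this point we have the names of all subsidiaries in which the group holds a large equity share. Save them as company.txt; they are used later.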

# company.txt
# Only a few entries are listed here for demonstration

小米科技有限责任公司
小米通讯技术有限公司
......
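The manual filter on equity share can be sketched as below, assuming the holdings have already been copied out of the equity chart as (name, percentage) pairs. The function name, the company names, and the percentages are all placeholders for illustration.

```python
def select_subsidiaries(holdings, min_share=50.0):
    """Return the names whose equity share is strictly above min_share percent."""
    return [name for name, share in holdings if share > min_share]


# Hypothetical holdings transcribed by hand from an equity chart
holdings = [
    ("Example Subsidiary A", 100.0),
    ("Example Subsidiary B", 51.0),
    ("Example Affiliate C", 20.0),
]

# One name per line, the same layout as company.txt
print("\n".join(select_subsidiaries(holdings)))
```

The 50% threshold mirrors the rule stated above; lowering `min_share` widens the list at the cost of including loosely related companies.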

ENScan_GO

This tool is fairly buggy...

Project URL:

https://github.com/wgpsec/ENScan_GO

Usage:

enscan-v2.0.0-windows-amd64.exe -v # use -v to generate the config file

Fill in the generated config.yaml, then run:

enscan-v2.0.0-windows-amd64.exe -n 小米 -invest 33 # outward investments with a 33% equity stake

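Root domain collection

Root domain collection takes the subsidiary names gathered in company.txt and finds the root domains of each company. The steps are linked: the more complete the organizational structure, the more root domains you find, the wider the attack surface, and the better the chance of compromising the target.

Sites such as Aiqicha also list some assets, although those usually overlap in part with the ICP filing records:

ICP filing

icp.py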

import subprocess
import urllib.parse
import os
import time
import json
import random
from datetime import datetime


def get_target_file_input():
    """Prompt for the target list file and return (path, targets)."""
    while True:
        file_path = input("Path to the target list file (one target per line): ").strip()
        if not file_path:
            print("The file path cannot be empty. Please try again.")
            continue
        if not os.path.exists(file_path):
            print(f"File not found: {file_path}. Please try again.")
            continue
        if not os.path.isfile(file_path):
            print(f"{file_path} is not a regular file. Please try again.")
            continue
        with open(file_path, "r", encoding="utf-8") as f:
            targets = [line.strip() for line in f if line.strip()]
        if not targets:
            print(f"No valid targets found in {file_path}. Please check its contents.")
            continue
        print(f"\nLoaded {len(targets)} target(s); they are processed one at a time:")
        for i, target in enumerate(targets, 1):
            print(f"  {i}. {target}")
        return file_path, targets


BYPASS_HEADERS = [
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language: zh-CN,zh;q=0.9",
    "Connection: keep-alive",
    f"X-Forwarded-For: {random.randint(1, 254)}.{random.randint(1, 254)}.{random.randint(1, 254)}.{random.randint(1, 254)}"
]


def build_curl_command(resource_type, search_keyword, base_url="http://127.0.0.1:16181/query"):
    encoded_keyword = urllib.parse.quote(search_keyword, safe='')
    url = f"{base_url}/{resource_type}?search={encoded_keyword}&pageSize=1000"
    headers = BYPASS_HEADERS

    # Build the curl command
    curl_cmd = f'curl "{url}" --max-time 60'
    for header in headers:
        curl_cmd += f' -H "{header}"'

    return curl_cmd


def log_to_file(message, resource_type, target=""):
    """Append a timestamped message to the daily request log."""
    log_dir = "request_logs"
    os.makedirs(log_dir, exist_ok=True)
    log_file = f"{log_dir}/single_requests_{datetime.now().strftime('%Y%m%d')}.log"
    target_prefix = f"[target: {target}] " if target else ""
    with open(log_file, "a", encoding="utf-8") as f:
        f.write(f"[{datetime.now().strftime('%H:%M:%S')}] {target_prefix}{message}\n")


def extract_field(response_content, resource_type, target):
    # The returned status strings are written to the result files in Chinese;
    # chuli_web.py strips them by matching this exact wording.
    if not response_content.strip():
        log_to_file("Empty response body", resource_type, target)
        return [f"【{target}】{resource_type} 无有效数据"]

    try:
        json_data = json.loads(response_content)
        if json_data.get("code") != 200:
            log_to_file(f"Request rejected (code: {json_data.get('code')})", resource_type, target)
            return [f"【{target}】{resource_type} 响应失败（code：{json_data.get('code')}）"]

        list_data = json_data["params"].get("list", [])
        # Field that holds the asset name for each resource type
        field_map = {"web": "domain", "app": "serviceName", "mapp": "serviceName"}
        target_field = field_map[resource_type]
        fields = [item.get(target_field, "").strip() for item in list_data if item.get(target_field, "").strip()]

        if not fields:
            return [f"【{target}】{resource_type} 未找到有效{target_field}"]
        return fields

    except json.JSONDecodeError:
        log_to_file("Failed to parse JSON", resource_type, target)
        return [f"【{target}】{resource_type} 数据格式错误"]
    except Exception as e:
        log_to_file(f"Extraction failed: {str(e)}", resource_type, target)
        return [f"【{target}】{resource_type} 提取异常：{str(e)}"]


def run_single_request(resource_type, target, output_file):
    """Run one query with retries (every wait is a random 100-120 seconds)."""
    max_retries = 2
    retry_count = 0

    while retry_count <= max_retries:
        try:
            curl_cmd = build_curl_command(resource_type, target)
            log_to_file(f"Running request: {curl_cmd}", resource_type, target)
            print(f"\n=== Target [{target}] - resource type [{resource_type}] (retry: {retry_count}) ===")
            print(f"Command: {curl_cmd}")

            result = subprocess.run(
                curl_cmd,
                shell=True,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                text=True,
                encoding="utf-8",
                errors="replace",
                timeout=70  # must exceed curl's own --max-time of 60 seconds
            )

            # WAF detection: the markers match the Chinese verification page
            if "安全验证" in result.stdout or "验证码" in result.stdout:
                log_to_file("Blocked by WAF (captcha detected)", resource_type, target)
                print("⚠️ WAF block detected, preparing to retry...")
                retry_count += 1
                if retry_count <= max_retries:
                    delay = random.randint(100, 120)  # same delay range everywhere
                    print(f"⌛ Waiting {delay} seconds before retrying...")
                    time.sleep(delay)
                continue

            if result.returncode != 0:
                log_to_file(f"Request failed (exit code: {result.returncode})", resource_type, target)
                retry_count += 1
                if retry_count <= max_retries:
                    delay = random.randint(100, 120)  # same delay range everywhere
                    print(f"⚠️ Request failed, {max_retries - retry_count} retries left (waiting {delay} seconds)...")
                    time.sleep(delay)
                continue

            # Save the results; the header stays in Chinese because chuli_web.py matches it
            extracted = extract_field(result.stdout, resource_type, target)
            with open(output_file, "a+", encoding="utf-8") as f:
                f.write(f"\n{'=' * 50}\n")
                f.write(f"【{target}】{resource_type} 结果（{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}）\n")
                f.write(f"{'=' * 50}\n")
                f.write("\n".join(extracted) + "\n")

            print(f"✅ [{target}] - [{resource_type}] results saved ({len(extracted)} entries)")
            return True

        except subprocess.TimeoutExpired:
            log_to_file("Request timed out", resource_type, target)
            retry_count += 1
            if retry_count <= max_retries:
                delay = random.randint(100, 120)  # same delay range everywhere
                print(f"⚠️ Request timed out, {max_retries - retry_count} retries left (waiting {delay} seconds)...")
                time.sleep(delay)
            continue
        except Exception as e:
            log_to_file(f"Request error: {str(e)}", resource_type, target)
            retry_count += 1
            if retry_count <= max_retries:
                delay = random.randint(100, 120)  # same delay range everywhere
                print(f"⚠️ Request error, {max_retries - retry_count} retries left (waiting {delay} seconds)...")
                time.sleep(delay)
            continue

    print(f"❌ [{target}] - [{resource_type}] request failed (maximum retries reached)")
    return False


def main():
    resource_config = [("web", "web_results.txt"), ("app", "app_results.txt"), ("mapp", "mapp_results.txt")]

    try:
        file_path, targets = get_target_file_input()
        print("\nConfig: wait 100-120 seconds after each request\n")

        # Process each target and each resource type in turn
        for target_idx, target in enumerate(targets, 1):
            print(f"📌 Processing target {target_idx}/{len(targets)}: {target}")

            for res_idx, (resource_type, output_file) in enumerate(resource_config):
                run_single_request(resource_type, target, output_file)

                # Delay between resource types (100-120 seconds)
                if res_idx < len(resource_config) - 1:
                    delay = random.randint(100, 120)
                    print(f"\n⌛ Waiting {delay} seconds before the next resource type...")
                    time.sleep(delay)

            # Delay between targets (100-120 seconds)
            if target_idx < len(targets):
                delay = random.randint(100, 120)
                print(f"\n⌛ Waiting {delay} seconds before the next target...\n" + "-" * 80)
                time.sleep(delay)

        print("\n🎉 All targets processed.")
        print("📁 Result files:")
        for _, output_file in resource_config:
            print(f"   - {os.path.abspath(output_file)}")
        print(f"📊 Log directory: {os.path.abspath('request_logs')}")

    except Exception as e:
        print(f"\n❌ Aborted with an error: {str(e)}")
        exit(1)


if __name__ == "__main__":
    main()

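Change the address in the script to match your setup: it is the base_url default of build_curl_command, at around line 50 (the red box in the original screenshot).

It must point to the IP and port on which your ICP query service is running. Project URL:

https://github.com/HG-ha/ICP_Query/releases

Or install it with Docker:

docker run -d -p 16181:16181 yiminger/ymicp

Running the Python script above produces three files:

web_results.txt		# domains from the ICP query
mapp_results.txt	# mini programs from the ICP query
app_results.txt		# apps from the ICP query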

The results look similar to the figure below:

These results still need post-processing, using this script:

chuli_web.py

import re
import sys


def extract_pure_domains_ips(text):
    separator = r'[\r\n\s]+={50,}[\r\n\s]+'
    blocks = re.split(separator, text.strip())

    all_domains = set()
    all_ips = set()
    for block in blocks:
        block = block.strip()
        if not block:
            continue
        clean_block = re.sub(
            r'【[^】]+】web (结果（[^）]+）|未找到有效domain|响应失败（[^）]+）)',
            '',
            block
        ).strip()
        if not clean_block:
            continue
        ip_pattern = re.compile(
            r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b')
        domain_pattern = re.compile(r'\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}\b', re.IGNORECASE)
        ips = ip_pattern.findall(clean_block)
        if ips:
            all_ips.update(ips)

        domains = domain_pattern.findall(clean_block)
        pure_domains = [d for d in domains if not ip_pattern.match(d)]
        if pure_domains:
            all_domains.update(pure_domains)
    return sorted(all_domains), sorted(all_ips)


def main():
    if len(sys.argv) != 2:
        print("Usage: python chuli_web.py <input file path>")
        sys.exit(1)

    input_path = sys.argv[1]
    try:
        # Read as bytes and ignore undecodable characters
        with open(input_path, 'rb') as f:
            text = f.read().decode('utf-8', errors='ignore').strip()
        domains, ips = extract_pure_domains_ips(text)
        print("=" * 50)
        print("Domains ({} total)".format(len(domains)))
        print("=" * 50)
        for domain in domains:
            print(domain)

        print("\n" + "=" * 50)
        print("IPs ({} total)".format(len(ips)))
        print("=" * 50)
        for ip in ips:
            print(ip)
        with open('domain.txt', 'w', encoding='utf-8') as f:
            f.write('\n'.join(domains))
        with open('ip.txt', 'w', encoding='utf-8') as f:
            f.write('\n'.join(ips))

        print("\n✅ Extraction complete.")
        print(f"Domains saved to domain.txt ({len(domains)} entries)")
        print(f"IPs saved to ip.txt ({len(ips)} entries)")

    except FileNotFoundError:
        print(f"❌ Error: file not found: {input_path}")
    except Exception as e:
        print(f"❌ Processing failed: {str(e)}")


if __name__ == "__main__":
    main()

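The results are saved to domain.txt and ip.txt. The mini program and app results can be handled with a similar script, but you can also leave them unprocessed, since mini program and app assets can only be tested by hand anyway.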

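Subdomain collection

Cyberspace mapping engines

FOFA

domain="root domain" || cert="common name" || cert="organization name 1" || cert="organization name 2" || cert="organization name 3" ......

I usually query FOFA, but other cyberspace search engines work as well.

domain searches by domain name and finds both web and non-web assets; cert searches by certificate and finds only web assets. Joining the two with "||" returns more subdomain assets.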

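Combined syntax

Using Huawei as an example: huaweiyun.com

A domain query alone returns 10 results. What happens when the cert syntax is added?

domain="huaweiyun.com" || cert="huaweiyun.com"

That adds 12 assets, roughly doubling the count.

So far we have only used domain="root domain" || cert="common name"; other common names and additional organization names have not been considered yet.

Note: the common name is usually the site's root domain or one of its subdomains, while assets under a single root domain may carry several organization names, so find as many of both as possible.

How do you find common names and organization names? My approach is to look through the certificates of HTTPS sites by hand.

For example, open the certificate details of a few HTTPS sites:

huaweicloud.com
Huawei Software Technologies Co., Ltd.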
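Reading the common name and organization name out of a certificate can also be scripted with Python's standard `ssl` module. The sketch below is only an illustration: `subject_names` and `fetch_cert` are names chosen for this example, and the sample dictionary imitates the structure that `getpeercert()` returns rather than a real certificate.

```python
import socket
import ssl


def subject_names(cert):
    """Extract (common_name, organization) from a dict shaped like getpeercert()."""
    fields = {}
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            fields.setdefault(key, value)
    return fields.get("commonName"), fields.get("organizationName")


def fetch_cert(host, port=443, timeout=10):
    """Connect over TLS and return the validated peer certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


# Hypothetical certificate subject in the getpeercert() layout
sample = {
    "subject": (
        (("countryName", "CN"),),
        (("organizationName", "Example Co., Ltd."),),
        (("commonName", "example.com"),),
    )
}
print(subject_names(sample))
```

For a live site, `subject_names(fetch_cert("example.com"))` would return the same pair; certificates that fail validation raise an error in this sketch instead of being parsed.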

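Add these to the FOFA cert syntax and see what happens:

domain="huaweiyun.com" || cert="huaweicloud.com" || cert="Huawei Software Technologies Co., Ltd."

The count jumps straight to 10 million. That is certainly thorough, but so many assets are impractical to test. The point is only that cert queries can surface more assets and widen the attack surface; in practice, adapt to the situation. A giant enterprise like Huawei already has a huge number of assets, so certificate queries return too much to be useful. They fit better when the target has no more than a few thousand assets: small and medium-sized enterprises, party and government organizations, universities, and so on.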
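The queries in this section all share one shape, so assembling them can be sketched as a small helper. This is an assumption-laden illustration: `build_fofa_query` is a made-up name, and the values are placeholders; only the `domain="..."`, `cert="..."` and `||` syntax comes from the examples above.

```python
def build_fofa_query(domains, cert_values):
    """Join domain="..." and cert="..." clauses with ' || '.

    cert_values may mix common names and organization names.
    """
    clauses = [f'domain="{d}"' for d in domains]
    clauses += [f'cert="{c}"' for c in cert_values]
    return " || ".join(clauses)


# Placeholder values for illustration
print(build_fofa_query(["example.com"], ["example.com", "Example Co., Ltd."]))
```

Passing fewer organization names is the simplest way to keep the result count manageable for large targets.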

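Batch queries

The ICP step above produced many root domains; now search for them in FOFA:

I wrote a script that reads domain.txt and joins the entries into the form domain="root domain 1" || cert="common name 1" || domain="root domain 2" || cert="common name 2" || domain="root domain 3" || cert="common name 3" || ...... which can then be pasted into FOFA.

Why join them this way? Because the common name is usually the root domain or one of its subdomains, so joining with || finds more assets.

Script: join.py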

# Read domain.txt and join its entries into a FOFA query
def read_and_concatenate(file_path):
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            # Strip whitespace and newlines from each line; skip blank lines
            lines = [line.strip() for line in file if line.strip()]

        # One domain/cert clause pair per line, all joined with ' || '
        result = ' || '.join([f'domain="{line}" || cert="{line}"' for line in lines])

        print(result)
    except FileNotFoundError:
        print(f"File {file_path} not found.")
    except Exception as e:
        print(f"An error occurred: {e}")

# Call the function with the input file path
read_and_concatenate('domain.txt')

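Paste the query generated above into the FOFA query tool and run the search. I usually run two searches: one for results from the past year and one for results older than a year. With an account you can export everything; without one, a purchased key can only retrieve 10,000 records.

In the data exported from FOFA, the IPs need no processing beyond deduplication in a spreadsheet. The domains need to go through a script:

format.py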

import re
import chardet


# Detect the file encoding automatically
def detect_encoding(filename):
    with open(filename, 'rb') as file:
        result = chardet.detect(file.read())
    return result['encoding']


# Read the input file and split it into subdomains and URLs
def process_file(input_file, subdomain_file, url_file):
    encoding = detect_encoding(input_file)

    # Read every line of the input file
    with open(input_file, 'r', encoding=encoding) as file:
        lines = file.readlines()

    # Two lists hold the two kinds of output
    subdomains = []
    urls = []

    # URLs that point at 127.0.0.1 are kept as-is
    specific_ip_url_pattern = re.compile(r'http[s]?://127\.0\.0\.1(:\d+)?/?')

    for line in lines:
        # Strip surrounding whitespace and newlines
        line = line.strip()

        # Skip blank lines so they do not become empty entries
        if not line:
            continue

        if specific_ip_url_pattern.match(line):
            # A 127.0.0.1 URL goes straight into the URL list
            urls.append(line)
        else:
            # Skip any other URL that is based on an IP address
            if re.match(r'http[s]?://\d{1,3}(\.\d{1,3}){3}(:\d+)?/?', line):
                continue

            if line.startswith('http://'):
                subdomains.append(line[7:])  # keep what follows 'http://'
                urls.append(line)  # keep unchanged
            elif line.startswith('https://'):
                subdomains.append(line[8:])  # keep what follows 'https://'
                urls.append(line)  # keep unchanged
            else:
                subdomains.append(line)  # no protocol prefix, use as-is
                urls.append('http://' + line)  # add the http:// form
                urls.append('https://' + line)  # add the https:// form

    # Write the processed data to the output files
    with open(subdomain_file, 'w', encoding=encoding) as file:
        for subdomain in subdomains:
            file.write(subdomain + '\n')

    with open(url_file, 'w', encoding=encoding) as file:
        for url in urls:
            file.write(url + '\n')


# Run the processing
process_file('input.txt', 'subdomain.txt', 'url.txt')

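One more thing: some sites are reachable only over HTTPS, some only over HTTP, and some over both. Requesting an HTTPS site over HTTP can sometimes even bypass the WAF. So when generating url.txt, it is best to add both the http and https forms of every domain for fuller coverage. Adjust as needed: to keep only http or only https, comment out the corresponding line in the script.

At this point, domain.txt has given us subdomain-1.txt, url-1.txt, and ip-1.txt. Save them for later; they will be merged and deduplicated.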

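Certificate search by English name

Besides FOFA's certificate search, you can query crt.sh for assets whose certificates carry English names, which also uncovers many hidden assets. The two methods complement each other, and crt.sh yields many common names and organization names that can be spliced by hand into the query syntax above.

cert_sh.py

This script starts from the domains found through ICP at the beginning.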

import re
import requests


def extract_unique_common_names(url, target, result_file):
    try:
        # Custom User-Agent header
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
        }

        # Fetch the page; the timeout keeps one slow query from hanging the batch
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # raise on HTTP errors
        html_content = response.text

        # Extract <TD> cell contents that end in .cn, .com, .org, or .net
        common_names = re.findall(r'<TD.*?>([^<]*\.(?:cn|com|org|net))</TD>', html_content, re.S)

        # Deduplicate
        unique_common_names = set(common_names)

        # Print the extracted common names to the console
        if unique_common_names:
            print(f"\n===== Results for {target} =====")
            print("Extracted Unique Common Names:")
            for name in unique_common_names:
                print(name.strip())  # strip surrounding whitespace

            # Write only the domains to the file, without any status text
            with open(result_file, "a", encoding="utf-8") as f:
                for name in unique_common_names:
                    f.write(f"{name.strip()}\n")
        else:
            print(f"\n===== Results for {target} =====")
            print("No Common Names found.")
    except requests.RequestException as e:
        print(f"\n===== Query for {target} failed =====")
        print(f"Error fetching URL: {e}")


def batch_query_from_file():
    # Name of the results file
    result_file = "domain_results.txt"

    # Clear the results file from any previous run
    with open(result_file, "w", encoding="utf-8") as f:
        pass  # opening in write mode truncates the file

    # Read the targets from domain.txt
    try:
        with open("domain.txt", "r", encoding="utf-8") as file:
            # Read all lines, dropping blank lines and whitespace
            targets = [line.strip() for line in file if line.strip()]

            if not targets:
                print("domain.txt contains no valid domains")
                return

            # Query each target in turn
            for target in targets:
                # Build the query URL
                url = f"https://crt.sh/?q={target}"
                # Extract the names and append them to the results file
                extract_unique_common_names(url, target, result_file)

    except FileNotFoundError:
        print("domain.txt not found; make sure it exists in the current directory")
    except Exception as e:
        print(f"Error while processing the file: {e}")


# Run the batch query
batch_query_from_file()
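As an alternative to scraping the HTML table, crt.sh can return JSON when `output=json` is added to the query URL. The sketch below parses such a response; it assumes each entry carries `common_name` and `name_value` fields, and the sample payload is invented for illustration. `names_from_crtsh_json` is a name chosen for this example.

```python
import json


def names_from_crtsh_json(payload):
    """Collect unique hostnames from a crt.sh-style JSON response body."""
    names = set()
    for entry in json.loads(payload):
        # name_value may hold several names separated by newlines
        candidates = (entry.get("name_value") or "").split("\n")
        candidates.append(entry.get("common_name") or "")
        for name in candidates:
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            if name and " " not in name:
                names.add(name)
    return sorted(names)


# Hypothetical payload shaped like a crt.sh JSON response
sample = json.dumps([
    {"common_name": "example.com", "name_value": "example.com\n*.example.com"},
    {"common_name": "api.example.com", "name_value": "api.example.com"},
])
print(names_from_crtsh_json(sample))
```

Parsing JSON avoids the TLD whitelist in the regular expression above, so names under other suffixes are kept as well.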

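Format handling works the same way: copy these domains and run them through the format.py script from earlier to produce URL assets with a protocol prefix and subdomain assets without one.

That gives us subdomain-2.txt and url-2.txt.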

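oneforall

python oneforall.py --target huaweiyun.com run
python oneforall.py --targets domain.txt run

Format handling: huaweiyun.com is used here as a single-target demonstration. In practice the subdomain search runs over every root domain in domain.txt, though you can trim the list if it is too long.

OneForAll output is already well structured and needs no further processing.

That gives us subdomain-3.txt, url-3.txt, and ip-2.txt from domain.txt.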
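Merging the numbered subdomain files into one subdomain.txt might look like the sketch below. It is a rough illustration under assumptions: the helper names are invented, and the normalization (lowercasing, dropping protocol, port, path, and trailing dot) is one reasonable choice rather than the only one.

```python
def normalize_host(line):
    """Reduce a line to a bare lowercase hostname (IPv4 and names only)."""
    host = line.strip().lower()
    if "://" in host:
        host = host.split("://", 1)[1]  # drop the protocol prefix
    host = host.split("/", 1)[0]        # drop any path
    host = host.split(":", 1)[0]        # drop any port
    return host.rstrip(".")


def merge_hosts(*host_lists):
    """Return the sorted, deduplicated hostnames from several lists."""
    merged = {normalize_host(line) for lines in host_lists for line in lines}
    merged.discard("")
    return sorted(merged)


def merge_host_files(paths, out_path):
    """Merge files such as subdomain-1.txt, subdomain-2.txt into out_path."""
    lists = []
    for path in paths:
        with open(path, "r", encoding="utf-8") as handle:
            lists.append(handle.read().splitlines())
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(merge_hosts(*lists)) + "\n")


# In-memory demonstration with placeholder hostnames
print(merge_hosts(["a.example.com", "http://B.example.com:8080/x"], ["b.example.com.", ""]))
```

A call such as `merge_host_files(["subdomain-1.txt", "subdomain-2.txt", "subdomain-3.txt"], "subdomain.txt")` would produce the combined list used in the following sections.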

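Port collection

We now have many root domains and subdomains, a large initial asset set. But a server can expose up to 65,535 ports, and targets on a tight budget often open several ports on one server and host multiple web assets there. Port collection therefore matters a great deal: it widens the attack surface further and reveals more hidden assets. So we run a full port scan against the ip.txt compiled earlier, which is exactly why the files were kept tidy.

We use fscan for port scanning; it also performs basic vulnerability scanning and weak-password brute forcing.

fscan.exe -hf ip.txt -t 3000 -p 1-65535 -num 100 -np -o result.txt

The scan takes a long time, so it is best run in the background on an overseas VPS.

fscan's output is messy and hard to organize, so the fscanOutput.py script can be used to sort and categorize the scan results.

python fscanOutput.py result.txt

Tools

fscan: https://github.com/shadow1ng/fscan
fscanOutput: https://github.com/ZororoZ/fscanOutput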

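C-segment collection

Do not forget C-segments (/24 ranges). Many large targets buy IP addresses a whole C-segment at a time; for example, all of 220.111.222.1/24 might belong to one large target. Some IP assets are also hard to find with the usual methods, so hidden assets get missed. An IP asset may have neither a certificate nor a domain name, for instance. In that case, simply scanning a few of the target's C-segments to see which IPs are alive and which ports they expose can reveal more hidden IP assets and more potential attack surface. For a small target, however, C-segment collection is not recommended: few IPs in the same C-segment belong to the target, so it is easy to hit something unrelated.

So how do you collect the target's C-segments?

Eeyes is recommended: it derives the target's C-segments from a list of its domains, which again uses the subdomain.txt compiled earlier.

The tool first excludes assets behind a CDN and then compiles the C-segments.

Eeyes -l subdomain.txt

From the output, manually choose a few C-segments with many live IPs and save them as c.txt.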
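The idea of ranking C-segments by how many known IPs fall into each can be sketched with the standard `ipaddress` module. This is not how Eeyes works internally; it skips the CDN filtering entirely, and `count_c_segments` and the sample addresses (documentation ranges) are assumptions made for the example.

```python
import ipaddress
from collections import Counter


def count_c_segments(ips):
    """Count IPv4 addresses per /24 network, most populated first."""
    counts = Counter()
    for raw in ips:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # skip lines that are not IP addresses
        if addr.version == 4:
            network = ipaddress.ip_network(f"{addr}/24", strict=False)
            counts[str(network)] += 1
    return counts.most_common()


# Placeholder addresses from the documentation ranges
sample_ips = ["192.0.2.1", "192.0.2.9", "198.51.100.7", "not-an-ip"]
for network, hits in count_c_segments(sample_ips):
    print(network, hits)
```

Segments with only one or two hits are the ones most likely to belong to someone else, which matches the caution about small targets above.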

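What next with these C-segments? A port scan, of course: merge them with the ip.txt above into ip_c.txt and hand that to fscan for a full port scan (this step can be skipped by merging straight into ip_c_subdomain.txt).

One last point: some assets are strictly bound to a domain name. For example, the domain is test.com:8080 and its IP is 111.222.333.444:8080, yet 111.222.333.444:8080 cannot be accessed directly. To be safe, add subdomain.txt to the list as well and run the full port scan on everything.

fscan.exe -hf ip_c_subdomain.txt -t 3000 -p 1-65535 -num 100 -np -o result.txt

Once fscan finishes, it may have uncovered more web assets. Merge them into url.txt, which makes url.txt more complete.

Tools

Eeyes: https://github.com/EdgeSecurityTeam/Eeyes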

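Host collision

Since the goal is to "dig out more hidden assets", host collision is a must. We already have subdomain.txt and ip_c.txt. Resolve every domain in subdomain.txt in bulk and record the ones that do not resolve normally; these form the hostname dictionary for the collision. ip_c.txt is usually not very large, so it can be used in full, or a few entries can be picked by hand.

Script: domain_auth.py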

import socket

def check_domain_resolution(domain):
    try:
        # Try to resolve the domain to an IP address
        socket.gethostbyname(domain)
        return True
    except socket.gaierror:
        # A resolution error means the domain cannot be resolved
        return False

def main():
    # Input: subdomain list; output: domains that fail to resolve
    input_file = 'subdomain.txt'
    output_file = 'result.txt'

    with open(input_file, 'r', encoding='utf-8') as file:
        domains = [line.strip() for line in file if line.strip()]

    # Check whether each domain resolves
    unresolved_domains = []
    for domain in domains:
        if not check_domain_resolution(domain):
            print(f"Unresolvable domain: {domain}")
            unresolved_domains.append(domain)

    # Write the unresolvable domains to result.txt
    if unresolved_domains:
        with open(output_file, 'w', encoding='utf-8') as result_file:
            for domain in unresolved_domains:
                result_file.write(f"{domain}\n")
        print(f"Unresolvable domains written to {output_file}")
    else:
        print("All domains resolve")

if __name__ == '__main__':
    main()
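Host collision tools then try each unresolved hostname against each candidate IP. How quickly that work grows can be seen from a small sketch that only enumerates the pairs; `collision_pairs` and the sample values are hypothetical, and the actual probing is left to the tools listed below.

```python
from itertools import product


def collision_pairs(ips, hosts):
    """Return every (ip, hostname) combination, deduplicated and sorted."""
    return list(product(sorted(set(ips)), sorted(set(hosts))))


# Placeholder inputs: two candidate IPs and two unresolved hostnames
ips = ["192.0.2.10", "192.0.2.11"]
hosts = ["intranet.example.com", "oa.example.com"]
pairs = collision_pairs(ips, hosts)
print(len(pairs), "pairs to test")
```

Because the pair count is the product of the two list sizes, trimming either list by hand, as suggested above, shortens the run considerably.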

After host collision, with luck a gap opens right away, and sensitive or fragile assets may turn up.

Tools

HostCollision: https://github.com/pmiaowu/HostCollision
Hosts_scan: https://github.com/fofapro/Hosts_scan

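Liveness checks + fingerprinting

We now have the consolidated url.txt. For completeness, add the ip.txt compiled earlier as well, after first giving each IP an http and https protocol prefix.

ipadd.py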

# Open the input file
with open("ip.txt", "r") as input_file:
    # Read the content and split it into lines
    ip_addresses = input_file.read().splitlines()

# Open the output file
with open("output.txt", "w") as output_file:
    # Go through each IP address
    for ip in ip_addresses:
        # Skip blank lines
        if not ip.strip():
            continue
        # Add the http:// prefix
        http_ip = "http://" + ip
        # Add the https:// prefix
        https_ip = "https://" + ip
        # Write both forms to the output file
        output_file.write(http_ip + "\n")
        output_file.write(https_ip + "\n")

Then deduplicate url.txt again, with these entries added, to get web.txt: the final list of web assets after several rounds of consolidation.
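Plain text deduplication misses URLs that differ only in case, a trailing slash, or an explicit default port. A possible normalization step is sketched below; `canonical_url` and `dedupe_urls` are names invented for this example, and the sketch deliberately drops query strings and does not handle IPv6 literals.

```python
from urllib.parse import urlsplit


def canonical_url(url):
    """Lowercase scheme and host, drop default ports and trailing slashes."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    if parts.port in (None, default_port):
        netloc = host
    else:
        netloc = f"{host}:{parts.port}"
    return f"{parts.scheme}://{netloc}{parts.path.rstrip('/')}"


def dedupe_urls(urls):
    """Return canonical URLs in first-seen order without duplicates."""
    seen = set()
    unique = []
    for url in urls:
        if not url.strip():
            continue
        canonical = canonical_url(url)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique


# Placeholder URLs for illustration
print(dedupe_urls(["HTTP://Example.com:80/", "http://example.com", "https://example.com:8443/admin/"]))
```

http and https forms of the same host stay separate on purpose, since both are wanted in web.txt.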

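What next? Not all of these web assets are reachable or alive, so a liveness check is needed. A fingerprinting tool performs liveness detection as a side effect, and the fingerprints themselves help a lot in finding an entry point quickly: with them we can pick out, from a large unsorted pile, the CMS or OA systems with many known historical vulnerabilities and look at those first, which may quickly turn up an N-day.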
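Putting the interesting fingerprints at the top of the list can be sketched as a stable sort. Everything here is hypothetical: the (url, fingerprint) record layout, the keyword list, and the product labels are stand-ins, since the actual output format depends on the fingerprinting tool used.

```python
def prioritize(records, keywords):
    """Move records whose fingerprint contains any keyword to the front.

    records is a list of (url, fingerprint) tuples; order is otherwise kept.
    """
    lowered = [keyword.lower() for keyword in keywords]

    def rank(record):
        fingerprint = record[1].lower()
        return 0 if any(keyword in fingerprint for keyword in lowered) else 1

    return sorted(records, key=rank)


# Hypothetical fingerprint results and keywords
records = [
    ("http://a.example.com", "nginx"),
    ("http://b.example.com", "ExampleOA 8.0"),
    ("http://c.example.com", "ExampleCMS"),
]
for url, fingerprint in prioritize(records, ["exampleoa", "examplecms"]):
    print(url, fingerprint)
```

Because Python's sort is stable, assets with the same priority keep their original order.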

Use TideFinger or Ehole for fingerprinting.

TideFinger -uf web.txt -nobr -nopoc
TideFinger -uf web.txt

ehole.exe finger -l web.txt

Tools

TideFinger: https://github.com/TideSec/TideFinger_Go
Ehole: https://github.com/EdgeSecurityTeam/EHole

JS

My first recommendation is Rotor-Goddess (转子女神).

https://github.com/Snow-Mountain-Passengers/Rotor-Goddess/

Cloud drive information

Sometimes cloud drives leak sensitive information too...

Here I also count GitHub as a "cloud drive".

GitHub syntax
Use GitHub search syntax to find sensitive information

in:name test               # keyword in the repository name
in:description test        # keyword in the repository description
in:readme test             # keyword in the README file
stars:>3000 test           # keyword, more than 3000 stars
stars:1000..3000 test      # keyword, between 1000 and 3000 stars
forks:>1000 test           # keyword, more than 1000 forks
forks:1000..3000 test      # keyword, between 1000 and 3000 forks
size:>=5000 test           # keyword, repository size of at least 5000 KB (about 5 MB)
pushed:>2019-02-12 test    # keyword, last pushed after 2019-02-12
created:>2019-02-12 test   # keyword, created after 2019-02-12
user:test                  # search by username
license:apache-2.0 test    # keyword, repositories with the given license
language:java test         # keyword in Java code
user:test in:name test     # combined: repositories of user test whose name contains test

site:Github.com smtp
site:Github.com smtp @qq.com
site:Github.com smtp @126.com
site:Github.com smtp @163.com
site:Github.com smtp @sina.com.cn
site:Github.com smtp password
site:Github.com String password smtp
site:Github.com smtp @baidu.com

site:Github.com sa password
site:Github.com root password
site:Github.com User ID='sa';Password
site:Github.com inurl:sql

site:Github.com svn
site:Github.com svn username
site:Github.com svn password
site:Github.com svn username password

site:Github.com password
site:Github.com ftp ftppassword
site:Github.com 密码    # "password" in Chinese
site:Github.com 内部    # "internal" in Chinese