diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 00000000..9ab3c9b8
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,6 @@
+# https://docs.github.com/en/github/building-a-strong-community/configuring-issue-templates-for-your-repository#configuring-the-template-chooser
+blank_issues_enabled: false # We have a blank template which assigns labels
+contact_links:
+  - name: Questions about using feapder?
+    url: "https://github.com/Boris-code/feapder/discussions"
+    about: Please see our guide on how to ask questions
\ No newline at end of file
diff --git a/.github/workflows/workflow.yml b/.github/workflows/workflow.yml
new file mode 100644
index 00000000..e69de29b
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 00000000..63d42cb0
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,15 @@
+# 贡献指南
+感谢你的宝贵时间。你的贡献将使这个项目变得更好!在提交贡献之前,请务必花点时间阅读下面的入门指南。
+
+## 提交 Pull Request
+1. Fork [此仓库](https://github.com/Boris-code/feapder.git)。
+2. clone到本地,从 `develop` 创建分支,对代码进行更改。
+3. 请确保进行了相应的测试。
+4. 推送代码到自己Fork的仓库中。
+5. 在Fork的仓库中点击 Pull request 链接。
+6. 点击「New pull request」按钮。
+7. 填写提交说明后,点击「Create pull request」,提交到`develop`分支。
+
+## License
+
+[MIT](./LICENSE)
diff --git a/README.md b/README.md
index 2ce95aec..7bde6250 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,8 @@
读音: `[ˈfiːpdə]`
-
+
+
## 文档地址
@@ -35,23 +36,30 @@
From PyPi:
-通用版
+精简版
+
+```shell
+pip install feapder
+```
+浏览器渲染版:
```shell
-pip3 install feapder
+pip install "feapder[render]"
```
完整版:
```shell
-pip3 install feapder[all]
+pip install "feapder[all]"
```
-通用版与完整版区别:
+三个版本区别:
-1. 完整版支持基于内存去重
+1. 精简版:不支持浏览器渲染、不支持基于内存去重、不支持入库mongo
+2. 浏览器渲染版:不支持基于内存去重、不支持入库mongo
+3. 完整版:支持所有功能
-完整版可能会安装出错,若安装出错,请参考[安装问题](question/安装问题)
+完整版可能会安装出错,若安装出错,请参考[安装问题](docs/question/安装问题.md)
## 小试一下
@@ -99,13 +107,56 @@ FirstSpider|2021-02-09 14:55:14,620|air_spider.py|run|line:80|INFO| 无任务,
1. start_requests: 生产任务
2. parse: 解析数据
+
+## 感谢以下代理赞助商
+
+### Rapidproxy代理
+
+
+
+
+
+
+
+
+
+### SWIFTPROXY
+
+
+
+
+
+
+
+
+
+### NovProxy
+
+
+
+
+
+
+
+
+
+
+## 参与贡献
+
+贡献之前请先阅读 [贡献指南](./CONTRIBUTING.md)
+
+感谢所有做过贡献的人!
+
+
+
+
+
## 爬虫工具推荐
1. 爬虫在线工具库:http://www.spidertools.cn
2. 爬虫管理系统:http://feapder.com/#/feapder_platform/feaplat
3. 验证码识别库:https://github.com/sml2h3/ddddocr
-
## 微信赞赏
如果您觉得这个项目帮助到了您,您可以帮作者买一杯咖啡表示鼓励 🍹
@@ -121,16 +172,16 @@ FirstSpider|2021-02-09 14:55:14,620|air_spider.py|run|line:80|INFO| 无任务,
| 知识星球:17321694 |
作者微信: boris_tm |
- QQ群号:485067374 |
+ QQ群号:521494615 |
|
|
- |
+ |
- 加好友备注:feapder
\ No newline at end of file
+ 加好友备注:feapder
diff --git a/docs/README.md b/docs/README.md
index b9a814d3..08ccb6aa 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -16,7 +16,7 @@
读音: `[ˈfiːpdə]`
-
+
## 文档地址
@@ -35,21 +35,29 @@
From PyPi:
-通用版
+精简版
```shell
-pip3 install feapder
+pip install feapder
+```
+
+浏览器渲染版:
+```shell
+pip install "feapder[render]"
```
完整版:
```shell
-pip3 install feapder[all]
+pip install "feapder[all]"
```
-通用版与完整版区别:
+三个版本区别:
+
+1. 精简版:不支持浏览器渲染、不支持基于内存去重、不支持入库mongo
+2. 浏览器渲染版:不支持基于内存去重、不支持入库mongo
+3. 完整版:支持所有功能
-1. 完整版支持基于内存去重
完整版可能会安装出错,若安装出错,请参考[安装问题](question/安装问题)
@@ -78,7 +86,7 @@ class FirstSpider(feapder.AirSpider):
if __name__ == "__main__":
FirstSpider().start()
-
+
```
直接运行,打印如下:
@@ -107,30 +115,30 @@ FirstSpider|2021-02-09 14:55:14,620|air_spider.py|run|line:80|INFO| 无任务,
3. 验证码识别库:https://github.com/sml2h3/ddddocr
-## 微信赞赏
+
## 学习交流
-
-
- | 知识星球:17321694 |
- 作者微信: boris_tm |
- QQ群号:485067374 |
-
-
+
+
+ | 知识星球:17321694 |
+ 作者微信: boris_tm |
+ QQ群号:521494615 |
+
+
- |
- |
- |
-
-
+
+ |
+ |
+
+
加好友备注:feapder
\ No newline at end of file
diff --git a/docs/_sidebar.md b/docs/_sidebar.md
index ef55dce7..bef51b37 100644
--- a/docs/_sidebar.md
+++ b/docs/_sidebar.md
@@ -38,6 +38,7 @@
* [海量数据去重-dedup](source_code/dedup.md)
* [报警及监控](source_code/报警及监控.md)
* [监控打点](source_code/监控打点.md)
+ * [自定义下载器](source_code/custom_downloader.md)
* 爬虫管理系统
* [简介及部署](feapder_platform/feaplat.md)
diff --git a/docs/feapder_platform/feaplat.md b/docs/feapder_platform/feaplat.md
index d69476e2..405f3e0c 100644
--- a/docs/feapder_platform/feaplat.md
+++ b/docs/feapder_platform/feaplat.md
@@ -26,6 +26,8 @@
## 功能概览
+暂时不支持 苹果电脑的Apple芯片
+
### 1. 项目管理
添加/编辑项目
@@ -95,11 +97,16 @@ worker节点根据任务动态生成,一个worker只运行一个任务实例
## 部署
-> 下面部署以centos为例, 其他平台docker安装方式可参考docker官方文档:https://docs.docker.com/compose/install/
+> 安装方式参考docker官方文档:https://docs.docker.com/compose/install/
### 1. 安装docker
-删除旧版本(可选,需要重装升级时执行)
+#### 1.1 centos系统
+
+> docker --version
+> 作者的docker版本为 20.10.12,低于此版本的可能会存在问题
+
+删除旧版本(可选,需要重装升级docker时执行)
```shell
yum remove docker docker-common docker-selinux docker-engine
@@ -118,14 +125,69 @@ yum install -y yum-utils device-mapper-persistent-data lvm2 && python2 /usr/bin/
curl -sSL https://get.daocloud.io/docker | sh
```
+启动docker服务
-
-启动
```shell
systemctl enable docker
systemctl start docker
```
+验证: 打开终端,输入
+
+```shell
+docker ps
+```
+
+#### 1.2 ubuntu系统
+
+```shell
+sudo apt update
+sudo apt install docker.io docker-compose
+```
+
+启动docker服务
+
+```shell
+sudo systemctl enable docker
+sudo systemctl start docker
+```
+
+验证: 打开终端,输入
+
+```shell
+sudo docker ps
+```
+
+#### 1.3 windows系统
+
+访问下面的链接,下载Docker Desktop, 然后安装即可
+
+https://docs.docker.com/desktop/setup/install/windows-install/
+
+
+运行安装好的Docker Desktop
+
+验证: 打开cmd终端,输入
+
+```shell
+docker ps
+```
+
+#### 1.4 mac系统
+
+访问下面的链接,下载Docker Desktop, 然后安装即可
+
+https://docs.docker.com/desktop/setup/install/mac-install/
+
+
+运行安装好的Docker Desktop
+
+验证: 打开终端,输入
+```shell
+docker ps
+```
+
+
### 2. 安装 docker swarm
docker swarm init
@@ -133,7 +195,12 @@ systemctl start docker
# 如果你的 Docker 主机有多个网卡,拥有多个 IP,必须使用 --advertise-addr 指定 IP
docker swarm init --advertise-addr 192.168.99.100
-### 3. 安装docker-compose
+### 3. 安装docker-compose(非必须)
+一般安装完docker后,会自带 docker compose。可先输入下面的命令验证是否有该环境,若有则不需要安装
+```shell
+docker compose
+```
+若无`docker compose`命令,则按照下面的安装
```shell
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
@@ -144,6 +211,9 @@ sudo chmod +x /usr/local/bin/docker-compose
sudo curl -L "https://get.daocloud.io/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
```
+安装后输入`docker-compose`验证是否成功
+
+注:`docker-compose` 与 `docker compose` 两种命令用法一样,是一个东西,只不过不同版本的docker可能叫法不一
### 4. 部署feaplat爬虫管理系统
#### 预备项
@@ -153,13 +223,16 @@ yum -y install git
```
#### 1. 下载项目
+> 先按照下面命令拉取develop分支代码运行。
+> master分支不支持urllib3>=2.0版本,现在已经运行不起来了,但之前老用户不受影响。待后续测试好兼容性,不影响老用户后,会将develop分支合并到master
+
gitub
```shell
-git clone https://github.com/Boris-code/feaplat.git
+git clone -b develop https://github.com/Boris-code/feaplat.git
```
gitee
```shell
-git clone https://gitee.com/Boris-code/feaplat.git
+git clone -b develop https://gitee.com/Boris-code/feaplat.git
```
#### 2. 运行
@@ -168,6 +241,8 @@ git clone https://gitee.com/Boris-code/feaplat.git
```shell
cd feaplat
+docker compose up -d
+# 或者
docker-compose up -d
```
@@ -242,28 +317,9 @@ docker node ls
docker swarm leave
```
-## 拉取私有项目
-
-拉取私有项目需在git仓库里添加如下公钥
-
-```
-ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCd/k/tjbcMislEunjtYQNXxz5tgEDc/fSvuLHBNUX4PtfmMQ07TuUX2XJIIzLRPaqv3nsMn3+QZrV0xQd545FG1Cq83JJB98ATTW7k5Q0eaWXkvThdFeG5+n85KeVV2W4BpdHHNZ5h9RxBUmVZPpAZacdC6OUSBYTyCblPfX9DvjOk+KfwAZVwpJSkv4YduwoR3DNfXrmK5P+wrYW9z/VHUf0hcfWEnsrrHktCKgohZn9Fe8uS3B5wTNd9GgVrLGRk85ag+CChoqg80DjgFt/IhzMCArqwLyMn7rGG4Iu2Ie0TcdMc0TlRxoBhqrfKkN83cfQ3gDf41tZwp67uM9ZN feapder@qq.com
-```
-
-或在系统设置页面配置您的SSH私钥,然后在git仓库里添加您的公钥,例如:
-
-
-注意,公私钥加密方式为RSA,其他的可能会有问题
+## 使用
-生成RSA公私钥方式如下:
-```shell
-ssh-keygen -t rsa -C "备注" -f 生成路径/文件名
-```
-如:
-`ssh-keygen -t rsa -C "feaplat" -f id_rsa`
-然后一路回车,不要输密码
-
-最终生成 `id_rsa`、`id_rsa.pub` 文件,复制`id_rsa.pub`文件内容到git仓库,复制`id_rsa`文件内容到feaplat爬虫管理系统
+见 [FEAPLAT使用说明](feapder_platform/usage)
## 自定义爬虫镜像
@@ -355,18 +411,18 @@ SPIDER_IMAGE=my_feapder:1.0
## 学习交流
-
-
- | 知识星球:17321694 |
- 作者微信: boris_tm |
- QQ群号:750614606 |
-
-
+
+
+ | 知识星球:17321694 |
+ 作者微信: boris_tm |
+ QQ群号:521494615 |
+
+
- |
- |
- |
-
-
-
- 加好友备注:feaplat
+
+ |
+ |
+
+
+
+ 加好友备注:feapder
diff --git a/docs/feapder_platform/question.md b/docs/feapder_platform/question.md
index 15c31f11..78de0f2f 100644
--- a/docs/feapder_platform/question.md
+++ b/docs/feapder_platform/question.md
@@ -94,7 +94,62 @@ INFLUXDB_PORT_UDP=8089
rm -f /etc/localtime
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
-# 校对时间
+# 校对时间 方式1
clock --hctosys
+# 校对时间 方式2
+ntpdate 0.asia.pool.ntp.org
```
-
\ No newline at end of file
+
+## 我搭建了个集群,如何让主节点不跑任务
+
+在主节点上执行下面命令,将其设置成drain状态即可
+
+    docker node update --availability drain 节点id
+
+## Network 问题
+
+attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded
+ 
+
+1. 确定当前节点是不是Drain节点:docker node ls
+
+ 
+
+ 是则继续往下看,不是则在评论区留言
+
+1. 修复
+
+ ```
+ docker node update --availability active 节点id
+ docker node update --availability drain 节点id
+ ```
+
+原因是Drain节点不能分配网络资源,需要先改成active,然后启动,之后再改回drain
+
+**若不是以上情况,可能是network内的可分配的ip满了(老版本feaplat会有这个问题),那么可继续往下看**
+
+1. 先检查feaplat目录下的docker-compose.yaml,翻到最后,看network相关配置是否为如下。若不是,则改成下面这样的。若下面指定的11 ip段和主机有冲突,可以写12、13等
+
+    ```
+    networks:
+      default:
+        name: feaplat
+        driver: overlay
+        attachable: true
+        ipam:
+          config:
+            - subnet: 11.0.0.0/8
+              gateway: 11.0.0.1
+    ```
+
+ 完整配置见:https://github.com/Boris-code/feaplat/blob/develop/docker-compose.yaml
+
+
+2. 改完后,需要删除之前的network,使其重新创建,命令如下:
+
+ ```
+ docker service ls -q | xargs docker service rm # 注意 这个会停止掉所有任务。
+ docker network rm feaplat # 删除网络
+ docker compose rm # 删除之前feaplat运行环境
+ docker compose up -d # 启动
+ ```
\ No newline at end of file
diff --git a/docs/feapder_platform/usage.md b/docs/feapder_platform/usage.md
index 100cd423..20e7bb12 100644
--- a/docs/feapder_platform/usage.md
+++ b/docs/feapder_platform/usage.md
@@ -31,7 +31,7 @@
1. 准备项目,项目结构如下:

-2. 压缩后上传:
+2. 压缩后上传:(推荐使用 `feapder zip` 命令压缩)

- 工作路径:上传的项目会被放到docker里的根目录下(跟你本机项目路径没关系),然后解压运行。因`feapder_demo.zip`解压后为`feapder_demo`,所以工作路径配置`/feapder_demo`
- 本项目没依赖,可以不配置`requirements.txt`
@@ -44,6 +44,30 @@

可以看到已经运行完毕
+
+## git方式拉取私有项目
+
+拉取私有项目需在git仓库里添加如下公钥
+
+```
+ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCd/k/tjbcMislEunjtYQNXxz5tgEDc/fSvuLHBNUX4PtfmMQ07TuUX2XJIIzLRPaqv3nsMn3+QZrV0xQd545FG1Cq83JJB98ATTW7k5Q0eaWXkvThdFeG5+n85KeVV2W4BpdHHNZ5h9RxBUmVZPpAZacdC6OUSBYTyCblPfX9DvjOk+KfwAZVwpJSkv4YduwoR3DNfXrmK5P+wrYW9z/VHUf0hcfWEnsrrHktCKgohZn9Fe8uS3B5wTNd9GgVrLGRk85ag+CChoqg80DjgFt/IhzMCArqwLyMn7rGG4Iu2Ie0TcdMc0TlRxoBhqrfKkN83cfQ3gDf41tZwp67uM9ZN feapder@qq.com
+```
+
+或在系统设置页面配置您的SSH私钥,然后在git仓库里添加您的公钥,例如:
+
+
+注意,公私钥加密方式为RSA,其他的可能会有问题
+
+生成RSA公私钥方式如下:
+```shell
+ssh-keygen -t rsa -C "备注" -f 生成路径/文件名
+```
+如:
+`ssh-keygen -t rsa -C "feaplat" -f id_rsa`
+然后一路回车,不要输密码
+
+最终生成 `id_rsa`、`id_rsa.pub` 文件,复制`id_rsa.pub`文件内容到git仓库,复制`id_rsa`文件内容到feaplat爬虫管理系统
+
## 爬虫监控
diff --git a/docs/images/aliyun_sale.jpg b/docs/images/aliyun_sale.jpg
deleted file mode 100644
index f7b42b1a..00000000
Binary files a/docs/images/aliyun_sale.jpg and /dev/null differ
diff --git a/docs/images/qingguo.jpg b/docs/images/qingguo.jpg
new file mode 100644
index 00000000..24331df2
Binary files /dev/null and b/docs/images/qingguo.jpg differ
diff --git a/docs/index.html b/docs/index.html
index a501a519..d1112896 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -75,8 +75,8 @@
function (hook) {
var header = [
'',
- '',
- '
',
+ '',
+ '
',
'',
'
'
].join('')
@@ -88,8 +88,8 @@
].join('')
hook.afterEach(function (html) {
// var isReadme = window.location.href.indexOf("README");
- var isReadme = 1 // 可以投放广告
- if (isReadme === -1) {
+ var isReadme = 0 // 可以投放广告
+ if (isReadme === 1) {
return header + html + footer
} else {
return html + footer
@@ -117,7 +117,7 @@
-
+
diff --git "a/docs/question/\350\277\220\350\241\214\351\227\256\351\242\230.md" "b/docs/question/\350\277\220\350\241\214\351\227\256\351\242\230.md"
index cbc84e3b..ade03f4d 100644
--- "a/docs/question/\350\277\220\350\241\214\351\227\256\351\242\230.md"
+++ "b/docs/question/\350\277\220\350\241\214\351\227\256\351\242\230.md"
@@ -21,7 +21,7 @@
delete_keys为需要删除的key,类型: 元组/bool/string,支持正则; 常用于清空任务队列,否则重启时会断点续爬,如写成`delete_keys=True`也是可以的
-1. 手动修改任务分数为小于当前时间搓的分数
+1. 手动修改任务分数为小于当前时间戳的分数

diff --git a/docs/source_code/Item.md b/docs/source_code/Item.md
index 3aafe547..e48218b9 100644
--- a/docs/source_code/Item.md
+++ b/docs/source_code/Item.md
@@ -102,6 +102,26 @@ class SpiderDataItem(Item):
self.title = self.title.strip()
```
+## 指定入库使用的pipelines
+
+```python
+
+from feapder import Item
+from feapder.pipelines.csv_pipeline import CsvPipeline
+
+
+class SpiderDataItem(Item):
+
+    __pipelines__ = [CsvPipeline()]
+
+    def __init__(self, *args, **kwargs):
+        # self.id = None
+        self.title = None
+```
+
+使用`__pipelines__`指定后,该item只会流经指定的pipelines处理
+
+
## 更新数据
采集过程中,往往会有些数据漏采或解析出错,如果我们想更新已入库的数据,可将Item转为UpdateItem
diff --git a/docs/source_code/UpdateItem.md b/docs/source_code/UpdateItem.md
index a461fad4..3036628a 100644
--- a/docs/source_code/UpdateItem.md
+++ b/docs/source_code/UpdateItem.md
@@ -1,6 +1,6 @@
# UpdateItem
-UpdateItem用于更新数据,继承至Item,所以使用方式基本与Item一致,下载只说不同之处
+UpdateItem用于更新数据,继承至Item,所以使用方式基本与Item一致,下面只说不同之处
## 更新逻辑
@@ -70,4 +70,4 @@ item = item.to_UpdateItem()
item.update_key = "title"
```
-**推荐方式1,直接改Item类,不用修改爬虫代码**
\ No newline at end of file
+**推荐方式1,直接改Item类,不用修改爬虫代码**
diff --git a/docs/source_code/custom_downloader.md b/docs/source_code/custom_downloader.md
new file mode 100644
index 00000000..eb7c8c05
--- /dev/null
+++ b/docs/source_code/custom_downloader.md
@@ -0,0 +1,300 @@
+# 自定义下载器
+
+下载器一共分为三种:**普通下载器**、**支持保持session的下载器**以及**浏览器渲染下载器**。默认已经在框架中内置,setting中的配置如下
+
+```
+DOWNLOADER = "feapder.network.downloader.RequestsDownloader" # 请求下载器
+SESSION_DOWNLOADER = "feapder.network.downloader.RequestsSessionDownloader"
+RENDER_DOWNLOADER = "feapder.network.downloader.SeleniumDownloader" # 渲染下载器
+```
+
+- session下载器当配置中`USE_SESSION = True`时会启用
+- 渲染下载器当使用浏览器下载功能时会启用
+
+这些下载器均为插件的形式,我们可以自定义
+
+## 自定义普通下载器
+
+1. 编写下载器。如在 `xxx-spider/downloader/my_downloader.py` 下自定义了如下下载器
+
+    ```
+    import requests
+
+    from feapder.network.downloader.base import Downloader
+    from feapder.network.response import Response
+
+    class RequestsDownloader(Downloader):
+        def download(self, request) -> Response:
+            response = requests.request(
+                request.method, request.url, **request.requests_kwargs
+            )
+            # 将requests的response转化为feapder的Response 对象,方便后续解析时使用xpath、re等方法
+            response = Response(response)
+            return response
+    ```
+
+    注:这里返回的response对象不强制要求是feapder的Response。返回值会传到解析函数的response参数里,若返回的是文本,则接收到的也是文本。
+
+    但为了代码可读性,建议将返回值转为feapder的Response后再返回。
+
+    转feapder的Response的方式有如下几种
+
+    ```
+    # 方式1
+    # response参数为requests的response
+    Response(response)
+
+    # 方式2
+    Response.from_text(text="html内容")
+    ```
+
+2. 在settings中指定下载器
+
+    ```
+    DOWNLOADER = "downloader.my_downloader.RequestsDownloader"
+    ```
+
+## 自定义session下载器
+
+1. 和普通下载器一样,都是继承`Downloader`,如何保持session,可自定义。代码示例 `xxx-spider/downloader/my_downloader.py`
+
+    ```
+    class RequestsSessionDownloader(Downloader):
+        session = None
+
+        @property
+        def _session(self):
+            if not self.__class__.session:
+                self.__class__.session = requests.Session()
+                # pool_connections – 缓存的 urllib3 连接池个数 pool_maxsize – 连接池中保存的最大连接数
+                http_adapter = HTTPAdapter(pool_connections=1000, pool_maxsize=1000)
+                # 任何使用该session会话的 HTTP 请求,只要其 URL 是以给定的前缀开头,该传输适配器就会被使用到。
+                self.__class__.session.mount("http", http_adapter)
+
+            return self.__class__.session
+
+        def download(self, request) -> Response:
+            response = self._session.request(
+                request.method, request.url, **request.requests_kwargs
+            )
+            response = Response(response)
+            return response
+    ```
+
+2. 在settings中指定下载器
+
+    ```
+    SESSION_DOWNLOADER = "downloader.my_downloader.RequestsSessionDownloader"
+    ```
+
+注意,这里要配置 `SESSION_DOWNLOADER`
+
+## 自定义浏览器渲染下载器
+
+1. 编写下载器 `xxx-spider/downloader/my_downloader.py`
+
+**若浏览器框架本身不支持多线程,但想在多线程中使用,如playwright使用,参考如下:**
+
+```
+import feapder.setting as setting
+import feapder.utils.tools as tools
+from feapder.network.downloader.base import RenderDownloader
+from feapder.network.response import Response
+from feapder.utils.webdriver import WebDriverPool, PlaywrightDriver
+
+
+class MyDownloader(RenderDownloader):
+    webdriver_pool: WebDriverPool = None
+
+    @property
+    def _webdriver_pool(self):
+        if not self.__class__.webdriver_pool:
+            self.__class__.webdriver_pool = WebDriverPool(
+                **setting.PLAYWRIGHT, driver_cls=PlaywrightDriver, thread_safe=True
+            )
+
+        return self.__class__.webdriver_pool
+
+    def download(self, request) -> Response:
+        # 代理优先级 自定义 > 配置文件 > 随机
+        if request.custom_proxies:
+            proxy = request.get_proxy()
+        elif setting.PLAYWRIGHT.get("proxy"):
+            proxy = setting.PLAYWRIGHT.get("proxy")
+        else:
+            proxy = request.get_proxy()
+
+        # user_agent优先级 自定义 > 配置文件 > 随机
+        if request.custom_ua:
+            user_agent = request.get_user_agent()
+        elif setting.PLAYWRIGHT.get("user_agent"):
+            user_agent = setting.PLAYWRIGHT.get("user_agent")
+        else:
+            user_agent = request.get_user_agent()
+
+        cookies = request.get_cookies()
+        url = request.url
+        render_time = request.render_time or setting.PLAYWRIGHT.get("render_time")
+        wait_until = setting.PLAYWRIGHT.get("wait_until") or "domcontentloaded"
+        if request.get_params():
+            url = tools.joint_url(url, request.get_params())
+
+        driver: PlaywrightDriver = self._webdriver_pool.get(
+            user_agent=user_agent, proxy=proxy
+        )
+        try:
+            if cookies:
+                driver.url = url
+                driver.cookies = cookies
+            driver.page.goto(url, wait_until=wait_until)
+
+            if render_time:
+                tools.delay_time(render_time)
+
+            html = driver.page.content()
+            response = Response.from_dict(
+                {
+                    "url": driver.page.url,
+                    "cookies": driver.cookies,
+                    "_content": html.encode(),
+                    "status_code": 200,
+                    "elapsed": 666,
+                    "headers": {
+                        "User-Agent": driver.user_agent,
+                        "Cookie": tools.cookies2str(driver.cookies),
+                    },
+                }
+            )
+
+            response.driver = driver
+            response.browser = driver
+            return response
+        except Exception as e:
+            self._webdriver_pool.remove(driver)
+            raise e
+
+    def close(self, driver):
+        if driver:
+            self._webdriver_pool.remove(driver)
+
+    def put_back(self, driver):
+        """
+        释放浏览器对象
+        """
+        self._webdriver_pool.put(driver)
+
+    def close_all(self):
+        """
+        关闭所有浏览器
+        """
+        # 不支持
+        # self._webdriver_pool.close()
+        pass
+```
+
+这里使用了WebDriverPool,参数`thread_safe=True`,即要保证使用时的线程安全,确保同个浏览器对象只能被同一个线程调用
+
+**若浏览器框架本身支持多线程,如selenium,则参考如下**
+
+```
+import feapder.setting as setting
+import feapder.utils.tools as tools
+from feapder.network.downloader.base import RenderDownloader
+from feapder.network.response import Response
+from feapder.utils.webdriver import WebDriverPool, SeleniumDriver
+
+
+class MyDownloader(RenderDownloader):
+    webdriver_pool: WebDriverPool = None
+
+    @property
+    def _webdriver_pool(self):
+        if not self.__class__.webdriver_pool:
+            self.__class__.webdriver_pool = WebDriverPool(
+                **setting.WEBDRIVER, driver_cls=SeleniumDriver
+            )
+
+        return self.__class__.webdriver_pool
+
+    def download(self, request) -> Response:
+        # 代理优先级 自定义 > 配置文件 > 随机
+        if request.custom_proxies:
+            proxy = request.get_proxy()
+        elif setting.WEBDRIVER.get("proxy"):
+            proxy = setting.WEBDRIVER.get("proxy")
+        else:
+            proxy = request.get_proxy()
+
+        # user_agent优先级 自定义 > 配置文件 > 随机
+        if request.custom_ua:
+            user_agent = request.get_user_agent()
+        elif setting.WEBDRIVER.get("user_agent"):
+            user_agent = setting.WEBDRIVER.get("user_agent")
+        else:
+            user_agent = request.get_user_agent()
+
+        cookies = request.get_cookies()
+        url = request.url
+        render_time = request.render_time or setting.WEBDRIVER.get("render_time")
+        if request.get_params():
+            url = tools.joint_url(url, request.get_params())
+
+        browser: SeleniumDriver = self._webdriver_pool.get(
+            user_agent=user_agent, proxy=proxy
+        )
+        try:
+            browser.get(url)
+            if cookies:
+                browser.cookies = cookies
+                # 刷新使cookie生效
+                browser.get(url)
+
+            if render_time:
+                tools.delay_time(render_time)
+
+            html = browser.page_source
+            response = Response.from_dict(
+                {
+                    "url": browser.current_url,
+                    "cookies": browser.cookies,
+                    "_content": html.encode(),
+                    "status_code": 200,
+                    "elapsed": 666,
+                    "headers": {
+                        "User-Agent": browser.user_agent,
+                        "Cookie": tools.cookies2str(browser.cookies),
+                    },
+                }
+            )
+
+            response.driver = browser
+            response.browser = browser
+            return response
+        except Exception as e:
+            self._webdriver_pool.remove(browser)
+            raise e
+
+    def close(self, driver):
+        if driver:
+            self._webdriver_pool.remove(driver)
+
+    def put_back(self, driver):
+        """
+        释放浏览器对象
+        """
+        self._webdriver_pool.put(driver)
+
+    def close_all(self):
+        """
+        关闭所有浏览器
+        """
+        self._webdriver_pool.close()
+```
+
+2. 在settings中指定下载器
+
+```
+RENDER_DOWNLOADER = "downloader.my_downloader.MyDownloader"
+```
+
+注,这里要写`RENDER_DOWNLOADER`
\ No newline at end of file
diff --git a/docs/source_code/pipeline.md b/docs/source_code/pipeline.md
index 14dd7455..6a04dbf1 100644
--- a/docs/source_code/pipeline.md
+++ b/docs/source_code/pipeline.md
@@ -2,11 +2,26 @@
Pipeline是数据入库时流经的管道,用户可自定义,以便对接其他数据库。
-框架已内置mysql及mongo管道,其他管道作为扩展方式提供,可从[feapder_pipelines](https://github.com/Boris-code/feapder_pipelines)项目中按需安装
+框架已内置mysql、mongo、csv管道,其他管道作为扩展方式提供,可从[feapder_pipelines](https://github.com/Boris-code/feapder_pipelines)项目中按需安装
项目地址:https://github.com/Boris-code/feapder_pipelines
-## 使用方式
+## 选择内置的pipeline
+
+在配置文件 `setting.py` 中的 `ITEM_PIPELINES` 中启用:
+
+```python
+ITEM_PIPELINES = [
+ "feapder.pipelines.mysql_pipeline.MysqlPipeline",
+ # "feapder.pipelines.mongo_pipeline.MongoPipeline",
+ # "feapder.pipelines.csv_pipeline.CsvPipeline",
+ # "feapder.pipelines.console_pipeline.ConsolePipeline",
+]
+```
+
+然后爬虫中`yield`的`item`会流经所选的pipeline自动存储
+
+## 自定义pipeline
注:item会被聚合成多条一起流经pipeline,方便批量入库
diff --git a/docs/source_code/proxy.md b/docs/source_code/proxy.md
index b961ecf0..de87845a 100644
--- a/docs/source_code/proxy.md
+++ b/docs/source_code/proxy.md
@@ -1,12 +1,13 @@
# 代理使用说明
-代理使用有两种方式
-1. 用框架内置的代理池
-2. 自己写
+代理使用有三种方式
+1. 使用框架内置代理池
+2. 自定义代理池
+3. 请求中直接指定
-## 1. 框架内置的代理池
+## 方式1. 使用框架内置代理池
-### 基本使用
+### 配置代理
在配置文件中配置代理提取接口
@@ -14,9 +15,10 @@
# 设置代理
PROXY_EXTRACT_API = None # 代理提取API ,返回的代理分割符为\r\n
PROXY_ENABLE = True
+PROXY_MAX_FAILED_TIMES = 5 # 代理最大失败次数,超过则不使用,自动删除
```
-要求API返回的代理格式为:
+要求API返回的代理以 \r\n 分隔,格式为:
```
ip:port
@@ -26,13 +28,11 @@ ip:port
这样feapder在请求时会自动随机使用上面的代理请求了
-### 高阶
+### 管理代理
-> 注意:高阶用法现在不太友好,后期会调整使用方式
+1. 删除代理(默认是请求异常连续5次,再删除代理)
-1. 标记代理失效或延时使用
-
- 例如在发生异常时处理代理
+ 例如在发生异常时删除代理
```python
import feapder
@@ -44,49 +44,48 @@ ip:port
print(response)
def exception_request(self, request, response):
-
- # request.proxies_pool.tag_proxy(request.requests_kwargs.get("proxies"), -1) # 废弃本次代理
- request.proxies_pool.tag_proxy(request.requests_kwargs.get("proxies"), 1, 30) # 延迟本次代理30秒后再使用
- ```
-
-1. 指定代理拉取时间间隔等
-
- 在代码头部给feapder.Request.proxies_pool重新赋值
-
- ```python
- import feapder
- from feapder.network.proxy_pool import ProxyPool
-
- proxy_pool= ProxyPool(reset_interval_max=180, reset_interval=5)
- feapder.Request.proxies_pool = proxy_pool
+ request.del_proxy()
+
```
- 相当于修改了代理池的默认参数值,更多参数看源码
+## 方式2. 自定义代理池
-1. 从redis里提取代理
+1. 编写代理池:例如在你的项目下创建个my_proxypool.py,实现下面的函数
```python
- import feapder
- from feapder.network.proxy_pool import ProxyPool
-
- proxy_pool = ProxyPool(
- proxy_source_url="redis://:passwd@host:ip/db", redis_proxies_key="proxies"
- )
- feapder.Request.proxies_pool = proxy_pool
+from feapder.network.proxy_pool import BaseProxyPool
+
+class MyProxyPool(BaseProxyPool):
+    def get_proxy(self):
+        """
+        获取代理
+        Returns:
+            {"http": "xxx", "https": "xxx"}
+        """
+        pass
+
+    def del_proxy(self, proxy):
+        """
+        @summary: 删除代理
+        ---------
+        @param proxy: xxx
+        """
+        pass
```
-
- 要求redis使用zset集合存储代理,存储内容示例如下:
+
+2. 修改setting的代理配置
+
```
- ip:port
- ip:port
- ip:port
+ PROXY_POOL = "my_proxypool.MyProxyPool" # 代理池
```
- redis_proxies_key及为存储代理的key,每次拉取时会拉取全量
+ 将编写好的代理池配置进来,值为类的模块路径,需要指定到具体的类名
+
+
-## 2. 自己写
+## 方式3. 不使用代理池,直接给请求指定代理
-自己写就比较灵活,自己随机取个代理,然后给request赋值即可,例如在下载中间件里使用
+直接给request.proxies赋值即可,例如在下载中间件里使用
```python
import feapder
@@ -96,7 +95,7 @@ class TestProxy(feapder.AirSpider):
yield feapder.Request("https://www.baidu.com")
def download_midware(self, request):
- # 这里随机取个代理使用即可
+ # 这里直接指定代理即可
request.proxies = {"https": "https://ip:port", "http": "http://ip:port"}
return request
diff --git "a/docs/source_code/\346\212\245\350\255\246\345\217\212\347\233\221\346\216\247.md" "b/docs/source_code/\346\212\245\350\255\246\345\217\212\347\233\221\346\216\247.md"
index 023bd06f..87dbc695 100644
--- "a/docs/source_code/\346\212\245\350\255\246\345\217\212\347\233\221\346\216\247.md"
+++ "b/docs/source_code/\346\212\245\350\255\246\345\217\212\347\233\221\346\216\247.md"
@@ -1,5 +1,7 @@
# 报警及监控
+支持钉钉、飞书、企业微信、邮件报警
+
## 钉钉报警
条件:需要有钉钉群,需要获取钉钉机器人的Webhook地址
@@ -10,15 +12,19 @@

+或使用加签方式,然后在setting中设置密钥
+
相关配置:
```python
# 钉钉报警
DINGDING_WARNING_URL = "" # 钉钉机器人api
DINGDING_WARNING_PHONE = "" # 报警人 支持列表,可指定多个
+DINGDING_WARNING_ALL = False # 是否提示所有人, 默认为False
+DINGDING_WARNING_SECRET = None # 加签密钥
```
-## 微信报警
+## 企业微信报警
条件:需要企业微信群,并获取企业微信机器人的Webhook地址
@@ -39,6 +45,17 @@ WECHAT_WARNING_PHONE = "" # 报警人 将会在群内@此人, 支持列表,
WECHAT_WARNING_ALL = False # 是否提示所有人, 默认为False
```
+## 飞书报警
+
+可参考文档设置机器人:https://open.feishu.cn/document/ukTMukTMukTM/ucTM5YjL3ETO24yNxkjN#e1cdee9f
+
+然后在feapder的setting文件中修改如下配置
+
+```
+FEISHU_WARNING_URL = "" # 飞书机器人api
+FEISHU_WARNING_USER = None # 报警人 {"open_id":"ou_xxxxx", "name":"xxxx"} 或 [{"open_id":"ou_xxxxx", "name":"xxxx"}]
+FEISHU_WARNING_ALL = False # 是否提示所有人, 默认为False
+```
## 邮件报警
@@ -69,6 +86,20 @@ EMAIL_RECEIVER = "" # 收件人 支持列表,可指定多个
4. 将本邮箱账号添加到白名单中
+## Qmsg酱报警
+
+Qmsg酱是一个QQ消息推送机器人,用来通知自己消息的免费服务。
+
+可以参考文档:https://qmsg.zendee.cn/docs/api/
+
+```python
+# QMSG报警
+QMSG_WARNING_URL = "" # qmsg机器人api
+QMSG_WARNING_QQ = "" # 指定要接收消息的QQ号或者QQ群。多个以英文逗号分割,例如:12345,12346,支持列表,可指定多人
+QMSG_WARNING_BOT = "" # 机器人的QQ号
+```
+
+
## 报警间隔及报警级别
框架会对相同的报警进行过滤,防止刷屏,默认的报警时间间隔为1小时,可通过以下配置修改:
diff --git "a/docs/source_code/\346\265\217\350\247\210\345\231\250\346\270\262\346\237\223-Selenium.md" "b/docs/source_code/\346\265\217\350\247\210\345\231\250\346\270\262\346\237\223-Selenium.md"
index 665f5aed..089f9537 100644
--- "a/docs/source_code/\346\265\217\350\247\210\345\231\250\346\270\262\346\237\223-Selenium.md"
+++ "b/docs/source_code/\346\265\217\350\247\210\345\231\250\346\270\262\346\237\223-Selenium.md"
@@ -4,7 +4,7 @@
框架内置一个浏览器渲染池,默认的池子大小为1,请求时重复利用浏览器实例,只有当代理失效请求异常时,才会销毁、创建一个新的浏览器实例
-内置浏览器渲染支持 **CHROME** 、**PHANTOMJS**、**FIREFOX**
+内置浏览器渲染支持 **CHROME**、**EDGE**、**PHANTOMJS**、**FIREFOX**
## 使用方式:
@@ -14,7 +14,7 @@ def start_requests(self):
```
在返回的Request中传递`render=True`即可
-框架支持`CHROME`、`PHANTOMJS`、`FIREFOX` 三种浏览器渲染,可通过[配置文件](source_code/配置文件)进行配置。相关配置如下:
+框架支持`CHROME`、`EDGE`、`PHANTOMJS`、`FIREFOX` 四种浏览器渲染,可通过[配置文件](source_code/配置文件)进行配置。相关配置如下:
```python
# 浏览器渲染
@@ -24,7 +24,7 @@ WEBDRIVER = dict(
user_agent=None, # 字符串 或 无参函数,返回值为user_agent
proxy=None, # xxx.xxx.xxx.xxx:xxxx 或 无参函数,返回值为代理地址
headless=False, # 是否为无头浏览器
- driver_type="CHROME", # CHROME 、PHANTOMJS、FIREFOX
+ driver_type="CHROME", # CHROME、EDGE、PHANTOMJS、FIREFOX
timeout=30, # 请求超时时间
window_size=(1024, 800), # 窗口大小
executable_path=None, # 浏览器路径,默认为默认路径
@@ -80,7 +80,7 @@ def download_midware(self, request):
}
return request
```
-
+
## 设置Cookie
通过 `feapder.Request`携带,如:
@@ -219,7 +219,7 @@ class TestRender(feapder.AirSpider):
user_agent=None, # 字符串 或 无参函数,返回值为user_agent
proxy=None, # xxx.xxx.xxx.xxx:xxxx 或 无参函数,返回值为代理地址
headless=False, # 是否为无头浏览器
- driver_type="CHROME", # CHROME、PHANTOMJS、FIREFOX
+ driver_type="CHROME", # CHROME、EDGE、PHANTOMJS、FIREFOX
timeout=30, # 请求超时时间
window_size=(1024, 800), # 窗口大小
executable_path=None, # 浏览器路径,默认为默认路径
diff --git "a/docs/source_code/\351\205\215\347\275\256\346\226\207\344\273\266.md" "b/docs/source_code/\351\205\215\347\275\256\346\226\207\344\273\266.md"
index 547a6d16..e22be333 100644
--- "a/docs/source_code/\351\205\215\347\275\256\346\226\207\344\273\266.md"
+++ "b/docs/source_code/\351\205\215\347\275\256\346\226\207\344\273\266.md"
@@ -69,7 +69,7 @@
# user_agent=None, # 字符串 或 无参函数,返回值为user_agent
# proxy=None, # xxx.xxx.xxx.xxx:xxxx 或 无参函数,返回值为代理地址
# headless=False, # 是否为无头浏览器
-# driver_type="CHROME", # CHROME、PHANTOMJS、FIREFOX
+# driver_type="CHROME", # CHROME、EDGE、PHANTOMJS、FIREFOX
# timeout=30, # 请求超时时间
# window_size=(1024, 800), # 窗口大小
# executable_path=None, # 浏览器路径,默认为默认路径
@@ -202,10 +202,10 @@
```python
import feapder
-
-
+
+
class SpiderTest(feapder.AirSpider):
__custom_setting__ = dict(
SPIDER_MAX_RETRY_TIMES=20,
)
-```
\ No newline at end of file
+```
diff --git a/docs/usage/AirSpider.md b/docs/usage/AirSpider.md
index 08c14185..71ac053c 100644
--- a/docs/usage/AirSpider.md
+++ b/docs/usage/AirSpider.md
@@ -243,7 +243,7 @@ def start_requests(self):
```
在返回的Request中传递`render=True`即可
-框架支持`CHROME`和`PHANTOMJS`两种浏览器渲染,可通过[配置文件](source_code/配置文件)进行配置。相关配置如下:
+框架支持`CHROME`、`EDGE`和`PHANTOMJS`浏览器渲染,可通过[配置文件](source_code/配置文件)进行配置。相关配置如下:
```python
# 浏览器渲染
@@ -253,7 +253,7 @@ WEBDRIVER = dict(
user_agent=None, # 字符串 或 无参函数,返回值为user_agent
proxy=None, # xxx.xxx.xxx.xxx:xxxx 或 无参函数,返回值为代理地址
headless=False, # 是否为无头浏览器
- driver_type="CHROME", # CHROME 或 PHANTOMJS,
+ driver_type="CHROME", # CHROME、EDGE或PHANTOMJS,
timeout=30, # 请求超时时间
window_size=(1024, 800), # 窗口大小
executable_path=None, # 浏览器路径,默认为默认路径
@@ -282,7 +282,7 @@ class AirSpeedTest(feapder.AirSpider):
return request, response
def parse(self, request, response):
- print(response)
+ print(response)
if __name__ == "__main__":
@@ -314,7 +314,25 @@ class AirSpeedTest(feapder.AirSpider):
print(title)
```
-## 15. 完整的代码示例
+## 15. 主动停止爬虫
+
+```python
+import feapder
+
+
+class AirTest(feapder.AirSpider):
+    def start_requests(self):
+        yield feapder.Request("http://www.baidu.com")
+
+    def parse(self, request, response):
+        self.stop_spider()  # 停止爬虫,可以在任意地方调用该方法
+
+
+if __name__ == "__main__":
+    AirTest().start()
+```
+
+## 16. 完整的代码示例
AirSpider:https://github.com/Boris-code/feapder/blob/master/tests/air-spider/test_air_spider.py
diff --git a/docs/usage/TaskSpider.md b/docs/usage/TaskSpider.md
index 719f6481..5978dff9 100644
--- a/docs/usage/TaskSpider.md
+++ b/docs/usage/TaskSpider.md
@@ -31,6 +31,7 @@ from feapder import ArgumentParser
class TaskSpiderTest(feapder.TaskSpider):
# 自定义数据库,若项目中有setting.py文件,此自定义可删除
+ # redis 必须,mysql可选
__custom_setting__ = dict(
REDISDB_IP_PORTS="localhost:6379",
REDISDB_USER_PASS="",
@@ -43,7 +44,7 @@ class TaskSpiderTest(feapder.TaskSpider):
)
def add_task(self):
- # 加种子任务
+ # 加种子任务 框架会调用这个函数,方便往redis里塞任务,但不能写成死循环。实际业务中可以自己写个脚本往redis里塞任务
self._redisdb.zadd(self._task_table, {"id": 1, "url": "https://www.baidu.com"})
def start_requests(self, task):
@@ -69,7 +70,6 @@ def start(args):
task_keys=["id", "url"], # 表里查询的字段
redis_key="test:task_spider", # redis里做任务队列的key
keep_alive=True, # 是否常驻
- delete_keys=True, # 重启时是否删除redis里的key,若想断点续爬,设置False
)
if args == 1:
spider.start_monitor_task()
@@ -86,7 +86,7 @@ def start2(args):
task_table_type="redis", # 任务表类型为redis
redis_key="test:task_spider", # redis里做任务队列的key
keep_alive=True, # 是否常驻
- delete_keys=True, # 重启时是否删除redis里的key,若想断点续爬,设置False
+ use_mysql=False, # 若用不到mysql,可以不使用
)
if args == 1:
spider.start_monitor_task()
diff --git a/feapder/VERSION b/feapder/VERSION
index ff2fd4fb..7b0231f5 100644
--- a/feapder/VERSION
+++ b/feapder/VERSION
@@ -1 +1 @@
-1.8.5
\ No newline at end of file
+1.9.3
\ No newline at end of file
diff --git a/feapder/buffer/item_buffer.py b/feapder/buffer/item_buffer.py
index 874dcefa..35f9bb01 100644
--- a/feapder/buffer/item_buffer.py
+++ b/feapder/buffer/item_buffer.py
@@ -52,15 +52,28 @@ def __init__(self, redis_key, task_table=None):
# 'table_name': ['id', 'name'...] # 缓存table_name与__update_key__的关系
}
+ self._item_pipelines = {
+ # 'table_name': ['pipeline1', 'pipeline2'] # 缓存table_name与pipelines的关系
+ }
+
self._pipelines = self.load_pipelines()
self._have_mysql_pipeline = MYSQL_PIPELINE_PATH in setting.ITEM_PIPELINES
self._mysql_pipeline = None
if setting.ITEM_FILTER_ENABLE and not self.__class__.dedup:
- self.__class__.dedup = Dedup(
- to_md5=False, **setting.ITEM_FILTER_SETTING
- )
+ if setting.ITEM_FILTER_SETTING.get(
+ "filter_type"
+ ) == Dedup.BloomFilter or setting.ITEM_FILTER_SETTING.get("name"):
+ self.__class__.dedup = Dedup(
+ to_md5=False, **setting.ITEM_FILTER_SETTING
+ )
+ else:
+ self.__class__.dedup = Dedup(
+ to_md5=False,
+ name=self._redis_key,
+ **setting.ITEM_FILTER_SETTING,
+ )
# number of export retries
self.export_retry_times = 0
@@ -208,7 +221,7 @@ def __pick_items(self, items, is_update_item=False):
Split the data by table; after splitting, the original items list is empty
@param items:
@param is_update_item:
- @return:
+ @return: dict mapping table names to data
"""
datas_dict = {
# 'table_name': [{}, {}]
@@ -223,22 +236,24 @@ def __pick_items(self, items, is_update_item=False):
if not table_name:
table_name = item.table_name
self._item_tables[item_name] = table_name
+ self._item_pipelines[table_name] = item.pipelines
+
+ if is_update_item and table_name not in self._item_update_keys:
+ self._item_update_keys[table_name] = item.update_key
if table_name not in datas_dict:
datas_dict[table_name] = []
datas_dict[table_name].append(item.to_dict)
- if is_update_item and table_name not in self._item_update_keys:
- self._item_update_keys[table_name] = item.update_key
-
return datas_dict
- def __export_to_db(self, table, datas, is_update=False, update_keys=()):
- for pipeline in self._pipelines:
+ def __export_to_db(self, table, datas, is_update=False, update_keys=(), used_pipelines=None):
+ pipelines = used_pipelines or self._pipelines # prefer the specified pipelines
+ for pipeline in pipelines:
if is_update:
if table == self._task_table and not isinstance(
- pipeline, MysqlPipeline
+ pipeline, MysqlPipeline
):
continue
@@ -258,7 +273,7 @@ def __export_to_db(self, table, datas, is_update=False, update_keys=()):
# If this is the task table and none of the pipelines above is mysql, update the task through mysql
if not self._have_mysql_pipeline and is_update and table == self._task_table:
if not self.mysql_pipeline.update_items(
- table, datas, update_keys=update_keys
+ table, datas, update_keys=update_keys
):
log.error(
f"{self.mysql_pipeline.__class__.__name__} 更新数据失败. table: {table} items: {datas}"
@@ -269,7 +284,7 @@ def __export_to_db(self, table, datas, is_update=False, update_keys=()):
return True
def __add_item_to_db(
- self, items, update_items, requests, callbacks, items_fingerprints
+ self, items, update_items, requests, callbacks, items_fingerprints
):
export_success = True
self._is_adding_to_db = True
@@ -278,7 +293,7 @@ def __add_item_to_db(
if setting.ITEM_FILTER_ENABLE:
items, items_fingerprints = self.__dedup_items(items, items_fingerprints)
- # Sort items by table
+ # Sort items by table (per-table pipelines are cached in self._item_pipelines)
items_dict = self.__pick_items(items)
update_items_dict = self.__pick_items(update_items, is_update_item=True)
@@ -286,6 +301,7 @@ def __add_item_to_db(
failed_items = {"add": [], "update": [], "requests": []}
while items_dict:
table, datas = items_dict.popitem()
+ used_pipelines = self._item_pipelines.get(table)
log.debug(
"""
@@ -296,13 +312,14 @@ def __add_item_to_db(
% (table, tools.dumps_json(datas, indent=16))
)
- if not self.__export_to_db(table, datas):
+ if not self.__export_to_db(table, datas, used_pipelines=used_pipelines):
export_success = False
failed_items["add"].append({"table": table, "datas": datas})
# run the batch update
while update_items_dict:
table, datas = update_items_dict.popitem()
+ used_pipelines = self._item_pipelines.get(table)
log.debug(
"""
@@ -315,7 +332,7 @@ def __add_item_to_db(
update_keys = self._item_update_keys.get(table)
if not self.__export_to_db(
- table, datas, is_update=True, update_keys=update_keys
+ table, datas, is_update=True, update_keys=update_keys, used_pipelines=used_pipelines
):
export_success = False
failed_items["update"].append(
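Review note, not part of the patch: the `used_pipelines or self._pipelines` fallback in `__export_to_db` can be sketched in isolation. This is a minimal illustration under stated assumptions; `ListPipeline` and `export_to_db` are hypothetical stand-ins, not feapder classes, and only the fallback logic mirrors the diff.

```python
class ListPipeline:
    """Hypothetical pipeline that records what it was asked to save."""

    def __init__(self, name):
        self.name = name
        self.saved = []

    def save_items(self, table, datas):
        self.saved.append((table, list(datas)))
        return True


def export_to_db(table, datas, default_pipelines, used_pipelines=None):
    # Prefer the pipelines attached to the item; fall back to the defaults.
    pipelines = used_pipelines or default_pipelines
    for pipeline in pipelines:
        if not pipeline.save_items(table, datas):
            return False
    return True


default = [ListPipeline("mysql")]
custom = [ListPipeline("csv")]
# table name -> pipelines, playing the role of self._item_pipelines
item_pipelines = {"news": custom}

export_to_db("news", [{"id": 1}], default, item_pipelines.get("news"))
export_to_db("users", [{"id": 2}], default, item_pipelines.get("users"))
```

Because the fallback uses `or`, an empty pipeline list is treated the same as `None` and the default pipelines are used.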
diff --git a/feapder/buffer/request_buffer.py b/feapder/buffer/request_buffer.py
index 22366e24..70677a94 100644
--- a/feapder/buffer/request_buffer.py
+++ b/feapder/buffer/request_buffer.py
@@ -28,14 +28,16 @@ def __init__(self, db=None, dedup_name: str = None):
self._db = db or MemoryDB()
if not self.__class__.dedup and setting.REQUEST_FILTER_ENABLE:
- if dedup_name:
+ if setting.REQUEST_FILTER_SETTING.get(
+ "filter_type"
+ ) == Dedup.BloomFilter or setting.REQUEST_FILTER_SETTING.get("name"):
self.__class__.dedup = Dedup(
- name=dedup_name, to_md5=False, **setting.REQUEST_FILTER_SETTING
- ) # use in-memory dedup by default
+ to_md5=False, **setting.REQUEST_FILTER_SETTING
+ )
else:
self.__class__.dedup = Dedup(
- to_md5=False, **setting.REQUEST_FILTER_SETTING
- ) # use in-memory dedup by default
+ to_md5=False, name=dedup_name, **setting.REQUEST_FILTER_SETTING
+ )
def is_exist_request(self, request):
if (
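Review note, not part of the patch: both buffers now pick the dedup name the same way. If the filter is a bloom filter or the settings already carry a `name`, the settings are used as given; otherwise the spider's own key is used as the name. The sketch below restates that rule as a pure function. `BLOOM_FILTER` and `build_dedup_kwargs` are hypothetical names, and the numeric value of `BLOOM_FILTER` is an assumption standing in for `Dedup.BloomFilter`.

```python
BLOOM_FILTER = 1  # assumed stand-in for Dedup.BloomFilter


def build_dedup_kwargs(filter_setting, default_name):
    """Return the keyword arguments that would be passed to the dedup constructor."""
    kwargs = dict(filter_setting)
    uses_bloom = kwargs.get("filter_type") == BLOOM_FILTER
    if uses_bloom or kwargs.get("name"):
        # Bloom filter or explicitly named filter: keep the settings as given.
        return {"to_md5": False, **kwargs}
    # Otherwise scope the filter to this spider by naming it after its key.
    # Dropping an empty "name" is an addition of this sketch: it avoids
    # passing the same keyword twice.
    kwargs.pop("name", None)
    return {"to_md5": False, "name": default_name, **kwargs}


scoped = build_dedup_kwargs({"filter_type": 2}, "test:spider")
shared = build_dedup_kwargs({"filter_type": 2, "name": "shared"}, "test:spider")
```

One thing worth checking in the patched code: if the settings dict contains a `name` key with an empty value, `Dedup(name=..., **settings)` would receive `name` twice and raise `TypeError`.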
diff --git a/feapder/commands/cmdline.py b/feapder/commands/cmdline.py
index cb2a3187..91d0531e 100644
--- a/feapder/commands/cmdline.py
+++ b/feapder/commands/cmdline.py
@@ -11,6 +11,7 @@
import re
import sys
from os.path import dirname, join
+import os
import requests
@@ -77,6 +78,9 @@ def check_new_version():
if new_version:
version = f"feapder=={VERSION.replace('-beta', 'b')}"
tip = NEW_VERSION_TIP.format(version=version, new_version=new_version)
+ # Fix print being unable to produce colored output on Windows
+ if os.name == "nt":
+ os.system("")
print(tip)
except Exception as e:
pass
diff --git a/feapder/commands/create/create_table.py b/feapder/commands/create/create_table.py
index 2358da7f..15162782 100644
--- a/feapder/commands/create/create_table.py
+++ b/feapder/commands/create/create_table.py
@@ -141,8 +141,9 @@ def create(self, table_name):
unique=unique,
)
print(sql)
-
- if self._db.execute(sql):
+ result = self._db.execute(sql)
+ # Table created successfully. The affected row count is 0, so execute returns 0
+ if result == 0:
print("\n%s 创建成功" % table_name)
print("注意手动检查下字段类型,确保无误!!!")
else:
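Review note, not part of the patch: with the `mysqldb.py` changes in this same diff, `execute`, `update` and `delete` return the affected-row count, and `None` when the exception branch is taken. A plain truthiness check therefore no longer distinguishes "succeeded with 0 rows" from "failed". The helper below is a hypothetical illustration of how a caller might interpret the value; it does not touch a database.

```python
def describe_result(affect_count):
    """Interpret the return value of an execute()-style call."""
    if affect_count is None:
        # The call logged an exception and returned nothing useful.
        return "failed"
    if affect_count == 0:
        # Typical for DDL such as CREATE TABLE, or an UPDATE that matched no rows.
        return "ok, no rows affected"
    return "ok, %d rows affected" % affect_count


outcomes = [describe_result(value) for value in (None, 0, 3)]
```

Existing callers written as `if db.update(sql):` would now treat a successful zero-row update as a failure, so comparing against `None` is the safer check.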
diff --git a/feapder/core/base_parser.py b/feapder/core/base_parser.py
index 6264b5ae..a06f9c44 100644
--- a/feapder/core/base_parser.py
+++ b/feapder/core/base_parser.py
@@ -13,6 +13,9 @@
from feapder.db.mysqldb import MysqlDB
from feapder.network.item import UpdateItem
from feapder.utils.log import log
+from feapder.network.request import Request
+from feapder.network.response import Response
+from feapder.utils.perfect_dict import PerfectDict
class BaseParser(object):
@@ -26,7 +29,7 @@ def start_requests(self):
pass
- def download_midware(self, request):
+ def download_midware(self, request: Request):
"""
@summary: Download middleware. It can modify request parameters or perform a custom download, then return request, response
---------
@@ -37,7 +40,7 @@ def download_midware(self, request):
pass
- def validate(self, request, response):
+ def validate(self, request: Request, response: Response):
"""
@summary: Validation function, used to check whether the response is correct
If an exception is raised inside the function, the request is retried
@@ -53,7 +56,7 @@ def validate(self, request, response):
pass
- def parse(self, request, response):
+ def parse(self, request: Request, response: Response):
"""
@summary: Default parse function
---------
@@ -65,7 +68,7 @@ def parse(self, request, response):
pass
- def exception_request(self, request, response, e):
+ def exception_request(self, request: Request, response: Response, e: Exception):
"""
@summary: Called for a request that raised an exception while downloading or parsing
---------
@@ -78,7 +81,7 @@ def exception_request(self, request, response, e):
pass
- def failed_request(self, request, response, e):
+ def failed_request(self, request: Request, response: Response, e: Exception):
"""
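@summary: Called for a request that exceeded the maximum number of retries
May return a modified request. If no request is returned, the original request is written to the failed table in redis; otherwise the modified request is written to the failed table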
@@ -135,7 +138,7 @@ def add_task(self):
@result:
"""
- def start_requests(self, task):
+ def start_requests(self, task: PerfectDict):
"""
@summary:
---------
diff --git a/feapder/core/collector.py b/feapder/core/collector.py
index 4e063a7b..5b8ff652 100644
--- a/feapder/core/collector.py
+++ b/feapder/core/collector.py
@@ -63,7 +63,7 @@ def __input_data(self):
current_timestamp = tools.get_current_timestamp()
- # 取任务,只取当前时间搓以内的任务,同时将任务分数修改为 current_timestamp + setting.REQUEST_LOST_TIMEOUT
+ # 取任务,只取当前时间戳以内的任务,同时将任务分数修改为 current_timestamp + setting.REQUEST_LOST_TIMEOUT
requests_list = self._db.zrangebyscore_set_score(
self._tab_requests,
priority_min="-inf",
diff --git a/feapder/core/handle_failed_items.py b/feapder/core/handle_failed_items.py
index 09f1b95a..655330f5 100644
--- a/feapder/core/handle_failed_items.py
+++ b/feapder/core/handle_failed_items.py
@@ -58,7 +58,7 @@ def reput_failed_items_to_db(self):
for _data in datas:
item = UpdateItem(**_data)
item.table_name = table
- item.update_keys = update_keys
+ item.update_key = update_keys
self._item_buffer.put_item(item)
total_count += 1
diff --git a/feapder/core/parser_control.py b/feapder/core/parser_control.py
index 381a6e8a..021d2956 100644
--- a/feapder/core/parser_control.py
+++ b/feapder/core/parser_control.py
@@ -38,6 +38,8 @@ class ParserControl(threading.Thread):
_failed_task_count = 0
_total_task_count = 0
+ _hook_parsers = set()
+
def __init__(self, collector, redis_key, request_buffer, item_buffer):
super(ParserControl, self).__init__()
self._parsers = []
@@ -133,7 +135,7 @@ def deal_request(self, request):
)
)
used_download_midware_enable = True
- if not response:
+ if response is None:
response = (
request_temp.get_response()
if not setting.RESPONSE_CACHED_USED
@@ -236,6 +238,8 @@ def deal_request(self, request):
self.record_download_status(
ParserControl.DOWNLOAD_EXCEPTION, parser.name
)
+ if request.retry_times % setting.PROXY_MAX_FAILED_TIMES == 0:
+ request.del_proxy()
else:
# 记录解析程序异常
@@ -431,21 +435,19 @@ def stop(self):
def add_parser(self, parser: BaseParser):
# Dynamically extend the parameters of parser.exception_request and parser.failed_request, for compatibility with older versions
- if len(inspect.getfullargspec(parser.exception_request).args) == 3:
- _exception_request = parser.exception_request
-
- def exception_request(request, response, e):
- return _exception_request(request, response)
-
- parser.exception_request = exception_request
-
- if len(inspect.getfullargspec(parser.failed_request).args) == 3:
- _failed_request = parser.failed_request
-
- def failed_request(request, response, e):
- return _failed_request(request, response)
+ if parser not in self.__class__._hook_parsers:
+ self.__class__._hook_parsers.add(parser)
+ if len(inspect.getfullargspec(parser.exception_request).args) == 3:
+ _exception_request = parser.exception_request
+ parser.exception_request = (
+ lambda request, response, e: _exception_request(request, response)
+ )
- parser.failed_request = failed_request
+ if len(inspect.getfullargspec(parser.failed_request).args) == 3:
+ _failed_request = parser.failed_request
+ parser.failed_request = lambda request, response, e: _failed_request(
+ request, response
+ )
self._parsers.append(parser)
@@ -543,7 +545,7 @@ def deal_request(self, request):
)
request = request_temp
- if not response:
+ if response is None:
response = (
request.get_response()
if not setting.RESPONSE_CACHED_USED
@@ -611,6 +613,8 @@ def deal_request(self, request):
self.record_download_status(
ParserControl.DOWNLOAD_EXCEPTION, parser.name
)
+ if request.retry_times % setting.PROXY_MAX_FAILED_TIMES == 0:
+ request.del_proxy()
else:
# 记录解析程序异常
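Review note, not part of the patch: the backward-compatibility wrapper in `add_parser` relies on `inspect.getfullargspec`, which still lists `self` for bound methods, so an argument count of 3 means the old `(self, request, response)` signature. The sketch below reproduces the idea with hypothetical parser classes; `adapt_failed_request` is not a feapder function.

```python
import inspect


class LegacyParser:
    # Old-style hook: no exception argument.
    def failed_request(self, request, response):
        return ("legacy", request, response)


class ModernParser:
    def failed_request(self, request, response, e):
        return ("modern", request, response, e)


def adapt_failed_request(parser):
    """Wrap a two-argument failed_request so it accepts (request, response, e)."""
    original = parser.failed_request
    # For a bound method the spec includes `self`: 3 == (self, request, response).
    if len(inspect.getfullargspec(original).args) == 3:
        parser.failed_request = lambda request, response, e: original(
            request, response
        )
    return parser


legacy = adapt_failed_request(LegacyParser())
modern = adapt_failed_request(ModernParser())
error = ValueError("boom")
legacy_result = legacy.failed_request("req", "resp", error)
modern_result = modern.failed_request("req", "resp", error)
```

The wrapping lambda itself has exactly three arguments, so adapting the same parser a second time would wrap the wrapper and break the call. That is presumably why the patch tracks already-hooked parsers in `_hook_parsers`.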
diff --git a/feapder/core/scheduler.py b/feapder/core/scheduler.py
index 011c42d9..0177d185 100644
--- a/feapder/core/scheduler.py
+++ b/feapder/core/scheduler.py
@@ -17,8 +17,8 @@
from feapder.buffer.request_buffer import RequestBuffer
from feapder.core.base_parser import BaseParser
from feapder.core.collector import Collector
-from feapder.core.handle_failed_requests import HandleFailedRequests
from feapder.core.handle_failed_items import HandleFailedItems
+from feapder.core.handle_failed_requests import HandleFailedRequests
from feapder.core.parser_control import ParserControl
from feapder.db.redisdb import RedisDB
from feapder.network.item import Item
@@ -26,6 +26,7 @@
from feapder.utils import metrics
from feapder.utils.log import log
from feapder.utils.redis_lock import RedisLock
+from feapder.utils.tail_thread import TailThread
SPIDER_START_TIME_KEY = "spider_start_time"
SPIDER_END_TIME_KEY = "spider_end_time"
@@ -33,7 +34,7 @@
HEARTBEAT_TIME_KEY = "heartbeat_time"
-class Scheduler(threading.Thread):
+class Scheduler(TailThread):
__custom_setting__ = {}
def __init__(
@@ -122,8 +123,7 @@ def __init__(
setattr(setting, "SPIDER_THREAD_COUNT", thread_count)
self._thread_count = setting.SPIDER_THREAD_COUNT
- self._spider_name = redis_key
- self._project_name = redis_key.split(":")[0]
+ self._spider_name = self.name
self._task_table = task_table
self._tab_spider_status = setting.TAB_SPIDER_STATUS.format(redis_key=redis_key)
@@ -137,9 +137,6 @@ def __init__(
self._stop_heartbeat = False # whether to stop the heartbeat
self._redisdb = RedisDB()
- self._project_total_state_table = "{}_total_state".format(self._project_name)
- self._is_exist_project_total_state_table = False
-
# Request cache settings
Request.cached_redis_key = redis_key
Request.cached_expire_time = setting.RESPONSE_CACHED_EXPIRE_TIME
@@ -155,6 +152,8 @@ def __init__(
# Reset lost tasks
self.reset_task()
+ self._stop_spider = False
+
def init_metrics(self):
"""
Initialize the metrics system
@@ -176,7 +175,7 @@ def run(self):
while True:
try:
- if self.all_thread_is_done():
+ if self._stop_spider or self.all_thread_is_done():
if not self._is_notify_end:
self.spider_end() # one round finished
self._is_notify_end = True
@@ -196,15 +195,13 @@ def run(self):
tools.delay_time(1) # check the spider status once per second
def __add_task(self):
- # Start the parsers' start_requests
- self.spider_begin() # for spiders that do not end automatically, this can only run once
-
# Check whether the task pool still has tasks; if so, continue crawling them
todo_task_count = self._collector.get_requests_count()
if todo_task_count:
log.info("检查到有待做任务 %s 条,不重下发新任务,将接着上回异常终止处继续抓取" % todo_task_count)
else:
for parser in self._parsers:
+ # Start the parser's start_requests
results = parser.start_requests()
# Add requests to the request queue, which handles writing them to the store
if results and not isinstance(results, Iterable):
@@ -237,6 +234,8 @@ def __add_task(self):
self._item_buffer.flush()
def _start(self):
+ self.spider_begin()
+
# Write failed items to the database
if setting.RETRY_FAILED_ITEMS:
handle_failed_items = HandleFailedItems(
@@ -487,8 +486,9 @@ def spider_end(self):
spand_time = tools.get_current_timestamp() - begin_timestamp
- msg = "《%s》爬虫结束,耗时 %s" % (
+ msg = "《%s》爬虫%s,采集耗时 %s" % (
self._spider_name,
+ "被终止" if self._stop_spider else "结束",
tools.format_seconds(spand_time),
)
log.info(msg)
@@ -586,3 +586,6 @@ def reset_task(self, heartbeat_interval=10):
lose_count = len(datas)
if lose_count:
log.info("重置丢失任务完毕,共{}条".format(len(datas)))
+
+ def stop_spider(self):
+ self._stop_spider = True
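Review note, not part of the patch: the new `stop_spider` method is a cooperative stop flag that the `run` loop polls. The sketch below shows the pattern with a plain `threading.Thread`; the `Worker` class and its `end_reason` attribute are hypothetical, and `TailThread` is deliberately not modeled because its behavior is not shown in this diff.

```python
import threading
import time


class Worker(threading.Thread):
    def __init__(self):
        super().__init__()
        self._stop_spider = False
        self.end_reason = None

    def all_thread_is_done(self):
        # Pretend there is always more work, so only the flag can end the loop.
        return False

    def run(self):
        while True:
            if self._stop_spider or self.all_thread_is_done():
                self.end_reason = "terminated" if self._stop_spider else "finished"
                break
            time.sleep(0.01)  # poll interval, much shorter than the 1s in the patch

    def stop_spider(self):
        self._stop_spider = True


worker = Worker()
worker.start()
worker.stop_spider()
worker.join(timeout=2)
```

The flag is only checked between polls, so a stop request takes effect at the next loop iteration and does not interrupt work already in progress.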
diff --git a/feapder/core/spiders/air_spider.py b/feapder/core/spiders/air_spider.py
index d2ef4868..70c30112 100644
--- a/feapder/core/spiders/air_spider.py
+++ b/feapder/core/spiders/air_spider.py
@@ -8,8 +8,6 @@
@email: boris_liu@foxmail.com
"""
-from threading import Thread
-
import feapder.setting as setting
import feapder.utils.tools as tools
from feapder.buffer.item_buffer import ItemBuffer
@@ -20,9 +18,10 @@
from feapder.network.request import Request
from feapder.utils import metrics
from feapder.utils.log import log
+from feapder.utils.tail_thread import TailThread
-class AirSpider(BaseParser, Thread):
+class AirSpider(BaseParser, TailThread):
__custom_setting__ = {}
def __init__(self, thread_count=None):
@@ -41,11 +40,12 @@ def __init__(self, thread_count=None):
self._memory_db = MemoryDB()
self._parser_controls = []
- self._item_buffer = ItemBuffer(redis_key="air_spider")
+ self._item_buffer = ItemBuffer(redis_key=self.name)
self._request_buffer = AirSpiderRequestBuffer(
db=self._memory_db, dedup_name=self.name
)
+ self._stop_spider = False
metrics.init(**setting.METRICS_OTHER_ARGS)
def distribute_task(self):
@@ -97,7 +97,7 @@ def run(self):
while True:
try:
- if self.all_thread_is_done():
+ if self._stop_spider or self.all_thread_is_done():
# stop parser_controls
for parser_control in self._parser_controls:
parser_control.stop()
@@ -108,7 +108,10 @@ def run(self):
# close webdriver
Request.render_downloader and Request.render_downloader.close_all()
- log.info("无任务,爬虫结束")
+ if self._stop_spider:
+ log.info("爬虫被终止")
+ else:
+ log.info("无任务,爬虫结束")
break
except Exception as e:
@@ -130,3 +133,6 @@ def join(self, timeout=None):
return
super().join()
+
+ def stop_spider(self):
+ self._stop_spider = True
diff --git a/feapder/core/spiders/batch_spider.py b/feapder/core/spiders/batch_spider.py
index edbc2918..6b2ae092 100644
--- a/feapder/core/spiders/batch_spider.py
+++ b/feapder/core/spiders/batch_spider.py
@@ -1002,7 +1002,7 @@ def run(self):
while True:
try:
- if (
+ if self._stop_spider or (
self.task_is_done() and self.all_thread_is_done()
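): # all redis tasks are done and all mysql tasks are done (all_thread_is_done checks every thread, which prevents updating the task status and ending the program before the tasks are finished)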
if not self._is_notify_end:
diff --git a/feapder/core/spiders/spider.py b/feapder/core/spiders/spider.py
index a2a726e4..a1097559 100644
--- a/feapder/core/spiders/spider.py
+++ b/feapder/core/spiders/spider.py
@@ -184,7 +184,7 @@ def run(self):
while True:
try:
- if self.all_thread_is_done():
+ if self._stop_spider or self.all_thread_is_done():
if not self._is_notify_end:
self.spider_end() # one round finished
self._is_notify_end = True
diff --git a/feapder/core/spiders/task_spider.py b/feapder/core/spiders/task_spider.py
index 603988fd..41cb3596 100644
--- a/feapder/core/spiders/task_spider.py
+++ b/feapder/core/spiders/task_spider.py
@@ -50,6 +50,7 @@ def __init__(
delete_keys=(),
keep_alive=None,
batch_interval=0,
+ use_mysql=True,
**kwargs,
):
"""
@@ -91,6 +92,7 @@ def __init__(
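@param task_condition: task condition, i.e. the clause after WHERE, used to pick the tasks belonging to this spider out of a large task table
@param task_order_by: ordering used when fetching tasks, e.g. id desc
@param batch_interval: crawl interval in days, default 0. On repeated starts, the spider only runs when the time since the first crawl finished exceeds this interval
+ @param use_mysql: whether to use the mysql database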
---------
@result:
"""
@@ -109,7 +111,7 @@ def __init__(
)
self._redisdb = RedisDB()
- self._mysqldb = MysqlDB()
+ self._mysqldb = MysqlDB() if use_mysql else None
self._task_table = task_table # task table in mysql
self._task_keys = task_keys # task fields to fetch
@@ -516,7 +518,7 @@ def run(self):
while True:
try:
- if (
+ if self._stop_spider or (
self.all_thread_is_done()
and self.task_is_done()
and self.related_spider_is_done()
diff --git a/feapder/db/mongodb.py b/feapder/db/mongodb.py
index e826b2bb..791fe0d9 100644
--- a/feapder/db/mongodb.py
+++ b/feapder/db/mongodb.py
@@ -12,7 +12,7 @@
from urllib import parse
import pymongo
-from pymongo import MongoClient
+from pymongo import MongoClient, UpdateOne, UpdateMany
from pymongo.collection import Collection
from pymongo.database import Database
from pymongo.errors import DuplicateKeyError, BulkWriteError
@@ -23,30 +23,33 @@
class MongoDB:
def __init__(
- self,
- ip=None,
- port=None,
- db=None,
- user_name=None,
- user_pass=None,
- url=None,
- **kwargs,
+ self,
+ ip=None,
+ port=None,
+ db=None,
+ user_name=None,
+ user_pass=None,
+ url=None,
+ **kwargs,
):
+ if not ip:
+ ip = setting.MONGO_IP
+ if not port:
+ port = setting.MONGO_PORT
+ if not db:
+ db = setting.MONGO_DB
+ if not user_name:
+ user_name = setting.MONGO_USER_NAME
+ if not user_pass:
+ user_pass = setting.MONGO_USER_PASS
+ if not url:
+ url = setting.MONGO_URL
+
if url:
self.client = MongoClient(url, **kwargs)
else:
- if not ip:
- ip = setting.MONGO_IP
- if not port:
- port = setting.MONGO_PORT
- if not db:
- db = setting.MONGO_DB
- if not user_name:
- user_name = setting.MONGO_USER_NAME
- if not user_pass:
- user_pass = setting.MONGO_USER_PASS
self.client = MongoClient(
- host=ip, port=port, username=user_name, password=user_pass
+ host=ip, port=port, username=user_name, password=user_pass, **kwargs
)
self.db = self.get_database(db)
@@ -94,7 +97,7 @@ def get_collection(self, coll_name, **kwargs) -> Collection:
return self.db.get_collection(coll_name, **kwargs)
def find(
- self, coll_name: str, condition: Optional[Dict] = None, limit: int = 0, **kwargs
+ self, coll_name: str, condition: Optional[Dict] = None, limit: int = 0, **kwargs
) -> List[Dict]:
"""
@summary:
@@ -133,13 +136,13 @@ def find(
return dataset
def add(
- self,
- coll_name,
- data: Dict,
- replace=False,
- update_columns=(),
- update_columns_value=(),
- insert_ignore=False,
+ self,
+ coll_name,
+ data: Dict,
+ replace=False,
+ update_columns=(),
+ update_columns_value=(),
+ insert_ignore=False,
):
"""
添加单条数据
@@ -195,13 +198,13 @@ def add(
return affect_count
def add_batch(
- self,
- coll_name: str,
- datas: List[Dict],
- replace=False,
- update_columns=(),
- update_columns_value=(),
- condition_fields: dict = None,
+ self,
+ coll_name: str,
+ datas: List[Dict],
+ replace=False,
+ update_columns=(),
+ update_columns_value=(),
+ condition_fields: dict = None,
):
"""
批量添加数据
@@ -331,6 +334,70 @@ def update(self, coll_name, data: Dict, condition: Dict, upsert: bool = False):
else:
return True
+ def update_many(self, coll_name, data: Dict, condition: Dict, upsert: bool = False):
+ """
+ Batch update
+ Args:
+ coll_name: collection name
+ data: a single document {"xxx":"xxx"}
+ condition: update condition {"_id": "xxxx"}
+ upsert: insert when the data does not exist, default False
+
+ Returns: True / False
+ """
+ try:
+ collection = self.get_collection(coll_name)
+ collection.update_many(condition, {"$set": data}, upsert=upsert)
+ except Exception as e:
+ log.error(
+ """
+ error:{}
+ condition: {}
+ """.format(
+ e, condition
+ )
+ )
+ return False
+ else:
+ return True
+
+ def update_batch(
+ self,
+ coll_name: str,
+ update_data_list: List[Dict],
+ condition_field: str,
+ upsert: bool = False,
+ ):
+ """
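+ Batch update data
+ Args:
+ coll_name: collection name
+ update_data_list: list of update data
+ condition_field: field used as the update condition
+ upsert: insert when the data does not exist, default False
+
+ Returns: number of updated rows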
+
+ """
+ if not update_data_list:
+ return 0
+
+ collection = self.get_collection(coll_name)
+ bulk_operations = []
+
+ for update_data in update_data_list:
+ condition = {condition_field: update_data.get(condition_field)}
+ update_operation = UpdateMany(
+ condition, {"$set": update_data}, upsert=upsert
+ )
+ bulk_operations.append(update_operation)
+ try:
+ result = collection.bulk_write(bulk_operations, ordered=False)
+ return result.modified_count + result.upserted_count
+ except BulkWriteError as e:
+ log.error(f"Bulk write error: {e.details}")
+ return 0
+
def delete(self, coll_name, condition: Dict) -> bool:
"""
删除
@@ -401,7 +468,7 @@ def get_index_key(self, coll_name, index_name):
return index_keys
def __get_update_condition(
- self, coll_name: str, data: dict, duplicate_errmsg: str
+ self, coll_name: str, data: dict, duplicate_errmsg: str
) -> dict:
"""
根据索引冲突的报错信息 获取更新条件
@@ -420,3 +487,15 @@ def __get_update_condition(
def __getattr__(self, name):
return getattr(self.db, name)
+
+
+if __name__ == "__main__":
+ update_data_list = [{"_id": "1", "status": 1}, {"_id": "2", "status": 1}]
+ mongo = MongoDB()
+ updated_count = mongo.update_batch("your_table_name", update_data_list, "_id")
+ print(f"Updated {updated_count} documents.")
+
+ id_list = ["1", "2"]
+ result = mongo.update_many(
+ "your_table_name", {"status": 1}, {"_id": {"$in": id_list}}
+ )
diff --git a/feapder/db/mysqldb.py b/feapder/db/mysqldb.py
index 5677a8fa..9043bafe 100644
--- a/feapder/db/mysqldb.py
+++ b/feapder/db/mysqldb.py
@@ -41,7 +41,7 @@ def wapper(*args, **kwargs):
class MysqlDB:
def __init__(
- self, ip=None, port=None, db=None, user_name=None, user_pass=None, **kwargs
+ self, ip=None, port=None, db=None, user_name=None, user_pass=None, charset="utf8mb4", set_session=None, **kwargs
):
# Values in setting may be changed later, so defaults cannot be assigned directly here; they must be loaded lazily
if not ip:
@@ -68,8 +68,10 @@ def __init__(
user=user_name,
passwd=user_pass,
db=db,
- charset="utf8mb4",
+ charset=charset,
+ setsession=set_session,
cursorclass=cursors.SSCursor,
+ **kwargs
) # cursorclass uses a server-side cursor; the default one makes memory grow during large multi-threaded batch inserts
except Exception as e:
@@ -83,7 +85,7 @@ def __init__(
user_pass: {}
exception: {}
""".format(
- ip, port, db, user_name, user_pass, e
+ ip, port, db, user_name, user_pass, charset, e
)
)
else:
@@ -117,7 +119,9 @@ def from_url(cls, url, **kwargs):
"user_pass": url_parsed.password.strip(),
"db": url_parsed.path.strip("/").strip(),
}
-
+ # Parse query string parameters, e.g. ?charset=utf8
+ query_params = dict(parse.parse_qsl(url_parsed.query))
+ connect_params.update(query_params)
connect_params.update(kwargs)
return cls(**connect_params)
@@ -190,7 +194,7 @@ def find(self, sql, limit=0, to_json=False, conver_col=True):
else:
result = cursor.fetchall()
- if to_json:
+ if to_json and result:
columns = [i[0] for i in cursor.description]
# process the data
@@ -198,7 +202,7 @@ def convert(col):
if isinstance(col, (datetime.date, datetime.time)):
return str(col)
elif isinstance(col, str) and (
- col.startswith("{") or col.startswith("[")
+ col.startswith("{") or col.startswith("[")
):
try:
# col = self.unescape_string(col)
@@ -269,12 +273,13 @@ def add_smart(self, table, data: Dict, **kwargs):
sql = make_insert_sql(table, data, **kwargs)
return self.add(sql)
- def add_batch(self, sql, datas: List[Dict]):
+ def add_batch(self, sql, datas: List[List]):
"""
@summary: batch insert data
---------
- @ param sql: insert ignore into (xxx,xxx) values (%s, %s, %s)
- # param datas: list [{}, {}, {}]
+ @ param sql: insert ignore into (xxx,xxx,xxx) values (%s, %s, %s)
+ @ param datas: list [[v1,v2,v3], [v1,v2,v3]]
+ values in each inner list must follow the same order as the inserted keys
---------
@result: number of rows added
"""
@@ -299,7 +304,7 @@ def add_batch(self, sql, datas: List[Dict]):
return affect_count
- def add_batch_smart(self, table, datas: List[Dict], **kwargs):
+ def add_batch_smart(self, table, datas: List[Dict], **kwargs) -> int:
"""
Batch insert data by passing list-format data directly, without building SQL
Args:
@@ -313,12 +318,13 @@ def add_batch_smart(self, table, datas: List[Dict], **kwargs):
sql, datas = make_batch_sql(table, datas, **kwargs)
return self.add_batch(sql, datas)
- def update(self, sql):
+ def update(self, sql) -> int:
+ affect_count = None
conn, cursor = None, None
try:
conn, cursor = self.get_connection()
- cursor.execute(sql)
+ affect_count = cursor.execute(sql)
conn.commit()
except Exception as e:
log.error(
@@ -328,13 +334,12 @@ def update(self, sql):
"""
% (e, sql)
)
- return False
- else:
- return True
finally:
self.close_connection(conn, cursor)
- def update_smart(self, table, data: Dict, condition):
+ return affect_count
+
+ def update_smart(self, table, data: Dict, condition) -> int:
"""
Update without building SQL
Args:
@@ -342,25 +347,26 @@ def update_smart(self, table, data: Dict, condition):
data: data {"xxx":"xxx"}
condition: update condition, the clause after WHERE, e.g. condition='status=1'
- Returns: True / False
+ Returns: number of affected rows
"""
sql = make_update_sql(table, data, condition)
return self.update(sql)
- def delete(self, sql):
+ def delete(self, sql) -> int:
"""
Delete
Args:
sql:
- Returns: True / False
+ Returns: number of affected rows
"""
+ affect_count = None
conn, cursor = None, None
try:
conn, cursor = self.get_connection()
- cursor.execute(sql)
+ affect_count = cursor.execute(sql)
conn.commit()
except Exception as e:
log.error(
@@ -370,17 +376,24 @@ def delete(self, sql):
"""
% (e, sql)
)
- return False
- else:
- return True
finally:
self.close_connection(conn, cursor)
- def execute(self, sql):
+ return affect_count
+
+ def execute(self, sql) -> int:
+ """
+
+ Args:
+ sql:
+
+ Returns: number of affected rows
+ """
+ affect_count = None
conn, cursor = None, None
try:
conn, cursor = self.get_connection()
- cursor.execute(sql)
+ affect_count = cursor.execute(sql)
conn.commit()
except Exception as e:
log.error(
@@ -390,8 +403,7 @@ def execute(self, sql):
"""
% (e, sql)
)
- return False
- else:
- return True
finally:
self.close_connection(conn, cursor)
+
+ return affect_count
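Review note, not part of the patch: the `from_url` change merges query-string options into the connection parameters via `parse_qsl`. A standalone sketch of that merge, using only the standard library, is shown below. `connect_params_from_url` is a hypothetical helper and the URL is made up; the key names follow the `MysqlDB.__init__` parameters visible in this diff.

```python
from urllib import parse


def connect_params_from_url(url, **kwargs):
    """Build connection parameters from a URL, its query string, and overrides."""
    url_parsed = parse.urlparse(url)
    params = {
        "ip": url_parsed.hostname,
        "port": url_parsed.port,
        "user_name": url_parsed.username,
        "user_pass": url_parsed.password,
        "db": url_parsed.path.strip("/"),
    }
    # Query options such as ?charset=utf8 are merged next ...
    params.update(dict(parse.parse_qsl(url_parsed.query)))
    # ... and explicit keyword arguments win over everything else.
    params.update(kwargs)
    return params


url = "mysql://root:secret@localhost:3306/feapder?charset=utf8"
params = connect_params_from_url(url)
overridden = connect_params_from_url(url, charset="latin1")
```

Values taken from the query string are always strings, so a numeric option passed this way would need converting before it reaches the driver.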
diff --git a/feapder/db/redisdb.py b/feapder/db/redisdb.py
index a30e0576..d882e687 100644
--- a/feapder/db/redisdb.py
+++ b/feapder/db/redisdb.py
@@ -6,16 +6,15 @@
---------
@author: Boris
"""
-
+import os
import time
+from typing import Union, List
import redis
-from redis._compat import unicode, long, basestring
from redis.connection import Encoder as _Encoder
from redis.exceptions import ConnectionError, TimeoutError
from redis.exceptions import DataError
from redis.sentinel import Sentinel
-from rediscluster import RedisCluster
import feapder.setting as setting
from feapder.utils.log import log
@@ -34,19 +33,19 @@ def encode(self, value):
# )
elif isinstance(value, float):
value = repr(value).encode()
- elif isinstance(value, (int, long)):
+ elif isinstance(value, int):
# python 2 repr() on longs is '123L', so use str() instead
value = str(value).encode()
elif isinstance(value, (list, dict, tuple)):
- value = unicode(value)
- elif not isinstance(value, basestring):
+ value = str(value)
+ elif not isinstance(value, str):
# a value we don't know how to deal with. throw an error
typename = type(value).__name__
raise DataError(
"Invalid input of type: '%s'. Convert to a "
"bytes, string, int or float first." % typename
)
- if isinstance(value, unicode):
+ if isinstance(value, str):
value = value.encode(self.encoding, self.encoding_errors)
return value
@@ -87,6 +86,8 @@ def __init__(
user_pass = setting.REDISDB_USER_PASS
if service_name is None:
service_name = setting.REDISDB_SERVICE_NAME
+ if kwargs is None:
+ kwargs = setting.REDISDB_KWARGS
self._is_redis_cluster = False
@@ -156,6 +157,12 @@ def get_connect(self):
)
else:
+ try:
+ from rediscluster import RedisCluster
+ except ModuleNotFoundError as e:
+ log.error('请安装 pip install "feapder[all]"')
+ os._exit(0)
+
# log.debug("使用redis集群模式")
self._redis = RedisCluster(
startup_nodes=startup_nodes,
@@ -180,7 +187,7 @@ def get_connect(self):
self._is_redis_cluster = False
else:
self._redis = redis.StrictRedis.from_url(
- self._url, decode_responses=self._decode_responses
+ self._url, decode_responses=self._decode_responses, **self._kwargs
)
self._is_redis_cluster = False
@@ -583,18 +590,17 @@ def zexists(self, table, values):
return is_exists
def lpush(self, table, values):
-
if isinstance(values, list):
pipe = self._redis.pipeline()
if not self._is_redis_cluster:
pipe.multi()
for value in values:
- pipe.rpush(table, value)
+ pipe.lpush(table, value)
pipe.execute()
else:
- return self._redis.rpush(table, values)
+ return self._redis.lpush(table, values)
def lpop(self, table, count=1):
"""
@@ -738,27 +744,41 @@ def hget_count(self, table):
def hkeys(self, table):
return self._redis.hkeys(table)
- def setbit(self, table, offsets, values):
+ def hvals(self, key):
+ return self._redis.hvals(key)
+
+ def setbit(
+ self, table, offsets: Union[int, List[int]], values: Union[int, List[int]]
+ ):
"""
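- Set the value of a bit in the string bit array, return the previous value
- @param table:
+ Set the value of a bit in the string bit array and return the previous value
+ @param table: Redis key
@param offsets: a list or a single value
@param values: a list or a single value
@return: list / single value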
"""
if isinstance(offsets, list):
- if not isinstance(values, list):
- values = [values] * len(offsets)
+ if isinstance(values, int):
+ # Use a Lua script: the data is sent to redis in one call, which lowers network overhead, but redis blocks while the script runs
+ script = """
+ local value = table.remove(ARGV, 1)
+ local offsets = ARGV
+ local results = {}
+ for i, offset in ipairs(offsets) do
+ results[i] = redis.call('SETBIT', KEYS[1], offset, value)
+ end
+ return results
+ """
+ return self._redis.eval(script, 1, table, values, *offsets)
else:
assert len(offsets) == len(values), "offsets值要与values值一一对应"
+ pipe = self._redis.pipeline()
+ pipe.multi()
- pipe = self._redis.pipeline()
- pipe.multi()
-
- for offset, value in zip(offsets, values):
- pipe.setbit(table, offset, value)
+ for offset, value in zip(offsets, values):
+ pipe.setbit(table, offset, value)
- return pipe.execute()
+ return pipe.execute()
else:
return self._redis.setbit(table, offsets, values)
@@ -785,6 +805,20 @@ def bitcount(self, table):
return self._redis.bitcount(table)
def strset(self, table, value, **kwargs):
+ """
+ Set the value of a key
+ Args:
+ table:
+ value:
+ **kwargs:
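+ ex: Union[None, int, timedelta] = ..., expire the key after the given number of seconds
+ px: Union[None, int, timedelta] = ..., expire the key after the given number of milliseconds
+ nx: bool = ..., only set the key if it does not exist
+ xx: bool = ..., only set the key if it already exists
+ keepttl: bool = ..., keep the key's existing expiry time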
+ Returns:
+
+ """
return self._redis.set(table, value, **kwargs)
def str_incrby(self, table, value):
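Review note, not part of the patch: the `lpush` wrapper previously called `rpush`, so values were appended at the tail instead of the head. The in-memory sketch below uses a `deque` as a stand-in for a Redis list to show the ordering difference; the two functions are hypothetical and do not talk to Redis.

```python
from collections import deque


def lpush(queue, *values):
    # LPUSH semantics: each value is inserted at the head, one after another.
    for value in values:
        queue.appendleft(value)
    return len(queue)


def rpush(queue, *values):
    # RPUSH semantics: each value is appended at the tail.
    for value in values:
        queue.append(value)
    return len(queue)


head_first = deque()
tail_first = deque()
lpush(head_first, "a", "b", "c")
rpush(tail_first, "a", "b", "c")
```

Code that relied on the old tail-append behavior of this wrapper, for example by popping from the left to get first-in-first-out order, would see the order reversed after this change.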
diff --git a/feapder/dedup/bitarray.py b/feapder/dedup/bitarray.py
index 6d77719a..348ceb46 100644
--- a/feapder/dedup/bitarray.py
+++ b/feapder/dedup/bitarray.py
@@ -48,7 +48,7 @@ def __init__(self, num_bits):
import bitarray
except Exception as e:
raise Exception(
- "需要安装feapder完整版\ncommand: pip install feapder[all]\n若安装出错,参考:https://feapder.com/#/question/%E5%AE%89%E8%A3%85%E9%97%AE%E9%A2%98"
+ '需要安装feapder完整版\ncommand: pip install "feapder[all]"\n若安装出错,参考:https://feapder.com/#/question/%E5%AE%89%E8%A3%85%E9%97%AE%E9%A2%98'
)
self.num_bits = num_bits
@@ -127,7 +127,18 @@ def set(self, offsets, values):
@param values: 支持列表或单个值
@return: list / 单个值
"""
- return self.redis_db.setbit(self.name, offsets, values)
+ # Split offsets into batches of at most batch_size per Redis call
+ results = []
+ batch_size = 170000
+ for i in range(0, len(offsets), batch_size):
+ results.extend(
+ self.redis_db.setbit(
+ self.name,
+ offsets[i : i + batch_size],
+ values[i : i + batch_size] if isinstance(values, list) else values,
+ )
+ )
+ return results
def get(self, offsets):
return self.redis_db.getbit(self.name, offsets)
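The batching in `set` above can be sketched on its own as follows. This is an illustration under stated assumptions, not feapder's code: `set_in_batches` and the `setbit` callable are hypothetical stand-ins for the Redis-backed call, and the default batch size is simply copied from the hunk.

```python
def set_in_batches(setbit, offsets, values, batch_size=170000):
    """Call `setbit(offsets_chunk, values_chunk)` once per batch.

    `setbit` is a hypothetical callable that returns one result per offset.
    `values` may be a list aligned with `offsets`, or a single value that
    is applied to every offset.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    results = []
    for start in range(0, len(offsets), batch_size):
        chunk = offsets[start : start + batch_size]
        # Slice values with the same window only when they are per-offset.
        chunk_values = (
            values[start : start + batch_size] if isinstance(values, list) else values
        )
        results.extend(setbit(chunk, chunk_values))
    return results
```

Note that this sketch, like the hunk, assumes `offsets` is a list; a single integer offset would need a separate branch.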
diff --git a/feapder/network/downloader/__init__.py b/feapder/network/downloader/__init__.py
index 9c7cc20f..f036271e 100644
--- a/feapder/network/downloader/__init__.py
+++ b/feapder/network/downloader/__init__.py
@@ -1,4 +1,12 @@
from ._requests import RequestsDownloader
from ._requests import RequestsSessionDownloader
-from ._selenium import SeleniumDownloader
-from ._playwright import PlaywrightDownloader
+
+ # The imports below are optional dependencies
+try:
+ from ._selenium import SeleniumDownloader
+except ModuleNotFoundError:
+ pass
+try:
+ from ._playwright import PlaywrightDownloader
+except ModuleNotFoundError:
+ pass
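The optional-import pattern used above can also be written so that a missing dependency produces a readable error at the point of use. A minimal sketch, assuming hypothetical helpers `optional_import` and `require` that do not exist in feapder:

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        return None


def require(module, extra):
    """Return `module`, or raise a readable error if it is missing.

    `extra` is the name of the pip extra that would install it.
    """
    if module is None:
        raise RuntimeError(
            f'optional dependency missing; try: pip install "feapder[{extra}]"'
        )
    return module
```

Swallowing the error with `pass`, as the hunk does, keeps the package importable, but the name is then simply absent from the package namespace.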
diff --git a/feapder/network/downloader/_playwright.py b/feapder/network/downloader/_playwright.py
index 3b5a7838..facc75cd 100644
--- a/feapder/network/downloader/_playwright.py
+++ b/feapder/network/downloader/_playwright.py
@@ -58,7 +58,8 @@ def download(self, request) -> Response:
if cookies:
driver.url = url
driver.cookies = cookies
- driver.page.goto(url, wait_until=wait_until)
+ http_response = driver.page.goto(url, wait_until=wait_until)
+ status_code = http_response.status
if render_time:
tools.delay_time(render_time)
@@ -69,7 +70,7 @@ def download(self, request) -> Response:
"url": driver.page.url,
"cookies": driver.cookies,
"_content": html.encode(),
- "status_code": 200,
+ "status_code": status_code,
"elapsed": 666,
"headers": {
"User-Agent": driver.user_agent,
diff --git a/feapder/network/item.py b/feapder/network/item.py
index dd961f10..33eae79c 100644
--- a/feapder/network/item.py
+++ b/feapder/network/item.py
@@ -9,6 +9,7 @@
"""
import re
+from typing import List
import feapder.utils.tools as tools
@@ -20,12 +21,14 @@ def __new__(cls, name, bases, attrs):
attrs.setdefault("__name_underline__", None)
attrs.setdefault("__update_key__", None)
attrs.setdefault("__unique_key__", None)
+ attrs.setdefault("__pipelines__", None)
return type.__new__(cls, name, bases, attrs)
class Item(metaclass=ItemMetaclass):
- __unique_key__ = []
+ __unique_key__: List = []
+ __pipelines__: List = None
def __init__(self, **kwargs):
self.__dict__ = kwargs
@@ -64,11 +67,12 @@ def to_dict(self):
propertys = {}
for key, value in self.__dict__.items():
if key not in (
- "__name__",
- "__table_name__",
- "__name_underline__",
- "__update_key__",
- "__unique_key__",
+ "__name__",
+ "__table_name__",
+ "__name_underline__",
+ "__update_key__",
+ "__unique_key__",
+ "__pipelines__",
):
if key.startswith(f"_{self.__class__.__name__}"):
key = key.replace(f"_{self.__class__.__name__}", "")
@@ -123,13 +127,24 @@ def unique_key(self, keys):
else:
self.__unique_key__ = (keys,)
+ @property
+ def pipelines(self):
+ return self.__pipelines__ or self.__class__.__pipelines__
+
+ @pipelines.setter
+ def pipelines(self, pipelines):
+ if isinstance(pipelines, (tuple, list)):
+ self.__pipelines__ = pipelines
+ else:
+ self.__pipelines__ = (pipelines,)
+
@property
def fingerprint(self):
args = []
for key, value in self.to_dict.items():
if value:
if (self.unique_key and key in self.unique_key) or not self.unique_key:
- args.append(str(value))
+ args.append(key + str(value))
if args:
args = sorted(args)
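Why the fingerprint now prefixes each value with its key: two items that hold the same value in different fields would otherwise hash the same. A minimal sketch under assumptions: the `fingerprint` function below and its use of `hashlib.md5` are illustrative, not feapder's actual hashing helper.

```python
import hashlib


def fingerprint(data, unique_key=None, include_keys=True):
    """Hash the non-empty fields of `data`, optionally limited to `unique_key`."""
    parts = []
    for key, value in data.items():
        if not value:
            continue  # empty values do not contribute, as in the diff
        if unique_key and key not in unique_key:
            continue
        parts.append(key + str(value) if include_keys else str(value))
    if not parts:
        return None
    joined = "".join(sorted(parts))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


a = {"title": "feapder", "author": ""}
b = {"title": "", "author": "feapder"}

# Values only: both items reduce to the same string and collide.
collides = fingerprint(a, include_keys=False) == fingerprint(b, include_keys=False)
# Key + value: the field name separates them.
distinct = fingerprint(a) != fingerprint(b)
```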
diff --git a/feapder/network/proxy_pool/__init__.py b/feapder/network/proxy_pool/__init__.py
new file mode 100644
index 00000000..0a6935b6
--- /dev/null
+++ b/feapder/network/proxy_pool/__init__.py
@@ -0,0 +1,11 @@
+# -*- coding: utf-8 -*-
+"""
+Created on 2023/7/25 10:16
+---------
+@summary:
+---------
+@author: Boris
+@email: boris_liu@foxmail.com
+"""
+from .base import BaseProxyPool
+from .proxy_pool import ProxyPool
diff --git a/feapder/network/proxy_pool/base.py b/feapder/network/proxy_pool/base.py
new file mode 100644
index 00000000..0a2dc590
--- /dev/null
+++ b/feapder/network/proxy_pool/base.py
@@ -0,0 +1,43 @@
+# -*- coding: utf-8 -*-
+"""
+Created on 2023/7/25 10:03
+---------
+@summary:
+---------
+@author: Boris
+@email: boris_liu@foxmail.com
+"""
+
+import abc
+
+from feapder.utils.log import log
+
+
+class BaseProxyPool:
+ @abc.abstractmethod
+ def get_proxy(self):
+ """
+ Get a proxy
+ Returns:
+ {"http": "xxx", "https": "xxx"}
+ """
+ raise NotImplementedError
+
+ @abc.abstractmethod
+ def del_proxy(self, proxy):
+ """
+ @summary: Delete a proxy
+ ---------
+ @param proxy: ip:port
+ """
+ raise NotImplementedError
+
+ def tag_proxy(self, **kwargs):
+ """
+ @summary: Tag a proxy
+ ---------
+ @param kwargs:
+ @return:
+ """
+ log.warning("Tagging proxies is not supported yet")
+ pass
diff --git a/feapder/network/proxy_pool/proxy_pool.py b/feapder/network/proxy_pool/proxy_pool.py
new file mode 100644
index 00000000..ce492633
--- /dev/null
+++ b/feapder/network/proxy_pool/proxy_pool.py
@@ -0,0 +1,69 @@
+# -*- coding: utf-8 -*-
+"""
+Created on 2022/10/19 10:40 AM
+---------
+@summary:
+---------
+@author: Boris
+@email: boris_liu@foxmail.com
+"""
+from queue import Queue
+
+import requests
+
+import feapder.setting as setting
+from feapder.network.proxy_pool.base import BaseProxyPool
+from feapder.utils import metrics
+from feapder.utils import tools
+
+
+class ProxyPool(BaseProxyPool):
+ """
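+ Fetches proxies from an API and keeps them in memory; fetches again automatically when none are left
+ The API must return proxies separated by \r\n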
+ """
+
+ def __init__(self, proxy_api=None, **kwargs):
+ self.proxy_api = proxy_api or setting.PROXY_EXTRACT_API
+ self.proxy_queue = Queue()
+
+ def format_proxy(self, proxy):
+ return {"http": "http://" + proxy, "https": "http://" + proxy}
+
+ @tools.retry(3, interval=5)
+ def pull_proxies(self):
+ resp = requests.get(self.proxy_api)
+ proxies = resp.text.strip()
+ resp.close()
+ if "{" in proxies or not proxies:
+ raise Exception("Failed to fetch proxies", proxies)
+ # Proxies are separated by \r\n
+ return proxies.split("\r\n")
+
+ def get_proxy(self):
+ try:
+ if self.proxy_queue.empty():
+ proxies = self.pull_proxies()
+ for proxy in proxies:
+ self.proxy_queue.put_nowait(proxy)
+ metrics.emit_counter("total", 1, classify="proxy")
+
+ proxy = self.proxy_queue.get_nowait()
+ self.proxy_queue.put_nowait(proxy)
+
+ metrics.emit_counter("used_times", 1, classify="proxy")
+
+ return self.format_proxy(proxy)
+ except Exception as e:
+ tools.send_msg("Failed to get a proxy", level="error")
+ raise Exception("Failed to get a proxy", e)
+
+ def del_proxy(self, proxy):
+ """
+ @summary: Delete a proxy
+ ---------
+ @param proxy: ip:port
+ """
+ if proxy in self.proxy_queue.queue:
+ self.proxy_queue.queue.remove(proxy)
+ metrics.emit_counter("invalid", 1, classify="proxy")
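The rotation in `get_proxy` (take from the head of the queue, put back at the tail) can be sketched in isolation. `RoundRobinPool` below is a hypothetical, self-contained stand-in with no API fetching or metrics; it only illustrates the queue handling.

```python
from queue import Queue


class RoundRobinPool:
    """Hand out proxies in rotation from an in-memory queue."""

    def __init__(self, proxies):
        self._queue = Queue()
        for proxy in proxies:
            self._queue.put_nowait(proxy)

    def get(self):
        # Raises queue.Empty when the pool has no proxies left.
        proxy = self._queue.get_nowait()
        # Re-queue at the tail so the next call returns a different proxy.
        self._queue.put_nowait(proxy)
        return {"http": "http://" + proxy, "https": "http://" + proxy}

    def delete(self, proxy):
        # Queue.queue is the underlying deque; touching it directly
        # bypasses the Queue's own locking.
        if proxy in self._queue.queue:
            self._queue.queue.remove(proxy)
```

Because `delete` reaches into the underlying deque without holding the queue's lock, concurrent callers may want an explicit lock around `get` and `delete`.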
diff --git a/feapder/network/proxy_pool.py b/feapder/network/proxy_pool_old.py
similarity index 100%
rename from feapder/network/proxy_pool.py
rename to feapder/network/proxy_pool_old.py
diff --git a/feapder/network/request.py b/feapder/network/request.py
index 152e6127..95e51604 100644
--- a/feapder/network/request.py
+++ b/feapder/network/request.py
@@ -9,6 +9,7 @@
"""
import copy
+import os
import re
import requests
@@ -20,7 +21,7 @@
from feapder.db.redisdb import RedisDB
from feapder.network import user_agent
from feapder.network.downloader.base import Downloader, RenderDownloader
-from feapder.network.proxy_pool import ProxyPool
+from feapder.network.proxy_pool import BaseProxyPool
from feapder.network.response import Response
from feapder.utils.log import log
@@ -30,7 +31,7 @@
class Request:
user_agent_pool = user_agent
- proxies_pool: ProxyPool = None
+ proxies_pool: BaseProxyPool = None
cache_db = None # redis / pika
cached_redis_key = None # 缓存response的文件文件夹 response_cached:cached_redis_key:md5
@@ -195,13 +196,19 @@ def __setattr__(self, key, value):
if key in self.__class__.__REQUEST_ATTRS__:
self.requests_kwargs[key] = value
+ # def __getattr__(self, item):
+ # try:
+ # return self.__dict__[item]
+ # except:
+ # raise AttributeError("Request has no attribute %s" % item)
+
def __lt__(self, other):
return self.priority < other.priority
@property
def _proxies_pool(self):
if not self.__class__.proxies_pool:
- self.__class__.proxies_pool = ProxyPool()
+ self.__class__.proxies_pool = tools.import_cls(setting.PROXY_POOL)()
return self.__class__.proxies_pool
@@ -224,9 +231,13 @@ def _session_downloader(self):
@property
def _render_downloader(self):
if not self.__class__.render_downloader:
- self.__class__.render_downloader = tools.import_cls(
- setting.RENDER_DOWNLOADER
- )()
+ try:
+ self.__class__.render_downloader = tools.import_cls(
+ setting.RENDER_DOWNLOADER
+ )()
+ except AttributeError:
+ log.error('Render mode is enabled; please install it with: pip install "feapder[render]"')
+ os._exit(0)
return self.__class__.render_downloader
@@ -244,6 +255,7 @@ def to_dict(self):
self.download_midware = [
getattr(download_midware, "__name__")
if callable(download_midware)
+ and download_midware.__class__.__name__ == "method"
else download_midware
for download_midware in self.download_midware
]
@@ -251,6 +263,7 @@ def to_dict(self):
self.download_midware = (
getattr(self.download_midware, "__name__")
if callable(self.download_midware)
+ and self.download_midware.__class__.__name__ == "method"
else self.download_midware
)
@@ -265,11 +278,11 @@ def to_dict(self):
if value is not None:
if key in self.__class__.__REQUEST_ATTRS__:
if not isinstance(
- value, (bytes, bool, float, int, str, tuple, list, dict)
+ value, (bool, float, int, str, tuple, list, dict)
):
value = tools.dumps_obj(value)
else:
- if not isinstance(value, (bytes, bool, float, int, str)):
+ if not isinstance(value, (bool, float, int, str)):
value = tools.dumps_obj(value)
request_dict[key] = value
@@ -331,7 +344,7 @@ def make_requests_kwargs(self):
proxies = self.requests_kwargs.get("proxies", -1)
if proxies == -1 and setting.PROXY_ENABLE and setting.PROXY_EXTRACT_API:
while True:
- proxies = self._proxies_pool.get()
+ proxies = self._proxies_pool.get_proxy()
if proxies:
self.requests_kwargs.update(proxies=proxies)
break
@@ -422,6 +435,12 @@ def get_proxy(self) -> str:
"http.*?//", "", proxies.get("http", "") or proxies.get("https", "")
)
+ def del_proxy(self):
+ proxy = self.get_proxy()
+ if proxy:
+ self._proxies_pool.del_proxy(proxy)
+ del self.requests_kwargs["proxies"]
+
def get_headers(self) -> dict:
return self.requests_kwargs.get("headers", {})
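Both `setting.PROXY_POOL` and `setting.RENDER_DOWNLOADER` are resolved from a dotted path at runtime. The sketch below shows one way such a lookup can work and why a missing class surfaces as `AttributeError`, the exception caught in the `_render_downloader` hunk. This `import_cls` is a hypothetical reimplementation, not feapder's `tools.import_cls`.

```python
import importlib


def import_cls(path):
    """Resolve a dotted path such as "package.module.ClassName" to the class."""
    module_name, _, cls_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path!r}")
    module = importlib.import_module(module_name)
    # getattr raises AttributeError when the module exists but the name
    # was never defined in it, e.g. after a swallowed optional import.
    return getattr(module, cls_name)


queue_cls = import_cls("queue.Queue")
pool = queue_cls()  # instantiate, as in tools.import_cls(setting.PROXY_POOL)()
```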
diff --git a/feapder/network/response.py b/feapder/network/response.py
index 7fd78878..7f97861b 100644
--- a/feapder/network/response.py
+++ b/feapder/network/response.py
@@ -211,13 +211,14 @@ def _make_absolute(self, link):
def _absolute_links(self, text):
regexs = [
- r'(<(?i)a.*?href\s*?=\s*?["\'])(.+?)(["\'])', # a
- r'(<(?i)img.*?src\s*?=\s*?["\'])(.+?)(["\'])', # img
- r'(<(?i)link.*?href\s*?=\s*?["\'])(.+?)(["\'])', # css
- r'(<(?i)script.*?src\s*?=\s*?["\'])(.+?)(["\'])', # js
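+ r'(<a.*?href\s*?=\s*?["\'])(.+?)(["\'])', # a
+ r'(<img.*?src\s*?=\s*?["\'])(.+?)(["\'])', # img
+ r'(<link.*?href\s*?=\s*?["\'])(.+?)(["\'])', # css
+ r'(<script.*?src\s*?=\s*?["\'])(.+?)(["\'])', # js
...
# Insert a <base> tag after the <head> tag
repl = fr'\1<base href="{self.url}">'
- body = re.sub(rb"(<head(?:>|\s.*?>))", repl.encode('utf-8'), body)
+ body = re.sub(rb"(<head(?:>|\s.*?>))", repl.encode("utf-8"), body)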
fd, fname = tempfile.mkstemp(".html")
os.write(fd, body)
diff --git a/feapder/network/selector.py b/feapder/network/selector.py
index ea8b2eff..901f4eb5 100644
--- a/feapder/network/selector.py
+++ b/feapder/network/selector.py
@@ -12,6 +12,7 @@
import parsel
import six
from lxml import etree
+from packaging import version
from parsel import Selector as ParselSelector
from parsel import SelectorList as ParselSelectorList
from parsel import selector
@@ -65,7 +66,7 @@ def create_root_node(text, parser_cls, base_url=None):
return root
-if parsel.__version__ < "1.7.0":
+if version.parse(parsel.__version__) < version.parse("1.7.0"):
selector.create_root_node = create_root_node
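The switch from string comparison to `version.parse` matters once a version component reaches two digits. A small illustration, using a simplified tuple-based parser as a stand-in; `parse_simple` is hypothetical and, unlike `packaging.version.parse`, handles only plain dotted integers.

```python
def parse_simple(text):
    """Turn "1.10.0" into (1, 10, 0); plain dotted integers only."""
    return tuple(int(part) for part in text.split("."))


# Lexicographic string comparison stops at "1" < "7" in the second
# component, so "1.10.0" is wrongly treated as older than "1.7.0".
string_says_older = "1.10.0" < "1.7.0"

# Comparing numeric components gives the intended ordering.
numeric_says_older = parse_simple("1.10.0") < parse_simple("1.7.0")
```

With the old string check, a parsel release such as 1.10.0 would have taken the pre-1.7.0 branch and had `create_root_node` patched.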
diff --git a/feapder/network/user_agent.py b/feapder/network/user_agent.py
index 28df6325..7f9024d4 100644
--- a/feapder/network/user_agent.py
+++ b/feapder/network/user_agent.py
@@ -61,6 +61,683 @@
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17",
"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.15 (KHTML, like Gecko) Chrome/24.0.1295.0 Safari/537.15",
"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.14 (KHTML, like Gecko) Chrome/24.0.1292.0 Safari/537.14",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3215.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3790.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.92 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.63 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.90 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.24 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.136 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.0.3016 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36 Kinza/6.1.5",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.48 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.2.0.1713 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.47 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.2 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.819 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.41 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.785 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.9 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3235.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3409.85 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4371.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.9 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.43 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 CravingExplorer/2.4.1",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4121.813 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.107 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.9 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.158 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.58 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.140 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
+ "Mozilla/5.0 (Microsoft Windows NT 10.0.16299.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 (FTM)",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4500.0 Iron Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4427.5 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3835.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/82.0.4085.4 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.116 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.116 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.91 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.4000.0 Iron Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.41 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.116 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.41 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 5.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 ADG/11.0.2566 AOLBUILD/11.0.2566 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/78.0.3904.108 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.152 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 ADG/11.0.2510 AOLBUILD/11.0.2510 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 AOLShield/83.0.4103.0",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 AOL/11.0 AOLBUILD/11.0.1839 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 ADG/11.0.2414 AOLBUILD/11.0.2414 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 ADG/11.0.2566 AOLBUILD/11.0.2566 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 AOLShield/83.0.4103.2",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.87 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.105 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.183 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.152 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/90.0.4430.72 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 ADG/11.0.2510 AOLBUILD/11.0.2510 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 ADG/11.0.2566 AOLBUILD/11.0.2566 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/78.0.3904.97 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.105 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.182 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/78.0.3904.108 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 ADG/11.0.2510 AOLBUILD/11.0.2510 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.101 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 AOL/11.0 AOLBUILD/11.0.1839 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 ADG/11.0.2470 AOLBUILD/11.0.2470 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 ADG/11.0.2566 AOLBUILD/11.0.2566 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36 AOLShield/79.0.3945.5",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/77.0.3865.90 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/79.0.3945.88 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.162 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.89 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.99 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.141 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.72 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.123 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4558.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.102 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4564.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.72 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.81 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.81 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3409.13 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.26 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.81 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4591.54 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.101.4951.54 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.7113.93 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.49 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.54 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.1150.52 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4950.0 Iron Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4450.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 11.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4868.173 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.1483.27 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.66 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.3478.83 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.115 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.5118.205 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Agency/97.8.8247.48",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36",
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4137.1 SputnikBrowser/5.6.6280.0 (GOST) Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.43 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/82.0.4078.2 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.3538.77 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.5 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.6 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.1 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3409.631 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.3 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.101 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.2 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.93 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.8 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.5 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3409.1 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.44 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.779 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.19 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.6 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 FS",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_16_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_16_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.192 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.69 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.186 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.192 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.170 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4450.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.192 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.192 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_3_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/524.34",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.192 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.192 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.105 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.51 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.152 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.152 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.3538.77 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/77.0.3865.99 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/81.0.4044.108 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/83.0.4103.118 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/84.0.4147.108 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/84.0.4147.140 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/85.0.4183.122 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/87.0.4280.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/88.0.4324.175 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/89.0.4389.93 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/89.0.4389.127 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.75 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/79.0.3945.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.116 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/81.0.4044.113 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.135 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.75 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.141 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.72 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/78.0.3904.70 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.116 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.162 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.75 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.67 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.152 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/77.0.3865.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/78.0.3904.108 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.87 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.162 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/83.0.4103.116 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/85.0.4183.83 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.99 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.198 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.141 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.182 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/90.0.4430.72 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/79.0.3945.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/79.0.3945.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/77.0.3865.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/78.0.3904.108 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.122 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/81.0.4044.113 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.89 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/85.0.4183.102 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.183 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.146 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.72 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/78.0.3904.108 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/78.0.3904.70 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/78.0.3904.97 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/79.0.3945.130 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/78.0.3904.108 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.87 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.149 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.89 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.99 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.149 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/81.0.4044.122 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.89 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.101 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/83.0.4103.97 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.105 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.75 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/78.0.3904.87 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/83.0.4103.106 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/84.0.4147.125 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/85.0.4183.121 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.183 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.152 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/83.0.4103.116 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/85.0.4183.102 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.111 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.60 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.141 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.182 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_16_0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/80.0.3987.116 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.183 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.67 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.96 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.192 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.67 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.96 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.72 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.101 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.152 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/87.0.4280.101 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.182 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_2) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.146 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_2) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.72 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/88.0.4324.96 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.72 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_3_0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Mediapartners-Google) Chrome/89.0.4389.130 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_3_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.69 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4582.189 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/82.0.4083.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4612.206 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4702.147 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4691.94 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4889.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.79 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.79 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.9999.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.40 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4880.146 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.147 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.109 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.109 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4886.93 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/89.0.4389.105 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4886.148 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5163.147 Safari/537.36"
],
"opera": [
"Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16",
diff --git a/feapder/network/user_pool/guest_user_pool.py b/feapder/network/user_pool/guest_user_pool.py
index 0e550dde..9d34aad3 100644
--- a/feapder/network/user_pool/guest_user_pool.py
+++ b/feapder/network/user_pool/guest_user_pool.py
@@ -45,7 +45,7 @@ def __init__(
user_agent: 字符串 或 无参函数,返回值为user_agent
proxy: xxx.xxx.xxx.xxx:xxxx 或 无参函数,返回值为代理地址
headless: 是否启用无头模式
- driver_type: CHROME 或 PHANTOMJS,FIREFOX
+ driver_type: CHROME,EDGE 或 PHANTOMJS,FIREFOX
timeout: 请求超时时间
window_size: # 窗口大小
executable_path: 浏览器路径,默认为默认路径
diff --git a/feapder/network/user_pool/normal_user_pool.py b/feapder/network/user_pool/normal_user_pool.py
index f14c7656..63c99726 100644
--- a/feapder/network/user_pool/normal_user_pool.py
+++ b/feapder/network/user_pool/normal_user_pool.py
@@ -209,9 +209,9 @@ def run(self):
retry_times = 0
while retry_times <= self._login_retry_times:
try:
- user = self.login(user)
- if user:
- self.add_user(user)
+ login_user = self.login(user)
+ if login_user:
+ self.add_user(login_user)
else:
self.handle_login_failed_user(user)
break
diff --git a/feapder/pipelines/csv_pipeline.py b/feapder/pipelines/csv_pipeline.py
new file mode 100644
index 00000000..922a77d3
--- /dev/null
+++ b/feapder/pipelines/csv_pipeline.py
@@ -0,0 +1,254 @@
+# -*- coding: utf-8 -*-
+"""
+Created on 2025-10-16
+---------
+@summary: CSV 数据导出Pipeline
+---------
+@author: 道长
+@email: ctrlf4@yeah.net
+"""
+
+import csv
+import os
+import threading
+from typing import Dict, List, Tuple
+
+from feapder.pipelines import BasePipeline
+from feapder.utils.log import log
+
+
+class CsvPipeline(BasePipeline):
+ """
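+ CSV export pipeline
+
+ Saves crawled items to CSV files. Supports batch saving, thread-safe writes, and resuming an interrupted crawl.
+
+ Features:
+ - One lock per table, avoiding the contention of a single global lock
+ - Creates the export directory automatically
+ - Opens files in append mode, so a resumed crawl keeps earlier rows
+ - Calls fsync so written rows reach the disk
+ - Caches field names per table to keep column order consistent across batches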
+ """
+
+ # 用于保护每个表的文件写入操作(Per-Table Lock)
+ _file_locks = {}
+
+ # 用于缓存每个表的字段名顺序(Per-Table Fieldnames Cache)
+ # 确保跨批次、跨线程的字段顺序一致
+ _table_fieldnames = {}
+
+ def __init__(self, csv_dir=None):
+ """
+ 初始化CSV Pipeline
+
+ Args:
+ csv_dir: CSV文件保存目录
+ - 如果不传,从 setting.CSV_EXPORT_PATH 读取
+ - 支持相对路径(如 "data/csv")
+ - 支持绝对路径(如 "/Users/xxx/exports/csv")
+ """
+ super().__init__()
+
+ # 如果未传入参数,从配置文件读取
+ if csv_dir is None:
+ import feapder.setting as setting
+ csv_dir = setting.CSV_EXPORT_PATH
+
+ # 支持绝对路径和相对路径,统一转换为绝对路径
+ self.csv_dir = os.path.abspath(csv_dir)
+ self._ensure_csv_dir_exists()
+
+ def _ensure_csv_dir_exists(self):
+ """确保CSV保存目录存在"""
+ if not os.path.exists(self.csv_dir):
+ try:
+ os.makedirs(self.csv_dir, exist_ok=True)
+ log.info(f"创建CSV保存目录: {self.csv_dir}")
+ except Exception as e:
+ log.error(f"创建CSV目录失败: {e}")
+ raise
+
+ @staticmethod
+ def _get_lock(table):
+ """
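+ Return the file lock for a table
+
+ Each table has its own lock (per-table lock), so writes to different tables do not contend.
+ Writes to one table's file stay serialized while different tables can be written in parallel.
+
+ Args:
+ table: table name
+
+ Returns:
+ threading.Lock: the lock object for this table
+ """
+ # dict.setdefault is effectively atomic under the CPython GIL, so concurrent callers share one lock per table
+ lock = CsvPipeline._file_locks.setdefault(table, threading.Lock())
+ return lock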
+
+ @staticmethod
+ def _get_and_cache_fieldnames(table, items):
+ """
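+ Get and cache the field-name order for a table
+
+ The first call takes the field names from items[0] and caches them; later calls return the cached list.
+ This ensures:
+ 1. Column order stays consistent across batches (prevents misaligned columns)
+ 2. Column order is not changed by concurrent threads
+ 3. Field names are extracted only once
+
+ Args:
+ table: table name
+ items: list of dicts [{},{},...]
+
+ Returns:
+ list: field names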
+ """
+ # 如果该表已经缓存了字段名,直接返回缓存的
+ if table in CsvPipeline._table_fieldnames:
+ return CsvPipeline._table_fieldnames[table]
+
+ # 第一次调用,从items提取字段名并缓存
+ if not items:
+ return []
+
+ first_item = items[0]
+ fieldnames = list(first_item.keys()) if isinstance(first_item, dict) else []
+
+ if fieldnames:
+ # 缓存字段名(使用静态变量,跨实例共享)
+ CsvPipeline._table_fieldnames[table] = fieldnames
+ log.info(f"表 {table} 的字段名已缓存: {fieldnames}")
+
+ return fieldnames
+
+ def _get_csv_file_path(self, table):
+ """
+ 获取表对应的CSV文件路径
+
+ Args:
+ table: 表名
+
+ Returns:
+ str: CSV文件的完整路径
+ """
+ return os.path.join(self.csv_dir, f"{table}.csv")
+
+
+ def _file_exists_and_has_content(self, csv_file):
+ """
+ 检查CSV文件是否存在且有内容
+
+ Args:
+ csv_file: CSV文件路径
+
+ Returns:
+ bool: 文件存在且有内容返回True
+ """
+ return os.path.exists(csv_file) and os.path.getsize(csv_file) > 0
+
+ def save_items(self, table, items: List[Dict]) -> bool:
+ """
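+ Save items to a CSV file
+
+ The file is opened in append mode, so a resumed crawl keeps earlier rows. The header is written on the first write.
+ A per-table lock keeps concurrent writes from multiple threads consistent.
+ Cached field names keep column order consistent across batches and prevent misaligned columns.
+
+ Args:
+ table: table name (used as the CSV file name)
+ items: list of dicts, [{}, {}, ...]
+
+ Returns:
+ bool: True on success, False on failure
+ On failure ItemBuffer retries automatically (up to 10 times)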
+ """
+ if not items:
+ return True
+
+ csv_file = self._get_csv_file_path(table)
+
+ # 使用缓存机制获取字段名(关键!确保跨批字段顺序一致)
+ fieldnames = self._get_and_cache_fieldnames(table, items)
+
+ if not fieldnames:
+ log.warning(f"无法提取字段名,items: {items}")
+ return False
+
+ try:
+ # 获取表级别的锁(关键!保证文件写入安全)
+ lock = self._get_lock(table)
+ with lock:
+ # 检查文件是否已存在且有内容
+ file_exists = self._file_exists_and_has_content(csv_file)
+
+ # 以追加模式打开文件
+ with open(
+ csv_file,
+ "a",
+ encoding="utf-8",
+ newline=""
+ ) as f:
+ writer = csv.DictWriter(f, fieldnames=fieldnames)
+
+ # 如果文件不存在或为空,写入表头
+ if not file_exists:
+ writer.writeheader()
+
+ # 批量写入数据行
+ # 使用缓存的fieldnames确保列顺序一致,避免跨批数据错位
+ writer.writerows(items)
+
+ # 刷新缓冲区到磁盘,确保数据不丢失
+ f.flush()
+ os.fsync(f.fileno())
+
+ # 记录导出日志
+ log.info(
+ f"共导出 {len(items)} 条数据 到 {table}.csv (文件路径: {csv_file})"
+ )
+ return True
+
+ except Exception as e:
+ log.error(
+ f"CSV写入失败. table: {table}, csv_file: {csv_file}, error: {e}"
+ )
+ return False
+
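+ def update_items(self, table, items: List[Dict], update_keys: Tuple = ()) -> bool:
+ """
+ Update items
+
+ Note: a CSV file has no real "update" operation (that would need lookup and replace).
+ This implementation simply appends the rows, which is equivalent to INSERT.
+
+ If a real UPDATE is needed, consider:
+ 1. Regenerating the CSV file periodically
+ 2. Using a database (MySQL/MongoDB) to handle UPDATE
+ 3. Deduplicating and updating in the application layer
+
+ Args:
+ table: table name
+ items: list of dicts, [{}, {}, ...]
+ update_keys: fields to update (unused in this implementation)
+
+ Returns:
+ bool: True on success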
+ """
+ # 对于CSV,update操作实现为追加写入
+ # 若需要真正的UPDATE操作,建议在应用层处理
+ return self.save_items(table, items)
+
+ def close(self):
+ """
+ 关闭Pipeline,释放资源
+
+ 在爬虫结束时由ItemBuffer自动调用。
+ """
+ try:
+ # 清理文件锁字典(可选,用于释放内存)
+ # 在长期运行的场景下,可能需要定期清理
+ pass
+ except Exception as e:
+ log.error(f"关闭CSV Pipeline时出错: {e}")
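Review note on csv_pipeline.py: the append-with-header-once pattern used by `save_items` can be sketched in isolation as below. This is a minimal sketch, not the pipeline itself: `append_rows`, `read_back`, `_locks`, and `_fieldnames` are hypothetical names introduced for illustration. One deliberate difference is labeled in the comments: the sketch passes `restval` and `extrasaction="ignore"` to `csv.DictWriter`, whereas the default `extrasaction="raise"` raises `ValueError` when a later row carries a key that is not in the cached field names.

```python
import csv
import os
import tempfile
import threading

# Hypothetical module-level state for this sketch (not feapder's attributes).
_locks = {}
_fieldnames = {}


def append_rows(csv_dir, table, rows):
    """Append dict rows to <csv_dir>/<table>.csv, writing the header only once."""
    if not rows:
        return
    path = os.path.join(csv_dir, f"{table}.csv")
    # setdefault keeps exactly one lock and one column order per table.
    lock = _locks.setdefault(table, threading.Lock())
    fieldnames = _fieldnames.setdefault(table, list(rows[0].keys()))
    with lock:
        need_header = not os.path.exists(path) or os.path.getsize(path) == 0
        with open(path, "a", encoding="utf-8", newline="") as f:
            # Assumption of this sketch: missing keys become "" and
            # unknown keys are dropped instead of raising ValueError.
            writer = csv.DictWriter(
                f, fieldnames=fieldnames, restval="", extrasaction="ignore"
            )
            if need_header:
                writer.writeheader()
            writer.writerows(rows)
            f.flush()
            os.fsync(f.fileno())  # push the rows to disk before releasing the lock


def read_back(csv_dir, table):
    """Read the whole CSV file back as a list of rows (lists of strings)."""
    path = os.path.join(csv_dir, f"{table}.csv")
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))
```

Calling `append_rows` twice for the same table writes the header once, and the second batch follows the column order of the first even if its dicts list their keys in a different order.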
diff --git a/feapder/pipelines/mysql_pipeline.py b/feapder/pipelines/mysql_pipeline.py
index 8899761b..3ffb3fc1 100644
--- a/feapder/pipelines/mysql_pipeline.py
+++ b/feapder/pipelines/mysql_pipeline.py
@@ -45,6 +45,8 @@ def save_items(self, table, items: List[Dict]) -> bool:
log.info(
"共导出 %s 条数据 到 %s, 重复 %s 条" % (datas_size, table, datas_size - add_count)
)
+ else:
+ log.debug("没有插入数据,可能全部重复")
return add_count != None
diff --git a/feapder/requirements.txt b/feapder/requirements.txt
index 49fc6fbb..21717674 100644
--- a/feapder/requirements.txt
+++ b/feapder/requirements.txt
@@ -16,6 +16,6 @@ urllib3>=1.25.8
loguru>=0.5.3
influxdb>=5.3.1
pyperclip>=1.8.2
-webdriver-manager>=3.5.3
+webdriver-manager>=4.0.0
terminal-layout>=2.1.3
playwright
\ No newline at end of file
diff --git a/feapder/setting.py b/feapder/setting.py
index 5dd18246..c52b318c 100644
--- a/feapder/setting.py
+++ b/feapder/setting.py
@@ -27,12 +27,15 @@
MONGO_DB = os.getenv("MONGO_DB")
MONGO_USER_NAME = os.getenv("MONGO_USER_NAME")
MONGO_USER_PASS = os.getenv("MONGO_USER_PASS")
+MONGO_URL = os.getenv("MONGO_URL")
# REDIS
# ip:port 多个可写为列表或者逗号隔开 如 ip1:port1,ip2:port2 或 ["ip1:port1", "ip2:port2"]
REDISDB_IP_PORTS = os.getenv("REDISDB_IP_PORTS")
REDISDB_USER_PASS = os.getenv("REDISDB_USER_PASS")
REDISDB_DB = int(os.getenv("REDISDB_DB", 0))
+# 连接redis时携带的其他参数,如ssl=True
+REDISDB_KWARGS = dict()
# 适用于redis哨兵模式
REDISDB_SERVICE_NAME = os.getenv("REDISDB_SERVICE_NAME")
@@ -40,8 +43,10 @@
ITEM_PIPELINES = [
"feapder.pipelines.mysql_pipeline.MysqlPipeline",
# "feapder.pipelines.mongo_pipeline.MongoPipeline",
+ # "feapder.pipelines.csv_pipeline.CsvPipeline",
# "feapder.pipelines.console_pipeline.ConsolePipeline",
]
+CSV_EXPORT_PATH = "data/csv" # CSV文件保存路径,支持相对路径和绝对路径
EXPORT_DATA_MAX_FAILED_TIMES = 10 # 导出数据时最大的失败次数,包括保存和更新,超过这个次数报警
EXPORT_DATA_MAX_RETRY_TIMES = 10 # 导出数据时最大的重试次数,包括保存和更新,超过这个次数则放弃重试
@@ -65,7 +70,7 @@
user_agent=None, # 字符串 或 无参函数,返回值为user_agent
proxy=None, # xxx.xxx.xxx.xxx:xxxx 或 无参函数,返回值为代理地址
headless=False, # 是否为无头浏览器
- driver_type="CHROME", # CHROME、PHANTOMJS、FIREFOX
+ driver_type="CHROME", # CHROME、EDGE、PHANTOMJS、FIREFOX
timeout=30, # 请求超时时间
window_size=(1024, 800), # 窗口大小
executable_path=None, # 浏览器路径,默认为默认路径
@@ -130,6 +135,8 @@
# 设置代理
PROXY_EXTRACT_API = None # 代理提取API ,返回的代理分割符为\r\n
PROXY_ENABLE = True
+PROXY_MAX_FAILED_TIMES = 5 # 代理最大失败次数,超过则不使用,自动删除
+PROXY_POOL = "feapder.network.proxy_pool.ProxyPool" # 代理池
# 随机headers
RANDOM_HEADERS = True
@@ -141,9 +148,9 @@
USE_SESSION = False
# 下载
-DOWNLOADER = "feapder.network.downloader.RequestsDownloader"
+DOWNLOADER = "feapder.network.downloader.RequestsDownloader" # 请求下载器
SESSION_DOWNLOADER = "feapder.network.downloader.RequestsSessionDownloader"
-RENDER_DOWNLOADER = "feapder.network.downloader.SeleniumDownloader"
+RENDER_DOWNLOADER = "feapder.network.downloader.SeleniumDownloader" # 渲染下载器
# RENDER_DOWNLOADER="feapder.network.downloader.PlaywrightDownloader"
MAKE_ABSOLUTE_LINKS = True # 自动转成绝对连接
@@ -161,8 +168,10 @@
# 报警 支持钉钉、飞书、企业微信、邮件
# 钉钉报警
DINGDING_WARNING_URL = "" # 钉钉机器人api
-DINGDING_WARNING_PHONE = "" # 报警人 支持列表,可指定多个
+DINGDING_WARNING_PHONE = "" # 被@的群成员手机号,支持列表,可指定多个。
+DINGDING_WARNING_USER_ID = "" # 被@的群成员userId,支持列表,可指定多个
DINGDING_WARNING_ALL = False # 是否提示所有人, 默认为False
+DINGDING_WARNING_SECRET = None # 加签密钥
# 飞书报警
# https://open.feishu.cn/document/ukTMukTMukTM/ucTM5YjL3ETO24yNxkjN#e1cdee9f
FEISHU_WARNING_URL = "" # 飞书机器人api
@@ -177,6 +186,10 @@
WECHAT_WARNING_URL = "" # 企业微信机器人api
WECHAT_WARNING_PHONE = "" # 报警人 将会在群内@此人, 支持列表,可指定多人
WECHAT_WARNING_ALL = False # 是否提示所有人, 默认为False
+# QMSG报警
+QMSG_WARNING_URL = "" # qmsg机器人api
+QMSG_WARNING_QQ = "" # 指定要接收消息的QQ号或者QQ群。多个以英文逗号分割,例如:12345,12346,支持列表,可指定多人
+QMSG_WARNING_BOT = "" # 机器人的QQ号
# 时间间隔
WARNING_INTERVAL = 3600 # 相同报警的报警时间间隔,防止刷屏; 0表示不去重
WARNING_LEVEL = "DEBUG" # 报警级别, DEBUG / INFO / ERROR
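Review note on setting.py: a project-level override that switches item export to the new CSV pipeline might look like the fragment below. It is a sketch that uses only the setting names introduced in this diff; the chosen values are illustrative assumptions.

```python
# Project-level setting.py (sketch): export items to CSV instead of MySQL.
ITEM_PIPELINES = [
    "feapder.pipelines.csv_pipeline.CsvPipeline",
]
# Relative paths are passed through os.path.abspath by CsvPipeline,
# so they resolve against the current working directory.
CSV_EXPORT_PATH = "data/csv"

# Extra keyword arguments used when connecting to redis, e.g. ssl=True.
REDISDB_KWARGS = dict(ssl=True)
```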
diff --git a/feapder/templates/project_template/setting.py b/feapder/templates/project_template/setting.py
index 59b7a04d..140aaa07 100644
--- a/feapder/templates/project_template/setting.py
+++ b/feapder/templates/project_template/setting.py
@@ -16,12 +16,15 @@
# MONGO_DB = ""
# MONGO_USER_NAME = ""
# MONGO_USER_PASS = ""
+# MONGO_URL = ""
#
# # REDIS
# # ip:port 多个可写为列表或者逗号隔开 如 ip1:port1,ip2:port2 或 ["ip1:port1", "ip2:port2"]
# REDISDB_IP_PORTS = "localhost:6379"
# REDISDB_USER_PASS = ""
# REDISDB_DB = 0
+# # 连接redis时携带的其他参数,如ssl=True
+# REDISDB_KWARGS = dict()
# # 适用于redis哨兵模式
# REDISDB_SERVICE_NAME = ""
#
@@ -29,8 +32,10 @@
# ITEM_PIPELINES = [
# "feapder.pipelines.mysql_pipeline.MysqlPipeline",
# # "feapder.pipelines.mongo_pipeline.MongoPipeline",
+# # "feapder.pipelines.csv_pipeline.CsvPipeline",
# # "feapder.pipelines.console_pipeline.ConsolePipeline",
# ]
+# CSV_EXPORT_PATH = "data/csv" # CSV文件保存路径,支持相对路径和绝对路径
# EXPORT_DATA_MAX_FAILED_TIMES = 10 # 导出数据时最大的失败次数,包括保存和更新,超过这个次数报警
# EXPORT_DATA_MAX_RETRY_TIMES = 10 # 导出数据时最大的重试次数,包括保存和更新,超过这个次数则放弃重试
#
@@ -46,9 +51,9 @@
# KEEP_ALIVE = False # 爬虫是否常驻
# 下载
-# DOWNLOADER = "feapder.network.downloader.RequestsDownloader"
+# DOWNLOADER = "feapder.network.downloader.RequestsDownloader" # 请求下载器
# SESSION_DOWNLOADER = "feapder.network.downloader.RequestsSessionDownloader"
-# RENDER_DOWNLOADER = "feapder.network.downloader.SeleniumDownloader"
+# RENDER_DOWNLOADER = "feapder.network.downloader.SeleniumDownloader" # 渲染下载器
# # RENDER_DOWNLOADER="feapder.network.downloader.PlaywrightDownloader"
# MAKE_ABSOLUTE_LINKS = True # 自动转成绝对连接
@@ -59,7 +64,7 @@
# user_agent=None, # 字符串 或 无参函数,返回值为user_agent
# proxy=None, # xxx.xxx.xxx.xxx:xxxx 或 无参函数,返回值为代理地址
# headless=False, # 是否为无头浏览器
-# driver_type="CHROME", # CHROME、PHANTOMJS、FIREFOX
+# driver_type="CHROME", # CHROME、EDGE、PHANTOMJS、FIREFOX
# timeout=30, # 请求超时时间
# window_size=(1024, 800), # 窗口大小
# executable_path=None, # 浏览器路径,默认为默认路径
@@ -119,6 +124,8 @@
# # 设置代理
# PROXY_EXTRACT_API = None # 代理提取API ,返回的代理分割符为\r\n
# PROXY_ENABLE = True
+# PROXY_MAX_FAILED_TIMES = 5 # 代理最大失败次数,超过则不使用,自动删除
+# PROXY_POOL = "feapder.network.proxy_pool.ProxyPool" # 代理池
#
# # 随机headers
# RANDOM_HEADERS = True
@@ -143,8 +150,10 @@
# # 报警 支持钉钉、飞书、企业微信、邮件
# # 钉钉报警
# DINGDING_WARNING_URL = "" # 钉钉机器人api
-# DINGDING_WARNING_PHONE = "" # 报警人 支持列表,可指定多个
+# DINGDING_WARNING_PHONE = "" # 被@的群成员手机号,支持列表,可指定多个。
+# DINGDING_WARNING_USER_ID = "" # 被@的群成员userId,支持列表,可指定多个
# DINGDING_WARNING_ALL = False # 是否提示所有人, 默认为False
+# DINGDING_WARNING_SECRET = None # 加签密钥
# # 飞书报警
# # https://open.feishu.cn/document/ukTMukTMukTM/ucTM5YjL3ETO24yNxkjN#e1cdee9f
# FEISHU_WARNING_URL = "" # 飞书机器人api
@@ -159,6 +168,10 @@
# WECHAT_WARNING_URL = "" # 企业微信机器人api
# WECHAT_WARNING_PHONE = "" # 报警人 将会在群内@此人, 支持列表,可指定多人
# WECHAT_WARNING_ALL = False # 是否提示所有人, 默认为False
+# # QMSG报警
+# QMSG_WARNING_URL = "" # qmsg机器人api
+# QMSG_WARNING_QQ = "" # 指定要接收消息的QQ号或者QQ群。多个以英文逗号分割,例如:12345,12346,支持列表,可指定多人
+# QMSG_WARNING_BOT = "" # 机器人的QQ号
# # 时间间隔
# WARNING_INTERVAL = 3600 # 相同报警的报警时间间隔,防止刷屏; 0表示不去重
# WARNING_LEVEL = "DEBUG" # 报警级别, DEBUG / INFO / ERROR
diff --git a/feapder/utils/log.py b/feapder/utils/log.py
index 2d25ad20..e993f760 100644
--- a/feapder/utils/log.py
+++ b/feapder/utils/log.py
@@ -67,7 +67,6 @@ def doRollover(self):
self.stream = self._open()
def shouldRollover(self, record):
-
if self.stream is None: # delay was set...
self.stream = self._open()
if self.max_bytes > 0: # are we rolling over?
@@ -225,6 +224,13 @@ def get_logger(
class Log:
log = None
+ def func(self, log_level):
+ def wrapper(msg, *args, **kwargs):
+ if self.isEnabledFor(log_level):
+ self._log(log_level, msg, args, **kwargs)
+
+ return wrapper
+
def __getattr__(self, name):
# 调用log时再初始化,为了加载最新的setting
if self.__class__.log is None:
@@ -239,6 +245,12 @@ def debug(self):
def info(self):
return self.__class__.log.info
+ @property
+ def success(self):
+ log_level = logging.INFO + 1
+ logging.addLevelName(log_level, "success".upper())
+ return self.func(log_level)
+
@property
def warning(self):
return self.__class__.log.warning
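Review note on log.py: the `success` property adds a custom level one step above INFO. The same idea with only the standard `logging` module can be sketched as follows. `make_success`, `demo_logger`, and the logger name are hypothetical, and unlike the property in the diff the sketch registers the level name once at import time rather than on every attribute access.

```python
import io
import logging

SUCCESS = logging.INFO + 1  # 21, between INFO (20) and WARNING (30)
logging.addLevelName(SUCCESS, "SUCCESS")


def make_success(logger):
    """Return a callable that logs at the SUCCESS level on the given logger."""

    def success(msg, *args, **kwargs):
        if logger.isEnabledFor(SUCCESS):
            logger.log(SUCCESS, msg, *args, **kwargs)

    return success


# Capture output in memory so the result can be inspected.
stream = io.StringIO()
demo_logger = logging.getLogger("success_demo")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s|%(message)s"))
demo_logger.addHandler(handler)

make_success(demo_logger)("saved %s items", 3)
output = stream.getvalue()
```

Because the level sits above INFO, a logger configured at WARNING suppresses SUCCESS messages while one configured at INFO or DEBUG emits them.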
diff --git a/feapder/utils/metrics.py b/feapder/utils/metrics.py
index 0594769e..ab88ee1e 100644
--- a/feapder/utils/metrics.py
+++ b/feapder/utils/metrics.py
@@ -72,6 +72,19 @@ def define_tagkv(self, tagk, tagvs):
def _point_tagset(self, p):
return f"{p['measurement']}-{sorted(p['tags'].items())}-{p['time']}"
+ def _make_time_to_ns(self, _time):
+ """
+ 将时间转换为 ns 级别的时间戳,补足长度 19 位
+ Args:
+ _time:
+
+ Returns:
+
+ """
+ time_len = len(str(_time))
+ random_str = "".join(random.sample(string.digits, 19 - time_len))
+ return int(str(_time) + random_str)
+
def _accumulate_points(self, points):
"""
对于处于同一个 key 的点做聚合
@@ -102,18 +115,18 @@ def _accumulate_points(self, points):
continue
# 增加 _seq tag,以便区分不同的点
point["tags"]["_seq"] = timer_seqs[tagset]
+ point["time"] = self._make_time_to_ns(point["time"])
timer_seqs[tagset] += 1
new_points.append(point)
else:
if self.ratio < 1.0 and random.random() > self.ratio:
continue
+ point["time"] = self._make_time_to_ns(point["time"])
new_points.append(point)
for point in counters.values():
# 修改下counter类型的点的时间戳,补足19位, 伪装成纳秒级时间戳,防止influxdb对同一秒内的数据进行覆盖
- time_len = len(str(point["time"]))
- random_str = "".join(random.sample(string.digits, 19 - time_len))
- point["time"] = int(str(point["time"]) + random_str)
+ point["time"] = self._make_time_to_ns(point["time"])
new_points.append(point)
# 把拟合后的 counter 值添加进来
@@ -306,6 +319,8 @@ def init(
use_udp=False,
timeout=22,
ssl=False,
+ retention_policy_replication: str = "1",
+ set_retention_policy_default=True,
**kwargs,
):
"""
@@ -326,6 +341,8 @@ def init(
use_udp: 是否使用udp协议打点
timeout: 与influxdb建立连接时的超时时间
ssl: 是否使用https协议
+ retention_policy_replication: 保留策略的副本数, 确保数据的可靠性和高可用性。如果一个节点发生故障,其他节点可以继续提供服务,从而避免数据丢失和服务不可用的情况
+ set_retention_policy_default: 是否设置为默认的保留策略,当retention_policy初次创建时有效
**kwargs: 可传递MetricsEmitter类的参数
Returns:
@@ -376,8 +393,8 @@ def init(
influxdb_client.create_retention_policy(
retention_policy,
retention_policy_duration,
- replication="1",
- default=True,
+ replication=retention_policy_replication,
+ default=set_retention_policy_default,
)
except Exception as e:
log.error("metrics init falied: {}".format(e))
@@ -410,7 +427,7 @@ def emit_any(
fields: influxdb的field的字段和值
classify: 点的类别
measurement: 存储的表
- timestamp: 点的时间搓,默认为当前时间
+ timestamp: 点的时间戳,默认为当前时间
Returns:
@@ -441,7 +458,7 @@ def emit_counter(
classify: 点的类别
tags: influxdb的tag的字段和值
measurement: 存储的表
- timestamp: 点的时间搓,默认为当前时间
+ timestamp: 点的时间戳,默认为当前时间
Returns:
@@ -472,7 +489,7 @@ def emit_timer(
classify: 点的类别
tags: influxdb的tag的字段和值
measurement: 存储的表
- timestamp: 点的时间搓,默认为当前时间
+ timestamp: 点的时间戳,默认为当前时间
Returns:
@@ -503,7 +520,7 @@ def emit_store(
classify: 点的类别
tags: influxdb的tag的字段和值
measurement: 存储的表
- timestamp: 点的时间搓,默认为当前时间
+ timestamp: 点的时间戳,默认为当前时间
Returns:
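Review note on metrics.py: `_make_time_to_ns` pads a timestamp to 19 digits with `random.sample(string.digits, 19 - time_len)`. `random.sample` draws without replacement, so it raises `ValueError` when more than 10 digits are missing or when the timestamp is already longer than 19 digits. A more tolerant variant is sketched below; `pad_to_ns` is a hypothetical name and the early return for long inputs is an assumption about the desired behavior.

```python
import random
import string


def pad_to_ns(ts, width=19):
    """Right-pad an integer timestamp with random digits up to `width` digits."""
    text = str(ts)
    missing = width - len(text)
    if missing <= 0:
        # Already nanosecond-length (or longer): leave it unchanged.
        return int(text)
    # random.choices samples with replacement, so any pad length works.
    return int(text + "".join(random.choices(string.digits, k=missing)))
```

The padded value keeps the original timestamp as its prefix, so points from the same second still sort into that second while getting distinct nanosecond suffixes with high probability.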
diff --git a/feapder/utils/redis_lock.py b/feapder/utils/redis_lock.py
index 8c0aed47..9df0b85d 100644
--- a/feapder/utils/redis_lock.py
+++ b/feapder/utils/redis_lock.py
@@ -62,7 +62,7 @@ def __enter__(self):
if self.locked:
# 延长锁的时间
thread = threading.Thread(target=self.prolong_life)
- thread.setDaemon(True)
+ thread.daemon = True
thread.start()
return self
@@ -83,11 +83,12 @@ def acquire(self):
if self.wait_timeout > 0:
if time.time() - start > self.wait_timeout:
- log.info("加锁失败")
+ log.debug("获取锁失败")
break
else:
+ log.debug("获取锁失败")
break
- log.debug("等待加锁: {} wait:{}".format(self, time.time() - start))
+ log.debug("等待锁: {} wait:{}".format(self, time.time() - start))
if self.wait_timeout > 10:
time.sleep(5)
else:
diff --git a/feapder/utils/tail_thread.py b/feapder/utils/tail_thread.py
new file mode 100644
index 00000000..eda266d5
--- /dev/null
+++ b/feapder/utils/tail_thread.py
@@ -0,0 +1,33 @@
+# -*- coding: utf-8 -*-
+"""
+Created on 2024/3/19 20:00
+---------
+@summary:
+---------
+@author: Boris
+@email: boris_liu@foxmail.com
+"""
+import sys
+import threading
+
+
+class TailThread(threading.Thread):
+ """
+ The main thread exits only after all child threads have finished
+ """
+
+ def start(self) -> None:
+ """
+ 解决python3.12 RuntimeError: cannot join thread before it is started的报错
+ """
+ super().start()
+
+ if sys.version_info >= (3, 12):
+ for thread in threading.enumerate():
+ if (
+ thread.daemon
+ or thread is threading.current_thread()
+ or not thread.is_alive()
+ ):
+ continue
+ thread.join()
diff --git a/feapder/utils/tools.py b/feapder/utils/tools.py
index b55fcdea..31952876 100644
--- a/feapder/utils/tools.py
+++ b/feapder/utils/tools.py
@@ -15,6 +15,7 @@
import datetime
import functools
import hashlib
+import hmac
import html
import importlib
import json
@@ -507,7 +508,8 @@ def fit_url(urls, identis):
def get_param(url, key):
- match = re.search(f"{key}=([^&]+)", url)
+ pattern = r"(?:[?&])" + re.escape(key) + r"=([^&]+)"
+ match = re.search(pattern, url)
if match:
return match.group(1)
return None
@@ -2466,12 +2468,43 @@ def reach_freq_limit(rate_limit, *key):
def dingding_warning(
- message, message_prefix=None, rate_limit=None, url=None, user_phone=None
+ message,
+ *,
+ message_prefix=None,
+ rate_limit=None,
+ url=None,
+ user_phone=None,
+ user_id=None,
+ secret=None,
):
+ """
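+ DingTalk alert; specify either user_phone or user_id
+ Args:
+ message: alert message text
+ message_prefix: message summary, used for deduplication
+ rate_limit: alert interval in seconds; identical alerts are sent at most once within rate_limit
+ url: DingTalk robot webhook url
+ user_phone: phone numbers of the group members to @; accepts a list
+ user_id: userIds of the group members to @; accepts a list
+ secret: DingTalk signing secret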
+ Returns:
+
+ """
# 为了加载最新的配置
rate_limit = rate_limit if rate_limit is not None else setting.WARNING_INTERVAL
url = url or setting.DINGDING_WARNING_URL
user_phone = user_phone or setting.DINGDING_WARNING_PHONE
+ user_id = user_id or setting.DINGDING_WARNING_USER_ID
+ secret = secret or setting.DINGDING_WARNING_SECRET
+ if secret:
+ timestamp = str(round(time.time() * 1000))
+ secret_enc = secret.encode("utf-8")
+ string_to_sign_enc = f"{timestamp}\n{secret}".encode("utf-8")
+ hmac_code = hmac.new(
+ secret_enc, string_to_sign_enc, digestmod=hashlib.sha256
+ ).digest()
+ sign = urllib.parse.quote_plus(base64.b64encode(hmac_code))
+ url = f"{url}&timestamp={timestamp}&sign={sign}"
if not all([url, message]):
return
@@ -2483,10 +2516,17 @@ def dingding_warning(
if isinstance(user_phone, str):
user_phone = [user_phone] if user_phone else []
+ if isinstance(user_id, str):
+ user_id = [user_id] if user_id else []
+
data = {
"msgtype": "text",
"text": {"content": message},
- "at": {"atMobiles": user_phone, "isAtAll": setting.DINGDING_WARNING_ALL},
+ "at": {
+ "atMobiles": user_phone,
+ "atUserIds": user_id,
+ "isAtAll": setting.DINGDING_WARNING_ALL,
+ },
}
headers = {"Content-Type": "application/json"}
@@ -2675,13 +2715,61 @@ def feishu_warning(message, message_prefix=None, rate_limit=None, url=None, user
return False
-def send_msg(msg, level="DEBUG", message_prefix=""):
+def qmsg_warning(
+ message,
+ message_prefix=None,
+ rate_limit=None,
+ url=None,
+ user_qq=None,
+ bot_qq=None
+):
+ """qmsg报警"""
+
+ # 为了加载最新的配置
+ rate_limit = rate_limit if rate_limit is not None else setting.WARNING_INTERVAL
+ url = url or setting.QMSG_WARNING_URL
+ user_qq = user_qq or setting.QMSG_WARNING_QQ
+ bot_qq = bot_qq or setting.QMSG_WARNING_BOT
+
+ if isinstance(user_qq, list):
+ user_qq = ','.join(map(str, user_qq))
+
+ if not all([url, message]):
+ return
+
+ if reach_freq_limit(rate_limit, url, user_qq, message_prefix or message):
+ log.info("报警时间间隔过短,此次报警忽略。 内容 {}".format(message))
+ return
+
+ data = {
+ "msg": message,
+ "qq": user_qq,
+ "bot": bot_qq,
+ }
+
+ headers = {"Content-Type": "application/json"}
+
+ try:
+ response = requests.post(
+ url, headers=headers, data=json.dumps(data).encode("utf8")
+ )
+ result = response.json()
+ response.close()
+ if result.get("code") == 0:
+ return True
+ else:
+ raise Exception(result.get("reason"))
+ except Exception as e:
+ log.error("报警发送失败。 报警内容 {}, error: {}".format(message, e))
+ return False
+
+
+def send_msg(msg, level="DEBUG", message_prefix="", keyword="feapder报警系统\n"):
if setting.WARNING_LEVEL == "ERROR":
if level.upper() != "ERROR":
return
if setting.DINGDING_WARNING_URL:
- keyword = "feapder报警系统\n"
dingding_warning(keyword + msg, message_prefix=message_prefix)
if setting.EMAIL_RECEIVER:
@@ -2691,13 +2779,14 @@ def send_msg(msg, level="DEBUG", message_prefix=""):
email_warning(msg, message_prefix=message_prefix, title=title)
if setting.WECHAT_WARNING_URL:
- keyword = "feapder报警系统\n"
wechat_warning(keyword + msg, message_prefix=message_prefix)
if setting.FEISHU_WARNING_URL:
- keyword = "feapder报警系统\n"
feishu_warning(keyword + msg, message_prefix=message_prefix)
+ if setting.QMSG_WARNING_URL:
+ qmsg_warning(keyword + msg, message_prefix=message_prefix)
+
###################
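Review note on tools.py: the signing step added to `dingding_warning` (HMAC-SHA256 over `"{timestamp}\n{secret}"`, base64-encoded, then URL-quoted) can be sketched as a standalone helper. `sign_dingtalk_url` and the `example.invalid` URLs used to exercise it are hypothetical. The separator handling is an addition of this sketch: the code in the diff always appends with `&`, which assumes the webhook url already carries a query string such as `access_token`.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def sign_dingtalk_url(url, secret, timestamp_ms=None):
    """Append `timestamp` and `sign` query parameters to a webhook url."""
    if timestamp_ms is None:
        timestamp_ms = round(time.time() * 1000)
    string_to_sign = f"{timestamp_ms}\n{secret}".encode("utf-8")
    digest = hmac.new(
        secret.encode("utf-8"), string_to_sign, digestmod=hashlib.sha256
    ).digest()
    sign = urllib.parse.quote_plus(base64.b64encode(digest))
    # Use "?" when the url has no query string yet, "&" otherwise.
    sep = "&" if urllib.parse.urlsplit(url).query else "?"
    return f"{url}{sep}timestamp={timestamp_ms}&sign={sign}"
```

Passing `timestamp_ms` explicitly makes the result deterministic, which is convenient for unit tests; in normal use it defaults to the current time in milliseconds.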
diff --git a/feapder/utils/webdriver/playwright_driver.py b/feapder/utils/webdriver/playwright_driver.py
index 0d445c06..fe7e5062 100644
--- a/feapder/utils/webdriver/playwright_driver.py
+++ b/feapder/utils/webdriver/playwright_driver.py
@@ -59,7 +59,7 @@ def __init__(
self.url = None
self.storage_state_path = storage_state_path
- self._driver_type = driver_type
+ self._driver_type = driver_type or "chromium"
self._page_on_event_callback = page_on_event_callback
self._url_regexes = url_regexes
self._save_all = save_all
diff --git a/feapder/utils/webdriver/selenium_driver.py b/feapder/utils/webdriver/selenium_driver.py
index 594a029c..9f46d54b 100644
--- a/feapder/utils/webdriver/selenium_driver.py
+++ b/feapder/utils/webdriver/selenium_driver.py
@@ -29,6 +29,7 @@
class SeleniumDriver(WebDriver, RemoteWebDriver):
CHROME = "CHROME"
+ EDGE = "EDGE"
PHANTOMJS = "PHANTOMJS"
FIREFOX = "FIREFOX"
@@ -43,6 +44,8 @@ class SeleniumDriver(WebDriver, RemoteWebDriver):
"keep_alive",
}
+ __EDGE_ATTRS__ = __CHROME_ATTRS__
+
__FIREFOX_ATTRS__ = {
"firefox_profile",
"firefox_binary",
@@ -75,6 +78,7 @@ def __init__(self, xhr_url_regexes: list = None, **kwargs):
"""
super(SeleniumDriver, self).__init__(**kwargs)
self._xhr_url_regexes = xhr_url_regexes
+ self._driver_type = self._driver_type or SeleniumDriver.CHROME
if self._xhr_url_regexes and self._driver_type != SeleniumDriver.CHROME:
raise Exception(
@@ -84,6 +88,9 @@ def __init__(self, xhr_url_regexes: list = None, **kwargs):
if self._driver_type == SeleniumDriver.CHROME:
self.driver = self.chrome_driver()
+ elif self._driver_type == SeleniumDriver.EDGE:
+ self.driver = self.edge_driver()
+
elif self._driver_type == SeleniumDriver.PHANTOMJS:
self.driver = self.phantomjs_driver()
@@ -128,9 +135,18 @@ def get_driver(self):
return self.driver
def firefox_driver(self):
+ if int(webdriver.__version__.split(".")[0]) >= 4:
+ raise Exception(
+ f"暂未适配selenium=={webdriver.__version__}版本的firefox API,建议安装selenium==3.141.0版本或使用CHROME浏览器"
+ )
+
firefox_profile = webdriver.FirefoxProfile()
firefox_options = webdriver.FirefoxOptions()
firefox_capabilities = webdriver.DesiredCapabilities.FIREFOX
+ try:
+ from selenium.webdriver.firefox.service import Service
+ except (ImportError, ModuleNotFoundError):
+ Service = None
if self._proxy:
proxy = self._proxy() if callable(self._proxy) else self._proxy
@@ -162,10 +178,16 @@ def firefox_driver(self):
kwargs = self.filter_kwargs(self._kwargs, self.__FIREFOX_ATTRS__)
- if self._executable_path:
- kwargs.update(executable_path=self._executable_path)
- elif self._auto_install_driver:
- kwargs.update(executable_path=GeckoDriverManager().install())
+ if Service is None:
+ if self._executable_path:
+ kwargs.update(executable_path=self._executable_path)
+ elif self._auto_install_driver:
+ kwargs.update(executable_path=GeckoDriverManager().install())
+ else:
+ if self._executable_path:
+ kwargs.update(service=Service(self._executable_path))
+ elif self._auto_install_driver:
+ kwargs.update(service=Service(GeckoDriverManager().install()))
driver = webdriver.Firefox(
capabilities=firefox_capabilities,
@@ -186,6 +208,10 @@ def chrome_driver(self):
chrome_options.add_experimental_option("useAutomationExtension", False)
# docker 里运行需要
chrome_options.add_argument("--no-sandbox")
+ try:
+ from selenium.webdriver.chrome.service import Service
+ except (ImportError, ModuleNotFoundError):
+ Service = None
if self._proxy:
chrome_options.add_argument(
@@ -229,10 +255,16 @@ def chrome_driver(self):
chrome_options.add_argument(arg)
kwargs = self.filter_kwargs(self._kwargs, self.__CHROME_ATTRS__)
- if self._executable_path:
- kwargs.update(executable_path=self._executable_path)
- elif self._auto_install_driver:
- kwargs.update(executable_path=ChromeDriverManager().install())
+ if Service is None:
+ if self._executable_path:
+ kwargs.update(executable_path=self._executable_path)
+ elif self._auto_install_driver:
+ kwargs.update(executable_path=ChromeDriverManager().install())
+ else:
+ if self._executable_path:
+ kwargs.update(service=Service(self._executable_path))
+ elif self._auto_install_driver:
+ kwargs.update(service=Service(ChromeDriverManager().install()))
driver = webdriver.Chrome(options=chrome_options, **kwargs)
@@ -273,6 +305,110 @@ def chrome_driver(self):
return driver
+ def edge_driver(self):
+ edge_options = webdriver.EdgeOptions()
+ # 此步骤很重要,设置为开发者模式,防止被各大网站识别出来使用了Selenium
+ edge_options.add_experimental_option("excludeSwitches", ["enable-automation"])
+ edge_options.add_experimental_option("useAutomationExtension", False)
+ # docker 里运行需要
+ edge_options.add_argument("--no-sandbox")
+ try:
+ from selenium.webdriver.edge.service import Service
+ except (ImportError, ModuleNotFoundError):
+ Service = None
+
+ if self._proxy:
+ edge_options.add_argument(
+ "--proxy-server={}".format(
+ self._proxy() if callable(self._proxy) else self._proxy
+ )
+ )
+ if self._user_agent:
+ edge_options.add_argument(
+ "user-agent={}".format(
+ self._user_agent()
+ if callable(self._user_agent)
+ else self._user_agent
+ )
+ )
+ if not self._load_images:
+ edge_options.add_experimental_option(
+ "prefs", {"profile.managed_default_content_settings.images": 2}
+ )
+
+ if self._headless:
+ edge_options.add_argument("--headless")
+ edge_options.add_argument("--disable-gpu")
+
+ if self._window_size:
+ edge_options.add_argument(
+ "--window-size={},{}".format(self._window_size[0], self._window_size[1])
+ )
+
+ if self._download_path:
+ os.makedirs(self._download_path, exist_ok=True)
+ prefs = {
+ "download.prompt_for_download": False,
+ "download.default_directory": self._download_path,
+ }
+ edge_options.add_experimental_option("prefs", prefs)
+
+ # 添加自定义的配置参数
+ if self._custom_argument:
+ for arg in self._custom_argument:
+ edge_options.add_argument(arg)
+
+ kwargs = self.filter_kwargs(self._kwargs, self.__EDGE_ATTRS__)
+ if Service is None:
+ if self._executable_path:
+ kwargs.update(executable_path=self._executable_path)
+ elif self._auto_install_driver:
+ raise NotImplementedError("edge not support auto install driver")
+ else:
+ if self._executable_path:
+ kwargs.update(service=Service(self._executable_path))
+ elif self._auto_install_driver:
+ raise NotImplementedError("edge not support auto install driver")
+
+ driver = webdriver.Edge(options=edge_options, **kwargs)
+
+ # 隐藏浏览器特征
+ if self._use_stealth_js:
+ with open(
+ os.path.join(os.path.dirname(__file__), "../js/stealth.min.js")
+ ) as f:
+ js = f.read()
+ driver.execute_cdp_cmd(
+ "Page.addScriptToEvaluateOnNewDocument", {"source": js}
+ )
+
+ if self._xhr_url_regexes:
+ assert isinstance(self._xhr_url_regexes, list)
+ with open(
+ os.path.join(os.path.dirname(__file__), "../js/intercept.js")
+ ) as f:
+ js = f.read()
+ driver.execute_cdp_cmd(
+ "Page.addScriptToEvaluateOnNewDocument", {"source": js}
+ )
+ js = f"window.__urlRegexes = {self._xhr_url_regexes}"
+ driver.execute_cdp_cmd(
+ "Page.addScriptToEvaluateOnNewDocument", {"source": js}
+ )
+
+ if self._download_path:
+ driver.command_executor._commands["send_command"] = (
+ "POST",
+ "/session/$sessionId/chromium/send_command",
+ )
+ params = {
+ "cmd": "Page.setDownloadBehavior",
+ "params": {"behavior": "allow", "downloadPath": self._download_path},
+ }
+ driver.execute("send_command", params)
+
+ return driver
+
def phantomjs_driver(self):
import warnings
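Review note on selenium_driver.py: version gates such as the selenium check in `firefox_driver` are safer when they compare numbers rather than strings, because string comparison works character by character. The sketch below shows the difference; `major_version` is a hypothetical helper, not part of selenium or feapder.

```python
def major_version(version):
    """Return the leading numeric component of a dotted version string."""
    return int(version.split(".")[0])


# String comparison looks at "1" vs "4" first, so version 10 sorts below 4.
lexicographic = "10.0.0" >= "4.0.0"
# Comparing the parsed major number gives the intended answer.
numeric = major_version("10.0.0") >= 4
```

Here `lexicographic` is `False` while `numeric` is `True`. Pre-release strings such as `"4.0.0a1"` still parse, since only the part before the first dot is converted.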
diff --git a/feapder/utils/webdriver/webdirver.py b/feapder/utils/webdriver/webdirver.py
index bfc38704..8fa2a34e 100644
--- a/feapder/utils/webdriver/webdirver.py
+++ b/feapder/utils/webdriver/webdirver.py
@@ -52,7 +52,7 @@ def __init__(
user_agent: 字符串 或 无参函数,返回值为user_agent
proxy: xxx.xxx.xxx.xxx:xxxx 或 无参函数,返回值为代理地址
headless: 是否启用无头模式
- driver_type: CHROME 或 PHANTOMJS,FIREFOX
+ driver_type: CHROME,EDGE 或 PHANTOMJS,FIREFOX
timeout: 请求超时时间
window_size: # 窗口大小
executable_path: 浏览器路径,默认为默认路径
diff --git a/setup.py b/setup.py
index a30cc072..cf4fe542 100644
--- a/setup.py
+++ b/setup.py
@@ -42,20 +42,26 @@
"requests>=2.22.0",
"bs4>=0.0.1",
"ipython>=7.14.0",
- "redis-py-cluster>=2.1.0",
"cryptography>=3.3.2",
- "selenium>=3.141.0",
- "pymongo>=3.10.1",
"urllib3>=1.25.8",
"loguru>=0.5.3",
"influxdb>=5.3.1",
"pyperclip>=1.8.2",
- "webdriver-manager>=3.5.3",
"terminal-layout>=2.1.3",
+]
+
+render_requires = [
+ "webdriver-manager>=4.0.0",
"playwright",
+ "selenium>=3.141.0",
]
-extras_requires = ["bitarray>=1.5.3", "PyExecJS>=1.5.1"]
+all_requires = [
+ "bitarray>=1.5.3",
+ "PyExecJS>=1.5.1",
+ "pymongo>=3.10.1",
+ "redis-py-cluster>=2.1.0",
+] + render_requires
setuptools.setup(
name="feapder",
@@ -64,11 +70,11 @@
license="MIT",
author_email="feapder@qq.com",
python_requires=">=3.6",
- description="feapder是一款支持分布式、批次采集、任务防丢、报警丰富的python爬虫框架",
+ description="feapder是一款支持分布式、批次采集、数据防丢、报警丰富的python爬虫框架",
long_description=long_description,
long_description_content_type="text/markdown",
install_requires=requires,
- extras_require={"all": extras_requires},
+ extras_require={"all": all_requires, "render": render_requires},
entry_points={"console_scripts": ["feapder = feapder.commands.cmdline:execute"]},
url="https://github.com/Boris-code/feapder.git",
packages=packages,
diff --git a/tests/air-spider/test_air_spider.py b/tests/air-spider/test_air_spider.py
index 90301075..597bfe48 100644
--- a/tests/air-spider/test_air_spider.py
+++ b/tests/air-spider/test_air_spider.py
@@ -24,7 +24,7 @@ def end_callback(self):
print("爬虫结束")
def start_requests(self, *args, **kws):
- for i in range(200):
+ for i in range(1):
print(i)
yield feapder.Request("https://www.baidu.com")
diff --git a/tests/air-spider/test_render_spider.py b/tests/air-spider/test_render_spider.py
new file mode 100644
index 00000000..3067a443
--- /dev/null
+++ b/tests/air-spider/test_render_spider.py
@@ -0,0 +1,29 @@
+# -*- coding: utf-8 -*-
+"""
+Created on 2020/4/22 10:41 PM
+---------
+@summary:
+---------
+@author: Boris
+@email: boris_liu@foxmail.com
+"""
+
+import feapder
+
+
+class TestAirSpider(feapder.AirSpider):
+    def start_requests(self, *args, **kws):
+        yield feapder.Request("https://www.baidu.com", render=True)
+
+    # def download_midware(self, request):
+    #     request.proxies = {
+    #         "http": "http://xxx.xxx.xxx.xxx:8888",
+    #         "https": "http://xxx.xxx.xxx.xxx:8888",
+    #     }
+
+    def parse(self, request, response):
+        print(response.bs4().title)
+
+
+if __name__ == "__main__":
+    TestAirSpider(thread_count=1).start()
diff --git a/tests/batch-spider/spiders/test_spider.py b/tests/batch-spider/spiders/test_spider.py
index bc213e78..684961bb 100644
--- a/tests/batch-spider/spiders/test_spider.py
+++ b/tests/batch-spider/spiders/test_spider.py
@@ -18,7 +18,7 @@ class TestSpider(feapder.BatchSpider):
def start_requests(self, task):
# task 为在任务表中取出的每一条任务
id, url = task # id, url为所取的字段,main函数中指定的
- yield feapder.Request(url, task_id=id)
+        yield feapder.Request(url, task_id=id, render=True)  # task_id is the task id, used to update the task status
def parse(self, request, response):
title = response.xpath('//title/text()').extract_first() # 取标题
diff --git a/tests/task-spider/test_task_spider.py b/tests/task-spider/test_task_spider.py
index 8fba0931..3a361633 100644
--- a/tests/task-spider/test_task_spider.py
+++ b/tests/task-spider/test_task_spider.py
@@ -13,7 +13,7 @@
class TestTaskSpider(feapder.TaskSpider):
def add_task(self):
- # 加种子任务
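+        # Add seed tasks. The framework calls this function to make it easy to push tasks into redis, but it must not be an infinite loop. In real projects you can write your own script to push tasks into redis.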
self._redisdb.zadd(self._task_table, {"id": 1, "url": "https://www.baidu.com"})
def start_requests(self, task):
@@ -40,7 +40,6 @@ def start(args):
task_keys=["id", "url"],
redis_key="test:task_spider",
keep_alive=True,
- delete_keys=True,
)
if args == 1:
spider.start_monitor_task()
@@ -56,8 +55,8 @@ def start2(args):
task_table="spider_task2",
task_table_type="redis",
redis_key="test:task_spider",
- keep_alive=False,
- delete_keys=True,
+ keep_alive=True,
+ use_mysql=False,
)
if args == 1:
spider.start_monitor_task()
@@ -68,8 +67,12 @@ def start2(args):
if __name__ == "__main__":
parser = ArgumentParser(description="测试TaskSpider")
- parser.add_argument("--start", type=int, nargs=1, help="用mysql做种子表 (1|2)", function=start)
- parser.add_argument("--start2", type=int, nargs=1, help="用redis做种子表 (1|2)", function=start2)
+    parser.add_argument(
+        "--start", type=int, nargs=1, help="use mysql as the seed table (1|2)", function=start
+    )
+    parser.add_argument(
+        "--start2", type=int, nargs=1, help="use redis as the seed table (1|2)", function=start2
+    )
parser.start()
diff --git a/tests/test-debugger/README.md b/tests/test-debugger/README.md
new file mode 100644
index 00000000..c160ae2c
--- /dev/null
+++ b/tests/test-debugger/README.md
@@ -0,0 +1,8 @@
+# xxx Spider Documentation
+## Research
+
+## Database Design
+
+## Spider Logic
+
+## Project Architecture
\ No newline at end of file
diff --git a/tests/test-debugger/items/__init__.py b/tests/test-debugger/items/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/test-debugger/main.py b/tests/test-debugger/main.py
new file mode 100644
index 00000000..929f347b
--- /dev/null
+++ b/tests/test-debugger/main.py
@@ -0,0 +1,19 @@
+# -*- coding: utf-8 -*-
+"""
+Created on 2023-06-09 20:26:29
+---------
+@summary: spider entry point
+---------
+@author: Boris
+"""
+
+import feapder
+
+from spiders import *
+
+
+if __name__ == "__main__":
+    test_debugger.TestDebugger.to_DebugSpider(
+        request=feapder.Request("https://spidertools.cn", render=True),
+        redis_key="test:xxx",
+    ).start()
diff --git a/tests/test-debugger/setting.py b/tests/test-debugger/setting.py
new file mode 100644
index 00000000..2191f57c
--- /dev/null
+++ b/tests/test-debugger/setting.py
@@ -0,0 +1,185 @@
+# -*- coding: utf-8 -*-
+"""Spider configuration file"""
+# import os
+# import sys
+#
+# # MYSQL
+# MYSQL_IP = "localhost"
+# MYSQL_PORT = 3306
+# MYSQL_DB = ""
+# MYSQL_USER_NAME = ""
+# MYSQL_USER_PASS = ""
+#
+# # MONGODB
+# MONGO_IP = "localhost"
+# MONGO_PORT = 27017
+# MONGO_DB = ""
+# MONGO_USER_NAME = ""
+# MONGO_USER_PASS = ""
+#
+# # REDIS
+# # ip:port; multiple entries may be given as a list or comma-separated, e.g. ip1:port1,ip2:port2 or ["ip1:port1", "ip2:port2"]
+# REDISDB_IP_PORTS = "localhost:6379"
+# REDISDB_USER_PASS = ""
+# REDISDB_DB = 0
+# # Extra arguments passed when connecting to redis, e.g. ssl=True
+# REDISDB_KWARGS = dict()
+# # For redis sentinel mode
+# REDISDB_SERVICE_NAME = ""
+#
+# # Pipelines used to store data; customizable, defaults to MysqlPipeline
+# ITEM_PIPELINES = [
+#     "feapder.pipelines.mysql_pipeline.MysqlPipeline",
+#     # "feapder.pipelines.mongo_pipeline.MongoPipeline",
+#     # "feapder.pipelines.console_pipeline.ConsolePipeline",
+# ]
+# EXPORT_DATA_MAX_FAILED_TIMES = 10  # Max number of failures when exporting data (saves and updates); an alert is raised beyond this
+# EXPORT_DATA_MAX_RETRY_TIMES = 10  # Max number of retries when exporting data (saves and updates); retrying stops beyond this
+#
+# # Spider settings
+# # COLLECTOR
+# COLLECTOR_TASK_COUNT = 32  # Number of tasks fetched each time; 32 is recommended for speed
+#
+# # SPIDER
+# SPIDER_THREAD_COUNT = 1  # Spider concurrency; 32 is recommended for speed
+# # Download interval in seconds. Random intervals are supported, e.g. SPIDER_SLEEP_TIME = [2, 5] gives a random interval between 2 and 5 seconds, inclusive
+# SPIDER_SLEEP_TIME = 0
+# SPIDER_MAX_RETRY_TIMES = 10  # Max retries per request
+# KEEP_ALIVE = False  # Whether the spider stays resident
+
+# Download
+# DOWNLOADER = "feapder.network.downloader.RequestsDownloader"
+# SESSION_DOWNLOADER = "feapder.network.downloader.RequestsSessionDownloader"
+# RENDER_DOWNLOADER = "feapder.network.downloader.SeleniumDownloader"
+# # RENDER_DOWNLOADER="feapder.network.downloader.PlaywrightDownloader"
+# MAKE_ABSOLUTE_LINKS = True  # Automatically convert to absolute links
+
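+# # Browser rendering
+WEBDRIVER = dict(
+    pool_size=1,  # Number of browsers
+    load_images=True,  # Whether to load images
+    user_agent=None,  # A string, or a zero-argument function returning the user_agent
+    proxy=None,  # xxx.xxx.xxx.xxx:xxxx, or a zero-argument function returning the proxy address
+    headless=False,  # Whether to run the browser headless
+    driver_type="CHROME",  # CHROME, EDGE, PHANTOMJS, FIREFOX
+    timeout=30,  # Request timeout
+    window_size=(1024, 800),  # Window size
+    executable_path=None,  # Browser path; defaults to the default path
+    render_time=0,  # Render time, i.e. wait this long after opening the page before fetching the source
+    custom_argument=[
+        "--ignore-certificate-errors",
+        "--disable-blink-features=AutomationControlled",
+    ],  # Custom browser rendering arguments
+    xhr_url_regexes=None,  # Intercept xhr endpoints; supports regexes; list type
+    auto_install_driver=True,  # Automatically download the browser driver; supports chrome and firefox
+    download_path=None,  # Path for downloaded files
+    use_stealth_js=False,  # Use stealth.min.js to hide browser fingerprints
+)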
+
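+# PLAYWRIGHT = dict(
+#     user_agent=None,  # A string, or a zero-argument function returning the user_agent
+#     proxy=None,  # xxx.xxx.xxx.xxx:xxxx, or a zero-argument function returning the proxy address
+#     headless=False,  # Whether to run the browser headless
+#     driver_type="chromium",  # chromium, firefox, webkit
+#     timeout=30,  # Request timeout
+#     window_size=(1024, 800),  # Window size
+#     executable_path=None,  # Browser path; defaults to the default path
+#     download_path=None,  # Path for downloaded files
+#     render_time=0,  # Render time, i.e. wait this long after opening the page before fetching the source
+#     wait_until="networkidle",  # Event that marks the page as loaded; one of "commit", "domcontentloaded", "load", "networkidle"
+#     use_stealth_js=False,  # Use stealth.min.js to hide browser fingerprints
+#     page_on_event_callback=None,  # Callbacks for page.on() events, e.g. page_on_event_callback={"dialog": lambda dialog: dialog.accept()}
+#     storage_state_path=None,  # Path for saving the browser state
+#     url_regexes=None,  # Intercept endpoints; supports regexes; list type
+#     save_all=False,  # Whether to keep every intercepted response; used with url_regexes; when False only the last intercepted one is kept
+# )
+#
+# # On spider start, re-crawl failed requests
+# RETRY_FAILED_REQUESTS = False
+# # On spider start, re-store failed items
+# RETRY_FAILED_ITEMS = False
+# # Save failed requests
+# SAVE_FAILED_REQUEST = True
+# # Request loss protection (a request not finished within REQUEST_LOST_TIMEOUT is dispatched again and redone)
+# REQUEST_LOST_TIMEOUT = 600  # 10 minutes
+# # Network timeout for requests
+# REQUEST_TIMEOUT = 22  # Timeout waiting for the server response; a float, or a (connect timeout, read timeout) tuple
+# # Max number of items cached in the in-memory queue
+# ITEM_MAX_CACHED_COUNT = 5000
+# # Max number of items stored per batch
+# ITEM_UPLOAD_BATCH_MAX_SIZE = 1000
+# # Interval between item storage batches
+# ITEM_UPLOAD_INTERVAL = 1
+# # Max number of tasks cached in the in-memory task queue; unlimited by default; only applies to AirSpider
+# TASK_MAX_CACHED_SIZE = 0
+#
+# # Download cache backed by redis; because memory is limited, it is recommended only for development and debugging, to avoid a network request on every debug run
+# RESPONSE_CACHED_ENABLE = False  # Whether to enable the download cache; recommended True for costly data or data whose requirements change often
+# RESPONSE_CACHED_EXPIRE_TIME = 3600  # Cache lifetime in seconds
+# RESPONSE_CACHED_USED = False  # Whether to read from the cache; can be set to True when back-filling data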
+#
+# # Proxy settings
+# PROXY_EXTRACT_API = None  # Proxy extraction API; returned proxies are separated by \r\n
+# PROXY_ENABLE = True
+#
+# # Random headers
+# RANDOM_HEADERS = True
+# # UserAgent type; supports 'chrome', 'opera', 'firefox', 'internetexplorer', 'safari', 'mobile'; a random type is used if unspecified
+# USER_AGENT_TYPE = "chrome"
+# # Default browser User-Agent
+# DEFAULT_USERAGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
+# # Use a session with requests
+# USE_SESSION = False
+#
+# # Deduplication
+# ITEM_FILTER_ENABLE = False  # Item deduplication
+# REQUEST_FILTER_ENABLE = False  # Request deduplication
+# ITEM_FILTER_SETTING = dict(
+#     filter_type=1  # Permanent (BloomFilter) = 1, in-memory (MemoryFilter) = 2, temporary (ExpireFilter) = 3, lightweight (LiteFilter) = 4
+# )
+# REQUEST_FILTER_SETTING = dict(
+#     filter_type=3,  # Permanent (BloomFilter) = 1, in-memory (MemoryFilter) = 2, temporary (ExpireFilter) = 3, lightweight (LiteFilter) = 4
+#     expire_time=2592000,  # Expires after 1 month
+# )
+#
+# # Alerts; supports DingTalk, Feishu, WeCom and email
+# # DingTalk alerts
+# DINGDING_WARNING_URL = ""  # DingTalk bot API
+# DINGDING_WARNING_PHONE = ""  # Alert recipients; a list is supported for multiple recipients
+# DINGDING_WARNING_ALL = False  # Whether to notify everyone; defaults to False
+# # Feishu alerts
+# # https://open.feishu.cn/document/ukTMukTMukTM/ucTM5YjL3ETO24yNxkjN#e1cdee9f
+# FEISHU_WARNING_URL = ""  # Feishu bot API
+# FEISHU_WARNING_USER = None  # Alert recipients: {"open_id":"ou_xxxxx", "name":"xxxx"} or [{"open_id":"ou_xxxxx", "name":"xxxx"}]
+# FEISHU_WARNING_ALL = False  # Whether to notify everyone; defaults to False
+# # Email alerts
+# EMAIL_SENDER = ""  # Sender
+# EMAIL_PASSWORD = ""  # Authorization code
+# EMAIL_RECEIVER = ""  # Recipients; a list is supported for multiple recipients
+# EMAIL_SMTPSERVER = "smtp.163.com"  # Mail server; defaults to 163 mail
+# # WeCom alerts
+# WECHAT_WARNING_URL = ""  # WeCom bot API
+# WECHAT_WARNING_PHONE = ""  # Alert recipients, who will be @-mentioned in the group; a list is supported for multiple recipients
+# WECHAT_WARNING_ALL = False  # Whether to notify everyone; defaults to False
+# # Intervals
+# WARNING_INTERVAL = 3600  # Interval between identical alerts, to avoid flooding; 0 disables deduplication
+# WARNING_LEVEL = "DEBUG"  # Alert level: DEBUG / INFO / ERROR
+# WARNING_FAILED_COUNT = 1000  # Failed task count; an alert is raised above WARNING_FAILED_COUNT
+#
+# LOG_NAME = os.path.basename(os.getcwd())
+# LOG_PATH = "log/%s.log" % LOG_NAME  # Log file path
+# LOG_LEVEL = "DEBUG"
+# LOG_COLOR = True  # Whether to colorize output
+# LOG_IS_WRITE_TO_CONSOLE = True  # Whether to print to the console
+# LOG_IS_WRITE_TO_FILE = False  # Whether to write to a file
+# LOG_MODE = "w"  # File write mode
+# LOG_MAX_BYTES = 10 * 1024 * 1024  # Max bytes per log file
+# LOG_BACKUP_COUNT = 20  # Number of log files to keep
+# LOG_ENCODING = "utf8"  # Log file encoding
+# OTHERS_LOG_LEVAL = "ERROR"  # Log level for third-party libraries
+#
+# # Switch the working directory to the current project path
+# project_path = os.path.abspath(os.path.dirname(__file__))
+# os.chdir(project_path)  # Switch the working directory
+# sys.path.insert(0, project_path)
+# print("Current working directory: " + os.getcwd())
diff --git a/tests/test-debugger/spiders/__init__.py b/tests/test-debugger/spiders/__init__.py
new file mode 100644
index 00000000..4243fbe2
--- /dev/null
+++ b/tests/test-debugger/spiders/__init__.py
@@ -0,0 +1,3 @@
+__all__ = [
+    "test_debugger"
+]
\ No newline at end of file
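Reviewer note: `main.py` relies on `from spiders import *` making `test_debugger` available, which works because `spiders/__init__.py` lists the submodule in `__all__`. The sketch below reproduces that mechanism with a throwaway package built in a temporary directory. The package name `demo_spiders` and the function `star_import_names` are made up for this illustration.

```python
import sys
import tempfile
from pathlib import Path


def star_import_names(package_name="demo_spiders", submodule="test_debugger"):
    """Build a throwaway package and return the names a star import binds."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / package_name
        pkg.mkdir()
        # Same shape as spiders/__init__.py in the diff: only __all__ is defined.
        (pkg / "__init__.py").write_text(
            f'__all__ = ["{submodule}"]\n', encoding="utf-8"
        )
        (pkg / f"{submodule}.py").write_text(
            "class TestDebugger:\n    pass\n", encoding="utf-8"
        )
        sys.path.insert(0, tmp)
        try:
            namespace = {}
            # Names in __all__ that are not yet attributes are imported as submodules.
            exec(f"from {package_name} import *", namespace)
            return {name for name in namespace if not name.startswith("__")}
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            stale = [
                m for m in sys.modules
                if m == package_name or m.startswith(package_name + ".")
            ]
            for name in stale:
                del sys.modules[name]


if __name__ == "__main__":
    print(star_import_names())
```

Without the `__all__` entry, the star import would bind nothing from the package, and `test_debugger.TestDebugger` in `main.py` would raise a `NameError`.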
diff --git a/tests/test-debugger/spiders/test_debugger.py b/tests/test-debugger/spiders/test_debugger.py
new file mode 100644
index 00000000..2ef73f56
--- /dev/null
+++ b/tests/test-debugger/spiders/test_debugger.py
@@ -0,0 +1,28 @@
+# -*- coding: utf-8 -*-
+"""
+Created on 2023-06-09 20:26:47
+---------
+@summary:
+---------
+@author: Boris
+"""
+
+import feapder
+
+
+class TestDebugger(feapder.Spider):
+    def start_requests(self):
+        yield feapder.Request("https://spidertools.cn", render=True)
+
+    def parse(self, request, response):
+        # Extract the site title
+        print(response.xpath("//title/text()").extract_first())
+        # Extract the site description
+        print(response.xpath("//meta[@name='description']/@content").extract_first())
+        print("Site URL: ", response.url)
+
+
+if __name__ == "__main__":
+    TestDebugger.to_DebugSpider(
+        request=feapder.Request("https://spidertools.cn", render=True), redis_key="test:xxx"
+    ).start()
diff --git a/tests/test-pipeline/items/spider_data_item.py b/tests/test-pipeline/items/spider_data_item.py
index 3072d9a5..1960649a 100644
--- a/tests/test-pipeline/items/spider_data_item.py
+++ b/tests/test-pipeline/items/spider_data_item.py
@@ -8,6 +8,7 @@
"""
from feapder import Item
+from feapder.pipelines.csv_pipeline import CsvPipeline
class SpiderDataItem(Item):
@@ -15,6 +16,7 @@ class SpiderDataItem(Item):
This class was generated by feapder.
command: feapder create -i spider_data.
"""
+ __pipelines__ = [CsvPipeline()]
def __init__(self, *args, **kwargs):
# self.id = None # type : int(10) unsigned | allow_null : NO | key : PRI | default_value : None | extra : auto_increment | column_comment :
diff --git a/tests/test-pipeline/setting.py b/tests/test-pipeline/setting.py
index ca852ad4..ba985f09 100644
--- a/tests/test-pipeline/setting.py
+++ b/tests/test-pipeline/setting.py
@@ -19,7 +19,8 @@
# 数据入库的pipeline,可自定义,默认MysqlPipeline
ITEM_PIPELINES = [
- "pipeline.Pipeline"
+ "pipeline.Pipeline",
+ # "feapder.pipelines.csv_pipeline.CsvPipeline"
]
# # 爬虫相关
diff --git a/tests/test-pipeline/spiders/test_csv_pipeline_spider.py b/tests/test-pipeline/spiders/test_csv_pipeline_spider.py
new file mode 100644
index 00000000..83d4b842
--- /dev/null
+++ b/tests/test-pipeline/spiders/test_csv_pipeline_spider.py
@@ -0,0 +1,28 @@
+# -*- coding: utf-8 -*-
+"""
+Created on 2025-12-16 14:52:29
+---------
+@summary:
+---------
+@author: Boris
+"""
+
+import feapder
+from items import *
+
+
+class TestCsvPipelineSpider(feapder.AirSpider):
+    def start_requests(self):
+        for i in range(100):
+            yield feapder.Request("https://baidu.com", page=i)
+
+    def parse(self, request, response):
+        # Extract the site title
+        title = response.xpath("//title/text()").extract_first()
+        item = spider_data_item.SpiderDataItem()  # Declare an item
+        item.title = title  # Assign a value to the item attribute
+        yield item  # Return the item; items are stored in batches automatically
+
+
+if __name__ == "__main__":
+    TestCsvPipelineSpider().start()
diff --git a/tests/test_csv_pipeline/test_functionality.py b/tests/test_csv_pipeline/test_functionality.py
new file mode 100644
index 00000000..190c9137
--- /dev/null
+++ b/tests/test_csv_pipeline/test_functionality.py
@@ -0,0 +1,454 @@
+# -*- coding: utf-8 -*-
+"""
+CSV Pipeline functional tests
+
+Coverage:
+1. Basic functionality
+2. Exception handling
+3. Boundary conditions
+4. Compatibility
+
+Created on 2025-10-16
+@author: 道长
+@email: ctrlf4@yeah.net
+"""
+
+import csv
+import os
+import sys
+import shutil
+from pathlib import Path
+
+# Add the project path
+sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+
+from feapder.pipelines.csv_pipeline import CsvPipeline
+
+
+class FunctionalityTester:
+    """CSV Pipeline functional tester"""
+
+    def __init__(self, test_dir="test_output"):
+        """Initialize the tester"""
+        self.test_dir = test_dir
+        self.pipeline = None
+        self.passed = 0
+        self.failed = 0
+
+    def setup(self):
+        """Prepare before the tests"""
+        if os.path.exists(self.test_dir):
+            shutil.rmtree(self.test_dir)
+
+        os.makedirs(self.test_dir, exist_ok=True)
+
+        csv_dir = os.path.join(self.test_dir, "csv")
+        self.pipeline = CsvPipeline(csv_dir=csv_dir)
+
+        print("✅ Test environment ready")
+
+    def teardown(self):
+        """Clean up after the tests"""
+        if self.pipeline:
+            self.pipeline.close()
+
+    def assert_true(self, condition, message):
+        """Assert that the condition is true"""
+        if condition:
+            print(f" ✅ {message}")
+            self.passed += 1
+        else:
+            print(f" ❌ {message}")
+            self.failed += 1
+
+    def assert_false(self, condition, message):
+        """Assert that the condition is false"""
+        self.assert_true(not condition, message)
+
+    def assert_equal(self, actual, expected, message):
+        """Assert equality"""
+        if actual == expected:
+            print(f" ✅ {message}")
+            self.passed += 1
+        else:
+            print(f" ❌ {message} (expected: {expected}, actual: {actual})")
+            self.failed += 1
+
+    def test_basic_save(self):
+        """Test basic saving"""
+        print("\n" + "=" * 80)
+        print("Test 1: basic saving")
+        print("=" * 80)
+
+        # Save a single item
+        item = {"id": 1, "name": "Test Product", "price": 99.99}
+        result = self.pipeline.save_items("product", [item])
+        self.assert_true(result, "save a single item")
+
+        # Check that the file was created
+        csv_file = os.path.join(self.pipeline.csv_dir, "product.csv")
+        self.assert_true(os.path.exists(csv_file), "CSV file created")
+
+        # Check that the data is correct
+        with open(csv_file, 'r', encoding='utf-8', newline='') as f:
+            reader = csv.DictReader(f)
+            rows = list(reader)
+            self.assert_equal(len(rows), 1, "file contains 1 row")
+            if rows:
+                self.assert_equal(rows[0]["id"], "1", "row ID is correct")
+                self.assert_equal(rows[0]["name"], "Test Product", "row name is correct")
+
+    def test_batch_save(self):
+        """Test batch saving"""
+        print("\n" + "=" * 80)
+        print("Test 2: batch saving")
+        print("=" * 80)
+
+        # Generate test data
+        items = []
+        for i in range(10):
+            items.append({
+                "id": i + 1,
+                "name": f"Product_{i + 1}",
+                "price": 100 + i,
+            })
+
+        result = self.pipeline.save_items("batch_test", items)
+        self.assert_true(result, "batch-save 10 items")
+
+        # Check the number of data rows
+        csv_file = os.path.join(self.pipeline.csv_dir, "batch_test.csv")
+        with open(csv_file, 'r', encoding='utf-8', newline='') as f:
+            reader = csv.DictReader(f)
+            rows = list(reader)
+            self.assert_equal(len(rows), 10, "batch-saved row count is correct")
+
+    def test_empty_items(self):
+        """Test handling of empty data"""
+        print("\n" + "=" * 80)
+        print("Test 3: empty data handling")
+        print("=" * 80)
+
+        result = self.pipeline.save_items("empty_test", [])
+        self.assert_true(result, "empty item list returns True")
+
+    def test_special_characters(self):
+        """Test handling of special characters"""
+        print("\n" + "=" * 80)
+        print("Test 4: special character handling")
+        print("=" * 80)
+
+        items = [
+            {
+                "id": 1,
+                "name": "产品名称",
+                "description": 'Contains "quotes" and, commas',
+                "emoji": "😀🎉🚀",
+                "newline": "Line1\nLine2",
+            }
+        ]
+
+        result = self.pipeline.save_items("special_chars", items)
+        self.assert_true(result, "save data containing special characters")
+
+        # Read back and check
+        csv_file = os.path.join(self.pipeline.csv_dir, "special_chars.csv")
+        with open(csv_file, 'r', encoding='utf-8', newline='') as f:
+            reader = csv.DictReader(f)
+            rows = list(reader)
+            if rows:
+                self.assert_equal(rows[0]["name"], "产品名称", "Chinese characters are correct")
+                self.assert_equal(
+                    rows[0].get("emoji", ""),
+                    "😀🎉🚀",
+                    "emoji are correct"
+                )
+
+    def test_multiple_tables(self):
+        """Test storing to multiple tables"""
+        print("\n" + "=" * 80)
+        print("Test 5: multiple tables")
+        print("=" * 80)
+
+        tables = ["product", "user", "order"]
+        for table in tables:
+            item = {"id": 1, "name": f"Test {table}"}
+            result = self.pipeline.save_items(table, [item])
+            self.assert_true(result, f"save to table {table}")
+
+        # Check all files
+        for table in tables:
+            csv_file = os.path.join(self.pipeline.csv_dir, f"{table}.csv")
+            self.assert_true(os.path.exists(csv_file), f"CSV file for table {table} exists")
+
+    def test_header_only_once(self):
+        """Test that the header is written only once"""
+        print("\n" + "=" * 80)
+        print("Test 6: header written only once")
+        print("=" * 80)
+
+        table = "header_test"
+
+        # First write
+        items1 = [{"id": 1, "name": "Product 1"}]
+        self.pipeline.save_items(table, items1)
+
+        # Second write
+        items2 = [{"id": 2, "name": "Product 2"}]
+        self.pipeline.save_items(table, items2)
+
+        # Check the number of lines
+        csv_file = os.path.join(self.pipeline.csv_dir, f"{table}.csv")
+        with open(csv_file, 'r', encoding='utf-8', newline='') as f:
+            lines = f.readlines()
+            # Expected: 1 header line + 2 data rows
+            self.assert_equal(len(lines), 3, "file has exactly 1 header line and 2 data rows")
+
+    def test_numeric_values(self):
+        """Test numeric types"""
+        print("\n" + "=" * 80)
+        print("Test 7: numeric type handling")
+        print("=" * 80)
+
+        items = [
+            {
+                "id": 1,
+                "price": 99.99,
+                "stock": 100,
+                "rating": 4.5,
+                "active": True,
+            }
+        ]
+
+        result = self.pipeline.save_items("numeric_test", items)
+        self.assert_true(result, "save data containing various numeric values")
+
+        # Read back and check
+        csv_file = os.path.join(self.pipeline.csv_dir, "numeric_test.csv")
+        with open(csv_file, 'r', encoding='utf-8', newline='') as f:
+            reader = csv.DictReader(f)
+            rows = list(reader)
+            if rows:
+                self.assert_equal(rows[0]["price"], "99.99", "float is correct")
+                self.assert_equal(rows[0]["stock"], "100", "integer is correct")
+                self.assert_equal(rows[0]["rating"], "4.5", "decimal is correct")
+
+    def test_large_values(self):
+        """Test handling of large values"""
+        print("\n" + "=" * 80)
+        print("Test 8: large value handling")
+        print("=" * 80)
+
+        large_text = "x" * 10000  # 10 KB of text
+        items = [
+            {
+                "id": 1,
+                "name": "Large Content",
+                "content": large_text,
+            }
+        ]
+
+        result = self.pipeline.save_items("large_test", items)
+        self.assert_true(result, "save data with large content")
+
+        # Check data integrity
+        csv_file = os.path.join(self.pipeline.csv_dir, "large_test.csv")
+        with open(csv_file, 'r', encoding='utf-8', newline='') as f:
+            reader = csv.DictReader(f)
+            rows = list(reader)
+            if rows:
+                self.assert_equal(
+                    len(rows[0]["content"]),
+                    len(large_text),
+                    "large content is intact"
+                )
+
+    def test_update_items_fallback(self):
+        """Test that update_items falls back to save"""
+        print("\n" + "=" * 80)
+        print("Test 9: update_items falls back to save")
+        print("=" * 80)
+
+        items = [{"id": 1, "name": "Product 1", "price": 100}]
+        result = self.pipeline.update_items("update_test", items, ("price",))
+        self.assert_true(result, "update_items returns True")
+
+        # Check that the data exists
+        csv_file = os.path.join(self.pipeline.csv_dir, "update_test.csv")
+        self.assert_true(os.path.exists(csv_file), "update_items created the CSV file")
+
+    def test_file_operations(self):
+        """Test file operations"""
+        print("\n" + "=" * 80)
+        print("Test 10: file operations")
+        print("=" * 80)
+
+        items = [{"id": 1, "name": "Test"}]
+        table = "file_test"
+
+        result = self.pipeline.save_items(table, items)
+        self.assert_true(result, "save data")
+
+        csv_file = os.path.join(self.pipeline.csv_dir, f"{table}.csv")
+
+        # Check that the file is readable
+        try:
+            with open(csv_file, 'r', encoding='utf-8') as f:
+                f.read()
+            self.assert_true(True, "CSV file is readable")
+        except Exception as e:
+            self.assert_true(False, f"CSV file is readable ({e})")
+
+        # Check the file size
+        file_size = os.path.getsize(csv_file)
+        self.assert_true(file_size > 0, f"CSV file size > 0 ({file_size} bytes)")
+
+    def test_concurrent_same_table(self):
+        """Test concurrent writes to the same table"""
+        print("\n" + "=" * 80)
+        print("Test 11: concurrent writes to the same table (Per-Table Lock)")
+        print("=" * 80)
+
+        import threading
+
+        table = "concurrent_same_table"
+        errors = []
+
+        def write_data(thread_id):
+            try:
+                items = [{"id": thread_id, "name": f"Item_{thread_id}"}]
+                result = self.pipeline.save_items(table, items)
+                if not result:
+                    errors.append(f"thread {thread_id} failed to write")
+            except Exception as e:
+                errors.append(f"thread {thread_id} raised: {e}")
+
+        # Create several threads
+        threads = []
+        for i in range(5):
+            t = threading.Thread(target=write_data, args=(i,))
+            t.start()
+            threads.append(t)
+
+        # Wait for all threads to finish
+        for t in threads:
+            t.join()
+
+        self.assert_equal(len(errors), 0, "concurrent writes produced no errors")
+
+        # Check data integrity
+        csv_file = os.path.join(self.pipeline.csv_dir, f"{table}.csv")
+        with open(csv_file, 'r', encoding='utf-8', newline='') as f:
+            reader = csv.DictReader(f)
+            rows = list(reader)
+            self.assert_true(len(rows) > 0, "concurrent writes produced data")
+
+    def test_directory_creation(self):
+        """Test automatic directory creation"""
+        print("\n" + "=" * 80)
+        print("Test 12: automatic directory creation")
+        print("=" * 80)
+
+        # Create a new pipeline instance pointing at a directory that does not exist
+        new_csv_dir = os.path.join(self.test_dir, "new_csv_dir")
+        self.assert_false(os.path.exists(new_csv_dir), "new directory does not exist yet")
+
+        new_pipeline = CsvPipeline(csv_dir=new_csv_dir)
+        self.assert_true(os.path.exists(new_csv_dir), "directory created automatically")
+
+        new_pipeline.close()
+
+    def test_none_values(self):
+        """Test handling of None values"""
+        print("\n" + "=" * 80)
+        print("Test 13: None value handling")
+        print("=" * 80)
+
+        items = [
+            {
+                "id": 1,
+                "name": "Product",
+                "description": None,
+                "optional_field": "",
+            }
+        ]
+
+        result = self.pipeline.save_items("none_test", items)
+        self.assert_true(result, "save data containing None values")
+
+        # Check the file
+        csv_file = os.path.join(self.pipeline.csv_dir, "none_test.csv")
+        with open(csv_file, 'r', encoding='utf-8', newline='') as f:
+            reader = csv.DictReader(f)
+            rows = list(reader)
+            if rows:
+                # None is expected to be converted to the string "None"
+                self.assert_true("None" in rows[0]["description"],
+                                 "None value handled correctly")
+
+    def run_all_tests(self):
+        """Run all tests"""
+        print("\n")
+        print("╔" + "═" * 78 + "╗")
+        print("║" + " CSV Pipeline Functional Tests ".center(78) + "║")
+        print("║" + " Author: 道长 | Date: 2025-10-16 ".center(78) + "║")
+        print("╚" + "═" * 78 + "╝")
+
+        try:
+            self.setup()
+
+            # Run all tests
+            self.test_basic_save()
+            self.test_batch_save()
+            self.test_empty_items()
+            self.test_special_characters()
+            self.test_multiple_tables()
+            self.test_header_only_once()
+            self.test_numeric_values()
+            self.test_large_values()
+            self.test_update_items_fallback()
+            self.test_file_operations()
+            self.test_concurrent_same_table()
+            self.test_directory_creation()
+            self.test_none_values()
+
+            # Print the summary
+            self.print_summary()
+
+            return self.failed == 0
+
+        except Exception as e:
+            print(f"\n❌ Error while running tests: {e}")
+            import traceback
+            traceback.print_exc()
+            return False
+
+        finally:
+            self.teardown()
+
+    def print_summary(self):
+        """Print the test summary"""
+        print("\n" + "=" * 80)
+        print("Test summary")
+        print("=" * 80)
+        print(f"✅ Passed: {self.passed}")
+        print(f"❌ Failed: {self.failed}")
+        print(f"Total: {self.passed + self.failed}")
+
+        if self.failed == 0:
+            print("\n🎉 All tests passed!")
+        else:
+            print(f"\n⚠️ {self.failed} test(s) failed")
+
+        print("=" * 80)
+
+
+def main():
+    """Main function"""
+    tester = FunctionalityTester(test_dir="tests/test_csv_pipeline/test_output_func")
+    success = tester.run_all_tests()
+    return 0 if success else 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
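Reviewer note on test 13 (`test_none_values`): the assertion expects the `description` column to read back as the string "None". Python's standard `csv` writer emits `None` as an empty string by default, so the test can only pass if the pipeline stringifies values before writing. That is an assumption about `CsvPipeline`, whose implementation is not shown in this part of the diff. The sketch below uses only the standard library to show both behaviours; the helper `roundtrip` is made up for this illustration.

```python
import csv
import io


def roundtrip(row, fieldnames):
    """Write one dict row with csv.DictWriter and read it back with csv.DictReader."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow(row)
    buffer.seek(0)
    return next(csv.DictReader(buffer))


fields = ["id", "description"]
raw = {"id": 1, "description": None}

# Default behaviour: the csv writer turns None into an empty field.
plain = roundtrip(raw, fields)

# Stringifying first (assumed to be what the pipeline does) keeps the text "None".
stringified = roundtrip({key: str(value) for key, value in raw.items()}, fields)

if __name__ == "__main__":
    print(plain)
    print(stringified)
```

If the empty-string behaviour is what downstream consumers expect, the test would need to assert on `""` instead. Either way, it seems worth pinning the intended behaviour in the pipeline's documentation.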
diff --git a/tests/test_csv_pipeline/test_performance.py b/tests/test_csv_pipeline/test_performance.py
new file mode 100644
index 00000000..94eb64a7
--- /dev/null
+++ b/tests/test_csv_pipeline/test_performance.py
@@ -0,0 +1,537 @@
+# -*- coding: utf-8 -*-
+"""
+CSV Pipeline performance tests
+
+Coverage:
+1. Batch write performance
+2. Concurrent write performance
+3. Memory usage
+4. File size and data integrity
+
+Created on 2025-10-16
+@author: 道长
+@email: ctrlf4@yeah.net
+"""
+
+import csv
+import os
+import sys
+import time
+import shutil
+import threading
+import psutil
+from pathlib import Path
+from typing import List, Dict
+
+# Add the project path
+sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+
+from feapder.pipelines.csv_pipeline import CsvPipeline
+
+
+class PerformanceTester:
+    """CSV Pipeline performance tester"""
+
+    def __init__(self, test_dir="test_output"):
+        """Initialize the tester"""
+        self.test_dir = test_dir
+        self.pipeline = None
+        self.process = psutil.Process()
+        self.test_results = {}
+
+    def setup(self):
+        """Prepare before the tests"""
+        # Remove any previous test directory
+        if os.path.exists(self.test_dir):
+            shutil.rmtree(self.test_dir)
+
+        # Create the test output directory
+        os.makedirs(self.test_dir, exist_ok=True)
+
+        # Initialize the pipeline
+        csv_dir = os.path.join(self.test_dir, "csv")
+        self.pipeline = CsvPipeline(csv_dir=csv_dir)
+
+        print(f"✅ Test environment ready, output directory: {self.test_dir}")
+
+    def teardown(self):
+        """Clean up after the tests"""
+        if self.pipeline:
+            self.pipeline.close()
+
+    def generate_test_data(self, count: int) -> List[Dict]:
+        """Generate test data"""
+        data = []
+        for i in range(count):
+            data.append({
+                "id": i + 1,
+                "name": f"Product_{i + 1}",
+                "price": 99.99 + i * 0.1,
+                "category": "Electronics",
+                "url": f"https://example.com/product/{i + 1}",
+                "stock": 100 - (i % 50),
+                "rating": 4.5 + (i % 5) * 0.1,
+                "description": f"Description for product {i + 1}" * 3,
+            })
+        return data
+
+    def test_single_batch_performance(self):
+        """Test single-batch write performance"""
+        print("\n" + "=" * 80)
+        print("Test 1: single-batch write performance")
+        print("=" * 80)
+
+        batch_sizes = [100, 500, 1000, 5000]
+        results = {}
+
+        for batch_size in batch_sizes:
+            data = self.generate_test_data(batch_size)
+
+            # Time the write
+            start_time = time.time()
+            success = self.pipeline.save_items("product", data)
+            elapsed = time.time() - start_time
+
+            # Record the result
+            results[batch_size] = {
+                "success": success,
+                "elapsed_time": elapsed,
+                "throughput": batch_size / elapsed if elapsed > 0 else 0,
+            }
+
+            print(f"Batch size: {batch_size:5d} | "
+                  f"Elapsed: {elapsed:.4f}s | "
+                  f"Throughput: {results[batch_size]['throughput']:.0f} items/s | "
+                  f"Status: {'✅' if success else '❌'}")
+
+        self.test_results["single_batch"] = results
+        return results
+
+    def test_concurrent_write_performance(self):
+        """Test concurrent write performance"""
+        print("\n" + "=" * 80)
+        print("Test 2: concurrent write performance (simulating multiple spider threads)")
+        print("=" * 80)
+
+        thread_counts = [1, 2, 4, 8]
+        results = {}
+
+        for thread_count in thread_counts:
+            # Number of items written by each thread
+            items_per_thread = 100
+            total_items = thread_count * items_per_thread
+
+            def write_thread(thread_id):
+                """Thread worker function"""
+                data = self.generate_test_data(items_per_thread)
+                # Use a different table name per thread to simulate different tables
+                table_name = f"product_thread_{thread_id}"
+                return self.pipeline.save_items(table_name, data)
+
+            # Record the initial memory
+            mem_before = self.process.memory_info().rss / 1024 / 1024
+
+            # Run concurrently
+            start_time = time.time()
+            threads = []
+            for i in range(thread_count):
+                t = threading.Thread(target=write_thread, args=(i,))
+                t.start()
+                threads.append(t)
+
+            # Wait for all threads to finish
+            for t in threads:
+                t.join()
+
+            elapsed = time.time() - start_time
+            mem_after = self.process.memory_info().rss / 1024 / 1024
+            mem_delta = mem_after - mem_before
+
+            results[thread_count] = {
+                "total_items": total_items,
+                "elapsed_time": elapsed,
+                "throughput": total_items / elapsed if elapsed > 0 else 0,
+                "memory_delta_mb": mem_delta,
+            }
+
+            print(f"Threads: {thread_count} | "
+                  f"Total items: {total_items:5d} | "
+                  f"Elapsed: {elapsed:.4f}s | "
+                  f"Throughput: {results[thread_count]['throughput']:.0f} items/s | "
+                  f"Memory growth: {mem_delta:.2f}MB")
+
+        self.test_results["concurrent_write"] = results
+        return results
+
+    def test_memory_usage(self):
+        """Test memory usage"""
+        print("\n" + "=" * 80)
+        print("Test 3: memory usage")
+        print("=" * 80)
+
+        # Measure how different data volumes affect memory
+        test_counts = [1000, 5000, 10000, 50000]
+        results = {}
+
+        for count in test_counts:
+            data = self.generate_test_data(count)
+
+            # Record memory
+            mem_before = self.process.memory_info().rss / 1024 / 1024
+
+            # Perform the write
+            start_time = time.time()
+            self.pipeline.save_items("product_memory", data)
+            elapsed = time.time() - start_time
+
+            mem_after = self.process.memory_info().rss / 1024 / 1024
+            mem_used = mem_after - mem_before
+            mem_per_item = mem_used / count if count > 0 else 0
+
+            results[count] = {
+                "memory_before_mb": mem_before,
+                "memory_after_mb": mem_after,
+                "memory_used_mb": mem_used,
+                "memory_per_item_kb": mem_per_item * 1024,
+                "elapsed_time": elapsed,
+            }
+
+            print(f"Items: {count:6d} | "
+                  f"Memory used: {mem_used:6.2f}MB | "
+                  f"Per item: {mem_per_item * 1024:.2f}KB | "
+                  f"Elapsed: {elapsed:.4f}s")
+
+        self.test_results["memory_usage"] = results
+        return results
+
+    def test_file_integrity(self):
+        """Test file integrity"""
+        print("\n" + "=" * 80)
+        print("Test 4: file integrity check")
+        print("=" * 80)
+
+        # Write the test data
+        test_data = self.generate_test_data(1000)
+        table_name = "product_integrity"
+
+        success = self.pipeline.save_items(table_name, test_data)
+
+        if not success:
+            print("❌ Write failed")
+            return {"status": "failed"}
+
+        # Check that the file exists
+        csv_file = os.path.join(self.pipeline.csv_dir, f"{table_name}.csv")
+        if not os.path.exists(csv_file):
+            print("❌ CSV file does not exist")
+            return {"status": "file_not_found"}
+
+        # Read the CSV file and check data integrity
+        read_data = []
+        with open(csv_file, 'r', encoding='utf-8', newline='') as f:
+            reader = csv.DictReader(f)
+            for row in reader:
+                read_data.append(row)
+
+        # Compare the data
+        if len(read_data) != len(test_data):
+            print(f"❌ Row count mismatch: wrote {len(test_data)}, read {len(read_data)}")
+            return {
+                "status": "count_mismatch",
+                "written": len(test_data),
+                "read": len(read_data),
+            }
+
+        # Check that all fields are present
+        expected_fields = set(test_data[0].keys())
+        actual_fields = set(read_data[0].keys())
+        if expected_fields != actual_fields:
+            print(f"❌ Field mismatch\nExpected: {expected_fields}\nActual: {actual_fields}")
+            return {
+                "status": "field_mismatch",
+                "expected": list(expected_fields),
+                "actual": list(actual_fields),
+            }
+
+        # Check that the values are correct (spot check)
+        sample_indices = [0, len(test_data) // 2, len(test_data) - 1]
+        for idx in sample_indices:
+            original = test_data[idx]
+            read = read_data[idx]
+
+            for key in original.keys():
+                if str(original[key]) != read.get(key, ""):
+                    print(f"❌ Data mismatch (row {idx}, field {key})\n"
+                          f"Expected: {original[key]}\n"
+                          f"Actual: {read.get(key)}")
+                    return {"status": "data_mismatch", "index": idx, "field": key}
+
+        print("✅ File integrity check passed")
+        print(f" Total rows: {len(read_data)}")
+        print(f" Fields: {len(actual_fields)}")
+        print(f" File size: {os.path.getsize(csv_file) / 1024:.2f}KB")
+
+        return {
+            "status": "passed",
+            "total_rows": len(read_data),
+            "total_fields": len(actual_fields),
+            "file_size_kb": os.path.getsize(csv_file) / 1024,
+        }
+
+    def test_append_mode(self):
+        """Test append mode (resumable crawling)"""
+        print("\n" + "=" * 80)
+        print("Test 5: append mode (resumable crawling)")
+        print("=" * 80)
+
+        table_name = "product_append"
+
+        # First write
+        data1 = self.generate_test_data(100)
+        self.pipeline.save_items(table_name, data1)
+
+        csv_file = os.path.join(self.pipeline.csv_dir, f"{table_name}.csv")
+        size_after_first = os.path.getsize(csv_file) if os.path.exists(csv_file) else 0
+
+        # Second write (append)
+        data2 = self.generate_test_data(100)
+        self.pipeline.save_items(table_name, data2)
+
+        size_after_second = os.path.getsize(csv_file) if os.path.exists(csv_file) else 0
+
+        # Read the file and check the data
+        read_data = []
+        with open(csv_file, 'r', encoding='utf-8', newline='') as f:
+            reader = csv.DictReader(f)
+            for row in reader:
+                read_data.append(row)
+
+        # Check that the data was appended correctly
+        if len(read_data) == len(data1) + len(data2):
+            print("✅ Append mode works")
+            print(f" First write: {len(data1)} rows")
+            print(f" Second write: {len(data2)} rows")
+            print(f" Final total: {len(read_data)} rows")
+            print(f" Size after first write: {size_after_first / 1024:.2f}KB")
+            print(f" Size after second write: {size_after_second / 1024:.2f}KB")
+
+            return {
+                "status": "passed",
+                "first_write": len(data1),
+                "second_write": len(data2),
+                "total": len(read_data),
+                "size_growth_kb": (size_after_second - size_after_first) / 1024,
+            }
+        else:
+            print(f"❌ Append mode failed: expected {len(data1) + len(data2)} rows, got {len(read_data)}")
+            return {
+                "status": "failed",
+                "expected": len(data1) + len(data2),
+                "actual": len(read_data),
+            }
+
+    def test_concurrent_safety(self):
+        """Test concurrent safety (per-table lock)"""
+        print("\n" + "=" * 80)
+        print("Test 6: concurrent safety (per-table lock)")
+        print("=" * 80)
+
+        table_name = "product_concurrent_safety"
+        thread_count = 4
+        items_per_thread = 250
+
+        errors = []
+        lock = threading.Lock()
+
+        def write_thread(thread_id):
+            """Thread worker function"""
+            try:
+                data = self.generate_test_data(items_per_thread)
+                success = self.pipeline.save_items(table_name, data)
+                if not success:
+                    with lock:
+                        errors.append(f"Thread {thread_id} failed to write")
+            except Exception as e:
+                with lock:
+                    errors.append(f"Thread {thread_id} raised: {e}")
+
+        # Run the writers concurrently
+        threads = []
+        start_time = time.time()
+        for i in range(thread_count):
+            t = threading.Thread(target=write_thread, args=(i,))
+            t.start()
+            threads.append(t)
+
+        for t in threads:
+            t.join()
+
+        elapsed = time.time() - start_time
+
+        # Check the file
+        csv_file = os.path.join(self.pipeline.csv_dir, f"{table_name}.csv")
+        read_data = []
+        with open(csv_file, 'r', encoding='utf-8', newline='') as f:
+            reader = csv.DictReader(f)
+            for row in reader:
+                read_data.append(row)
+
+        expected_total = thread_count * items_per_thread
+
+        if len(errors) == 0 and len(read_data) == expected_total:
+            print(f"✅ Concurrent safety test passed")
+            print(f"   Threads: {thread_count}")
+            print(f"   Items per thread: {items_per_thread}")
+            print(f"   Expected total: {expected_total}")
+            print(f"   Actual total: {len(read_data)}")
+            print(f"   Elapsed: {elapsed:.4f}s")
+            print(f"   Throughput: {expected_total / elapsed:.0f} items/s")
+
+            return {
+                "status": "passed",
+                "thread_count": thread_count,
+                "items_per_thread": items_per_thread,
+                "expected_total": expected_total,
+                "actual_total": len(read_data),
+                "elapsed_time": elapsed,
+                "throughput": expected_total / elapsed,
+            }
+        else:
+            print(f"❌ Concurrent safety test failed")
+            if errors:
+                for error in errors:
+                    print(f"   {error}")
+            if len(read_data) != expected_total:
+                print(f"   Row count mismatch: expected {expected_total}, got {len(read_data)}")
+
+            return {
+                "status": "failed",
+                "errors": errors,
+                "expected_total": expected_total,
+                "actual_total": len(read_data),
+            }
+
+    def test_multiple_tables(self):
+        """Test storing multiple tables"""
+        print("\n" + "=" * 80)
+        print("Test 7: multiple tables")
+        print("=" * 80)
+
+        tables = ["product", "user", "order"]
+        rows_per_table = 500
+        results = {}
+
+        start_time = time.time()
+
+        for table in tables:
+            data = self.generate_test_data(rows_per_table)
+            success = self.pipeline.save_items(table, data)
+
+            csv_file = os.path.join(self.pipeline.csv_dir, f"{table}.csv")
+            file_size = os.path.getsize(csv_file) / 1024 if os.path.exists(csv_file) else 0
+
+            results[table] = {
+                "success": success,
+                "file_size_kb": file_size,
+            }
+
+            print(f"Table: {table:10s} | Status: {'✅' if success else '❌'} | "
+                  f"File size: {file_size:.2f}KB")
+
+        elapsed = time.time() - start_time
+
+        # Count every CSV file in the output directory
+        csv_dir = self.pipeline.csv_dir
+        files = [f for f in os.listdir(csv_dir) if f.endswith('.csv')]
+
+        print(f"\n✅ Multiple-table storage test finished")
+        print(f"   Tables: {len(tables)}")
+        print(f"   Rows per table: {rows_per_table}")
+        print(f"   CSV files generated: {len(files)}")
+        print(f"   Elapsed: {elapsed:.4f}s")
+
+        return {
+            "status": "passed",
+            "tables": results,
+            "file_count": len(files),
+            "elapsed_time": elapsed,
+        }
+
+    def run_all_tests(self):
+        """Run all tests"""
+        print("\n")
+        print("╔" + "═" * 78 + "╗")
+        print("║" + " CSV Pipeline performance and functional tests ".center(78) + "║")
+        print("║" + " Author: 道长 | Date: 2025-10-16 ".center(78) + "║")
+        print("╚" + "═" * 78 + "╝")
+
+        try:
+            self.setup()
+
+            # Run every test
+            self.test_single_batch_performance()
+            self.test_concurrent_write_performance()
+            self.test_memory_usage()
+            self.test_file_integrity()
+            self.test_append_mode()
+            self.test_concurrent_safety()
+            self.test_multiple_tables()
+
+            # Print the summary
+            self.print_summary()
+
+            return True
+
+        except Exception as e:
+            print(f"\n❌ Error while running tests: {e}")
+            import traceback
+            traceback.print_exc()
+            return False
+
+        finally:
+            self.teardown()
+
+    def print_summary(self):
+        """Print the test summary"""
+        print("\n" + "=" * 80)
+        print("Test summary")
+        print("=" * 80)
+
+        # Single-batch performance summary
+        if "single_batch" in self.test_results:
+            print("\n1. Single-batch write performance:")
+            results = self.test_results["single_batch"]
+            for batch_size, data in results.items():
+                print(f"   {batch_size:5d} items: {data['throughput']:.0f} items/s, "
+                      f"elapsed {data['elapsed_time']:.4f}s")
+
+        # Concurrent performance summary
+        if "concurrent_write" in self.test_results:
+            print("\n2. Concurrent write performance:")
+            results = self.test_results["concurrent_write"]
+            for thread_count, data in results.items():
+                print(f"   {thread_count} threads: {data['throughput']:.0f} items/s, "
+                      f"memory growth {data['memory_delta_mb']:.2f}MB")
+
+        # Memory usage summary
+        if "memory_usage" in self.test_results:
+            print("\n3. Memory usage:")
+            results = self.test_results["memory_usage"]
+            for count, data in results.items():
+                print(f"   {count:6d} items: {data['memory_used_mb']:.2f}MB, "
+                      f"{data['memory_per_item_kb']:.2f}KB per item")
+
+        print("\n" + "=" * 80)
+        print("✅ All tests finished!")
+        print("=" * 80)
+
+
+def main():
+    """Entry point: run the full suite and return a process exit code"""
+    tester = PerformanceTester(test_dir="tests/test_csv_pipeline/test_output")
+    success = tester.run_all_tests()
+    return 0 if success else 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/tests/test_download_midware.py b/tests/test_download_midware.py
new file mode 100644
index 00000000..1accbaf7
--- /dev/null
+++ b/tests/test_download_midware.py
@@ -0,0 +1,45 @@
+# -*- coding: utf-8 -*-
+"""
+Created on 2023/9/21 13:59
+---------
+@summary:
+---------
+@author: Boris
+@email: boris_liu@foxmail.com
+"""
+
+import feapder
+
+
+def download_midware(request):
+    print("outer download_midware")
+    return request
+
+
+class TestAirSpider(feapder.AirSpider):
+    def start_requests(self):
+        yield feapder.Request(
+            "https://www.baidu.com", download_midware=download_midware
+        )
+
+    def parse(self, request, response):
+        print(request, response)
+
+
+class TestSpiderSpider(feapder.Spider):
+    def start_requests(self):
+        yield feapder.Request(
+            "https://www.baidu.com", download_midware=[download_midware, self.download_midware]
+        )
+
+    def download_midware(self, request):
+        print("class download_midware")
+        return request
+
+    def parse(self, request, response):
+        print(request, response)
+
+
+if __name__ == "__main__":
+    # TestAirSpider().start()
+    TestSpiderSpider(redis_key="test").start()
diff --git a/tests/test_log.py b/tests/test_log.py
index 3ec0ac31..c044a238 100644
--- a/tests/test_log.py
+++ b/tests/test_log.py
@@ -10,4 +10,10 @@
from feapder.utils.log import log
-log.debug(1)
\ No newline at end of file
+log.debug("debug")
+log.info("info")
+log.success("success")
+log.warning("warning")
+log.error("error")
+log.critical("critical")
+log.exception("exception")
\ No newline at end of file
diff --git a/tests/test_metrics.py b/tests/test_metrics.py
index 6b8ae8e5..308c2711 100644
--- a/tests/test_metrics.py
+++ b/tests/test_metrics.py
@@ -1,3 +1,5 @@
+import asyncio
+
from feapder.utils import metrics
# 初始化打点系统
@@ -13,9 +15,38 @@
)
-for i in range(1000):
- metrics.emit_counter("total count", count=1000, classify="test5")
- for j in range(1000):
- metrics.emit_counter("key", count=1, classify="test5")
+async def test_counter_async():
+    for i in range(100):
+        await metrics.aemit_counter("total count", count=100, classify="test5")
+        for j in range(100):
+            await metrics.aemit_counter("key", count=1, classify="test5")
+
+
+def test_counter():
+    for i in range(100):
+        metrics.emit_counter("total count", count=100, classify="test5")
+        for j in range(100):
+            metrics.emit_counter("key", count=1, classify="test5")
+
+
+def test_store():
+    metrics.emit_store("total", 100, classify="cookie_count")
+
+
+def test_time():
+    metrics.emit_timer("total", 100, classify="time")
+
+
+def test_any():
+    metrics.emit_any(
+        tags={"_key": "total", "_type": "any"}, fields={"_value": 100}, classify="time"
+    )
+
-metrics.close()
+if __name__ == "__main__":
+    asyncio.run(test_counter_async())
+    test_counter()
+    test_store()
+    test_time()
+    test_any()
+    metrics.close()
diff --git a/tests/test_mysqldb.py b/tests/test_mysqldb.py
index 7d59ce70..1fdd9c09 100644
--- a/tests/test_mysqldb.py
+++ b/tests/test_mysqldb.py
@@ -2,7 +2,10 @@
db = MysqlDB(
- ip="localhost", port=3306, db="feapder", user_name="feapder", user_pass="feapder123"
+ ip="localhost", port=3306, db="feapder", user_name="feapder", user_pass="feapder123", set_session=["SET time_zone='+08:00'"]
)
-MysqlDB.from_url("mysql://feapder:feapder123@localhost:3306/feapder?charset=utf8mb4")
\ No newline at end of file
+MysqlDB.from_url("mysql://feapder:feapder123@localhost:3306/feapder?charset=utf8mb4")
+
+result = db.find("SELECT @@global.time_zone, @@session.time_zone, date_format(NOW(), '%Y-%m-%d %H:%i:%s')")
+print(f"Database timezone info: {result}")
\ No newline at end of file
diff --git a/tests/test_proxies_pool.py b/tests/test_proxies_pool.py
deleted file mode 100644
index 5c63758e..00000000
--- a/tests/test_proxies_pool.py
+++ /dev/null
@@ -1,39 +0,0 @@
-# -*- coding: utf-8 -*-
-"""
-Created on 2021/4/3 4:25 下午
----------
-@summary:
----------
-@author: Boris
-@email: boris_liu@foxmail.com
-"""
-from feapder.network.proxy_pool import ProxyPool, check_proxy
-import requests
-
-url = "http://tunnel-api.apeyun.com/h?id=2020120800184471713&secret=3U1fEJPuabi3y2QJ&limit=10&format=txt&auth_mode=auto"
-
-proxy_pool = ProxyPool(size=-1, proxy_source_url=url)
-
-print(proxy_pool.get())
-#
-# headers = {
-# "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
-# "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
-# "Accept-Encoding": "gzip, deflate, br",
-# "Accept-Language": "zh-CN,zh;q=0.9",
-# "Connection": "keep-alive",
-# }
-#
-#
-# resp = requests.get(
-# "http://www.baidu.com",
-# headers=headers,
-# proxies={
-# "https": "https://182.106.136.67:13586",
-# "http": "http://182.106.136.67:13586",
-# },
-# )
-# print(resp.text)
-#
-# a = check_proxy("182.106.136.67", "13586", show_error_log=True, type=1)
-# print(a)
diff --git a/tests/test_rander_xhr.py b/tests/test_rander_xhr.py
index 534e5c57..15fe2da8 100644
--- a/tests/test_rander_xhr.py
+++ b/tests/test_rander_xhr.py
@@ -12,7 +12,7 @@ class TestRender(feapder.AirSpider):
user_agent=None, # 字符串 或 无参函数,返回值为user_agent
proxy=None, # xxx.xxx.xxx.xxx:xxxx 或 无参函数,返回值为代理地址
headless=False, # 是否为无头浏览器
- driver_type="CHROME", # CHROME、PHANTOMJS、FIREFOX
+            driver_type="CHROME",  # CHROME, EDGE, PHANTOMJS, FIREFOX
timeout=30, # 请求超时时间
window_size=(1024, 800), # 窗口大小
executable_path=None, # 浏览器路径,默认为默认路径