Update FAQ and requirements to resolve two exceptions in the upload_file endpoint (#593)

27a9bf24 · Zhi-guo Huang · GitHub · 66c4e9de
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -18,37 +18,39 @@ $ pip install -e .

 ---

-Q3: The Python package`nltk` raised a`Resource punkt not found.` error during use. How can this be fixed?
+Q3: The Python package `nltk` raised a `Resource punkt not found.` error during use. How can this be fixed?

 A3: Method 1: download https://github.com/nltk/nltk_data/raw/gh-pages/packages/tokenizers/punkt.zip, extract its `packages/tokenizers` contents, and place them under the `nltk_data/tokenizers` storage path.

- The `nltk_data` storage path can be found via `nltk.data.path`.
- 
- Method 2: run the following Python code
-``` 
+The `nltk_data` storage path can be found via `nltk.data.path`.
+
+Method 2: run the following Python code
+
+```
 import nltk
 nltk.download()
-``` 
+```

 ---

-Q4: The Python package`nltk` raised a`Resource averaged_perceptron_tagger not found.` error during use. How can this be fixed?
+Q4: The Python package `nltk` raised a `Resource averaged_perceptron_tagger not found.` error during use. How can this be fixed?

 A4: Method 1: download https://github.com/nltk/nltk_data/blob/gh-pages/packages/taggers/averaged_perceptron_tagger.zip, extract it, and place it under the `nltk_data/taggers` storage path.

- The `nltk_data` storage path can be found via `nltk.data.path`.  
- 
+The `nltk_data` storage path can be found via `nltk.data.path`.
+
 Method 2: run the following Python code
-``` 
+
+```
 import nltk
 nltk.download()
-``` 
+```
+
 ---

 Q5: Can this project run in Colab?

-A5: You can try running the chatglm-6b-int4 model in Colab. Note that to run the Web UI in Colab, in`webui.py` change the call`demo.queue(concurrency_count=3).launch(
-    server_name='0.0.0.0', share=False, inbrowser=False)` so that the`share` parameter is`True`.
+A5: You can try running the chatglm-6b-int4 model in Colab. Note that to run the Web UI in Colab, in `webui.py` change the call `demo.queue(concurrency_count=3).launch( server_name='0.0.0.0', share=False, inbrowser=False)` so that the `share` parameter is `True`.

 ---

@@ -60,7 +62,7 @@ A6: This is a system environment issue; for details see  [在Anaconda中使用pip安装

 Q7: How do I download the models required by this project to a local machine?

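-A7: All models used in this project are open-source models downloadable from`huggingface.com`. Taking the default`chatglm-6b` and`text2vec-large-chinese` models as examples, you can download them with the following code:
+A7: All models used in this project are open-source models downloadable from `huggingface.com`. Taking the default `chatglm-6b` and `text2vec-large-chinese` models as examples, you can download them with the following code: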

 ```shell
 # Install git lfs
@@ -93,7 +95,7 @@ A8: You can use the Baidu Netdisk link for the model weight files used in this project:

 Q9: After downloading the models, how do I modify the code to run the local models?

-A9: After the models are downloaded, modify the`embedding_model_dict` and`llm_model_dict` parameters in [configs/model_config.py](../configs/model_config.py), for example by changing`llm_model_dict` from
+A9: After the models are downloaded, modify the `embedding_model_dict` and `llm_model_dict` parameters in [configs/model_config.py](../configs/model_config.py), for example by changing `llm_model_dict` from

 ```python
 embedding_model_dict = {
@@ -112,9 +114,10 @@ embedding_model_dict = {
                        "text2vec": "/Users/liuqian/Downloads/ChatGLM-6B/text2vec-large-chinese"
 }
 ```
+
 ---

-Q10: While running`python cli_demo.py`, GPU memory ran out with the message "OutOfMemoryError: CUDA out of memory"
+Q10: While running `python cli_demo.py`, GPU memory ran out with the message "OutOfMemoryError: CUDA out of memory"

 A10: Lower the values of `VECTOR_SEARCH_TOP_K` and `LLM_HISTORY_LEN`, for example `VECTOR_SEARCH_TOP_K = 5` and `LLM_HISTORY_LEN = 2`. The `prompt` built by concatenating `query` and `context` then becomes shorter, which reduces memory usage.

@@ -128,15 +131,46 @@ A11: Switch to a different PyPI index and reinstall, such as the Aliyun or Tsinghua mirror; network conditions
 # Use the PyPI index
 $ pip install -r requirements.txt -i https://pypi.python.org/simple
 ```
+
 or
+
 ```shell
 # Use the Aliyun mirror
 $ pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/
 ```
+
 or
+
 ```shell
 # Use the Tsinghua mirror
 $ pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
 ```

---
\ No newline at end of file
+
+Q12 After api.py is started, the upload_file endpoint raises `partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import)`
+
+This is caused by a charset_normalizer version that is too new. Downgrade charset_normalizer; charset_normalizer==2.1.0 has been tested and works.
+
+---
+
+Q13 After api.py is started, uploading a PDF or image through the upload_file endpoint raises OSError: [Errno 101] Network is unreachable
+
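+In some cases, a request from a Linux host to download files such as ch_PP-OCRv3_rec_infer.tar may raise OSError: [Errno 101] Network is unreachable. In that case, first edit the script anaconda3/envs/[virtual environment name]/lib/[python version]/site-packages/paddleocr/ppocr/utils/network.py and change line 57 from: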
+
+```
+download_with_progressbar(url, tmp_path)
+```
+
+to:
+
+```
+        try:
+            download_with_progressbar(url, tmp_path)
+        except Exception as e:
+            print(f"download {url} error,please download it manually:")
+            print(e)
+```
+
+Then download the file manually from the reported URL, for example "https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar", and upload it to the corresponding folder, for example ".paddleocr/whl/rec/ch/ch_PP-OCRv3_rec_infer/ch_PP-OCRv3_rec_infer.tar".
+
+---
--- a/requirements.txt
+++ b/requirements.txt
@@ -32,4 +32,6 @@ starlette~=0.26.1
 numpy~=1.23.5
 tqdm~=4.65.0
 requests~=2.28.2
-tenacity~=8.2.2
\ No newline at end of file
+tenacity~=8.2.2
+# The charset_normalizer version installed by default is too new and raises `partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import)`
+charset_normalizer==2.1.0
\ No newline at end of file
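
One way to confirm that an environment actually picked up the `charset_normalizer==2.1.0` pin is to query the installed distribution. The following is a minimal sketch and not part of this commit: the helper names `installed_version` and `matches_pin` are hypothetical, and it uses only the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Version pinned in requirements.txt by this change.
PINNED = "2.1.0"


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package, pinned):
    """Return True only when `package` is installed at exactly `pinned`."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    found = installed_version("charset_normalizer")
    if found is None:
        print("charset_normalizer is not installed")
    elif found != PINNED:
        print(f"charset_normalizer {found} found; expected {PINNED}")
    else:
        print(f"charset_normalizer {found} matches the pin")
```

If this reports a different version, reinstalling with `pip install -r requirements.txt` should bring the environment back in line with the pin.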