Update README_en.md

2f6f854e · Calcitem · 3904b0f2 · 2f6f854e
--- a/README_en.md
+++ b/README_en.md
-# ChatGLM Application Based on Local Knowledge
+# ChatGLM Application with Local Knowledge Implementation

 ## Introduction

 🌍 [_中文文档_](README.md)

-🤖️ A local knowledge based LLM Application with [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B) and [langchain](https://github.com/hwchase17/langchain).
+🤖️ This is a ChatGLM application based on local knowledge, implemented using [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B) and [langchain](https://github.com/hwchase17/langchain).

-💡 Inspired by [document.ai](https://github.com/GanymedeNil/document.ai) by [GanymedeNil](https://github.com/GanymedeNil) and [ChatGLM-6B Pull Request](https://github.com/THUDM/ChatGLM-6B/pull/216) by [AlexZhangji](https://github.com/AlexZhangji).
+💡 Inspired by [document.ai](https://github.com/GanymedeNil/document.ai) and [AlexZhangji](https://github.com/AlexZhangji)'s [ChatGLM-6B Pull Request](https://github.com/THUDM/ChatGLM-6B/pull/216), this project establishes a local knowledge question-answering application using open-source models.

-✅ In this project, [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese/tree/main) is used as Embedding Model，and [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B) used as LLM。Based on those models，this project can be deployed **offline** with all **open source** models。
+✅ The embeddings used in this project are [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese/tree/main), and the LLM is [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B). Relying on these models, this project enables the use of **open-source** models for **offline private deployment**.

-## Webui 
-![webui](./img/ui1.png)
-Click on steps 1-3 according to the above figure to complete the model loading, file loading, and viewing of dialogue history
+⛓️ The implementation principle of this project is illustrated in the figure below. The process includes loading files -> reading text -> text segmentation -> text vectorization -> question vectorization -> matching the top k most similar text vectors to the question vector -> adding the matched text to `prompt` along with the question as context -> submitting to `LLM` to generate an answer.

-![webui](./img/ui2.png)
-Click on the Use via API at the bottom to view the API interface. Existing applications can be docked and called through post requests
+![Implementation schematic diagram](img/langchain+chatglm.png)

-### TODO
-[] Add Model Load progress bar
-[] Add output content and error prompts
-[] International language switching
-[] Reference annotation
-[] Add plugin system (can be used for basic LORA training, etc.)
+🚩 This project does not involve fine-tuning or training; however, fine-tuning or training can be employed to optimize the effectiveness of this project.

-## Update
+[TOC]

-**[2023/04/11]** 
-1. Add Webui V0.1 version and synchronize the updated content before the current day;
-2. Automatically read knowledge_ based_ Enumerate LLM and embedding models in chatglm.py, select and click 'setting' to load the model. You can switch models for testing at any time
-3. The length of the conversation history can be manually adjusted and can be adjusted according to the size of the video memory
-4. Add the upload file function, select the uploaded file from the dropdown box, click loading to load the file, and the loaded file can be changed at any time during the process
-5. Add use via API at the bottom to connect to your own system
+## Changelog

 **[2023/04/07]**
-1. Fix bug which costs twice gpu memory (Thanks to [@suc16](https://github.com/suc16) and [@myml](https://github.com/myml)).
-2. Add gpu memory clear function after each call of ChatGLM.
-3. Add `nghuyong/ernie-3.0-nano-zh` and `nghuyong/ernie-3.0-base-zh` as Embedding model alternatives，costing less gpu than `GanymedeNil/text2vec-large-chinese` (Thanks to [@lastrei](https://github.com/lastrei))
+
+   1. Resolved the issue of doubled video memory usage when loading the ChatGLM model (thanks to [@suc16](https://github.com/suc16) and [@myml](https://github.com/myml));
+   2. Added a mechanism to clear video memory;
+   3. Added `nghuyong/ernie-3.0-nano-zh` and `nghuyong/ernie-3.0-base-zh` as Embedding model options, which consume less video memory resources than `GanymedeNil/text2vec-large-chinese` (thanks to [@lastrei](https://github.com/lastrei)).

 **[2023/04/09]**
-1. Using `RetrievalQA` in `langchain` to replace the previously selected `ChatVectorDBChain`, the replacement can effectively solve the problem of program stopping after 2-3 questions due to insufficient gpu memory.
-2. Add `EMBEDDING_MODEL`, `VECTOR_SEARCH_TOP_K`, `LLM_MODEL`, `LLM_HISTORY_LEN`, `REPLY_WITH_SOURCE` parameter value settings in `knowledge_based_chatglm.py`.
-3. Add `chatglm-6b-int4`, `chatglm-6b-int4-qe` with smaller GPU memory requirements as LLM model alternatives.
-4. Correct code errors in `README.md` (Thanks to [@calcitem](https://github.com/calcitem)).

-## Usage
+   1. Replaced the previously selected `ChatVectorDBChain` with `RetrievalQA` in `langchain`, effectively reducing the issue of stopping due to insufficient video memory after asking 2-3 times;
+   2. Added `EMBEDDING_MODEL`, `VECTOR_SEARCH_TOP_K`, `LLM_MODEL`, `LLM_HISTORY_LEN`, `REPLY_WITH_SOURCE` parameter value settings in `knowledge_based_chatglm.py`;
+   3. Added `chatglm-6b-int4` and `chatglm-6b-int4-qe`, which require less GPU memory, as LLM model options;
+   4. Corrected code errors in `README.md` (thanks to [@calcitem](https://github.com/calcitem)).

-### Hardware Requirements
+**[2023/04/11]**
+
+   1. Added Web UI V0.1 version (thanks to [@liangtongt](https://github.com/liangtongt));
+   2. Added Frequently Asked Questions in `README.md` (thanks to [@calcitem](https://github.com/calcitem) and [@bolongliu](https://github.com/bolongliu));
+   3. Enhanced automatic detection for the availability of `cuda`, `mps`, and `cpu` for LLM and Embedding model running devices;
+   4. Added a check for `filepath` in `knowledge_based_chatglm.py`. In addition to supporting single file import, it now supports a single folder path as input. After input, it will traverse each file in the folder and display a command-line message indicating the success of each file load.
+
+**[2023/04/12]**

- ChatGLM Hardware Requirements
+   1. Replaced the sample files in the Web UI to avoid issues with unreadable files due to encoding problems in Ubuntu;
+   2. Replaced the prompt template in `knowledge_based_chatglm.py` to prevent confusion in the content returned by ChatGLM, which may arise from the prompt template containing Chinese and English bilingual text.

-    | **Quantization Level** | **GPU Memory** |
-    |------------------------|----------------|
-    | FP16（no quantization）  | 13 GB          |
-    | INT8                   | 10 GB          |
-    | INT4                   | 6 GB           |
- Embedding Hardware Requirements
+## How to Use

-   The default Embedding model in this repo is [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese/tree/main), 3GB GPU Memory required when running on GPU.
+### Hardware Requirements
+
+- ChatGLM-6B Model Hardware Requirements
+  
+     | **Quantization Level** | **Minimum GPU Memory** (inference) | **Minimum GPU Memory** (efficient parameter fine-tuning) |
+     | ---------------------- | ---------------------------------- | -------------------------------------------------------- |
+     | FP16 (no quantization) | 13 GB | 14 GB |
+     | INT8 | 8 GB | 9 GB |
+     | INT4 | 6 GB | 7 GB |

+- Embedding Model Hardware Requirements
+
+     The default Embedding model [GanymedeNil/text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese/tree/main) in this project occupies around 3GB of video memory and can also be configured to run on a CPU.
 ### Software Requirements
-This repo has been tested in python 3.8 environment。

-### 1. install python packages
+This project has been tested in a Python 3.8 environment.
+### 1. Install Python Dependencies
+
 ```commandline
 pip install -r requirements.txt
 ```
-Attention: With langchain.document_loaders.UnstructuredFileLoader used to connect with local knowledge file, you may need some other dependencies as mentioned in  [langchain documentation](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/unstructured_file.html)
+Note: When using `langchain.document_loaders.UnstructuredFileLoader` for unstructured file loading, you may need to install other dependent packages according to the documentation. Please refer to the [langchain documentation](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/unstructured_file.html).

-### 2. Run [knowledge_based_chatglm.py](cli_demo.py) script
+### 2. Run Scripts to Experience Web UI or Command Line Interaction
+Execute [webui.py](webui.py) script to experience **Web interaction** <img src="https://img.shields.io/badge/Version-0.1-brightgreen">
 ```commandline
-python knowledge_based_chatglm.py
+python webui.py
 ```
+The resulting interface is shown below:
+![webui](img/ui1.png)
+The API interface provided in the Web UI is shown below:
+![webui](img/ui2.png)
+
+The Web UI supports the following features:

-### Known issues
- Currently tested to support txt, docx, md format files, for more file formats please refer to [langchain documentation](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/unstructured_file.html). If the document contains special characters, the file may not be correctly loaded.
- When running this project with macOS, it may not work properly due to incompatibility with pytorch caused by macOS version 13.3 and above.
+1. Automatically reads the `LLM` and `embedding` model enumerations in `knowledge_based_chatglm.py`, allowing you to select and load the model by clicking `setting`. Models can be switched at any time for testing.
+2. The length of retained dialogue history can be manually adjusted according to the available video memory.
+3. Adds a file upload function. Select the uploaded file through the drop-down box, click `loading` to load the file, and change the loaded file at any time during the process.
+4. Adds a `use via API` option at the bottom to connect to your own system.

-### FAQ
+Alternatively, execute the [knowledge_based_chatglm.py](cli_demo.py) script to experience **command line interaction**:

-Q: How to solve `Resource punkt not found.`?
+```commandline
+python knowledge_based_chatglm.py
+```

-A: Unzip `packages/tokenizers` in https://github.com/nltk/nltk_data/raw/gh-pages/packages/tokenizers/punkt.zip and put it in the corresponding directory of `Searched in:`.
+### FAQ
+Q1: What file formats does this project support?

-Q: How to solve `Resource averaged_perceptron_tagger not found.`?
+A1: Currently, this project has been tested with txt, docx, and md file formats. For more file formats, please refer to the [langchain documentation](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/unstructured_file.html). It is known that if the document contains special characters, there might be issues with loading the file.

-A: Download https://github.com/nltk/nltk_data/blob/gh-pages/packages/taggers/averaged_perceptron_tagger.zip, decompress it and put it in the corresponding directory of `Searched in:`.
+Q2: How can I resolve the `detectron2` dependency issue when reading specific file formats?

-## Roadmap
+A2: As the installation process for this package can be problematic and it is only required for some file formats, it is not included in `requirements.txt`. You can install it with the following command:

- [x] local knowledge based application with langchain + ChatGLM-6B
- [x] unstructured files loaded with langchain
- [ ] more different file format loaded with langchain
- [ ] implement web ui DEMO with gradio/streamlit 
- [ ] implement API with fastapi，and web ui DEMO with API
+```commandline
+pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2"
+```

+Q3: How can I solve the `Resource punkt not found.` error?
+
+A3: Unzip the `packages/tokenizers` folder from https://github.com/nltk/nltk_data/raw/gh-pages/packages/tokenizers/punkt.zip, and place it in the `nltk_data/tokenizers` storage path.
+
+The `nltk_data` storage path can be found using `nltk.data.path`.
+
+Q4: How can I solve the `Resource averaged_perceptron_tagger not found.` error?
+
+A4: Download https://github.com/nltk/nltk_data/blob/gh-pages/packages/taggers/averaged_perceptron_tagger.zip, extract it, and place it in the `nltk_data/taggers` storage path.
+
+The `nltk_data` storage path can be found using `nltk.data.path`.
+
+Q5: Can this project run in Google Colab?
+
+A5: You can try running the chatglm-6b-int4 model in Google Colab. Please note that if you want to run the Web UI in Colab, you need to set the `share` parameter in `demo.queue(concurrency_count=3).launch(server_name='0.0.0.0', share=False, inbrowser=False)` to `True`.
+
+## DEMO
+
+Using the question "What are the limitations of ChatGLM-6B and how can they be improved?" as an example:
+
+Without using langchain to access local documents:
+
+> ChatGLM-6B is a chatbot based on GPT-3, and its limitations are mainly reflected in the following aspects:
+>
+>   1. Language model accuracy: Although GPT-3 is a large language model with relatively high accuracy, it still has some limitations. For instance, it may not be able to accurately understand certain dialects, accents, or complex sentence structures.
+>   2. Completeness of the knowledge graph: GPT-3 can generate some knowledge graphs, but its completeness and accuracy may not be sufficient. For example, it may struggle to accurately understand specific domain expertise and terminology.
+>   3. Coherence of dialogue generation: While GPT-3 can generate somewhat coherent dialogues, they may not always be natural. For instance, it may generate false, incoherent dialogue or misinterpret the user's intent.
+>
+>   To improve ChatGLM-6B, consider the following aspects:
+>
+>   1. Enhance the accuracy of the language model: The accuracy of the language model can be improved by increasing the training data, using better language model architectures, and optimizing the model training process.
+>   2. Strengthen the integrity of the knowledge graph: The integrity of the knowledge graph can be enhanced by increasing its dimensions, employing better knowledge graph construction methods, and optimizing the knowledge graph's matching algorithm.
+>   3. Boost the coherence of dialogue generation: The coherence of dialogue generation can be improved by augmenting the context information of the dialogue generation model, utilizing better context information extraction methods, and optimizing the dialogue generation model's algorithm.
+>   4. Incorporate more human feedback: Human feedback can help ChatGLM-6B better understand users' intentions and needs, thereby improving the quality and accuracy of dialogue generation. More human feedback can be introduced by involving more human agents and using human feedback collectors.
+
+ChatGLM's answer after using LangChain to access the README.md file of the ChatGLM-6B project:
+>ChatGLM-6B is a deep learning-based natural language processing model that excels at answering questions. However, it also has some limitations, as follows:
+>1. Limited language understanding: ChatGLM-6B has been primarily trained on Chinese natural language, and its understanding of other languages may be limited.
+>2. Insufficient knowledge base: The training dataset of ChatGLM-6B contains only a Chinese corpus, so it may not be able to answer non-Chinese questions or queries in specific domains.
+>3. Limited data volume: ChatGLM-6B's training dataset has only a few million records, which may hinder its ability to answer very specific or complex questions.
+>
+>To improve ChatGLM-6B, consider the following aspects:
+>1. Expand language knowledge: Learn natural language processing techniques in other languages to broaden the model's language understanding capabilities.
+>2. Broaden the knowledge base: Collect more Chinese corpora or use datasets in other languages to expand the model's knowledge base.
+>3. Increase data volume: Use larger datasets to train ChatGLM-6B, which can improve the model's performance.
+>4. Introduce more evaluation metrics: Incorporate additional evaluation metrics to assess the model's performance, which can help identify the shortcomings and limitations of ChatGLM-6B.
+>5. Enhance the model architecture: Improve ChatGLM-6B's model architecture to boost its performance and capabilities. For example, employ larger neural networks or refined convolutional neural network structures.
+
+## Roadmap
+- [x] Implement LangChain + ChatGLM-6B for local knowledge application
+- [x] Unstructured file access based on langchain
+   - [x] .md
+   - [x] .pdf (need to install `detectron2` as described in FAQ Q2)
+   - [x] .docx
+   - [x] .txt
+- [ ] Add support for more LLM models
+   - [x] THUDM/chatglm-6b
+   - [x] THUDM/chatglm-6b-int4
+   - [x] THUDM/chatglm-6b-int4-qe
+- [ ] Add Web UI DEMO
+   - [x] Implement Web UI DEMO using Gradio
+   - [ ] Add model loading progress bar
+   - [ ] Add output and error messages
+   - [ ] Internationalization for language switching
+   - [ ] Citation callout
+- [ ] Use FastAPI to implement API deployment method and develop a Web UI DEMO for API calls
\ No newline at end of file
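
The retrieval flow described in the new introduction (text segmentation -> vectorization -> top k matching -> prompt assembly) can be illustrated with a minimal, dependency-free sketch. This is not the project's code: the word-count "embedding" is a toy stand-in for `GanymedeNil/text2vec-large-chinese`, the helper names (`split_text`, `embed`, `top_k_chunks`, `build_prompt`) and the prompt wording are hypothetical, and the final call to the LLM is omitted.

```python
import math
import re
from collections import Counter


def split_text(text, max_chars=80):
    """Split text into sentence-based chunks of roughly max_chars characters."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(question, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the question vector."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Place the matched text and the question into one prompt (hypothetical template)."""
    context = "\n".join(context_chunks)
    return (
        f"Known information:\n{context}\n\n"
        f"Answer the question based on the known information above.\n"
        f"Question: {question}"
    )


document = (
    "ChatGLM-6B needs 13 GB of GPU memory at FP16. "
    "INT4 quantization lowers the requirement to 6 GB. "
    "The embedding model uses about 3 GB of video memory. "
    "The Web UI is built with Gradio."
)
chunks = split_text(document, max_chars=60)
question = "How much GPU memory does INT4 quantization need?"
selected = top_k_chunks(question, chunks, k=1)
prompt = build_prompt(question, selected)
print(prompt)  # this string is what would be submitted to the LLM
```

In the project itself, the README states that these steps are handled through `langchain` with a real embedding model and a vector search controlled by `VECTOR_SEARCH_TOP_K`; the sketch only shows how the stages connect.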