RAG in Practice
Lv1 - LlamaIndex + InternLM2 RAG Practice
1. Introduction to RAG
RAG (Retrieval-Augmented Generation) is a technique that combines information retrieval with text generation: it enhances a generative model's capabilities by retrieving relevant material from an external knowledge base and feeding it into the generation step.
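To make the flow concrete before the setup begins, here is a toy sketch. It is illustrative only: the keyword-overlap retriever, the function names, and the sample knowledge base are all made up for this example, and the real demo below uses LlamaIndex's vector retrieval instead.

```python
# Toy RAG flow: retrieve the most relevant document, then stuff it into
# the prompt that would be sent to the generative model.
def retrieve(question, knowledge_base, top_k=1):
    words = set(question.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question, docs):
    context = "\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

kb = ["InternLM2 is an open-source large language model.",
      "Paris is the capital of France."]
question = "Tell me about InternLM2"
print(build_prompt(question, retrieve(question, kb)))
```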
1.1 RAG Optimization Methods
2. Setting Up the Environment
2.1 Install base dependencies in the Python virtual environment
```bash
conda activate llamaindex
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install einops==0.7.0 protobuf==5.26.1
```
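As an optional sanity check (my own suggestion, not part of the original steps), you can confirm the environment sees PyTorch and the GPU:

```python
# Optional sanity check for the freshly prepared environment
import torch

print(torch.__version__)           # expected: 2.0.1
print(torch.cuda.is_available())   # expected: True on a CUDA-capable machine
```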
2.2 Install LlamaIndex and related packages
```bash
conda activate llamaindex
pip install llama-index==0.10.38 llama-index-llms-huggingface==0.2.0 "transformers[torch]==4.41.1" "huggingface_hub[inference]==0.23.1" huggingface_hub==0.23.1 sentence-transformers==2.7.0 sentencepiece==0.2.0
```
2.3 Download the Sentence Transformer model
The source embedding model is Sentence Transformer (you can also choose a different open-source embedding model). Run the following commands to create a new Python file:
```bash
cd ~
mkdir llamaindex_demo
mkdir model
cd ~/llamaindex_demo
touch download_hf.py
```
Open download_hf.py and paste in the following code:
```python
import os

# Route downloads through the hf-mirror.com endpoint to avoid network issues
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

# Download the embedding model to /root/model/sentence-transformer
os.system('huggingface-cli download --resume-download sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 --local-dir /root/model/sentence-transformer')
```
Then run the script from the /root/llamaindex_demo directory and the download starts automatically:
```bash
cd /root/llamaindex_demo
conda activate llamaindex
python download_hf.py
```
For more on using the mirror, see HF Mirror.
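Once the download finishes, an optional check (not in the original steps) is to load the model locally and embed a sentence; paraphrase-multilingual-MiniLM-L12-v2 produces 384-dimensional vectors:

```python
# Optional: verify the downloaded embedding model loads and encodes text
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('/root/model/sentence-transformer')
vec = model.encode('你好，世界')
print(vec.shape)  # expected: (384,)
```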
2.4 Download NLTK resources
When building embeddings with the open-source model, the third-party library nltk needs some of its resource files. Normally nltk downloads them from the internet automatically, but network issues can interrupt the download, so here we fetch the resources from a domestic mirror repository and store them on the server. Download and unpack the nltk resources with the following commands:
```bash
cd /root
git clone https://gitee.com/yzy0612/nltk_data.git --branch gh-pages
cd nltk_data
mv packages/* ./
cd tokenizers
unzip punkt.zip
cd ../taggers
unzip averaged_perceptron_tagger.zip
```
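On Linux, nltk searches ~/nltk_data by default, so /root/nltk_data is picked up automatically for the root user. As an optional check (my addition), confirm the two resources are found:

```python
import nltk

# Each call raises LookupError if the resource was not unpacked correctly
print(nltk.data.find('tokenizers/punkt'))
print(nltk.data.find('taggers/averaged_perceptron_tagger'))
```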
2.5 Install embedding dependencies
```bash
conda activate llamaindex
pip install llama-index-embeddings-huggingface==0.2.0 llama-index-embeddings-instructor==0.1.3
```
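With these installed, LlamaIndex can wrap the local embedding model. A small optional check (not part of the original steps):

```python
# Optional: embed a sentence through LlamaIndex's HuggingFace wrapper
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name='/root/model/sentence-transformer')
embedding = embed_model.get_text_embedding('hello world')
print(len(embedding))  # embedding dimension, 384 for this model
```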
2.6 Prepare the knowledge base
These are the files you want to retrieve over. Place them in /root/llamaindex_demo/data, the directory the demo code below reads from; a sketch follows.
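For example (the directory comes from the demo code below; the sample file name and content are placeholders):

```python
import os

# The demo's SimpleDirectoryReader loads everything from this directory
data_dir = '/root/llamaindex_demo/data'
os.makedirs(data_dir, exist_ok=True)

# Placeholder document; replace with your own files (txt, md, pdf, ...)
with open(os.path.join(data_dir, 'sample.md'), 'w', encoding='utf-8') as f:
    f.write('# My Knowledge Base\n\nPut the content you want retrieved here.\n')
```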
2.7 Load the model and write the code
For details, refer to the following code:
```python
import streamlit as st
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM

st.set_page_config(page_title="llama_index_demo", page_icon="🦜🔗")
st.title("llama_index_demo")

# Initialize the embedding model, LLM, and vector index once and cache them
@st.cache_resource
def init_models():
    embed_model = HuggingFaceEmbedding(
        model_name="/root/model/sentence-transformer"
    )
    Settings.embed_model = embed_model

    llm = HuggingFaceLLM(
        model_name="/root/model/internlm2-chat-1_8b",
        tokenizer_name="/root/model/internlm2-chat-1_8b",
        model_kwargs={"trust_remote_code": True},
        tokenizer_kwargs={"trust_remote_code": True}
    )
    Settings.llm = llm

    # Load the knowledge base and build the vector index over it
    documents = SimpleDirectoryReader("/root/llamaindex_demo/data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()

    return query_engine

# Build the query engine only once per session
if 'query_engine' not in st.session_state:
    st.session_state['query_engine'] = init_models()

def greet2(question):
    response = st.session_state['query_engine'].query(question)
    return response

# Store LLM-generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant",
                                  "content": "Hello, I am your assistant. How can I help you?"}]

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant",
                                  "content": "Hello, I am your assistant. How can I help you?"}]

st.sidebar.button('Clear Chat History', on_click=clear_chat_history)

# Generate a response through the LlamaIndex query engine
def generate_llama_index_response(prompt_input):
    return greet2(prompt_input)

# User-provided prompt
if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

# Generate a new response if the last message is not from the assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_llama_index_response(prompt)
            placeholder = st.empty()
            placeholder.markdown(response)
    message = {"role": "assistant", "content": response}
    st.session_state.messages.append(message)
```
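Save the script (for example as app.py; the filename here is just an example) and start it with `streamlit run app.py`. Streamlit prints a local URL; open it in a browser to chat with the assistant backed by your knowledge base.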