Resolved: Query on the Project - Create a Q&A Chatbot with LangChain Project
Hi Instructor,
I hope you're doing well.
I'm working on the first step, loading the course transcript. I have written the code up to string_list_concat, but this variable is concatenated in such a way that the MarkdownHeaderTextSplitter is unable to split on the # and ## headers, because the entire text ends up as one string. I'm sharing the code below and would appreciate your assistance with it.
Please take a look at the code and help me.
Thank you,
# LangChain Q&A ChatBot Project
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import (MarkdownHeaderTextSplitter, TokenTextSplitter)
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import (PromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate)
from langchain_core.runnables import (RunnablePassthrough, RunnableLambda, chain)
from langchain_openai import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_chroma import Chroma

# Load the Tableau Course Transcript
loader_pdf = PyPDFLoader('./Introduction_to_Tableau.pdf')
docs_list = loader_pdf.load()
docs_list[0]
' '.join(docs_list[0].page_content.split())

# concatenating
string_list_concat = "".join([" ".join(i.page_content.split()) for i in docs_list])
md_splitter = MarkdownHeaderTextSplitter(headers_to_split_on = [("#", "Section Title"), ("##", "Lecture Title")])
docs_list_md_split = md_splitter.split_text(string_list_concat)
docs_list_md_split
# But here the variable has an empty list as I believe the concatenation was not done with proper code
5 answers (2 marked as helpful)
Hey,
Thank you for reaching out and for engaging with the project!
Please note the argument of the join() function when defining the string_list_concat variable:
string_list_concat = "".join([i.page_content for i in docs_list])
This code extracts the page content of each document in docs_list and joins the results into a single string. Unlike the original version, it does not call split() on the page content, so the newline characters are kept and each # and ## header stays at the start of its own line. Applying this change to your code should resolve the issue.
Let me know if you need further assistance!
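To illustrate the difference between the two versions of string_list_concat, here is a minimal sketch that uses only plain Python. The pages list and the header_lines helper are hypothetical stand-ins: the first replaces the page_content of the loaded documents, and the second is a simplified line-based header check, not the actual MarkdownHeaderTextSplitter implementation.

```python
# Hypothetical page texts standing in for the page_content of each loaded document.
pages = [
    "# Section One\n## Lecture A\nWelcome to the\ncourse.\n",
    "## Lecture B\nMore text here.\n",
]


def header_lines(text):
    """Return the lines that begin with a Markdown header marker ('# ' or '## ').

    Simplified, hypothetical stand-in for a line-based header splitter.
    """
    return [line for line in text.split("\n") if line.startswith(("# ", "## "))]


# Original approach: split() discards every newline, so the text becomes one long line.
collapsed = "".join(" ".join(p.split()) for p in pages)

# Suggested approach: page content is joined untouched, so headers keep their own lines.
preserved = "".join(pages)

print(header_lines(collapsed))
print(header_lines(preserved))
```

In the collapsed string, the ## headers end up in the middle of a single line, so a line-based check can no longer find them. In the preserved string, all three headers are still found.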
Kind regards,
365 Hristina
Hey Hristina,
Thank you so much for answering my question. However, how do we remove the \n newline characters from a single document string?
Kindly assist me.
Thank you,
Sharieff,
Hey again!
Please refer to the lecture "Indexing: Document Loading with PyPDFLoader" from the "Retrieval Augmented Generation (RAG)" section of the "Build Chat Applications with OpenAI and LangChain" course.
However, please note that removing the newline characters is not part of the task. In the third section of the project titled "Create a Chain to Correct the Course Transcript", you'll be tasked with using an LLM to structure the text appropriately.
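For reference only, since this step is not required by the project, here is a minimal sketch of two common ways to remove newline characters from a single string. The page_text value is a hypothetical stand-in for one document's page_content. The first option is the same split-and-join pattern used in the code shared in the question.

```python
# Hypothetical page text standing in for one document's page_content.
page_text = "Welcome to the\ncourse on\nTableau.\n"

# Option 1: collapse every run of whitespace (newlines, tabs, repeated spaces)
# into a single space; leading and trailing whitespace is dropped as well.
collapsed = " ".join(page_text.split())

# Option 2: replace only the newline characters, leaving other whitespace as it is.
replaced = page_text.replace("\n", " ")

print(repr(collapsed))
print(repr(replaced))
```

Note that either option also removes the line breaks that a header-based splitter relies on, so it is best applied only where the line structure is no longer needed.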
Let me know if I can assist further!
Kind regards,
365 Hristina
Got it, thank you so much. I thought removing the newline characters was part of the task, and that is what caused the problem.
Once again thank you so much Hristina. My query is now resolved.
Happy to help! Enjoy the project!
Kind regards,
365 Hristina