In this blog, you will learn about LLM chains, tool (or function) calling, and agents, and how to leverage and customise h2oGPTe to create your first agent, all while benefiting from privately hosted LLMs and secured data that stays with you1.
LLM chaining, or Large Language Model chaining, is the process of integrating one or more large language models - such as the ones hosted on h2oGPTe - with other applications, tools, and services. These chains, or pipelines, allow a language model to leverage the strengths of other tools and services, and to overcome its own limitations, in order to generate the most effective output possible.
A chain is like a pipeline that processes an input using a specific combination of components: tools, LLMs, services, and parsers. It can be thought of as a sequence of processing 'steps' that performs a certain set of operations on an input and returns the result:
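Conceptually, a chain is nothing more than function composition. Before introducing any framework, here is a minimal, framework-free sketch (every name in it is hypothetical) of an input flowing through three steps:

# A hypothetical three-step chain: format -> LLM -> parse
def format_prompt(user_input: str) -> str:
    return f"Answer concisely: {user_input}"

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call
    return f"LLM response to: {prompt!r}"

def parse_output(raw: str) -> str:
    return raw.strip()

def chain(user_input: str) -> str:
    # Each step's output becomes the next step's input
    return parse_output(call_llm(format_prompt(user_input)))

print(chain("What is LLM chaining?"))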
For example, an LLM might be chained with an API endpoint to fetch real-time stock market information, the news, or the real-time prediction of an ML model. The goal is to create a more powerful and versatile AI system (such as a personal assistant) that can provide accurate and useful outputs for a wide range of inputs.
At the time of writing, there are several open-source LLM chaining frameworks: AutoGen, LangChain, LlamaIndex, and Haystack. Today, I will be using LangChain as it is quite flexible and intuitive to use.
"Functions calling" or Tools or Functions are specialized functionalities that can accomplish a specific task given a set of inputs and are interfaces that an agent, chain, or LLM can use to interact with.
Agents are systems that use LLMs as reasoning and planning engines to decide which actions and steps to take, which tools to use, and which inputs to pass them. After executing an action (like using a tool and collecting its result), the results can be fed back into the LLM to assess whether further actions are required or the process can be concluded. Agents can understand and execute complex tasks efficiently and can even collaborate with each other to achieve more sophisticated outcomes.
Examples of agents and use cases are: role-playing conversational agents, customer service support, healthcare assistants, programming and coding assistants, analytical agents, legal & compliance assistants, etc.
Typically, whatever can be resolved by a rule-based system does not require an LLM agent. Do not replace your existing, well-established and inexpensive workflow with agents if flexibility and versatility are not going to add substantial value to your existing solution.
Enterprise h2oGPTe is a customisable search assistant that helps you find answers to questions about your documents, websites, or workplace content through its user interface or its Python API. It automatically contextualizes its responses with your own data (whether text, images, or audio) using various RAG (Retrieval-Augmented Generation) approaches, or simply answers a general question.
Yes! Enterprise h2oGPTe will soon be releasing a version of its software integrating LLM agents, h2oGPTe-agents (find out more here). However, creating your own agent is a fun way to learn how agents work, when they are useful, and how they may be used in your own use cases or applications! In addition, it helps you get started with the h2oGPTe API using its free public version.
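Throughout this post, I will assume an h2oGPTe Python client instance called h2ogpte_client. A minimal sketch of how to create one (the address below points to the free public instance, and the API key is a placeholder you generate from your h2oGPTe settings):

from h2ogpte import H2OGPTE

h2ogpte_client = H2OGPTE(
    address="https://h2ogpte.genai.h2o.ai",  # free public h2oGPTe instance
    api_key="sk-XXXXXXXX",                   # placeholder: your personal API key
)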
To combine h2oGPTe tools and functionalities with the LangChain framework, we will first start by wrapping our h2oGPTe client as a LangChain Chat Model, which uses chat messages as inputs and returns chat messages as outputs (i.e. not just plain text):
from typing import Any, Dict, List, Optional
from h2ogpte import H2OGPTE
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langchain_core.outputs import ChatGeneration, ChatResult
from langchain.prompts import ChatPromptTemplate

class h2ogpteChatModel(BaseChatModel):
    """h2oGPTe Chat Model based on the LangChain BaseChatModel class.

    Example:
        model = h2ogpteChatModel(h2ogpte_client=h2ogpte_client, model_name="h2oai/h2o-danube3-4b-chat")
        result = model.invoke([HumanMessage(content="hello")])
        result = model.batch([[HumanMessage(content="hello")], [HumanMessage(content="world")]])
    """
    h2ogpte_client: H2OGPTE
    model_name: Optional[str] = None
    collection_id: Optional[str] = None
    kwargs: Optional[dict] = None
    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """Pass the input prompt to h2oGPTe via its client, create a chat session and generate an output.

        Args:
            messages: the prompt, composed of input messages.
            stop: a list of strings on which the model should stop generating.
            run_manager: a run manager with callbacks for the LLM.
        """
        # Flatten the list of chat messages into a single prompt string
        prompt = ChatPromptTemplate(messages=messages)
        messages_as_string = prompt.format()
        # Default to the first LLM hosted on h2oGPTe if none was specified
        if not self.model_name:
            self.model_name = self.h2ogpte_client.get_llms()[0]['display_name']
        # Create a chat session, optionally grounded on a document collection (RAG)
        chat_session_id = self.h2ogpte_client.create_chat_session(self.collection_id)
        with self.h2ogpte_client.connect(chat_session_id) as session:
            response = session.query(message=messages_as_string,
                                     llm=self.model_name,
                                     **kwargs).content
        response_message = AIMessage(
            content=response,
            additional_kwargs={},
            response_metadata={},
        )
        return ChatResult(generations=[ChatGeneration(message=response_message)])
    @property
    def _llm_type(self) -> str:
        """Get the type of language model used by this chat model.
        Used to uniquely identify the type of the model. Used for logging."""
        return self.model_name
    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """Return a dictionary of identifying parameters.
        Represents the model parameterization for tracing purposes.
        """
        # Look up the selected model among the LLMs hosted on h2oGPTe
        return next(model for model in self.h2ogpte_client.get_llms()
                    if model["display_name"] == self.model_name)
h2ogpteChatModel takes as input an instance of the h2oGPTe client and, optionally, the LLM model_name as well as the string ID of the collection of documents to use with Retrieval-Augmented Generation (RAG). In parallel, Enterprise h2oGPTe offers highly customizable prompting to talk to an LLM, a document or a collection of documents, to summarise and extract information, and it is LLM agnostic - you can choose the model you need for your use case, including your own fine-tuned model!
The _generate method generates a chat result from a prompt: we create a chat session with h2oGPTe, pass it the user prompt (along with any additional optional arguments), and return the response generated by the h2oGPTe chat session as an AIMessage wrapped in a ChatResult.
I will start by selecting Danube3-4B, one of the H2O series of Small Language Models (SLMs), fine-tuned for conversation using H2O LLM Studio.
h2ogptechat = h2ogpteChatModel(h2ogpte_client = h2ogpte_client, model_name = 'h2oai/h2o-danube3-4b-chat')
In addition to the model, we can include RAG parameters and LLM arguments such as the temperature:
h2ogptechat.invoke("whatsup", llm_args = {"temperature": 1})
AIMessage(content="Greetings! I'm here to help answer your questions and provide information to the best of my abilities. How can I assist you today?", id='run-b8359e79-59b3-4680-abed-327cc5f1c272-0')
Now that we have created the first component of an LLM chain (the h2ogptechat model instance), let's create our first chain with the LangChain Expression Language (LCEL).
We will start by adding to our chat model one of the simplest components of a chain: the prompt template. It helps format the user input and provides a consistent, standardized way to present the prompt to the LLM. Please note, h2oGPTe offers a prompt catalog that you can augment with your own custom prompts for your use case (find out more here).
Next, the prompt is passed to the LLM component of the chain, the h2ogptechat model, which processes the input prompt and generates a response accordingly:
from langchain.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template( "My name is Audrey, {input}")
my_first_chain = prompt | h2ogptechat
my_first_chain.invoke({"input":"hello, how are you?"})
AIMessage(content="Hello Audrey! I'm glad to meet you. As an AI, I don't have feelings, but I'm here to help you with any questions or tasks you have. How can I assist you today?", id='run-5d126678-ed12-4713-8283-685e6a558235-0')
Depending on the requirements and objectives of the downstream application this chain is used for, the response from the h2ogptechat model - the last step in our current chain - can be displayed to the user, further processed, or fed into the next component in the chain. This is useful for standardizing the h2ogptechat model's output. For example, we can add a simple string output parser to our chain:
from langchain.schema.output_parser import StrOutputParser
chain = prompt | h2ogptechat | StrOutputParser()
chain.invoke({"input":"hello, how are you?"})
"Hello Audrey! I'm glad to meet you. As an AI, I don't have feelings, but I'm here to help you with any questions or tasks you have. How can I assist you today?"
Next, we will see how our h2ogptechat model can interact with user-defined tools and APIs to fetch data or perform actions whenever a user or an application requests it.
This is possible by binding functions, or tools, to our h2ogptechat model for it to choose from; that is, providing the model with a list of the functions it has available, a description of what they are for, and how they can be called. For example, we could have multiple functions available, including get_country_information, defined as follows in a JSON schema:
functions = [
    {'name': 'get_country_information',
     'description': 'use to extract one of capital, currency, population or maps of a specific country',
     'parameters': {
         'type': 'object',
         'properties': {
             'country': {'type': 'string',
                         'description': 'The country of interest, for example Italy'},
             'field_to_extract': {'type': 'string',
                                  'enum': ['capital', 'currency', 'population', 'maps']}},
         'required': ['country', 'field_to_extract']}},
]
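The route and agent examples later in this post also call a second tool, get_current_temperature, and execute both tools as Python functions. The original implementations are not shown here, so below is a plausible sketch using LangChain's @tool decorator with the free Open-Meteo and REST Countries APIs; the endpoints and the field mapping are my assumptions, chosen to match the outputs shown later:

import requests
from langchain.tools import tool

@tool
def get_current_temperature(latitude: float, longitude: float) -> str:
    """Fetch the current temperature in degrees Celsius for the given coordinates."""
    url = "https://api.open-meteo.com/v1/forecast"  # assumed endpoint
    params = {"latitude": latitude, "longitude": longitude, "current_weather": True}
    current = requests.get(url, params=params).json()["current_weather"]
    return f"The current temperature is {current['temperature']}°C"

@tool
def get_country_information(country: str, field_to_extract: str) -> str:
    """Extract one of capital, currency, population or maps of a specific country."""
    url = f"https://restcountries.com/v3.1/name/{country}"  # assumed endpoint
    data = requests.get(url).json()[0]
    # The API's field names differ slightly from ours (e.g. 'currencies')
    api_field = {"currency": "currencies"}.get(field_to_extract, field_to_extract)
    return f"{country}'s {field_to_extract} is {data[api_field]}"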
h2ogptechat_bind = h2ogptechat.bind(functions=functions)
By binding the functions defined above to the h2ogptechat model instance, we let it know, each time the model is invoked, which functions we have access to, what they are for, and how to invoke them appropriately using this JSON schema. This way, the LLM can determine, based on the user query, whether to invoke those tools.
Now, we just need to give our h2ogptechat model some instructions to:
- decide whether one of the bound functions is relevant to the user query,
- if so, respond with the relevant function name and its arguments in valid JSON,
- otherwise, answer the query directly in valid JSON to the best of its knowledge.
This is, in essence, what we are trying to make it do (image source: Tool Calling, LangChain):
We will provide the instructions on what to do and how to select the tools as a prompt, which we will add as a property of our h2ogpteChatModel class called _function_calling_prompt:
    @property
    def _function_calling_prompt(self) -> str:
        """Get the function calling prompt for the LLM."""
        return """ // namespace functions
        You serve as a wrapper for utilizing multiple tools. Each tool that can be used MUST be specified in the list of the namespace functions section above ONLY. Ensure that the parameters provided to each tool are valid according to that tool's specification. Only functions in the functions namespace are permitted.
        DO NOT GUESS or MAKE UP a function if it is not available in the list of the namespace functions section above.
        Their descriptions and properties are described. Given the query below, respond with the function name to use, the list of arguments and their values in valid JSON.
        For example, if the instructions match one or more of the function descriptions above:
        {'content': '', 'additional_kwargs': {'function_call': {'name': '<name of relevant listed function_name>', 'arguments': '{"<name of argument_1 of listed function>": "<argument 1 value>", "<name of argument_2 of listed function>": "<argument 2 value>"}' }}}
        If none of the functions help with, match, or are relevant to the query/instructions below, DO NOT include the functions and simply respond instead, to the best of your knowledge of the question, in the following format in valid JSON:
        {'content': '<response>', 'additional_kwargs': {}}
        here is the query, please respond:
        """
We augment the _generate method to use _function_calling_prompt: it checks whether a function call is necessary and leverages h2oGPTe guided generation to ensure the model returns the appropriate schema for a function call when one is deemed useful for the user query:
# Ask the LLM whether any of the bound functions is relevant to the user query
requires_function = session.query(message=last_message.content,
                                  system_prompt=f"""If only the following functions are available: {str(kwargs['functions'])},
                                  are any of the listed functions relevant to answering the following user query entirely?
                                  Respond ONLY with True or False. user query:""",
                                  llm=self.model_name).content
if "true" in str.lower(requires_function):
    function_calling_prompt = f"""## functions: \n namespace functions \\ \n {str(kwargs["functions"])} """ + self._function_calling_prompt
    # Use h2oGPTe guided generation to return a valid JSON description of how the function should be called
    response = session.query(message=last_message.content,
                             system_prompt=function_calling_prompt,
                             llm=self.model_name,
                             llm_args=dict(
                                 response_format='json_object',
                                 guided_json=json_schema_function_call,
                             )).content
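The json_schema_function_call passed to guided_json above constrains the generated tokens to a JSON object matching the two response formats described in _function_calling_prompt; it is defined in the full class linked below. A minimal sketch of what such a schema might look like (this exact shape is my assumption):

# Hypothetical guided-generation schema matching the two response formats in the prompt
json_schema_function_call = {
    "type": "object",
    "properties": {
        "content": {"type": "string"},
        "additional_kwargs": {
            "type": "object",
            "properties": {
                "function_call": {
                    "type": "object",
                    "properties": {"name": {"type": "string"},
                                   "arguments": {"type": "string"}},
                    "required": ["name", "arguments"]}}}},
    "required": ["content", "additional_kwargs"],
}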
You can see the full version of the modified h2ogpteChatModel Class here.
Now, let's test the bound version and ask multiple questions, both relevant to the functions we declared (get the temperature or country information) and irrelevant ones (cost of living in San Francisco):
h2ogptechat_bind = h2ogpteChatModel(h2ogpte_client=h2ogpte_client).bind(functions=functions)
chain = prompt | h2ogptechat_bind
chain.invoke({"input": "how much money do you need to earn to live comfortably in San Francisco?"})
AIMessage(content="Hello Audrey, I'm h2oGPTe. The cost of living can vary greatly depending on individual lifestyle, but on average, to live comfortably in San Francisco, it's often suggested that you might need a salary of around $120,000 to $130,000 per year. This takes into account the high cost of housing, healthcare, transportation, and other expenses. However, please note that this is a general estimate and individual needs can vary.", id='run-e3c2692a-42f1-4f32-bb75-a50c2ee7c7f8-0')
Here, the h2ogptechat model deemed, based on my instructions, that a function call was not necessary and used its own internal knowledge to generate a general answer to my question. Please note, depending on the type of end application, we may want to prevent the LLM from answering a user query from its internal knowledge (using guardrails or prompt engineering, for example).
Next, let's ask questions relevant to the functions we bound to h2ogptechat:
chain.invoke({"input": "what is the number of inhabitants in Albania?"})
AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_country_information', 'arguments': '{"country": "Albania", "field_to_extract": "population"}'}}, id='run-ccf5be42-545e-4378-9f34-5fc8d281cbfb-0')
chain.invoke({"input": "what is the currency in Turkey?"})
AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_country_information', 'arguments': '{"country": "Turkey", "field_to_extract": "currency"}'}}, id='run-2c027335-711b-4254-ad21-3df3470f7ac2-0')
As you can see, the outputs consist of the additional_kwargs parameter, which contains the relevant function name, its parameter(s) and their associated values; in other words, all the information necessary to call the relevant function and get the correct answer to our question.
We now have most of the building blocks of our first agent. So let's go a step further and execute the underlying function, returning its output to the user when relevant.
First, we will borrow the OpenAIFunctionsAgentOutputParser, which simply checks whether the output from our h2ogpteChatModel contains a function_call argument, and determines whether an action such as executing a function call is required from the agent (AgentAction) or whether the agent's job is done (AgentFinish).
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain_core.agents import AgentAction, AgentFinish
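To make the parser's behaviour concrete, here is a small illustration of both outcomes, reusing message contents from the earlier examples (a sketch, not from the original post):

from langchain_core.messages import AIMessage

parser = OpenAIFunctionsAgentOutputParser()

# A message carrying a function_call is parsed into an AgentAction
action = parser.invoke(AIMessage(
    content='',
    additional_kwargs={'function_call': {
        'name': 'get_country_information',
        'arguments': '{"country": "Albania", "field_to_extract": "population"}'}}))
print(action.tool, action.tool_input)
# get_country_information {'country': 'Albania', 'field_to_extract': 'population'}

# A plain answer is parsed into an AgentFinish carrying the final output
finish = parser.invoke(AIMessage(content='The currency of Turkey is the lira.'))
print(isinstance(finish, AgentFinish))  # True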
The agent is really the chain in charge of deciding which path to take next. For this, we can define a route function, which checks whether the agent has reached a final status (AgentFinish) and returns the final answer, or routes the query to the appropriate tool (result.tool) and executes it with the function call arguments (result.tool_input):
def route(result):
    # The agent is done: return the final answer to the user
    if isinstance(result, AgentFinish):
        return result.return_values['output']
    # Otherwise, look up the selected tool and run it with the generated arguments
    tools = {
        "get_current_temperature": get_current_temperature,
        "get_country_information": get_country_information,
    }
    print(result.tool_input)  # tool input parameters
    print(result.tool)        # tool selected
    output = tools[result.tool].run(result.tool_input)
    print(output)             # tool result
    return output
We can now add the output parser and the route function to our existing chain and test it out!
routed_chain = chain | OpenAIFunctionsAgentOutputParser() | route
routed_chain.invoke({"input": "hello, what is the temperature in San Francisco?"})
{'latitude': '37.7749', 'longitude': '-122.4194'}
get_current_temperature
The current temperature is 13.3°C
routed_chain.invoke({"input": "Show me map of Spain"})
{'country': 'Spain', 'field_to_extract': 'maps'}
get_country_information
Spain's maps is {'googleMaps': 'https://goo.gl/maps/138JaXW8EZzRVitY9', 'openStreetMaps': 'https://www.openstreetmap.org/relation/1311341'}
First, let's add a messages placeholder, that is, a placeholder for intermediate steps or a "scratchpad": a sequence of messages containing the previous agent tool invocations and the corresponding tool outputs, sent to the h2ogptechat model as input for the next generation. This is useful for letting the agent leverage previous messages and actions, including the initial user query and history.
Lastly, we will borrow format_to_openai_function_messages, which simply formats the intermediate steps, that is, agent actions and observations, into AIMessage or FunctionMessage objects. For this, we also slightly modified the h2ogpteChatModel class to check whether the initial query has been answered yet.
from langchain_core.prompts.chat import HumanMessagePromptTemplate
from langchain.prompts import MessagesPlaceholder
from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad.openai_functions import format_to_openai_function_messages
from langchain.schema.runnable import RunnablePassthrough

prompt = ChatPromptTemplate.from_messages([
    HumanMessagePromptTemplate.from_template("My name is Audrey, {input}"),
    MessagesPlaceholder(variable_name="intermediate_steps"),
])

# Format the intermediate steps (agent actions and tool outputs) into messages
# before they are injected into the prompt's placeholder
agent_chain = RunnablePassthrough.assign(
    intermediate_steps=lambda x: format_to_openai_function_messages(x["intermediate_steps"])
) | prompt | h2ogptechat_bind | OpenAIFunctionsAgentOutputParser()

agent_executor = AgentExecutor(agent=agent_chain,
                               tools=[get_country_information, get_current_temperature],
                               verbose=True,
                               handle_parsing_errors=True)
agent_executor.invoke({"input": "what is the current temperature in Istres, France in degrees celcius?"})
> Entering new AgentExecutor chain...
Invoking: `get_current_temperature` with `{'latitude': 43.5536, 'longitude': 4.7836}`
The current temperature is 24.1°C
The current temperature is 24.1°C
> Finished chain.
{'input': 'what is the current temperature in Istres, France in degrees celsius?',
 'output': 'The current temperature is 24.1°C'}
agent_executor.invoke({"input": "Give me a map of Hungary?"})
> Entering new AgentExecutor chain...
Invoking: `get_country_information` with `{'country': 'Hungary', 'field_to_extract': 'maps'}`
Hungary's maps is {'googleMaps': 'https://goo.gl/maps/9gfPupm5bffixiFJ6', 'openStreetMaps': 'https://www.openstreetmap.org/relation/21335'}
Hungary's maps is {'googleMaps': 'https://goo.gl/maps/9gfPupm5bffixiFJ6', 'openStreetMaps': 'https://www.openstreetmap.org/relation/21335'}
> Finished chain.
{'input': 'Give me a map of Hungary?',
'output': "Hungary's maps is {'googleMaps': 'https://goo.gl/maps/9gfPupm5bffixiFJ6', 'openStreetMaps': 'https://www.openstreetmap.org/relation/21335'}"}
Yay! We created our first agent with h2oGPTe and LangChain!
Of course, this is a simple agent, and if you want to go further, here are a few things you could try: give the agent memory of past interactions, connect more tools (for example, RAG over your own document collections), or add guardrails around when the LLM may answer from its own knowledge.
1 As a customer of Enterprise h2oGPTe.