Agents | Building your first Agent step-by-step with h2oGPTe & LLM Chains

By Audrey Létévé | October 16, 2024

In this blog, you will learn about LLM chains, tool (or function) calling, and agents, and how to leverage and customise h2oGPTe to create your first agent, all while benefiting from privately hosted LLMs and secure data that stays with you¹.

 

The basics: what are LLM chains, tools and agents?

What Are LLM Chains?

LLM chaining, or Large Language Model chaining, is the process of integrating one or more large language models - such as the ones hosted on h2oGPTe - with other applications, tools, and services. These chains or pipelines allow a language model to leverage the strengths of other tools and services, and to overcome its own limitations, in order to generate the most effective output possible.

A chain is like a pipeline that processes an input using a specific combination of components: tools, LLMs, services, parsers. It can be thought of as a sequence of processing 'steps' that performs a certain set of operations on an input and returns the result.

For example, an LLM might be chained with an API endpoint to fetch real-time information on the stock market, the news, or the live predictions of an ML model. The goal is to create a more powerful and versatile AI system (such as a personal assistant) that can provide accurate and useful outputs for a wide range of inputs.

LLM Chaining frameworks

At the time of writing, there are several open-source LLM chaining frameworks: AutoGen, LangChain, LlamaIndex, Haystack. Today, I will be using LangChain as it is quite flexible and intuitive to use.

What is function calling?

Tools (or functions) are specialized functionalities that accomplish a specific task given a set of inputs; they are the interfaces an agent, chain, or LLM uses to interact with the outside world. "Function calling" is the mechanism by which an LLM selects one of these tools and produces the arguments needed to invoke it.

What are LLM Agents? 

Agents are systems that use LLMs as reasoning and planning engines to decide which actions and steps to take, which tools to use, and which inputs to pass to them. After executing an action (like using a tool and collecting its result), the results can be fed back into the LLM to assess whether further actions are required or the process can be concluded. Agents can understand and execute complex tasks efficiently and can even collaborate with each other to achieve more sophisticated outcomes.

When to use LLM Agents?

Examples of agents and use cases include: role-playing conversational agents, customer service support, healthcare assistants, programming and coding assistants, analytical agents, legal & compliance assistants, etc.

When not to use LLM Agents?

Typically, whatever could be resolved by a rule-based system does not require an LLM agent. Do not replace your existing well-established and inexpensive workflow with agents if flexibility and versatility are not going to add substantial value to your existing solution.

In summary:
  • LLM Chains are sequences of processing steps for prompts. They are used within agents to define how the agent processes information.
  • Tools are specialized functionalities that are used by agents or within chains for specific tasks.
  • Agents are like characters with specific capabilities which use chains and tools to perform their functions.

What about Enterprise h2oGPTe?

Enterprise h2oGPTe is a customisable search assistant that helps you find answers to questions about your documents, websites, or workplace content through its user interface or its Python API. It automatically contextualizes its responses with your own data (whether text, images, or audio) using various RAG (Retrieval-Augmented Generation) approaches, or it can simply answer a general question.

Does Enterprise h2oGPTe provide its own LLM Agents to use?

Yes! Enterprise h2oGPTe will soon be releasing a version of its software integrating LLM Agents, h2oGPTe-agents (find out more here). However, creating your own agent is a fun way to learn how agents work, when they are useful, and how they may be used in your own use cases or applications! In addition, it helps you get started with the h2oGPTe API using its free public version.

How do I create an LLM Agent? Let's get started and put the theory into practice!

 

  • First, start by creating your own API key using the public version of Enterprise h2oGPTe (not for production usage); a minimal client setup is sketched just after this list
  • You can download the complete notebook to follow here
  • (optional) h2oGPTe Python Client documentation is here
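Throughout the rest of the post, an h2ogpte_client instance is assumed. As a sketch, connecting the h2oGPTe Python client to the public playground looks roughly like this (the address below is the public instance; the key is a placeholder to replace with your own):

from h2ogpte import H2OGPTE

h2ogpte_client = H2OGPTE(
    address="https://h2ogpte.genai.h2o.ai",  # public (non-production) instance
    api_key="sk-XXXXXXXX",                   # placeholder: your own API key
)

print([llm["display_name"] for llm in h2ogpte_client.get_llms()])  # LLMs hosted on this instance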

The main component of an LLM Chain: the LLM

To combine h2oGPTe's tools and functionality with the LangChain framework, we will first wrap our h2oGPTe client as a LangChain Chat Model, which uses chat messages as inputs and returns chat messages as outputs (i.e., not just plain text):

from typing import Any, Dict, List, Optional

from h2ogpte import H2OGPTE
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.outputs import ChatGeneration, ChatResult
from langchain_core.prompts import ChatPromptTemplate


class h2ogpteChatModel(BaseChatModel):
    """h2oGPTe Chat Model based on the LangChain BaseChatModel class.

    Example:
        model = h2ogpteChatModel(h2ogpte_client=h2ogpte_client, model_name="h2oai/h2o-danube3-4b-chat")
        result = model.invoke([HumanMessage(content="hello")])
        result = model.batch([[HumanMessage(content="hello")], [HumanMessage(content="world")]])
    """

    h2ogpte_client: H2OGPTE
    model_name: Optional[str] = None
    collection_id: Optional[str] = None
    kwargs: Optional[dict] = None

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """Pass the input prompt to h2oGPTe via its client, create a chat session and generate an output.

        Args:
            messages: the prompt, composed of input messages
            stop: a list of strings on which the model should stop generating
            run_manager: a run manager with callbacks for the LLM
        """
        prompt = ChatPromptTemplate(messages=messages)
        messages_as_string = prompt.format()

        # Default to the first LLM hosted on h2oGPTe if no model was specified
        if not self.model_name:
            self.model_name = self.h2ogpte_client.get_llms()[0]['display_name']

        # collection_id may be None, in which case no collection (i.e. no RAG) is used
        chat_session_id = self.h2ogpte_client.create_chat_session(self.collection_id)
        with self.h2ogpte_client.connect(chat_session_id) as session:
            response = session.query(message=messages_as_string,
                                     llm=self.model_name,
                                     **kwargs).content

        response_message = AIMessage(
            content=response,
            additional_kwargs={},
            response_metadata={},
        )
        return ChatResult(generations=[ChatGeneration(message=response_message)])

    @property
    def _llm_type(self) -> str:
        """Get the type of language model used by this chat model.
        Used to uniquely identify the type of the model, e.g. for logging."""
        return self.model_name

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """Return a dictionary of identifying parameters.
        Represents the model parameterization for tracing purposes."""
        return [model for model in self.h2ogpte_client.get_llms()
                if model["display_name"] == self.model_name][0]

 

h2ogpteChatModel takes as input an instance of the h2oGPTe client and, optionally, the LLM model_name as well as the string ID of the collection of documents to use with Retrieval-Augmented Generation (RAG). In parallel, Enterprise h2oGPTe offers highly customizable prompting to talk to an LLM, a document, or a collection of documents, to summarise and extract information, and it is LLM agnostic: you can choose the model you need for your use case, including your own fine-tuned model!

The _generate method is used to generate a chat result from a prompt: we create a chat session with h2oGPTe, pass in the user prompt (along with additional optional arguments), and return the response generated by the h2oGPTe chat session as an AIMessage wrapped in a ChatResult.

I will start by selecting Danube3-4B, one of H2O.ai's series of Small Language Models (SLMs) fine-tuned for conversation using H2O LLM Studio.

h2ogptechat = h2ogpteChatModel(h2ogpte_client = h2ogpte_client, model_name = 'h2oai/h2o-danube3-4b-chat')

 

In addition to the model, we can include RAG parameters and LLM arguments such as temperature:

h2ogptechat.invoke("whatsup", llm_args = {"temperature": 1})

AIMessage(content="Greetings! I'm here to help answer your questions and provide information to the best of my abilities. How can I assist you today?", id='run-b8359e79-59b3-4680-abed-327cc5f1c272-0')

Now that we have created the first component of an LLM chain (the h2ogptechat model instance), let's create our first chain with the "LangChain Expression Language", or LCEL.

 

The simplest chain component: the Prompt Template

We will start by adding to our chat model one of the simplest components of a chain: the prompt template. It helps format the user input and provides a consistent, standardized way to present the prompt to the LLM. Please note, h2oGPTe offers a prompt catalog that you can augment with your own custom prompts for your use case (find out more here).

Next, the prompt is passed to the LLM component of the chain, the h2ogptechat model, which processes the input prompt and generates a response accordingly:

from langchain.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template( "My name is Audrey, {input}")

my_first_chain = prompt | h2ogptechat

my_first_chain.invoke({"input":"hello, how are you?"})

AIMessage(content="Hello Audrey! I'm glad to meet you. As an AI, I don't have feelings, but I'm here to help you with any questions or tasks you have. How can I assist you today?", id='run-5d126678-ed12-4713-8283-685e6a558235-0')

 

That's it, we have created our first chain!

Depending on the requirements and objectives of the downstream application this chain is used for, the response from the h2ogptechat model - the last step in our current chain - can be displayed to the user, further processed, or fed into the next component in the chain. For example, we can add a simple string output parser to our chain, which is useful for standardizing the h2ogptechat model's output:

from langchain.schema.output_parser import StrOutputParser

chain = prompt | h2ogptechat | StrOutputParser()

chain.invoke({"input":"hello, how are you?"})

"Hello Audrey! I'm glad to meet you. As an AI, I don't have feelings, but I'm here to help you with any questions or tasks you have. How can I assist you today?"

 

Next, we will see how our h2ogptechat model can interact with user-defined tools and APIs to fetch data or perform actions whenever a user or an application requests it.

Adding Tools to our Chat Model

We make this possible by binding functions, or tools, to our h2ogptechat model: we provide the model with a list of the functions it has available, together with a description of what each one does and how it can be called. For example, we could have multiple functions available, including get_country_information, defined as follows in a JSON schema:

functions = [
    {'name': 'get_country_information',
     'description': 'use to extract one of capital, currency, population or maps of a specific country',
     'parameters': {'type': 'object',
                    'properties': {'country': {'type': 'string',
                                               'description': 'The country of interest, for example Italy'},
                                   'field_to_extract': {'type': 'string',
                                                        'enum': ['capital', 'currency', 'population', 'maps']}},
                    'required': ['country', 'field_to_extract']}}
]
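The bodies of the two tools used in this post are not shown here (they live in the notebook). Judging from the outputs further down, a plausible sketch of their implementations as LangChain tools could query the public REST Countries and Open-Meteo APIs (both endpoints are my assumptions, not confirmed by the original):

import requests
from langchain.agents import tool

@tool
def get_country_information(country: str, field_to_extract: str) -> str:
    """Use to extract one of capital, currency, population or maps of a specific country."""
    resp = requests.get(f"https://restcountries.com/v3.1/name/{country}", timeout=10)
    resp.raise_for_status()
    data = resp.json()[0]
    # Map our field names onto the REST Countries response keys
    fields = {"capital": data.get("capital"),
              "currency": data.get("currencies"),
              "population": data.get("population"),
              "maps": data.get("maps")}
    return f"{country}'s {field_to_extract} is {fields[field_to_extract]}"

@tool
def get_current_temperature(latitude: float, longitude: float) -> str:
    """Fetch the current temperature in degrees Celsius for the given coordinates."""
    resp = requests.get("https://api.open-meteo.com/v1/forecast",
                        params={"latitude": latitude, "longitude": longitude,
                                "current_weather": True},
                        timeout=10)
    resp.raise_for_status()
    return f"The current temperature is {resp.json()['current_weather']['temperature']}°C"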

 

 

h2ogptechat_bind = h2ogptechat.bind(functions=functions)

 

 

By binding the functions defined above to the h2ogptechat model instance, we let it know, each time the model is invoked, which functions it has access to, what they are for, and how to invoke them appropriately using this JSON schema. This way, the LLM can determine, based on the user query, whether to invoke those tools.

Now, we just need to give some instructions to our h2ogptechat model to:

  • make it aware of the functions available
  • assess when and whether these functions are necessary given an input prompt
  • respond in a consistent manner that allows downstream components to call the function appropriately (in our case, converting natural language into valid API calls)

This is in essence what we are trying to make it do (image source: Tool Calling LangChain):

[Figure: LangChain tool-calling workflow]

We will provide the instructions on what to do and how to select the tools to use as a prompt, which we add as a property of our h2ogpteChatModel class called _function_calling_prompt:

@property
def _function_calling_prompt(self) -> str:
    """Get the function calling prompt for the LLM"""
    return """ // namespace functions
You serve as a wrapper for utilizing multiple tools. Each tool that can be used MUST be specified in the list of the namespace functions section above ONLY. Ensure that the parameters provided to each tool are valid according to that tool's specification. Only functions in the functions namespace are permitted,
DO NOT GUESS or MAKE UP a function if it is not available in the list of the namespace functions section above.
Their descriptions and properties are provided. Given the query below, respond with the function name to use and the list of arguments and their values in valid JSON.
For example, if the instructions match one or more of the function descriptions above:
{'content': '', 'additional_kwargs': {'function_call': {'name': '<name of relevant listed function_name>', 'arguments': '{"<name of argument_1 of listed function>": "<argument 1 value>", "<name of argument_2 of listed function>": "<argument 2 value>"}' }}}

if none of the functions help with the query or are relevant to the query/instructions below, DO NOT include the functions and simply respond instead, to the best of your knowledge, in the format below in valid JSON:

{'content': '<response>', 'additional_kwargs': {}}

here is the query, please respond:
"""

 

We augment the _generate method to use _function_calling_prompt, check whether a function call is necessary, and leverage h2oGPTe guided generation to ensure it returns the appropriate schema for a function call if one is deemed useful based on the user query:

# Ask the LLM whether any of the bound functions are relevant to the user query
requires_function = session.query(
    message=last_message.content,
    system_prompt=f"""If only the following functions are available: {str(kwargs['functions'])},
are any of the listed functions relevant to entirely answering the following user query?
Respond ONLY with True or False. user query:""",
    llm=self.model_name,
).content

if "true" in str.lower(requires_function):
    function_calling_prompt = (
        f"""## functions: \n namespace functions \\ \n {str(kwargs["functions"])} """
        + self._function_calling_prompt
    )

    # Use h2oGPTe guided generation to return a valid JSON description of how the function should be called
    response = session.query(
        message=last_message.content,
        system_prompt=function_calling_prompt,
        llm=self.model_name,
        llm_args=dict(
            response_format='json_object',
            guided_json=json_schema_function_call,
        ),
    ).content
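The json_schema_function_call passed to guided_json above is not shown in this snippet; a schema along the following lines, mirroring the response format demanded by _function_calling_prompt, is what it would plausibly look like (a sketch, not the exact schema from the notebook):

# Constrain guided generation to the {'content': ..., 'additional_kwargs': {...}} shape
json_schema_function_call = {
    "type": "object",
    "properties": {
        "content": {"type": "string"},
        "additional_kwargs": {
            "type": "object",
            "properties": {
                "function_call": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "arguments": {"type": "string"},
                    },
                    "required": ["name", "arguments"],
                },
            },
        },
    },
    "required": ["content", "additional_kwargs"],
}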

 

You can see the full version of the modified h2ogpteChatModel class here.

 

Enabling Function Calling

Now, let's test out the bound version and ask multiple questions, both relevant to the functions we declared (get the temperature or country information) and irrelevant (cost of living in San Francisco):

h2ogptechat_bind = h2ogpteChatModel(h2ogpte_client = h2ogpte_client).bind(functions=functions)
chain = prompt | h2ogptechat_bind

chain.invoke({"input": "how much money do you need to earn to live comfortably in San Francisco?"})

AIMessage(content="Hello Audrey, I'm h2oGPTe. The cost of living can vary greatly depending on individual lifestyle, but on average, to live comfortably in San Francisco, it's often suggested that you might need a salary of around $120,000 to $130,000 per year. This takes into account the high cost of housing, healthcare, transportation, and other expenses. However, please note that this is a general estimate and individual needs can vary.", id='run-e3c2692a-42f1-4f32-bb75-a50c2ee7c7f8-0')

Here, based on my instructions, the h2ogptechat model deemed that a function call was not necessary and used its own internal knowledge to generate a general answer to my question. Please note, depending on the type of end application, we may want to prevent the LLM from using its internal knowledge to answer a user query (using guardrails or prompt engineering, for example).

Next, let's ask relevant questions related to the functions we bound to h2ogptechat:

chain.invoke({"input": "what is the number of inhabitants in Albania?"})

AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_country_information', 'arguments': '{"country": "Albania", "field_to_extract": "population"}'}}, id='run-ccf5be42-545e-4378-9f34-5fc8d281cbfb-0')

chain.invoke({"input": "what is the currency in Turkey?"})

AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_country_information', 'arguments': '{"country": "Turkey", "field_to_extract": "currency"}'}}, id='run-2c027335-711b-4254-ad21-3df3470f7ac2-0')

As you can see, the outputs consist of the parameter additional_kwargs, which contains the relevant function name and its parameter(s) with their associated values - in other words, all the information necessary to call the relevant function and get the correct answer to our question.
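In practice, downstream code only has to read additional_kwargs to know which function to call and with which arguments, along these lines (a minimal sketch):

import json

msg = chain.invoke({"input": "what is the currency in Turkey?"})
call = msg.additional_kwargs.get("function_call")
if call:
    name = call["name"]                        # 'get_country_information'
    arguments = json.loads(call["arguments"])  # {'country': 'Turkey', 'field_to_extract': 'currency'}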

 

Building our LLM agent

AgentAction & AgentFinish

We have most of the building blocks needed for our first agent. So let's go a step further and actually execute the underlying function, returning its output to the user when relevant.

First, we will borrow the OpenAIFunctionsAgentOutputParser, which simply checks whether the output from our h2ogpteChatModel contains a function_call argument, and determines whether an action, such as executing a function call, is required from the agent (AgentAction) or whether the agent's job is done (AgentFinish).

from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain_core.agents import AgentAction, AgentFinish

Routing

The agent is really the chain in charge of deciding which path to take next. For this, we can define a route function, which checks whether the agent has reached a final status (AgentFinish) and returns the final answer, or routes the query towards the appropriate tool (result.tool) and executes it with the function call arguments (result.tool_input):

from langchain.schema.agent import AgentFinish

def route(result):
    if isinstance(result, AgentFinish):
        return result.return_values['output']
    else:
        tools = {
            "get_current_temperature": get_current_temperature,
            "get_country_information": get_country_information,
        }
        print(result.tool_input)  # print tool input parameters
        print(result.tool)  # print the selected tool
        output = tools[result.tool].run(result.tool_input)  # run the tool once
        print(output)  # print the result
        return output

 

We can now add the output parser and the route function to our existing chain and test it out!

routed_chain = chain | OpenAIFunctionsAgentOutputParser() | route

routed_chain.invoke({"input": "hello, what is the temperature in San Francisco?"})

{'latitude': '37.7749', 'longitude': '-122.4194'}
get_current_temperature
The current temperature is 13.3°C

 

 

routed_chain.invoke({"input": "Show me map of Spain"})

{'country': 'Spain', 'field_to_extract': 'maps'}
get_country_information
Spain's maps is {'googleMaps': 'https://goo.gl/maps/138JaXW8EZzRVitY9', 'openStreetMaps': 'https://www.openstreetmap.org/relation/1311341'}

 

... We are almost there!

First, let's add a messages placeholder, that is, a placeholder for intermediary steps, or "scratchpad": a sequence of messages containing the previous agent tool invocations and observations, together with the corresponding tool outputs, sent to the h2ogptechat model as input for the next generation. This becomes useful for further enhancing our agent to leverage previous messages and actions, including the initial user query and history.

Lastly, we will borrow format_to_openai_function_messages, which simply formats the intermediate_steps - that is, agent actions and observations - into AIMessage or FunctionMessage objects. For this, we also slightly modified the h2ogpteChatModel class to check whether the initial query has been answered yet.

[Figure: LangChain workflow from tool call to user response]

 

from langchain_core.prompts.chat import HumanMessagePromptTemplate
from langchain.prompts import MessagesPlaceholder
from langchain.agents import AgentExecutor

prompt = ChatPromptTemplate.from_messages([
    HumanMessagePromptTemplate.from_template("My name is Audrey, {input}"),
    MessagesPlaceholder(variable_name="intermediate_steps"),
])

from langchain.agents.format_scratchpad.openai_functions import format_to_openai_function_messages
from langchain.schema.runnable import RunnablePassthrough

agent_chain = RunnablePassthrough.assign(
    intermediate_steps=lambda x: format_to_openai_function_messages(x["intermediate_steps"])
) | prompt | h2ogptechat_bind | OpenAIFunctionsAgentOutputParser()

 

agent_executor = AgentExecutor(agent=agent_chain,
                               tools=[get_country_information, get_current_temperature],
                               verbose=True,
                               handle_parsing_errors=True)

Testing our LLM Agent

agent_executor.invoke({"input": "what is the current temperature in Istres, France in degrees celcius?"})

> Entering new AgentExecutor chain...

Invoking: `get_current_temperature` with `{'latitude': 43.5536, 'longitude': 4.7836}`


The current temperature is 24.1°C
The current temperature is 24.1°C

> Finished chain.

{'input': 'what is the current temperature in Istres, France in degrees celsius?',
'output': 'The current temperature is 24.1°C'}

 

agent_executor.invoke({"input": "Give me a map of Hungary?"})

> Entering new AgentExecutor chain...

Invoking: `get_country_information` with `{'country': 'Hungary', 'field_to_extract': 'maps'}`


Hungary's maps is {'googleMaps': 'https://goo.gl/maps/9gfPupm5bffixiFJ6', 'openStreetMaps': 'https://www.openstreetmap.org/relation/21335'}
Hungary's maps is {'googleMaps': 'https://goo.gl/maps/9gfPupm5bffixiFJ6', 'openStreetMaps': 'https://www.openstreetmap.org/relation/21335'}

> Finished chain.

{'input': 'Give me a map of Hungary?',
'output': "Hungary's maps is {'googleMaps': 'https://goo.gl/maps/9gfPupm5bffixiFJ6', 'openStreetMaps': 'https://www.openstreetmap.org/relation/21335'}"}

 

Yay! We created our first agent with h2oGPTe and LangChain!

Of course, this is a simple agent, and if you want to go further, here are a few things you could do:

  • Error handling: for example, handling cases where the function fails due to unavailability or invalid arguments
  • Handling simultaneous or sequential function calls: modify the h2ogpteChatModel class to handle intermediary steps that require multiple function calls or planning for more complex user queries
  • Handling memory: enable the agent to keep a chat history for follow-up questions (see the sketch after this list)
  • Prompt engineering: test out different instructions or few-shot learning examples to enforce a model behavior across all conversations and messages
  • Guardrails: enable h2oGPTe guardrails and create your own!
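As a pointer on the memory item, here is a minimal sketch (my own addition, not from the post) of one way to wire a chat history into the agent: add a chat_history placeholder to the prompt, rebuild agent_chain and agent_executor with it, and replay past turns on each call:

from langchain.prompts import MessagesPlaceholder
from langchain_core.messages import AIMessage, HumanMessage

prompt_with_memory = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="chat_history"),
    HumanMessagePromptTemplate.from_template("{input}"),
    MessagesPlaceholder(variable_name="intermediate_steps"),
])
# ... rebuild agent_chain and agent_executor with prompt_with_memory, then:

chat_history = []
result = agent_executor.invoke({"input": "what is the capital of Italy?",
                                "chat_history": chat_history})
chat_history.extend([HumanMessage(content=result["input"]),
                     AIMessage(content=result["output"])])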

 

¹ As a customer of Enterprise h2oGPTe.


Audrey Létévé, Principal Customer Data Scientist

  • Principal Data Scientist at H2O.ai, specializing in leading complex Machine Learning projects from ideation to production, with a keen interest in Model Ops and a strong background in statistics.

  • Her expertise covers a broad range of industries such as insurance, energy, and services, enabling her to communicate effectively with both technical and non-technical stakeholders.

  • Holding a Master of Science in Mathematics and Statistics from Université Aix-Marseille II, Audrey has a proven track record of enhancing business strategies and objectives through data analysis and model development across various data science roles.