The field of Artificial Intelligence is experiencing rapid evolution. While Large Language Models (LLMs) are a significant development, current innovation focuses on evolving them from text predictors into intelligent agents capable of interacting with external systems.
1. Large Language Models
A Large Language Model (LLM) is a neural network trained on vast datasets, primarily functioning by predicting the next element in a sequence. While traditionally focused on text, these models are increasingly becoming multimodal, capable of understanding and generating not just language, but also images, audio, and video, enabling diverse applications from text generation to image Q&A.
However, inherent limitations persist: models lack real-time information, cannot perform real-world actions, and their knowledge is fixed at training time.
1.1. Prompts and Messages
LLMs are fundamentally prompt-based systems: every interaction is expressed as a prompt, which is itself composed of messages.
- A Message is a single, discrete turn within the conversation: an object containing a role (who is speaking: system, user, tool, or assistant) and content (what was said).
  - A system message, typically the first in a prompt, sets the AI's overall behavior and persona by providing high-level instructions.
    { "role": "system", "content": "You are a helpful assistant that provides concise answers." }
  - A user message conveys the input provided by the end-user.
    { "role": "user", "content": "What is the capital of France?" }
  - An assistant message holds the model's responses, which can be either a final answer or a request to use a tool.
    { "role": "assistant", "content": "The capital of France is Paris." }
  - A tool message provides the output of a function call back to the model, allowing it to process the result of an external action.
    { "role": "tool", "content": "{ \"temperature\": \"22\", \"unit\": \"celsius\" }", "tool_call_id": "call_abc123" }
- A Prompt, representing the full message history processed by the model to generate a response, typically begins with an optional system message, followed by alternating user messages for user input and assistant messages for model responses, with tool messages providing function outputs after assistant tool calls.
  [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What's the weather in Boston?" },
    { "role": "assistant", "content": null, "tool_calls": [{ "function": { "name": "get_current_weather", "arguments": "{ \"location\": \"Boston, MA\" }" } }] },
    { "role": "tool", "content": "{ \"temperature\": \"22\", \"unit\": \"celsius\" }", "tool_call_id": "call_abc123" }
  ]
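To make this structure concrete, here is a minimal Python sketch that assembles a prompt from messages, sends it to a chat endpoint, and carries the returned assistant message into the next turn. It assumes a local Ollama instance on localhost:11434 with the qwen3:1.7b model pulled, matching the examples used later in this section.
# Minimal sketch: send a prompt (a list of messages) to a local Ollama chat endpoint.
# Assumes Ollama is running on localhost:11434 and qwen3:1.7b has been pulled.
import requests

prompt = [
    {"role": "system", "content": "You are a helpful assistant that provides concise answers."},
    {"role": "user", "content": "What is the capital of France?"},
]

response = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "qwen3:1.7b", "messages": prompt, "stream": False},
)
assistant_message = response.json()["message"]
print(assistant_message["content"])

# Appending the assistant message and the next user message yields the prompt for the next turn.
prompt.append(assistant_message)
prompt.append({"role": "user", "content": "And of Germany?"})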
1.2. RAG
Retrieval-Augmented Generation (RAG) is a technique that addresses an LLM's frozen, limited knowledge by retrieving relevant context the model was never trained on and adding it to each prompt.
- A knowledge base is broken into chunks, converted into numerical embeddings by a specialized model, and stored in a vector database.
- A user's question is converted into an embedding and used to retrieve the most relevant document chunks from the vector database; these chunks are then placed into the prompt alongside the original question to augment the model's generation (see the sketch below).
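The following sketch implements both steps with a plain Python list standing in for the vector database. The Ollama embeddings endpoint and the nomic-embed-text model are assumptions for illustration, not part of the description above.
# Minimal RAG sketch: embed chunks, retrieve by cosine similarity, augment the prompt.
# Assumes a local Ollama instance with an embedding model (nomic-embed-text) and qwen3:1.7b pulled.
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    # Convert text into a numerical embedding using a specialized model.
    r = requests.post(f"{OLLAMA}/api/embeddings", json={"model": "nomic-embed-text", "prompt": text})
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

# 1. Index the knowledge base: chunk it and store (chunk, embedding) pairs.
#    A real system would persist these in a vector database.
chunks = [
    "Our Boston office is located at 100 Main Street.",
    "The Paris office opened in 2021 and hosts the EU sales team.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieve: embed the question and pick the most similar chunk.
question = "Where is the Boston office?"
question_embedding = embed(question)
best_chunk = max(index, key=lambda item: cosine(question_embedding, item[1]))[0]

# 3. Augment: place the retrieved context into the prompt with the original question.
prompt = [
    {"role": "system", "content": f"Answer using this context:\n{best_chunk}"},
    {"role": "user", "content": question},
]
reply = requests.post(f"{OLLAMA}/api/chat", json={"model": "qwen3:1.7b", "messages": prompt, "stream": False})
print(reply.json()["message"]["content"])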
1.3. Tool Calling
Tool calling is a powerful mechanism that allows developers to enable an LLM to interact with external systems and overcome its inherent limitations.
While the terms are often used interchangeably, it’s helpful to think of a tool as the general capability given to the model, and a function as the specific code implementation of that tool.
Tool calling is the process in which the model, after receiving a user's message and a list of available tools from the application, responds with a tool_calls object that instructs the application to execute a specific function; the application then returns the call result to the model so it can generate the final response.
- The application sends the user's message and a list of available tools to the model.
  {
    "model": "qwen3:1.7b",
    "messages": [
      { "role": "user", "content": "What is the weather like in Boston?" }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" },
              "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "stream": false
  }
- The model responds with a tool_calls object instead of a text answer.
  {
    ...
    "message": {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        { "function": { "name": "get_current_weather", "arguments": { "location": "Boston, MA" } } }
      ]
    },
    ...
  }
- The application executes the requested function (e.g., get_current_weather), obtains the result (e.g., {"temperature": "22", "unit": "celsius"}), and sends it back to the model in a tool message.
  { "role": "tool", "content": "{ \"temperature\": \"22\", \"unit\": \"celsius\" }", "tool_call_id": "call_abc123" }
- With the tool's result in context, the model then generates the final, user-facing answer.
  { "role": "assistant", "content": "The current weather in Boston is 22 degrees Celsius." }
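The four steps above can be wired together in a few lines of Python. This is a sketch rather than a reference implementation: it assumes a local Ollama instance serving qwen3:1.7b and uses a stubbed get_current_weather function.
# Sketch of the tool-calling round trip against a local Ollama /api/chat endpoint.
import json
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"

def get_current_weather(location: str, unit: str = "celsius") -> str:
    # Stand-in implementation; a real tool would call a weather API.
    return json.dumps({"temperature": "22", "unit": unit})

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

# 1. Send the user's message and the list of available tools to the model.
messages = [{"role": "user", "content": "What is the weather like in Boston?"}]
reply = requests.post(OLLAMA_CHAT, json={"model": "qwen3:1.7b", "messages": messages,
                                         "tools": tools, "stream": False}).json()["message"]
messages.append(reply)

# 2. If the model responded with tool_calls, execute each requested function
#    and send the result back in a tool message.
for call in reply.get("tool_calls", []):
    arguments = call["function"]["arguments"]
    result = get_current_weather(**arguments)
    messages.append({"role": "tool", "content": result})

# 3. With the tool's result in context, ask the model for the final, user-facing answer.
final = requests.post(OLLAMA_CHAT, json={"model": "qwen3:1.7b", "messages": messages,
                                         "stream": False}).json()["message"]
print(final["content"])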
1.4. ReAct
ReAct (Reason and Act) is a technique that uses a system message to guide a model through an iterative Thought-Action-Observation loop to solve multi-step problems and derive a final answer.
- Thought: The model first "thinks out loud" by generating text that outlines its reasoning, analyzes the problem, and forms a plan.
- Action: Based on its thought process, the model outputs a structured request to use a specific tool (e.g., a tool_calls object).
- Observation: The application executes the requested tool, and the result of that tool (the observation) is fed back into the prompt for the next cycle.

A ReAct-style request supplies the Thought/Action instructions in the system message:
  {
    "model": "phi4-mini:3.8b",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant that can use tools. To solve problems, you must reason about a plan and then take an action. When you need to use a tool, respond *only* with the following format: Thought: Your reasoning and plan for the next step. Action: A single tool call in a JSON object." },
      { "role": "user", "content": "What is the weather like in Boston?" }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" }
            }
          }
        }
      }
    ],
    "stream": false
  }
The model then replies in the requested format:
  Thought: To provide an accurate answer to this question, I need current weather information for a specific location—in this case, Boston.
  Action: { "type": "function", "function": { "name": "get_current_weather", ... } }
Built-in reasoning, by contrast, is a native capability of fine-tuned models (like qwen3) to decide on and call tools themselves, returning structured tool_calls without explicit ReAct prompt instructions.
  {
    "model": "qwen3:1.7b",
    "messages": [
      { "role": "user", "content": "What is the weather like in Boston?" }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": { "type": "string" }
            }
          }
        }
      }
    ],
    "stream": false
  }
  {
    ...
    "message": {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        { "function": { "name": "get_current_weather", "arguments": { "location": "Boston, MA" } } }
      ]
    },
    ...
  }
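A ReAct loop can be orchestrated by the application with plain string parsing. The sketch below is illustrative only: the Final Answer convention, the regex-based parsing, and the phi4-mini:3.8b model on a local Ollama instance are assumptions, and a production loop would need far more robust output handling.
# Sketch of a ReAct orchestration loop: Thought -> Action -> Observation until a final answer.
import json
import re
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"
SYSTEM = (
    "You are a helpful assistant that can use tools. To solve problems, you must reason about "
    "a plan and then take an action. When you need to use a tool, respond *only* with: "
    "Thought: your reasoning. Action: a single JSON object with \"name\" and \"arguments\". "
    "When you know the answer, respond with: Final Answer: <the answer>."
)

def get_current_weather(location: str) -> str:
    return json.dumps({"temperature": "22", "unit": "celsius"})

TOOLS = {"get_current_weather": get_current_weather}

messages = [{"role": "system", "content": SYSTEM},
            {"role": "user", "content": "What is the weather like in Boston?"}]

for _ in range(5):  # bound the number of Thought-Action-Observation cycles
    reply = requests.post(OLLAMA_CHAT, json={"model": "phi4-mini:3.8b", "messages": messages,
                                             "stream": False}).json()["message"]
    messages.append(reply)
    text = reply["content"]
    if "Final Answer:" in text:
        print(text.split("Final Answer:", 1)[1].strip())
        break
    # Parse the Action: grab the first {...} block from the Thought/Action output (deliberately simple).
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        break
    call = json.loads(match.group(0))
    function = call.get("function", call)  # accept either {"function": {...}} or a flat object
    observation = TOOLS[function["name"]](**function.get("arguments", {}))
    # Feed the observation back into the prompt for the next cycle.
    messages.append({"role": "user", "content": f"Observation: {observation}"})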
2. Agents
An agent combines an LLM with tools and an orchestration loop (like ReAct) to create an autonomous system that perceives its environment, makes decisions, and plans and executes actions to achieve complex goals, shifting from passive content generation to active problem-solving.
- OpenCode is an open-source, provider-agnostic, terminal-first agent (TUI) that autonomously plans, implements, and refactors code across over 75 AI models.
- Cursor is an AI-native developer platform with an Agent Mode (accessible from the GUI or CLI) for planning multi-file refactors, running terminal tests, and handing off tasks to background cloud agents.
- GitHub Copilot is an ecosystem-integrated agent platform that automates the entire issue-to-PR lifecycle using a specialized Squad of Agents (Plan, Code, Repair, Review) for development tasks within GitHub, VS Code, and Visual Studio.
- Gemini CLI is an open-source, terminal-native agent providing direct command-line access to Gemini AI for code generation, automation, grounding with Google Search, and MCP extensions.
- Claude Desktop is an AI application featuring Cowork, an autonomous agentic mode for administrative productivity that manages local files, organizes directories, and coordinates sub-agents to process complex document workflows.
- ChatGPT is an AI application featuring Operator, an autonomous agentic mode that performs web-based tasks such as booking travel and completing online forms.
- Microsoft 365 Copilot is an enterprise productivity agent that orchestrates work across the Office suite, using agentic skills to manage inboxes, synthesize meeting data, and automate corporate workflows.
3. LangChain
LangChain is an open-source framework for LLM application development with a pre-built agent architecture, a standardized model interface (preventing vendor lock-in), and integrations with diverse models and tools.
- LangGraph is a library for building durable, stateful agents by representing their workflows as a graph, which allows for more complex cycles and human-in-the-loop interactions.
- LangSmith is a commercial platform that offers full visibility into an agent's decision-making process through debugging, testing, evaluation, and monitoring, with free tiers for development and paid plans for enterprise use.
# pip install -qU langchain-ollama langchain
from langchain_ollama import ChatOllama
from langchain.agents import create_agent
from langchain_core.tools import tool
# 1. Define the tool using the @tool decorator for better schema parsing
@tool
def get_weather(city: str) -> str:
"""Get weather for a given city."""
return f"It's always sunny in {city}!"
# 2. Initialize the local Ollama model
# Ensure the model supports tool calling (e.g., llama3.2 or qwen3)
llm = ChatOllama(
model="qwen3:1.7b",
temperature=0,
)
# 3. Create the agent using the local provider
agent = create_agent(
model=llm,
tools=[get_weather],
system_prompt="You are a helpful assistant",
)
# 4. Run the agent locally
response = agent.invoke(
{"messages": [{"role": "user", "content": "what is the weather in sf"}]}
)
print(response["messages"][-1].content)
Running the script prints the model's reasoning trace followed by the final answer:
<think>
Okay, the user asked, "What is the weather in SF?" I called the get_weather function with "sf" as the city. The response was, "It's always sunny in SF!" Now I need to process this.
First, the function was called correctly. The response is a simple statement. Since the user didn't ask for a forecast or any specific details, just the current weather, and the response is straightforward, there's no need for further action. The assistant's job here is to relay the information back to the user.
The response "It's always sunny in SF!" is a bit unexpected because SF is known for rainy weather, but maybe the function is programmed to return that regardless. However, the user's query was just about the weather, so the answer is accurate based on the function's output. There's no error here, so no need to call another function. Just present the answer as is.
</think>
The weather in San Francisco is always sunny! 🌞
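For comparison, here is a minimal LangGraph sketch, independent of the agent above, showing how a workflow is expressed as a graph whose conditional edge forms the kind of cycle an agent loop relies on; the node names and state fields are illustrative only.
# pip install -qU langgraph
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    remaining_steps: int
    log: list

def plan(state: State) -> dict:
    # Record a "thinking" step and consume one unit of the step budget.
    return {"remaining_steps": state["remaining_steps"] - 1,
            "log": state["log"] + [f"planned step {state['remaining_steps']}"]}

def act(state: State) -> dict:
    return {"log": state["log"] + ["acted"]}

def should_continue(state: State) -> str:
    # Route back into the cycle until the step budget is exhausted.
    return "plan" if state["remaining_steps"] > 0 else END

builder = StateGraph(State)
builder.add_node("plan", plan)
builder.add_node("act", act)
builder.add_edge(START, "plan")
builder.add_edge("plan", "act")
builder.add_conditional_edges("act", should_continue)
graph = builder.compile()

print(graph.invoke({"remaining_steps": 2, "log": []}))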
4. Model Context Protocol
The Model Context Protocol (MCP) is an open-source standard, like a USB-C port for devices, for connecting AI applications like Claude or ChatGPT to external systems.
MCP operates on a client-server architecture where an MCP host (e.g., an IDE, a CLI, or a desktop app) coordinates and manages one or multiple MCP clients, each maintaining a connection to an MCP server to obtain context for the host’s use.
- MCP servers are programs that expose specific capabilities to AI applications through standardized protocol interfaces, organized around three building blocks: tools, resources, and prompts (a minimal server sketch follows this list).
  An MCP server serves context data and can execute either locally (e.g., Claude Desktop's filesystem server using STDIO for a single client) or remotely (e.g., Sentry's server using Streamable HTTP for many clients).
  A JSON-RPC 2.0 protocol is used for the data layer to define the actual communication and its core primitives, including tools (executable actions), resources (data sources), prompts (reusable prompt templates), and notifications (for real-time updates).
  - Tools are functions that the model actively decides to call to perform actions like searching for flights or sending messages, giving it direct control over when and how it interacts with the outside world.
  - Resources are passive, read-only data sources, such as files or knowledge bases, that the application makes available to provide the model with contextual understanding, like retrieving a document.
  - Prompts are pre-built, reusable instruction templates defined by the application to guide the model on how to use specific tools and resources to accomplish a complex task, such as planning a vacation.
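As referenced above, here is a minimal sketch of an MCP server exposing one tool, one resource, and one prompt, written with the FastMCP helper from the official Python SDK; the server name, URI, and contents are illustrative assumptions.
# pip install -qU mcp
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("travel-helper")

@mcp.tool()
def search_flights(origin: str, destination: str) -> str:
    """Executable action the model can decide to call (stub implementation)."""
    return f"Found 3 flights from {origin} to {destination}."

@mcp.resource("notes://travel-preferences")
def travel_preferences() -> str:
    """Passive, read-only data source the host can place into the model's context."""
    return "Prefers aisle seats and morning departures."

@mcp.prompt()
def plan_vacation(destination: str) -> str:
    """Reusable instruction template guiding the model through a planning task."""
    return (f"Plan a vacation to {destination} using the search_flights tool "
            "and the traveler's stored preferences.")

if __name__ == "__main__":
    # Runs over STDIO by default, suiting a local, single-client host.
    mcp.run()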
- MCP clients are components instantiated by host applications to communicate with particular MCP servers, leveraging context provided by servers and offering several features back to them to build richer interactions.
  - Elicitation provides a structured way for a server to request specific user information on demand, like asking for travel preferences to finalize a booking.
  - Roots securely communicate the client's intended scope by defining which directories a server can access, like a travel server reading a user's calendar.
  - Sampling keeps the client in complete control of permissions by allowing a server to request an LLM completion for agentic workflows, like picking the best flight from a list.
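On the other side of the connection, a host instantiates a client session roughly as follows. This sketch assumes the official MCP Python SDK and a local stdio server started with uvx (the sqlite example used later in this section), and it omits the client features listed above.
# pip install -qU mcp
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch a local MCP server over STDIO.
server = StdioServerParameters(command="uvx", args=["mcp-server-sqlite", "--db-path", "/tmp/test.db"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the server's building blocks.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # The host would hand these tool definitions to the model and relay its tool calls, e.g.:
            # result = await session.call_tool("some_tool", arguments={"arg": "value"})

asyncio.run(main())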
4.1. MCP Servers with Agents
MCP enables agents to connect with external tools and data sources through a designated JSON configuration file, which can be defined globally in the user’s home directory or locally within a project.
- Gemini CLI configures MCP servers under mcpServers in ~/.gemini/settings.json (global) or .gemini/settings.json (per-project).
  {
    "mcpServers": {                          (1)
      "serverName": {                        (2)
        "command": "path/to/server",         (3)
        "args": ["--arg1", "value1"],        (4)
        "env": {                             (5)
          "API_KEY": "$MY_API_TOKEN"
        },
        "cwd": "./server-directory",         (6)
        "timeout": 30000,                    (7)
        "trust": false                       (8)
      }
    }
  }
  (1) The mcpServers object defines the set of MCP servers to discover and connect to, keyed by server name.
  (2) The serverName object provides one server's configuration; its key is used for status display and for tool-name prefixing on conflicts.
  (3) One transport is selected by providing command (local stdio), url (SSE), or httpUrl (streamable HTTP).
  (4) The args array is used with command and provides argv parameters for the local stdio process.
  (5) The env object is used with command and provides environment variables for the local stdio process, supporting $VAR and ${VAR} expansion.
  (6) The cwd value is used with command and sets the working directory for starting the local stdio process.
  (7) The timeout value sets the request timeout in milliseconds, with a documented default of 600000 ms when omitted.
  (8) The trust flag controls confirmation behavior: true bypasses tool-call confirmations, false preserves them.
  A concrete configuration connecting sqlite, atlassian, and slack servers:
  {
    "mcpServers": {
      "sqlite": {
        "command": "uvx",
        "args": ["mcp-server-sqlite", "--db-path", "/tmp/test.db"]
      },
      "atlassian": {
        "command": "uvx",
        "args": ["mcp-atlassian"],
        "env": {
          "JIRA_URL": "https://your-company.atlassian.net",
          "JIRA_USERNAME": "your.email@company.com",
          "JIRA_API_TOKEN": "your_api_token",
          "CONFLUENCE_URL": "https://your-company.atlassian.net/wiki",
          "CONFLUENCE_USERNAME": "your.email@company.com",
          "CONFLUENCE_API_TOKEN": "your_api_token"
        }
      },
      "slack": {
        "command": "uvx",
        "args": ["mcp-slack"],
        "env": {
          "SLACK_BOT_TOKEN": "<YOUR_SLACK_BOT_TOKEN>"
        }
      }
    }
  }
- Cursor Agent configures MCP servers under mcpServers in ~/.cursor/mcp.json (global) or .cursor/mcp.json (per-project).
  {
    "mcpServers": {
      "sqlite": {
        "command": "uvx",
        "args": ["mcp-server-sqlite", "--db-path", "/tmp/test.db"]
      },
      "atlassian": {
        "command": "uvx",
        "args": ["mcp-atlassian"],
        "env": {
          "JIRA_URL": "https://your-company.atlassian.net",
          "JIRA_USERNAME": "your.email@company.com",
          "JIRA_API_TOKEN": "your_api_token",
          "CONFLUENCE_URL": "https://your-company.atlassian.net/wiki",
          "CONFLUENCE_USERNAME": "your.email@company.com",
          "CONFLUENCE_API_TOKEN": "your_api_token"
        }
      },
      "slack": {
        "command": "uvx",
        "args": ["mcp-slack"],
        "env": {
          "SLACK_BOT_TOKEN": "<YOUR_SLACK_BOT_TOKEN>"
        }
      }
    }
  }
- OpenCode AI configures MCP servers in ~/.opencode.json (global) or .opencode.json (per-project).
  {
    "$schema": "https://opencode.ai/config.json",
    "mcp": {
      "sqlite": {
        "type": "local",
        "command": ["uvx", "mcp-server-sqlite", "--db-path", "/tmp/test.db"],
        "enabled": true
      },
      "atlassian": {
        "type": "local",
        "command": ["uvx", "mcp-atlassian"],
        "env": {
          "JIRA_URL": "https://your-company.atlassian.net",
          "JIRA_USERNAME": "your.email@company.com",
          "JIRA_API_TOKEN": "your_api_token",
          "CONFLUENCE_URL": "https://your-company.atlassian.net/wiki",
          "CONFLUENCE_USERNAME": "your.email@company.com",
          "CONFLUENCE_API_TOKEN": "your_api_token"
        },
        "enabled": true
      },
      "slack": {
        "type": "local",
        "command": ["uvx", "mcp-slack"],
        "env": {
          "SLACK_BOT_TOKEN": "<YOUR_SLACK_BOT_TOKEN>"
        },
        "enabled": true
      }
    }
  }