CODE FARM

"It is the mark of an educated mind to be able to entertain a thought without accepting it."

- Aristotle

Language Models and Agents

The field of Artificial Intelligence is experiencing rapid evolution. While Large Language Models (LLMs) are a significant development, current innovation focuses on evolving them from text predictors into intelligent agents capable of interacting with external systems.

1. Large Language Models

A Large Language Model (LLM) is a neural network trained on vast datasets, primarily functioning by predicting the next element in a sequence. While traditionally focused on text, these models are increasingly becoming multimodal, capable of understanding and generating not just language, but also images, audio, and video, enabling diverse applications from text generation to image Q&A.

However, inherent limitations persist: models lack real-time information, cannot perform real-world actions, and their knowledge is fixed at training time.

  • Open-weight models (like LLaMA, Mistral, or Gemma) commonly use suffixes to signal how a shared base model was fine-tuned and what it is optimized for.

    • An -instruct suffix indicates a variant optimized for single-turn, structured outputs and reliable task-following, often with enhanced helpful/harmless behavior.

    • A -chat suffix indicates a dialogue-optimized variant intended for multi-turn conversations using a specific chat template.

    • A -tools suffix indicates a variant tuned to use external tools and to output structured tool_calls, a key capability for building agentic (ReAct-like) workflows.

    • A -thinking suffix indicates a variant optimized for complex, multi-step reasoning and is particularly well-suited for Chain-of-Thought (CoT) prompting techniques.

  • In contrast, many managed API providers (like OpenAI, Anthropic, and Google) increasingly provide a single front-door model name (e.g., gpt-4o) and control its behavior through the API request’s structure and parameters, as sketched after this list.

    • Instruction-following is the default in chat-structured APIs that use system/user/assistant roles.

    • Tool use is enabled by declaring tool definitions in the API request, letting the model choose between a text response and a tool call.

    • Enhanced reasoning is controlled by request-specific settings, product tiers, or internal routing, not by selecting a distinct -thinking model.
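
The following is a minimal sketch of this request-driven style, assuming the openai Python package, a configured API key, and the hypothetical get_current_weather tool used throughout this article; the same pattern applies to other managed providers through their own SDKs.

# pip install -qU openai  (sketch only; requires OPENAI_API_KEY in the environment)
from openai import OpenAI

client = OpenAI()

# Declaring a tool in the request enables tool use; no "-tools" variant is selected.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

# Roles in the messages array steer instruction-following; reasoning behavior is
# governed by request settings or routing rather than by a distinct model name.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that provides concise answers."},
        {"role": "user", "content": "What is the weather like in Boston?"},
    ],
    tools=tools,
)

print(response.choices[0].message)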

1.1. Prompts and Messages

LLMs are fundamentally prompt-based systems.

  • A Message is a single, discrete turn within a conversation: an object containing a role (who is speaking: system, user, tool, or assistant) and content (what was said).

    • A system message, typically the first in a prompt, sets the AI’s overall behavior and persona by providing high-level instructions.

      { "role": "system", "content": "You are a helpful assistant that provides concise answers." }
    • A user message conveys the input provided by the end-user.

      { "role": "user", "content": "What is the capital of France?" }
    • An assistant message holds the model’s responses, which can be either a final answer or a request to use a tool.

      { "role": "assistant", "content": "The capital of France is Paris." }
    • A tool message provides the output of a function call back to the model, allowing it to process the result of an external action.

      { "role": "tool", "content": "{ \"temperature\": \"22\", \"unit\": \"celsius\" }", "tool_call_id": "call_abc123" }"
  • A Prompt is the full message history the model processes to generate a response. It typically begins with an optional system message, followed by alternating user messages (user input) and assistant messages (model responses), with tool messages providing function outputs after assistant tool calls.

    [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What's the weather in Boston?"
      },
      {
        "role": "assistant",
        "content": null,
        "tool_calls": [{
          "function": {
            "name": "get_current_weather",
            "arguments": "{ \"location\": \"Boston, MA\" }"
          }
        }]
      },
      {
        "role": "tool",
        "content": "{ \"temperature\": \"22\", \"unit\": \"celsius\" }",
        "tool_call_id": "call_abc123"
      }
    ]

1.2. RAG

Retrieval-Augmented Generation (RAG) is a powerful technique that addresses an LLM’s frozen, limited knowledge by retrieving relevant context the model was never trained on and supplying it with each prompt.

  • A knowledge base is broken into chunks, converted into numerical embeddings by a specialized model, and stored in a vector database.

  • A user’s question is converted into an embedding and used to retrieve the most relevant chunks from the vector database; those chunks are then placed into the prompt alongside the original question to ground the model’s generation (see the sketch after this list).
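
The sketch below illustrates this retrieve-then-augment flow in plain Python. The embed function is a toy stand-in (a real system would use an embedding model, for example one served by Ollama), and the two-document "knowledge base" is purely hypothetical.

import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy bag-of-words embedding; replace with a real embedding model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Indexing: chunk the knowledge base, embed each chunk, store it in a "vector database".
documents = ["Paris is the capital of France.", "Boston is in Massachusetts."]
index = [(chunk, embed(chunk)) for chunk in documents]

# Retrieval: embed the question and select the most similar chunks.
question = "What is the capital of France?"
q_vec = embed(question)
ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
top_chunks = [chunk for chunk, _ in ranked[:1]]

# Augmentation: place the retrieved context into the prompt with the original question.
prompt = [
    {"role": "system", "content": "Answer using only this context:\n" + "\n".join(top_chunks)},
    {"role": "user", "content": question},
]
print(prompt)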

1.3. Tool Calling

Tool calling is a powerful mechanism that allows developers to enable an LLM to interact with external systems and overcome its inherent limitations.

While the terms are often used interchangeably, it’s helpful to think of a tool as the general capability given to the model, and a function as the specific code implementation of that tool.

A tool call is a process in which the model, after receiving the user’s message and a list of available tools from the application, responds with a tool_calls object instructing the application to execute a specific function; the application then returns the result to the model, which generates the final response. The flow is shown step by step below, followed by an application-side sketch.

  1. The application sends the user’s message and a list of available tools to the model.

    {
      "model": "qwen3:1.7b",
      "messages": [
        { "role": "user", "content": "What is the weather like in Boston?" }
      ],
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
              "type": "object",
              "properties": {
                "location": {
                  "type": "string",
                  "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                  "type": "string",
                  "enum": ["celsius", "fahrenheit"]
                }
              },
              "required": ["location"]
            }
          }
        }
      ],
      "stream": false
    }
  2. The model responds with a tool_calls object instead of a text answer.

      {
        ...
        "message": {
          "role": "assistant",
          "content": "",
          "tool_calls": [
            {
              "function": {
                "name": "get_current_weather",
                "arguments": { "location": "Boston, MA" }
              }
            }
          ]
        },
        ...
      }
  3. The application executes the requested function (e.g., get_current_weather), obtains the result (e.g., {"temperature": "22", "unit": "celsius"}), and sends it back to the model in a tool message.

      {
        "role": "tool",
        "content": "{ \"temperature\": \"22\", \"unit\": \"celsius\" }",
        "tool_call_id": "call_abc123"
      }
  4. With the tool’s result in context, the model then generates the final, user-facing answer.

      {
        "role": "assistant",
        "content": "The current weather in Boston is 22 degrees Celsius."
      }
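
Putting the four steps together, the following is a minimal application-side sketch against a local Ollama server (assumed to be running at http://localhost:11434, using the /api/chat endpoint and the requests package). The exact shape of the tool-result message varies by provider; some APIs also expect a tool_call_id linking the result to the specific call, as in the JSON example above.

import json
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "qwen3:1.7b"

def get_current_weather(location: str, unit: str = "celsius") -> str:
    """Stand-in implementation; a real application would call a weather API."""
    return json.dumps({"temperature": "22", "unit": unit})

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

# 1. Send the user's message and the tool definitions to the model.
messages = [{"role": "user", "content": "What is the weather like in Boston?"}]
reply = requests.post(OLLAMA_URL, json={
    "model": MODEL, "messages": messages, "tools": tools, "stream": False,
}).json()["message"]
messages.append(reply)

# 2-3. If the model answered with tool_calls, execute each requested function
#      and feed the result back in a tool message.
for call in reply.get("tool_calls", []):
    fn = call["function"]
    result = get_current_weather(**fn["arguments"])
    messages.append({"role": "tool", "content": result})

# 4. With the tool results in context, ask the model for the final answer.
final = requests.post(OLLAMA_URL, json={
    "model": MODEL, "messages": messages, "stream": False,
}).json()["message"]["content"]
print(final)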

1.4. ReAct

ReAct (Reason and Act) is a technique that uses a system message to guide a model through an iterative Thought-Action-Observation loop to solve multi-step problems and derive a final answer.

  1. Thought: The model first "thinks out loud" by generating text that outlines its reasoning, analyzes the problem, and forms a plan.

  2. Action: Based on its thought process, the model outputs a structured request to use a specific tool (e.g., a tool_calls object).

  3. Observation: The application executes the requested tool, and the result (the observation) is fed back into the prompt for the next cycle, as in the loop sketch after the example below.

    {
      "model": "phi4-mini:3.8b",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant that can use tools. To solve problems, you must reason about a plan and then take an action.
    When you need to use a tool, respond *only* with the following format:
    
    Thought:
    Your reasoning and plan for the next step.
    
    Action:
    A single tool call in a JSON object."
        },
        {
          "role": "user",
          "content": "What is the weather like in Boston?"
        }
      ],
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
              "type": "object",
              "properties": {
                "location": {
                  "type": "string",
                  "description": "The city and state, e.g. San Francisco, CA"
                }
              }
            }
          }
        }
      ],
      "stream": false
    }
    Thought:
    To provide an accurate answer to this question, I need current weather information for a specific location—in this case, Boston.
    
    Action:
    ```json
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        ...
      }
    }
    ```
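
    The loop itself lives in the application. Below is a minimal sketch of that orchestration, assuming the same local Ollama server and get_current_weather stand-in as earlier; the Action parser is deliberately simple (it expects a fenced JSON block with a function object, as in the sample output above), so a production agent would parse more defensively.

    import json
    import re
    import requests

    OLLAMA_URL = "http://localhost:11434/api/chat"
    SYSTEM = ("You are a helpful assistant that can use tools. Respond with 'Thought:' "
              "followed by your reasoning, then either 'Action:' with a single tool call "
              "as a JSON object in a ```json block, or your final answer.")

    def get_current_weather(location: str) -> str:
        return json.dumps({"temperature": "22", "unit": "celsius"})

    TOOLS = {"get_current_weather": get_current_weather}

    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": "What is the weather like in Boston?"}]

    for _ in range(5):  # cap the number of Thought-Action-Observation cycles
        text = requests.post(OLLAMA_URL, json={
            "model": "phi4-mini:3.8b", "messages": messages, "stream": False,
        }).json()["message"]["content"]
        messages.append({"role": "assistant", "content": text})

        match = re.search(r"Action:\s*```(?:json)?\s*(\{.*?\})\s*```", text, re.S)
        if not match:
            print(text)  # no Action requested: treat the text as the final answer
            break

        call = json.loads(match.group(1))["function"]
        observation = TOOLS[call["name"]](**call.get("arguments", {}))

        # Observation: feed the tool result back into the prompt for the next cycle.
        messages.append({"role": "user", "content": f"Observation: {observation}"})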

    Built-in tool reasoning is a native capability of models fine-tuned for tool use (like qwen3): they decide when to call tools and return structured tool_calls without explicit ReAct prompt instructions.

    {
      "model": "qwen3:1.7b",
      "messages": [
        { "role": "user", "content": "What is the weather like in Boston?" }
      ],
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
              "type": "object",
              "properties": { "location": { "type": "string" } }
            }
          }
        }
      ],
      "stream": false
    }
    {
      ...
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "function": {
              "name": "get_current_weather",
              "arguments": { "location": "Boston, MA" }
            }
          }
        ]
      },
      ...
    }

2. Agents

An agent combines an LLM with tools and an orchestration loop (like ReAct) to create an autonomous system that perceives its environment, makes decisions, and plans and executes actions to achieve complex goals, shifting from passive content generation to active problem-solving.

  • OpenCode is an open-source, provider-agnostic, terminal-first agent (TUI) that autonomously plans, implements, and refactors code across over 75 AI models.

  • Cursor is an AI-native developer platform with an Agent Mode (GUI/CLI accessible) for planning multi-file refactors, running terminal tests, and handing off tasks to background cloud agents.

  • GitHub Copilot is an ecosystem-integrated agent platform to automate the entire issue-to-PR lifecycle using a specialized Squad of Agents (Plan, Code, Repair, Review) for development tasks within GitHub, VS Code, and Visual Studio.

  • Gemini CLI is an open-source, terminal-native agent providing direct command-line access to Gemini AI for code generation, automation, grounding with Google Search, and MCP extensions.

  • Claude Desktop is an AI application featuring Cowork, an autonomous agentic mode for administrative productivity that manages local files, organizes directories, and coordinates sub-agents to process complex document workflows.

  • ChatGPT is an AI application featuring Operator, an autonomous agentic mode that performs web-based tasks such as booking travel and completing online forms.

  • Microsoft 365 Copilot is an enterprise productivity agent that orchestrates work across the Office suite, using agentic skills to manage inboxes, synthesize meeting data, and automate corporate workflows.

3. LangChain

LangChain is an open-source framework for LLM application development with a pre-built agent architecture, a standardized model interface (preventing vendor lock-in), and integrations with diverse models and tools.

  • LangGraph is a library for building durable, stateful agents by representing their workflows as a graph, which allows for more complex cycles and human-in-the-loop interactions (a minimal sketch appears at the end of this section).

  • LangSmith is a commercial platform that offers full visibility into an agent’s decision-making process through debugging, testing, evaluating, and monitoring, with free tiers for development and paid plans for enterprise use.

# pip install -qU langchain-ollama langchain

from langchain_ollama import ChatOllama
from langchain.agents import create_agent
from langchain_core.tools import tool

# 1. Define the tool using the @tool decorator for better schema parsing
@tool
def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

# 2. Initialize the local Ollama model
# Ensure the model supports tool calling (e.g., llama3.2 or qwen3)
llm = ChatOllama(
    model="qwen3:1.7b",
    temperature=0,
)

# 3. Create the agent using the local provider
agent = create_agent(
    model=llm,
    tools=[get_weather],
    system_prompt="You are a helpful assistant",
)

# 4. Run the agent locally
response = agent.invoke(
    {"messages": [{"role": "user", "content": "what is the weather in sf"}]}
)

print(response["messages"][-1].content)
<think>
Okay, the user asked, "What is the weather in SF?" I called the get_weather function with "sf" as the city. The response was, "It's always sunny in SF!" Now I need to process this.

First, the function was called correctly. The response is a simple statement. Since the user didn't ask for a forecast or any specific details, just the current weather, and the response is straightforward, there's no need for further action. The assistant's job here is to relay the information back to the user.

The response "It's always sunny in SF!" is a bit unexpected because SF is known for rainy weather, but maybe the function is programmed to return that regardless. However, the user's query was just about the weather, so the answer is accurate based on the function's output. There's no error here, so no need to call another function. Just present the answer as is.
</think>

The weather in San Francisco is always sunny! 🌞
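
For comparison with the create_agent example above, here is a minimal, self-contained LangGraph sketch (assuming the langgraph package) showing the graph-as-workflow idea: a node plus a conditional edge form a cycle, the building block for iterative, stateful agent loops. The node body is a placeholder; a real agent node would call the LLM or a tool.

# pip install -qU langgraph
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    attempts: int
    done: bool

def work(state: State) -> dict:
    # Placeholder step; a real agent would invoke the model or a tool here.
    attempts = state["attempts"] + 1
    return {"attempts": attempts, "done": attempts >= 3}

def should_continue(state: State) -> str:
    return "end" if state["done"] else "loop"

builder = StateGraph(State)
builder.add_node("work", work)
builder.add_edge(START, "work")
# The conditional edge creates the cycle that a linear chain cannot express.
builder.add_conditional_edges("work", should_continue, {"loop": "work", "end": END})

graph = builder.compile()
print(graph.invoke({"attempts": 0, "done": False}))  # {'attempts': 3, 'done': True}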

4. Model Context Protocol

The Model Context Protocol (MCP) is an open-source standard for connecting AI applications like Claude or ChatGPT to external systems, much as a USB-C port connects a device to its peripherals.

MCP operates on a client-server architecture where an MCP host (e.g., an IDE, a CLI, or a desktop app) coordinates and manages one or multiple MCP clients, each maintaining a connection to an MCP server to obtain context for the host’s use.

  • MCP servers are programs that expose specific capabilities to AI applications through standardized protocol interfaces, built around three building blocks: tools, resources, and prompts (a minimal server sketch follows this list).

    An MCP server is the program that serves context data; it can run either locally (e.g., Claude Desktop’s filesystem server using STDIO for a single client) or remotely (e.g., Sentry’s server using Streamable HTTP for many clients).

    JSON-RPC 2.0 is used for the data layer, defining the actual communication and its core primitives: tools (executable actions), resources (data sources), prompts (reusable prompt templates), and notifications (real-time updates).

    • Tools are functions that the model actively decides to call to perform actions like searching for flights or sending messages, giving it direct control over when and how it interacts with the outside world.

    • Resources are passive, read-only data sources, such as files or knowledge bases, that the application makes available to provide the model with contextual understanding, like retrieving a document.

    • Prompts are pre-built, reusable instruction templates defined by the application to guide the model on how to use specific tools and resources to accomplish a complex task, such as planning a vacation.

  • MCP clients are components instantiated by host applications to communicate with particular MCP servers, leveraging context provided by servers and offering several features back to them to build richer interactions.

    • Elicitation provides a structured way for a server to request specific user information on demand, such as asking for travel preferences to finalize a booking.

    • Roots securely communicate the client’s intended scope by defining which directories a server can access, such as a travel server reading a user’s calendar.

    • Sampling keeps the client in complete control of permissions by allowing a server to request an LLM completion for agentic workflows, such as picking the best flight from a list.
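
As a concrete illustration of the three server-side building blocks, below is a minimal sketch using the official MCP Python SDK's FastMCP helper (assuming pip install mcp); the server name, tool, resource, and prompt are all hypothetical.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("travel-helper")  # hypothetical server name

@mcp.tool()
def search_flights(origin: str, destination: str) -> str:
    """Tool: an executable action the model can decide to call."""
    return f"Found 3 flights from {origin} to {destination}."

@mcp.resource("docs://travel-policy")
def travel_policy() -> str:
    """Resource: a passive, read-only data source the host can attach as context."""
    return "Economy class only for flights under 6 hours."

@mcp.prompt()
def plan_trip(destination: str) -> str:
    """Prompt: a reusable template guiding how to combine the tool and resource."""
    return f"Plan a trip to {destination} using search_flights and the travel policy."

if __name__ == "__main__":
    mcp.run(transport="stdio")  # local server over STDIO, as described above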

4.1. MCP Servers with Agents

MCP enables agents to connect with external tools and data sources through a designated JSON configuration file, which can be defined globally in the user’s home directory or locally within a project.

  • Gemini CLI configures MCP servers under mcpServers in ~/.gemini/settings.json (global) or .gemini/settings.json (per-project).

    {
      "mcpServers": { (1)
        "serverName": { (2)
          "command": "path/to/server", (3)
          "args": ["--arg1", "value1"], (4)
          "env": { (5)
            "API_KEY": "$MY_API_TOKEN"
          },
          "cwd": "./server-directory", (6)
          "timeout": 30000, (7)
          "trust": false (8)
        }
      }
    }
    1 The mcpServers object defines the set of MCP servers to discover and connect to, keyed by server name.
    2 The serverName object provides one server’s configuration and its key is used for status display and tool-name prefixing on conflicts.
    3 One transport is selected by providing command (local stdio), url (SSE), or httpUrl (streamable HTTP).
    4 The args array is used with command and provides argv parameters for the local stdio process.
    5 The env object is used with command and provides environment variables for the local stdio process, supporting $VAR and ${VAR} expansion.
    6 The cwd value is used with command and sets the working directory for starting the local stdio process.
    7 The timeout value sets the request timeout in milliseconds with a documented default of 600000ms when omitted.
    8 The trust flag controls confirmation behavior, where true bypasses tool-call confirmations and false preserves them.
    {
      "mcpServers": {
        "sqlite": {
            "command": "uvx",
            "args": ["mcp-server-sqlite", "--db-path", "/tmp/test.db"]
        },
        "atlassian": {
          "command": "uvx",
          "args": ["mcp-atlassian"],
          "env": {
            "JIRA_URL": "https://your-company.atlassian.net",
            "JIRA_USERNAME": "your.email@company.com",
            "JIRA_API_TOKEN": "your_api_token",
            "CONFLUENCE_URL": "https://your-company.atlassian.net/wiki",
            "CONFLUENCE_USERNAME": "your.email@company.com",
            "CONFLUENCE_API_TOKEN": "your_api_token"
          }
        },
        "slack": {
          "command": "uvx",
          "args": ["mcp-slack"],
          "env": {
            "SLACK_BOT_TOKEN": "<YOUR_SLACK_BOT_TOKEN>"
          }
        }
      }
    }
  • Cursor Agent configures MCP servers under mcpServers in ~/.cursor/mcp.json (global) or .cursor/mcp.json (per-project).

    {
      "mcpServers": {
        "sqlite": {
          "command": "uvx",
          "args": ["mcp-server-sqlite", "--db-path", "/tmp/test.db"]
        },
        "atlassian": {
          "command": "uvx",
          "args": ["mcp-atlassian"],
          "env": {
            "JIRA_URL": "https://your-company.atlassian.net",
            "JIRA_USERNAME": "your.email@company.com",
            "JIRA_API_TOKEN": "your_api_token",
            "CONFLUENCE_URL": "https://your-company.atlassian.net/wiki",
            "CONFLUENCE_USERNAME": "your.email@company.com",
            "CONFLUENCE_API_TOKEN": "your_api_token"
          }
        },
        "slack": {
          "command": "uvx",
          "args": ["mcp-slack"],
          "env": {
            "SLACK_BOT_TOKEN": "<YOUR_SLACK_BOT_TOKEN>"
          }
        }
      }
    }
  • OpenCode AI configures MCP servers in ~/.opencode.json (global) or .opencode.json (per-project).

    {
      "$schema": "https://opencode.ai/config.json",
      "mcp": {
        "sqlite": {
          "type": "local",
          "command": ["uvx", "mcp-server-sqlite", "--db-path", "/tmp/test.db"],
          "enabled": true
        },
        "atlassian": {
          "type": "local",
          "command": ["uvx", "mcp-atlassian"],
          "env": {
            "JIRA_URL": "https://your-company.atlassian.net",
            "JIRA_USERNAME": "your.email@company.com",
            "JIRA_API_TOKEN": "your_api_token",
            "CONFLUENCE_URL": "https://your-company.atlassian.net/wiki",
            "CONFLUENCE_USERNAME": "your.email@company.com",
            "CONFLUENCE_API_TOKEN": "your_api_token"
          },
          "enabled": true
        },
        "slack": {
          "type": "local",
          "command": ["uvx", "mcp-slack"],
          "env": {
            "SLACK_BOT_TOKEN": "<YOUR_SLACK_BOT_TOKEN>"
          },
          "enabled": true
        }
      }
    }