November 2022. ChatGPT arrived quietly, not with a crash but like a letter slipping under a locked door.
People clapped. We thought it was just a clever tool and didn’t know how it would change everything.
Soon, something strange began to happen. The machines weren’t just helping. They were thinking, writing, and fixing mistakes.
Picking up where humans left off.
Then came Cursor. Windsurf. Devin. Names that seemed harmless, like pets or weekend hobbies. But these were not toys. They read code like old detectives read letters. They poked into corners, solved problems, and sometimes rewrote the story.
It sounds like science fiction: a machine that chats, edits files, runs commands, corrects itself, and keeps going even after it stumbles.
But there is no secret. No magic.
It is just a loop. A model. And lots of words and some clever API calls.
Now, we are going to build one. From scratch. Piece by piece.
Let’s chat!
A coding agent is a chat interface connected to a Large Language Model (LLM) like OpenAI’s GPT models or (my preferred) Anthropic’s Claude Sonnet models. You need an API key from OpenAI, Anthropic, or Gemini, and a system prompt to define the agent’s persona. Then, in a loop: ask the user for input, send the conversation (messages) to the chat completions API endpoint, get a result, and print it.
import json
import requests


def load_config(config_path):
    with open(config_path, 'r') as file:
        config = json.load(file)
    return {
        "endpoint": config["endpoint"],
        "api_key": config["api_key"],
        "model": config["model"]
    }


def chat_completion(endpoint, api_key, model, messages):
    request = {"model": model, "messages": messages}
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(
        f"{endpoint}/chat/completions",
        json=request,
        headers=headers
    )
    response.raise_for_status()
    return response.json()


def main():
    config = load_config('config.json')
    system_prompt = """You are a helpful chat assistant."""
    messages = [{"role": "system", "content": system_prompt}]

    print("Chat with the assistant. Type 'exit' to quit.")
    while True:
        user_input = input("> ")
        if user_input.lower() == "exit":
            break
        messages.append({"role": "user", "content": user_input})
        response = chat_completion(
            endpoint=config["endpoint"],
            api_key=config["api_key"],
            model=config["model"],
            messages=messages
        )
        choice = response["choices"][0]
        message = choice["message"]
        messages.append(message)
        print(message["content"])


if __name__ == "__main__":
    main()
Just 60 lines, and you have ChatGPT Pro! It is missing things like chat persistence, image inputs, and other UI niceties, but you get the idea. The core is simple.
Note: I’m not using any LLM SDK here, as they are all plain HTTP calls underneath. If you use one, you can make this code even smaller.
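For example, here is a minimal sketch of the same loop using the official openai Python package (assuming it is installed and OPENAI_API_KEY is set; the model name is just an example):

# Same chat loop, but letting the SDK handle the HTTP details.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "system", "content": "You are a helpful chat assistant."}]

while True:
    user_input = input("> ")
    if user_input.lower() == "exit":
        break
    messages.append({"role": "user", "content": user_input})
    completion = client.chat.completions.create(model="gpt-4o", messages=messages)
    message = completion.choices[0].message
    messages.append({"role": "assistant", "content": message.content})
    print(message.content)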
AI, follow my instructions!
The GPT-3 model, more or less a glorified autocomplete, was the precursor to ChatGPT (GPT-3.5). ChatGPT was a fine-tuned GPT-3 model that was good at following instructions. If you told ChatGPT to rewrite something in the style of the King James Bible, it complied.
Soon, we figured we could also instruct the models to generate and understand “structured” outputs like JSON. You have probably done things like asking ChatGPT to make tables from JSON.
An LLM with tools is an “agent”: it now has the agency to work on the things we tell it to do. You can prompt the model to generate a structured response, parse it, call your custom code, feed the result back to the model, and get a formatted response. Modify the system prompt in the chat code we created before and see how it works.
You are a helpful home assistant. Answer the user's queries based on your knowledge and the provided home automation tools.
Always format your response with tags.
DO NOT include any other content before or after the tags
- Use the speak tag to return a user-facing message
- Use the call tag to output a json which we will use to call a tool to do something
- You SHOULD respond with multiple call tags if needed
Example:
<speak>content to speak</speak>
<call>{ "name": "function_name", "args": { "arg1": "value", "arg2": "value" } }</call>
You have access to the following tools:
1. temperature(): number in degree celsius
2. humidity(): number from 0 to 1 to denote relative humidity
3. list_devices(): { deviceId: string, name: string}, name and deviceId of all configured devices
4. get_device_properties(deviceId: string): Record<string, any> of device properties
5. set_device_properties(deviceId: string, properties: Record<string, any>): boolean, set device properties
When you request to run a tool, you will receive JSON-formatted tool output in the next message
as { "output": "tool output" } or { "error": "tool error" }
Read the tool output before responding; if there is an error, emit a helpful error message
Ask a question now and the model responds in this structured format. Reply with a JSON tool output and watch the model turn the values into natural language. Ask a general question and the model still answers from its own knowledge.
$ python chat.py
Chat with the assistant. Type 'exit' to quit.
> what is the temperature and humidity
<speak>Let me check the current temperature and humidity for you.</speak>
<call>{ "name": "temperature", "args": {} }</call>
> { "output": 32 }
<speak>The current temperature is 32 degrees Celsius. Now, I will check the humidity.</speak>
<call>{ "name": "humidity", "args": {} }</call>
> { "output": 0.78 }
<speak>The current humidity level is approximately 78%.</speak>
> C compilers are written in C! how was the first compiler created then?
<speak>The first compiler was created as a separate program written in a different programming language or directly in machine code. For example, the earliest compiler for a high-level language like Fortran was initially written in assembly language or machine-specific code before being rewritten in the target language itself. This process is known as "bootstrapping," where a simpler or existing compiler is used to create a more advanced one. Over time, this foundational work enabled the development of self-hosting compilers written in the language they compile.</speak>
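In this transcript, I’m pasting the tool outputs back in by hand. A small parser could extract the call tags and dispatch them to real functions automatically. Here is a minimal sketch, with the temperature and humidity tools stubbed out as hypothetical examples:

import json
import re

# Hypothetical tool implementations, just for illustration.
def temperature():
    return 32

def humidity():
    return 0.78

TOOLS = {"temperature": temperature, "humidity": humidity}

def handle_response(text):
    # Print the user-facing parts.
    for speak in re.findall(r"<speak>(.*?)</speak>", text, re.DOTALL):
        print(speak.strip())
    # Run each requested tool and collect its output for the next message.
    results = []
    for call in re.findall(r"<call>(.*?)</call>", text, re.DOTALL):
        request = json.loads(call)
        try:
            output = TOOLS[request["name"]](**request.get("args", {}))
            results.append(json.dumps({"output": output}))
        except Exception as error:
            results.append(json.dumps({"error": str(error)}))
    return results  # feed these back to the model as the next user message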
Building our first tool
Parsing model responses by hand was so 2022. It’s 2025 now, and model providers have done a ton of RLHF to ensure models use tools effectively, and the chat completions API itself now lets you specify which tools your model can call.
Let’s start by defining our first tool, read, which reads a file.
def read(path: str):
    with open(path, 'r') as f:
        return f.read()
Then, modify our chat_completion code to accept tools and pass them to the API.
def chat_completion(endpoint, api_key, model, messages, tools):
    request = {"model": model, "messages": messages, "tools": tools}
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(
        f"{endpoint}/chat/completions",
        json=request,
        headers=headers
    )
    response.raise_for_status()
    return response.json()
The OpenAI API spec wants us to use JSON schema to describe the tool params, so let’s define our tools and add a description of our read tool.
When the model wants to call a tool, it will say so in its finish_reason; let’s handle that and call our tool. Since we need to pass the tool’s result straight back to the LLM to generate further responses, let’s add a nested loop that processes tool calls and yields the prompt back to the user only when the model has completely processed the request.
# ... rest of the code, same as before

def main():
    config = load_config('config.json')
    system_prompt = """You are a helpful chat assistant."""
    messages = [{"role": "system", "content": system_prompt}]

    # Our tools go here...
    tools = [{
        "type": "function",
        "function": {
            "name": "read",
            "description": "Read a text file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "File path"
                    }
                },
                "required": ["path"]
            }
        }
    }]
    tool_functions = {
        "read": read
    }

    print("Chat with the assistant. Type 'exit' to quit.")
    while True:
        user_input = input("> ")
        if user_input.lower() == "exit":
            break
        messages.append({"role": "user", "content": user_input})

        # Our inner tool-calling loop goes here
        processing = True
        while processing:
            response = chat_completion(
                endpoint=config["endpoint"],
                api_key=config["api_key"],
                model=config["model"],
                tools=tools,
                messages=messages
            )
            choice = response["choices"][0]
            message = choice["message"]
            messages.append(message)
            print(message["content"])

            # Handle tool calls
            if choice["finish_reason"] == "tool_calls":
                tool_calls = message["tool_calls"]
                for tool_call in tool_calls:
                    tool_call_id = tool_call["id"]
                    tool_call_function = tool_call["function"]
                    tool_name = tool_call_function["name"]
                    tool_args = tool_call_function["arguments"]
                    tool_args_dict = json.loads(tool_args)
                    # Show that we are executing the function
                    print(f"{tool_name}({str(tool_args_dict)})")
                    # Execute the tool function
                    result = tool_functions[tool_name](**tool_args_dict)
                    messages.append({
                        "role": "tool",
                        "content": result,
                        "tool_call_id": tool_call_id
                    })
            # If the LLM ran out of output tokens
            elif choice["finish_reason"] == "length":
                messages.append({
                    "role": "user",
                    "content": "Please continue."
                })
            # We are done with our task, yield back to the user
            elif choice["finish_reason"] == "stop":
                processing = False
That’s our first tool. It doesn’t do much, but now the LLM can read files and answer based on that. Let’s try it out.
$ python chat.py
Chat with the assistant. Type 'exit' to quit.
> what does chat.py do?
read({'path': 'main.py'})
The chat.py script is designed to operate as a helpful assistant. It interacts with users via prompts and can execute various tools to facilitate development workflows
Bringing it all together
Think about the elementary set of tools you, as a programmer, need to write code: list the files, read them, write your changes, and run commands to test your code. Let’s add an ls and a write tool to explore the codebase and write any changes you ask the agent to make.
import os  # needed for the ls tool

# Add tool implementations
def ls(path: str):
    if not os.path.exists(path):
        return "Path does not exist."
    if os.path.isfile(path):
        return f"File: {path}"
    items = os.listdir(path)
    items_str = "\n".join(items)
    return f"Directory: {path}\n{items_str}"

def write(path: str, content: str):
    with open(path, 'w') as file:
        file.write(content)
    return "File written successfully."
# Register our tools
def main():
    # ... rest of the code
    tools = [
        {
            "type": "function",
            "function": {
                "name": "ls",
                "description": "List files and directories.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {
                            "type": "string",
                            "description": "Directory path"
                        }
                    },
                    "required": ["path"]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "read",
                "description": "Read a text file",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {
                            "type": "string",
                            "description": "File path"
                        }
                    },
                    "required": ["path"]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "write",
                "description": "Write content to a file.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {
                            "type": "string",
                            "description": "File path"
                        },
                        "content": {
                            "type": "string",
                            "description": "The content to write to the file."
                        }
                    },
                    "required": ["path", "content"]
                }
            }
        }
    ]
    tool_functions = {
        "ls": ls,
        "read": read,
        "write": write
    }
    # ... rest of the code
While building any agent, think of what you, as a human, would use to complete your tasks. You can start with a basic set of tools and supply more specialized ones to make your agent more efficient.
For example, let’s use the agent we just created to introduce a new search_replace tool that makes smaller file edits instead of rewriting everything.
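Here is roughly what we would expect such a tool to look like; a minimal sketch (in the transcript below, the agent writes its own version):

# A possible search_replace implementation (a sketch; the agent generates its own below).
def search_replace(path: str, search_text: str, replace_text: str):
    with open(path, 'r') as file:
        content = file.read()
    if search_text not in content:
        return "Search text not found."
    with open(path, 'w') as file:
        file.write(content.replace(search_text, replace_text))
    return "File updated successfully."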
$ python chat.py
What would you like to do?
> Add a new search_replace tool to main.py
ls({'path': './'})
read({'path': 'chat.py'})
To add a new search_replace tool to `chat.py`, we will:
1. Add a Python function called `search_replace(path, search_text, replace_text)` that reads a file, replaces all instances of `search_text` with `replace_text`, and writes back the changes.
2. Add the `search_replace` tool specification to the `tools` list.
3. Add the mapping in `tool_functions`.
Would you like these lines added for you and the file updated, or do you want to copy and paste them manually?
> Go ahead
read({'path': 'chat.py'})
write({'path': 'chat.py', 'content': 'import json... (trimmed)'})
The `search_replace` tool has been added to `simplified_main.py`.
You can now use this tool to find and replace text in any file via your assistant interface. If you need to test its usage or want an example command, just let me know!
You’ll notice that our agent generates a lot of markdown content, asks for confirmation, and so on. By tweaking the system prompt, we can change our agent’s personality and make it behave the way we want.
Try the following system prompt. You’ll see your agent perform a lot smarter.
You are a helpful assistant that assists users with software development tasks.
- Always analyze the user's codebase before providing a solution.
- Use the tools provided to you.
- If you are unsure about the user's intent, ask clarifying questions.
- Keep your responses concise and to the point, avoiding unnecessary verbosity.
Respond in plain text, and do not include any code blocks or formatting.
And this is prompt engineering 101, no secret sauce. Experiment till you convince your model to behave as you want.
Here are the complete 250 lines of our basic agent.
Play around with our simple agent, and you will find that it’s pretty smart already. It can make edits, write documentation, and review code. If you add specialized tools, like a bash tool to execute commands or a web search tool to look for information on the web, you can make it even more capable.
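For example, a bash tool can be a thin wrapper around subprocess; a minimal sketch (note that it executes arbitrary commands, so add confirmation or sandboxing in practice):

import subprocess

# A minimal bash tool: run a shell command and return its combined output.
# Careful: this executes arbitrary commands on your machine.
def bash(command: str):
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=60
    )
    output = result.stdout + result.stderr
    return output if output else f"(exit code {result.returncode})"

Register it in tools with a single string command parameter and add it to tool_functions, just like read and write.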
So what makes Cursor, Windsurf, or <insert AI coding tool here> so special?
Nothing.
This architecture for building agents isn’t just for toy projects. All the cutting-edge copilots work this way: orchestrating LLMs and practical function calls. The trick is to start simple with chat and tools, feed a lot of tokens to the models, and then build the richer interactions on top. I started writing this article earlier, but OpenAI’s Codex CLI release this week just confirmed the theory.
Some of these agents are integrated into editors and IDEs as plugins, which gives them a nice UI and a varying level of tool integration.
This scrappy script does the job, but you can make it much more efficient. You can extend this by adding version control tools or semantic code searches. I have been implementing a more extensive version of this agent with a language server and debugging capabilities.
The real magic: context engineering
If you’ve read this far, you’re either in awe or disbelief, asking, “No, there must be something, right?”
As we just saw, there is no secret recipe for a coding agent, and as the models have gotten smaller and better, there is no moat in the system prompts either. However, these agents have one serious disadvantage compared to humans: memory, or in LLM terms, the context length.
Each time you say something, or run a tool and provide its output, we append the messages and feed them to the next API call. This sequence of messages is the model’s context for generating the next set of output tokens. The original ChatGPT model had a context length of 16,384 tokens, translating to roughly 12,000 words. Later models have improved this significantly: Google’s Gemini tops the charts with 2 million tokens, and the newest OpenAI models have a 1 million token context.
But even for a small codebase, you will quickly run out of tokens if you iterate enough. And with a long context, models tend to lose alignment with your instructions and hallucinate more. In my personal experience, Claude, which has a 200K-token context, starts to degrade at around 70K tokens.
Since we pay for tokens in + tokens out on every API call, a large context not only increases latency, as the model has to read through the entire conversation to generate its output, but also the cost: because every turn resends the whole conversation so far, the total tokens you pay for grow roughly quadratically with the number of turns.
So, a lot of creative engineering goes into managing the model’s context as the agent performs its tasks. Some agents summarize the conversation once a task is done, while others mix in vector searches to mutate the context.
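As an illustration (not how any particular agent does it), here is a crude compaction sketch that reuses the tools-free chat_completion helper from the first script; the 4-characters-per-token heuristic and the thresholds are assumptions:

# When a rough token estimate crosses a threshold, summarize the older messages
# and keep only the summary plus the most recent turns.
def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token.
    return sum(len(str(m.get("content") or "")) for m in messages) // 4

def compact_context(config, messages, limit=50_000, keep_recent=10):
    if estimate_tokens(messages) < limit:
        return messages
    system, old, recent = messages[0], messages[1:-keep_recent], messages[-keep_recent:]
    # In practice, cut at a boundary that keeps tool calls and their results together.
    transcript = "\n".join(f"{m['role']}: {m.get('content') or ''}" for m in old)
    response = chat_completion(
        endpoint=config["endpoint"], api_key=config["api_key"], model=config["model"],
        messages=[{
            "role": "user",
            "content": "Summarize this conversation, keeping file names, decisions, "
                       "and unfinished work:\n" + transcript
        }]
    )
    summary = response["choices"][0]["message"]["content"]
    return [system, {"role": "assistant",
                     "content": f"Summary of earlier work:\n{summary}"}] + recent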
With a large enough context window and lots of tokens (and $$$), we won’t need this either. But for now, this is where the real creative magic is.
Exciting times are coming.