aitchnyu 18 hours ago [-]
I keep wondering how people accept a night's worth of agent activity.
I feel 30 minutes of planning and 30 minutes of implementation in my solo side project's repo is too big to review. At minute 5, I may ask the AI to redo stuff even as it's spitting out code.
d4rkp4ttern 17 hours ago [-]
Most of the narrative is about how AI is writing all/most code, but I’d wager that the fraction of human reviewed code is approaching zero far faster than anyone is realizing or willing to admit.
londons_explore 17 hours ago [-]
Very true. Last year I at least glanced at every line of AI generated code. Now if some AI makes a 10k line program for some one-off tasks, I run the program, glance only over the output, and move on.
noduerme 10 hours ago [-]
Especially if you're having an LLM write non-interactive scripts to calculate complex things from large datasets, glancing at the output is not enough to know if the output is remotely accurate (unless the output is so trivial you could literally do it in your head).
Case in point: I recently asked an LLM to write a pile of code to compile historical baseball stats, so I could test betting success against the results of my hand-written code that evolves genetic algorithms. I marveled for a little while at the unbelievable improvement in EV/ROI that this script was showing could have been achieved by certain small tweaks. I only noticed when a totals bet pushed and the push registered in the output as a win, and only because I was carefully staying on top of it. A single stupid recursively operating >= instead of > had caused completely nonsensical results that looked plausible.
Imagine, like, trusting a 10k loc script to give you data for something you were going to build in the physical world, and hoping an LLM hadn't made a mistake like that.
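To make that failure mode concrete, here is a minimal sketch (not the commenter's code; the names settle_over and settle_over_buggy and the sample numbers are hypothetical) of settling an "over" totals bet. A push has to be handled before the win check, and swapping > for >= silently turns every push into a win while the aggregate ROI still looks plausible.

```python
from enum import Enum


class Outcome(Enum):
    WIN = "win"
    LOSS = "loss"
    PUSH = "push"


def settle_over(total_runs: float, line: float) -> Outcome:
    """Settle an 'over' bet: it wins only if the total strictly exceeds the line."""
    if total_runs == line:
        return Outcome.PUSH  # stake returned; neither a win nor a loss
    return Outcome.WIN if total_runs > line else Outcome.LOSS


def settle_over_buggy(total_runs: float, line: float) -> Outcome:
    """Same bet with the off-by-one comparison: '>=' counts a push as a win."""
    return Outcome.WIN if total_runs >= line else Outcome.LOSS
```

A couple of edge-case assertions, such as settle_over(9, 9) being a push, catch this in seconds, which glancing over the summary output never will.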
bobjordan 4 hours ago [-]
Code needs to be tested. I'm glad that the bar to entry has been lowered, but now we just have a huge number of people who haven't yet learned anything about how to test and verify that the code meets the expected requirements.
A huge factor I don't see mentioned often enough is the rapid increase of AI coding in languages unknown to the dev.
MaKey 17 hours ago [-]
Which one-off tasks need 10k lines of code?
embedding-shape 16 hours ago [-]
Would depend on what AI and prompt you use ultimately. Ask it to add tests (functional, E2E and unit, maybe invent a new type too), packaging, modular code and/or whatever, and you get to 10K relatively quickly with some of the more verbose LLMs out there.
Personally it's probably the biggest struggle, trying to rein in the "spray and pray" approach LLMs typically like to take, and reducing the "patch on top of patch" syndrome too.
londons_explore 16 hours ago [-]
Calculate the engine power of a 2015 VW Polo when travelling 70 mph on a flat road behind a box truck. Draw a chart of drag vs. follow distance. How significant is humidity to the result?
Grosvenor 15 hours ago [-]
European or African Polo?
noduerme 10 hours ago [-]
You're not supposed to post that you just like a comment, but this was the best comment on HN in ages.
A one-off web app for scrubbing through some data that, once done, will never be run again?
bossyTeacher 16 hours ago [-]
Java programs
rq1 14 hours ago [-]
Enterprise programs*
jatora 11 hours ago [-]
i admit. agentic coders do not look at the code except by accident. not much point unless you're working on enterprise applications
vasco 15 hours ago [-]
People already barely reviewed code, most of it was imported libraries.
seanw444 15 hours ago [-]
The assumption used to be that you respected the library enough and believed it was well reviewed and architected by the maintainer(s). But now even that's unreliable because libraries are being slopified at an unreviewable pace too.
embedding-shape 32 minutes ago [-]
> The assumption used to be that you respected the library enough and believed it was well reviewed and architected by the maintainer
I don't know many serious software engineers who'd take that approach; the convention was always to actually open up the code, evaluate the quality, see if they seem to know what they're doing, then choose the libraries you know work and could be adjusted to fit whatever you wanted. At least for professional development inside companies, not a single library would be included unless you at least reviewed that the top-level dependency you pull in actually had code worth pulling in in the first place.
And this approach works just as well today as it used to: you literally have to spend like 3-5 minutes browsing the code, evaluate the abstractions they've built, and then say "Yes, looks good enough to try to use" or "Clearly these people just hacked this together as fast as they could".
jatora 10 hours ago [-]
It's weird that you think humans weren't slopifying code until LLMs came along. At least now they are implementing tests and CI and far more documentation, updating API versions, etc., OOMs above the amount they did before.
I'd also wager that far more % of code gets more coverage of review, via prompting AI to do it, than it did before.
Most PRs pass as long as they A. pass checks, B. don't introduce regressions, and C. fix a bug or implement a feature. People talk about this era of humans reviewing code with nostalgia... but that never existed at scale.
bossyTeacher 2 hours ago [-]
> The assumption used to be that you respected the library enough and believed it was well reviewed and architected by the maintainer(s).
Let us be honest: for your average dev, the assumption was that the number of GitHub stars or npm/NuGet downloads was a good proxy for quality.
giancarlostoro 14 hours ago [-]
People seem to wear rose-tinted glasses about how great and vetted code was before AI coding took off the way it has. It was not great.
king_geedorah 13 hours ago [-]
I’d say the increased scrutiny has merely exposed the difference in care between the different groups in the industry. Seems to explain pretty well why both sides are equally confounded by the other’s expectations.
almostdeadguy 13 hours ago [-]
People keep saying this like it's some meaningful point, but the reality is that many people in different projects have a shared need for that code to work correctly, and there is a social proof involved in using open source libraries. That is why people look at downloads and dependent projects as heuristics of stability and correctness. That is not the case with (and cannot be obtained with) code authored by generative AI.
vasco 8 hours ago [-]
Yes it can: the code will be run and you will have proof that it ran well. Or it won't run well and you'll redo it. Same as with some imported library.
lanyard-textile 18 hours ago [-]
A lot of that agent activity is combing over what was previously made, forcing constraints upon it so you have a reasonable expectation of what ends up on your desk for review.
For me, strong file structure helps as well. Reviewing a 3,000 line file it just created is abysmal. I wouldn't accept that from human nor machine :) Multiple files in the right places helps reduce cognitive load.
Sometimes I'll also review with the agent interactively. What is the most important file to review first, etc?
I like to stage changes into a "LGTM" pile. Then if I want changes, I'll have the agent "review unstaged changes - I want something different done here."
dakiol 16 hours ago [-]
No one is reviewing the code. Managers don't want us to review code either. It's a bottleneck. If something goes wrong (bugs), they are fixed as they come. It's a very sad era of software engineering. If there ever was some engineering in our trade, now it's mostly gone. We are guessing around, writing "skills" files with "please, do not introduce bugs" or "you are an owner, not a renter" or similar stuff. It's just very low effort, very nondeterministic. Big apps out there are going down constantly because of AI slop (e.g., GitHub), and we are seeing it more often in less popular systems as well (e.g., in my company and other SaaS that we use).
Product managers never cared about the code. Engineering managers don't care about code as much as they did when they were engineers. Directors couldn't care less about code. CTOs don't know what code looks like anymore. We are at the end of the chain, and somehow we always took pride in well-written and maintainable code because we knew deep inside that good systems are built on good code. But now we are jeopardizing ourselves: it's us, the engineers, who don't care about code anymore, and with AI that problem is amplified.
digitaltrees 12 hours ago [-]
You care about code quality. Many don’t. I had someone tell me this week that a 6000 line class was ok because it was easier for the model to understand and that’s more important than human comprehension. And I get his point but that seems like a big risk to take.
mpalczewski 12 hours ago [-]
and it's wrong. a 6000 line class is not easier for a model to understand. the same things that help humans also help agents. I find myself adding linters that the agent must pass (and fix any violations of) that limit file size, function length, function complexity, and how many files are in a directory. it's a little more work for the agent, but the codebase is healthier and the agents write fewer bugs.
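A file-size and function-length check of the kind described above can be sketched with Python's standard ast module. This is an illustrative sketch, not the commenter's setup: MAX_FILE_LINES, MAX_FUNCTION_LINES, check_source and check_paths are hypothetical names and limits. Off-the-shelf linters cover similar ground (Pylint's too-many-lines message, flake8's --max-complexity option), so a custom script is mainly useful for rules those tools lack.

```python
import ast
from pathlib import Path

# Hypothetical limits; pick values that suit the codebase.
MAX_FILE_LINES = 500
MAX_FUNCTION_LINES = 60


def check_source(path: str, source: str) -> list[str]:
    """Return human-readable violations for one Python source file."""
    problems = []
    n_lines = len(source.splitlines())
    if n_lines > MAX_FILE_LINES:
        problems.append(f"{path}: {n_lines} lines (max {MAX_FILE_LINES})")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on AST nodes since Python 3.8
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNCTION_LINES:
                problems.append(
                    f"{path}:{node.lineno}: {node.name} is {length} lines "
                    f"(max {MAX_FUNCTION_LINES})"
                )
    return problems


def check_paths(paths: list[str]) -> list[str]:
    """Check several files; a CI step would fail whenever this returns anything."""
    problems = []
    for p in paths:
        problems += check_source(p, Path(p).read_text())
    return problems
```

Because the output is plain text with file and line numbers, an agent can read the violations and fix them in the same loop it uses for test failures.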
lukevdp 12 hours ago [-]
I don't think the same things that help humans help agents. Simplicity helps humans, for agents parsing complexity is a breeze.
Not saying code quality isn't important - it is. But I think what is described as quality code will change.
cjbgkagh 11 hours ago [-]
Agents still pay a penalty for complexity even if it is a smaller one.
digitaltrees 9 hours ago [-]
Parsing a single file is easier for an LLM than navigating a file system. Until the models have context windows large enough to hold the entire codebase in one shot, single files will beat multiple files every time.
bossyTeacher 2 hours ago [-]
This. I suspect the codebases of the future will be made of a small number of gigantic source files. These could be transpiled into a more human-friendly form that produces multiple smaller files per big file in a human-debug mode.
SatvikBeri 15 hours ago [-]
I usually aim to have Claude end up with about 500 lines of code after a night of work. Most of what it's doing is experimenting with many different approaches, summarizing them, and then giving me a relatively small diff to review and modify.
a1o 14 hours ago [-]
This is the way to go. I usually play with relatively stable software where the improvements are either performance or very small niche features that are built on top of already existing ones. Big changes are undesirable by both the others working on it and its users.
fphilipe 18 hours ago [-]
I wonder the same. The answer I usually get from people who do manage is that they don't look at the code – or at least not in detail.
Personally, I always end up tweaking something the agent produced. I wonder if I should let go of that control...
InsideOutSanta 17 hours ago [-]
Even the newest models, like GPT 5.5, only deliver what I want nine out of ten times. If I didn't catch the remaining 10% of misguided garbage by manually reviewing every change, it would add up really quickly.
debabrata_saha 17 hours ago [-]
yeah
stavros 17 hours ago [-]
I never look at code. It used to be that it quickly became unmaintainable spaghetti where the agent struggled to make any change at all, but in the past year (and with a three step plan/develop/review workflow), the quality is so good that I basically just don't look at the code any more.
It definitely has fewer bugs than a senior developer, but it really hinges on getting the plan right. 20 minutes of planning and 20 of implementation sounds about right for my workflow as well, just make sure you have GPT as a reviewer. It's very nitpicky and finds lots of bugs.
materielle 17 hours ago [-]
This brings to mind two thoughts:
First, that this is challenging to scale across large orgs. Even if your plans produce high-quality code, that isn't true for everyone. I'm definitely struggling with slop code being collectively mailed to me for review by our 1,000 engineers, who were told to use their AI subscriptions all at once.
I feel like we should be taking "prompt engineering" more seriously. And when people mail me code to review, it should also include the agentic workflow and plan, so that when code isn't up to quality, we can have a discussion about the prompts used to generate it.
My second thought is related to your senior engineer comment. This isn't surprising, because in most engineering orgs, seniority is completely unrelated to code quality. In fact, many orgs incentivize the opposite: "senior" devs who push out buggy code quickly and push accountability downhill to the junior devs.
stavros 16 hours ago [-]
Eh, everything is challenging to scale across large orgs. Even before LLMs, the code was a huge ball of spaghetti that barely held together. Now we just get there faster.
About senior engineers, I guess that depends on the org you have experience with. My experience doesn't match yours.
BosunoB 14 hours ago [-]
Yeah, the multi-agent workflow just hasn't been satisfying to me. The more chats I try to run at once, the more lost and overwhelmed I get. I trust Claude to implement a plan correctly after I've reviewed it, but if I don't review all of the plans, I will miss some small detail that it misunderstood and it'll be a pain to fix later.
I'm like a 1-2 chats at a time kind of guy. I just don't see how I could keep my exact vision for the project otherwise.
hgoel 14 hours ago [-]
Same. On top of that, multi-agent workflows just cost too much for stopping and correcting them to feel worthwhile, compared to one or two manually managed chats.
mlyle 17 hours ago [-]
So I've been on a hobby project for a few weeks -- transforming an old software modem binary into C code.
I gave it the existing modem, and had it build rigging to build test vectors. I had it specify the work in the modem. And to confirm that legacy<>legacy produced the same streams as the new code. I've also recorded test vectors vs. other modems.
I've since launched it on targeted refactoring and code reduction projects.
I am mostly not looking at the code. There's a 100KSLOC lump of code that is much cleaner than a decompilation but a fair bit dirtier than what I would write myself. It is not factored terribly. I have some hope of getting it to trim this down to 70KSLOC that then I can accept in small blocks.
It outperforms the original softmodem, hitting higher RX rates for the same line quality and using less CPU. It also has additional functionality.
So, you know, I would never have written something this large for a hobby myself. And it's cost me $200 and 20-30 minutes per day for a few weeks to get a huge functional surface that I do believe I will be able to trust at the end of the process.
Brian_K_White 16 hours ago [-]
I don't like that there are any good sounding stories, but this sounds pretty good.
JodieBenitez 6 hours ago [-]
I never review anything written by Codex in my pet projects. It works or it doesn't, and then I prompt again. I can see how it's easy to multiply agents in this case.
Now when using it for my job... that's a totally different story: I review all the changes, so a single chat session with an agent can lead to a whole day of review. And it's great, sometimes the agent uses patterns and functions I don't know, so I learn a lot.
cold_harbor 2 hours ago [-]
the bottleneck moves from generation to review. agents parallelize, humans review sequentially — 8 parallel cards means 8x the diffs to read, none of the timelines overlap
bluGill 17 hours ago [-]
That depends. When I'm working on a 1 in a million race condition in some multi-threaded code, the agent needs hours to figure out what is going on. (I would probably need weeks - I don't know as I've given up on some of these before I could point an agent at it)
jgalt212 51 minutes ago [-]
Without agentic coding the number go up narrative dies. There's your answer.
suralind 14 hours ago [-]
I agree, but for small tasks - <20 lines that I can understand in a minute or two - perfect. Thinking about it - I have hundreds, if not thousands of tasks that I would like to do, improving pipelines, migrating from one tool to another, but never have time. The only question is - if I don't have time to do it, do I have time to prompt it?
zx8080 11 hours ago [-]
Cost of generation is low, why review? Regenerate if not working. Rinse and repeat.
Maximize providers profits. What can go wrong.
risyachka 3 hours ago [-]
No one reads code that results from this. Those who say otherwise either lie or are very bad developers which is essentially the same as not reading that code.
I understand and agree with the feeling but then I also feel AI is too slow and too expensive.
My most successful autonomous runs have been expanding scrapers across a number of similar but different portals. I had examples and targets and it just kept searching for new ones and adding them.
But even for basic ML auto-research, I have found it surprisingly poor, except at trivial but useful augmentation of models. Yes, it can implement things, but somehow I am still needed a lot, even though I set up a lot of framework around it.
My mental model is that it's very good at complex deterministic work like reading bad API docs and getting some connectors to work.
But perhaps I care less about being stuck in a local optimum there.
throw03172019 17 hours ago [-]
They most likely don’t review it ;)
toobulkeh 9 hours ago [-]
Testing
faangguyindia 12 hours ago [-]
whenever i found a guy who uses parallel overnight agents, i asked them how many users they have. Crickets.
They do not have any users. Meanwhile, i have to do code reviews and all, otherwise my 12,000+ users will be pissed off if anything in their workflow breaks.
This means i really cannot release more than 1 tiny feature a day. And using parallel agents, well that's good for testing but i don't think i need to add that many features to add anything.
szundi 17 hours ago [-]
Lots of people are working on repetitive simple projects like the Nth website whatever or things like that, boring stuff. This LLM era is already a very big deal for these people.
Personally, somehow I am working on stuff that is like 25% non-trivial, and that is enough to have the same experience as you have.
But also lots of people just don't care about quality, and they might be right with their customers/audience. In these cases, when someone catches a bug, an agent is going to iterate on it and make it (seemingly) go away, bandage applied, who cares again. This has a market, I am sure. Lots of programmer folks are just as bad.
siva7 17 hours ago [-]
Yes, it is too big for you, the human, to review, so you simply don't review code anymore. That isn't difficult to comprehend, is it?
kesor 7 hours ago [-]
Originally the word "Kanban" was used by Toyota to describe their card system that helped them achieve several important things, one of them was to NOT work on too many things at once. The other one was to visualize the work. And in general Kanban was used to manage the flow of work so that defects don't get through.
This tool on the other hand is all about "jam as much work as you can come up with into being created in parallel". Obviously there is no managing of any flow of quality outputs, and no limiting of any work because you just shove everything into the agent and burn tokens like crazy.
Calling this a "Kanban" really irks me ... it's like blasphemy or something.
KerrickStaley 17 hours ago [-]
This reminds me of Vibe Kanban (https://vibekanban.com/) which I use to manage coding agents on most of my projects.
The Vibe Kanban developers unfortunately decided that they didn't see a path to profitability and have stopped investing in the project. It's open source and so you can run it locally / fork it, but it has stopped improving and there are still annoying bugs that need to be fixed (and I don't have time to maintain it personally). This makes me sad because I would be willing to pay for Vibe Kanban, but I didn't need the features their paid plan offered (in retrospect maybe I should have paid anyway).
I'll give Kanbots a go :) I'd recommend liberally copying features from Vibe Kanban. In particular the remote support and "Open in VS Code" button (which in my case opens a local VSCode client pointing to a remote VSCode server) are critical for me.
2001zhaozhao 16 hours ago [-]
Vibe Kanban is indeed a treasure trove in terms of useful features.
I've been working for the last week or two on getting my new tool up to parity with VK with additional improvements. I've been posting some screenshots into the Vibe Kanban discord as well. Hopefully it'll be a great fit for your use case when I finally am ready to launch it.
(My tool aims for better features than VK in both the Kanban board and agent workspaces, while adding extra systems like desktop windowing, plugins, in-browser VSCode integration, and htmx-like server-rendered UI. The remote access also works differently - you host the whole thing like OpenClaw and access the remote desktop UI from the browser, rather than run a webserver on your laptop to access remote coding agents.)
chrisweekly 19 hours ago [-]
> "Local-first, zero servers. Everything lives in .kanbots/ next to your repo: SQLite database, configs, worktrees. No cloud account, no telemetry, no HTTP server. This is the open-source desktop edition."
This is table-stakes for me to consider adoption of a tool like this.
fmbb 18 hours ago [-]
What is ”a tool like this”?
If AI is agentic I would expect it takes an hour of chatting for any PM to integrate some agent Ralph loop with Jira. Jira or Trello or Linear or Basecamp all have APIs and I guess CLIs any agent can use to talk to them. No developer or SaaS should be needed to make them understand tasks are checked out when you start work and contain instructions and when you are done you move the ticket to DONE.
Brainspackle 18 hours ago [-]
what is a table-stake?
idle_zealot 18 hours ago [-]
The minimum required payment to play a gambling game, where the money up for grabs is called "stake". See also "raising the stakes". In context it means the minimum feature set to be considered for adoption.
davedigerati 14 hours ago [-]
very nice response idle_zealot, it's this level of human decency in comments that keeps me coming back to HN
iyn 2 hours ago [-]
same, that was an unexpected little moment on HN. Honestly appreciated the answer, as it caused me to stop and examine a phrase I've been using/hearing without ever really thinking about it.
kesor 7 hours ago [-]
one of those bizbuzzwords that Americans use to make themselves look smart while they are the only gambling degenerate in the room who is trying to bullshit you into submission.
desireco42 12 hours ago [-]
From their page, they say they require cloud account login for this to work, even locally which is why I decided not to try it out. Looks cool tbh. But I have quite a few tools that look cool.
rirze 11 hours ago [-]
I'm confused, I downloaded their repo and ran it and I had the option to continue locally without signing up/in.
speedgoose 18 hours ago [-]
I have had more frustrations than successes when I tried to run agents without supervising them. I believe the technology will get there eventually, but right now I need one IDE per agent and it's cumbersome to merge the work.
nullbio 9 hours ago [-]
One thing I don't understand with all of these, is how they're handling different worktree infrastructure spin-up?
For example, if I have a webapp, I want each of the worktrees to spin up its own infrastructure, and be accessible on its own unique local url, so that I can see the changes locally for each worktree, or I can have agents automate visual checks using something like agent-browser.
Currently I use docker for my infrastructure, each service running in its own container. I have a script that has a ./app worktree create worktreename. That creates a worktree as "worktreename" and spins up all of my docker infrastructure with prefixes for things like "WORKTREENAME", and I can access all my urls at worktreename.myapp.test (or just myapp.test for the main worktree).
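The naming half of such a script could look roughly like this. It is a sketch under assumptions: the app name myapp and the .test domain come from the comment, while worktree_env, APP_HOST and SERVICE_PREFIX are hypothetical. COMPOSE_PROJECT_NAME is the real Docker Compose variable that namespaces containers, networks and named volumes per project, so two worktrees can run the same compose file side by side. Resolving *.myapp.test to localhost still needs something like a local DNS resolver or reverse proxy, which is not shown.

```python
import re
from typing import Optional

APP_NAME = "myapp"          # from the comment's example
BASE_DOMAIN = "myapp.test"  # from the comment's example


def worktree_env(name: Optional[str]) -> dict[str, str]:
    """Derive per-worktree identifiers to write into that worktree's .env.

    None stands for the main worktree, which keeps the unprefixed names.
    """
    if name is None:
        return {"COMPOSE_PROJECT_NAME": APP_NAME, "APP_HOST": BASE_DOMAIN}
    # Compose project names must be lowercase letters, digits, '-' or '_'.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return {
        "COMPOSE_PROJECT_NAME": f"{APP_NAME}-{slug}",
        "APP_HOST": f"{slug}.{BASE_DOMAIN}",
        "SERVICE_PREFIX": slug.upper().replace("-", "_"),  # e.g. FEATURE_LOGIN
    }
```

Deriving everything from the worktree name keeps the mapping predictable: the same name always yields the same containers and URL, which is also what an agent doing visual checks needs to find the right instance.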
This is working fine for now, but it'd be cool if one of these apps was compatible with this concept so I could move over to that.
laurels-marts 1 hours ago [-]
i have shell scripts that create/tear down worktrees. the shell script finds unique unused ports across all existing worktrees and assigns in local .env upon worktree creation. when the worktree gets merged and is torn down the ports get released. secrets that are not worktree-specific i don't keep in local .env i inject via shell.
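The port bookkeeping described above can be sketched as follows. This is a Python rendering of the idea rather than the commenter's shell scripts; the directory layout (one .env per worktree directory), the *_PORT key convention, the 4000-4999 range and all function names are assumptions. Since ports are read back from the .env files, "releasing" a port needs no extra step: deleting the worktree deletes its .env.

```python
import socket
from pathlib import Path
from typing import Callable, Iterable


def ports_in_env_text(text: str) -> set[int]:
    """Collect values of KEY_PORT=1234 style lines from a .env file's text."""
    ports = set()
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().endswith("PORT") and value.strip().isdigit():
            ports.add(int(value.strip()))
    return ports


def taken_ports(worktrees_root: Path) -> set[int]:
    """Union of ports recorded in every <worktree>/.env under the root (assumed layout)."""
    taken: set[int] = set()
    for env_file in worktrees_root.glob("*/.env"):
        taken |= ports_in_env_text(env_file.read_text())
    return taken


def can_bind(port: int) -> bool:
    """True if nothing on this machine currently holds the port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(("127.0.0.1", port))
        except OSError:
            return False
    return True


def allocate_port(taken: Iterable[int], start: int = 4000, end: int = 4999,
                  is_free: Callable[[int], bool] = can_bind) -> int:
    """Lowest port in [start, end] not claimed by another worktree and not in use."""
    claimed = set(taken)
    for port in range(start, end + 1):
        if port not in claimed and is_free(port):
            return port
    raise RuntimeError(f"no free port in {start}-{end}")
```

Usage would be something like allocate_port(taken_ports(Path("worktrees"))), with the result appended to the new worktree's .env. The bind check only reflects the current moment, so another process could still grab the port before the dev server starts.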
honestly, creating these local scripts for automating the dev work was trivial, and they combine well with all other cli tooling. that's why i haven't tried any of the GUI apps yet. i'm not sure they're able to compete with my custom local setup that works exactly the way i want.
chrisvenum 8 hours ago [-]
I have this issue at work. I asked CC to create a very simple bun CLI tool that can be used to create, destroy and list worktrees.
The CLI seeds the .env file with a URL and DB for that worktree, and I use Vercel's open source package portless to spin up a dev server with unique ports, so I get a URL per worktree.
bhu8 8 hours ago [-]
Just use direnv? You’ll probably need to adjust the port you are hosting the local page on, but that’s just N=mod(hash based on the worktree name) and then port=default_port+N.
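The port = default_port + (hash of worktree name mod N) idea above, sketched in Python. worktree_port, the default port 3000 and span 1000 are hypothetical. A cryptographic digest is used because Python's built-in hash() on strings is randomized per process (PYTHONHASHSEED), which would hand the same worktree a different port on every shell.

```python
import hashlib


def worktree_port(worktree_name: str, default_port: int = 3000, span: int = 1000) -> int:
    """Deterministically map a worktree name to a port in [default_port, default_port + span)."""
    digest = hashlib.sha256(worktree_name.encode("utf-8")).digest()
    n = int.from_bytes(digest[:4], "big") % span
    return default_port + n
```

The trade-off versus scanning for free ports is that nothing rules out collisions: with 10 worktrees and 1000 slots, the chance that some pair shares a port is about 4%. That is usually acceptable for local dev, since a clash shows up immediately as "address in use".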
Tell your claude to set this up. Should do it in a single prompt
There's a few apps out there that facilitate handing off to agents from kanban boards. I needed something more 'human in the loop', handing off to an agent without good visibility of the change set and opportunity to steer doesn't work for me. https://www.agentkanban.io links a taskboard with github copilot chat in vs code via our extension so we have the benefit of task management and context capture from the chat to the tasks. This gives us all the features of a top harness (vs code) and the task / project management features at the same time.
rigonkulous 18 hours ago [-]
>There's a few apps out there that facilitate handing off to agents from kanban boards.
jira-cli and hermes, for example.
in fact, wiring hermes up to an existing Jira(/other_PM_system) is, well .. fruitful.
aavci 19 hours ago [-]
I don’t see a problem here. Do you?
Also, Linear themselves are also working on this.
satvikpendem 10 hours ago [-]
I never said there was a problem.
vitriapp 19 hours ago [-]
Is windsurf open source?
vitriapp 19 hours ago [-]
Hello folks, sharing my latest open source project, a kanban board with parallel agents. Trying to improve this with more features, I would love your contributions on this repo, with either code contributions or ideas
rigonkulous 18 hours ago [-]
Nice work .. I have had my own agents running kanban on existing Jira projects, categorized by workflow, and it is a pleasure to see your project on HN today. I will for sure enjoy catching up with your work, thanks for sharing it.
malfist 19 hours ago [-]
You should check out Stripe's dev blog about minions. Seems directionally similar.
aqme28 3 hours ago [-]
Ha, I built the exact same thing, except using ClickUp for all the task management.
Having agents on kanban is really a level-up in terms of what you can do and how you can organize.
I thought it was the same product. Have any of you used either of them?
KronisLV 18 hours ago [-]
Gave it a brief shot, felt a bit early on, went back to Claude. I feel like the Kanban board that would do it best would just allow easily bringing up Claude Code sessions with all user input etc.
This is dope! We basically built something very similar internally for our team and it's been a very natural and intuitive way to manage agents (as opposed to having a bunch of terminals to track). Not every task/conversation can be done in the background, so it's been helpful for us internally to be able to seamlessly transition between "interactive conversation" and "background job done by agents" even within a single card.
marat_yv 4 hours ago [-]
The UI doesn't look like a huge amount of taste and effort went into it. Maybe something nice-looking could distinguish it from the competitors. Cool thing, though.
jaequery 15 hours ago [-]
i was also building something similar. https://github.com/jaequery/planbooq. i think building your own kanban is the new age vim/emacs, so you can streamline your workflows the way you want them to be.
okandship 6 hours ago [-]
i like the board-as-orchestrator idea, but the hard part seems less "run more agents" and more making review/load visible enough that wip stays bounded
rirze 11 hours ago [-]
I see a bunch of these orchestrators, but is anything out there jj (Jujutsu) native? I would really like to avoid hacking onto an existing product.
waterproof 8 hours ago [-]
If Jujutsu had become popular a couple of years earlier, it might have had a chance to catch on. I worry that it missed the most critical training window for AI and we may be locked in on Git forever.
jorl17 17 hours ago [-]
Personally, this is somewhat close to what I want.
I want to have a fullblown cursor instance/window for each task I have, and a central Hub that manages spawning those instances, setting up the worktrees, etc.
Cursor seems to pretty much have all the available tools there already (it can already spawn agents to their own worktrees with proper setup scripts, for example). I don't get why they don't do it and instead insist on a buggy and confusing agents experience.
Unfortunately, most attempts at this seem to assume I want a model where "1 task = 1 agent = 1 chat", whereas what I really want is "1 task = 1 worktree = 1 full IDE around it".
With the full IDE I can have multiple agents/conversations, review code thoroughly and also chip in once in a while. I can have multiple models (that I pick) in multiple chats, iterate forwards, backwards, you name it.
I really don't understand why there seems to be this idea that "parallel agents" should live in their own little restricted flow that's limited to a tiny chat interface. I want the full flow for every agent!
I was hoping cursor would do this, but they really seem to be going the direction of turning their absolutely terrible web agents UI (where you can't even CHANGE THE MODEL!!!!) into a desktop thing. Sad, as I've been an Ultra paying customer and might have to leave soon with the direction they're heading.
2001zhaozhao 17 hours ago [-]
> I want to have a fullblown cursor instance/window for each task I have, and a central Hub that manages spawning those instances, setting up the worktrees, etc.
I am working on exactly this interface for my new tool called Kotkit. You start with kanban board management of workspaces. Each workspace (worktree on one/multiple repos) is a feature-rich IDE interface in a remote-capable in-browser desktop. You can spawn multiple agents with a good UI wrapper and full auditable logs, solve worktree rebase/merge with 1-click AI features, and there is also an embedded VSCode to solve edge cases. It also supports very deep plugin integration like IntelliJ.
Currently dogfooding it on my own projects and will be released sometime soon.
gavmor 8 hours ago [-]
That's sorta how Agent of Empires and also Zed seem to operate—worktrees as a first class aspect of the workflow.
The parallel agents concept is interesting. How does it handle state sync between agents when they're modifying the same board? Or is there built-in conflict resolution?
kraftman 16 hours ago [-]
aren't they sticking to their own cards?
2001zhaozhao 17 hours ago [-]
Looks very nice!
Just a heads up, the website is extremely choppy on WebKit (Orion Browser) for me when scrolling
edit: Not working with Claude Code on Amazon Bedrock; it needs a Claude subscription.
How does it handle dependent cards? And the spare the they should be blah blah
wyre 19 hours ago [-]
Just post the GitHub page if it’s open-source. It’s great you have a domain name, but if your website is going to look the same as every other SaaS product designed by Claude it’s really hard to look past that and look at the novelty or benefits of the product.
pbjerkeseth 17 hours ago [-]
I've built/am building something similar. I spent the first half of my tech career as a UI/UX designer before becoming a software engineer, and I'd _like_ to think it shows, but there is something about designing-in-code with agents that leads to homogeneous outputs if you don't spend as much time on visual design as on the technical parts.
Looks great. I can tell you put a lot of time and energy into making it look good.
I think a lot of the problems with the homogeneous outputs of front-end design wouldn't be such a problem if the models naturally made their designs much simpler, but they are LLMs, so they are always going to be overly verbose.
I was curious, so I asked my agent to redesign and recreate your front page for comparison, and it gave me this: https://ouijit-redesign.vercel.app
fphilipe 17 hours ago [-]
These pages do look good. But they all just look the same. And I'm getting bored of them.
I open such a page and I immediately know it was Claude that produced it (probably end-to-end). Not that there's anything wrong with that, but it lacks soul… and that makes me kind of sad.
ptengelmann 17 hours ago [-]
One of your pages (/comparison) returns a 404... but cool project! I guess we're still not at the point where agents can run without supervision. At least not for me.
sinandrei 18 hours ago [-]
Could this be something running off a GitHub kanban board?
npodbielski 6 hours ago [-]
> play nice with tools you already use: codex, Claude, cursor, github, sqlite, electron
I do not use any of those.
Also, why are all of those vibe-coded websites so slow on mobile?
Also, I do not understand why software developers are working so hard to make themselves obsolete. Why? Don't you guys enjoy eating and having a place to sleep?
alex_x 2 hours ago [-]
Seeing these slop landing pages is so tiring.
It makes me stop caring about a potentially good product in no time.
encoderer 15 hours ago [-]
Is the website done with Claude design? It gave me a very similar theme recently.
meagher 18 hours ago [-]
big opportunity to buy kan.bot
Guillaume86 18 hours ago [-]
Tangential question for Claude Code subscribers: in mid-June, `claude -p` will move to API pricing (with some "SDK credits" before it kicks in), so headless usage will become 20-30 times more expensive, and all these high-level orchestrator tools/workflows depend on it. What's the next move for you? How do the OpenAI subscriptions compare? Similar limitations?
The landing page looks cool and I would love to try it out, but having to make a cloud account just to work locally is not OK for me.
atoav 14 hours ago [-]
The problem with this is that agents need to have good taste (code, design and UX) hammered into them with a crowbar, and even then I find myself writing CSS by hand.
teaearlgraycold 16 hours ago [-]
The kanban board on the landing page looks terrible on mobile (iPhone 15 Pro).
cyclopeanutopia 19 hours ago [-]
I don't understand this.
genxy 19 hours ago [-]
It looks like a kanban interface to agent orchestration.
rigonkulous 18 hours ago [-]
Yes, this is like, the best thing ever .. I've generally been doing this, albeit with command-line Jira and a "my workflow is my prompt" philosophy, resulting in a fleet of little kanbans .. and my agents are really, really doing well. They never sleep, eat, etc.
But .. you know something cute? AI makes using Jira fun, again.
Personally it's probably the biggest struggle, trying to rein in the "spray and pray" approach LLMs typically like to take, and reducing the "patch on top of patch" syndrome too.
https://news.ycombinator.com/item?id=35015#35079
I don't know many serious software engineers who'd take that approach. The convention was always to actually open up the code, evaluate the quality, see if the authors seemed to know what they were doing, then choose the libraries you knew worked and could be adjusted to fit whatever you wanted. At least for professional development inside companies, not a single library would be included unless you had at least reviewed that the top-level dependency you pulled in actually had code worth pulling in in the first place.
And this approach works just as well today as it used to: you literally have to spend like 3-5 minutes browsing the code, evaluate the abstractions they've built, and then say "Yes, looks good enough to try to use" or "Clearly these people just hacked this together as fast as they could".
I'd also wager that a far higher percentage of code gets reviewed today, by prompting AI to do it, than before.
Most PRs pass as long as they (A) pass checks, (B) don't introduce regressions, and (C) fix a bug or implement a feature. People talk about the era of humans reviewing code with nostalgia... but that never existed at scale.
Let's be honest: for your average dev, the assumption was that the number of GitHub stars or npm/NuGet downloads was a good proxy for quality.
For me, a strong file structure helps as well. Reviewing a 3,000-line file it just created is abysmal. I wouldn't accept that from human or machine :) Multiple files in the right places help reduce cognitive load.
Sometimes I'll also review with the agent interactively. What is the most important file to review first, etc?
I like to stage changes into a "LGTM" pile. Then if I want changes, I'll have the agent "review unstaged changes - I want something different done here."
Product managers never cared about the code. Engineering managers don't care about code as much as they did when they were engineers. Directors couldn't care less about code. CTOs don't know what code looks like anymore. We are at the end of the chain, and somehow we always took pride in well-written and maintainable code because we knew deep inside that good systems are built on good code. But now we are jeopardizing ourselves: it's us engineers who don't care about code anymore, and AI amplifies that problem.
Not saying code quality isn't important - it is. But I think what is described as quality code will change.
Personally, I always end up tweaking something the agent produced. I wonder if I should let go of that control...
It definitely has fewer bugs than a senior developer, but it really hinges on getting the plan right. 20 minutes of planning and 20 of implementation sounds about right for my workflow as well, just make sure you have GPT as a reviewer. It's very nitpicky and finds lots of bugs.
First, this is challenging to scale across large orgs. Even if your plans produce high-quality code, that isn’t true for everyone. I’m definitely struggling with slop code being collectively mailed to me for review by our 1,000 engineers, who were told to use their AI subscriptions all at once.
I feel like we should be taking “prompt engineering” more seriously. When people mail me code to review, it should also include the agentic workflow and plan, so that when code isn’t up to quality, we can have a discussion about the prompts used to generate it.
My second thought is related to your senior engineer comment. This isn’t surprising, because in most engineering orgs, seniority is completely unrelated to code quality. In fact, many orgs incentivize the opposite: “senior” devs who push out buggy code quickly and push accountability downhill to the junior devs.
About senior engineers, I guess that depends on the org you have experience with. My experience doesn't match yours.
I'm like a 1-2 chats at a time kind of guy. I just don't see how I could keep my exact vision for the project otherwise.
I gave it the existing modem, and had it build rigging to build test vectors. I had it specify the work in the modem. And to confirm that legacy<>legacy produced the same streams as the new code. I've also recorded test vectors vs. other modems.
I've since launched it on targeted refactoring and code reduction projects.
I am mostly not looking at the code. There's a 100KSLOC lump of code that is much cleaner than a decompilation but a fair bit dirtier than what I would write myself. It is not factored terribly. I have some hope of getting it to trim this down to 70KSLOC that then I can accept in small blocks.
It outperforms the original softmodem, hitting higher RX rates for the same line quality and using less CPU. It also has additional functionality.
So, you know, I would never have written something this large for a hobby myself. And it's cost me $200 and 20-30 minutes per day for a few weeks to get a huge functional surface that I do believe I will be able to trust at the end of the process.
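The test-vector check described above (confirming that legacy<>legacy produced the same streams as the new code) is a form of differential testing: feed the same recorded inputs to both implementations and report where their outputs first diverge. A minimal Python sketch, assuming each implementation can be wrapped as a function from an input vector to a list of output samples; `legacy_modulate` and `new_modulate` are hypothetical stand-ins, not the commenter's modem code.

```python
from typing import Callable, Optional, Sequence

Stream = list[int]


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing sample, or None if the streams match."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Identical common prefix: a length mismatch still counts as a divergence.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def differential_test(
    vectors: Sequence[Sequence[int]],
    legacy: Callable[[Sequence[int]], Stream],
    new: Callable[[Sequence[int]], Stream],
) -> list[tuple[int, int]]:
    """Return (vector_index, sample_index) for every vector whose outputs differ."""
    failures = []
    for vi, vec in enumerate(vectors):
        idx = first_divergence(legacy(vec), new(vec))
        if idx is not None:
            failures.append((vi, idx))
    return failures


# Hypothetical stand-ins: both map bits 0/1 to symbols -1/+1.
def legacy_modulate(bits: Sequence[int]) -> Stream:
    return [1 if b else -1 for b in bits]


def new_modulate(bits: Sequence[int]) -> Stream:
    return [2 * b - 1 for b in bits]


vectors = [[0, 1, 1, 0], [1, 1, 1], []]
print(differential_test(vectors, legacy_modulate, new_modulate))  # []
```

Reporting the first divergent sample, rather than just pass/fail, makes it much quicker to see which step of a long pipeline an agent's rewrite broke.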
Now when using it for my job... that's a totally different story: I review all the changes, so a single chat session with an agent can lead to a whole day of review. And it's great, sometimes the agent uses patterns and functions I don't know, so I learn a lot.
Maximize providers' profits. What could go wrong?
My most successful autonomous runs have been expanding scrapers across a number of similar but different portals. I had examples and targets and it just kept searching for new ones and adding them.
But even doing basic ML auto-research, I have found it to be surprisingly poor except at trivial but useful augmentation of models. Yes, it can implement things, but somehow I am still needed a lot, even though I set up a lot of framework around it.
My mental model is that it's very good at complex deterministic work like reading bad API docs and getting some connectors to work.
But perhaps I care less about being stuck in a local optimum there.
They do not have any users. Meanwhile, I have to do code reviews and everything, otherwise my 12,000+ users will be pissed off if anything in their workflow breaks.
This means I really cannot release more than 1 tiny feature a day. And parallel agents are good for testing, but I don't think I need to add that many features to add anything.
Personally, I am somehow working on stuff that is about 25% non-trivial, and that is enough to have the same experience as you.
But also lots of people just don't care about quality and they might be right with their customers/audience. In these cases when someone catches one, an agent is going to iterate on it and make it (seemingly) go away, bandage applied, who cares again. This has a market, I am sure. Lots of programmer folks are just as bad.
This tool on the other hand is all about "jam as much work as you can come up with into being created in parallel". Obviously there is no managing of any flow of quality outputs, and no limiting of any work because you just shove everything into the agent and burn tokens like crazy.
Calling this a "Kanban" really irks me... it's like blasphemy or something.
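The objection rests on a core Kanban practice: work-in-progress (WIP) limits, where a column refuses new cards once it is full, so work is pulled through rather than pushed in. A minimal Python sketch of a board that enforces such a limit; the `Board` class, the card names, and the limit of 2 are illustrative assumptions, not taken from any of the tools discussed here.

```python
from collections import deque


class WipLimitExceeded(Exception):
    pass


class Board:
    """Toy Kanban board: a backlog, an 'in progress' column with a WIP limit, and done."""

    def __init__(self, wip_limit: int) -> None:
        self.wip_limit = wip_limit
        self.backlog: deque[str] = deque()
        self.in_progress: list[str] = []
        self.done: list[str] = []

    def add(self, card: str) -> None:
        self.backlog.append(card)

    def pull(self) -> str:
        """Move the oldest backlog card into progress, refusing if the column is full."""
        if len(self.in_progress) >= self.wip_limit:
            raise WipLimitExceeded(f"WIP limit {self.wip_limit} reached")
        card = self.backlog.popleft()
        self.in_progress.append(card)
        return card

    def finish(self, card: str) -> None:
        self.in_progress.remove(card)
        self.done.append(card)


board = Board(wip_limit=2)
for c in ["auth", "search", "billing"]:
    board.add(c)
board.pull()
board.pull()
try:
    board.pull()  # blocked: "billing" stays in the backlog until something finishes
except WipLimitExceeded as e:
    print(e)  # WIP limit 2 reached
board.finish("auth")
print(board.pull())  # billing
```

The limit is what caps the flow of work; a board that lets every card start at once is closer to a task list than to Kanban in this sense.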
The Vibe Kanban developers unfortunately decided that they didn't see a path to profitability and have stopped investing in the project. It's open source and so you can run it locally / fork it, but it has stopped improving and there are still annoying bugs that need to be fixed (and I don't have time to maintain it personally). This makes me sad because I would be willing to pay for Vibe Kanban, but I didn't need the features their paid plan offered (in retrospect maybe I should have paid anyway).
I'll give Kanbots a go :) I'd recommend liberally copying features from Vibe Kanban. In particular the remote support and "Open in VS Code" button (which in my case opens a local VSCode client pointing to a remote VSCode server) are critical for me.
I've been working for the last week or two on getting my new tool up to parity with VK with additional improvements. I've been posting some screenshots into the Vibe Kanban discord as well. Hopefully it'll be a great fit for your use case when I finally am ready to launch it.
(My tool aims for better features than VK in both the Kanban board and agent workspaces, while adding extra systems like desktop windowing, plugins, in-browser VSCode integration, and htmx-like server-rendered UI. The remote access also works differently - you host the whole thing like OpenClaw and access the remote desktop UI from the browser, rather than run a webserver on your laptop to access remote coding agents.)
This is table-stakes for me to consider adoption of a tool like this.
If AI is agentic I would expect it takes an hour of chatting for any PM to integrate some agent Ralph loop with Jira. Jira or Trello or Linear or Basecamp all have APIs and I guess CLIs any agent can use to talk to them. No developer or SaaS should be needed to make them understand tasks are checked out when you start work and contain instructions and when you are done you move the ticket to DONE.
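The ticket lifecycle described here (check a ticket out when work starts, follow its instructions, move it to DONE when finished) is a small loop. A minimal Python sketch against an in-memory stand-in for the tracker; `Ticket`, `work_loop`, the agent callable, and the `NEEDS_REVIEW` status are hypothetical, and a real integration would go through the tracker's own API or CLI instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ticket:
    key: str
    instructions: str
    status: str = "TODO"


def work_loop(tickets: list[Ticket], run_agent: Callable[[str], bool]) -> list[str]:
    """Process TODO tickets one at a time; return the keys of tickets moved to DONE."""
    completed = []
    for ticket in tickets:
        if ticket.status != "TODO":
            continue
        ticket.status = "IN_PROGRESS"  # "check out" so no other worker picks it up
        ok = run_agent(ticket.instructions)  # hypothetical agent invocation
        # Move to DONE on success; otherwise park it for a human to look at.
        ticket.status = "DONE" if ok else "NEEDS_REVIEW"
        if ok:
            completed.append(ticket.key)
    return completed


tickets = [
    Ticket("PROJ-1", "Add a /health endpoint"),
    Ticket("PROJ-2", "Fix typo in README", status="DONE"),
    Ticket("PROJ-3", "Migrate the database"),
]
# Toy agent: "succeeds" on anything that does not mention the database.
print(work_loop(tickets, lambda text: "database" not in text))  # ['PROJ-1']
```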
For example, if I have a webapp, I want each of the worktrees to spin up its own infrastructure, and be accessible on its own unique local url, so that I can see the changes locally for each worktree, or I can have agents automate visual checks using something like agent-browser.
Currently I use Docker for my infrastructure, with each service running in its own container. I have a script with a ./app worktree create worktreename command. That creates a worktree named "worktreename" and spins up all of my Docker infrastructure with prefixes like "WORKTREENAME", and I can access all my URLs at worktreename.myapp.test (or just myapp.test for the main worktree).
This is working fine for now, but it'd be cool if one of these apps was compatible with this concept so I could move over to that.
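The per-worktree naming scheme described above can be sketched as a function that plans the commands for one worktree without running them. This assumes Docker Compose's `-p` project-name flag supplies the per-worktree prefix and that `main` maps to the bare domain; the commenter's actual `./app` script is not shown, so `plan_worktree`, the `myapp-` project prefix, and the returned fields are hypothetical.

```python
import re


def plan_worktree(name: str, domain: str = "myapp.test") -> dict:
    """Plan (but do not run) the commands and hostname for one isolated worktree."""
    # Keep only characters valid in both a DNS label and a Compose project name.
    slug = re.sub(r"[^a-z0-9-]", "-", name.lower()).strip("-")
    if not slug:
        raise ValueError(f"unusable worktree name: {name!r}")
    is_main = slug == "main"
    return {
        # The main checkout already exists; other names get a new branch + sibling directory.
        "git": None if is_main else ["git", "worktree", "add", "-b", slug, f"../{slug}"],
        # A distinct project name keeps each worktree's containers, networks and volumes apart.
        "compose": ["docker", "compose", "-p", f"myapp-{slug}", "up", "-d"],
        # Uppercase prefix for per-worktree environment variables (e.g. FEATURE_LOGIN_DB_URL).
        "env_prefix": slug.upper().replace("-", "_"),
        "host": domain if is_main else f"{slug}.{domain}",
    }


plan = plan_worktree("Feature-Login")
print(plan["host"])        # feature-login.myapp.test
print(plan["env_prefix"])  # FEATURE_LOGIN
```

Routing `*.myapp.test` hostnames to the right containers still needs a local DNS entry and a reverse proxy, which this sketch leaves out.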
Honestly, creating these local scripts for automating the dev work was trivial, and they combine well with all other CLI tooling. That's why I haven't tried any of the GUI apps yet. I'm not sure they're able to compete with my custom local setup that works exactly the way I want.
Tell your Claude to set this up. It should be doable in a single prompt.
[0] https://windsurf.com/blog/windsurf-2-0
jira-cli and hermes, for example.
in fact, wiring hermes up to an existing Jira(/other_PM_system) is, well .. fruitful.
Also, Linear themselves are also working on this.
Having agents on kanban is really a level-up in terms of what you can do and how you can organize.
What a fitting first error to run into for vibe coded software.
I want to have a fullblown cursor instance/window for each task I have, and a central Hub that manages spawning those instances, setting up the worktrees, etc.
Cursor seems to pretty much have all the available tools there already (it can already spawn agents to their own worktrees with proper setup scripts, for example). I don't get why they don't do it and instead insist on a buggy and confusing agents experience.
https://www.stridelikeaboss.com/
I'm a bit anxious about putting myself out there, but I'd be curious if my efforts cross that bar for you or not? https://ouijit.com/ (and the repo is at https://github.com/ouijit/ouijit)