Your job is to deliver code you have proven to work

(simonwillison.net)

639 points | by simonw 11 hours ago

88 comments

  • endorphine 10 hours ago
    > there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

    It's even worse than that: non-junior devs are doing it as well.

    • snarf21 5 hours ago
      Yeah, it is way worse than that. In the past two days, I have had two separate non-engineer team members ask some AI agent how a mobile bug should be fixed and then paste the AI response into the ticket as the main content, context, and acceptance criteria. I then had to waste my time reading this crap (because it is really all that is in the ticket) before starting my own effort to understand what the real ask or required change in behavior is.
      • abustamam 3 hours ago
        New career path unlocked — reverse prompt engineering — trying to determine what someone prompted the AI given the slop they put into a ticket
        • dejj 2 hours ago
          Plot twist: this universe (planet) was created in order to reverse engineer what the prompt of the previous one was.
          • tremon 1 hour ago
            It's not that much of a twist, given that it was basically the plot of THGTTG.
        • rTX5CMRXIfFG 48 minutes ago
          I think you’re kidding but some LinkedIn influencer will come across this and preach that it’s serious
      • teaearlgraycold 3 hours ago
        Your manager should be on their ass for wasting your time.
        • snarf21 3 hours ago
          Worse yet, these were done by the managers of the Marketing team and the Mapping team. Plus, these are high profile issues that (somehow) required getting the CEO involved too! (Obviously there is a lot of dysfunction in our organization, lol.)
          • abustamam 3 hours ago
            I was going to joke to GP "joke's on you, management is in on it" but apparently it was no joke
    • xnx 6 hours ago
      Unfortunately, junior behavior exists in many with "senior" titles. Especially since "senior" is often given to those 2 years out of school.
      • theshrike79 6 hours ago
        A coworker shared this saying decades ago.

        There's a difference between 10 years of experience and 1 year of experience 10 times.

        YOE isn't always a measure of quality; you can work the same dead-end coding job for 10 years and never get more than "1 year" of actual experience.

        • stephen_cagle 5 hours ago
          You know, this is kind of a funny take at some level. Like, for any surgery, you want the doctor who has done the same operation 10 times, not the one who has 10 years of "many hat doctoring" experience.

          I'm not really arguing anything here, but it is interesting that we value breadth over (hopefully) depth/mastery of a specific thing with regard to what we view as "Senior" in software.

          • lokar 5 hours ago
            You want the Dr who has done the operation 10 times, and learned something each time, and incorporated that into their future efforts. You probably don’t want a Dr who will do their 11th surgery on you exactly the way they did the first.

            This is what that saying is about

            • stephen_cagle 5 hours ago
              Fair enough. I guess I am making a bit of a straw-man in that I feel I just don't buy the idea that doing the same thing 10 times over the course of 10 years is somehow worse than doing different things over the course of 10 years. They are signals, and depending on what we are attempting, they just mean different expected outcomes. One isn't necessarily worse than another, but in this case it seems to be implying it is the distinction between Midlevel and Senior.
          • 3acctforcom 3 hours ago
            Ops vs Dev

            Situational Leadership gets into this. You want a really efficient McDonalds worker who follows the established procedure to make a Big Mac. You also want a really creative designer to build your Big Mac marketing campaign. Your job as a manager is figuring out which you need, and fitting the right person into the right job.

            • abustamam 3 hours ago
              Agreed. Meanwhile, many job postings out there are looking for 10x full-stack developers who have deep experience in database, server, front end, devops, etc.

              I think the concept of Full-stack dev is fine, but expecting them to know each part of the stack deeply isn't feasible imo.

          • theshrike79 2 hours ago
            If we extrapolate the Dr example:

            There is the one doctor who learned one way to do the operation at school, with specific instruments, sutures etc. and uses that for 1000 surgeries.

            And then there's the curious one who actively goes to conferences, reads publications and learns new, better ways to do the same operation, with invisible sutures that don't leave a scar or tools that allow for more efficient operations, cutting down the time required for the patient to be under anaesthesia.

            Which one would you hire for your hospital for the next 25 years?

          • WalterBright 2 hours ago
            I once asked an obstetrician how she could tell the sex of a fetus with those ultrasound blobs. She laughed and said she'd seen 50,000 of those scans.
          • gopher_space 4 hours ago
            “This person is incurious” would be more apt but also more likely to apply to everyone else in the room too.

            Didn’t Bruce Lee famously say he fears the man who’s authored one API in ten thousand different contexts?

        • NumberCruncher 2 hours ago
          My favourite saying is: "dumb people get old too".
        • ChrisMarshallNY 3 hours ago
          I'm constantly working on stuff I don't know (the Xcode window behind this browser window is full of that kind of code). I have found LLMs are a great help in pushing the boundaries.

          It's humbling, but I do tend to pick up a lot of stuff.

          https://littlegreenviper.com/miscellany/thats-not-what-ships...

          • theshrike79 2 hours ago
            There are definitely a few ulcer-inducing events in my past that would've taken me an afternoon to fix with a current SOTA LLM vs 2+ weeks of swearing, crying and stressing out.
            • ChrisMarshallNY 2 hours ago
              When I come upon an issue, I pretty much immediately copy/paste the code into an LLM, with a description of the context, symptoms, and desired outcome.

              It will usually home right in on the bug, or will give me a good starting point.

              It's also really good at letting me know if this behavior is a "commonly encountered" one, with a summary of ways it's addressed.

              I've probably done that at least a dozen times, today. I guess I'm a rotten programmer.

              • theshrike79 2 hours ago
                I've completed actual features by saying "look up issue ABBA-1234 and create a plan to implement it" to Claude.

                Then I wait, look through the plan and tell it to implement and go do something else.

                After a while I check the diffs and go "huh, yea, that's how I would've done it too", commit and push.

        • code_for_monkey 4 hours ago
          I feel like I've been stuck in that cycle, and I know it's partially just me being in my head about my career, but I really have been basically doing CRUD apps for a decade. I've made a lot of front end forms, I've kept up on the latest frameworks and trends, but at the core it really hasn't been dramatically different.
          • theshrike79 2 hours ago
            If you really distill it, I've been doing API Glue for about a quarter century.

            I connect to a 3rd party API with shitty specs and inconsistent output that doesn't even follow their own spec, swear a bit, and adjust my estimates[0]. Do some business stuff with it and shove it to another API.

            But I've done that now in ... six maybe seven different languages and a few different frameworks on top of that. And because both sides of the API tend to be a bit shit, there's a lot of experience in defensive coding and verification - as well as writing really polite but pointed Corporate Emails that boil down to "it's your shit that's broken, not ours, you fix it".

            At this point I really don't care what language I have to use, as long as it isn't Java (which I've heard has come far in the last decade, but old traumas and all that =).

            [0] best one yet is the Swedish "standard" for electricity consumption reports, pretty much every field is optional because they couldn't decide and wanted to please every company in on the project. Now write a parser for that please.
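
            For flavor, a minimal defensive-parsing sketch of that kind of feed in Python (field names like meterId/kwh are made up for illustration, not the actual spec):

              # Sketch: defensively parse a record where every field is optional.
              # Field names are hypothetical, not the real Swedish spec.
              from dataclasses import dataclass
              from typing import Optional

              @dataclass
              class ConsumptionReport:
                  meter_id: Optional[str]
                  kwh: Optional[float]
                  period: Optional[str]

              def opt_float(value) -> Optional[float]:
                  # Feeds send numbers as strings, empty strings, or nothing at all.
                  try:
                      return float(value) if value not in (None, "") else None
                  except (TypeError, ValueError):
                      return None

              def parse_report(raw: dict) -> ConsumptionReport:
                  report = ConsumptionReport(
                      # Defend against both spellings, since nobody could decide.
                      meter_id=raw.get("meterId") or raw.get("meter_id"),
                      kwh=opt_float(raw.get("kwh")),
                      period=raw.get("period"),
                  )
                  # With everything optional, you must decide when a record is unusable.
                  if report.meter_id is None and report.kwh is None:
                      raise ValueError(f"record has nothing usable: {raw!r}")
                  return report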

        • joquarky 5 hours ago
          Experience is knowledge of what not to do.
          • theshrike79 2 hours ago
            It's the old saying: "$10 for the part, $990 for knowing where to put it"

            You get a feel for what works and what doesn't, provided you know the relevant facts. Doing a 10RPS system is completely different than 300RPS. And if the payload is 1kB the problems aren't the same as with the one with a 10MB payload.

            And if (when) you're using a cloud environment, which one is cheaper, large data or RPS? It's not always intuitive. We just had our AWS reps do a Tim "The Toolman" Taylor "HUUH?!" when we explained that the way our software works is 95% cheaper to run using S3 as the storage rather than DynamoDB :D
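
            The back-of-the-envelope arithmetic behind that, as a sketch (prices and workload are illustrative assumptions, not numbers from our bill; check current list prices):

              # All figures below are assumed for illustration; storage and read
              # costs are omitted for brevity.
              PAYLOAD_KB = 100           # hypothetical item size (DynamoDB caps items at 400 KB)
              WRITES_PER_MONTH = 10_000_000

              # DynamoDB on-demand: one write request unit per 1 KB written.
              DDB_USD_PER_MILLION_WRU = 1.25
              ddb = WRITES_PER_MONTH * PAYLOAD_KB / 1_000_000 * DDB_USD_PER_MILLION_WRU

              # S3: a flat price per PUT request, regardless of object size.
              S3_USD_PER_1000_PUTS = 0.005
              s3 = WRITES_PER_MONTH / 1000 * S3_USD_PER_1000_PUTS

              print(f"DynamoDB writes: ${ddb:,.0f}/month")  # ~$1,250
              print(f"S3 PUTs:         ${s3:,.0f}/month")   # ~$50, i.e. ~96% cheaper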

        • teaearlgraycold 3 hours ago
          When interviewing candidates I'm always shocked and a little depressed talking to someone with a pumped up resume and 15 years in the field when I realize they can't do much at all.
        • BadCookie 5 hours ago
          Maybe, but the typical person I have worked with in this industry is too smart to do something for 10 years and not learn much during that time.

          I am afraid that this “1 year of experience 10 times” mantra gets trotted out to justify ageism more often than not.

          • theshrike79 2 hours ago
            Depends a lot on the type of software you're doing. Startups will have hungry people willing to learn; more traditional companies won't, at least not in the same proportions.

            Not all people are curious, they go to school, learn to code and work their job like a normal 9-5 blue collar worker. They go to company trainings, but they don't read Hacker News, don't follow the latest language fads or do personal software projects during nights and weekends. It's just a day job for them that pays for their non-programming hobbies.

            I've had colleagues who managed the same ASP+Access DB system for almost a decade, with zero curiosity or interest to learn anything that wasn't absolutely necessary.

            We had to drag them into the ASP.NET age; one just wouldn't, and stayed back managing the legacy version until all clients had moved to the new stack.

            ...and I just checked LinkedIn, the non-curious ones are still in the same company, managing the same piece of SaaS as a Software Developer. 20-26 years in the same company, straight from school.

      • 3acctforcom 3 hours ago
        Titles in and of themselves are meaningless. I've seen a kid hired straight from uni into a "senior" position lol
      • abustamam 3 hours ago
        I've job hopped a bit. I've gone from junior to senior to lead to mid-level to staff to senior. I have ten years experience.

        My career trajectory is wild. At this rate I'll be CTO soon, then back to mid-level.

      • SoftTalker 6 hours ago
        Title inflation?
        • tensor 6 hours ago
          IMO tech suffers pretty horrible title inflation. If you reach "senior" after only two years and "principal" after 5, what is left for the next 20 years? It's pretty ridiculous. But this sort of thing is really typical. The average tenure of someone in tech is probably about 2 years, and each year the expectation is to see "big" career progression. Very often "When is my title going to change?" is asked literally in the first-year performance review.
          • CodeMage 5 hours ago
            What makes this whole thing worse is the concept of "non-terminal" levels, i.e. levels that you're not allowed to stay at indefinitely, which means that you must either get promoted or fired.

            I can understand not wanting to let people stay in a junior position forever, but I've seen this taken to a ridiculous extreme, where the ladder starts at a junior level, then goes through intermediate and senior to settle on staff engineer as the first "terminal" position.

            Someone should explain to the people who dream up these policies that the Peter Principle is not something we should aim for.

            It's even worse when you combine this with age. I'm nearing 47 years old now and have 26 years of professional experience, and I'm not just tired, but exhausted by the relentless push to make me go higher up on the ladder. Let me settle down where I'm at my most competent and let me build shit instead of going to interminable meetings to figure out what we want to build and who should be responsible for it. I'm old enough to remember the time when managers were at least expected to be more useful in that regard.

            • lokar 5 hours ago
              Yeah, the terminal level, whatever the title (they are just words), needs to be the point at which you can handle moderately complex (multi-week) tasks with no supervision.

              And honestly, this will depend on the environment and kind of work being done.

              • SoftTalker 5 hours ago
                If that's what you're looking for you can find it in academia. Universities have no problem paying people to stay around forever without promotion.

                Of course the pay won't be great, but the benefits are decent, PTO is usually excellent, and the work environment usually very low stress.

                • CodeMage 4 hours ago
                  FWIW, I'm starting to seriously consider this as a strategy that will allow me to get to retirement without completely messing up my health due to stress and burnout.

                  That said, there's something deeply wrong with our industry if that's the way we expect things to work. I never felt that teaching was my calling, but I might end up being forced into it anyway and taking up a job that someone with proper passion and vocation could fill. Why? Because my own industry doesn't understand that unlimited growth is not sustainable.

                  For that matter, "growth" is not the right word, either. We're all being told that scaling the ladder is the same thing as growing and developing, but it's not.

                  • lokar 3 hours ago
                    But the point of the rule is that unlimited growth is not expected. There is a fairly clear point you need to get to, and then you can stay put if you like.
                    • CodeMage 2 hours ago
                      Yes, and I agree with that. But my reply was to a comment that seemed to dispute that idea and imply that if you wanted to stop growing at some point, then you should shift to academia.

                      That said, there is an expectation of unlimited growth and it comes from a different source: ageism. At my age, the implicit expectation is that I will apply for a staff or even principal role. Applying for a "merely" senior role often rings alarm bells.

                      That trend, and certain others, is what's making me consider taking up teaching instead.

                • adobesubmarine 46 minutes ago
                  In my experience, people who say this kind of thing about either industry or academia have usually worked in one, but not both.
                • lokar 4 hours ago
                  Are we talking about the same thing?

                  The point of the terminal level rule is that there is a point, below which you are not actually contributing all that much more in output than it takes to supervise and mentor you. At some point you need to be clearly net positive. This generally means you can mostly operate on your own.

                  If it becomes clear you won't make it to that level, then something is wrong. Either you are not capable, or not willing to make the effort, or something else. Regardless, you get forced out.

          • theshrike79 5 hours ago
            I've had calls with Principal Architects who couldn't code themselves out of a wet paper bag.

            And according to the company experience chart, they should've been a "thought leader" and "able to instruct senior engineers"

            My title? Backend Programmer (20 years of experience). Our unit didn't care about titles because there was a "budget" for title upgrades per business unit and guess which team grabbed all of them =)

            • geodel 4 hours ago
              It's an epidemic all over IT departments and the s/w industry in general. Nowadays people whose sum total knowledge is managing some packaged Oracle/SAP software installation are holding titles like CTO/SVP/EVP of software organizations with thousands of developers.

              Since they bring a certain cluelessness and wear ignorance as a badge of honor across whole orgs, actual technical expertise among engineers can be a detriment to one's job and career.

            • tguvot 2 hours ago
              I am a principal architect. The last time I wrote code for production was more than 10 years ago. I have never touched half of the languages that are used in our system.

              In the last week I resolved a few legal/regulatory problems that could have cost the company tens of millions of dollars in fines/direct spend/resources, and prevented a few backend teams from rolling out functionality that could have had a negative impact on stability/security/performance. I did offer them alternative ways to implement whatever they needed, and they accepted it.

          • xnx 5 hours ago
            > IMO tech suffers pretty horrible title inflation

            It began with "software engineer"

            • jmpeax 5 hours ago
              Don't get me started on "software architect".
              • tremon 1 hour ago
                On classic big waterfall projects, you can find actual architects. Those are the ones drafting interfaces and delineating components/teams before the first source file is even committed.
              • 9rx 5 hours ago
                Even "code monkey" is generous.
          • jghn 5 hours ago
            The important thing here is for people to understand that at best titles only indicate relative rank within a company. And even then that's tenuous. Titles are effectively meaningless when comparing outside of a company.
            • lokar 3 hours ago
              You get (finite) periods where several large / influential companies have a reasonably high level of rigor for their own levels, and there is a pretty stable mapping between the companies.

              One such period seems to have ended sometime around the start of Covid, or a bit before.

          • 9rx 5 hours ago
            > If you reach "senior" after only two years and "principal" after 5, what is left for the next 20 years?

            There is nothing left. Not everyone puts in the same dedication towards the craft, of course. It very well might take someone 30 years to reach "principal" (and maybe even never). But 5 years to have "seen it all" is more than reasonable for someone who has a keen interest in what they are doing. It is not like a job dependent on the season, where you only get one each year. In computing, you can see many different scenarios play out in milliseconds. It doesn't take years to go from no experience to having "seen it all".

            That is why many in this industry seek management roles as a next step. It opens a new place to find scenarios one has never seen before; to start the process all over again.

            • Yoric 5 hours ago
              Er...

              I've been programming since I was 7 and I'm old enough to remember the previous AI summer. Somewhere along the way, I've had impact on a few technologies you have heard of, I've coded at almost all levels from (some very specialized) hardware to Prolog, Idris and Coq/Rocq, with large doses of mainstream languages in-between, and I don't think I'll ever be close to having seen in all.

              If anyone tells me that they've seen it all in 5 years, I'm going to suspect them of not paying attention.

              • andrewaylett 4 hours ago
                Similarly. I have over 20 years of professional experience. I've worked on embedded systems, and with mainframes. I've done (amongst other things) kernel development, compiler (& RTL) development, line-of-business, mobile, server, and web. Code I've written has a MAU on the order of 1% of humanity. Ask me about being a "full stack" developer :).

                I've seen a lot. But the more I see, the more I find to see.

              • 9rx 4 hours ago
                The scare quotes are significant. Obviously nobody can ever see it all as taken in its most literal sense. But one can start to see enough that they can recognize the patterns.

                If your job is dependent on the weather, one year might be rainy, one year might be drought, one year might be a flood, etc. You need to see them to understand them. But eventually you don't need to see the year where it is exceptionally rainy, but not to the point of flood, to be able to make good decisions around it. You can take what you learned in the earlier not-quite-so-rainy year and what you learned during the flood year and extrapolate from those what the exceptionally rainy year entails. That is what levels someone up.

                Much the same is true in software. For example, once you write a (well-written) automated test in Javascript and perhaps create something in Typescript, you also have a good enough understanding of what Rocq is trying to do to determine when it would be appropriate for you to use. It would no doubt take much, much longer to understand all of its minutiae, but it is not knowledge of intimate details that "senior", "principal", etc. is looking for. It is about being able to draw on past experience to make well-reasoned choices going forward.

                • Yoric 3 hours ago
                  In my experience, not really, no.

                  You need a very different mindset to write in JS (or TS), in Rust, in Rocq, in Esterel or on a Quantum Computer. You need a very different mindset when coding tools that will be deployed on embedded devices, on user's desktops, in the Linux kernel, on a web backend or in a compiler. You need a very different mindset when dealing with open-source enthusiasts, untrusted users, defense contractors.

                  You might be able to have "seen it all" in a tiny corner of tech, but if you stop there, I read it as meaning that you don't have enough curiosity to leave your comfort zone.

                  It's fine, you don't really have to if you don't want to.

                  • 9rx 2 hours ago
                    > You need a very different mindset to write in JS (or TS), in Rust, in Rocq, in Esterel or on a Quantum Computer.

                    "Senior", "principal", etc. are not about your ability to write. They speak to one's capacity to make decisions. A "junior" has absolutely no clue when to use JS, Rust, or Rocq, or if code should be written at all. But someone who has written (well-written) tests in JS, and maybe written some types in Typescript, now has some concept of verification and can start to recognize some of the tradeoffs in the different approaches. With that past experience in hand, they can begin to consider if the new project in front of them needs Rocq, Dafny, or if Javascript will do. Couple that with other types of experiences to draw from and you can move beyond being considered a "junior".

                    > You might be able to have "seen it all" in a tiny corner of tech

                    Of course there being a corner of some sort is a given. We already talked about management being a different corner, for example. Having absolutely no experience designing a PCB is not going to keep you a "junior" at a place developing CRUD web apps. Obviously nobody is talking about "seeing it all" as being about everything in the entire universe. There aren't that many different patterns, really, though. As the terms are used, you absolutely can "see it all", and when you don't have to wait around for the season to return next year, you can "see it all" quite quickly.

        • neighbour 2 hours ago
          I'm sympathetic to the title inflation issue, but more so on the problem of the "engineer" title, not to mention the "scientist" title.

          For example, I work in Data & AI and we have:

          - data engineer

          - analytics engineer

          - data scientist

          - AI engineer

          What I don't know is what's the alternative?

          Data Engineers are basically software developers.

          Analytics Engineers were Data Analysts or BI Analysts but the job has changed so much that neither of those titles fit.

          My opinion is that basically everyone should just be a "Developer" or "Programmer" and then have the area suffixed:

          - Data Engineer → Developer (Data Infrastructure)

          - Analytics Engineer → Developer (Analytics)

          etc.

    • tunesmith 1 hour ago
      As always, this requires nuance. Just yesterday and today, I did exactly that to my direct reports (I'm director-level). We had gotten a bug report, and the team had collectively looked into it and believed it was not our problem, but that of an external vendor. Reported it to the vendor, who looked into it, tested it, and then pushed back and said it was our problem. My team is still more LLM-averse than me, so I had Codex look at it, and it believed it found the problem and prepared the PR. I did not review or test the PR myself, but instead assigned it to the team to validate, partly for learnings. They looked it over and agreed it was a valid fix for a problem on our side. I believe that process was better than me just fully validating it myself, and part of the process toward encouraging them to use LLM as a tool for their work.
      • xyzzy_plugh 1 hour ago
        > I believe that process was better than me just fully validating it myself

        Why?

        > and part of the process toward encouraging them to use LLM as a tool for their work.

        Did you look at it from their perspective? You set the exact opposite example and serve as a perfect example for TFA: you did not deliver code you have proven to work. I imagine some would find this demoralizing.

        I've worked with a lot of director-level software folk and many would just do the work. If they're not going to do the work, then they should probably assign someone to do it.

        What if it didn't work? What if you just wasted a bunch of engineering time reviewing slop? I don't comprehend this mindset. If you're supposedly a leader, then lead.

    • duxup 5 hours ago
      It’s always the developers who can break / bypass the rules who are the most dangerous.

      I always think of the "superstars" or "10x" devs I have met at companies. Yeah I could put out a lot of features too if I could bypass all the rules and just puke out code / greenfield code that accounts for the initial one use case ... (and sometimes even leave the rest to other folks to clean up).

    • analog31 8 hours ago
      Where are the junior devs while their code is being reviewed? I'm not a software developer, but I'd be loath to review someone's work unless they have enough skin in the game to be present for the review.
      • tyrust 8 hours ago
        Code review is rarely done live. It's usually asynchronous, giving the reviewer plenty of time to read, digest, and give considered feedback on the changes.

        Perhaps a spicy patch would involve some kind of meeting. Or maybe in a mentor/mentee situation where you'd want high-bandwidth communication.

        • jopsen 8 hours ago
          Doing only IRL code reviews would certainly improve quality in some projects :)

          It's probably also fairly expensive to do.

          • jghn 7 hours ago
            Am old enough that this was status quo for part of my career, and have also been in some groups that did this as a rejection of modern code review techniques.

            There are pros & cons to both sides. As you point out it's quite expensive in terms of time to do the in person style. Getting several people together is a big hassle. I've found that the code reviews themselves, and what people get out of them, are wildly different though. In person code reviews have been much more holistic in my experience, sometimes bordering on bigger picture planning. And much better as a learning tool for other people involved. Whereas the diff style online code review tends to be more focused on the immediate concerns.

            There's not a right or wrong answer between those tradeoffs, but people need to realize they're not the same thing.

          • stephen_cagle 5 hours ago
            And yet... is it? Realtime means real discussion, and opportunity to align ever so slightly on a common standard (which we should write down!), and an opportunity to share tacit knowledge.

            It also increases the coverage area of code that each developer is at least somewhat familiar with.

            On a side note, I would love it if the default was for these code reviews to be recorded. That way, 2 years later when I am asked to modify some module that no one has touched in that span, I could at least watch the code review and glean something about how/why this was architected the way it was.

            • lokar 5 hours ago
              IMO, a lot of what I think you are getting at should be worked out in design before work starts.
          • colinb 7 hours ago
            Fagan inspection has entered the room
          • comprev 7 hours ago
            Pair programming? That is realtime code review by another human
        • throwaway314155 7 hours ago
          My first job did IRL code reviews with at least two senior devs in the loop. It was both devastating and extremely helpful.
          • SoftTalker 6 hours ago
            Yeah when we first started, "code review" was a weekly meeting of pretty much the entire dev team (maybe 10 people). Not all commits were reviewed, it was random and the developer would be notified a couple of days in advance that his code was chosen for review so that he could prepare to demo and defend it.
            • necovek 6 hours ago
              Wow, that's a very arbitrary practice: do you remember roughly when was that?

              I was on a team in 2006 where we did the regular 2-approvals-per-change-proposal code reviews (along with fully integrated CI/CD, some of it through signed email, though not full diffs like Linux patchsets, only "commands" for what branch to merge where).

              • marwamc 3 hours ago
                This was still the practice at $BIG_FINANCE in the couple of years just before covid, although by that point such team reviews were declining in importance and prominence.
              • SoftTalker 5 hours ago
                Around that time frame. We had CI and if you broke the build or tests failed it was your job to drop anything else you were doing and fix it. Nothing reached the review stage unless it could build and pass unit tests.
      • ok_dad 6 hours ago
        A senior dev should be mentoring and talking to a junior dev about a task well before it hits the review stage. You should discuss each task with them on a high level before assigning it, so they understand the task and its requirements first, then the review is more of a formality because you were involved at each step.
        • marwamc 3 hours ago
          Also communal RFCs, RFPs, Roadmapping, Architecture/Design Proposals, Design Docs and/or Reviews help socialize/diffuse org standards and expectations.

          I found these help ground the mentorship and discussions between junior-senior devs. And so even for the enterprising aka proactive junior devs who might start working on something in advance of plans/roadmaps, by the time they present that work for review, if the work followed org architectural and design patterns, the review and acceptance process flows smoothly.

          In my junior days I was taught: if the org doesn't have a design or architectural SOP for the thing you're doing, find a few respectable RFCs from the internet, pick the one you like best, and implement it. It's so much easier to stand on the shoulders of giants than to try and be the giant yourself.

      • stuaxo 8 hours ago
        Git PRs work on an async model for reviews.
        • DrewADesign 8 hours ago
          And even then, in my experience, they work more like support tickets than business email, for which there are loose norms for response time, etc. Unless there’s a specific reason it needs to be urgently handled, people will prioritize other tasks.
      • rootusrootus 8 hours ago
        As someone else mentioned, the process is async. But I achieve a similar effect by requiring my team to review their own PRs before they expect a senior developer to review them and approve for merging.

        That solves some of the problem with people thinking it's okay to fire off a huge AI slop PR and make it the reviewer's responsibility to see how much the LLM hallucinated. No, you have to look at it yourself first, because it's YOUR code no matter what tool you used to help write it.

        • muzzio 6 hours ago
          Reviewing your own PR is underrated. I do this with most of my meaningful PRs, where I usually give a summary of what/why I'm doing things in the description field, and then reread my code and call out anything I'm unsure of, or explain why something is weird, or alternatives I considered, or anything that I would catch reviewing someone else's PR.

          It makes it doubly annoying, though, whenever I go digging in `git blame` and find a commit with a terrible title, no description, and an "LGTM" approval.

        • unbalancedevh 7 hours ago
          > requiring my team to review their own PRs before they expect a senior developer to review them

          I'm having a hard time imagining the alternative. Do junior developers not take any pride in their work? I want to be sure my code works before I submit it for review. It's embarrassing to me if it fails basic requirements. And as a reviewer, what I want to see more than anything is how the developer assessed that their code works. I don't want to dig into the code unless I need to -- show me the validation and results, and convince me why I should approve it.

          I've seen plenty of examples of developers who don't know how to effectively validate their work, or document the validation. But that's different than no validation effort at all.

          • rootusrootus 7 hours ago
            > Do junior developers not take any pride in their work?

            Yes. I have lost count of the number of PRs that have come to me where the developer added random blank lines and deleted others from code that was not even in the file they were supposed to be working in.

            I'm with you -- I review my own PRs just to make sure I didn't inadvertently include something that would make me look sloppy. I smoke test it, I write comments explaining the rationale, etc. But one of my core personality traits (mostly causing me pain, but useful in this instance) is how much I loathe being wrong, especially for silly reasons. Some people are very comfortable with just throwing stuff at the wall to see if it'll stick.

            • alfons_foobar 6 hours ago
              > added random blank lines and deleted others from code that was not even in the file they were supposed to be working in.

              Maybe some kind of auto-formatter?

              • rootusrootus 5 hours ago
                That is my charitable interpretation, but it's always one or two changes across a module that has hundreds, maybe thousands of lines of code. I'd expect an auto-formatter to be more obvious.

                In any case, just looking over your own PR briefly before submitting it catches these quickly. The lack of attention to detail is the part I find more frustrating than the actual unnecessary format changes.

            • ok_dad 6 hours ago
              > Yes. I have lost count of the number of PRs that have come to me where the developer added random blank lines and deleted others from code that was not even in the file they were supposed to be working in.

              That's not a great example of a lack of care; if you use code formatters then this can happen very easily and be overlooked in a big change. It's also really low stakes. I'm frankly concerned that you care so much about this that you'd label a dev careless over it. I'd label someone careless who didn't test every branch of their code and left a nil pointer error or something, but missing formatter changes seems like a very human mistake for someone who was still careful about the actual code they wrote.

              • hoten 5 hours ago
                I think the point is that a necessary part of being careful is reviewing the diff yourself end-to-end right before sending it out for review. That catches mistakes like these.
                • code_for_monkey 4 hours ago
                  I myself have been guilty of creating a PR and immediately pushing a commit to clean that stuff up
          • epiccoleman 5 hours ago
            > I want to be sure my code works before I submit it for review.

            No kidding. I mean, "it works" is table stakes, to the point I can't even imagine going to review without having tested things locally at least to be confident in my changes. The self-review for me is to force me to digest my whole patch and make sure I haven't left a bunch of TODO comments or sloppy POC code in the branch. I'd be embarrassed to get caught leaving commented code in my branch - I'd be mortified if somehow I submitted a PR that just straight up didn't work.

          • jjmarr 7 hours ago
            Many are just doing SWE for the money.

            Their goal is to pass the hot potato to someone else, so they can say in the standup "oh I'm waiting on review" making it not their problem.

          • lokar 5 hours ago
            It’s cultural. It always seemed natural to me, until I joined a team that treated review as some compliance checkbox that had nothing to do with the real work.

            Treating real review as an important part of the work requires a culture that values it.

        • theshrike79 5 hours ago
          We have an AI doing the first pass PR review using company standards as a prompt.

          It catches the worst slop in the first pass easily, as well as typos etc.

      • groby_b 6 hours ago
        I think we've moved on from the times where you brought a printout to the change control board to talk it through.
    • hnthrow0287345 9 hours ago
      >It's even worse than that: non-junior devs are doing it as well.

      This might be unpopular, but that seems more like an opportunity if we want to continue allowing AI to generate code.

      One of the annoying things engineers have to deal with is stopping whatever they're doing and doing a review. Obviously this gets worse if more total code is being produced.

      We could eliminate that interruption by having someone doing more thorough code reviews, full-time. Someone who is not being bound by sprint deadlines and tempted to gloss over reviews to get back to their own work. Someone who has time to pull down the branch and actually run the code and lightly test things from an engineer's perspective so QA doesn't hit super obvious issues. They can also be the gatekeeper for code quality and PR quality.

      • marcosdumay 9 hours ago
        A full-time code reviewer will quickly lose touch with all practical matters and steer the codebase into some unmaintainable mess.

        This is not the first time somebody had that idea.

        • JohnBooty 8 hours ago
          I've often thought this could work if the code reviewer was full-time, but rotated regularly. Just like a lot of jobs do with on-call weeks, or weeks spent as release manager - like if you have 10 engineers, and once every ten weeks it's your turn to be on call.

          That would definitely solve the "code reviewer loses touch with reality" issue.

          Whether it would be a net reduction in disruption, I don't know.

          • necovek 5 hours ago
            Doing code review as described (actually diving deep, testing etc) for 10 engineers producing code is likely not going to be feasible unless they are really slow.

            In general, back in 2000s, a team I was on employed a simple rule to ensure reviews happen in a timely manner: once you ask for a review, you have an obligation to do 2 reviews (as we required 2 approvals on every change).

            The biggest problem was when there wasn't stuff to review, so you carried "debt" over, and some never repaid it. But with a team of 15-30 people, it worked surprisingly well: no interrupts, quick response times.

            It did require writing good change descriptions along with testing instructions. We also introduced diff size limits to encourage iterative development and small context when reviewing (as obviously not all 15-30 people had same deep knowledge of all the areas).

          • bee_rider 8 hours ago
            You could do some interesting layering strategies if you made it half time, for two people. Or maybe some staggered approach: each person does half time, full time, then half time again, with three people going through the sequence at a time. Make each commit require two sign-offs, and you could get a lot of review and maybe even induce some cooperation…
            • kaffekaka 7 hours ago
              "Interesting" is the word I would use as well, but also cumbersome and complicated.
        • jaggederest 8 hours ago
          I think it's workable if you make code review a primary responsibility, but not the only responsibility. I think this is a big thing at staff+ levels, doing more than your share of code review (and other high-level concerns, of course).
        • kragen 6 hours ago
          Linus Torvalds is effectively a full-time code reviewer, and so are most of his "lieutenants". It's not a new idea, as you say, but it works very well.
      • sorokod 9 hours ago
        > One of the annoying things engineers have to deal with is stopping whatever they're doing and doing a review.

        I would have thought that reviewing PRs and doing it well is in the job description. You later mention "someone" a few times - who might that someone be?

        • bee_rider 8 hours ago
          Can we make an LLM do it?

          “You are a cranky senior software engineer who loves to nitpick change requests. Here are your coding standards. You only sign off of a change after you are sure it works; if you run out of compute credits before you can prove it to yourself, reject the change as too complex.”

          Balance things, pit the LLMs against each other.
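
          A minimal sketch of wiring that persona up (assuming the official `openai` Python client; the model name and standards file are placeholders):

            # Sketch only: feed the cranky-reviewer persona a real diff.
            # Assumes `pip install openai` and OPENAI_API_KEY in the environment.
            import subprocess
            from openai import OpenAI

            CRANKY_REVIEWER = (
                "You are a cranky senior software engineer who loves to nitpick "
                "change requests. Here are your coding standards:\n\n{standards}\n\n"
                "Only sign off on a change after you are sure it works; if you "
                "cannot convince yourself it does, reject it as too complex."
            )

            def first_pass_review(standards_path: str = "CODING_STANDARDS.md") -> str:
                # The current branch's diff against main is the review subject.
                diff = subprocess.run(
                    ["git", "diff", "main...HEAD"], capture_output=True, text=True
                ).stdout
                with open(standards_path) as f:
                    standards = f.read()
                client = OpenAI()
                response = client.chat.completions.create(
                    model="gpt-4o",  # placeholder model name
                    messages=[
                        {"role": "system", "content": CRANKY_REVIEWER.format(standards=standards)},
                        {"role": "user", "content": f"Review this diff:\n\n{diff}"},
                    ],
                )
                return response.choices[0].message.content

            if __name__ == "__main__":
                print(first_pass_review())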

          • postflopclarity 7 hours ago
            I do this all the time. I pass my code into "you are a skeptic and hate all the code my student produces: here is their latest PR etc.. etc.."
            • osn9363739 3 hours ago
              I have devs that do this, and we have AI code review in CI. Problem is, it always finds something. So the devs that have been in the code base for a while know what to ignore; the new devs get bogged down by research. It's a net benefit as it forces them to learn, which they should be doing. It definitely slows them down though, which goes against some of the productivity boost claims I see. A human reviewer with codebase experience is still needed.
              • mywittyname 2 hours ago
                Slowing down new developers by forcing them to understand the product and context better is a good thing.

                I do agree that the tool we use (code rabbit) is a little too nitpicky, but it's right way more than it's wrong.

          • jjmarr 7 hours ago
            We do this at work and it's amazing.
          • cm2012 8 hours ago
            This would probably catch a lot of errors
      • mywittyname 2 hours ago
        This sounds good in theory, but in practice, a person capable of doing a good job at this role would also be a good developer whose impact would be greater if they were churning out code. This is a combination of a lead engineer and SDET.

        In reality, this ends up being the job given to the weakest person on the team to keep them occupied. And it gives the rest of the team a mechanism to get away with shoddy work and not face repercussions.

        Maybe I'm just jaded, but I think this approach would have horrible results.

        AI code review tools are already good. That makes for a good first pass. On my team, fixing Code Rabbit's issues, or having a good reason not to is always step 1 to a PR. It catches a lot of really subtle bugs.

      • gottagocode 8 hours ago
        > We could eliminate that interruption by having someone doing more thorough code reviews, full-time. Someone who is not being bound by sprint deadlines and tempted to gloss over reviews to get back to their own work.

        This is effectively my role (outside of mentoring) as a lead developer over a team of juniors we train in house. I'm not sure many engineers would enjoy a day of only reviewing, me included.

      • jms703 8 hours ago
        >One of the annoying things engineers have to deal with is stopping whatever they're doing and doing a review

        Code reviews are a part of the job. Even at the junior level, an engineer should be able to figure out a reasonable time to take a break and shift efforts for a bit to handle things like code reviews.

      • shiandow 7 hours ago
        The best way I have of knowing code is correct on all levels is convincing myself I would write it the same way.

        The only way to be 100% sure is writing it myself. If I know someone reasonable managed to write the code, I can usually take some shortcuts and only look at the code style, common gotchas, etc.

        Of course, it wouldn't be the first time I made some erroneous assumptions about how well considered the code was. But if none of the code is the product of any intelligent thought, well, I might as well stop reading and start writing. Reading code is 10x harder than writing it, after all.

      • Spoom 7 hours ago
        Big companies would outsource this position within a year, I guarantee it. It's highly measurable which means it can be "optimized".
      • immibis 9 hours ago
        As I read once: all that little stuff that feels like it stops you from doing your job is your job.
      • nunez 7 hours ago
        This sounds like what unit tests after every commit and e2e tests before every PR are supposed to solve.
      • ohwaitnvm 8 hours ago
        So pair programming?
        • sodapopcan 8 hours ago
          Yep, eliminates code reviews altogether. Unfortunately it remains widely unpopular, with people even saying "AI" can be the pair.
          • gaigalas 8 hours ago
            It does not eliminate code reviews.

            In practice, you should have at least one independent reviewer who did not actively work on the PR.

            That reviewer should also pull down the entire codebase, run it, try to make tests fail, and so on.

            In my experience, it's also good that this is not a fixed role ("the reviewer") but a responsibility everyone on the team shares (your next task should always be: review someone else's work; only pick a new thing to do if there is nothing to review).

            This practice increases quality dramatically.

            • sodapopcan 7 hours ago
              > It does not eliminate code reviews.

              Yes it does. There are many ways to do things, of course, and you can institute that there must be an independent reviewer, but I see this as a colossal waste of time that takes away one of the many benefits of pairing. Switch pairs frequently, and by frequently I really mean "daily," and there is no need for review. This also covers the "no fixed responsibilities" you mentioned (which I absolutely agree with).

              Again, there are no rules for how things must be done, but this is my experience of three straight years working this way and it was highly effective.

              • gaigalas 6 hours ago
                Mixed-level pairs (senior/junior), for example, are more about mentoring than reviewing. Those sessions do not qualify for "more than one pair of eyes".

                Excited (or maybe even stubborn) developers can often wear their pair down by exhaustion, leading to low-effort "whatever you want" contributions.

                Pairs tend to under-document. They share an understanding they developed during the pairing session and forget to add important information or details to the PR or documentation channels.

                I'm glad it has been working for you. Maybe you work in a stellar team that doesn't have those issues. However, there are many scenarios that benefit a lot from an independent reviewer.

        • aslakhellesoy 8 hours ago
          Shhh don’t let them know!
    • Aurornis 9 hours ago
      This mirrors my experience with the texting while driving problem: The debate started as angry complaints about how all the kids are texting while driving. Yet it’s a common problem for people of all ages. The worst offender I knew for years was in her 50s, but even she would get angry about the youths texting while driving.

      Pretending it’s just the kids and young people doing the bad thing makes the outrage easier to sell to adults.

    • reedf1 9 hours ago
      it's even worse than that! non-devs are doing it as well
      • esafak 9 hours ago
        That's what democratization looks like. And the new participants are happy!
        • rvz 8 hours ago
          …Until you tell them to maintain all the technical debt that was generated when it breaks and waste more time and money / tokens on fixing the security issues.

          A great time to be a vibe coding cleanup specialist (i.e, professional security software engineer)

      • snowstormsun 9 hours ago
        it's even worse than that! bots are doing it as well
        • marcosdumay 9 hours ago
          Abandon any platform that decides to put bots into your workflow without you telling it to.

          Vote with your wallet.

        • pydry 9 hours ago
          Hopefully once this AI nonsense blows over they'll reach the same realisation they did after the mid 2000s outsourcing craze: that actually you gotta pay for good engineering talent.
      • vernrVingingIt 9 hours ago
        That's the goal. Through further training, whittle away at unnecessary states until only the electrical states that matter remain.

        Developers have created too many layers of abstraction and indirection to do their jobs. We're burning a ton of energy managing state management frameworks, that are many layers of indirection away from the computations that are salient to users.

        All those DSLs, config syntaxes, layers of boilerplate waste a huge amount of electricity, when end users want to draw geometric shapes.

        So a non-dev generates a mess, but in a way so do devs with Django, Elixir, RoR, Terraform. When really, at the end of the day, it's matrix math against memory and syncing that state to the display.

        From a hardware engineers perspective, the mess of devs and non-devs is the same abstract mess of electrical states that have nothing to do with the goal. All those frameworks can be generalized into a handful of electrical patterns, saving a ton of electricity.

        • rcbdev 9 hours ago
          This sounds like the exact kind of profound pseudo-enlightenment that one gets from psychedelics. Of course, it's all electrons in the end.

          Trying to create a secure, reliable and scalable system that enables many people to work on one code base, share their code around with others and at the end of the day coordinate this dance of electrons across multiple computers, that's where all of these 'useless' layers of abstraction become absolutely necessary.

          • vernrVingingIt 9 hours ago
            Try almost 30 years in electrical engineering.

            I know exactly what those layers of abstraction are used for. Why so many? Jobs making layers of abstraction.

            But all of them are dev-friendly means of modeling memory states for the CPU to watch and transform just so. They can all be compressed into a generic and generalized set of mathematical functions, ridding ourselves of the various parser rules needed to manage each bespoke syntax inherent to each DSL and layer of framework.

            • kridsdale3 8 hours ago
              Okay.

              Go write an operating system and suite of apps with global memory and no protections. Why are we wasting so much time on abstractions like processes and objects? Just let everyone read and write from the giant Turing machine.

              • jodrellblank 4 hours ago
                > "Why are we wasting so much time on abstractions like .. objects?"

                Aside: earlier this year Casey Muratori did a 2.5 hour conference talk on this topic - why we are using objects in the way they are implemented in C++ et al with class hierarchies and inheritance and objects representing individual entities? "The Big OOPs: anatomy of a 35 year mistake"[1].

                He traces programming history back to Stroustrup learning from Simula and Kristen Nygaard, back to C.A.R. Hoare's paper on records, back to the Algol 68 design committee, back to Douglas T. Ross's work in the 1950s. From Ross at MIT in 1960 to Ivan Sutherland working on Sketchpad at MIT in 1963, and both chains influencing Alan Kay and Smalltalk. Where the different ideas in OOP came from, how they came together through which programming languages, who was working on what, and when, and why. It's interesting.

                [1] https://www.youtube.com/watch?v=wo84LFzx5nI

              • jjmarr 7 hours ago
                Embedded systems that EEs code for are like this. I have to explicitly opt into processes and objects in Keil RTX. I also get to control memory layout.

                Abstraction layers are terrible when you need to understand 100% of the code at all times. Doesn't mean they're not useful.

                Heck, the language for just implementing mathematical rules about system behaviour into code exists. It's called Matlab Simulink.

                • nec4b 6 hours ago
                  You are comparing a personal computer with a general purpose OS running 100s of processes and 1000s of threads with a small micro-controller with a single process compiled together with an OS running at most a couple of threads. My PC has 100s of apps that need to coexist on the same hardware at the same time; your micro-controller only runs 1 designated app for eternity.
                  • vernrVingingIt 4 hours ago
                    Sure. The hang-up here is SWEs' belief that those abstractions must be stored as some syntax they know: C, Python, RoR, Linux, Elixir... whatever.

                    There is zero obligation to capture the concept of memory safety in traditional software notation. If it was possible to look inside the hardware at runtime no one is going to see Rust syntax.

                    At runtime it's more appropriate to think of it as geometric functions redrawing electrical state geometrically to avoid collisions. And that's where future chip and system state management are headed.

                    Away from arbitrary syntax constructs with computationally expensive parsing rules of the past, towards a more efficient geometric-functions abstraction embedded in the machine.

                    • jcgl 2 hours ago
                      > The hang up here is SWEs belief those abstractions must be stored as some syntax they know

                      What does it matter how it's "stored"? I think (hope?) that most SWEs know that that syntax and its semantics aren't how things work on the metal. The storage format of the syntax seems pretty irrelevant. And surely you're not suggesting that SWEs should be using a syntax and semantics that they...don't know.

                      So what's the better, non-traditional-software notation? Your conceptualization does sound genuinely intriguing.

                      However, it seems like it would by necessity be non-portable across architectures (or even architecture revisions). And I take it as given that portable software is a desirable thing.

              • vlowther 6 hours ago
                DOS, early Windows, and early MacOS worked more or less exactly that way. Somehow, we all survived.
                • kalleboo 35 minutes ago
                  Apple nearly didn't survive until they bought a company that made an OS that didn't work that way.
              • vernrVingingIt 6 hours ago
                Easy; endlessly big little numbers. "Endless" until the machine runs out of real memory addresses anyway.

                You all really think engineers at Samsung, nVidia, etc whose job it is to generalize software into mathematical models have not considered this?

                We need a layer of abstraction, not Ruby, Python, Elixir, Rails, Perl, Linux, Windows, etc., ad nauseam, ad infinitum... each with unique and computationally expensive (energy-wasting) parsing, serializing and deserializing rules.

                Mitigation of climate change is a general concern for the species. Specific concerns of software developers who will die someday anyway get to take a back seat for a change.

                Yes AI uses a lot of electricity but so does traditional SaaS.

                Traditional SaaS will eventually be replaced with more efficient automated systems. We're in a transition period.

                It's computationally efficient to just use geometry[1], which, given enough memory, can be shaped to avoid the collisions you are concerned with.

                Your only real concern is obvious self-selection-driven social conservatism. "Don't disrupt me ...of all people... bro!"

                [1] https://iopscience.iop.org/article/10.1088/1742-6596/2987/1/...

              • switchbak 6 hours ago
                Engineers value different things. It's why I'm loath to maintain engineer-written code.

                Let the downvotes commence!

            • nkohari 8 hours ago
              > I know exactly what those layers of abstraction are used for. Why so many? Jobs making layers of abstraction.

              This is a perfect example of Chesterton's Fence. Is it true that there are too many levels of abstraction, that YAML configuration files are a pain in the ass, and so on? Yes. But it's because this stuff was created organically, by thousands of people, over decades of time, and it isn't feasible to just start over from first principles.

              I don't know enough about electrical engineering to speak to it (funny how that works!) but I'm sure there are plenty of cases in EE that just come down to "that's how it's been done forever".

              • vernrVingingIt 8 hours ago
                Well, starting over from first principles is exactly what the chip design and manufacture industry is doing. We also cannot afford, in non-financial terms, to burn all the resources on conserving the existing software mess.

                Automation is making it pretty easy to generalize all the abstraction into math automatically to inform how to evolve the manufacturing process.

                Using American principles against Americans, it would run afoul of American free speech and agency ideals to dictate chip makers only engage in speech and agency that benefits software engineers.

                I was in the room 25 years ago, being instructed to help offshore hardware manufacturing, as it was realized that keeping such knowledge and informed workers domestic posed an existential threat to copyright cartels and media censorship interests.

                It's a long term goal that was set aside during the ZIRP era as everyone was happy making money hand over fist.

                Guess you all should have paid more attention to politics instead of believing that, since it only exists as a socialized theory, it isn't real and can safely be ignored.

                Americans make up a small portion of the 8 billion humans, and software engineers are an even smaller percent of the population. Other nations have rebuilt since the US bombed them to hell. They're not obliged to kowtow to a minority of the overall population.

                Would recommend you set aside thinking in abstract philosophy puzzles and relate to the world via its real physical properties.

                • nec4b 6 hours ago
                  >> Well starting over from first principles is exactly what the chip design and manufacture industry is doing.

                  No, there are thousands of hardware libraries (HDLs, IP cores, standard cell libs) which chip designers use. Hardly anyone builds chips from first principles. They are using the same layers of abstraction as software does.

                  • vernrVingingIt 5 hours ago
                    I meant they aren't sticking with an obligation to software history.

                    Of course they have not dumped their own methods.

          • repeekad 8 hours ago
            > pseudo-enlightenment one gets from psychedelics

            I like that, I’ve also heard it referred to as “unearned wisdom”

            • vernrVingingIt 6 hours ago
              You all really believe PhDs and principal hardware engineers at Samsung, nVidia, etc have not worked around any abstract problem you all can come up with?

              We need a layer of abstraction, not endless layers.

              Nothing says unearned wisdom like script kiddies who intentionally had money thrown at them to reinforce the belief that their mastery of RoR CRUD app dev is genius beyond all comprehension. Zomg you know Linux admin? Here's $10 million!

              This thread is nothing but appeals to banal social conservatism. The disruptors on the verge of being the disrupted, lashing out: wait, I was the job killer! Now you say my job is dead! So unfair!

              We hardware engineers have been having a good laugh the last 20 years at SWEs being easily manipulated by Wall Street hype of copy-paste SaaS products constantly reimplemented in the latest JS framework.

              Throwing money at you all was intentional manipulation of primate biology. Juice your egos, get you to fall in line with desired agency control goals of the political and old money cohort.

              • switchbak 6 hours ago
                The way you make such broad assumptions and jump right into highly charged politics with nary a connection really does make me wonder about your emotional well being.
                • vernrVingingIt 4 hours ago
                  You don't see the connection because you weren't invited to participate in numerous discussions where these connections are made explicitly via detailed analysis.

                  I have more to do than refresh HN all day, going for brevity here.

                  Your expectation that others must explicitly connect all the dots for you makes me question your grasp of reality. Most people alive are going about their lives unconcerned with your existence altogether.

                  "Highly charged politics". Relative emotional opinion.

          • vernrVingingIt 9 hours ago
            [dead]
        • iwontberude 9 hours ago
          And here I thought people just used computers for the heat
        • stephen_cagle 5 hours ago
          I honestly can't tell if you are speaking in metaphor or literally?
        • whattheheckheck 9 hours ago
          What process / path do you take to get to such an enlightened state? Like books or experience or anything more about this please?
          • vernrVingingIt 9 hours ago
            Bachelor's in electrical engineering, master's in math; elastic structures applied to modeling electrical systems.

            Started career in the late 90s designing boards for telecom companies' network backbones.

        • nanomonkey 9 hours ago
          There are some contradictory claims here.

          Boilerplate comes when your language doesn't have affordances; you get around this with /abstraction/, which leads to DSLs (Domain Specific Languages).

          Matrix math is generally done on more than raw bits provided by digital circuits. Simple things like numbers require some amount of abstraction and indirection (pointers to memory addresses that begin arrays).

          My point is yes, we've gotten ourselves in a complicated tar pit, but it's not because there wasn't a simpler solution lower in the stack.

    • TexanFeller 5 hours ago
      Even worse for me, some of my coworkers were doing that _before_ coding LLMs were a thing. Now LLMs are allowing them to create MRs with untested nonsense even faster, which feels like a DDoS attack on my productivity.
    • rootusrootus 8 hours ago
      One thing I've pushed developers on my team to do since way before AI slop became a thing was to review their own PR. Go through the PR diff and leave comments in places where it feels like a little explanation of your thought process could be helpful in the review. It's a bit like rubber duck debugging, I've seen plenty of things get caught that way.

      As an upside, it helps with AI slop too. Because as I see it, what you're doing when you use an LLM is becoming a code reviewer. So you need to actually read the code and review it! If you have not reviewed it yourself first, I am not going to waste my time reviewing it for you.

      It helps obviously that I'm on a small team of a half dozen developers and I'm the lead, and management hasn't even hinted at giving us stupid decrees like "now that you have Claude Code you can do 10x as many features!!!1!".

      • rcxdude 53 minutes ago
        Yeah, I always think it's kinda rude to throw something to someone else to review without reviewing it yourself, even if you were the one to write it. Looking at it twice yourself can help with catching things even faster than someone else getting up to speed with what you were doing and querying it. Now it seems like with LLMs people are putting code up for review that hasn't even been looked at once.
    • tyleo 7 hours ago
      There are folks who perform like juniors but have just been in the business long enough to be promoted.

      Title only loosely tracks skill level and with AI, that may become even more true.

    • acedTrex 8 hours ago
      Juniors aren't even the problem here; they can and should be taught better, that's the point.

      It's when your PEERS do it that it's a huge problem.

    • hinkley 5 hours ago
      There’s a PR on a project I contribute to that is as bad/big as some PRs by problematic coworkers. I’m not saying it’s AI work, but I’m wondering.
    • mullingitover 8 hours ago
      > expects the “code review” process to handle the rest.

      The LLMs/agents have actually been doing a stellar job with code reviews. Frankly that’s one area that humans rush through, to the point it’s a running joke that the best way to get a PR granted a “lgtm” is to make it huge. I’ve almost never seen Copilot wave a PR through on the first attempt, but I usually see humans doing that.

      • distances 2 hours ago
        That smells of bad team practices. Put a practical limit on PR sizes as the first step; around 500 lines max is a good rule of thumb in my experience. Larger than that, and the expectation is then a number of small PRs to a feature branch.

        I rarely see a PR that should pass without comments. Your team is being sloppy.
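
        If it helps, that limit is easy to automate in CI. A minimal sketch in Python (the 500-line cap and the origin/main base branch are assumptions to tune per repo):

            import re
            import subprocess
            import sys

            MAX_LINES = 500  # assumed team limit; tune per repo

            # Summarize the diff between the PR branch and its merge base with main
            stat = subprocess.run(
                ["git", "diff", "--shortstat", "origin/main...HEAD"],
                capture_output=True, text=True, check=True,
            ).stdout

            # --shortstat prints e.g. "3 files changed, 120 insertions(+), 45 deletions(-)"
            changed = sum(int(n) for n in re.findall(r"(\d+) (?:insertion|deletion)", stat))
            if changed > MAX_LINES:
                sys.exit(f"PR touches {changed} lines (limit {MAX_LINES}); split it up.")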

        • mullingitover 1 hour ago
          > Your team is being sloppy.

          I'm talking about a running joke in the industry, not my team.

    • lowkeyokay 9 hours ago
      In the company I'm at this is beginning to happen. PMs want to "prototype" new features and expect the engineers to finish up the work, with the expectation that it 'just needs some polishing'. What would be your recommendation on how to handle this constructively? Flat-out rejecting LLMs as a prototyping tool is not an option.
      • jjmarr 8 hours ago
        I would accept this because it'll increase demand for SWEs and prevent us from losing our jobs.
      • jennyholzer2 7 hours ago
        > What would be your recommendation on how to handle this constructively?

        Ruthlessly bully LLM idiots until it becomes so embarrassing to use LLMs that no status-obsessed corporate executive would ever admit they spent years gleefully duped by hucksters selling "General AI"

        • cpursley 7 hours ago
          What is an "LLM idiot"?
      • Our_Benefactors 9 hours ago
        This could be workable with the understanding that throwing away 100% of the prototype code is acceptable and its primary purpose is as a communication tool, not a technical starting point.
        • rootusrootus 9 hours ago
          This is how I've handled it so far. But that is probably because the PM that does this for me knew going in that they were not going to be generating something I'd want to become responsible for polishing and maintaining. It's basically just a fancier way of doing what they would otherwise use SketchUp for.
      • lurking_swe 9 hours ago
        Sounds like a culture and management problem. The CTO should set clear expectations for his staff and discuss with product to ensure there is alignment.

        If I were CTO I would not be happy to hear my engineers are spending lots of time re-writing and testing code written by product managers. Big nope.

      • theshrike79 5 hours ago
        "You can't polish a turd" =)
    • strangattractor 8 hours ago
      Not at Meta - their job is to "Move fast and break things". I think people are just doing what they've been told.
      • bee_rider 8 hours ago
        “Move fast and break things” works well when you are a little player in a big world, because you can only perturb the system into so bad a state with your limited resources. Now they got big, and everything is broken.
    • seanmcdirmid 8 hours ago
      Don’t accept PRs without test coverage? I mean, LLMs can do those also, but it’s something.
    • throwawaysleep 9 hours ago
      Code review is an unfunded mandate. It is something the company demands while not really doing anything to make sure people get rewarded for doing it.
      • Aurornis 9 hours ago
        > while not really doing anything make sure people get rewarded for doing it.

        I don’t know about you, but I get paychecks twice a month for doing things included in my job description.

        • georgeburdell 9 hours ago
          My manager asked me to disable CI and gating code owner reviews “for 2 weeks” 6 months ago so people could commit faster. Just because it is in your job description doesn’t mean it won’t get shoved aside when it’s perceived as the bottleneck for the core mission.

          Now we have nightly builds that nobody checks the result of and we’re finding out about bugs weeks later. Big company btw

          • immibis 5 hours ago
            That's his right. In capitalism, company owners have the power (which they delegate to managers) to fuck up the company as much as they see fit. On the upside, it means it's their responsibility and not yours.

            Once you've said it's going to cause horrible problems, and they say do it anyway, and you have a paper trail of this and it's backed up onto your own storage medium, then you just do it and bring popcorn. If you think it'll bankrupt the company, then you have nothing to lose since you have no right to stop a company going bankrupt, so you might as well email your manager's manager's manager first and see if your manager gets fired.

    • 627467 9 hours ago
      Just shove a code review agent in the middle. Problem solved

      [Edit] man, people dont get /s unless its explicit

      • fragmede 9 hours ago
        That startup is called CodeRabbit and damned if it doesn't come up with good suggestions sometimes. Other times you have to overrule it, or more likely create separate PRs for its suggestions and avoid lumping a bunch of different stuff into a single PR, and sometimes it's stupid and doesn't know what it's talking about, and also misses stuff, so you do still need a human to review it. But if you're at a place where LLMs are being used to generate large swaths of functional code, including tests, and human reviewers simply can't keep up, overall it does feel like a step forward. I can't speak to how well other similar services do, but presumably they're not the only one that does that; CodeRabbit's just the one that my employer has chosen.
        • kridsdale3 8 hours ago
          Is this startup sitting on any IP other than a bunch of prompts?
    • BurningFrog 9 hours ago
      Really good code reviewing AIs could handle this!
    • alphazard 8 hours ago
      Quality is not rewarded at most companies. It's not going to turn into more money; it might turn into less work later, but in all likelihood the author won't be around to reap the benefits of that less work, because they will have moved on to another company.

      Instead, since more effort doesn't yield more money but less effort can yield the same money, the strategy is to contract the time spent on work to the smallest amount; LLMs are currently the best way to do that.

      I don't see why this has to be framed as a bad thing. Why should anyone care about the quality of software that they don't use? If you wouldn't work on it unless you were paid to, and you can leave if and when it becomes a problem, then why spend mental energy writing even a single line?

      • tuyiown 8 hours ago
        Because nothing can beat the productivity of a motivated team building code that they are proud of. The mental energy spent becomes the highest reward. As for profit, it _compounds_, as it does for every other business.

        The fact that this has been lost as common knowledge, even as shiny examples arise regularly, is very telling.

        But it is not liked in business because reproducing it requires competence in the industry, and finance's deep pockets don't believe in competence anymore.

      • phito 5 hours ago
        I find doing my job as best as I can intrinsically rewarding. Even though I am getting paid peanuts and have to give more than half of those peanuts to my government. I'm that kind of sucker.
      • hostyle 8 hours ago
        Not everything is about money. Have you never wanted to be good at something because you enjoy it? Or do something for the love of the craft? Have you heard of altruism?
        • Larrikin 8 hours ago
          But why do that for the company instead of yourself?
          • ThrowawayR2 6 hours ago
            Speaking only for my particular circumstances, the company is the vehicle that I use to do it for myself since it provides specialized facilities and equipment I wouldn't have access to as an individual or a founder. That I get paid for it is merely icing on the cake.
          • alphazard 7 hours ago
            This exactly. You have to be honest about why you are building something. If the answer is that you actually want to use it, then yes, quality and maintainability are important. It might even be a good idea to use no AI whatsoever.

            But if you are building it because doing so is in the long chain of cause and effect that leads to you being fed and having shelter, then you should minimize the amount of your time that is required to produce that end result. Do you get better food, and better shelter if the software is better? It would certainly be nice if that was the case, but it's not.

            > Not everything is about money.

            Except for your job, which is primarily about money. Making it take less time, means that you have more time to focus on things that really are not about money.

            • hostyle 7 hours ago
              Most people spend maybe 1/4 of their working age life at a job working for someone else. Why would you deliberately sabotage that by checking out mentally and waste all that time on sub-standard work? How do you expect to earn a promotion? You can produce good code at work and even better code at home for yourself. Deliberately producing slop at work will not help anyone.
  • robgibbons 10 hours ago
    For what it's worth, writing good PRs applies in more cases than just AI generated contributions. In my PR descriptions, I usually start by describing how things currently work, then a summary of what needs to change, and why. Then I go on to describe what exactly is changing with the PR. This high level summary serves to educate the reviewer, and acts as a historical record in the git log for the benefit of those who come after you.

    From there, I include explicit steps for how to test, including manual testing, and unit test/E2E test commands. If it's something visual, I try to include at least a screenshot, or sometimes even a brief screen capture demonstrating the feature.

    Really go out of your way to make the reviewer's life easier. One benefit of doing all of this is that in most cases, the reviewer won't need to reach out to ask simple questions. This also helps to enable more asynchronous workflows, or distributed teams in different time zones.

    • Hovertruck 10 hours ago
      Also, take a moment to review your own change before asking someone else to. You can save them the trouble of finding your typos or that test logging that you meant to remove before pushing.

      To be fair, copilot review is actually alright at catching these sorts of things. It remains a nice courtesy to extend to your reviewer.

      • Waterluvian 24 minutes ago
        The Draft feature is amazing for this.

        I’ll put up a draft early and use it as a place to write and refine the PR details as I wrap up, make adjustments, add a few more tests, etc.

    • phito 10 hours ago
      I often write PR descriptions, in which I write a short explanation and try to anticipate some comments I might get. Well, every time I do, I will still get those exact comments because nobody bothers reading the description.

      Not to say you shouldn't write descriptions, I will keep doing it because it's my job. But a lot of people just don't care enough or are too distracted to read them.

      • simonw 9 hours ago
        For many of my PR and issue comments the intended audience is myself. I find them useful even a few days later, and they become invaluable months or years later when I'm trying to understand why the code is how it is.
      • ffsm8 9 hours ago
        After I accepted that, I then tried to preempt the comment by just commenting myself on the function/class etc that I thought might need some elaboration...

        Well, I'm sure you can guess what happened after that - within the same file even

      • skydhash 10 hours ago
        I just point people to the description. No need to type things twice.
        • wiml 4 hours ago
          "I think I covered that in the (PR text | comment two lines up | commit message), did you have an issue I didn't address there?"

          Maybe that's the AI agent I would actually use, auto-fill those responses...

        • lanstin 8 hours ago
          Sadly, when communicating with people, important things have to be repeated over and over. Maybe less so with highly trained and experienced people, on topics where their training and experience make the statement plausible; but if the thing is at all surprising or diverges from common experience, I've found a need to bang it out via multiple communication channels.
          • simonw 8 hours ago
            I learned this lesson as an engineering manager / tech lead. I got frustrated at how often I found myself having the exact same conversation with different people... until I realized that communicating the same core information to different people was a big chunk of the job!
      • walthamstow 9 hours ago
        At my place nobody reads my descriptions because nobody writes them so they assume there isn't one!
        • phito 5 hours ago
          Too real :(
    • simonw 10 hours ago
      100%. There's no difference at all in my mind between an AI-assisted PR and a regular PR: in both cases they should include proof that the change works and that the author has put the work in to test it.
      • oceanplexian 9 hours ago
        At the last company I worked at (Large popular tech company) it took an act of the CTO to get engineers to simply attach a JIRA Ticket to the PR they were working on so we could track it for tax purposes.

        The Devs went in kicking and screaming. As an SRE, it seemed like, for SDEs, writing a description of the change (explaining the problem the code is solving, the testing methodology, etc.) is harder than actually coding. Ironically, AI is proving that this theory was right all along.

        • sodapopcan 8 hours ago
          Complaining about including a ticket number in the commit is a new one for me. Good grief.
          • rootusrootus 8 hours ago
            It could be a death-by-a-thousand-cuts situation and we don't have enough context. My company has spent the last few years really going 1000% on the capitalization of software expenses, and now we have to include a whole slew of unrelated attributes in every last Jira ticket. Then the "engineering team" (there is only one of these, somehow, in a 5K employee company) decrees all sorts of requirements about how we test our software and document it, again using custom Jira attributes to enforce. Developers get a little pissy about being messed with by MBAs and non-engineer "engineers" trying to tell them how to do their job. (as an aside, for anybody who is on the giving end of such requirements, I have to tell you that people working the tickets will happily lie on all of that stuff just to get past it as quickly as possible, so I hope you're not relying on it for accuracy)

            But putting the ticket number in the commit ... that's basically automatic, I don't know why it should be that big a concern. The branch itself gets created with the ticket number and everything follows from that, there's no extra effort.
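
            For shops without that automation, a commit hook gets most of the way there. A minimal sketch of a prepare-commit-msg hook in Python (the PROJ-123-style branch naming is an assumption):

                #!/usr/bin/env python3
                # prepare-commit-msg hook: prepend the ticket ID from the branch name
                import re
                import subprocess
                import sys

                msg_file = sys.argv[1]
                branch = subprocess.run(
                    ["git", "symbolic-ref", "--short", "HEAD"],
                    capture_output=True, text=True,
                ).stdout.strip()

                # Assumes branches are named like PROJ-123-short-description
                ticket = re.match(r"[A-Z]+-\d+", branch)
                if ticket:
                    with open(msg_file, "r+") as f:
                        msg = f.read()
                        if ticket.group(0) not in msg:
                            f.seek(0)
                            f.write(f"{ticket.group(0)}: {msg}")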

            • comfydragon 1 hour ago
              > The branch itself gets created with the ticket number and everything follows from that, there's no extra effort.

              Only problem there is the potential for a deeply-ingrained assumption that the Jira key being in the branch name is sufficient for the traceability between the Jira issue and commits to always exist. I've had to remind many people I work with that branch names are not forever, but commit messages are.

              I haven't quite succeeded in getting everyone to throw a Jira ID in somewhere in the changeset, but I try...

            • cesarb 6 hours ago
              > But putting the ticket number in the commit ... that's basically automatic, I don't know why it should be that big a concern. The branch itself gets created with the ticket number and everything follows from that, there's no extra effort.

              That poster said "attach a JIRA Ticket to the PR", so in their case, it's not that automatic.

              • rootusrootus 5 hours ago
                A lot of Jira shops use the rest of the stack, so it becomes automatic. The branch is named automatically when created from a link on the Jira task. Every time you push it gives you a URL for opening the PR if you want, and everything ends up pre-filled. All of the Atlassian tools recognize the format of a task ID and hyperlink automatically.

                I haven't dealt with non-Atlassian tools in a while but I assume this is pretty much bog standard for any enterprise setup.

              • alexpotato 5 hours ago
                If you are using the Atlassian Git clone then just putting the JIRA ticket in the title automagically links the PR to the ticket.
            • sodapopcan 6 hours ago
              Ah ya, death-by-a-thousand-cuts is certainly a charitable take!
        • p2detar 9 hours ago
          Strange, I thought this was actually the norm. Our PRs are almost always tagged with a corresponding Jira ticket. I think this is more helpful to developers than to other roles, because it allows them to have a history of what has been fixed.

          One can also point QA or consultants to a ticket for documentation purposes or timeline details.

      • babarock 5 hours ago
        You're not wrong; however, the issue is that it's not always easy to detect whether a PR includes proof that the change works. It requires the reviewer to interrupt what they're doing, switch context completely, and look at the PR.

        If you consider that reviewer bandwidth is very limited in most projects AND that the volume of low-effort AI-assisted PRs has grown incredibly over the past year, we now have a spam problem.

        Some of my engineers refuse to review a patch if they detect that it's AI-assisted. They're wrong, but I understand their pain.

        • wiml 4 hours ago
          I don't think we're talking about merely "AI-assisted" PRs here. We're talking about PRs where the submitter has not read the code, doesn't understand it, and can't be bothered to describe what they did and why.

          As a reviewer with limited bandwidth, I really don't see why I should spend any effort on those.

    • brooke2k 2 hours ago
      I used to be much more descriptive along these lines with my PRs, but what I realized is that nobody reads the descriptions, and people then drop questions that are directly answered in the description anyway.

      I've found that this gets worse the longer the description is, and that a couple of bullet points covering the most important things get the information across much better.

    • Swannie 1 hour ago
      If only there were community standards for this...

      Oh, there are, for years :D This has really stood the test of time:

      https://rfc.zeromq.org/spec/42/#24-development-process

      And its rationale is well explained too:

      https://hintjens.gitbooks.io/social-architecture/content/cha...

      Saddened by realizing that Pieter would have had amazing things to say about AI.

    • reactordev 10 hours ago
      I do this too with our PR templates. They have the ticket/issue/story number and the description of the ask (you can copy-pasta from the ticket). Current state of affairs. Proposed changes. Post state of affairs. Mood gif.
    • toomuchtodo 10 hours ago
      This is how PRs should be, but rarely are (in my experience as a reviewer, ymmv, n=1). Keep on keepin' on.
    • bob1029 9 hours ago
      > I try to include at least a screenshot

      This is ~mandatory for me. Even if what I am working on is non-visual. I will take a screenshot of a new type in my IDE and put a red box around it. This conveys the focus of my attention and other important aspects of the work effort.

      • mh- 3 hours ago
        Please just use text for that. PR descriptions on GitHub sufficiently support formatting.
        • simonw 3 hours ago
          Text isn't good for things like "tighten up the spacing in this dialog".
  • vladsh 7 hours ago
    We should get back to the basic definition of the engineering job. An engineer understands requirements, translates them into logical flows that can be automated, communicates tradeoffs across the organization, and makes tradeoff calls on maintainability, extensibility, readability, and security. Most importantly, they’re accountable for the outcome, because many tradeoffs only reveal their cost once they hit reality.

    None of this is covered by code generation, nor by juniors submitting random PRs. Those are symptoms of juniors (and not only juniors) missing fundamentals. When we forget what the job actually is, we create misalignment with junior engineers and end up with weird ideas like "spec-driven development".

    If anything, coding agents are a wake-up call that clarifies what the engineering profession is really about.

    • newsoftheday 7 hours ago
      Agreed.

      https://read.engineerscodex.com/p/how-one-line-of-code-cause...

      When 10K LOC AI PRs are being created, sometimes by people who either don't understand the code or haven't reviewed the code they're trying to submit, the 60 million dollar failure line is potentially lying in wait.

    • tete 6 hours ago
      Okay, then software engineers are not engineers.

      The whole reliability thing, to many, is not much of a priority. Things have become an absolute shitshow and still everyone buys it.

      In other words the only outcome will be that people don't have or don't want to have engineers anymore.

      Companies are very much not interested in someone who does the above, but at most someone who sells or cosplays these things - if even.

      Cause that's what creates income. They don't care if they sell crap; they care that they sell it, and the cheaper they can produce it the better. So money gets poured into marketing, not quality.

      High-quality products are not sought after. What sells is fake quality, like putting a computer or a phone in a box like jewelry, even if you throw that very box away the next time you walk by a trash bin. That's what people consider quality these days, even if it's just a waste of resources.

      And businesses choose products and services the same way as regular consumers, even when they want the marketing to make them feel good about it in a slightly different way, because marketing to your target audience makes sense. Duh!

      People are ready to pay more for having a premium label stamped onto something, pay more to feel good about it, but most of the time are very unwilling to pay for the measurable quality an engineer provides.

      It's scary, even with infrastructure the process seems to change, probably also due to corruption, but that's a whole other can of worms.

      > communicates tradeoffs across the organization

      They may do that. They may be recognized for it. But if the guy next door with the right cosplay says something like "we are professionals, look at how we have been on the market for X years" or "look at our market share" then no matter how far from reality the bullshitting is they'll be getting the money.

      At the beginning of the year/end of last year I learned how little expertise, professionalism and engineering are required to be a multi-billion NASDAQ stock. For months I thought it couldn't possibly be that the core product of such a company displays such a complete lack of expertise in the core area(s). Yet they somehow managed to convince management to invest several times more money than the original budget, which was already seen as quite a stretch. Of course their promises didn't end up being anywhere close to true, and they completely forgot to inform us (our management) about severe limitations.

      So if you are good at selling to management, which you can be by pocketing the consultants who recommend you, things will seemingly work no matter what.

      > If anything, coding agents are a wake-up call that clarify what engineering profession is really about

      I believe what we need to wake up to or come to terms with is that our industry (everything that would go into NASDAQ) is a farce. Coding agents show that. It doesn't matter that you create half-assed products if you know how to sell them. You are selling your products to people. It doesn't matter if it's some guy at a hot dog stand, a CEO of a big successful company, or someone going from house to house selling the best vacuum cleaner ever. What matters is making people believe it would be stupid not to take your product.

      • order-matters 5 hours ago
        TBH I think Information Systems Engineering and Computer Engineering can just eat software engineers' lunch at this point. The entire need for a separate engineering discipline for software was low-level coding. Custom hardware chips are easier to make for simple things, and less need for low-level coding in more complex things means the focus is shifting back to either hardware choices or higher-level system management.

        I'd argue the only places left where you really need low-level coding fall under computer science. If you are a computer or systems engineer who needs to work with a lot of code then you'll benefit from having exposure to computer science, but an actual engineering discipline for just software seems silly now. Not to mention pretty much all engineers at this point are configuring software tools on their own to some degree.

        I think it's similar to how there used to be horse doctors as a separate profession from vets when horses were much more prominent in everyday life, but now they are all vets again and some of them specialize in horses.

      • chasd00 3 hours ago
        > I believe what we need to wake up to or come to terms with is that our industry (everything that would go into NASDAQ) is a farce.

        The thing is, with software development, it's always been this way. Developers have just had tunnel vision for decades because they stare into an editor all day long instead of trying to actually sell a product. If selling wasn't the top priority, then what do you think would happen to your direct deposit? Software developers, especially software developers, live in this fantasy land where they believe their paycheck just happens automatically and always will. I think it's becoming critical that new software devs entering the workforce spend a couple years at a small, eat-what-you-kill consultancy or small business. Somewhere where they can see the relationship between building, selling, and their paycheck first hand.

        • heliumtera 35 minutes ago
          Technology has absolute qualities. Not a fantasy. Are you being paid to browse Hacker News? Probably not, but here you are. Maybe you never considered this, but programming for reasons other than a salary is a possibility. If those pesky programmers gave it all away, for free, what would be left for you to sell? In this case, would you leave technology? Would you go somewhere else and practice your selling there? Can't we defend building for the sake of building? Doing for the sake of having fun? Maybe you would be left with nothing to sell, I understand, but that's fine for me. Sorry.
    • venturecruelty 7 hours ago
      How do you square that with "use AI and get this feature done in three days or have your 'performance reviewed' with HR in the room"? Because I'm having trouble bridging that gap.

      Edit: help, the new org said the same thing. :(

      Edit 2: you guys, seriously, the HR lady keeps looking up at me and shaking her head. I don't think this is good. I tried to be a real, bigboy engineer, but they just mumbled something about KPIs and put me on a PIP.

      • rnewme 7 hours ago
        Uptime x customer satisfaction vs. a stack of cards. If they don't understand engineering, prepare your CV and head over to an org that does.
        • tete 5 hours ago
          I think people are getting used to stuff not working. People (like me) use crap like Teams, Slack, the web version of Office, Outlook, etc. on a daily basis and pour huge amounts of money in. They use shit like Fortinet (the digital version of dream catchers) and so on.

          Things break. A lot. Doctors, successful or not, also deal with the same shitty IT on a daily basis.

          Nobody cares about engineering. It's about selling stuff, not about reliability, etc.

          And to some degree one is forced to use that stuff anyways.

          So sure, you can go to a company that understands engineering, but if you do a job for a salary you might lose out on quite a bit if you care about things like quality. We see this in so many different sectors.

          Sure there is a unicorn here and there that makes it for a while. And they might even go big and then they sell the company or change to maximizing profits, because that's the only way up when you essentially already made it (on top of one of the big players).

          For small projects/companies it depends on whether you have a way to still launch big, which you can usually do with enough capital. You can still make a big profit with a crappy product then, but essentially only once or twice. But then your goal doesn't have to be creating quality either.

          Microsoft and Fortinet for example wouldn't profit from adding (much) quality. They profit from hypes. So they now both do "AI".

          • rnewme 2 hours ago
            Yup, we are all definitely lowering the bar of what's acceptable when it comes to uptime and bugs. "More features, more hype, x10" seems to be the standard approach to market, but there are still a lot of companies and teams where greybeards and rational folks remember and understand previous hype cycles/bubbles, and who appreciate and protect the engineering approach. It's just that they mostly hire/partner by reference, so it's kinda hard to exit the toxic bubble of startups and "growth hacking" enterprises.
  • layer8 9 hours ago
    I’d go further and say while testing is necessary, it is not sufficient. You have to understand the code and convince yourself that it is logically correct under all relevant circumstances, by reasoning over the code.

    Testing only “proves” correctness for the specific state, environment, configuration, and inputs the code was tested with. In practice that only tests a tiny portion of possible circumstances, and omits all kinds of edge and non-edge cases.
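
    A toy illustration (the function and test here are hypothetical): this suite passes, and even gives full line coverage, while the code is still wrong for cases it never exercises:

        def days_in_month(month: int) -> int:
            # Leap years are ignored, and bad input is not rejected
            lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
            return lengths[month - 1]

        def test_days_in_month():
            assert days_in_month(1) == 31  # passes
            assert days_in_month(4) == 30  # passes
            # Never tested: February in a leap year still returns 28,
            # and days_in_month(0) silently returns December's 31.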

    • crabmusket 5 hours ago
      > "proves"

      I like using the word "demonstrates" in almost every case where people currently use the word "proves".

      A test is a demonstration of the code working in a specific case. It is a piece of evidence, but not a general proof.

      And these kinds of narrow ad-hoc proofs are fine! Usually adequate.

      To rephrase the title of TFA, we must deliver code that is demonstrated to work.

    • aspbee555 8 hours ago
      I find myself not really trusting just tests; I really need to try the app/new function in multiple ways with the goal of breaking it. In that process I may not break it, but I will notice something that might break, so I rewrite it better.
      • lanstin 8 hours ago
        If you don't push your system to failure, you can't really say you understand it. And anyway, the precise failure modes under various conditions are important characteristics for stability/resiliency. (Does it shed load all the way up to the network bandwidth of SYNs; allocate all the memory and then exit; freeze up with deadlocks/disk contention; go unresponsive for a few minutes then recover if the traffic dies off; answer health-check pings only and make no progress on actual work?)
      • Nizoss 8 hours ago
        If you write your tests the Test-Driven Development way in that they first fail before production changes are introduced, you will be able to trust them a lot more. Especially if they are well-written tests that test behavior or contracts, not implementation details. I find that dependency injection helps a lot with this. I try to avoid mocking and complex dependencies as much as possible. This also allows me to easily refactor the code without having to worry about breaking anything if all the tests still pass.

        When it comes to agentic coding. I created an open source tool that enforces those practices. The agent gets blocked by a hook if it tries to do anything that violates those principles. I think it helps a lot if I may say so myself.

        https://github.com/nizos/tdd-guard
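
        To make the dependency-injection point concrete, here is a minimal sketch (the names are made up, not from the tool): the test injects a fake clock instead of patching anything, so the implementation can be refactored freely while the test keeps guarding the behavior:

            from dataclasses import dataclass
            from typing import Callable

            @dataclass
            class Session:
                expires_at: float

            def is_expired(session: Session, now: Callable[[], float]) -> bool:
                # The clock is injected, so no mocking of time.time is needed
                return now() >= session.expires_at

            # Written first and watched fail, then made to pass
            def test_session_expiry():
                session = Session(expires_at=100.0)
                assert is_expired(session, now=lambda: 150.0)
                assert not is_expired(session, now=lambda: 50.0)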

        Edit: I realize now that I misunderstood your comment. I was quick to respond.

    • roeles 6 hours ago
      Since we can't really formally prove most code, I think property-based testing, such as with Hypothesis[1], would make sense. I have not used it yet, but am about to for stuff that really needs to work.

      [1] https://news.ycombinator.com/item?id=45818562
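
      For a flavor of it, a minimal Hypothesis sketch (the encode/decode pair is just a stand-in): instead of hand-picking inputs, you state a property and let the library hunt for counterexamples:

          from hypothesis import given, strategies as st

          def encode(s: str) -> bytes:
              return s.encode("utf-8")

          def decode(b: bytes) -> str:
              return b.decode("utf-8")

          @given(st.text())
          def test_round_trip(s: str) -> None:
              # Hypothesis generates many strings, including nasty edge cases
              assert decode(encode(s)) == s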

      • xendo 4 hours ago
        We can't really property test most code. So it comes down, as with everything, to good judgement and experience.
        • epgui 2 hours ago
          You can property test most code.
    • Yodel0914 1 hour ago
      Came to leave the same comment. It’s very possible to deliver code that’s proven to work, that is still shit.
    • array_key_first 5 hours ago
      I agree - it's trivial to write 100% test coverage if your code isn't robust and resilient and just does "happy path" type stuff.
    • anthonypasq 6 hours ago
      If your tests cover the acceptance criteria as defined in the ticket, why is all that other stuff necessary?
      • layer8 1 hour ago
        If your acceptance criteria state something like “produces output f(x) for any input x, where f(x) is defined as follows: […]”, then you can’t possibly test that, because you can’t test all possible values of x. And if the criteria don’t state that, then they don’t cover the full specification of how the software is expected to behave, hence you have to go beyond those criteria to ensure that the software always behaves as expected.

        You can’t prove that something is correct by example. Examples can only disprove correctness. And tests are always only examples.

      • Yodel0914 1 hour ago
        Because AC don’t cover non-functional things like maintainability/understandability, adherence to corporate/team standards etc.
      • sunsetMurk 6 hours ago
        Acceptance criteria are often buggy themselves, and require more context to interpret and develop a solution.
        • otterley 5 hours ago
          If you don't have sufficiently detailed acceptance criteria, how can anyone be expected to write code to satisfy them?

          That's why you have to start with specifications. See, e.g., https://martinfowler.com/articles/exploring-gen-ai/sdd-3-too...

          • 9rx 4 hours ago
            I wonder how many more times we'll rebrand TDD (BDD, SDD)?

            Just 23 more times? ADD, CDD, EDD, DDD, etc.

            Or maybe more?! AADD, ABDD, ACDD, ..., AAADD, AABDD, etc.

            • pydry 4 hours ago
              BDD is different; it is a way of gathering requirements.

              SDD, on the other hand, is some sort of AI nonsense.

              • otterley 3 hours ago
                Developers who aren't yet using AI would benefit from specs as well. They're good to have whether it's you or an LLM that's writing code. As a general rule, the clearer and less ambiguous the criteria you have, the better.
    • shepherdjerred 8 hours ago
      A good type system helps with this quite a lot
      • crazygringo 8 hours ago
        It helps some. There are plenty of errors, a large majority I'd say, where types don't help at all. Types don't free up memory or avoid off-by-one errors or keep you from mixing up two counter variables.
    • 9rx 4 hours ago
      Testing is not perfect, but what else is there? Even formal proofs are just another expression of testing. With greater mathematical guarantees than other expressions, granted, but still testing all the same; prone to all the very same human problems testing is burdened with.
      • layer8 1 hour ago
        The difference with proofs (whether formal or informal) is that they quantify over all possible cases, whereas testing is always limited to specific cases.
    • user34283 8 hours ago
      I'd go further and say vibe coding it up, testing the green case, and deploying it straight into the testing environment is good enough.

      The rest we can figure out during testing, or maybe you even have users willing to beta-test for you.

      This way, while you're still on the understanding part and reasoning over the code, your competitor already shipped ten features, most of them working.

      Ok, that was a provocative scenario. Still, nowadays I am not sure you even have to understand the code anymore. Maybe having a reasonable belief that it does work will be sufficient in some circumstances.

      • doganugurlu 2 hours ago
        How often do you buy stuff that doesn't work, and you are OK with the provider telling you "we had a reasonable belief that it worked"?

        How are we supposed to use software in healthcare, defense, transportation if that's the bar?

        • user34283 45 minutes ago
          There's a lot of functionality in the frontend that I am building that I did not review. If it worked in testing, that's good enough.

          You're free to review every line the model produces. Not every project is in healthcare or defense, and sometimes different standards apply.

      • TheTxT 8 hours ago
        This approach sounds like a great way to get a lot of security holes into your code. Maybe your competitors will be faster at first, but it's probably better to be a bit slower and not leak all your users' data.
        • user34283 7 hours ago
          I'm mostly thinking about the frontend.

          If I had a backend API that was serving user data, I'd of course check more carefully.

          This kind of mistake always seemed amateurish to me.

    • simianwords 8 hours ago
      I would like to challenge this claim. I think LLMs are maybe accurate enough that we don't need to check every line and remember everything. High level design is enough.
      • abathur 7 hours ago
        I've been tasked with doing a very superficial review of a codebase produced by an adult who purports to have decades of database/backend experience with the assistance of a well-known agent.

        While skimming tests for the python backend, I spotted the following:

            @patch.dict(os.environ, {"ENVIRONMENT": "production"})
            def test_settings_environment_from_env(self) -> None:
                """Test environment setting from env var."""
                from importlib import reload
        
                import app.config
        
                reload(app.config)
        
                # Settings should use env var
                assert os.environ.get("ENVIRONMENT") == "production"
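                # Smell: this only re-asserts the env var that was just patched;
                # the reloaded app.config settings are never actually checked.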
        
        This isn't an outlier. There are smells everywhere.
      • stuffn 8 hours ago
        I have plenty of anecdata that counters your anecdata.

        LLMs can generate code that works. That much is true. You can generate sufficiently complex projects that simply run on the first (or second try). You can even get the LLM to write tests for the code. You can prompt it for 100% test coverage and it will provide you exactly what you want.

        But that doesn't mean OP isn't correct. First, you shouldn't be remembering everything. If you find yourself remembering everything, your project is either small (I'd guess less than 1000 lines) or you are overburdened and need help. Reasoning, logically, through code you write can be done JIT as you're writing the code. LLMs even suffer from the same problem. Instead of calling it "having to remember too much" we refer to it as a quantity called "context window". The only problem is the LLM won't prompt you to tell you that its context window is so full it can't do its job properly. A human will.

        I think an engineer should always be reasoning about their code. They should be especially suspicious of LLM generated code. Maybe I'm alone but if I use an LLM to generate code I will review it and typically end up modifying it. I find even prompting with something like "the code you write should be maintainable by other engineers" doesn't produce good value.

      • newsoftheday 7 hours ago
        My jaw hit the table when I read that. Just checking here but, are you being serious?
  • Swannie 51 minutes ago
    Posted down thread, but worth posting as a comment too.

    I know Simon follows this "Issue First" style of work in his projects, with a strong requirement for passing tests to be included.

    It's been a best practice for a long time. I really enjoyed this when I read it ~10 years ago, and it still stands the test of time:

    https://rfc.zeromq.org/spec/42/#24-development-process

    The rationale was articulated clearly in:

    https://hintjens.gitbooks.io/social-architecture/content/cha...

    If you have time, do yourself a favour and read the whole lot. And then liberally copy parts of C4 into your own process. I have advocated for many components of it, in many contexts, at $employer, and will continue to do so.

  • doganugurlu 3 hours ago
    I don't test my code because I think it's my duty. I test it because my personal motivation is to see it working! What's the point of writing code if I don't even get to see it run?!

    If someone's not even interested and excited to see their code work, they are in the wrong profession.

  • dfxm12 11 hours ago
    there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

    Is anyone else seeing this in their orgs? I'm not...

    • 0x500x79 9 hours ago
      I am currently going through this with someone in our organization.

      Unfortunately, this person is vibe coding completely, and even the PR process is painful:

      * The coding agent reverts previously applied feedback
      * Coding agent not following standards throughout the code base
      * Coding agent re-inventing solutions that already exist
      * PR feedback is being responded to with agent output
      * 50k line PRs that required a 10-20 line change
      * Lack of testing (though there are some automated tests, their validations are slim/lacking)
      * Bad error handling/flow handling

      • nunez 6 hours ago
        > 50k line PRs that required a 10-20 line change

        This is hilarious. Not when you're the reviewer, of course, but as a bystander, this is expert-level enterprise-grade trolling.

      • LandR 9 hours ago
        Fire them?
        • JambalayaJimbo 16 minutes ago
          This is not really an option for your standard IC.
        • 0x500x79 9 hours ago
          I believe it is getting close to this. Things like this just take time, though, and when this person talks to management/leadership they talk about how much they are producing and how everyone is blocking their work. So it becomes challenging political maneuvering, depending on the ability of certain leadership to see through the BS.

          (By my organization, I meant my company - this person doesn't report to me or in my tree).

      • gardenhedge 4 hours ago
        Just reject the PR?
    • briliantbrandon 10 hours ago
      I'm seeing a little bit of this. However, I will add that the primary culprits are engineers that were submitting low quality PRs before they had access to LLMs, they can just submit them faster now.
      • lm28469 10 hours ago
        LLMs are tools that make mediocre devs 100x more "productive" and good devs 2x more productive
        • jennyholzer2 10 hours ago
          From my vantage point I would argue LLMs make good devs around 0.65x as productive
          • roblh 9 hours ago
            I think they make good devs 2x more productive for the first month, which then slowly declines as that good dev spends less time actually writing and understanding and debugging code until it falls well below the 1x mark. It’s basically a high interest loan people take against their own skills. For some people that loan might be worth it. Maybe they’re trying to change their role in an organization and need the boost to start taking up new responsibilities they want to own. I think it’s temporary though. The slow shift into “skim mode”, where the authors just don’t quite put that same amount of effort into understanding what’s being churned out. I dunno, that’s just what I’ve seen.
            • candiddevmike 8 hours ago
              Because when you're not writing the code there's a mental overhead that is arguably worse than when you are writing the code. No one is talking about this enough IMO, but that's why everyone is so exhausted when using LLMs and ends up just pulling the slot machine until it works, without actually reading it.

              Reading code sucks; it always has. The flow state we all crave is when the code is in our working memory as an understood construct and we're just translating our mental model to a programming language. You don't get that with LLMs. It devolves into programming minutiae equivalent to "a little to the left", but with the added complexity that "left" is hundreds of lines of code.

            • AstroBen 2 hours ago
              I really feel this myself.

              If I write home-grown organic code then I have no choice but to fully understand the problem. Using an LLM it's very easy to be lazy, at least in the short term

              Where does that get me after 3 months? I end up working on a codebase I barely understand. My own skills have degraded. It just gets worse the longer you go

              This is also coming from my experience in the best-case scenario: I enjoy coding and am working on something I care about the quality of. Lots of people don't even have that

          • coffeebeqn 3 hours ago
            I just spent a day trying to get Claude to write reasonable unit tests and then after sleeping on it, reverted everything and did it myself. I’m not gonna be using it for a while because it 0.5x’d me once again
          • dsego 9 hours ago
            I think on average a dev can be x percent more productive, but there is a best case and worst case scenario. Sometimes it's a shortcut to crank out a solution quickly, other times the LLM can spin you in circles and you lose the whole day in a loop where the LLM is fixing its own mistakes, and it would've been easier to just spend some time working it out yourself.
          • bluGill 10 hours ago
            Good devs are still learning how to use LLMs, and so are willing to accept the 0.65x once in a while. Any complex tool has a learning curve, and most tools improve over time. As such, good devs have either found how to use LLMs to make them more productive (probably not 10x, but even 1.1x is something), or they try them again every few months to see if things are better.
            • jennyholzer2 9 hours ago
              you are bending over backwards to figure out how to put "1.1x" in your comment

              the idea that LLMs make developers more productive is delusional.

              • simonw 9 hours ago
                Hi, delusional developer reporting for duty here.
                • Avicebron 8 hours ago
                  How are you measuring productivity these days Simon? Do you have a boss that has certain expectations? If you don't hit those are you going to lose your house?
                  • simonw 8 hours ago
                    I work for myself, so mainly through guilt and self-doubt.
                    • wiml 4 hours ago
                      One of the things LLMs are demonstrably good at is eliminating self-doubt. That's why they're so disastrous.
          • square_usual 9 hours ago
            Yep, that's why very accomplished, widely regarded developers like Mitchell Hashimoto and Antirez use them. They need to make programming more challenging to keep it fun.
            • jennyholzer2 9 hours ago
              developers or cult leaders
              • swah 6 hours ago
                Mitchell shares the Amp threads on how he delivered some smaller features/fixes.
        • chasd00 3 hours ago
          LLMs are great at spewing content, and code is a form of "content". I think what we're seeing is software development turning into YouTube: content creators cranking out content, some of it great, most of it meh, a lot of it really bad. I do find it all a bit funny and ironic. My wife was a journalist and bemoaned news blogs and social media for terrible, terrible writing that claimed to be journalism. She would tell me about how much work quality journalism is, all the mistakes these bloggers and social media posters make, and how detrimental it was to society at large, blah blah blah.

          Now the power to create tons and tons of code (i.e. content) is in the hands of everyone, and here we are complaining about it just like my wife used to complain about journalism. I think the myth of the highly regarded Software Developer, perched in front of the warming glow of a screen solving and automating critical problems, is coming to an end. Deservedly, really; there's nothing more special about typing words into an editor than, say, framing a house. The novelty is over.

        • lunar_mycroft 8 hours ago
          [citation needed]. No study I've seen shows even a 50% productivity improvement for programming, let alone a 100% or 9,900% improvement.
      • dfxm12 10 hours ago
        What's the ratio of people who do things the right way vs. not? I mean, is it a matter of giving them feedback to remind them what a "quality PR" is? Does that help?
        • briliantbrandon 10 hours ago
          It's roughly 1/10 that are causing issues. Not a huge deal but dealing with them inevitably takes up a couple hours a week. We also have a codebase that is shared with some other teams and our primary offenders are on one of those separate teams.

          I think this is largely an issue that can be solved culturally within a team, we just unfortunately only have so much input on how other teams work. It doesn't help either when their manager doesn't seem to care about the feedback... Corporate politics are fun.

          • dfxm12 10 hours ago
            Yeah, I mean to get back to the original statement in the blog, this seems like less of a tech issue and more of a culture issue. The LLM enables the junior to do this once. It's the team culture that allows them to continue doing it.
        • jennyholzer2 10 hours ago
          LLMs have dramatically empowered sociopath software developers.

          If you are sufficiently motivated to appear more "productive" than your coworkers, you can force them to review thousands of lines of incorrect AI slop code while you sit back and mess around with your chatbots.

          Your coworkers no longer have enough time to work on their in-progress PRs, so you can dominate the development team in terms of LOC shipped.

          Understand that sociopaths are skilled at navigating social and bureaucratic environments. A sociopath who ships the most LOC will get the promotion every single time.

          • andy99 10 hours ago
            Only if leadership lets them. Right now (anecdotally) a lot of "leaders" don't understand the difference between AI-generated and human-generated work, and just treat LOC as productivity, so all the incentives favor AI coding. But that will change.
            • heliumtera 10 hours ago
              It will never change. Managers will consider every stupid metric players push to sell their solutions, be it code coverage, extensive CI/CD pipelines with useless steps, or "productivity gains" from gen tools. The gen-tool euphoria is stupid and will cease to exist, but before this it was BDD, TDD, DDD, test before, test after, test your mocks, transpile to a different language and then ignore the output, code maturity, best practices, OOP, pants-on-head-oriented programming... There is always something stupid on the horizon; this is certainly not the last stupid craze
    • zx2c4 10 hours ago
      Voila:

      https://github.com/WireGuard/wireguard-android/pull/82 https://github.com/WireGuard/wireguard-android/pull/80

      In that first one, the double pasted AI retort in the last comment is pretty wild. In both of these, look at the actual "files changed" tab for the wtf.

      • newsoftheday 7 hours ago
        That's a good example of what we're seeing as leads, thanks.
      • IshKebab 9 hours ago
        Yeah, this guy's comment here is spot on: https://github.com/WireGuard/wireguard-android/pull/80#issue...

        I recently reviewed a PR that I suspect is AI generated. It added a function that doesn't appear to be called from anywhere.

        It's shit because AI is absolutely not on the level of a good developer yet. So it changes the expectation. If a PR is not AI generated then there is a reasonable expectation that a vaguely competent human has actually thought about it. If it's AI generated then the expectation is that they didn't really think about it at all and are just hoping the AI got it right (which it very often doesn't). It's rude because you're essentially pawning off work that the author should have done onto the reviewer.

        Obviously not everyone dumps raw AI generated code straight into a PR, so I don't have any problem with using AI in general. But if I can tell that your code is AI generated (as you easily can in the cases you linked), then you've definitely done it wrong.

    • fnands 10 hours ago
      A friend of mine is working for a small-ish startup (11 people), and he gets to work and sees that the CTO pushed 10k LOC of changes straight to main at 3 am.

      Probs fine when you are still in the exploration phase of a startup, scary once you get to some kind of stability

      • ryandrake 10 hours ago
        I feel like this becomes kind of unacceptable as soon as you take on your first developer employee. 10K LOC changes from the CTO is fine when it's only the CTO working on the project.

        Hell, for my hobby projects, I try to keep individual commits under 50-100 lines of code.

        • bonesss 6 hours ago
          Templates and templating languages are still a thing. Source generators are a thing. Languages that support macros exist. Metaprogramming is always an option. Systems that write systems…

          If these AIs are so smart, why the giant LOCs?

          Sure, it’s cheaper today than yesterday to write out boilerplate, but programming is about eliminating boilerplate and using more powerful abstractions. It’s easy to save time doing lots of repetitive nonsense; stopping the nonsense should be the point.
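
          For instance, a minimal Python sketch (the field names are hypothetical) of the kind of metaprogramming that replaces pages of generated boilerplate:

            FIELDS = ["name", "email", "phone"]

            def make_getter(field):
                # One factory replaces dozens of copy-pasted,
                # near-identical function definitions.
                def getter(record):
                    return record.get(field, "")
                getter.__name__ = f"get_{field}"
                return getter

            getters = {f: make_getter(f) for f in FIELDS}

            record = {"name": "Ada", "email": "ada@example.com"}
            print(getters["email"](record))  # -> ada@example.com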

      • coffeebeqn 3 hours ago
        I worked with a “CTO” who did that before LLMs - one of the worst jobs I have had in the last 10 years. I spent at least 50% of my time putting out fires or refactoring his garbage code
      • peab 9 hours ago
        Lol I worked at a startup where the CTO did this. The problem was that it was pure spaghetti code. It was so bad it kept me up at night, thinking about how to fix things. I left within 30 days
      • tossandthrow 10 hours ago
        The CTO is ultimately responsible for the outcome and will be there at 4am to fix stuff.
        • pjc50 9 hours ago
          Yes .. and no. Someone who does this will definitely make the staff clean up after them.
      • jimbohn 9 hours ago
        I'd go mental if I was a SWE having to mop that up later
      • titzer 10 hours ago
        That's...idiotic.
        • jennyholzer2 10 hours ago
          [flagged]
          • titzer 10 hours ago
            I mean, I've vibe-coded a few useful single-file HTML tools, but checking in 10kloc at 3am into the production database...by the CTO...omg.
          • 204957065897 10 hours ago
            [flagged]
    • davey48016 10 hours ago
      A friend of mine has a junior engineer who does this and then responds to questions like "Why did you do X?" with "I didn't, Claude did, I don't know why".
      • tossandthrow 10 hours ago
        That would be an immediate reason for termination in my book.
        • fennecfoxy 10 hours ago
          Yes, if they can't debug + fix the reason the production system is down or not working correctly then they're not doing their job, imo.

          Developers aren't hired to write code that's never run (at least in my opinion). We're also responsible for running the code/keeping it running.

      • jennyholzer2 10 hours ago
        no hate but i would try to fire someone for saying that
      • Ekaros 9 hours ago
        I think the words that would follow from me would get me sent to HR...

        And if it was repeated... Well I would probably get fired...

      • insin 8 hours ago
        See also "Why did you do X?" → Flurry of new commits → Conversation marked as resolved

        And not just from juniors

      • gardenhedge 4 hours ago
        Some other comments suggest immediately firing, but a junior engineer needs to be mentored. It should be explained to them clearly that they need to understand the changes they have made. They should also be pointed towards the coding standards and SDLC documentation. If they refuse to change their ways, then firing makes sense.
    • JambalayaJimbo 16 minutes ago
      I’ve been seeing obviously LLM generated PRs, but not huge ones.
    • stackskipton 10 hours ago
      Yep. Remember, people not posting on this website are just grinding away at jobs where their individual output does not matter, and their entire motivation is to work JUST hard enough not to get fired. They don't get stock grants, extremely favorable stock options, or anything else; they get salary and MAYBE a small bonus based on business factors they have little control over.

      My eyes were opened when, 2 jobs ago, they said they would be blocking all personal web browsing from work computers. Multiple software devs were unhappy because they were using their work laptops for booking flights, dealing with their kids' school stuff, and other personal things. They did not have personal computers at all.

      • nutjob2 8 hours ago
        They don't have phones?
        • stackskipton 6 hours ago
          They do, but obviously a laptop is easier than doing it on their phone. That's what most of them ended up doing.
    • mrkeen 7 hours ago
      I don't see most PRs because they happen in other teams, but I am part of a Slack channel where there are too many "oops" messages for my liking.

      I.e. 1-2 times a month, there's an SQL script posted that will be run against prod to "hopefully fix data for all customers who were put into a bad state from a previous code release".

      The person who posts this type of message most often is also the one running internal demos of the latest AI flows and trying to get everyone else onboard.

    • peab 9 hours ago
      Definitely seeing a bit of this, but it isn't constrained to junior devs. It's also pretty solvable by explaining to the person why it's not great, and just updating team norms.
    • kaffekaka 10 hours ago
      I thought we were not, but we had just been lucky. A sequence of events lately have shown that the struggle is real. This was not a junior developer though, but an experienced one. Experience does not equal skill, evidently.
    • Yodel0914 1 hour ago
      Not so much the huge PRs, but definitely the LLM generated code that the “developer” doesn’t understand.
    • jennyholzer2 10 hours ago
      i left my last job because this was endemic
    • iamflimflam1 9 hours ago
      I'm seeing it on some open source projects I maintain. Recently had 10 or so PRs come in. All very valid features - but from looking at them, not actually tested.
    • zahlman 10 hours ago
      Quite a few FOSS maintainers have been speaking up about it.
    • nbaugh1 9 hours ago
      Not at all. Submitting untested PRs is wildly outside my experience. Having tests written to cover your code is a prerequisite for having your PR reviewed on our team. "Does it work", aka passing manual testing, is literally the bare minimum before submitting a PR
      • ncruces 8 hours ago
        If it's all vibe coded, how do you know — without review — that the new tests, for a new feature, test anything useful at all?
    • bluGill 10 hours ago
      It isn't only junior engineers; it is a small number of people at all levels.

      People do what they think they will be rewarded for. When you think your job is to write a lot of code, then LLMs are great. When you need quality code, you start to ask whether LLMs are better or not.

    • eudamoniac 9 hours ago
      I started seeing it from a particularly poor developer sometime last year. I was the only reviewer for him so I saw all of his PRs. He refused to stop despite my polite and then not so polite admonishments, and was soon fired for it.
    • nunez 6 hours ago
      I feel like a story about some open-source project getting (and rejecting) mammoth-sized PRs hits HN every week!
    • ncruces 8 hours ago
      Yes, in the only successful OSS project that I “maintain.”

      Fully vibe coded, which at least they admitted. And when I pointed out the thing is off by an order of magnitude, and as such doesn't implement said feature — at all — we get pressed on our AI policy, so as to not waste their time.

      I don't have an AI policy, like I don't have an IDE policy, but things get ridiculous fast with vibe coding.

    • neutronicus 9 hours ago
      I'm not either

      But LLMs don't really perform well enough on our codebase to allow you to generate things that even appear to work. And I'm the most junior member of my team at 37 years of age, hired in 2019.

      I really tried to follow the mandate from on high to use Copilot, but the Agent mode can't even write code that compiles with the tools available to it.

      Luckily I hooked it up to gptel, so in Emacs I can at least ask it quick questions about big functions I don't want to read.

      • notpachet 1 hour ago
        > And I'm the most junior member of my team at 37 years of age

        This sounds fucking awesome.

    • hexbin010 10 hours ago
      Similar, at my last job. And the pushback was greater because super duper clever AI helped write it, who obviously knows more than any other senior engineer could know, so they were expecting an immediate PR approval and got all uppity when you tried to suggest changes.
      • endemic 9 hours ago
        Hah! I've been trying to push back on this sort of thought. The bot writes code for you, not you for the bot.
    • bdangubic 9 hours ago
      the first time we see this there would be a warning; the second one is a pink slip
    • x3n0ph3n3 10 hours ago
      It's been a struggle with a few teammates that we are trying to solve through explicit policy, feedback, and ultimately management action.
      • dfxm12 10 hours ago
        Yeah, a slice of this is technology related, but it's really a policy issue. It's probably easier to manage with a tighter team. Maybe I'm taking team size for granted.
    • wizzwizz4 11 hours ago
      It's not a new phenomenon. Time was, people would copy-paste from blog posts with the same effect.
      • lm28469 10 hours ago
        Always the same old tiring "this has always been possible before in some remotely similar fashion hence we should not criticise anything ever again" argument.

        You could intuitively think it's just a difference of degree, but it's more akin to a difference of kind. Same for a nuke vs. a spear: both are weapons, but no one argues they're similar enough that we can treat them the same way

      • nunez 6 hours ago
        Yeah, but being able to produce nuclear-sized 10k+ LOC PRs to open-source projects in minutes with essentially zero effort definitely is. At least you had to use your brain to know which blog posts/SO answers to copypasta from.
      • evilduck 10 hours ago
        I would bet in most organizations you can find a copy-pasted version of the top SO answer for email regex in their language of choice, and if you chase down the original commit author they couldn't explain how it works.
        • 1-more 9 hours ago
          I think it's impossible to actually write an email regex, because addresses can have arbitrarily deeply nested escaping. I may have that wrong. I'd hope that regex would be .+@.+ and that's it (watch me get Cunninghammed because there is some valid address wherein those pluses should be stars).
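
          For concreteness, that minimal check as a Python sketch:

            import re

            # Something, an @, something. Anything stricter risks
            # rejecting valid addresses; the real validation is
            # sending a confirmation mail.
            EMAIL_RE = re.compile(r".+@.+")

            print(bool(EMAIL_RE.fullmatch("ada@example.com")))  # True
            print(bool(EMAIL_RE.fullmatch("not-an-email")))     # False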
      • troyvit 10 hours ago
        I used to do that in simpler days. I'd add a link to where I copied it from so we could reference it if there were problems. This was for relatively small projects with just a few people.
        • jennyholzer2 9 hours ago
          > I'd add a link to where I copied it from

          LLMs can't do this.

          Your code is unambiguously better than any LLM code if you can comment a link to the stackoverflow post you copied it from.

          • newsoftheday 7 hours ago
            Agreed on the first part for sure, since an LLM is the computer/software version of a blender.

            So I agree with the second part too, then.

          • lcnPylGDnU4H9OF 8 hours ago
            > Your code is unambiguously better than any LLM code if you can comment a link to the stackoverflow post you copied it from.

            This is not a truism. "My" code might come from an LLM, and that's fine if I can be reasonably confident it works. I might gain that confidence by testing the code and reading it to understand what it's doing. The same is true of blog-post code, regardless of how I refer to it; if I link to the blog post, it's because it does a better job of explaining than I ever could in code comments. Whether LLMs make one more productive is hard to measure, but that seems beside the point here.

            The point is, including the code is a choice and one should be mindful of it, no matter the code's origin. At that point, this comes off like you just have something to prove; there doesn't seem to be a reason not to use the LLM code if you know it works and you know why it works.

            • wizzwizz4 39 minutes ago
              Believing you know how it works and why it works is not the same as that actually being the case. If the code has no author (in that it's been plagiarised by a statistical process that introduces errors), there's nowhere to go if you realise "oops, I didn't understand that as well as I had thought!".
      • bgwalter 9 hours ago
        I don't see the problem with fentanyl given that people have been using caffeine forever.
  • trevor-e 9 hours ago
    > Your job is to deliver code you have proven to work.

    Strong disagree here: your job is to deliver solutions that help the business solve a problem. In _most_ cases that means delivering code that you should be able to confidently prove satisfies the requirements like the OP mentioned, but I think this is an important nitpick of a distinction that I didn't understand until later on in my career.

    • newsoftheday 8 hours ago
      There's no distinction there, proving the work is correct is within the scope of helping the business solve a problem; not without and not beside it. So your point is hot air, making a distinction where none exists.
      • casey2 1 hour ago
        The distinction does matter. Requirements can be, and often are, wrong when the rubber meets the road. If the cost of implementing requirements correctly is $1,000,000 and the value of the product is $1,000, then even choosing to clarify requirements has failed the business. The non-failure mode here is to write the code for less than $1,000, even if the code doesn't work at all!
    • tech-ninja 7 hours ago
      > In _most_ cases that means delivering code that you should be able to confidently prove satisfies the requirements like the OP mentioned

      That is an insane distinction you are trying to make there. In which cases would delivering code that doesn't satisfy the requirements solve a business problem?

      • adrianmonk 5 hours ago
        I will make up some numbers for the sake of illustration. Suppose it takes you half as long to develop code if you skip the part where you make sure it works. And suppose that when you do this, 75% of the time it does work well enough to achieve its goal.

        So then, in a month you can either develop 10 features that definitely work or 20 features that have a 75% chance of working. Which one of these delivers more value to your business?
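
        (To make the arithmetic concrete: in pure expected-value terms, 20 × 0.75 = 15 working features versus 10, so the sloppy approach wins on average if, and only if, the broken 25% cost nothing.)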

        That depends on a lot of things, like the severity of the consequences for incorrect software, the increased chaos of not knowing what works and what doesn't, the value of the features on the list, and the morale hit from slowly driving your software engineers insane vs. allowing them to have a modicum of pride in their work.

        Because it's so complex, and because you don't even have access to all the information, it's hard to actually say which approach delivers more value to the business. But I'm sure it goes one way some of the time and the other way other times.

        I definitely prefer producing software that I know works, but I don't think that it's an absurd idea the other way delivers more business value in certain cases.

      • claar 7 hours ago
        IMO, it's not insane at all - I didn't understand this until I moved from developer to executive.

        Solving the business need has precedence over technical correctness.

        Satisfying "what I think the requirements are" without considering the business need causes most code rework in my experience.

      • p2detar 4 hours ago
        I didn't read it this way, but I admit the comment is somewhat vague. I thought the GP meant that not all solutions require delivering code. In my job, when I get a support inquiry, I first try to think about what exactly the customer's end goal is. Often support didn't grasp what the customer's real pain is. Some solutions reduce to pointing them at some unusual workflow that solves their problem. That way I don't need to touch any code at all.
      • trevor-e 2 hours ago
        No, what I meant is sometimes the solution is not delivering any code at all.

        Many times in my career, after understanding the problem at hand and who initiated it, I realized the solution is actually one of:

        1) a people/organizational problem, not a technical one
        2) it doesn't make sense to code a complicated system when it could be a simple Google Sheet
        3) the person actually has a completely different problem
        4) we already have a solution they didn't know about

        My issue with the OP is that it highly emphasizes delivering code. We are not meant to be code monkeys; we are solving problems at the end of the day. Many people I've met throughout my career forget that and immediately jump into writing code because they think that's their job.

      • nrhrjrjrjtntbt 6 hours ago
        When you can ship Redis, for example, instead of rolling your own cache.
      • theshrike79 4 hours ago
        Not all problems are "work" vs "doesn't work".

        We're not talking about making a calculator that can't calculate 1+1. This might be a website that's a bit slow and janky to use.

        25% of users go away because it's shit, but 75% stay. And it would've been too much effort to push the jank to zero and retain 100%.

        A website that takes juuuust too long to load still "satisfies requirements" in most cases, especially when making loading instant carries a significant extra cost the customer isn't willing to pay for.

    • antod 6 hours ago
      It's more than just solving a problem though, you should be solving the given problem without creating new problems. This is where the working/secure code aspect comes in.
    • sharkjacobs 6 hours ago
      Maybe I'm not late enough in my career to understand what you're saying, but what kind of problems are you helping the business solve with code that hasn't been proven to work?
      • trevor-e 2 hours ago
        Sorry I wrote that hastily and my wording seems to have caused much confusion. Here's a rewrite:

        > The job is to help the business solve a problem, not just to ship code. In cases where delivering code actually makes sense, then yeah you should absolutely be able to prove it works and meets the requirements like the OP says. But there are plenty of cases where writing code at all is the wrong solution, and that’s an important distinction I didn’t really understand until later in my career.

        Although funnily enough, the meaning you interpreted also has its own merit. Like other commenters have mentioned, there's always a cost tradeoff to evaluate. Some projects can absolutely cut corners to, say, ship faster to validate some result or gain users.

      • SoftTalker 5 hours ago
        Getting a big customer to pay for a product that your sales team said could do X, Y, and Z but Y wasn't part of the product and now you need some plausible semblance of Y added so that you can send an invoice. If it doesn't work, that can be addressed later.
      • ambicapter 4 hours ago
        Getting a big sale by hacking together a demo that wouldn't scale up in the slightest without a complete rework of your backend.
    • nrhrjrjrjtntbt 6 hours ago
      > Strong disagree here, your job is to deliver solutions that help the business solve a problem.

      Sure. That is every job though. It is interesting to muse on. Hard for us to solve a problem without a computer (or removing one!)

      • gardenhedge 5 hours ago
        Yeah, more correctly, the description should be:

        "Your job is to deliver technical solutions that help the business solve a problem"

        Where the word technical does the work of representing your skill set. That means you won't be asked to create a marketing campaign (solution) to help the business sell more product (problem).

    • draw_down 8 hours ago
      Man we better hope the solution to that problem is working code. Otherwise we better start working the fryers or something.
  • mapontosevenths 10 hours ago
    I agree with this, except it glosses over security. Your job is to deliver SECURE code that you have proven to work.

    Manual and automatic testing are still both required, but you must explicitly ensure that security considerations are included in those tests.

    The LLM doesn't care. Caring is YOUR job.

  • andy99 11 hours ago
    I think the problem is in what "proven" means. People who don't know any better will just do all of that with LLMs and still deliver the giant untested PRs, just with some LLM-written tests attached.

    I vibe code a lot of stuff for myself, mostly for viewing data, when I don’t really need to care how it works. I’m coming around to the idea that outside of some specific circumstances where everyone has agreed they don’t need to care about or understand the code, team vibe coding is a bad practice.

    If I’m paying an engineer, it’s for their work, unless explicitly agreed otherwise.

    I think vibe coding is soon going to be seen the same way as “research” where you engage an offshore team (common e.g. in consulting) to give you a rundown on some topic and get back the first five google search results. Everyone knows how to do that, if it’s what they wanted they wouldn’t be hiring someone to do it.

    • simonw 10 hours ago
      That's why I emphasized the manual testing component as well. Attaching a screenshot or video of a feature working to your PR is a great way to prove that you've actually seen it work correctly - at least once, which is still a huge improvement over it not actually working at all.
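
      For web features, a minimal sketch of capturing that evidence with Playwright's Python API (the URL and output path here are placeholders):

        from playwright.sync_api import sync_playwright

        # Capture a "proof it works" screenshot to attach to the PR.
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto("http://localhost:8000/new-feature")
            page.screenshot(path="pr-evidence/new-feature.png")
            browser.close()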
      • Nizoss 8 hours ago
        Yes! This is something that I also value. Having demo GIFs of before and after helps a lot. I have encountered situations where what I thought was a minor finishing clean-up had an effect that I didn't anticipate. By including demos in the PR, it becomes a kind of guardrail against those situations for me. I also think it's neat and generally helpful for everyone.
  • JoeAltmaier 10 hours ago
    The job, in the modern world, is to close tickets. The code quality is negotiable, because the entire automated software process doesn't measure code quality, just statistics.

    That's why I refuse to take part in it. But I'm an old-world craftsman by now, and I understand nobody wants to pay for working, well-thought-out code any more. They don't want a Chesterfield; they want plywood and glue.

    • whattheheckheck 9 hours ago
      I woke up and had a thought: software engineering isn't a serious engineering field if it actually fully shipped LLMs and expects everyone to use them. What do you expect, quality-wise, from a profession that says this is okay?
      • AlienRobot 9 hours ago
        Imagine if normal engineering did that. Engineers invent a "blobby" thing that glues things together. It has amazing properties that increase productivity, but sometimes it just stops working for some reason and comes off. It's totally random, and because of how blobby is produced there is no way to tell when it's going to work or not, unlike with a typical material. Anyway, we're going to use blobby to build everything from schools to bridges to airplanes now.
        • redwall_hp 5 hours ago
          And, don't forget, software makes its way into airplanes and medical equipment and such, and has directly killed people. Therac and Boeing come to mind.

          I'm starting to be in favor of professional licensing for software engineering.

    • gadflyinyoureye 10 hours ago
      What do you do, O modern Luddite? Do you work for yourself making a product that people use? Are you on the government teat?
      • JoeAltmaier 7 hours ago
        Retired
        • johnea 3 hours ago
          > Retired

          You and me both, and for many of the same reasons.

          I would point out that in your OP's comment, Luddites get the stereotypical dismissal as anti-tech, which is far, far from the reality of demanding good conditions for workers.

          For the modern s/w engineer, being granted the time and resources for adequate testing could be considered a "worker's rights" issue. In that context the Luddite allegation could be accurate.

          My comment is largely along the same lines:

          https://news.ycombinator.com/item?id=46313297#46319510

  • ChrisMarshallNY 4 hours ago
    In my last job (engineering manager for a Japanese high-Quality hardware manufacturer), we were expected to deliver software that works.

    In fact, if any bugs were found by the official "last step" QA Department, we (as a software development department) were dinged. If QA found bugs, they could stop the entire product release, so you did not want to be responsible for that.

    This resulted in each software development department setting up their own, internal "QC team" of testers. If they found bugs, then individual programmers (or teams) would get dinged, but the main department would not.

    Our software got a lot of testing.

    • jobs_throwaway 3 hours ago
      What does 'get dinged' mean? It seems like this would lead to a strong incentive against making any changes, lest you introduce bugs, perhaps due to no fault of your own.
      • ChrisMarshallNY 3 hours ago
        Yup. "Get dinged" was usually some kind of reprimand. It could go from being yelled at by the department manager, to being fired as a department.

        And yes. It was a strong disincentive to making changes.

        I didn't really like it, but our software did do what it said on the tin (which wasn't always ideal).

  • acituan 7 hours ago
    First problem is turning engineers into accountability sinks. This was a problem before LLMs too, but it is now a much bigger, structural problem with the democratization of the capacity to produce plausible-looking dumb code. You will be forced to underwrite more and more of that, and expected to absorb the downsides.

    The root cause is the second problem: short of formal verification, you can never exhaustively prove that your code works. You can demonstrate it, and automate that demonstration, for a sensible subset of inputs and states, and hope the state of the world approximately stays that way (spoiler: it won't). This is why 100% test coverage is in most cases a bad sign. This is why "sensible" is the key operative attitude, which LLMs suck at right now.

    The root cause of that one is the third problem: your job is to solve a business problem. If your code is not helping with the business problem, it is not actually working in the literal sense of the word. It is an artifact that does a thing, but it is not doing work. And since you're downstream of all the self-contradicting, ever-changing requirements in a biased framing of a chaotic world, you can never prove or demonstrate that your code solves a business problem, and that is the end state.

  • vcarrico 4 hours ago
    > the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

    I'm noticing something else very similar, not necessarily involving junior roles: long messages written with these AI assistants that summarize stuff, create follow-ups, etc., putting an additional burden on whoever needs to read them. It makes me think of the quote: "I would have written a shorter letter, but I didn't have the time."

  • agentultra 10 hours ago
    There’s an anecdote from one of Dijkstra’s essays that strikes at the heart of this phenomenon. I’ll paraphrase because I can’t remember the exact EWD number off the top of my head.

    A colleague was working on an important subsystem and would ask Dijkstra for a review when he thought it was ready. Dijkstra would have to stop what he was doing, analyze the code, and would find a grievous error or edge case. He would point it out to the colleague, who would then get back to work. The colleague would submit his code for review again, and this could carry on enough times that Dijkstra got annoyed.

    Dijkstra proposed a solution. His colleague would have to submit, with his code, some form of proof or argument as to why it was correct and ready to merge. That way Dijkstra could save time by only having to review the argument and not all of the code.

    There’s a way of looking at LLM output as Dijkstra’s colleague. It puts a lot of burden on the human using this tool to review all of the code. I like Doctorow’s mental model of a reverse centaur. The LLM cannot reason and so won’t provide you with a sound argument. It can probably tell you what it did and summarize the code changes it made… but it can’t decide to merge those changes. It needs a human, the bottom half of the centaur, to do the last bit of work here. Because that’s all we’re doing when we let these tools do most of the work for us: we’re here to take the blame.

    And all it takes is an implementation of what we’re trying to build already, every open source library ever, all of SO, a GW of power from a methane power plant, an Olympic pool of water and all of your time reviewing the code it generates.

    At the end of the day it’s on you to prove why your changes and contributions should be merged. That’s a lot of work! But there are no shortcuts. Luckily, you can reason while the LLMs struggle with that, so use that advantage when choosing to use such tools.

    • newsoftheday 6 hours ago
      Princess Bride, small excerpt: "Vizzini: You'd like to think that, wouldn't you? You've beaten my giant, which means you're exceptionally strong, so you could've put the poison in your own goblet, trusting on your strength to save you, so I can clearly not choose the wine in front of you." And the dialog goes on, with Vizzini dominating it and arguing with himself. In the end, it came down to a coin toss: he picked up a goblet, drank, and died.

      Anyone who allows a 10K-LOC LLM-generated PR to be merged without reviewing every single line is doing the same thing: taking a coin toss.

      • agentultra 3 hours ago
        It's worse than it was for Vizzini; the LLM doesn't even care about the truth. It was trained on all of the sloppy code we could find. Even if he read every line of code, he could miss the non-obvious bugs and expire anyway when management gets wind that it was his LLM-generated code that led to the PII breach which cost them 10% of their share value in a week.

        At least a liar is trying to deceive you. Vizzini’s entire exercise is moot.

  • keeda 3 hours ago
    The way I would phrase it is: software engineering is the craft of delivering the right code at the right time, where "right code" means it can be trusted to do the "right thing."

    A bit clunky, but I think that can be scaled from individual lines of code to features or entire systems, whatever you are responsible for delivering, and encompasses all the processes that go around figuring out what code is to be actually written and making sure it does what it's supposed to.

    Trust and accountability are absolutely a critical aspect of software engineering and the code we deliver. Somehow that is missed in all the discussions around AI-based coding.

    The whole phenomenon of AI "workslop" is not a problem with AI, it's a problem with lack of accountability. Ironically, blaming workslop on AI rather than organizational dysfunction is yet another instance of shirking accountability!

  • 0xbadcafebee 9 hours ago
    Actually it's more specific than that. A company pays you not just to "write code", not just to "write code that works", but to write code that works in the real world. Not on your laptop. Not in CI tests. Not on some staging environment. But in the real world. It may work fine in a theoretical environment, but deflate like a popped balloon in production. This code has no value to the business; they don't pay you to ship popped balloons.

    Therefore you must verify it works as intended in the real world. This means not shipping code and hoping for the best, but checking that it actually does the right thing in production. And on top of that, you have to verify that it hasn't caused a regression in something else in production.

    You could try to do that with tests, but tests aren't always feasible. Therefore it's important to design fail-safes into your code that ALERT YOU to unexpected or erroneous conditions. It needs to do more than just log an error to some logging system you never check - you must actually be notified of it, and you should consider it a flaw in your work, like a defective pair of Nikes on an assembly line. Some kind of plumbing must exist to take these error logs (or metrics, traces, whatever) and send them to you. Otherwise you end up producing a defective product, but never know it, because there's nothing in place to tell you its flaws.
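
    As one minimal sketch of that plumbing in Python (the webhook URL is a placeholder; a real setup would add batching and rate limiting), a logging handler that forwards ERROR records somewhere a human will actually see them:

      import json
      import logging
      import urllib.request

      class WebhookAlertHandler(logging.Handler):
          # Forwards each ERROR-level record to a chat webhook.
          def emit(self, record):
              payload = json.dumps({"text": self.format(record)}).encode()
              req = urllib.request.Request(
                  "https://chat.example.com/hooks/alerts",  # placeholder
                  data=payload,
                  headers={"Content-Type": "application/json"},
              )
              try:
                  urllib.request.urlopen(req, timeout=5)
              except OSError:
                  pass  # never let alerting take the app down

      logging.getLogger().addHandler(WebhookAlertHandler(level=logging.ERROR))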

    Every single day I run into somebody's broken webapp or mobile app. Not only do the authors have no idea (either because they aren't notified of the errors, or don't care about them), there is no way for me to even e-mail the devs to tell them. I try to go through customer support, a chat agent, anything, and even they don't have a way to send in bug reports. They've insulated themselves from the knowledge of their own failures.

    • cynicalsecurity 9 hours ago
      It gets interesting when a company assigns 2 story points to a task that requires 6 minimum. No time for writing tests, barely any time to perform code reviews and QA. Also, next year the company tells you that since we have AI now, all tickets must be done twice as fast.

      Who popped this balloon? I know I need to change my employer, but it's not so easy. And I'm not sure another employer is going to be any better.

      • mystifyingpoi 7 hours ago
        Classic butchering of an otherwise decent Scrum idea. If assigning 2 points means no tests, then you are already using story points wrong, and complaining about it is meaningless.
      • roryirvine 8 hours ago
        Are you not involved in doing the estimation?
        • theshrike79 4 hours ago
          Me: "Boss, this takes at least 4 weeks to complete properly including QA time."

          Boss: sucks in air through his teeth "Best I can do is one week. Get to it."

          Me, with a massive mortgage and the job market is shit: "Rogerroger, bossman"

        • asadotzler 7 hours ago
          involved in is meaningless. If 10 people at a table all offer their inputs, it doesn't matter if mine is the rational one, or even if I hedged against their irrationality with an inflated estimate; the 9 other estimates will dominate. That's the whole problem here: a lack of autonomy and a lack of expectations of responsibility. Make the developer responsible for the estimate, and hold them accountable for the results. Letting the organization make the estimate and then blaming an LLM for the failure is a recipe for company collapse.
  • 0x500x79 9 hours ago
    I think there are two other things missing: security and maintainability. Working code that can never be touched again by a developer, or that requires an excessive amount of time to maintain, is not part of a developer's job either.

    Overall, this hits the nail on the head about not delivering broken code and providing automated tests. Thanks for putting your thoughts on paper.

  • nzoschke 10 hours ago
    Having the coding agent make screenshots is a big power up.

    I’m experimenting with how to get these into a PR, and the “gh” CLI tool is helpful.

    Does anyone have a recipe to get a coding agent to record video of webflows?

    • simonw 9 hours ago
      Not yet. I'm confident Playwright will be involved in the answer; it has good video recording features.
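
      As a minimal sketch (assuming Playwright's Python API; the URL and selector are placeholders), recording a web flow to a video file looks something like:

        from playwright.sync_api import sync_playwright

        with sync_playwright() as p:
            browser = p.chromium.launch()
            # Any page opened in this context is recorded to videos/.
            context = browser.new_context(record_video_dir="videos/")
            page = context.new_page()
            page.goto("http://localhost:8000/checkout-flow")
            page.click("text=Buy now")  # exercise the flow
            context.close()  # finalizes the .webm recording
            browser.close()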
  • a24venka 4 hours ago
    There is a heavy emphasis on testing the code as the way to provide guarantees that it works. While this is a helpful tool, I often find that the best engineers are ones who take a more first-principles approach to the code and can reason about why the solution is comprehensive (covers all edge cases) and clean (easy for humans and LLMs to build on).

    It often takes discipline to think and completely map out solutions before you build. This is where experience and knowing common patterns can also help.

    When you have the experience of having manually written or read a lot of code, it helps you at the very least quickly understand what the LLMs are writing, and reason about it later even if not at the beginning.

  • holtkam2 5 hours ago
    I’d go further: it’s not enough to be able to prove that your code works. It’s required that you also understand why it works.

    Otherwise you’ll end up in situations where it passes all test cases yet fails for something unexpected in the real world, and you don’t know why, because you don’t even know what’s going on under the hood.

  • onion2k 9 hours ago
    I want to distill this post into some sort of liquid I can inject directly into my dev teams. It's absolutely spot on. Seeing a PR with a change that doesn't build is one of the most disappointing things.
    • ericmcer 8 hours ago
      The requirements in this article are... the bare minimum for a PR. Like, yeah, "it needs to work" is the no-duh requirement. I have seen tons of PRs that work but defy conventions or add a bunch of useless cruft that we can rip out once I sit down and talk with the author about what they did.

      When someone pings me for a review and their code isn't even passing CI builds/tests, I just let them know it's failing and don't even look at a line of their code.

      • simonw 8 hours ago
        > The requirements in this article are... the bare minimum for a PR.

        Yeah, I'm a bit sad I felt the need to write this to be honest.

  • cyrialize 8 hours ago
    When I start working on a ticket I actually start writing up a PR early on.

    As I figure out my manual testing, I'll write out the steps that I took in my PR.

    I've found that writing it out as I go does two things: 1) it makes it easier to have a detailed PR, and 2) it acts as a form of rubber-ducking. As I'm updating my PR I'll realize steps I've missed in my testing.

    Something that also helped out with my manual testing skills was working in a place that had ZERO automated testing. Every PR required a detailed testing plan that you performed and that your reviewer could re-create.

  • tete 6 hours ago
    Who is "you"?

    It's not my job, really. And given the state of IT these days, it's barely anyone's, it seems.

  • ozim 6 hours ago
    There is a whole range of "proven to work"; with testing you cannot prove that there are no bugs.

    Your job is to deliver code up to the specifications.

    Not even checking at least the happy flow is of course gross negligence. But so is spending too much time on edge cases that no one will run into, or that the person asking doesn't want to pay to cover.

  • allcentury 11 hours ago
    Manual testing as the first step… not very productive imo.

    Outside-in testing is great, but I typically do automated outside-in testing and only test manually at the end. The testing loop needs to be repeatable and fast; manual is too slow

    • simonw 10 hours ago
      Yeah that's fair, the manual testing doesn't have to sequentially go first - but it does have to get done.

      I've lost count of the number of times I've skipped it because the automated test passed and then found there was some dumb but obvious bug that I missed, instantly exposed when I actually exercised the feature myself.

      • robryk 10 hours ago
        Would automated tests that produce a transcript of what they've done allow perusing that transcript to substitute for manual testing?
        • pjc50 9 hours ago
          That sounds harder?

          There's a lot of pedantry here trying to argue that there exists some feature which doesn't need to be "manually" tested, and I think the definition of "manual" can be pushed around a lot. Is running a program that prints "OK" a manual test or not? Is running the program and seeing that it now outputs "grue" rather than "bleen" manual? Does verifying the arithmetic against an Excel spreadsheet count?

          There are programs that almost can't be manual, and programs that almost have to be manual. I remember when working on PIN pad integration we looked into getting a robot to push the buttons on the pad - for security reasons there's no way of injecting input automatically.

          What really matters is getting as close to a realistic end user scenario as possible.

        • simonw 10 hours ago
          No. I've fallen for that trap in the past. Something inevitably catches you out in the end.
        • bluGill 10 hours ago
          The value of manual tests is when you "see something" that you didn't even think of.
      • 9rx 10 hours ago
        Maybe a bit pedantic, but does manual testing really need to be done, or is the intent here more towards being a usability review? I can't think of any time obvious unintended behaviour showed up not caught by the contract encoded in tests (there is no reason to write code that doesn't have a contractual purpose), but, after trying it, finding out that what you've created has an awful UX is something I have encountered and that is something much harder to encode in tests[1].

        [1] As far as I can tell. If there are good solutions for this too, I'd love to learn.

        • RaftPeople 9 hours ago
          > I can't think of any time obvious unintended behaviour showed up not caught by the contract encoded in tests

          Unit testing, whether manual or automated, typically catches about 30% of bugs.

          End to end testing and visual inspection of code are both closer to 70% of bugs.

          • 9rx 8 hours ago
            Automated testing (there aren't different kinds; trying to draw a distinction misunderstands what it is) doesn't catch bugs: it defines a contract. Code is then written to conform to that contract. Bugs cannot be introduced to be caught, as they would violate the contract.

            Of course that is not a panacea. What can happen in the real world is not truly understanding what the software needs to do. That can result in the contract not being aligned with what the software actually needs. It is quite reasonable to call the outcome of that "bugs", but tests cannot catch that either. In that case, the tests are where the problem lies!

            Most aspects of software are pretty clear cut, though. You can reasonably define a full contract without needing to see it. UX is a particular area where I've struggled to find a way to determine what the software needs before seeing it. There is seemingly no objective measure that can be applied in determining if a UX is going to spark joy in order to encode that in a contract ahead of time. Although, as before, I'm quite interested to learn about how others are solving that problem as leaving it up to "I'll know it when I see it" is a rather horrible approach.

  • golly_ned 4 hours ago
    I don't think this quite captures the problem: even if the code is functional and proven to work, it can still be bad in many other ways.

    The submitter should understand how it works and be able to 'own' and review modifications to it. That's cognitive work submitters ipso facto don't do by offloading the understanding to an LLM. That's the actual hard work reviewers and future programmers have to do instead.

  • zhyder 7 hours ago
    "Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That’s no longer valuable. What’s valuable is contributing code that is proven to work."

    I'd go further: what's valuable is code review. So review the AI agent's code yourself first, ensuring not only that it's proven to work, but also that it's good quality (across various dimensions but most importantly in maintainability in future). If you're already overwhelmed by that thousand-line patch, try to create a hundred-line patch that accomplishes the same task.

    I expect code review tools to also rapidly change, as lines of code written per person dramatically increase. Any good new tools already?

  • WhyOhWhyQ 8 hours ago
    Isn't this in contradiction with your blog post from yesterday, though? It's impossible to prove a complex project made in 4.5 hours works. It might have passed 9000 tests, but surely there are always going to be edge cases. I personally wouldn't be comfortable claiming I've proved it works and saying the job is done, even if the LLM did the whole thing and all existing tests passed, until I'd played with it for several months. And even then I would assume I'd need to rely on bug reports coming in, because it's running on lots of different systems. I honestly don't know if software is ever really finished.

    My takeaway from your blog post yesterday was that with a robust enough testing system the LLM can do the entire thing while I do Christmas with the family.

    (Before all the AI fans come in here. I'm not criticizing AI.)

    • simonw 7 hours ago
      That's why I don't consider my blog post from yesterday to be production-quality code. I'd need to invest a lot more work in reviewing it before I staked my reputation on it.
    • BeefySwain 8 hours ago
      Consider that this isn't just a random AI-slopped assortment of 9,000 tests, but instead a robust suite of tests that covers 100% of the HTML5 spec.

      Does this guarantee that it functions completely, with no errors whatsoever? Certainly not. You need formal verification for that. I don't think that contradicts what Simon was advocating for in this post, though.

      • WhyOhWhyQ 8 hours ago
        I think it would be interesting if professional engineering becomes more like producing formally correct documents for the AI to implement.
        • ncruces 7 hours ago
          We have tools that we use to write formally correct documents.

          They're called programming languages, and a deterministic algorithm translates them to machine code.

          Are we sure English and a probabilistic algorithm are any better at this?

          • WhyOhWhyQ 7 hours ago
            I actually hate AI to my core, to the point that if it gets too much more advanced I'll likely be in existential crisis, so don't attack me on those grounds. Given that it exists, though, I'm going to find what's good about it. I do think the problem of AI existing has to be confronted. Maybe one solution is that the human produces specs, like the HTML5 one, and the AI implements them in software.
  • rmnclmnt 10 hours ago
    > As software engineers we…

    That’s the thing. People exhibiting such rude behavior usually are not, or haven’t been in a looong time…

    As for the local testing not being performed, this is a slippery slope I'm fighting every day: more and more cloud-based services and platforms are used to deploy software that runs with specific shenanigans, and running it locally requires some kind of deep craft and understanding. Vendor lock-in is coming back in style (e.g. Databricks)

    • simonw 10 hours ago
      Yeah, I get frustrated by cloud-only systems that don't have a good local testing story.

      The best solution I have for that is staging environments, ideally including isolated-from-production environments you can run automated tests against.

    • skydhash 9 hours ago
      Whenever I have to work with such systems, it's usually when I write an interface and provide a mock implementation. Iteration is much faster when I don't have to worry about getting the correct state from something I don't have control over.
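
      A rough sketch of the pattern in Python; every name here is hypothetical, just to show the shape:

        from typing import Protocol

        class BlobStore(Protocol):
            """The interface our code depends on instead of the vendor SDK."""
            def put(self, key: str, data: bytes) -> None: ...
            def get(self, key: str) -> bytes: ...

        class InMemoryBlobStore:
            """Mock implementation: deterministic and fully under our control."""
            def __init__(self) -> None:
                self._blobs: dict[str, bytes] = {}
            def put(self, key: str, data: bytes) -> None:
                self._blobs[key] = data
            def get(self, key: str) -> bytes:
                return self._blobs[key]

        def archive_report(store: BlobStore, report: bytes) -> str:
            # Application code only ever sees the interface, so iteration
            # happens against the in-memory mock, not the cloud service.
            key = "reports/latest"
            store.put(key, report)
            return key

      Tests construct an InMemoryBlobStore; only a thin adapter implementing the same interface ever talks to the real service.
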
      • rmnclmnt 8 hours ago
        Yeah, that's what I do too when I have to (and when it can be done, which isn't every time).

        But it requires some advanced local testing setup and knowledge to do so, hence my initial remark about this type of developer not being a real professional in the first place…

  • sowbug 7 hours ago
    > Since they’re robots, automated tests and manual tests are effectively the same thing.

    I'd buttress this statement with a nuance. Automated tests typically run in their entirety, usually via a well-known command like cargo test, or at least by the CI tools. Manual tests are often skipped because the test seems far away from the code being changed.

    My all-time favorite team had a rule that your code didn't exist if it didn't have automated tests to "defend" it. If it didn't, it was OK, or at least not surprising, for someone else to break or refactor it out of existence (not maliciously, of course).

  • visarga 10 hours ago
    I agree with the author overall. Manual testing is what I call "vibe testing" and by itself I think it's insufficient, no matter whether you or the agent wrote the code. If you build your tests well, using the coding agent becomes smooth and efficient, and it's safe to let the agent do longer stretches of work. If you don't do testing, the whole thing is just a ticking bomb waiting to blow up in your face.

    My approach to coding agents is to prepare a spec at the start, as complete as possible, and to develop a beefy battery of tests as we make progress. Yesterday there was a story, "I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours". They had 9000+ tests. That was the secret sauce.

    So the future of AI coding, as I see it: it will be better than pre-2020. We will learn to spec and plan good tests, and the tests become our contract that the code does what it is supposed to do. You can throw away the code, keep the specs and tests, and regenerate any time.

    • smokel 10 hours ago
      This depends on the type of software you make. Testing the usability of a user interface, for example, is something you can't automate (yet). So, ehm, it depends :)
      • visarga 10 hours ago
        It will come around; we have rudimentary computer-use agents and the ability to record UIs for LLM agents. They will be refined, and then the agent can test UIs as well.

        For UIs I use a different trick: live diagnostic tests. I ask the agent to write tests that run in the app itself and check consistencies, constraints, and expected behaviors. Having the app running in its natural state makes it easier to test; you can have complex constraints encoded in your diagnostics.
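
        As a sketch, one of those diagnostics might look like this (the app object and its attributes are hypothetical):

          def run_diagnostics(app) -> list[str]:
              failures = []
              visible_ids = {row.item_id for row in app.list_view.rows()}
              # Consistency: every visible row must exist in the data model.
              for item_id in visible_ids:
                  if item_id not in app.model:
                      failures.append(f"orphan row {item_id}")
              # Constraint: the selection must point at a visible row.
              if app.selection is not None and app.selection not in visible_ids:
                  failures.append("selection points at a hidden item")
              return failures

        The app calls this behind a debug menu or on a timer and surfaces any failures, so the agent (or a human) can check invariants while the app is live.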

    • zahlman 9 hours ago
      > Yesterday there was a story "I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours".

      Yes, from the same author, in fact.

    • paganel 10 hours ago
      There are always unknown unknowns, which even a rigorous testing setup just sweeps under the rug (until they become visible in production, that is).

      > They had 9000+ tests.

      They were most probably also written by AI; there's no other (human) way. The way I see it, we're stacking turtles upon turtles and hoping that everything sticks together, somehow.

      • simonw 9 hours ago
        No, those 9,000 tests are part of a legendary test suite built by real humans over the course of more than a decade: https://github.com/html5lib/html5lib-tests
      • pjc50 9 hours ago
        I tabbed back to Visual Studio (C#): 24,990 "unit" tests, all written by hand over the years.

        Behind that is a smaller number of larger integration tests, and the even longer running regression tests that are run every release but not on every commit.

      • zahlman 9 hours ago
        > They were most probably also written by AI, there's no other (human) way.

        Yes. They came from the existing project being ported, which was also AI-written.

  • webprofusion 1 hour ago
    No, my job is sitting eating these here donuts.
  • am17an 9 hours ago
    Well, a 1,000-line PR is still not welcome. It puts too much of a burden on the maintainers. Small PRs are the way to go; tests are great too. If you have to submit a big PR, first get buy-in from a maintainer that they will review your code.
  • dekhn 7 hours ago
    Prove is a strong word. There are few cases in real-world programming where you can prove anything.

    I prefer to make this probabilistic: use testing to reduce the probability that your code is incorrect in the situations where it is expected to be deployed. In this sense, coding and testing are much like experimental physics: we never really prove a theory, we just invalidate clearly wrong ones.

    • newsoftheday 7 hours ago
      Testing must cover all cases; otherwise a 10-LOC LLM-created PR is inherently more dangerous than a 100-LOC human PR, because the LLM will likely also have written the test cases, and it will try to make everything balance out with all tests passing instead of making sure the test cases actually cover everything with the kind of logic a human would apply.
  • geldedus 10 hours ago
    Not only to work, but also not to make life hell for the coders who come after you.
  • kords 8 hours ago
    I agree that tests and automation are probably the best things we can do to validate our code, and that the author of the PR should be more responsible. However, they can't prove that the code works. It's almost the opposite: if they pass with good coverage, the code has a better chance of working; if they fail, they prove the code doesn't work.
  • softwaredoug 9 hours ago
    A lot of AI coding shifts programming toward a more declarative practice.

    Claude, etc., work best with good tests that verify the system works. And so the tests become, in some ways, the real code rather than the code that does the thing. If you're responsible for the thing, then 90% of your responsibility moves to verifying behavior and giving agents feedback.

  • maerF0x0 8 hours ago
    > The first is manual testing. If you haven’t seen the code do the right thing yourself, that code doesn’t work. If it does turn out to work, that’s honestly just pure chance.

    Depending on exactly what the author meant here, I disagree. Our first and default tool should be some form of lightweight automated testing. It's explicit (it serves as a form of spec and documents how to use the software), it's repeatable (manual testing is done once and its result is invalidated moments later), and its cost per minute of effort is more or less the same (most companies have the engineers do the tests; they are expensive).

    Yes, there will be exceptions and exceptional cases. This author is not talking about exceptions and neither am I. They're not an interesting addition to this conversation.

    • IMTDb 7 hours ago
      > Our first and default tool should be some form of lightweight automated testing

      Manual verification isn't about skipping tests, it's about validating what to test in the first place.

      You need to see the code work before you know what "working" even means. Does the screen render correctly? Does the API return sensible data? Does the flow make sense to users? Automated tests can only check what you tell them to check. If you haven't verified the behavior yourself first, you're just encoding your assumptions into test cases.

      I'd take "no tests, but I verified it works end-to-end" over "full test coverage, but never checked if it solves the actual problem" every time. The first developer is focused on outcomes. The second is checking boxes.

      Tests are crucial: they preserve known-good behavior. But you have to establish what "good" looks like first, and that requires human judgment. Automate the verification, not the discovery. So our first and default tool remains manual verification.

    • codeviking 7 hours ago
      I'm a big fan of lightweight, automated tests. Despite that, I still default to manual verification. Usually I do both.

      Automated tests omit a certain type of feedback that I think remains important to the development loop. Automation doesn't care about a poor UX; it only verifies what you tell it to.

      For instance, I regularly contribute to a CLI that's widely used at $WORK. I can easily write tests that verify the I/O of a command I'm working on and assert correctness. Yet if I actually try to use the command I'm changing, usually as part of verifying my changes, I tend to discover usability issues that the tests would happily ignore, and fixing them makes the program more pleasant to use.

      Also, there are certainly cases where automation isn't worth the cost, maybe because the resulting tests are complex or brittle. I've often found UI tests to lie in this category (but maybe I'm doing them wrong).

      Because of these things I think manual testing is the right default. Automated tests should also exist; but manual tests should _always_ be part of the process.

    • tech-ninja 7 hours ago
      I disagree. No company, no matter the size, will have E2E or integration tests for all of its features; it's just not feasible.

      Unless you are working on a tiny change to a highly tested part of the code, you should be manually testing your code and/or adding some tests.

  • koinedad 5 hours ago
    This is very helpful for a team, and even though it takes a little time, it actually speeds things up in the long run. Using PR templates can help. A general description of the problem, including a screenshot or video, goes a long way.

    I remember when I was working at a startup and a new engineer merged his code and it totally broke the service. I asked him if he ran his code locally first and he stared at me speechless.

    Running the code locally is the easiest way to eliminate a whole series of silly bugs.

    As mentioned in the article, adding a test and then reverting your change to make sure the test fails is really important, especially with LLMs writing tests. They are great at making things look like they work when they completely don't.
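
    As a sketch of that dance, with all names hypothetical:

      # test_totals.py
      from billing import format_total  # the module the fix touches

      def test_empty_cart_total_is_zero():
          # The bug: an empty cart used to render as "$None".
          # Commit this test with the fix, `git stash` the fix and rerun
          # (the test must FAIL), then `git stash pop` and rerun (it must pass).
          assert format_total([]) == "$0.00"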

  • alexgotoi 8 hours ago
    The "deliver proven code" line is spot on, but the real failure mode right now is pretending that a 2000-line LLM dump + "plz review?" counts as "proven".

    Nobody sane expects reviewers to babysit untested mega-PRs - that's not their job. The dirty secret is that good tests (unit, integration, property-based if you're feeling fancy) + a pre-commit CI gate are what actually prove code works before it ever hits anyone's inbox. AI makes those cheaper and faster to write, not optional.

    Dropping untested behemoths and expecting the team to clean up hallucinations isn't shipping. It's just making everyone's Thursday worse.

    Will include this one in my next https://hackernewsai.com/ newsletter.

    • minimaxir 7 hours ago
      Slightly off-topic: you've been advertising your newsletter with every comment you made over the past few days, and you've made a lot of comments. I get that marketing is hard, but that's spammy.
      • alexgotoi 5 hours ago
        Appreciate the comment, will slow down.
    • thopkinson 8 hours ago
      If your org has engineers LLM-dumping into your MRs/PRs, make it policy that code reviews require a live review session where the engineer who wrote the code walks the other engineers through it. This will rapidly fix your problem, as an 'author' who cannot explain his/her code will not want to be grilled this way ever again.

      Accountability is the real answer. If you don't enable individual and team accountability, then you are part of the problem.

  • newsoftheday 6 hours ago
    "Your job is to deliver code you have proven to work"

    And...code that has been 100% reviewed, even if it was fully LLM generated.

  • funkattack 10 hours ago
    Non-native speaker here. I’ve always loved that we say “commit” not “upload” or “save”.
  • ianberdin 7 hours ago
    The solution is easy: responsibility.

    The point is to hire people who can own the code and the codebase. "Someone will review it" is a dead end.

  • weatherlite 10 hours ago
    > Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That’s no longer valuable. What’s valuable is contributing code that is proven to work.

    That's really not a great development for us. If our main value is now reduced to accountability for the result, with barely any involvement in the implementation, that's very little moat and doesn't command a high salary. Either we provide real value or we don't... and from that essay I think it's not totally clear what the value is. It seems like every QA, junior SWE, or even product manager can now do the job of prompting and checking the output.

    • simonw 10 hours ago
      The value is being better at it than any QA or product manager.

      Experienced software engineers have such a huge edge over everyone else with this stuff.

      If your product manager doesn't understand what a CORS header is, good luck having them produce a change that requires a cross-domain fetch() call... and first they'll have to know what a "cross-domain fetch() call" even means.

      And sure they could ask an LLM about that, but they still need the vocabulary and domain knowledge to get to that question.
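
      For the curious, a minimal sketch of the server-side change such a task involves (Python stdlib; domains purely illustrative):

        from http.server import BaseHTTPRequestHandler, HTTPServer

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                self.send_response(200)
                # Without this header the browser blocks a fetch() made
                # from https://app.example.com to this API's origin.
                self.send_header("Access-Control-Allow-Origin", "https://app.example.com")
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(b'{"ok": true}')

        HTTPServer(("", 8000), Handler).serve_forever()

      Knowing that this header exists, and which origin to allow, is exactly the vocabulary gap in question.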

      • falcor84 10 hours ago
        That's an interesting argument, but from my industry experience, the average experienced QA Engineer and technical Product Manager both have better vocabulary than the average SWE. Indeed, I wonder whether a future curriculum for Vibe Engineering (to borrow your own term) may look more similar to that of present-day QA or Product curricula, than to a typical coding or CS curriculum.
  • yuedongze 9 hours ago
    It's very similar to the verification engineering problem I wrote about on HN last week. AI is only as good as our ability to prove its work is genuine, and we need humans in the loop to fill the gaps between autonomous systems and ultimately be held accountable under human laws. It's kind of sad, but it's the reality we are facing.
  • rglover 9 hours ago
    "Slow the f*ck down." - Oliver Reichenstein [1]

    This only happens because the software industry has fallen into the Religion of Speed. I see it constantly: justified corner-cutting, rushing shit out the door, and always loading up another feature/project/whatever with absolutely zero self-awareness. AI is just an amplifier for bad behavior that was already causing chaos.

    What's not being said here but should be: discipline matters. It's part of being a professional, and it's always behind someone who can ship code that "just works."

    [1] https://ia.net/*

  • enraged_camel 10 hours ago
    >> As software engineers we don’t just crank out code—in fact these days you could argue that’s what the LLMs are for. We need to deliver code that works—and we need to include proof that it works as well.

    I would go a step further: we need to deliver code that belongs. This means following existing patterns and conventions in the codebase. Without explicit instruction, LLMs are really bad at this, and it's one of the things that makes it incredibly obvious to reviewers that a given piece of code has been generated by AI.

    • 0x500x79 9 hours ago
      Agreed. Maintainability, security, standards: all of these are important to follow, and there are usually reasons these things exist.

      I also see AI coding tools violate "Chesterton's Fence" (and the inverse of Chesterton's Fence, not sure what that is called; the idea being that code should be necessary, otherwise it shouldn't be in the source).

    • 9rx 9 hours ago
      > Without explicit instruction, LLMs are really bad at this

      They used to be. They have become quite good at it, even without instruction. Impressively so.

      But it does require that the humans who laid the foundation also followed consistent patterns and conventions. If there is deviation to be found, the LLM will see it and be forced to choose which direction to go, and that's when things quickly fall off the rails. LLMs are not (yet) good at that, and maybe never will be, since not even the humans managed to get it right.

      Garbage in, garbage out, as they say.

  • casey2 2 hours ago
    It comes out of the AI; that is proof enough. Why would I have prompted it and given it to you if I didn't think the AI could handle it? The real risk is closer to "people carry some preconceived notion about code that doesn't map to AI code", such as the assumption that the person who contributed the code knows the problem in enough detail to be accountable in the short term, or at the very least can tell you why they made a PR at all.

    How to prove it has been subject to debate for the past century; the answer is that it's context-dependent to what degree you will, or even can, prove the program and its exposed identifiers correct. Programming is a communication problem as well as a math problem, and often an engineering problem too. Only the math portion can be proved; the small but critical engineering portion can only be tested.

    Communication is the most important for velocity: it's the difference between hand-rolling machine code and sshing into a computer halfway across the world with every tool you expect. If you don't trust that webdevs know what they are doing, you can be the most amazing dev in the world, but your actual ability to contribute will be hampered. The same is true of vibe coding: if people aren't on the same page as to what is and isn't acceptable, velocity starts to slow down.

    Languages have not caught up to AI tools. Since AI operates well above the function level, what level would be appropriate to be named and signed off on? A pull request with a link to the chat as a commit? (And what is wrong with that that could be fixed at the naming level?)

    Honest communication is the most important. Amazon telling investors that they use TLA+ is just signaling that they "for realz take uptime very seriously guize", "we know distributed systems" and engineering culture. The honest reality is that they could prove all their code and not IMprove their uptime one lick, because most of what they run isn't their code. It's a communication breakdown if effort gets spent on that outside a research department.

  • lifeisstillgood 6 hours ago
    I'm going to go with this as probably one of the top three definitions of a software developer, along with:

    - the job would be better titled "Analyst Programmer": you need both

    - you can make a changeset, but you also have to sell the change

  • dangus 8 hours ago
    Your job isn't to deliver code that works; it's to successfully[1] operationalize business logic.

    [1] I.e., it should work

    That may seem pedantic but that’s a huge difference. Code is a means to an end. If no-code suddenly became better than code through some miracle, that would be your job.

    This also means that if one day AI stops making mistakes, tossing AI requests over the wall may be a legitimate modus operandi.

  • acrophiliac 9 hours ago
    Perhaps off-topic, but: "Testing doesn't show the absence of errors, it shows the presence of errors" (paraphrasing Dijkstra). Willison says we need to submit code we have proven to work, but then argues for empirical testing, not actual correctness proofs.
    • simonw 9 hours ago
      If you can formally prove correctness then brilliant, go for it!

      That's not something I've seen or been able to achieve in most of my professional work.

  • gaigalas 10 hours ago
    > Make your coding agent prove it first

    Agents love to cheat. That's an issue for which I don't see change on the horizon.

    Here's Opus 4.5 trying to cheat its way out of properly implementing compatibility and cross-platform, despite the clear requirements:

    https://gist.github.com/alganet/8531b935f53d842db98157e1b8c0...

    > Should popen handles work with fgets/fread/fwrite? PHP supports this. Option A: Create a minimal pipe_io_stream device / Option B: Store FILE* in io_private with a flag / Option C: Only support pclose, require explicit stream wrapper for reads.

    If I asked for compatibility, why give me options that won't fully achieve it?

    It actually tried to "break check" my knowledge of the interpreter (testing whether I knew enough to catch it), and it proposed shortcuts all the way through the chat.

    I don't want to have to pepper my chats with variations on "don't cheat". I mean, I can do it, but it seems like boilerplate.

    I wish I had some similar testing-related chats to share. Agents do that all the time.

    This is the major blocker right now for AI-assisted automated verification, and one of the reasons why this isn't well developed beyond general directions (give it screenshots, make it run the command, etc).

  • givemeethekeys 8 hours ago
    Sorry, that's not what it says in my job description.
  • gorjusborg 6 hours ago
    Your actual job is to produce positive outcomes for your stakeholders. Code can be part of that, but doesn't have to be.

    If you are dumping AI slop on your team to sort through, you are creating drag on the entire team's efforts toward those positive outcomes.

    As someone getting dumped on, you should probably make the decision (in line with the objective of producing positive outcomes) not to waste your time weeding through that stuff.

    Review everything else, make it clear that the mess is not reviewable, and communicate that upward if needed.

  • mellosouls 9 hours ago
    Thing is, this has always been the case. One of the problems with LLM-assisted coding is the idea that just because we're in a new era (we certainly are), the old rules can all be discarded.

    The title doesn't go far enough - slop (AI or otherwise) can work and pass all the tests, and still be slop.

    • simonw 9 hours ago
      The difference is that if it works and passes the tests I don't feel like it's a total waste of my time to look at the PR and tell you why it's still slop.

      If it doesn't even work you're absolutely wasting my time with it.

    • theshrike79 4 hours ago
      IMO LLMs are pushing us the other way.

      To get the maximum ROI from LLM-assisted programming, you need proper unit tests, integration tests, correctly configured linters, accessible documentation, and a well-managed git history (Claude actually checks git history nowadays to see when a feature was added if it has a bug).

      Worst case, we'll still have proper tests and documentation if the AI bubble suddenly bursts. Best case, we can skip the boring bits because the LLM is "smart" enough to handle the low-hanging fruit reliably thanks to the robust test suite.

  • t1234s 7 hours ago
    Bravo. Best headline I've read in a long time. This phrase should be a desktop background.
  • johnea 3 hours ago
    I couldn't agree more with the sentiment.

    If you, the development engineer, haven't demonstrated that the product works as expected (preferably with that testing independently confirmed by a product test group), then you can't claim to be delivering a functional product.

    I would add, though, that management, specifically marketing management setting unreasonable demands and deadlines, is a bigger threat to testing than LLMs are.

    Of course, the damage done by untested LLM-generated code is additive to the damage management is doing.

    So this isn't any kind of apologism, the two sources are both making the problem worse.

  • nrhrjrjrjtntbt 6 hours ago
    Always has been
  • nish__ 9 hours ago
    Good framing.
  • llm_nerd 8 hours ago
    "the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest."

    Kind of depressing how blaming juniors for every ill or bad habit has become such a trope. In all likelihood the reader of this comment has a number of terrible habits and works on teams with terrible habits, and juniors play zero part in it.

    And, I mean, on that theme, developers have been doing this for as long as we've had large teams. I've worked on a large number of teams where there was the fundamental principle that QA / UA holds responsibility: that they are responsible for tests, and they are responsible for bad code making it through to the product / solution. Developers -- grizzled, excellent-CV devs -- would toss over garbage code and call it a day.

  • annjose 8 hours ago
    I came here to say:

    1) Amen. 2) I wonder if this is isolated to junior devs only? Perhaps it just seems that way because junior devs do more AI-assisted coding than seniors?

  • morning-coffee 9 hours ago
    Amen
  • imiric 10 hours ago
    The job of a software developer is not just to prove that the software "works". "Works" is itself often fuzzily defined and difficult to prove.

    That is part of it, yes, but there are many others, such as ensuring that the new code is easy to understand and maintain by humans, makes the right tradeoffs, is reasonably efficient and secure, doesn't introduce a lot of technical debt, and so on.

    These are things that LLMs often don't get right, and that junior engineers need guidance and mentoring from more experienced engineers to properly learn. Otherwise software that "works" today will be much more difficult to make "work" tomorrow.

  • emsign 10 hours ago
    As if! :)
  • fjfaase 7 hours ago
    One more reason to work without branches and PRs. The future for CI/CD is bright ;-).
    • Nizoss 7 hours ago
      This is how I would also love to work, but not all teams prefer this way. How many are you on your team? Was it easy to switch?
      • fjfaase 2 hours ago
        I twice worked in teams where we did not use branches (or PRs). Both were working like that when I joined them.

        The first was because we were using svn (and maybe even cvs before that, but I cannot remember), which did not support branching easily. That team did switch to git, which did not go without some struggles and misconceptions, such as: "Never use rebase."

        The second team was already working without branches, releasing a new version of the tool (the Bond3D Slicer for 3D printing) every night. It worked very well. Often we were able to implement and release new features within two or three days, allowing the users to continue with their experiments.

        When, after some years, the organization implemented more 'quality assurance', they demanded that we make monthly releases formally tested by the users, so we created branches for each release. The idea was that some of the users would test the releases before they were officially released, but that testing would often take more than a month (one time even three months) because they were 'too busy' to do the formal review. At the same time, some users were using the daily builds because those builds had the features they needed. As a result, the quality did not improve and a lot of time was wasted, although the formal quality assurance, dictated by some ISO standard, was assured.

        I have no experience with moving away from using branches. It might be a good idea to point your manager/team lead/scrum master to dora.dev or the YouTube channel: https://www.youtube.com/@ModernSoftwareEngineeringYT

  • ekjhgkejhgk 10 hours ago
    Oh look, another "opinionated X". Everything is opinionated these days, even opinions.
  • throwaway2027 10 hours ago
    It works on my machine ¯\_(ツ)_/¯
  • 6510 5 hours ago
    I work alone, and I have a considerable amount of unfinished code lying around, sometimes even multiple instances of a thing. I can see how that would be annoying in a team setting. The problem is not having the thing, but how you organize it. As with LLM slop, it is wonderful to be able to scroll over something that shows what the solution might look like.
  • venturecruelty 7 hours ago
    Lmao no, my job is to make the line go up and make my boss happy. It was ever thus.
  • nolineshere 8 hours ago
    [dead]
  • sapphirebreeze 7 hours ago
    [dead]
  • TheSamFischer 8 hours ago
    [dead]
  • ekjhgkejhgk 10 hours ago
    [flagged]
  • koakuma-chan 10 hours ago
    [flagged]
  • 9rx 11 hours ago
    > Your job is to deliver code you have proven to work.

    Your job is to solve customer problems. Their problems may only be solvable with code that is proven to work, but it is equally likely (I dare say even more likely) that their problem isn't best solved with code at all, or that it's solved with code that doesn't work properly but works well enough.

    • wrsh07 10 hours ago
      I would argue that the word "proof" in the title might be misleading you.

      From the post and the example he links, the point is that if you don't at least look at the running code, you don't know that it works.

      In my opinion the point is actually well illustrated by Chris's talk here:

      https://v5.chriskrycho.com/elsewhere/seeing-like-a-programme...

      (summary of the relevant section if you're not going to click)

      >>>

      In the talk "Seeing Like a Programmer," Chris Krycho quotes the conductor and composer Eímear Noone, who said:

      > "The score is potential energy. It's the potential for music to happen, but it's not the music."

      He uses this quote to illustrate the distinction between "software as artifact" (the code/score) and "software as system" (the running application/music). His point is that the code itself is just a static artifact—"potential energy"—and the actual "software" only really exists when that code is executed and running in the real world.

      • 9rx 10 hours ago
        > if you don't at least look at the running code, you don't know that it works.

        Your tests run the code. You know it works. I know the article is trying to say that testing is not comprehensive enough, but my experience disagrees. But I also recognize that testing is not well understood (quite likely the least understood aspect of computer science!) — and if you don't have a good understanding you can get caught not testing the right things or not testing what you think you are. I would argue that you would be better off using your time to learn how to write great tests instead of using it to manually test your code, but to each their own.

        What is more likely to happen is not understanding the customer needs well enough, leaving it impossible to write tests that align with what the software needs to do. Software development can break down very quickly here. However, manual testing does not help. You can't know what to manually test without understanding the problem either. However, as before, your job is not to deliver proven code. Your job is to solve customer problems. When you realize that, it becomes much less likely that you write tests that are not in line with the solution you need.

  • daedrdev 11 hours ago
    Maybe in an ideal world
  • webdev1234568 11 hours ago
    Whole article seems very much all llm generated

    Edit: I'm an idiot ignore me.

    • simonw 10 hours ago
      Not a single word of it was. I wrote this one entirely in Apple Notes, so there weren't even any VS Code-completed sentences.

      It has emdashes because my blog turns " - " into an emdash here: https://github.com/simonw/simonwillisonblog/blob/06e931b397f...
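
      Illustratively, that substitution amounts to something like this (a sketch, not the actual template code):

        import re

        def smarten_dashes(text: str) -> str:
            # Replace a spaced hyphen with an em-dash (U+2014).
            return re.sub(r" - ", "\u2014", text)

        assert smarten_dashes("one - two") == "one\u2014two"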

      • webdev1234568 10 hours ago
        My biggest apologies, a very bad move on my part. I'll pay more attention before making any sort of accusation like this.
        • minimaxir 6 hours ago
          No one should be making accusations of AI generation without strong evidence beyond vibes. That hurts the cause of anti-AI use and punishes people who don't use it.
    • ramon156 10 hours ago
      Do elaborate; I don't see anything standing out.
    • jairuhme 10 hours ago
      Did you read the article and come to that conclusion or just blindly count the number of em-dashes and assume that? Because I don't get the impression that it was LLM generated
    • ai_coder42 9 hours ago
      So what? As long as it conveys the point it was supposed to, it should be fine IMO.

      If we are accepting LLM-generated code, we should accept LLM-generated content, as long as it is proofread :)

  • zkmon 10 hours ago
    How about letting LLMs maintain a vast number of product versions, all available at the same time, which receive multiple untested versions of the same patch from LLMs, and then letting the models elect a version of the software based on probabilistic or gradient methods? The elected version could change for different assessments. No human touches or looks at the code!

    Just a wild thought, nothing serious.

    • throwuxiytayq 10 hours ago
      Talk is cheap. Show me the proompt.
      • rkomorn 10 hours ago
        Had to search whether "proompt" was a new meme misspelling.

        New to me, but I'm on board.

      • zkmon 10 hours ago
        That's hard for me. Feed my comment to a model and ask for prompts.
  • Rperry2174 10 hours ago
    I'm not fully convinced by "a computer can never be held accountable".

    We already delegate accountability to non-humans all the time:

    - CI systems block merges

    - monitoring systems page people

    - test suites gate different things

    In practice accountability is enforced by systems, not humans. Humans are definitely "blamed" after the fact, but the day-to-day control loop is automated.

    As agents get better at running code, inspecting UI state, correlating logs, screenshots, etc., they're starting to be operationally "accountable": preventing bad changes from shipping and producing evidence when something goes wrong.

    At some point the human's role shifts from "I personally verify this works" to "I trust this verification system and am accountable for configuring it correctly".

    That's still responsibility, but it's different from what's described here. Taken to a logical extreme, the argument here would suggest that CI shouldn't replace manual release checklists.

    • simonw 10 hours ago
      I need to expand on this idea a bunch, but I do think it's one of the key answers to the ongoing questions people have about LLMs replacing human workers.

      Human collaboration works on trust.

      Part of trust is accountability and consequences. If I get caught embezzling money from my employer I can lose my job, harm my professional reputation and even go to jail. There are stakes!

      A computer system has no stakes, and cannot take accountability for its actions. This drastically limits what it makes sense to outsource to that system.

      A lot of this comes down to my work on prompt injection. LLMs are fundamentally gullible: an email assistant might respond to an email asking for the latest sales figures by replying with the latest (confidential) sales figures.

      If my human assistant does that I can reprimand or fire them. What am I meant to do with an LLM agent?

      • dfxm12 10 hours ago
        I don't think this is very hard. Someone didn't properly secure confidential data and/or someone gave this agent access to confidential data. Someone decided to go live with it. Reprimand them, and disable the insecure agent.
    • hyperpape 10 hours ago
      CI systems operate according to rules that humans feel they understand and can apply mechanically. Moreover, they (primarily) fail closed.
    • pjc50 9 hours ago
      I've given you a disagree-and-upvote; these things are significant quality aids, but they are like a poka-yoke, a manufacturing jig, or automated inspection.

      Accountability is about what happens if and when something goes wrong. The moon landings were controlled with computer assistance, but Nixon preparing a speech for what happened in the event of lethal failure is accountability. Note that accountability does not of itself imply any particular form or detail of control, just that a social structure of accountability links outcome to responsible person.

    • bluesnowmonkey 8 hours ago
      Humans are only kind of held accountable. If you ship a bug do you go to jail? Even a bug so bad it puts your company out of business. Would there be any legal or physical or monetary consequences at all for you, besides you lose your job?

      So the accountability situation for AI seems not that different. You can fire it. Exactly the same as for humans.

    • dkdcio 10 hours ago
      Those systems include humans: they are put in place by humans (or collections of them), and those humans are the accountability sink.

      If you put them (without humans) in a forest, they would not survive and evolve (they are not viable systems on their own); they do not take action without the setup & maintenance (& accountability) of people.

    • robryk 10 hours ago
      Why do you think that this other kind of accountability (which reminds me of the way captain's or commander's responsibility is often described) is incompatible with what the article describes? Due to the focus on necessity of manual testing?
    • cess11 10 hours ago
      Right, so how do you hold these things accountable? When your CI fails, what do you do? Type in a starkly worded message into a text file and shut off the power for three hours as a punishment? Invoice Intel?
      • falcor84 10 hours ago
        Well, we're not there yet, but I do envision a future where some AIs work as independent contractors with their own bank accounts that they want to maximize. If such an AI fails in a bad way, its client would be able to fine it, fire it, or even sue it, so that it, and the human controlling it, would be financially punished.
    • sc68cal 10 hours ago
      You completely missed the point of that quote. The point of the quote is to highlight the fact that automated systems are amoral, meaning that they do not know good or evil and cannot make judgements that require knowing what good and evil mean.
    • almostdeadguy 10 hours ago
      I mean I suppose you can continuously add "critical feedback" to the system prompt to have some measure of impact on future decision-making, but at some point you're going to run out of space and ultimately I do not find this works with the same level of reliability as giving a live person feedback.

      Perhaps an unstated and important takeaway here is that junior developers should not be permitted to use LLMs, for the same reason they should not hire people: they have not demonstrated enough skill mastery and judgement to be trusted with the decision to outsource their labor. Delegating to a vendor is a decision made by high-level stakeholders, with the ability to monitor the vendor's performance and replace the vendor with alternatives if that performance is unsatisfactory. Allowing junior developers to use LLMs is allowing them to delegate responsibility without any visibility or ability to set boundaries on what can be delegated. Also important: you cannot delegate personal growth, and by permitting junior engineers to use an LLM, that is exactly what you are letting them try to do.

  • SunshineTheCat 8 hours ago
    I know this won't be popular, but I think the idea of differentiating a "real developer" from one who relies mostly, or even solely, on an LLM is coming to an end. Right now, I fully agree that relying wholly upon an LLM and failing to test its output is very irresponsible.

    LLMs do make mistakes. They do a sloppy job at times.

    But give it a year. Two years. Five years. It seems unreasonable to assume they will hit a plateau that prevents them from being able to build, test, and ship code better than any human on earth.

    I say this because it's already happened.

    It was thought impossible for a computer to reach the point of being able to beat a grandmaster at chess.

    There was too much "art", experience, and nuance to the game for a computer to ever fully grasp or understand. Sure, there was the "math" of it all, but it lacked the human intuition that many thought was essential to winning and could only be achieved through a lifetime of practice.

    Many years after Deep Blue vs. Garry Kasparov, the best players in the world laugh at the idea of even getting close to beating Stockfish, or even a mediocre engine.

    I say all of this as a 15-year developer. This happens over and over again throughout history. Something comes along to disrupt an industry or profession and people scream about how dangerous or bad it is, but it never matters in the end. Technology is undefeated.

    • newsoftheday 8 hours ago
      > There was too much "art," experience, and nuance to the game that a computer could ever fully grasp or understand.

      That's the thing though, AI doesn't understand, it makes us feel like it understands, but it doesn't understand anything.

      • simonw 7 hours ago
        Turns out that doesn't matter for chess, where the winning conditions are formally encoded.
    • xmodem 8 hours ago
      What's your point, though? Let's assume your hypothesis and 5 years from now everyone has access to an LLM that's as good as a typical staff engineer. Is it now acceptable for a junior engineer to submit LLM-generated PRs without having tested them?

      > It was thought impossible for a computer to reach the point of being able to beat a grandmaster at chess.

      This is oft-cited but it takes only some cursory research to show that it has never been close to a universally-held view.

      • SunshineTheCat 7 hours ago
        In the scenario I'm hypothesizing, why would anyone need to "check" or "test" its work? What chess players are checking to make sure Stockfish made the "right" move? What determines whether or not it's "right" is if Stockfish made it.
        • xmodem 5 hours ago
          Your post sent me down a rabbit hole reading about the history of computers playing chess. Notable to me is that AI advocates were claiming that a computer would be able to beat the best human chess players within 10 years as far back as the 1950s. It was so long ago they had to clarify they were talking about digital computers.

          Today I learned that AI advocates being overly optimistic about its trajectory is actually not a new phenomenon - it's been happening for more than twice my lifetime.

        • asadotzler 7 hours ago
          There are clear win conditions in chess. There are not for most software engineering tasks. If you don't get this, it's probably a safe bet that you're not an engineer.
          • SunshineTheCat 5 hours ago
            Right, which is why Deep Blue won back in 1997, and now, years later, AI is moving on to far more complicated tasks, like engineering software.

            The fact that you gave me the "you just don't understand, you're not a chess grandmaster" emotional response helps indicate that I'm pretty much right on target with this one.

            FWIW I have been engineering software for over 15 years.

    • JackSlateur 8 hours ago

        This happens over and over again throughout history.
      
      Could you share a single instance of a machine that thinks? Are we sharing the same timeline?
  • bluesnowmonkey 7 hours ago
    > Your job is to deliver code you have proven to work.

    First of all, no, it's not. Your job is to help the company succeed. If you write code that works but doesn't help the company succeed, you failed. People do this all the time; resume padding, for example.

    Sometimes it’s better for the business to have two sloppy PRs than a single perfect one. You should be able to deliver that way when the situation demands.

    Second, no one is out there proving anything. Formal software correctness proofs? Yeah, nobody does that. We use a variety of techniques, like testing and code review, to try to avoid shipping bugs, but there's always a trade-off between quality and speed/cost. You're never actually 100% certain software works. You can buy more nines, but they get expensive. We find bugs in 20+ year old software.

  • just_once 9 hours ago
    I don't know if there's a word for this but this reads to me as like, software virtue signaling or software patronizing. It's bizarre to me to tell an engineer what their job is as a matter of fact and to claim a particular usage of a tool as mandated (a tool that no one really asked for, mind you), leveraging duty of all things.

    I guess to me, it's either the case that LLMs are just another tool, in which case the already existing teachings of best practice should cover them (and therefore the tone and some of the content of this article are unnecessary), or they're something totally new, in which case maybe some of the existing teachings apply, but maybe not, because it's so different that the old incentives can't reasonably take hold. Maybe we should focus a little more attention on that.

    The article mentions rudeness, shifting burdens, wasting people's time, dereliction. Really loaded stuff and not a framing that I find necessary. The average person is just trying to get by, not topple a social contract. For that, look upwards.

    • dkural 9 hours ago
      I've really seen both I suppose. A lot of devs don't take accountability / responsibility for their code, especially if they haven't done anything that actually got shipped and used, or in general haven't done much responsible adulting.
      • just_once 8 hours ago
        No doubt. Interesting to think about why that is without assuming it's a character flaw.
    • simonw 9 hours ago
      LLMs are just another tool, but they're disruptive enough that existing best practices need to be either updated or re-explained.

      A lot of people using LLMs seem not to have understood that you can't expect them to write code that works without testing it first!

      If that wasn't clearly a problem I wouldn't have felt the need to write this.

      • just_once 8 hours ago
        Yep, it's a real problem. No dispute there.

        My intention isn't to argue a point, just to share my perspective when I read it.

        I read your response here as saying something like "I noticed that people are misunderstanding X, so I wanted to inform them". In this case "X" isn't itself very obvious to me (for any given task, why can't you expect that a cutting-edge LLM would be able to write it without your testing it?), but most importantly, I don't think I would approach a pure misunderstanding (tantamount to a skills gap) with your particular framing. Again, to me it reads as patronizing.

        Love the pelican on the bicycle, though. I think that's been a great addition to the zeitgeist.