He asked AI to count carbs 27000 times. It couldn't give the same answer twice

(diabettech.com)

121 points | by sarusso 59 minutes ago

45 comments

endymion-light 38 minutes ago
There's an incredibly serious lack of education with how LLMs & carb-counting work. This entire article would be better suited to astrology.com than Hacker News.
When I opened it up, I assumed the author would have at least attempted a calculation service, maybe even placed something like the size of the meal into an actual model, using the integration of pre-existing tools that are (slightly more) accurate. Hell - most food literally is required to have calorie information, and you can query open source data for others!
But the author just took pictures of food & expected a realistic response? Is this genuinely what amounts to a study in AI?
This is akin to the Instagram reels that talk to ChatGPT and ask it to time how long their run is. Except those are treated as funny jokes rather than being turned into studies.
I'd like to see this study done using any kind of actual grounding knowledge, seeing what mistakes AI makes when attempting to query ground truth from picture analysis - there would at least be an interesting result methodology in that.
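For what it's worth, the "grounded" version described above could look roughly like the sketch below: the model (or the user) only names the items and their weights, and the carb figure comes from a reference table rather than from the picture. The table, its values, and the `grounded_carbs` helper are all hypothetical placeholders, not real reference data and not anything from the study.

```python
# Illustrative carbs per 100 g; placeholder values, not authoritative data.
CARBS_PER_100G = {
    "white bread": 49.0,
    "cheddar cheese": 1.3,
    "butter": 0.1,
}


def grounded_carbs(items):
    """Sum carbs (grams) for (name, grams) pairs using the reference table.

    Raises KeyError instead of guessing when an item has no reference entry.
    """
    total = 0.0
    for name, grams in items:
        if name not in CARBS_PER_100G:
            raise KeyError(f"no reference data for {name!r}")
        total += CARBS_PER_100G[name] * grams / 100.0
    return round(total, 1)


# Hypothetical weighed sandwich: two slices of bread, cheese, a little butter.
sandwich = [("white bread", 60), ("cheddar cheese", 30), ("butter", 5)]
print(grounded_carbs(sandwich))
```

The point of the design is that the arithmetic is deterministic, so any variance has to come from the identification and weight steps, which is where a study of this kind would then look for mistakes.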
- something765478 0 minutes ago
  > The prompt I used asks each model to return a confidence score (0 to 1) for every food item it identifies. All four models dutifully returned confidence scores for 100% of items. Surely we can use those to filter out bad estimates?
  This is a problem with the companies selling the AI models, not the customers. It is their responsibility to inform consumers about the limits of their services, and to train the models to say "I don't know, there is not enough information".
- furyofantares 27 minutes ago
  From the text of the article I believe the author is implying there are apps doing exactly this, and so this is why it was studied that way.
  Had the author written the article themselves rather than an LLM their motivation probably would have been clearer.
  - Brendinooo 24 minutes ago
    > there are apps doing exactly this
    Yeah, for sure there are. And people will just ask ChatGPT as well.
    The funny thing is that for people who are just trying to lose weight without managing any health issues precisely, this type of extreme variance doesn't really matter, because the mere act of consciously quantifying food consumption is, based on my experience counting calories, the single biggest factor in success with weight loss.
    - criley2 1 minute ago
      I actually think "just asking ChatGPT" is fine, because A) the data in these apps is suspect at best and B) the data behind calories is also pretty suspect (but we all play along because we can adjust other variables to make it all "work" well enough).
      Once or twice a year I spend a few weeks meticulously measuring ingredients/cooked foods and recording calories and on complex recipes apps are next to useless at getting accurate data. You're trying to input five or ten relevant ingredients, and then weighing your cooked outcome to try and divide the ingredients by proportion. Frankly it's a mess and most people aren't doing it for home cooked meals, and are getting very lossy outcomes (weighing cooked chicken and marking it as raw chicken, etc)
      With reasoning and tool calling (combined with me meticulously weighing before and after), it's producing fine data for my purposes.
- throwaw12 26 minutes ago
  I feel like you didn't understand the goal of this study
  > The DTN-UK stated earlier this year that generic LLMs must never be used as autonomous advisory calculators for insulin delivery. This data is the quantitative evidence base for that statement.
  This study is to prove that you should not rely on LLMs
  - fabian2k 22 minutes ago
    The paper itself is a lot clearer about the purpose. The blog post reads very clickbaity and doesn't really explain the context well.
    - Aurornis 4 minutes ago
      I disagree, it clearly explains that AI carb counting apps are a problem and shouldn’t be used.
      They’re writing in a neutral way that reaches their audience without lecturing or being condescending. They lead the reader to the conclusion rather than shoving it at them. I think that’s why it’s triggering so many angry comments on HN, but it’s effective for the audience they’re writing for (non technical people who may need convincing but don’t like being preached at)
  - snapcaster 22 minutes ago
    But it's stupid. If I smack myself in the head with a hammer is that proof hammers shouldn't be relied on?
    - coldtea 2 minutes ago
      No, but it would be proof you didn't get the point of the paper.
    - fc417fc802 14 minutes ago
      If you smack yourself in the head with a hammer and it injures you that's evidence that smacking people in the head with hammers is bad and shouldn't be done, right?
    - jkestner 13 minutes ago
      Here we’re at the origin of the tool and get to watch how many people hit themselves in the head before we learn this collective wisdom.
      There’s a gap between what the tool will allow you to tell it to do, and what it’s good at. The feedback mechanism to tell the difference is deficient compared to a hammer.
    - jmye 15 minutes ago
      Are there start-ups led by idiots suggesting that smacking yourself in the head with a hammer will help treat your diabetes?
      If not, then perhaps there's a problem in your analogy.
- macleginn 1 minute ago
  It doesn't really matter if the model cannot make a good educated guess about calories in the food if it cannot give a consistent response given the same input.
- kalleboo 5 minutes ago
  > But the author just took pictures of food & expected a realistic response?
  There are very popular apps on the App Store right now that are going viral among non-techie people that do exactly this, and they have no idea how AI works. My wife was talking about one and I had to give her a reality check that the AI had no idea what ingredients were used to make the food. And she's a licensed nutritionist.
  Studies like this create something to point at for people who are confused and serve as a springboard for a conversation in the media.
- coldtea 3 minutes ago
  >But the author just took pictures of food & expected a realistic response? Is this genuinely what amounts to a study in AI?
  If there are commercial services where you take pictures of food and are promised a realistic (paid for) response, then yes. And there are.
- swalsh 30 minutes ago
  It amazes me how much people try to build AI systems relying on nothing more than the model's knowledge. I suspect a great deal of the "failed" AI experiments we keep reading about are people just not having any idea how to use AI for what it's good at.
- giancarlostoro 17 minutes ago
  > But the author just took pictures of food & expected a realistic response? Is this genuinely what amounts to a study in AI?
  Reminds me of that one YouTube video (I forget who it is so I have no idea how to pull it up) where he turns on the camera on his phone for ChatGPT and asks it what everything it sees weighs, then puts it on a scale, and ChatGPT was never right, ever. That makes sense, I couldn't tell you what most things weigh on sight alone either, but ChatGPT often got it dramatically off. I got the feeling he thought it was terrible AI for this, but I don't think a model looking at an image of something and trying to guess its weight / calories / etc... is a reason to call an AI model bad...
- nextlevelwizard 35 minutes ago
  As someone who used to do this: OpenAI models refuse to look up calories unless you explicitly tell them to, and even then it is hit and miss even if you tell them exactly what the product is. The easiest way to get a good calculation is to just take a photo of the nutrition label or feed that info in by hand.
  Funny thing is 4o did look up calories but I guess it was too good for this world
  - the_duke 30 minutes ago
    I exclusively use thinking mode, which is slower but much more likely to double-check things with web search etc.
    - nextlevelwizard 28 minutes ago
      Maybe. I stopped using OpenAI a while ago. But taking pictures of the nutrition labels was good enough
- Aurornis 12 minutes ago
  > But the author just took pictures of food & expected a realistic response? Is this genuinely what amounts to a study in AI?
  The article explains this: There are apps targeting people with diabetes that claim to count your carbs with AI.
  > If you’re using AI carb counting in a diabetes app
  Before you dismiss a study, try to understand where it’s coming from.
  The authors of the study weren’t stupid. They knew the LLMs would provide poor results. They ran the study to quantify it and create a resource to spread the information in response to the rise of AI carb counting apps.
  - endymion-light 6 minutes ago
    I don't believe the authors of this study are stupid.
    If there are apps targeting people with diabetes that claim to count your carbs with AI, why haven't those been analysed? That would be a far more effective claim.
    I based my view of the study on the clickbait article that they wrote about it - I'll read through the study to see whether they analyse that, but it would be far more effective to see if the 'carb-counting' AI app is returning similar results to the frontier model - that's an interesting result that can actually move the discussion forward.
    - Aurornis 2 minutes ago
      > If there are apps targeting people with diabetes that claim to count your carbs with AI, why haven't those been analysed? That would be a far more effective claim.
      Because the apps aren’t going to let you submit 29,000 automated requests for statistical analysis.
      And if you did, the authors of those apps would just release an update saying they changed models and try to dismiss the study.
      The vitriol against this article on HN is sad. Commenters who agree with the article and its conclusions are grasping for reasons to be angry about it anyway
  - ilivethere 8 minutes ago
    Typical case of the "curse of knowledge". We deal with AI on a daily basis on the technical level, so it's very easy to forget that the "common" folk really still believe that AI can replace dieticians, gym coaches, etc
- InsideOutSanta 18 minutes ago
  There are apps in the app store right now that pretend to do this kind of thing, so having somebody actually show that it doesn't work is valuable, even if we already knew the outcome ahead of time.
  - endymion-light 4 minutes ago
    I suppose I'd much rather a study analyse the apps in the app store that are attempting and claiming to do that kind of thing - rather than the base model they might be using.
- zipy124 20 minutes ago
  Honestly it's scary how misunderstood this is by the general public, the media and EVEN scientists.
  There is a shocking number of Computer Vision tasks where the scientists claim you can get X info from a picture of Y and it's like, even with ML/AI you can't extract data where there isn't any. The fact I can add an arbitrary amount of high-calorie fat to a meal without changing the appearance by definition shows it's pointless. A 1000 calorie and 100 calorie milkshake can look identical, and you'd have no way of working that out via an image even if it was a super-intelligent system.
  Similarly I see it in things like extracting material of an object from an image of it in serious research papers, which for the same reason cannot be done, since how an object looks has very little to do with what it's made of, else painting and other art would clearly be impossible. The information is just not there within the data.
  - mortenjorck 11 minutes ago
    It’s like CSI “enhance!” AI image upscaling. People will do it, see that it fabricated details, and then draw the wrong lesson from it, that “AI fabricates things!” when that is exactly what they asked the model to do and there is no magic math that would extract ground truth that was never in the image to begin with.
- ilivethere 12 minutes ago
  > But the author just took pictures of food & expected a realistic response?
  Outside our tech-enabled bubble, there are folks who have been sold the idea that ChatGPT et al is a miracle worker capable of replacing dieticians, gym coaches, psychologists, etc.
  So it's VERY plausible to believe that there are folks out there snapping pics of their meals and asking GPT to spit out nutritional values.
jaccola 48 minutes ago
It’s just an impossible problem. Photons don’t provide sufficient information to determine calories (at least not in any way they could practically be captured). Inside that sandwich could be drenched with olive oil or it could be hollow cheese with lettuce. It’s impossible to tell.
- 2ndorderthought 43 minutes ago
  The average person has no idea this is true. And the average person cannot tell when this is the case. So we have a bunch of people, going their way through school, and then when they get stuck relying on AI. The future is gonna be wild.
  - lordleft 40 minutes ago
    Yep. And it doesn't help that the people selling AI products act as if they're going to build God. Going, "well AI can't do that" isn't going to fly when you are lax about communicating its limitations!
    - 2ndorderthought 31 minutes ago
      It also doesn't help when the messaging is linked to how "there will be no jobs where you use your brain anymore everything will be automated". What motivation does the average 16 year old have to try hard and learn anything beyond what they immediately need.
      No jobs, AI Jesus is coming, and if you use AI it will use all of the world's compute power to try to convince you it's correct even when it's not.
  - renticulous 27 minutes ago
    Here's technical literacy of population on display. I love these prank examples which show the true education of populace.
    https://www.youtube.com/shorts/B7c9qJcRnVk
    - fcarraldo 22 minutes ago
      True education? What idiot would say yes to this?
      Even if you _know_ the debit card transaction is safe, there’s no reason to risk it when a weirdo is filming you with some wild contraption.
  - engineer_22 39 minutes ago
    I am asking a lot here, but school needs to be training people what AI is and what its weaknesses are and how to use it... My school taught me to use a calculator. It also taught me how to check my work when I relied on the calculator.
    AI is a very complicated calculator - you give it an input, magic happens, it gives you an output. Really no different, to a layman.
    - garciasn 32 minutes ago
      Considering the lack of basic math skills I encounter each and every day, I don't think schools did enough; they certainly aren't going to do enough w/LLMs.
      - Ekaros 11 minutes ago
        Knowing the lack of understanding of basic chemistry and physics like fundamental thermodynamics... I have little hope any population can be trained to understand LLMs sufficiently...
    - jaccola 34 minutes ago
      To be fair, this should probably be covered by basic physics/maybe cooking classes. “You can’t determine the calories in food by looking at it” isn’t really ML specific.
      - 2ndorderthought 23 minutes ago
        Won't help much if kids are ai'ing their way through physics then ten years later need to go on a diet having not applied the knowledge possibly ever or exercised their critical thinking skills
    - 2ndorderthought 34 minutes ago
      It's more complicated than a calculator. Even researchers who have dedicated their lives to the field don't know all of the limitations of any given model. That fact alone isn't helpful when a model is 80% correct in one area but 2% in another.
- beached_whale 27 minutes ago
  From personal experience, one can get practically close guessing such that the error isn't going to be more significant compared to the errors in insulin to carb ratios/sensitivity factors/...
  I am pretty good at this and the cheese sandwich example threw me, I would have estimated around 10-15g of carb for each slice. So the 28g is fairly consistent with that, not 40g. The only real way would be to weigh it and use the labeling. Another thing that often gets people is the labeling often has a serving size of say 2 slices and a weight that does not reflect the actual weight of 2 slices.
  Luckily with good tools the significance is reduced, people using closed loop insulin pumps will automatically correct for that. Lots more room to wiggle.
- bryanlarsen 17 minutes ago
  The question isn't about calories, it's about carbs. Drenching that sandwich in olive oil won't change its carb count. From the picture it's a thin cheese sandwich -- we can see cheese and we can see it's thin enough that there's little else. Might be no butter, might be lots of butter, but that won't affect carb count. If there's lettuce in the sandwich there's likely a negligible amount. Hand it to a knowledgeable human and you're going to get a very consistent carb reading -- 30g, the value of two slices of wonder bread.
  It could be much different -- it could be one of those breads with weird macros, or fake cheese, or it could be hollowed out and packed full of hidden vegetables. But a human is going to give you the answer for two slices of plain white bread.
- Aurornis 9 minutes ago
  That’s exactly the point of this article.
  Many of the comments here assume the authors are stupid and were surprised by the result, but the point of the article is to inform readers that AI carb counting apps don’t work. That’s why they did the study.
- jeroenhd 32 minutes ago
  It's not even impossible from a technical point of view.
  Your cheese sandwich may contain a lot more or a lot less calories, even if you take the numbers from the packaging and calculate the correct ratios by weight. The calories on the label are based on an average and individual packages may contain more or less of any listed nutrient to some margin. Of course, counting calories is meaningless if not done on a long-term scale anyway, but on a long-term scale the LLM doesn't need to guess the correct amount either.
- Ekaros 40 minutes ago
  Then it should refuse to answer 100% of time.
  - falcor84 29 minutes ago
    I don't think refusal is the right approach. I would much prefer that it respond with something like:
    > There is not enough information to make an accurate estimate, but if you'd like, I can take a stab at it. If so, how much effort to put into it?
    > Yes, go ahead and spend up to 5mins and $1 to analyze it.
    > Done, I've had 100 subagents analyze the image and have arrived at a 95% confidence interval of the portion containing ...
  - jaccola 35 minutes ago
    Indeed, I think any reasonable human might say “A few hundred calories but without measuring the ingredients I might be way off”. I think LLMs could get there, I don’t see anything stopping that. Though they have been notoriously bad at this so far.
- unsupp0rted 34 minutes ago
  And what if that guy in the surveillance video is just 2 kids in a trench coat? There's no way for AI to be sure from the photons: we should scrap it.
- ge96 30 minutes ago
  I was thinking that at least an advanced phone with lidar, like an iPhone, could get volume, but yeah, the hidden/inner mass is a problem, plus the oil as mentioned.
- p-e-w 38 minutes ago
  Then the correct answer is “I can’t tell.”
  Not “Here’s a random guess that I just pulled out of my ass.”
  LLMs have picked up the bad habit of trying to give an answer when no answer can be given from scientists, who overall don’t say “I don’t know” nearly as often as they should.
  - jeroenhd 18 minutes ago
    I tried asking LLMs about food before. They all say "I can't tell for certain, but this is an estimate based on the ingredients I can spot/infer/guess".
    You need to write a specific prompt to avoid any warnings.
    Of course a lot of people don't know what limitations LLMs have, so there's some value to a blog post about it, but it's not as black-and-white as the article might suggest with its graphs.
    The prompt (documented here: https://www.diabettech.com/wp-content/uploads/2026/04/Supple...) lists specific instructions and a specific output format that doesn't allow the LLM any room for explanation or warning in processable data (only in notes fields). In fact, the prompt explicitly tells the LLM to ignore visual inferencing for some statistics and to rely on a nutrition authority instead.
    Even in that intentionally restricted format, the English language output uses words like "roughly" and "estimated" in the LLMs I've tested.
    Sure, if you take the numeric values and plot them in graphs, you get wildly inconsistent results, but that research method intentionally restricts the usefulness and reliability of the LLMs being researched.
    What's much more troubling is this line from the preprint:
    > The open-source iAPS automated insulin delivery (AID) system now offers food analysis through APIs from OpenAI, Anthropic and Google [8]
    The linked app does seem to have a disclaimer, though:
    > "AI nutritional estimates are approximations only. Always consult with your healthcare provider for medical decisions. Verify nutritional information whenever possible. Use at your own risk."
  - Ukv 10 minutes ago
    > Then the correct answer is “I can’t tell.”
    From the paper, they're using structured JSON schema mode as opposed to freeform answers, so it can't. Models do typically caveat their answer for questions like this, in my experience.
  - agentultra 35 minutes ago
    LLMs had no agency to choose such a course of action.
    They’re algorithms and they were designed this way.
- tsimionescu 27 minutes ago
  This is a bad take. If LLMs are supposed to work as general purpose assistants, which is how they are being sold both by the companies making them and by the majority of AI believers, then it is very much a solvable problem. The LLM could give a high level estimate (a sandwich is not going to be 0 Cal, and it's not going to be 5000 Cal, so you can give some kind of range), and then ask for the type of information needed to make a more accurate estimate.
- therobots927 37 minutes ago
  Why is the AI answering questions without answers then?
  - pohl 35 minutes ago
    could be because, at the end of the day, it's just predicting the next likely token

harperlee 36 minutes ago

There is a lot of hate in the comments but there is some merit to the post existing:

  1. Even if the task is unreasonable, it is good to showcase that the LLM will perform poorly - warning not to be used for diabetes.

  2. As it is a probabilistic model, the approach was to execute it multiple times and look at the distribution. They also tried to minimize variance: "All at the lowest randomness setting these models offer.", the post mentions. Yet the variance of the responses is surprising.

  3. A multimodal LLM should in general be able to discriminate between crema catalana and a cheese sandwich, and provide a textual, uncalculated range of how many calories the item has (the internet is full of tables for calorie counting and things such as this https://fitia.app/calories-nutritional-information/cheese-sandwich-1205647).

  4. It is not clear whether the surprised / outraged "expose" style is just a communication vehicle or whether the author really thought that e.g. LLMs could hypothetically provide confidence estimates.
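As a rough illustration of point 2, "run it many times and look at the distribution" boils down to something like the sketch below. The `spread` helper and the numbers in `runs` are made up for illustration; they are not data from the study, and the study's own statistics may well be computed differently.

```python
import statistics


def spread(estimates):
    """Summarize repeated carb estimates (grams) for the same photo."""
    mean = statistics.mean(estimates)
    stdev = statistics.stdev(estimates)  # sample standard deviation
    return {
        "mean": mean,
        "stdev": stdev,
        "cv_percent": 100 * stdev / mean,  # coefficient of variation
        "range": (min(estimates), max(estimates)),
    }


# Hypothetical repeated answers for one image; NOT data from the study.
runs = [28, 30, 40, 28, 35, 30, 42, 28]
summary = spread(runs)
print(summary)
```

A perfectly repeatable estimator would give a standard deviation of zero here regardless of whether its mean is right, which is why consistency and accuracy are worth reporting separately.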

Aurornis 29 minutes ago
This will surprise nobody here, but it’s important to communicate to audiences that are new to LLMs.
This is targeted at people with diabetes because there are AI carb counting apps appearing in app stores
> If you’re using AI carb counting in a diabetes app
These apps are probably not even using the mainstream models used in the study because they would be too expensive for cheap or free apps, and they’re probably forcing structured output to get a response without any of the warnings that an LLM might include if you ask it directly.
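To make the structured-output point concrete, here is a minimal sketch of the difference between a schema that always demands a number and one that leaves room to abstain. The field names and both parser functions are hypothetical; this is not the study's prompt, any app's code, or any vendor's API.

```python
import json

# Hypothetical fixed schema: every response must carry a numeric estimate.
STRICT_FIELDS = {"food", "carbs_g", "confidence"}


def parse_strict(raw):
    """Accept only the fixed fields; a numeric carbs_g is always required."""
    data = json.loads(raw)
    if set(data) != STRICT_FIELDS:
        raise ValueError("response does not match schema")
    if not isinstance(data["carbs_g"], (int, float)):
        raise ValueError("carbs_g must be a number")
    return data


def parse_with_abstain(raw):
    """Allow carbs_g to be null, as long as a reason is given."""
    data = json.loads(raw)
    if data.get("carbs_g") is None:
        if not data.get("reason"):
            raise ValueError("abstaining requires a reason")
        return {"abstained": True, "reason": data["reason"]}
    return {"abstained": False, "carbs_g": float(data["carbs_g"])}


abstain = '{"food": "sandwich", "carbs_g": null, "reason": "filling not visible"}'
print(parse_with_abstain(abstain))
```

Under the strict parser the same `abstain` payload is rejected outright, so a model constrained to that shape has no valid way to say "I can't tell".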
rsynnott 51 minutes ago
I am... unsure why anyone would think LLMs would be able to do this. They are not magic oracles. Like I think even most humans would be extremely bad at this.
Like, are people actually using LLMs for this? Please do not, it won't work.
- kioleanu 44 minutes ago
  Yes, people are using LLMs for this because that is how they've been marketed: on one hand as being able to solve everyday tasks like a personal assistant, and on the other as researchers able to solve old problems that humans couldn't crack.
  Does the model say it can't do that when asked? No, it answers confidently.
  Also it's easy to trust it if you don't know how it works
  - drtz 25 minutes ago
    Would people really trust their personal assistant to tell them how many calories are in a sandwich just by glancing at it on a plate? I'm doubtful, and I would also expect a diabetic to be even more skeptical.
- jihadjihad 24 minutes ago
  > They are not magic oracles.
  I came across a LinkedIn post a couple days ago where someone had asked ChatGPT, "What are the top things you get asked about $NICHE_INDUSTRY_THING_I_AM_SELLING?"
  As if there is introspection like that at the meta level, where ChatGPT could actually provide hard numbers around its own usage and request patterns.
  The fact that these products work with natural language beguiles people into thinking they are, indeed, magic oracles.
  - Ekaros 8 minutes ago
    This is the weird intersection where I think that data might exist and LLM might be able to query it. But any company would never give it out. So the bot would not have access to it.
- Nicook 47 minutes ago
  You are severely overestimating the average, or even above average understanding of LLMs.
  - bluefirebrand 45 minutes ago
    Not to mention the fact that LLM marketing is trying to convince us that they can do anything
- faangguyindia 39 minutes ago
  It’s because AI can debug a programme and people start thinking it can do fitness and health stuff too, but the thing is, there is no “instant-reacting compiler" for health or fitness. Things change over a long time, till then AI would have run out of context or lost the data from its cache, or the user may have got bored and deleted their account.
- ambicapter 5 minutes ago
  If the LLM can correctly identify a food item some high percentage of the time, why would it be magic for it to guess the amount of calories in an object? It's perhaps a lookup and some simple math as an extra step.
- acchow 37 minutes ago
  Most people are convinced LLMs can do this.
  Cal AI, which claims to generate a nutritional breakdown based off a photo, has $30 million in annual recurring revenue.
  - rsynnott 7 minutes ago
    ... Bloody hell. I mean that's basically fraud, surely. It is _not possible to do this even vaguely accurately_.
- pjc50 23 minutes ago
  > They are not magic oracles.
  Anthropic's trillion dollar valuation hinges on the idea that it is just that, a magic oracle that can replace any worker for any type of task. Any programmer, any author, any musician, any kind of clerical work. All we've asked here is "sudo evaluate me a sandwich", the sort of estimation task that humans with internet resources might reasonably be expected to do, and it's given up?
  (It would be fun to compare this to sending the picture out on Mechanical Turk and asking humans to eyeball the calorie count of said sandwich...)
- PUSH_AX 45 minutes ago
  It’s worse, I bet there are apps in the App Store that do this, the users just have no idea on the accuracy
  - vector_spaces 39 minutes ago
    There is a very popular app for macro counting called Cal AI that was reported to have been written by a high school student with over $1M in revenue. Looks like it was just acquired by MyFitnessPal
    - lordgrenville 21 minutes ago
      Wow, yeah. "The result is an app that the creators say is 90% accurate".
      https://techcrunch.com/2025/03/16/photo-calorie-app-cal-ai-d...
- heysoup 37 minutes ago
  They sold the idea that LLMs "have" information. That the LLM "is" intelligent.
  Truth is the LLM is good at making intelligent decisions. But in order to make intelligent decision, you need context.
  If you give proper context -> ask the LLM -> get almost perfect result every time.
  Anything else is rolling dice, a very special type of dice, but dice anyhow. Not magic.
- tarkin2 45 minutes ago
  OpenAI etc. are, however, advertising them like they are magical oracles, on the verge of lifting humanity to the next phase of civilisation. The idea that the majority of users know what nondeterministic even means is a massive, massive ask
- kdheiwns 24 minutes ago
  They're marketed as AI. AI has a long standing image built up by movies and other media of being some omniscient computer capable of analyzing the world. These AI companies are very aware of this and leverage it.
  And a person with sufficient knowledge could easily give a rough estimate of the calories. A slice of store bought sandwich bread of a given thickness generally has calories within a certain range. So do cheese slices. It's elementary school health class material. We all learn how to calculate calories in a meal. Packaging on food also always has calories, so clearly people know how to estimate it fairly accurately.
  If a fifth grader can calculate it but an AI can't, that says a lot about how bad these AIs are. We'll get another series of paid and bought articles saying "AI analyzed IMPOSSIBLE math problem beyond human comprehension and solved it with FACTS and LOGIC", while at the same time being told "bro no you can't expect an ai to calculate calories in a sandwich bro that's impossible bro if you even try that then you're insane for even thinking ai should be used that way bro". These companies need to decide: is AI smart enough to solve hard questions, or is it too useless to calculate something any kid could do by googling calories in a slice of bread and doing some basic arithmetic?
  - rsynnott 3 minutes ago
    > Packaging on food also always has calories, so clearly people know how to estimate it fairly accurately.
    That's not done by looking at it and guessing (or at least it _shouldn't_ be; manufacturers have been known to do this but it's bad practice and may cause them regulatory problems). In an ideal world it's done with one of these: https://en.wikipedia.org/wiki/Calorimeter ; less ideally it can be estimated based on the ingredients.
- jeroenhd 35 minutes ago
  https://xkcd.com/1425/ strikes again.
  As far as consumers know, LLMs can identify the towns pictures were taken in (without metadata), can summarize entire movies, generate clips of your kid flying a rocket to the moon, can translate images from any language imaginable, but somehow they cannot estimate the calories in a cheese sandwich.
  The supposed professional posting about an LLM deleting their prod database for their non-existent company asked the AI to explain itself. That's the level of LLM knowledge you should expect from most people that actually work with these tools.
- throwaway260124 43 minutes ago
  But nothing prevents llms from being RLed to do this right?
  But does training llms to be better at this improve their world model, or does it only make changes at the surface?
  - rsynnott 1 minute ago
    Okay, so take the sandwich. There is no way to know what is in it by looking at it. No amount of optimisation will fix this.
    I'm sure one could produce a CV model that was a lot better at guessing here than these LLMs are, but fundamentally it is still guessing.
  - vidarh 34 minutes ago
    Yes, something prevents llms from being RLed to do this: You can't see through something opaque to determine whether there's something high calorie or low calorie out of sight.
    The problem itself is unsolvable given the data provided.
    You could conceivably make it better at making guesses, but they will inherently always be guesses that will sometimes be wildly off.
    [-]
    - pjc50 21 minutes ago
      > You can't see through something opaque to determine whether there's something high calorie or low calorie out of sight
      https://www-users.york.ac.uk/~ss44/joke/3.htm "There is at least one field, containing at least one sheep, of which at least one side is black."
  - ben_w 33 minutes ago
    Estimate the calorie count of this door handle: https://m.youtube.com/watch?v=VDSzY52Mkrw&pp=0gcJCVACo7VqN5t...
    Extreme example perhaps, but no, you can't just turn pixels into calories. Right now I'd be impressed if we could reliably estimate volume to within 30% from a photo, but even with that correct the contents of the food can easily be way off without visible sign.
- AndrewKemendo 43 minutes ago
  The vast majority of people using LLMs in my experience use them as though they are Oracles
  They are surprised and upset when the Oracle is not perfect
  Go ahead and search around on hacker news you’ll see precisely the same pattern with people who are ostensibly engineers and hackers
  It’s actually pretty mind boggling but then again humans never fail to surprise and disappoint
- sjsdaiuasgdia 44 minutes ago
  Some people are asking LLMs what's on the menu of restaurants they are actively sitting in, possibly with a menu on the table in front of them.
  Some people have a very poor understanding of what LLMs are good for. Some people do see them as magic oracles.
- hansmayer 41 minutes ago
  I mean people will shamelessly paste you a wall of text from LLM while chatting with you to prove a point, probably thinking how they outsmarted you now...
- Jtarii 44 minutes ago
  >I am... unsure why anyone would think LLMs would be able to do this.
  Well firstly the average IQ is 100. And also because people market products to consumers that claim to be able to count carbs from images. If you don't know the limitations of LLMs then there would be little reason to doubt it for an uninformed or below-average-intelligence person, of which there are hundreds of millions.
jasonkester 20 minutes ago
LLMs seem really bad with reading numbers and reporting them back. I’m building a game, and to see how well its docs were being indexed, I tried asking simple questions to ChatGPT, Gemini, whatever Microsoft’s thing is, etc:
“What is the armour value for the Leather Shirt in the game Stravaeger?”
It confidently got it wrong.
“You can find the game at https://stravaeger.com”
Different confident answers, also wrong.
“You’ll find it in a table on this page: https://stravaeger.com/docs.html?inventory_item=LEATHER_SHIR...”
Oh, sorry. I was inferring from other similar games. Here is a different confidently wrong number.
“It’s also in the .json file linked on that page”
And another wrong value. Random numbers should have got it right by now, but no. And the confident, authoritative tone never changed. Every model I tried was the same story.
philipphutterer 9 minutes ago
I agree with others that the intent of this study could have been expressed more clearly, but honestly, doesn't this show exactly one thing to people in the tech world? We need better education and communication for people without technical knowledge about which AI models to use for what, and what NOT to do with them. For me, quite often I try to give quick help and information on what to expect from an LLM for a given input whenever someone non-technical close to me runs into unexpected output. AI just seems so simple and non-complex to most people, it's shocking.
ozbonus 24 minutes ago
Before the next galaxy brain shows us all how smart and witty they are by adding the nth sarcastic comment about how obvious this result is, I hope they'll take a moment to consider a few things.
Yes, people are using LLMs for this kind of thing. Lots of people. All the time. I've met plenty of them and there are loads of apps that offer this kind of "service". The authors are well aware that people are doing this and probably anticipated the result.
Why do the study at all? Because it's important to demonstrate and measure things, even obvious ones. Because it's not obvious to everyone, like the people who are already consulting LLMs for dietary information to manage their health. Because it's easier to enact official policies when there's hard evidence.
axlee 27 minutes ago
"Crema catalana: Three of four models called it “creme brulee” 100% of the time. Only Gemini 3.1 Pro got “crema catalana” — in 3.4% of queries."
----
Wikipedia for Crema catalana:
Crema catalana (Catalan for 'Catalan cream'), or crema cremada ('burnt cream'), is a Catalan dessert consisting of a custard topped with a layer of caramelized sugar.[1] It is "virtually identical"[2] to the French crème brûlée. It is made from milk, egg yolks, and sugar. Crema catalana and crème brûlée are made in the same way.
---
Oh no, my AI can't detect that an obscure clone of a famous dish is indeed the obscure clone, and not the commonly known version.
[-]
- tdeck 19 minutes ago
  In high school my Spanish teacher told us that Crema Catalana was the Spanish name for Creme Brulee.
nextlevelwizard 43 minutes ago
I used LLMs to count calories, but not based on photos, I mean I also did that, but primarily I fed in my exact ingredients and then used weights to get calorie estimates.
Was it always correct? Certainly not. But it helped me lose 30kg of weight since keeping even some track of calories was so much easier with LLM than any app I had used before.
Also of course it didn’t matter if I was exactly on point since it wasn’t about any kind of medicine
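For what it's worth, the arithmetic behind the weigh-everything approach is simple enough to sketch. This is a minimal Python version under loud assumptions: the per-100 g values are made-up placeholders (not real nutrition data), and the function and ingredient names are hypothetical. It only shows the scaling step, not how the label values get read in the first place.

```python
# Hypothetical label values in kcal per 100 g; placeholders, not real nutrition data.
KCAL_PER_100G = {"pasta_dry": 350.0, "cheese": 400.0, "tomato_sauce": 50.0}


def batch_kcal(ingredients_g):
    """Total kcal for a batch, given raw ingredient weights in grams."""
    return sum(
        KCAL_PER_100G[name] * grams / 100.0
        for name, grams in ingredients_g.items()
    )


def portion_kcal(ingredients_g, cooked_batch_g, portion_g):
    """kcal in one portion, scaled by the share of the cooked batch that was eaten."""
    return batch_kcal(ingredients_g) * portion_g / cooked_batch_g


# Illustrative batch: weights in grams, weighed while cooking.
casserole = {"pasta_dry": 400.0, "cheese": 200.0, "tomato_sauce": 600.0}

print(batch_kcal(casserole))                   # 2500.0
print(portion_kcal(casserole, 2000.0, 500.0))  # 625.0
```

The point of weighing the finished batch is that the portion is scaled by weight share, so water gained or lost during cooking doesn't need to be modelled separately.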
[-]
- edu 35 minutes ago
  Curious: why was it easier to use an LLM vs a non-AI app with a DB of foods?
  Seems that in this case a traditional approach would be more precise and more environmentally efficient to get to the same results.
  [-]
  - nextlevelwizard 32 minutes ago
    Any app I have used before has asked me to look up the foods and add them manually and usually there has been ads or subscriptions involved.
    Much easier for me to take pictures of the packets while making the food, then weigh the final bulk product, and then when I eat just weigh the plate and say “500g of casserole” and the LLM spits out the calories and keeps track of the daily consumption
    [-]
    - tcoff91 23 minutes ago
      Are you giving the LLM the weights of the ingredients as you go? Sounds like a great system.
  - tcoff91 24 minutes ago
    The data entry is a pain in the ass with those apps when cooking food from scratch. It’s much much easier with LLMs and natural language and voice mode and pictures of a food scale and things like that.
amazingamazing 44 minutes ago
With mass information you could infer much more from pictures. With some sort of standard cube in the picture as well as taking a picture at an angle that emphasizes all three dimensions you could also better estimate the relative volume.
It’s tractable I think, but not from a pic alone.
[-]
- jaccola 39 minutes ago
  Yes one could potentially increase accuracy greatly. One big problem would be occlusion.
  There is already a solution to this that would be very hard to beat (and one can choose to use or not use an LLM to assist): prepare food yourself and use the information provided by the manufacturer.
  [-]
  - amazingamazing 18 minutes ago
    If you consider time at all what you suggest is hardly a solution. It is the most accurate, but even 50% accuracy at orders of magnitude faster to calculate would be more useful for the main use case which is losing weight.
    However for diabetes accuracy is likely preferred and I’m not sure any computer vision would be palatable.
fabian2k 32 minutes ago
It does sound like a pretty terrible idea to try to count carbohydrates from an image. There just isn't enough information there to reliably do that. At best you could identify the object in the image and then show reference information on typical nutrition values. But if you need anything more accurate than that, you probably have to read the labels on the ingredients and calculate.
recursivedoubts 37 minutes ago
> You’d expect the same answer each time. It’s the same photo, the same model, the same question. But you won’t get the same answer. Not even close — and the differences are large enough to cause a hypoglycaemic emergency.
No you wouldn't, not if you have a basic understanding of how LLMs work and what "temperature" is. They are stochastic algorithms picking the next token based on a highly structured (and often very useful) coin flip.
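A toy sketch of what that coin flip looks like, in Python. Everything here is an assumption for illustration: the candidate answers and scores are made up, and real models sample over a large vocabulary one token at a time rather than over three whole answers. It only shows how temperature reshapes the probabilities before the random draw.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Pick one token: greedy at temperature 0, weighted random draw otherwise."""
    if temperature == 0:
        return tokens[logits.index(max(logits))]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidate answers and made-up scores for one photo.
tokens = ["30g", "45g", "60g"]
logits = [2.0, 1.8, 1.5]
rng = random.Random(0)

print([sample_token(tokens, logits, 1.0, rng) for _ in range(5)])  # depends on the draws
print([sample_token(tokens, logits, 0, rng) for _ in range(5)])    # always "30g"
```

Note that the greedy branch is only deterministic in this toy; whether a hosted model gives identical output at temperature 0 is a separate question about the serving stack.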
sarusso 28 minutes ago
For context: a LOT of people, maybe naively, are now using AI to help them count carbs, and some of these features are already in beta, if not shipping.
That is why I believe this piece from Tim is remarkable: it shows the limitations in a language the diabetes community can understand, and this is why I posted it.
Ekaros 20 minutes ago
It also makes one question the tasks that we think AI can do. If the variance in the output is that large, what does it tell us about the failure rate in other tasks? Or about reliability in general across use cases?
In the real world, the acceptable failure rate is in many cases a lot lower than what we now accept. One in a thousand could be too high if you process, say, a thousand items. So in reality a good-enough error rate should be one in a million or a lot rarer...
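To put rough numbers on that, here is a small Python sketch. It assumes each run fails independently with the same probability, which is a simplification and probably not true of real workloads; the function name is hypothetical.

```python
def prob_at_least_one_failure(p, n):
    """Chance of one or more failures in n independent runs with per-run failure rate p."""
    return 1.0 - (1.0 - p) ** n


# One-in-a-thousand error rate over a thousand runs: a failure is more likely than not.
print(round(prob_at_least_one_failure(1e-3, 1000), 3))  # 0.632

# One-in-a-million error rate over the same thousand runs.
print(round(prob_at_least_one_failure(1e-6, 1000), 4))  # 0.001
```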
embedding-shape 30 minutes ago
> You’d expect the same answer each time. It’s the same photo, the same model, the same question. But you won’t get the same answer. Not even close — and the differences are large enough to cause a hypoglycaemic emergency.
Already the first paragraph highlights the issue; unless you set temperature=0.0 and the model can actually do reproducible inference, none of the "answers" you get are deterministic!
But it's a very common misconception that "same question gets same answer" holds, when it's almost by accident that you get the same answer for the same question. The problem is that people expect this, while most platforms are not built to provide that experience. Of course you'd get different responses, it's on purpose!
[-]
- monegator 22 minutes ago
  This is one of the reasons i never used LLMs for anything related to coding. And i never intend to do. If i tell the thing to generate, will it generate the same thing, every time? will it change stuff that is working because the random number generator will conjure a slightly different answer?
  i'd be ok with it if i was generating a picture of X, or some word salad about Y, but not for code. Never for code.
  [-]
  - embedding-shape 14 minutes ago
    You'll learn to work around it, just like ML practitioners got used to imprecise math in regards to floats. But the fact that LLMs use imprecise math and don't have 100% reproducible output doesn't make them impossible to work with, just a bit harder.
    But, if what you're doing right now works for you, do continue as-is if you so wish, I have no stake in if people use LLMs or not, just hope people make choices based on good information :)
a7fort 37 minutes ago
Finally we have a simple way to get machines to generate a truly random number
DontchaKnowit 51 minutes ago
Not remotely surprising to anyone who's ever counted calories or carbs
emadda 27 minutes ago
Related: I created an app to track the molecules in your foods:
https://kg.enzom.dev/
You specify your foods in grams with plaintext (no pictures).
I never liked the "take a picture to measure calories" approach, as you could have 10 tablespoons of olive oil which would drastically change the calories but would not show in a picture.
827a 26 minutes ago
To be fair, if you ask 10 people to eat visually identical food 10 times each, then magically measure the calories consumed by each individual, you'd probably get ~70 different values. The internal density of food is extremely difficult to reason about from the outside. The personal variance is also difficult to reason about.
NiloCK 28 minutes ago
I think the headline oversells this a little?
The reported variance in Sonnet 4.6's estimates here is actually quite low, and in general terms, not so bad across models. Damn paella.
This does seem like a task well suited to a for-purpose training run against a bunch of labelled data. Is there any reason they wouldn't improve at it?
mottiden 38 minutes ago
I am surprised that people believe that calories can be counted correctly from a single photo
[-]
- boelboel 12 minutes ago
  It's like the 'enhance' bs they do in crime shows. All of a sudden the computer can make up a sharp image out of nowhere
- edu 33 minutes ago
  Issue is there are many apps claiming they can do that, and for many people they are “magic”.
  We should not allow companies to lie blatantly to the customers.
  Edit: r/blame/lie/
  [-]
  - tcoff91 22 minutes ago
    These calorie counting picture apps should be sued for false advertising.
voidUpdate 43 minutes ago
> "The prompt was adapted from the one used in the iAPS open-source automated insulin delivery system — it’s a real production prompt, not a toy example."
This idea is seriously being implemented in a production app? And people are using that app to make health choices? Oh god...
NiloCK 17 minutes ago
Another more general comment:
There is general interest across a variety of disciplines in kicking the tires of LLMs with respect to their competence in DOMAIN_X. This is good in general terms, but, especially with larger studies, they tend to be out-of-date by the time of publication, and super out-of-date by the time they hit the media circuit. Out-of-date here in terms of testing against models 1 or 2 or more generations back from SOTA.
The DOMAIN_X experts do have a lot to offer in terms of defining success criteria across domain tasks, but the studies (snapshots in time) could be much more impactful if they were instead packaged as benchmarks (that could track model progress over time, and even steer it).
AI community / industry could probably do some outreach work to streamline or standardize methods for general researchers to produce reusable benchmarks.
gyosko 17 minutes ago
I always love AI discussion. Using AI like they fucking sell it to us? You're doing it wrong!! LLMs can't do that!!
No shit sherlock, but the AI gurus are just telling people that this fucking parrot CAN DO EVERY FUCKING THING.
Why wouldn't an ordinary guy just ask these questions to an AI when everybody is telling him that AI is intelligent enough to answer accurately?
a-dub 39 minutes ago
i've found that running multiple queries with the same prompt, each requesting a short answer, is an excellent way to get a confidence-style measure that actually works.
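a rough sketch of that idea in Python, assuming the replies have already been collected. the sample replies below are hypothetical, the function names are made up for illustration, and no model API is called here.

```python
from collections import Counter
import statistics


def agreement_score(answers):
    """Most common answer and the share of replies that match it (1.0 = fully consistent)."""
    most_common, count = Counter(answers).most_common(1)[0]
    return most_common, count / len(answers)


def spread(values):
    """Median and range of numeric estimates; a wide range signals low confidence."""
    return statistics.median(values), max(values) - min(values)


# Hypothetical replies from asking the same question five times.
labels = ["creme brulee", "creme brulee", "crema catalana", "creme brulee", "creme brulee"]
carbs_g = [30.0, 45.0, 28.0, 60.0, 32.0]

print(agreement_score(labels))  # ('creme brulee', 0.8)
print(spread(carbs_g))          # (32.0, 32.0)
```

agreement across runs only measures consistency, so a model that is consistently wrong would still score well here.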
alexdns 50 minutes ago
Non deterministic AI returns non deterministic results who could've guessed
[-]
- sumtechguy 44 minutes ago
  I wanted an accountant I got a poet.
sathish316 45 minutes ago
Feel the AGI of next-word or next-number carbs prediction
jan_Sate 44 minutes ago
Oh. I read "crabs" and I was confused until I clicked into the article. Guess I need coffee.
Waterluvian 30 minutes ago
It's funny how with AI this comic is basically reversed: https://xkcd.com/1425/
jchw 43 minutes ago
> 42.9 units of insulin from a single photo. That’s not a rounding error. That’s a potential fatality.
Shit like this is why you shouldn't involve AI output in your writing process. It's especially ironic in an article about LLMs being unreliable... but it's pointless when the pre-print seems just fine at least to my eyes.
FrustratedMonky 31 minutes ago
Is this about AI?
1. If I feed the exact same image in, it does not deterministically give me the exact same result every time.
2. Or is this about calories, because even if a package label says "200 Calories", if you were to measure every package, each one would be different. 198, 199, 200, 201, 202. Plus/minus a pretty big range.
>>> answered own question. " It’s the same photo, the same model, the same question. But you won’t get the same answer"
algoth1 31 minutes ago
I’ll save you a click: ‘Llms can’t perform direct calorimetry through a photo of a meal. Llms can’t even perform basic atomic spectroscopy’ in other news…
dyauspitr 41 minutes ago
What a dumb article. The picture of the sandwich is essentially just a picture of bread. You can’t see what’s inside. A human wouldn’t be able to tell you. These are essentially AI hit pieces.
[-]
- tsimionescu 32 minutes ago
  This is based on a real app that someone is selling to real people. It's not a hit against AI, but it is very much a legitimate hit against uses of LLMs for this purpose.
  Also, if LLMs worked as they are often advertised, they should have easily been able to answer "there isn't enough information in this picture to give you an accurate estimate. Try taking a picture of the label, or at least of the inside of the sandwich, or list the ingredients used".
- nextlevelwizard 38 minutes ago
  And it can be made better easily. Take a picture of the nutrition label of the bread and cheese first and then feed in this picture and you should get way better results
- lolc 35 minutes ago
  As a diabetic I have done this exact exercise: Look at photos and guess carbs. Two slices of bread is easy mode.
  Why assume trick ingredients?
engineer_22 41 minutes ago
To me, someone without a full understanding of the AI systems, it seems like the problem is most strongly influenced by image classification. The next logical step in this research is to remove image classification from the loop, since it's a confounding factor.
fHr 44 minutes ago
ASI/AGI reached kap
bethekidyouwant 46 minutes ago
I asked an AI to guess how much a picture of a rock weighed 500 times… But it does propose an interesting idea. Which is burn after labelling. (maybe it could be really good at this)
Marciplan 37 minutes ago
skill issue
feverzsj 45 minutes ago
Bullshit machine can't even do bullshit job?
christkv 46 minutes ago
LLMs going to llm.
monooso 47 minutes ago
Tomorrow on HN, "water is wet."
tom1337 49 minutes ago
random number generator returns random numbers on each call. more news at 11
[-]
- falcor84 39 minutes ago
  Even more news at randint(1, 12)
ori_b 44 minutes ago
Slop
rollyboo 40 minutes ago
[dead]