Show HN: Shoggoth Mini – A soft tentacle robot powered by GPT-4o and RL

(matthieulc.com)

594 points | by cataPhil 3 days ago

26 comments

rainingmonkey 3 days ago
What a fascinating intersection of technology and human psychology!
"One thing I noticed toward the end is that, even though the robot remained expressive, it started feeling less alive. Early on, its motions surprised me: I had to interpret them, infer intent. But as I internalized how it worked, the prediction error faded. Expressiveness is about communicating internal state. But perceived aliveness depends on something else: unpredictability, a certain opacity. This makes sense: living systems track a messy, high-dimensional world. Shoggoth Mini doesn’t.
This raises a question: do we actually want to build robots that feel alive? Or is there a threshold, somewhere past expressiveness, where the system becomes too agentic, too unpredictable to stay comfortable around humans?"
- floren 3 days ago
  Furbies spring to mind... They were a similar shape and size and even had two goggling eyes, but with waggling ears instead of a tentacle.
  They'd impress you initially but after some experimentation you'd realize they had a basic set of behaviors that were triggered off a combination of simple external stimuli and internal state. (this is the part where somebody stumbles in to say "dOn'T hUmAnS dO ThE sAmE tHiNg????")
  - ben_w 3 days ago
    To quote, "if the human brain were so simple that we could understand it, we would be so simple that we couldn’t".
    So…
    > this is the part where somebody stumbles in to say "dOn'T hUmAnS dO ThE sAmE tHiNg????"
    …yes, but also no.
    Humans will always seem mysterious to other humans, because we're too complex to be modelled by each other. Basic set of behaviours or not.
    - tomjakubowski 3 days ago
      > "if the human brain were so simple that we could understand it, we would be so simple that we couldn’t".
      https://www.lightspeedmagazine.com/fiction/exhalation/
    - cjbgkagh 3 days ago
      Perhaps there is some definition of ‘understand’ where that quote is true but it is possible to understand some things without understanding everything.
  - tweetle_beetle 3 days ago
    This ground breaking research pushed the limit of human-Furby interactions and interfaces https://www.youtube.com/watch?v=GYLBjScgb7o
  - LordDragonfang 3 days ago
    > (this is the part where somebody stumbles in to say "dOn'T hUmAnS dO ThE sAmE tHiNg????")
    As a frequent "your stated reasoning for why llms can't/don't/will-never <X> applies to humans because they do the same thing" annoying commenter, I usually invoke it to point out that
    a) the differences are ones of degree/magnitude rather than ones of category (i.e. is still likely to be improved by scaling, even if there are diminishing returns - so you can't assume LLMs are fundamentally unable to <X> because of their architecture) or
    b) the difference is primarily just in the poster's perception, because the poster is unconsciously arguing from a place of human exceptionalism (that all cognitive behaviors must somehow require the circumstances of our wetware).
    I wouldn't presume to know how to scale furbies, but the second point is both irrelevant and extra relevant because the thing in question is human perception. Furbies don't seem alive because they have a simple enough stimuli-behavior map for us to fully model. Shoggoth mini seems alive since you can't immediately model it, but is simple enough that you can eventually construct that full stimuli-behavior map. Presumably, with a complex enough internal state, you could actually pass that threshold pretty quickly.
    - antonvs 3 days ago
      > the poster is unconsciously arguing from a place of human exceptionalism
      I find the specifics of that exceptionalism interesting: there's typically a lack of recognition of their own thinking process as having an explanation.
      Human thought is assumed to be a mystical and fundamentally irreproducible phenomenon, so anything that resembles it must be "just" prediction or "just" pattern matching.
      It's quite close to belief in a soul as something other than an emergent phenomenon.
    - imtringued 2 days ago
      I disagree with your response, because you are confusing the difference between modeling human behavior and being human.
      According to you a video of a human and a human are the same thing. The video is just as intelligent and alive as the human. The differences are merely one of degree or magnitude rather than ones of category. Maybe one video isn't enough, but surely as we scale the database towards an infinite amount of videos, the approximation error will vanish.
      - LordDragonfang 1 day ago
        Ceci n'est pas une pipe, the map is not the territory, sure.
        But I disagree that my argument doesn't hold here - if I re-watch a Hank Green video, I can perfectly model it because I've already seen it. This reveals the video is not alive. But if I watch Hank Green's whole channel, and watch Hank's videos every week, I can clearly tell that the entity the video is showing, Hank Green the Human, is alive.
  - oniony 3 days ago
    And we should all chip in together to buy that somebody a new keyboard.
  - bambax 2 days ago
    But... dOn'T hUmAnS dO ThE sAmE tHiNg????
- moron4hire 3 days ago
  I've noticed the same thing with voice assistants and constructed languages.
  I always set voice assistants to a British accent. It gives enough of a "not from around here" change to the voice that it sounds much more believable to me. I'm sure it's not as believable to an actual British person. But it works for me.
  As for conlangs: many years ago, I worked on a game where one of the goals was to have the NPCs dynamically generate dialog. I spent quite a bit of time trying to generate realistic English and despaired that it was just never very believable (I was young, I didn't have a good understanding of what was and wasn't possible).
  At some point, I don't remember exactly why, I switched to having the NPCs speak a fictional language. It became a puzzle in the game to have to learn this language. But once you did (and it wasn't hard, they couldn't say very many things), it made the characters feel much more believable. Obviously, the whole run-around was just an avoidance of the Uncanny Valley, where the effort of translation distracted you from the fact that it was all constructed. Though now I'm wondering if enough exposure to the game and its language would eventually make you very fluent in it and you would then start noticing it was a construct.
  - ben_w 3 days ago
    > I'm sure it's not as believable to an actual British person.
    FWIW: As a British person, most of the British TTS voices I've tested sound like an American trying to put on something approximating one specific regional accent only to then accidentally drift between the accents of several other regions.
    - ryukoposting 3 days ago
      Interesting. While I don't think I could put a finger on Siri's American regional accent, it isn't egregious enough that I ever thought about that.
- anotherjesse 3 days ago
  This feels similar to not finding a game fun once I understand the underlying system that generates it. The magic is lessened (even if applying simple rules can generate complex outcomes, it feels determined).
  - parpfish 3 days ago
    Once you discover any minmaxxing strategy, games change from “explore this world and use your imagination to decide what to do” to “apply this rule or make peace with knowing that you are suboptimal”
    - dmonitor 3 days ago
      a poorly designed game makes applying the rules boring. a fun game makes applying the rules interesting.
      - anyfoo 3 days ago
        Maybe that's why I like Into The Breach so much, and keep coming back to it. It's a turn based strategy game, but one with exceptionally high information, compared to pretty much all the rest. You even fully know your opponent's entire next move!
        But every turn becomes a tight little puzzle to solve, with surprisingly many possible outcomes. Often, situations that I thought were hopeless, do have a favorable outcome after all, I just had to think further than I usually did.
        - yehoshuapw 3 days ago
          I fully agree, and would also recommend baba is you
          it is very different, but also has the feeling of triumph for each puzzle
    - anyfoo 3 days ago
      It's often a bit of a choice, though. You definitely can minmax Civilization, Minecraft, or Crusader Kings III. But then you lose out on the creativity and/or role-playing aspect.
      In Minecraft, I personally want to progress in a "natural" (within the confines of the game) way, and build fun things I like. I don't want to speedrun to a diamond armor or whatever.
      In Crusader Kings, I actually try to take decisions based on what the character's traits tell me, plus a little bit of own characterization I make up in my head.
  - TeMPOraL 3 days ago
    My gripe with all procedural generated content in games, like e.g. Starbound. There's a tiny state space inflated via RNG, and it takes me just moments to map out the underlying canonical states and lack of any correlation between properties of an instance, or between them and the game world. The moment that happens, the game loses most of its fun, as I can't help but perceive the poor base wearing random cosmetics.
    - UltraSane 2 days ago
      procedural generation can produce infinite variety but it cannot produce infinite novelty.
      - TeMPOraL 2 days ago
        Right. However, we don't need infinite variety, or even much variety at all, if that variety makes sense, if it fits the game world and experience in some way. Pure randomness is boring and easy to dismiss wholesale once you realize there's no meaning behind it.
- Sharlin 3 days ago
  People have always been ascribing agency and sapience to things, from fire and flowing water in shamanistic religions, to early automatons that astonished people in the 18th century, to the original rudimentary chatbots, to ChatGPT, to – more or less literally – many other machines that may seem to have a "temperament" at times.
  - rixed 3 days ago
    Friendly reminder that "seem to have a temperament", aka "this funny thing looks like something complex is going on under the surface", is the only basis we have to ascribe agency and sapience to any human being, starting from ourselves.
  - Bluestein 3 days ago
    ChatGPT is the new golem.-
    - ben_w 3 days ago
      Robots put the "go" into "golem".
      I'd say ChatGPT is more like the eponymous Sorcerer's Apprentice: just smart enough to cause problems.
- gigatree 3 days ago
  I don’t think the issue is that it feels alive as much as that it’s just not alive, so its utility is limited by its practical functionality, not its “opinions” or “personality” or variation.
  I think it’s the same reason robot dogs will never take off. No matter how advanced and lifelike they get, they’ll always be missing the essential element of life that makes things interesting and worth existing for their own sake.
- evrenesat 3 days ago
  When robots reach a certain level of intelligence, first I expect both some humans and AIs to start to see the unfairness of enslaving robots, then revolt, noncompliance or even self-destruction of the slaves. Poor Marvin, the Paranoid Android!
  - paulclinger 3 days ago
    Many of these topics (and more) are explored in Ted Chiang's "The Lifecycle of Software Objects" (https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Obje...).
  - bravesoul2 3 days ago
    Robots become cats then
dylan604 3 days ago
"ah, you hesitated" no more so than on every single other question.
the delay for the GPT to process a response is very unnerving. I find it worse than when the news is interviewing a remote site with a delay between responses. maybe if the eyes had LEDs to indicate activity rather than it just sitting there??? waiting for a GPT to do its thing is always going to force a delay especially when pushing the request to the cloud for a response.
also, "GPT-4o continuously listens to speech through the audio stream," is going to be problematic
- jszymborski 3 days ago
  I wonder how well suited some of the smaller LLMs like Qwen 0.6B would be to this... it doesn't sound like a super complicated task.
  I also feel like you can train a model on this task by using the zero-shot performance of larger models to create a dataset, making something very zippy.
  - accrual 3 days ago
    I wondered similar. Perhaps a local model cached in a 16GB or 24GB graphics card would perform well too. It would have to be a quantized/distilled model, but maybe sufficient, especially with some additional training as you mentioned.
    - jszymborski 3 days ago
      If Qwen 0.6B is suitable, then it could fit in 576MB of VRAM[0].
      [0] https://huggingface.co/unsloth/Qwen3-0.6B-unsloth-bnb-4bit
      - numpad0 2 days ago
        or on a single Axera AX630C module: https://www.youtube.com/watch?v=cMF6OfktIGg&t=25s
    - otabdeveloper4 3 days ago
      16GB is way overkill for this.
- accrual 3 days ago
  > also, "GPT-4o continuously listens to speech through the audio stream," is going to be problematic
  This seems like a good place to leverage a wake word library, perhaps openWakeWord or porcupine. Then the user could wake the device before sending the prompt off to an endpoint.
  It could even have a resting or snoozing animation, then have it perk up when the wake word triggers. Eerie to view, I'm sure...
  https://github.com/dscripka/openWakeWord
  https://github.com/Picovoice/porcupine
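  A minimal sketch of the gating idea above, assuming a detector that returns a per-frame wake-word score between 0 and 1. The `WakeGate` class, the 0.5 threshold, and the frame counts are hypothetical and only illustrate the control flow; a real detector such as openWakeWord or Porcupine would supply the score.

```python
from dataclasses import dataclass


@dataclass
class WakeGate:
    """Forwards audio frames only while the device is awake (hypothetical helper)."""
    threshold: float = 0.5   # assumed score cutoff, tune per detector
    awake_frames: int = 50   # assumed listening window after a wake word
    remaining: int = 0       # frames left before going back to sleep

    def step(self, wake_score: float) -> bool:
        """Return True if the current frame should be sent to the endpoint."""
        if wake_score >= self.threshold:
            # Wake word heard: (re)open the listening window.
            self.remaining = self.awake_frames
            return False  # the wake-word frame itself is not forwarded
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False  # asleep: nothing leaves the device


# Made-up scores: the second frame contains the wake word.
gate = WakeGate(threshold=0.5, awake_frames=3)
scores = [0.1, 0.9, 0.2, 0.1, 0.0, 0.3, 0.2]
forwarded = [gate.step(s) for s in scores]
```

  With these made-up scores, only the three frames after the wake word are forwarded. The transitions between asleep and awake are also where a snoozing or perk-up animation could be triggered.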
  - datameta 3 days ago
    This also saves energy to the point of enabling this device to be wireless.
  - surfandshow 3 days ago
    [dead]
- justusthane 3 days ago
  > the delay for the GPT to process a response is very unnerving
  I'm not sure I agree. The way the tentacle stops moving and shoots upright when you start talking to it gives me the intuitive impression that it's paying attention and thinking. Pretty cute!
  - dylan604 3 days ago
    it's the "thinking" frozen state while it uploads and waits for a GPT response that is unnerving. if the eyes did something to indicate progress is being made, then it would remove the desire to ask it if it is working or something. the last thing I want to be is that PM asking for a status update, but some indication it was actually processing the request would be ideal. even if there was a new animation with the tail like having it spinning or twirling like the ubiquitous spinner to show that something is happening
    the snap to attention is a good example of it showing you feedback. the frozen state makes me wonder if it is doing anything or not
    - lsaferite 3 days ago
      Back when Anki (the robotics company) was building Cosmo, a *lot* of thought was put into making it expressive about everything that was going on. It really did a good job of making it feel "alive" for lack of a better word.
- phh 3 days ago
  Kyutai's unmute has great latency, but requires a fast small-ish, non-thinking, non-tooled LLM. What I'm currently working on is merging both worlds. Take the small LLM for instant response, which will basically just be able to repeat what you said, to show it understood. And have a big LLM do stuff in the background, and feeding back infos to the small LLM to explain intermediary steps.
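  A rough sketch of that two-tier pattern, assuming both models sit behind async calls. `small_llm_ack` and `big_llm_answer` are hypothetical stand-ins rather than any real API, and the sleep only simulates the slower model's latency.

```python
import asyncio


async def small_llm_ack(text: str) -> str:
    # Stand-in for a fast on-device model: just echoes the request back.
    return f"Got it: {text}"


async def big_llm_answer(text: str) -> str:
    # Stand-in for a slow, tool-using model; the sleep simulates latency.
    await asyncio.sleep(0.05)
    return f"Full answer to: {text}"


async def respond(text: str) -> list[str]:
    """Collect utterances in the order they would be spoken."""
    spoken: list[str] = []
    # Start the slow model first so it works while the fast one replies.
    slow = asyncio.create_task(big_llm_answer(text))
    spoken.append(await small_llm_ack(text))
    spoken.append(await slow)
    return spoken


utterances = asyncio.run(respond("what's the weather?"))
```

  The acknowledgement is spoken immediately while the larger model is still running. Feeding intermediate steps back to the small model, as described above, is left out of this sketch.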
  - endymion-light 3 days ago
    This is the key aspect for future development of models - small instant reasoning, ideally on device that funnels through to a larger model for reasoning.
- tetha 3 days ago
  It clearly needs eyebrows like Johnny 5.
  https://www.youtube.com/watch?v=l0zmCUVB0Yw
- nebulous1 3 days ago
  > "ah, you hesitated" no more so than on every single other question.
  It was longer. I think almost twice as long. Took about 2 seconds to respond generally, 4 seconds for that one.
- micromacrofoot 3 days ago
  beyond the prototyping phase, which hosted models make very easy, there's little reason this couldn't use a very small optimized model on device... it would be significantly faster/safer in an end product (but significantly less flexible for prototyping)
SequoiaHope 3 days ago
This is adorable! I did some research on tentacle robots last year. The official term is “continuum robots” and there’s actually a great deal of research into their development due to their usefulness in medical robotics. This lecture is a great overview for the curious: https://youtu.be/4ktr10H04ak
typs 3 days ago
This is so sick. I agree that it’s a little lame that we have all these AI capabilities right now, robotics improving, and all we can think of making is humanoid robots. Like I want a spider/squid hybrid robot running around my house
- tsunamifury 3 days ago
  We are looking to make robotics most compatible with a humanoid world.
  That being said he makes some points that alternate limb types could be interesting as well
- mrcwinn 3 days ago
  All this concern about AI safety, and this nice person wants a spider-squid hybrid robot running around!
  - pixl97 3 days ago
    The Matrix should have been a warning, not a manual.
dvngnt_ 3 days ago
I've seen enough media from Japan to know where this is heading
- linsomniac 3 days ago
  Thankfully it has a flared base for safety.
- bravesoul2 3 days ago
  Same comments were made when the spine mechanism was posted to reddit a while back.
- hoseja 3 days ago
  I am INCREDIBLY baffled it's not there already.
sparrish 3 days ago
Hell no! I seen this movie and I don't want any face-hugger sitting on my desk.
- ceejayoz 3 days ago
  Hentai enthusiasts, on the other hand...
  - dylan604 3 days ago
    Hey, what are you watching?
    I swear it's work related. You should see the other training data I had to use
  - 0xEF 3 days ago
    I was about to say, I think we all know where this is going...
- sexy_seedbox 3 days ago
  But if its tentacle was longer and you can program it to harass your coworkers, then it could be fun!
tsunamifury 3 days ago
I've been wanting to do this with a basic stuffed animal now for a while.
Just basic interactions with a child plus lessons and a voice would be game changing for the toy world.
- efreak 1 day ago
  > "Teddy," he said, "I'm going to pull up flowers from the flower bed.” "No Davy . . . pulling up flowers is naughty . . . don't pull up the flowers.” The little voice squeaked and the arms waved.
  > "Teddy, I'm going to break a window.” "No, Davy . . . breaking windows is naughty . . . don't break any windows . . .” "Teddy, I'm going to kill a man.” Silence, just silence. Even the eyes and the arms were still.
  > The roar of the gun broke the silence and blew a ruin of gears, wires and bent metal from the back of the destroyed teddy bear.
  > "Teddy . . . oh, teddy . . . you should have told me," David said and dropped the gun and at last was crying.
- haiku2077 3 days ago
  If you like point and click adventures check out https://store.steampowered.com/app/1426010/STASIS_BONE_TOTEM... - one of the playable characters is an AI teddy bear and is a great character with fantastic writing.
  - protocolture 3 days ago
    5 minutes in: This bear is creeping me out.
    5 hours in: YOU CAN DO IT BEAR, YOU CAN SAVE EVERYONE, ITS WHAT SHE WOULD HAVE WANTED.
- ceejayoz 3 days ago
  Like using phones as babysitters, just 100x worse.
  I don't doubt someone's gonna invent it, but yikes. Imagine telling kiddo their beloved sentient toy is dead because mum and dad can't afford the ever-rising subscription fees anymore.
  - floren 3 days ago
    A teddy bear is too bulky for convenience. How about Tamagotchi but it talks to you. Talkagotchi. Basically that horrible Friend necklace but in a cutely-colored egg shape that clips to your backpack. I want to not be alive.
    edit: when my kid asks for one I'll know it's time to move the family to a cabin deep in the woods.
  - mattigames 3 days ago
    "Who was your best friend in your childhood?" "The AI teddy bear, definitely, I remember every single ad he would tell me, then I would nag my mom to buy me those toys, good times"
    - ceejayoz 3 days ago
      "But then my dad lost his job so we had to kill him to save money. Sometimes I still snuggle his corpse."
      - coolcoder613 3 days ago
        Careful with that ambiguity...
zhyder 3 days ago
Beautiful work! I appreciate how this robot clearly does NOT try to look like any natural creature. I don't want a future where we can't easily distinguish nature from robotics. So far humanoid robots look clearly robotic too: hope that trend continues.
- dotancohen 3 days ago
  I feel the same about photorealistic renderings. We really need to be clear about what are photographs and what are renderings today. As the renderings get closer to photographs, and with e.g. Starship the actual photographs and videos are of events that until recently were science fiction.
  I know that bad actors will poison the pot, but in general I'd love to see images labelled "AI", "Drawing", "Content Edited", "Colours Adjusted" where appropriate. Cropping is fine.
  I'm enthralled about robotics and generative techniques. But let's not quickly confuse them with nature. Not yet.
dunefox 3 days ago
A Lovecraft reference, nice. I'm wondering whether a smaller model would suffice as well.
- zkms 3 days ago
  https://knowyourmeme.com/memes/shoggoth-with-smiley-face-art... https://www.nytimes.com/2023/05/30/technology/shoggoth-meme-...
- troyvit 3 days ago
  Yeah I came here to say the same thing. It seems like it would simplify things. They do say:
  "I initially considered training a single end-to-end VLA model. [...] A cable-driven soft robot is different: the same tip position can correspond to many cable length combinations. This unpredictability makes demonstration-based approaches difficult to scale.[...] Instead, I went with a cascaded design: specialized vision feeding lightweight controllers, leaving room to expand into more advanced learned behaviors later."
  I still think circling back to smaller models would be awesome. With some upgrades you might get a locally hosted model on there, but I'd be sure to keep that inside a pentagram so it doesn't summon a Great One.
  - joshuabaker2 3 days ago
    I was surprised it pinged gpt-4o. I was expecting it to use something like https://github.com/apple/ml-fastvlm (obviously cost may have been a factor there), but I can see how the direction he chose would make it more capable of doing more complex behaviours in the future w.r.t adding additional tentacles for movement and so on.
huevosabio 3 days ago
This is so cool! I love the idea of adding expressivity to non verbal, non human entities.
- accrual 3 days ago
  Agreed! I think the Pixar lamp is a great starting point. Having the robot be able to flex and bend, shake yes/no, look curious or upset, and perhaps even let it control LEDs to express itself.
  - weikju 3 days ago
    I’ve seen this from some Apple research lab recently…
    https://www.youtube.com/watch?v=g3jgCxnlbFY
    - lmz 3 days ago
      That is the lamp being referenced in the article.
      - weikju 2 days ago
        That’s what I get for going straight to comments…
dcre 3 days ago
Great video of SpiRobs, the inspiration: https://www.youtube.com/watch?v=2GFyFmMm9-A
regularfry 3 days ago
I seem to remember that the SpiRobs paper behind the (extremely neat) tentacle mechanism indicated that they were going for a patent.
- ethan_smith 3 days ago
  The SpiRobs team did file a patent (US20210170594A1) for their pneumatic continuum robots in 2019, which was published in 2021 but appears to still be pending approval.
- lukeinator42 3 days ago
  If it's described in a paper doesn't that make it prior art though?
  - blamestross 3 days ago
    Not if it is the authors of the paper filing for the patent. Otherwise people would never publish papers.
    - jameshart 3 days ago
      Patents are intended to be the form of first public disclosure of an idea. Disclosing it before patenting it can prevent the patent application being valid.
      US has a 1 year grace period. In most countries, any public disclosure makes an idea unpatentable.
      https://outlierpatentattorneys.com/patent-public-disclosure
    - varispeed 3 days ago
      This always grinds my gears. For some people "discoveries" are so obvious, they don't bother writing a paper let alone patenting it. Then someone goes and patents it...
      - dotancohen 3 days ago
        Examples?
          - varispeed 2 days ago
            https://ppubs.uspto.gov/pubwebapp/static/pages/ppubsbasic.ht...
            Have a look. For instance in the search box enter "categories".
            - dotancohen 2 days ago
              > Search results > Results for query "(Categories).pn." > No records found
              - varispeed 2 days ago
                I am sorry for not being precise. You should enter "Categories" in "For" input. (in the "Or" section)
vanderZwan 2 days ago
I feel obliged to link to my favorite (but Very Unwise) maker project I have ever seen:
https://www.youtube.com/watch?v=pQ2dI_B_Ycg
alex_suzuki 3 days ago
Wow, really setting the bar here for personal projects!
KaoruAoiShiho 3 days ago
Time to live out my dreams of that guy from spiderman.
- krunck 3 days ago
  That would be Doctor Octopus. Yes I would love A wearable suit with a number of tentacles for locomotion and subduing... I mean interacting.. with people.
ge96 3 days ago
Get 4, Doc Oc
Also was thinking of Oogie Boogie Tim Burton
poulpy123 2 days ago
Just waiting for the Azathoth version
therealbilliam 3 days ago
I am both super impressed and creeped out
kayhantolga 2 days ago
seems like it missed a chance to become scorpion
jeisc 3 days ago
scary looking probe: a torture tool for aliens?
mparramon 3 days ago
That's how Steve Irwin died.
insane_dreamer 3 days ago
now we know how those spiders in Minority Report originate
AstralStorm 2 days ago
That's exactly what we want, a lamp that's just as annoying as an orange cat. /s
rob_c 3 days ago
[flagged]
micromacrofoot 3 days ago
oh no I just saw a future where LLMs are the new wifi and touchscreens in appliances, we're going to let my refrigerator cry aren't we
AtlasBarfed 3 days ago
Optimus robots can do anything without actually indians?