Why would you bloat the (already crowded) context window with 27 tools instead of the 2 simplest ones: Save Memory & Search Memory? Or even just search, handling the save process through a listener on a directory of markdown memory files that Claude Code can natively edit?
Yep. Most MCP servers are best repackaged as CLI tools, with the LLM told to invoke `toolname --help` to learn how to use them at runtime. The primary downside is that the LLM will have a lower proclivity to invoke those tools unless explicitly reminded to.
If you do like rolling your own MCP servers, I've had great success recently refactoring a bunch of my own to consume fewer tokens by, instead of creating many different tools, consolidating the tools and passing through different arguments.
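Roughly the shape of that consolidation, as a hedged sketch: one tool with an `action` argument instead of a dozen near-duplicate tools. It assumes the MCP TypeScript SDK's `McpServer.tool` registration; the tool name, actions, and in-memory store are made up for illustration, not any particular server's real interface.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "memory", version: "0.1.0" });
const store = new Map<string, string>(); // stand-in for a real backing store

// One tool with an `action` argument instead of save_memory / search_memory / delete_memory etc.
server.tool(
  "memory",
  {
    action: z.enum(["save", "search", "delete"]),
    key: z.string().optional(),
    text: z.string().optional(),
  },
  async ({ action, key, text }) => {
    let result: string;
    if (action === "save" && key && text) {
      store.set(key, text);
      result = `saved ${key}`;
    } else if (action === "search" && text) {
      result =
        [...store.entries()]
          .filter(([, v]) => v.includes(text))
          .map(([k, v]) => `${k}: ${v}`)
          .join("\n") || "no matches";
    } else if (action === "delete" && key) {
      store.delete(key);
      result = `deleted ${key}`;
    } else {
      result = "missing arguments for this action";
    }
    return { content: [{ type: "text", text: result }] };
  }
);

// ESM entrypoint: expose the single consolidated tool over stdio.
await server.connect(new StdioServerTransport());
```

From Claude's side that's one schema in the context window instead of many near-identical ones.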
People are just ricing out AI like they rice out Linux, nvim or any other thing. It's pretty simple to get results from the tech. Use the CLI and know what you're doing.
Maintain a good agents.md with notes on the code grammar/structure/architecture conventions your org uses, then for each problem, prompt it step by step, as if you were narrating a junior engineer's monologue.
e.g. as I am dropped into a new codebase:
1. Ask Claude to find the section of code that controls X
2. Take a look manually
3. Ask it to explain the chain of events
4. Ask it to implement change Y, in order to modify X to do behavior we want
5. Ask it about any implementation details you don't understand, or want clarification on -- it usually self-edits well.
6. You can ask it to add comments, tests, etc., at this point, and it should run tests to confirm everything works as expected.
7. Manually step through tests, then code, to sanity check (it can easily have errors in both).
8. Review its diff to satisfaction.
9. Ask it to review its own diff as if it was a senior engineer.
This is the method I've been using as I onboard during week 1 in a new codebase. If the codebase is massive and READMEs are weak, AI copilot tools can cut overall PR time by 2-3x.
I imagine the overall benefit dips as developer familiarity increases. From my observation, it's especially great for automating code-finding and logic tracing, which often involve a bunch of context-switching and open windows; human developers often struggle with this more than LLMs. It's also great for creating scaffolding/project structure. It's weak overall at debugging complex issues and less-documented public API logic, and often has junior-level failures.
Great walkthrough, I might send your comment to my coworkers. I use AI to write pretty much 100% of my code and my process looks similar. For writing code, you really want to step through each edit one by one and course-correct it as you go. A lot of times it's obvious when it's taking a suboptimal approach and it's much easier to correct before the wrong thing is written. Plus it's easier to control this way than trying to overengineer rules files to get it to do exactly what you want. The "I'm running 10 autonomous agents at once" stuff is a complete joke unless you are a solo dev just trying to crap something working out.
I use Sonnet 4.5 exclusively for this right now. Codex is great if you have some kind of high-context tricky logic to think through. If Sonnet 4.5 gets stuck I like to have it write a prompt for Codex. But Codex is not a good daily driver.
As usual with people describing their AI workflows, I'm amazed at how complicated their whole process is and how much hand-holding it needs. Sounds like you're spending the time you would otherwise spend on the task struggling with AI tools instead.
That's a great point. The reality is that context, at least from personal experience, is brittle and over time will start to lose precision. This is an always-there, persistent way for Claude to access "memories". I've been running with it for about a week now and did not feel that the context got bloated.
Yes, exactly this. But idiot VC funding (which YC is also somewhat engaged in I imagine) cries for MCP. Hence multi billion valuations and many million dollar salaries and bonuses being thrown around.
It's ridiculous and ties into the overall state of the world tbh. Pretty much given up hoping that we'll become an enlightened species.
So let's enjoy our stupid MCP and stupid disposable plastic because I don't see any way that we aren't gonna cook ourselves to extinction on this planet. :)
While I totally agree with you, I also can see a world where we just throw a ton of calls in the MCP and then wrap it in a subagent that has a short description listing every verb it has access to.
Absolutely. Remember these are just tools; how each one of us uses them is a different story. A lot can also be leveraged by adding a couple of lines to CLAUDE.md on how he should use this memory solution, or not; it's totally up to you.
You can also have a subagent responsible for project management that is in charge of managing memory, or have a coordinator. Again, a lot of testing needs to be done :)
Memory features are useful for the same reason that a human would use a database instead of a large .md file: it's more efficient to query for something and get exactly what you want than it is to read through a large, ultimately less structured document.
That said, Claude now has a native memory feature as of the 2.0 release recently: https://docs.claude.com/en/docs/claude-code/memory so the parent's tool may be too late, unless it offers some kind of advantage over that. I don't know how to make that comparison, personally.
The answer seems to be both yes and no: see their announcement on youtube yesterday: https://www.youtube.com/watch?v=Yct0MvNtdfU&t=181s
The other point here: I wanted something more in line with an LLM's natural language, something that can be queried more efficiently by just using normal language, almost like the way we think normally: we first have a thought and then we go through our memory archive.
So hilariously, I hadn't actually read those docs yet, I just knew they added the feature. It seems like the docs may not be up to date, as when I read them in response to your reply here, I was like "wait, I thought it was more sophisticated than that!"
It's still ultimately file-based, but it can create non-Claude.md files in a directory it treats more specially. So it's less sophisticated than I expected, but more sophisticated than the previous "add this to claude.md" feature they've had for a while.
Thanks for the nudge to take the time to actually dig into the details :)
Okay so, now that I've had time after work to play with it... it doesn't work like in the video! The video shows /memories, but it's /memory, and when I run the command, it seems to be listing out the various CLAUDE.md files, and just gives you a convenient way to edit them.
I wonder if the feature got cut for scope, if I'm not in some sort of beta of a better feature, or what.
If you look at the changelog[0] for 2.0, it doesn't mention any memory features. I also find it strange that they released this as 2.0 without any actual new Claude Code features other than /rewind, and I'm not sure what that's for, since we already have version control.
[0]: https://github.com/anthropics/claude-code/blob/main/CHANGELO...
How disappointing!
This is very much in development and I keep adding features to it. Any suggestions let me know.
The way I use it, I add instructions to CLAUDE.md on how I want him to use recall, and when.
## Using Recall Memory Efficiently
*IMPORTANT: Be selective with memory storage to avoid context bloat.*
### When to Store Memories
- Store HIGH-LEVEL decisions, not implementation details
- Store PROJECT PREFERENCES (coding style, architecture patterns, tech stack)
- Store CRITICAL CONSTRAINTS (API limits, business rules, security requirements)
- Store LEARNED PATTERNS from bugs/solutions
### When NOT to Store
- Don't store code snippets (put those in files)
- Don't store obvious facts or general knowledge
- Don't store temporary context (only current session needs)
- Don't duplicate what's already in documentation
### Memory Best Practices
- Keep memories CONCISE (1-2 sentences ideal)
- Use TAGS for easy filtering
- Mark truly critical things with importance 8-10
- Let old, less relevant memories decay naturally
### Examples
GOOD: "API rate limit is 1000 req/min, prefer caching for frequently accessed data"
BAD: "Here's the entire implementation of our caching layer: [50 lines of code]"
GOOD: "Team prefers Tailwind CSS over styled-components for consistency"
BAD: "Tailwind is a utility-first CSS framework that..."
*Remember: Recall is for HIGH-SIGNAL context, not a code repository.*
1. Claude Desktop's built-in `/memory` command (what you tried) - just lists CLAUDE.md files
2. Recall MCP server (this project) - completely separate tool you need to install
Recall doesn't work through slash commands. It's an MCP server that needs setup:
1. Install: npm install -g @joseairosa/recall
2. Add to claude_desktop_config.json
3. Restart Claude Desktop
4. Then Claude can use memory tools automatically in conversation
Quick test after setup: "Remember: I prefer TypeScript" - Claude will store it in Redis.
I still do, but having this allows for strategies like memory decay for older information. It also allows for much more structured search capabilities, instead of opening files, which are less structured.
.md files work great for small projects. But they hit limits:
1. Size - 100KB context.md won't fit in the window
2. No search - Claude reads the whole file every time
3. Manual - You decide what to save, not Claude
4. Static - Doesn't evolve or learn
Recall fixes this:
- Semantic search finds relevant memories only
- Auto-captures context during conversations
- Handles 10k+ memories, retrieves top 5
- Works across multiple projects
Real example: I have 2000 memories. That's 200KB in .md form. Recall retrieves 5 relevant ones = 2KB.
And of course, there's always the option to use both: .md for docs, Recall for dynamic learning.
Does that help?
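For anyone wondering what "retrieves top 5" means mechanically, here's a minimal sketch of that kind of top-k retrieval. This is not Recall's actual code; `Embedder` is a stand-in for whatever embedding model you call.

```typescript
// Minimal sketch of "store thousands, inject only the top-k" (not Recall's actual implementation).
type Embedder = (text: string) => Promise<number[]>; // e.g. an API call to an embedding model
type Memory = { text: string; embedding: number[] };

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// 2000 stored memories might be ~200KB of text; only the k best (a couple of KB) reach the prompt.
export async function recallTopK(query: string, store: Memory[], embed: Embedder, k = 5): Promise<string[]> {
  const q = await embed(query);
  return store
    .map(m => ({ m, score: cosine(q, m.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(x => x.m.text);
}
```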
I'm not sure. You don't use a single context.md file; you use multiple and add them to context when relevant. AIs adjust these as you need, so they do "evolve". So what you're trying to achieve is already solved.
These two videos on using Claude well explain what I mean:
1. Claude Code best practices: https://youtu.be/gv0WHhKelSE
2. Claude Code with Playwright MCP and subagents: https://youtu.be/xOO8Wt_i72s
Yeah that's a solid workflow and honestly simpler than what I built - I think Recall makes sense when you hit the scale where managing multiple .md files becomes tedious (like 50+ conversations across 10 projects), but you're right that for most people your approach works great and is way less complex.
Can't you get recency just from git blame? Editors already show you each source line's last-touch age, even in READMEs, and even though this can get obfuscated (by reformatters, file moves, etc.) it's still a decent indicator.
I built a memory tool about 6 months ago while playing with MCP; it was based on a SQLite db. My experience then was that Claude wasn't very good at using the tools. Even with instructions to be proactive about searching memory and saving new memories, it would rarely do so. Once you did press it to be sure to save memories, it would go overboard, basically saving every message in the conversation as a memory. Are you seeing more success in getting natural and seamless usage of the memory tools?
IIRC at the time I was testing with Sonnet 3.7; I haven't tried it on the newer models.
Repo here: https://github.com/mbcrawfo/KnowledgeBaseServer
It is really weird how some sessions with Claude are better than others despite similar tasks. I'm certain it's not sleep deprivation or something else. Sometimes it gets on a hot streak by accidentally discovering the right tools to use. It's like an unstable solder joint or something. It's very difficult to guide it, and when you do, it overfits hard.
Yeah, this is what I do: you want the knowledge in md files, but you don't want to stuff up the context with everything you know every time. I may be wrong here, but my impression is that the split between "context" (special and very limited in size) and "things the LLM is trained on" is still an unsolved problem for getting AI to act like an "assistant", AFAICT.
A great hack/shortcut for solving this "memory" problem is to have a rolling RAG KB. You don't fill up the context, and you can use a re-ranking model to further improve accuracy.
Aside from all that, using npm for distribution makes this a total non-starter for me.
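A sketch of the rolling RAG KB with a re-ranking pass mentioned a couple of comments up: over-fetch cheap candidates from the vector store, then let a slower, more accurate re-ranker pick the final few. The two function types are stand-ins you would wire to your own store and re-ranking model.

```typescript
// Retrieve-then-rerank: over-fetch with cheap embedding search, keep only what the re-ranker likes.
type VectorSearch = (query: string, n: number) => Promise<string[]>; // fast embedding lookup
type Reranker = (query: string, doc: string) => Promise<number>;     // e.g. a cross-encoder score

export async function retrieveContext(
  query: string,
  search: VectorSearch,
  rerank: Reranker,
  finalK = 5
): Promise<string[]> {
  const candidates = await search(query, 50); // over-fetch; recall matters more than precision here
  const scored = await Promise.all(
    candidates.map(async doc => ({ doc, score: await rerank(query, doc) }))
  );
  // Only the few best re-ranked chunks reach the context window.
  return scored.sort((a, b) => b.score - a.score).slice(0, finalK).map(s => s.doc);
}
```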
I think everyone has concluded at this point that we need to improve models' memory capabilities, but different people take different approaches.
My experience is that ChatGPT can engage in very thoughtful conversations, but if I ask for a summary it makes something very generic, useful to an outsider, that does not catch the salient points which were the most important outcomes.
Did you notice the same problem?
The memory feature I'd like to have would need built-in support from anthropic
It'd be essentially
1. Language server support for lookups & keeping track of the code
2. Being able to "pin" memories to functions, classes, properties, etc. via that language server support, providing this context whenever changes are made in that function/class/property, but not keeping it, so all following changes outside of that will no longer include this context (basically, changes that touch code with pinned memories are done by agents with the additional context, and only the results are synced back, not the way they were achieved)
3. Provide an IDE integration for this context so you can easily keep track of what's available just by moving the cursor to the point the memory is pinned at
Sadly impossible to achieve via MCP.
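A rough sketch of what point 2 could look like: notes keyed by file and symbol, injected only for edits that touch that symbol, then dropped again. The `SymbolDiff` function and the example pin are hypothetical; in practice that lookup would come from a language server or AST diff.

```typescript
// Memories pinned to symbols: injected only for edits that touch that symbol, then dropped.
type SymbolRef = { file: string; symbol: string };
type PinnedMemory = { ref: SymbolRef; note: string };
type SymbolDiff = (diff: string) => SymbolRef[]; // would come from the language server / AST diff

const pins: PinnedMemory[] = [
  // example pin -- file, symbol, and note are made up for illustration
  { ref: { file: "src/auth.ts", symbol: "refreshToken" }, note: "Must stay idempotent; the client retries it." },
];

// Extra context handed to the agent making this edit, and only this edit.
export function contextForEdit(diff: string, symbolsTouchedBy: SymbolDiff): string[] {
  const touched = symbolsTouchedBy(diff);
  return pins
    .filter(p => touched.some(t => t.file === p.ref.file && t.symbol === p.ref.symbol))
    .map(p => `[pinned to ${p.ref.file}#${p.ref.symbol}] ${p.note}`);
}
```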
I'm surprised Anthropic doesn't offer something like this server-side, with an API to control it. It seems like it'd be a lot more efficient than having the client manually rework the context and upload the whole thing.
Imagine having 20 years of context / memories and relying on them. Wouldn't you want to own that? I can't imagine pay-per-query for my real memories and I think that allowing that for AI assisted memory is a mistake. A person's lifetime context will be irreplaceable if high quality interfaces / tools let us find and load context from any conversation / session we've ever had with an LLM.
On the flip side of that, something like a software project should own the context of every conversation / session used during development, right? Ideally, both parties get a copy of the context. I get a copy for my personal "lifetime context" and the project or business gets a copy for the project. However, I can't imagine businesses agreeing to that.
If LLMs become a useful tool for assisting memory recall there's going to be fighting over who owns the context / memories and I worry that normal people will lose out to businesses. Imagine changing jobs and they wipe a bunch of your memory before you leave.
We may even see LLM context ownership rules in employment agreements. It'll be the future version of a non-compete.
Whoever is paying for it? If you've got personal stuff you'd keep it in your own account (or maintain it independently), separate from your work account.
Imho you would have an easier sell if you separated knowledge into tiers: 1) overall design, 2) coding standards, 3) reasoning that led to the design, 4) components and their individual structure, 5) your current issue, 6) etc.
Your project becomes progressively more valuable the further you go down the list. The overall design should be documented and curated to onboard new hires. Documenting current issues is a waste of time compared to capturing live discussion, so Recall is super useful here.
> This will reduce token size, performance & operational costs.
How? The models aren't trained on compressed text tokens nor could they be if I understand it correctly. The models would have to uncompress before running the raw text through the model.
That is what I am looking for: a) LLMs trained using compressed text tokens and b) using compressed prompts. Don't know how... but that is what I was hoping for.
The whole point of embeddings and tokens is that they are a compressed version of text, a lower dimensionality. Now, how low depends on performance: smaller vectors = more lossy (usually). https://huggingface.co/spaces/mteb/leaderboard
You can train your own with very, very compressed tokens; I mean you could even go down to each token = just 2 float numbers. It will train, but it will be terrible, because it can essentially only capture distance.
Prompting a good LLM to summarize the context is, funnily enough, probably the best way of actually "compressing" context.
The problem is you need to prompt Claude to "Store" or "Remember"; if you don't, it will never call the MCP server. Ideally, Claude would have some mechanism to store memories without any explicit prompting, but I don't think that's currently possible today.
I've been experimenting with that over the last couple of days. I added to CLAUDE.md a directive on how and when to use recall and he's automatically calling the tool to store and fetch.
imo it would be better to carry the whole memory outside of the inference time where you could use an LLM as a judge to track the output of the chat and the prompts submitted
it would sort of work like grammarly itself and you can use it to metaprompt
i find all the memory tooling, even native ones on claude and chatgpt to be too intrusive
I've been building exactly this. Currently a beta feature in my existing product. Can I reach out to you for your feedback on metaprompting/grammarly aspect of it?
Totally get what you're saying! Having Claude manually call memory tools mid-conversation does feel intrusive, I agree with that, especially since you need to keep saying Yes to the tool access.
Your approach is actually really interesting, like a background process watching the conversation and deciding what's worth remembering. More passive, less in-your-face.
I thought about this too. The tradeoff I made:
Your approach (judge/watcher):
- Pro: Zero interruption to conversation flow
- Pro: Can use cheaper model for the judge
- Con: Claude doesn't know what's in memory when responding
- Con: Memory happens after the fact
Tool-based (current Recall):
- Pro: Claude actively uses memory while thinking
- Pro: Can retrieve relevant context mid-response
- Con: Yeah, it's intrusive sometimes
Honestly both have merit. You could even do both, background judge for auto-capture, tools when Claude needs to look something up.
The Grammarly analogy is spot on. Passive monitoring vs active participation.
Have you built something with the judge pattern? I'd be curious how well it works for deciding what's memorable vs noise.
Maybe Recall needs a "passive mode" option where it just watches and suggests memories instead of Claude actively storing them. That's a cool idea.
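If anyone wants to experiment with the judge pattern, here's a bare-bones sketch: a watcher that runs outside the conversation, scores each exchange with a cheap model, and stores only what clears a threshold. The judge call, the store, the prompt wording, and the threshold are all placeholders, not an existing integration.

```typescript
// Background judge: scores each exchange after the fact, stores only high-signal ones.
type JudgeCall = (prompt: string) => Promise<string>;                  // a cheap model
type SaveMemory = (text: string, importance: number) => Promise<void>; // whatever store you use

export async function watchExchange(
  userMsg: string,
  assistantMsg: string,
  judge: JudgeCall,
  save: SaveMemory
): Promise<void> {
  const verdict = await judge(
    `Rate 0-10 how worth remembering this exchange is for future sessions ` +
      `(decisions, constraints, preferences; not code or chit-chat). Reply as "<score>: <one-line summary>".\n\n` +
      `User: ${userMsg}\nAssistant: ${assistantMsg}`
  );
  const [scoreStr, ...rest] = verdict.split(":");
  const score = Number(scoreStr.trim());
  if (!Number.isNaN(score) && score >= 7) {
    await save(rest.join(":").trim(), score); // the main conversation is never interrupted
  }
}
```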
OpenCog differentiates between Experiential and Episodic memory; and various processes rewrite a hypergraph stored in RAM in AtomSpace. I don't remember how the STM/LTM limit is handled in OpenCog.
So the MRU/MFU knapsack problem, and more predictable primacy/recency bias because of context length limits and context compaction?
> Economic Attention Allocation (ECAN) was an OpenCog subsystem intended to control attentional focus during reasoning. The idea was to allocate attention as a scarce resource (thus, "economic") which would then be used to "fund" some specific train of thought. This system is no longer maintained; it is one of the OpenCog Fossils.
(Smart contracts require funds to execute (redundantly and with consensus), and there, resources are scarce.)
Now there's ProxyNode and there are StorageNode implementations, but Agent is not yet reimplemented in OpenCog?
ProxyNode implementers: ReadThruProxy, WriteThruProxy, SequentialReadProxy, ReadWriteProxy, CachingProxy
StorageNode > Implementations: https://wiki.opencog.org/w/StorageNode#Implementations
Memory is hard! I'm very curious how the version history approach is working for you?
Have you considered age when retrieving? Is the model supposed to manage the version history on its own?
Is the semantic search used to help with that?
Yeah, it still uses context, but way more efficiently: instead of injecting a 50KB context.md every time, Recall searches 10k memories and only injects the top 5 relevant ones (maybe 2KB), so you can store way more total knowledge.
Yeah, people do that, but it doesn't scale: after a while your "restart prompt" is 50KB and won't fit, plus you're stuck copying stuff manually instead of just asking "what did we say about Redis" and getting the relevant bits automatically.
I think that's a great point. I will experiment with different approaches. I started with Redis mostly because it's something I have experience with and it was a quick setup win, but having different strategies could make sense, I think.
Absolutely! But this is not a replacement of those files, this is a different (better?) way to navigate through those learnings instead of having to read whole files.
I've been using it for a while now, personally. I've found that I have fewer issues with context, I can easily recall (pun intended) after a context compact, etc.
That's a great point! And it also works really well for shared context between Claude instances. For example, we use that for our business model in the company: all business rules and the model are stored as memories in a central Redis that the MCP connects to. Memories are stored as specific to a folder or as global (similar to the CLAUDE.md home directory), but with this approach you can have an external Redis where multiple Claudes read and write, as a shared, almost hive-like memory.
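Not Recall's actual schema, but a sketch of what folder-scoped vs global keys in a shared Redis could look like with ioredis (the thread below mentions it uses HSET/SADD under the hood). The key names and example memories are illustrative.

```typescript
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// "global" memories vs memories scoped to one project folder, all in the same shared instance.
export async function storeMemory(scope: "global" | string, id: string, text: string, importance: number) {
  const ns = scope === "global" ? "memory:global" : `memory:project:${scope}`;
  // One hash per memory, plus a per-scope index set so any Claude instance can enumerate its scope.
  await redis.hset(`${ns}:${id}`, "text", text, "importance", String(importance), "createdAt", new Date().toISOString());
  await redis.sadd(`${ns}:index`, id);
}

// Every instance pointed at the same REDIS_URL reads and writes the same "hive" memories.
void storeMemory("global", "m1", "Invoices are always billed in EUR", 9);
void storeMemory("/home/me/shop-api", "m2", "Use the staging payment key in tests", 8);
```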
I'm not seeing how this is any different than a standard vector database MCP tool. It's not like Claude is going to know about any of the things you told it to "remember" unless you explicitly tell it to use its memory tool, as shown in the demo, to remember something you've stored.
Yep, me too. I've taken the reference memory MCP that Anthropic released and bolted on pgsql, but with a bunch of other features that are specific to the app I'm building, like user segmentation/isolation with RLS (the app is multiuser) and some other entity-relationship tracking things.
No particular reason. I was working with another project that also had Redis and decided to start with it. It can be changed to another tool; which one would you recommend?
Case in point; I'm mostly a Claude user, which has decent background process / BashOutput support to get a long-running process's stdout.
I was using codex just now, and its process support is ass.
So I asked it, give me 5 options using cli tools to implement process support. After 3 min back and forth, I got this: https://github.com/offline-ant/shellagent-tools/blob/main/ba...
Add single line in AGENTS.md.
> the `background` tool allows running programs in the background. Calling `background` outputs the help.
Now I can go "background ./server; try thing. investigate" and it has access to the stdout.
Stop pre-trashing your context with MCPs, people.
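In case it helps, here's a hypothetical TypeScript sketch of that kind of `background` helper: spawn the command detached, capture its output to a log file, and let the agent read the log later. This is not the linked shellagent-tools script; the log directory, flag, and job-id scheme are made up.

```typescript
// background.ts -- hypothetical sketch: run a command in the background, read its output later.
import { spawn } from "node:child_process";
import { existsSync, mkdirSync, openSync, readFileSync } from "node:fs";
import { join } from "node:path";

const LOG_DIR = "/tmp/background-jobs"; // assumption: where job logs live
const [cmd, ...rest] = process.argv.slice(2);

if (!cmd) {
  // Calling `background` with no arguments prints the help, as the AGENTS.md line suggests.
  console.log("usage: background <command> [args...]   start command, print job id");
  console.log("       background --logs <job-id>       print captured stdout/stderr");
  process.exit(0);
}

mkdirSync(LOG_DIR, { recursive: true });

if (cmd === "--logs") {
  const logFile = join(LOG_DIR, `${rest[0] ?? ""}.log`);
  console.log(existsSync(logFile) ? readFileSync(logFile, "utf8") : "no such job");
} else {
  const id = Date.now().toString(36);
  const out = openSync(join(LOG_DIR, `${id}.log`), "a");
  // Detach so the agent's shell call returns immediately; stdout/stderr both go to the log file.
  const child = spawn(cmd, rest, { detached: true, stdio: ["ignore", out, out] });
  child.unref();
  console.log(`started job ${id} (pid ${child.pid}); read with: background --logs ${id}`);
}
```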
My God, there’s no signal. It’s all noise.
Often memory works too well and crowds out new things, so how are you balancing that?
Then it can reference those tutorials for specific things.
Interested in giving this a shot but it feels like a lot of infrastructure.
Using the VS Code extension you get dynamic context management which works really well.
They also have a memory system built using reflexion (someone please correct me if I'm wrong) so proper evals are derived from lessons before storing.
jj autocommits when the working copy changes, and you can manually stage against @-: https://news.ycombinator.com/item?id=44644820
Recall just uses basic Redis commands - HSET, SADD, ZADD, etc. Nothing fancy.
Valkey is Redis-compatible so all those commands work the same.
I haven't tested it personally but there's no reason it wouldn't work. The Redis client library (ioredis) should connect to Valkey without issues.
If you try it and hit any problems let me know! Would be good to officially support it.
AI can already form the query DSL quite nicely especially if it knows the indexes.
I set up AI powered search this way, and it works really well with any open ended questions.
how much better was this to justify all that extra complexity?