What’s In An (AI) Memory?

Digital Memory

Jun 23

How does an AI assistant remember you between sessions without storing history in the cloud?

It uses the same three steps every memory system uses: extract facts from your conversations, store them as vector embeddings in a database, then retrieve the relevant ones and inject them into the model's context before it answers. The only difference is location. A local system runs all three steps on your own hardware, so nothing leaves your network.

There is a small, specific moment that makes people look at their AI a little differently. You open a fresh chat, ask about something new, and the assistant already knows you prefer short answers, or it remembers the side project you mentioned three weeks ago. You never re-typed any of it. It just knew.

The first reaction is usually delight. The second, if you slow down, is quieter: where is that kept? Something about you is now stored somewhere, and you did not decide where. This post is about how that memory works, step by step, and the one variable that changes everything once you see the mechanism. The mechanism is not the interesting part. The location is.

What Does It Mean for an AI to Remember You?

Remembering you means three things under the hood. First, extraction: an LLM pulls out facts worth keeping, like "prefers concise answers" or "is building a Shopify store." Second, storage: each fact becomes a vector embedding in a database. Third, injection: at the start of a new session, the most relevant memories are retrieved and placed in the model's context before you type a word.

Each step sounds technical and is not. Extraction is note-taking: the model reads what was said and writes down the facts likely to matter later. These are atomic memory units, small discrete facts, not whole transcripts. Your entire chat is not the memory; a handful of distilled facts are. Storage is where the word "vector" shows up, and it is plainer than it sounds: each fact becomes an embedding, a numerical fingerprint of its meaning, filed in a vector database next to the original text. When the system wants what is relevant, it does not match keywords. It searches by meaning. Injection is the payoff: at the start of a session, that similarity search grabs the most relevant memories and slides them into the model's context window before you begin. The model knows things about you because it was just handed them.
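To make the three steps concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `extract_facts` is a crude rule standing in for the LLM, `embed` counts words where a real embedding model would capture meaning, and `MemoryStore` is a list in memory rather than a vector database. What carries over to real systems is the shape: extract, store text next to its vector, search by similarity, inject.

```python
import math
import re
from collections import Counter


def extract_facts(transcript):
    """Step 1, extraction. Stand-in for the LLM: keep user lines that start with "I "."""
    return [text for role, text in transcript if role == "user" and text.startswith("I ")]


def embed(text):
    """Toy embedding: word counts. A real embedding model captures meaning, not just words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Step 2, storage: each fact is kept as text next to its vector."""

    def __init__(self):
        self.items = []  # (fact text, embedding) pairs

    def add(self, fact):
        self.items.append((fact, embed(fact)))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]


def inject(store, first_message):
    """Step 3, injection: retrieved memories go in front of the user's message."""
    memories = store.search(first_message)
    return "Known about the user: " + "; ".join(memories) + "\n\nUser: " + first_message


transcript = [
    ("user", "I prefer concise answers"),
    ("assistant", "Noted, I will keep it short."),
    ("user", "I am building a Shopify store"),
    ("user", "What is the weather like?"),
]
store = MemoryStore()
for fact in extract_facts(transcript):
    store.add(fact)
print(inject(store, "What should I name my Shopify store?"))
```

Because the toy embedding only counts shared words, it cannot match "keep it short" to "prefers concise answers". Closing that gap is exactly what a real embedding model is for.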

Here is the part worth sitting with. This process, extract then store then retrieve and inject, is the same whether it runs on a company's servers or on a machine in your house. The architecture is not proprietary. The only real variable is whose hardware runs it, and who controls the store.

How ChatGPT's Dreaming V3 Remembers You (and Where That Lives)

On June 4, 2026, OpenAI launched Dreaming V3, a background process that reads across your entire conversation history and builds a persistent memory profile. That profile is not stored inside your chat log. It lives in a separate data layer on OpenAI's servers and is injected into the system prompt at inference time, before you type anything.

This is a real step up from the older approach. The previous "saved memories" system would explicitly tell you what it was holding onto. Dreaming V3 synthesizes implicitly, reading patterns across all your sessions without necessarily surfacing what it captured. It weighs three things: freshness (recent events count for more), continuity (how your preferences evolve), and relevance (the right memory for the moment).
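How those three signals are actually combined has not been described here, so the following is only a hypothetical sketch of how any memory system might weigh them, not OpenAI's implementation. The `Memory` fields, the 30-day half-life, and the multiply-the-scores rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    topic: str        # hypothetical grouping key, e.g. "answer_style"
    age_days: float   # time since the fact was last observed
    relevance: float  # similarity to the current request, 0.0 to 1.0


def freshness(age_days, half_life_days=30.0):
    """Freshness: a memory loses half its weight every half-life (assumed: 30 days)."""
    return 0.5 ** (age_days / half_life_days)


def latest_per_topic(memories):
    """Continuity: when a preference changes, keep only its newest version."""
    newest = {}
    for m in memories:
        if m.topic not in newest or m.age_days < newest[m.topic].age_days:
            newest[m.topic] = m
    return list(newest.values())


def rank(memories, k=2):
    """Relevance times freshness, applied to the current version of each topic."""
    current = latest_per_topic(memories)
    return sorted(current, key=lambda m: m.relevance * freshness(m.age_days), reverse=True)[:k]
```

With this rule, an old "prefers long answers" is superseded by a recent "prefers concise answers" on the same topic, and a highly relevant but stale memory can still lose to a fresher one.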

The detail that matters most is where the memory sits. It is not buried in your conversation history. It is maintained as a separate data layer on OpenAI's servers, then injected into the model at inference. That has a consequence most people miss: deleting a conversation does not delete the memories extracted from it. Those persist in the memory layer and must be cleared manually through Settings. On free and Plus plans, the conversations themselves are stored indefinitely unless you delete them too.
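The consequence is easier to see as a data model. This is a hypothetical sketch, not OpenAI's schema: the `Account` class and its method names are invented for illustration. It shows only the general pattern of two separate stores, where deleting from one does not cascade to the other.

```python
class Account:
    """Two independent stores: raw conversations and the facts extracted from them."""

    def __init__(self):
        self.conversations = {}  # conversation id -> list of messages
        self.memories = []       # (fact, source conversation id) pairs

    def save_conversation(self, conv_id, messages, extracted_facts):
        self.conversations[conv_id] = messages
        self.memories.extend((fact, conv_id) for fact in extracted_facts)

    def delete_conversation(self, conv_id):
        # Removes the transcript only; the memory layer is left untouched.
        self.conversations.pop(conv_id, None)

    def clear_memories(self):
        # The separate, explicit step needed to remove what was extracted.
        self.memories.clear()


account = Account()
account.save_conversation("c1", ["I am building a Shopify store"], ["is building a Shopify store"])
account.delete_conversation("c1")
print(account.conversations)  # the transcript is gone
print(account.memories)       # the extracted fact is still there
```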

You do get controls. You can view, edit, and delete saved memories. Temporary Chat keeps a conversation out of your history and out of training data, though it still lives on OpenAI's servers for up to 30 days before deletion. ChatGPT Enterprise offers a full opt-out from training-data use. The point is not that controls are missing. It is that the memory of you lives on infrastructure you do not operate.

What That Means for Your Privacy

A behavioral profile of you, built from your history, preferences, and stated life circumstances, is maintained on a third-party server and injected into every new conversation. You can view and edit a list of saved memories, but a synthesized profile is not the same as a simple editable list. This is not a claim of wrongdoing. It is a description of where the data sits.

A 2026 study presented at the ACM CHI Conference named the tension well, and the plain version is the one that lands: the feature you value most is also the one you cannot fully see or constrain. The editable list shows the facts the system chose to surface. But Dreaming V3 reads patterns across sessions, and those patterns may never appear in any readable form. You can edit the list. You cannot necessarily see the synthesis.

None of this implies OpenAI is doing something malicious. It is a description of what the architecture is: a profile of you, covering your work, your habits, your relationships, maintained on a server you do not run, injected into inference at scale, governed by a third party's terms of service.

The law has started to name this too. Under the EU's GDPR, automated processing that evaluates personal aspects of a person counts as profiling (Article 4(4)), so an AI system that builds a persistent behavioral profile needs a lawful basis such as consent and is subject to the right to erasure. The EU AI Act's transparency obligations for chatbot systems take effect August 2, 2026, less than two months after Dreaming V3 shipped. The direction is clear: a profile of you, held somewhere else, is something you are owed visibility into.

How Local AI Memory Works Without the Cloud

Local memory uses the identical architecture, but every component runs on your machine. A local vector database such as Chroma, Qdrant, or pgvector stores the embeddings. Local embedding tools convert text to vectors on-device with no external API call. At session start, a similarity search retrieves the relevant memories and injects them into the model. Nothing is transmitted off your network.

Walk it through the same three steps and notice nothing is missing. The vector database runs on your own hardware and holds the text plus its embeddings. The embedding step runs on-device too: a tool like Mem0 with FastEmbed converts conversation text into vectors without sending anything out. And retrieval-augmented generation, usually shortened to RAG, does the injection, pulling the relevant memories from your local store into the context window before the model responds.
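What makes this cross-session memory is that the store is a file on your own disk that outlives any one chat. The sketch below uses only the Python standard library to show that shape. It is not how Mem0, Chroma, Qdrant, or pgvector work internally: the SQLite table, the word-count `embed`, and the function names are all assumptions standing in for a real local vector database and a real on-device embedding model.

```python
import json
import math
import re
import sqlite3
from collections import Counter


def embed(text):
    """Toy on-device embedding (word counts); a real setup would call a local model."""
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))


def cosine(a, b):
    dot = sum(v * b.get(word, 0) for word, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def open_store(path):
    """Open (or create) the memory store as a plain SQLite file on local disk."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS memories (fact TEXT, vector TEXT)")
    return db


def remember(db, fact):
    """Store the fact's text next to its vector; no network call is involved."""
    db.execute("INSERT INTO memories VALUES (?, ?)", (fact, json.dumps(embed(fact))))
    db.commit()


def recall(db, query, k=2):
    """Similarity search over every stored memory, most similar first."""
    q = embed(query)
    rows = db.execute("SELECT fact, vector FROM memories").fetchall()
    ranked = sorted(rows, key=lambda row: cosine(q, json.loads(row[1])), reverse=True)
    return [fact for fact, _ in ranked[:k]]


def build_prompt(db, user_message):
    """Injection: retrieved memories are prepended before the model sees the message."""
    notes = "\n".join(f"- {fact}" for fact in recall(db, user_message))
    return f"Known facts about the user:\n{notes}\n\nUser: {user_message}"
```

A first session would call `remember` on a store opened with, say, `open_store("memories.db")`. A later session reopens the same file and calls `build_prompt`, so the facts are available again without any of them having left the machine.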

The 2026 state of the art goes further with hybrid memory, combining three kinds of recall: vector search for unstructured, intuition-style retrieval; graph traversal for structured facts and relationships; and episodic storage for temporal sequences, the "last Tuesday she mentioned the client deadline" kind of memory. Together they give a local system real depth. This is not a thought experiment. Mem0 has published a working guide for building exactly this on a local stack using Mem0, OpenClaw, and Ollama, the same kind of stack Companion Intelligence builds on. The local memory layer is not a promise; it is a documented, running pattern.
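The three kinds of recall are easier to tell apart side by side. The sketch below is a toy under stated assumptions, not Mem0's API: `HybridMemory` and its method names are hypothetical, word overlap stands in for vector search, and a dictionary stands in for a graph database.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class HybridMemory:
    notes: list = field(default_factory=list)     # free-text facts, searched by similarity
    graph: dict = field(default_factory=dict)     # (subject, relation) -> list of objects
    episodes: list = field(default_factory=list)  # (date, event) pairs

    def similar(self, query, k=1):
        """Stand-in for vector search: rank notes by words shared with the query."""
        words = set(query.lower().split())
        ranked = sorted(self.notes, key=lambda n: len(words & set(n.lower().split())), reverse=True)
        return ranked[:k]

    def related(self, subject, relation):
        """Graph traversal, one hop: follow a named relation from a subject."""
        return self.graph.get((subject, relation), [])

    def between(self, start, end):
        """Episodic recall: events whose date falls inside a window."""
        return [event for day, event in self.episodes if start <= day <= end]


memory = HybridMemory()
memory.notes += ["prefers concise answers", "is building a Shopify store"]
memory.graph[("user", "works_with")] = ["the client"]
memory.episodes.append((date(2026, 6, 16), "mentioned the client deadline"))

print(memory.similar("shopify store ideas"))
print(memory.related("user", "works_with"))
print(memory.between(date(2026, 6, 15), date(2026, 6, 21)))
```

Each query answers a different kind of question: what is this about, who is connected to whom, and what happened when.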

What Today's Local Tools Offer (the Honest Gap)

Most local tools were built for private chat, not persistent personal memory. AnythingLLM has built-in retrieval but keeps memory per workspace rather than unified. LM Studio starts every session blank unless you configure it. Jan.ai is fully offline but is not designed for cross-session memory. Mem0 solves the memory layer technically; Companion Memory solves it as a finished consumer product.

Tool	Cross-session memory	Architecture	Limitation
AnythingLLM	Yes, built-in RAG	30+ LLM providers, 9+ vector DB options	Memory is per-workspace, not unified across sessions
LM Studio + Big RAG	Manual setup required	JSON file storage plus a vector DB	No memory by default; every session starts blank
Jan.ai	Limited	Fully offline, no built-in cross-session memory	Best for session privacy, not persistent context
Mem0 (local mode)	Yes, full stack	FastEmbed on-device, Chroma/Qdrant/pgvector	Requires technical setup; not a consumer product
Companion Memory	Yes, full stack, consumer UI	Local vector storage, no cloud sync	Runs on Companion Core hardware

There is a gap worth stating plainly, because pretending it does not exist would be the wrong kind of pitch. The memory layer is a separate engineering problem from running a model locally, and not every tool has solved it. The deeper point is about quality. Without a continuous background synthesis process like Dreaming V3, local memory leans on explicit extraction: it stores what you tell it to remember, or what a simple extraction model catches in conversation. Right now, cloud memory is more sophisticated in synthesis depth. Local memory is architecturally equivalent in its core mechanism but usually asks for more intentional setup. Mem0's 2026 hybrid architecture narrows that gap meaningfully; it does not erase it out of the box.

The Real Question Is Where the Vector Store Lives

The memory question and the privacy question collapse into one: where does the vector store live? If it is on OpenAI's servers, that is their infrastructure, their governance, their terms of service. If it is on your hardware, the memory runs the same mechanism (extraction, embedding, retrieval, injection) but none of it leaves your network. That is an architectural difference, not a philosophical one.

This is what Companion Memory is built to do. It implements the full memory architecture (extraction, embedding, vector storage, and retrieval-augmented injection) entirely on Companion Core hardware. Your preferences, your history, your context, your professional patterns live on a machine in your home or office. They are not transmitted, not synthesized on a remote server, not governed by a third party's terms of service. The specific technical answer to the question this post opened with is exactly this: a local vector database and RAG, not a remote synthesis server.

To keep the comparison honest, here is the whole trade-off in one view.

Dimension	Cloud memory (ChatGPT Dreaming V3)	Local memory (Companion Memory)
Memory depth	Deep, synthesizes across years of sessions	Solid, depends on configuration
Memory quality	Higher, background synthesis is state of the art	Good, RAG plus hybrid memory narrows the gap in 2026
Data location	OpenAI's servers	Your hardware
Data governance	OpenAI terms of service	You
Audit	Partial, view and edit listed memories	Full, you own the vector store
GDPR classification	Profiling activity, consent obligations apply	Not applicable, no third-party processing
Setup complexity	Zero, automatic	Moderate for manual tools, zero with Companion Memory
Privacy posture	You trust the provider's governance	Data never leaves your network

The conclusion is the honest one. Cloud memory is currently more sophisticated in synthesis depth. Local memory is architecturally equivalent in its core mechanism, and it asks for more intentional setup, unless you run a purpose-built product like Companion Memory, in which case the setup cost goes away and the data stays home. For more on how cloud convenience quietly compounds its costs, see our post on the hidden costs of cloud convenience. On whether you can do this yourself, our post on running your own AI without the cloud covers the adoption side directly.

Once you see the mechanism clearly, the decision is not about whether an AI can remember you without the cloud. It plainly can. The decision is whose machine you want that memory to live on. If you would rather it live on yours, that is what a Companion Core, with Companion Memory underneath it, is for. If that is the version of this you have been waiting for, it is worth a look at the store.

Frequently Asked Questions

Does deleting a ChatGPT conversation delete what it remembered about me?

No. Deleting a conversation does not delete the memories that were extracted from it. Those are stored in a separate memory layer and must be cleared manually through Settings. On free and Plus plans, conversations themselves are also stored indefinitely unless you delete them.

Is local AI memory as good as ChatGPT's memory?

Not yet in synthesis depth, though the core mechanism is the same. Dreaming V3 runs a continuous background synthesis across years of conversations, which makes cloud memory more sophisticated today. Local memory matches the core mechanism (extract, store, retrieve, inject) but typically depends on more explicit extraction, unless you use a purpose-built product.

What is a vector database, in plain terms?

A vector database stores each remembered fact as both its text and a numerical fingerprint of its meaning, called an embedding. When the AI needs context, it searches by similarity, finding the facts whose meaning is closest to your current question, rather than matching exact words.
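A tiny worked example shows what searching by similarity means. The three-number vectors below are hand-written assumptions for illustration; a real embedding model produces vectors with hundreds of dimensions, and it is the model, not a person, that places similar meanings close together.

```python
import math

# Hypothetical embeddings, hand-written so that related meanings point the same way.
EMBEDDINGS = {
    "prefers concise answers": [0.9, 0.1, 0.0],
    "is building a Shopify store": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity: near 1.0 for similar directions, near 0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(query_vector):
    """Return the stored fact whose vector is closest in direction to the query."""
    return max(EMBEDDINGS, key=lambda fact: cosine(query_vector, EMBEDDINGS[fact]))


# Assumed vector for "Keep it short, please", which shares no words with either fact.
query = [0.8, 0.2, 0.1]
print(nearest(query))
```

The query and the matching fact have no words in common. They match because their vectors point in nearly the same direction.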

Can an AI remember me if it runs completely offline?

Yes. Memory does not require the cloud. A local vector database and retrieval-augmented generation run the full extract, store, and retrieve cycle on your own hardware. An internet connection is needed only for initial model or tool downloads, not for the memory itself.

Does Companion Memory send anything to the cloud?

No. Companion Memory implements the full memory architecture (extraction, embedding, vector storage, and retrieval-augmented injection) entirely on Companion Core hardware. Your preferences, history, and context stay on a machine in your home or office and are not transmitted or synthesized on a remote server.


Lex Hartman