An emoticon that shrugs (¯\_(ツ)_/¯).

May 27, 2026

Teaching AI When to Stay Silent

Have you ever been in a group chat where someone says “lol” and an AI bot immediately responds with a full paragraph? Or sent a quick “ok” to a friend and watched the assistant generate a thoughtful essay in reply?

This is the problem that inspired Shrug. AI chatbots always respond. It doesn’t matter if you sent a meme, a one-word acknowledgement, or a message clearly not directed at them. They reply anyway, because that’s what they’re trained to do: generate text. No one taught them the more fundamental skill of deciding whether to speak at all.

A Decision Gate, Not a Generator

Shrug is a reply-decision model. It doesn’t decide what to say — it decides whether to reply at all.

The idea is simple: instead of feeding every message directly into a language model and asking it to generate a response, you first run it through Shrug. Shrug looks at the conversation context and outputs a single number: the probability p that a reply is appropriate. If p is low, the bot simply stays silent.

This sounds obvious, but most chatbots don’t do this. They treat every incoming message as a prompt that demands an answer. Shrug introduces a filter, a moment of hesitation between receiving a message and deciding to speak.

Humans Are Not Threshold Machines

The most interesting design decision in Shrug is how it uses that probability. You might expect a simple rule: reply if the probability clears a fixed threshold, stay silent otherwise. But that’s not how humans work.

Think about your own behavior in conversations. Sometimes you see a message with a high reply probability and still decide not to reply. Other times something with a low probability catches your attention and you interject anyway. Humans are probabilistic, not deterministic.

Shrug embraces this. The model outputs only the probability p. It’s up to the client (your application code) to draw a random number r uniformly from [0, 1) and decide: if r < p, reply. This means:

Even at a high p, there’s still a 1 - p chance of silence
Even at a low p, there’s still a p chance of an interjection

This naturally introduces the imperfections that make human conversation feel real: hesitation, omission, and impulse.
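Here is a minimal sketch of that client-side step in Python. It is not the code from the Shrug repository: `should_reply`, `maybe_reply`, `score`, and `generate` are hypothetical names, where `score` stands in for whatever wraps the Shrug model and returns the reply probability p, and `generate` stands in for the chat model that writes the actual reply.

```python
import random
from typing import Callable, Optional, Sequence


def should_reply(p: float, rng: random.Random) -> bool:
    """Return True with probability p (a single Bernoulli draw)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    # rng.random() is uniform on [0.0, 1.0), so r < p holds with probability p.
    return rng.random() < p


def maybe_reply(
    context: Sequence[str],
    score: Callable[[Sequence[str]], float],   # hypothetical: wraps the Shrug model
    generate: Callable[[Sequence[str]], str],  # hypothetical: wraps the chat model
    rng: random.Random,
) -> Optional[str]:
    """Gate the generator: only produce text when the random draw says so."""
    p = score(context)
    if should_reply(p, rng):
        return generate(context)
    return None  # staying silent is a valid outcome
```

Passing in a `random.Random` instance keeps the behavior reproducible in tests while leaving it random in production. The generator is only called after the gate passes, so a silent decision costs one scoring call and no generation.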

Interactive demo (reply probability 0.50): each dot represents one decision. Green = replied, gray = silent.

How It Works

Under the hood, Shrug is a binary classifier built on top of a language model. I use Qwen3-8B with LoRA adapters for efficient fine-tuning, which means only a small fraction of the model’s parameters are updated during training.
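To get a feel for how small that fraction can be, here is a rough back-of-the-envelope sketch in Python. For one weight matrix of shape d_out x d_in, a rank-r LoRA adapter trains two low-rank factors with r * (d_in + d_out) parameters in total. The 4096 x 4096 shape and rank 16 below are illustrative assumptions, not the settings Shrug uses.

```python
def lora_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Adapter parameters divided by the frozen weight matrix's parameters."""
    full = d_in * d_out              # frozen base weight W (d_out x d_in)
    adapter = rank * (d_in + d_out)  # factors A (rank x d_in) and B (d_out x rank)
    return adapter / full


# Hypothetical projection size and rank, chosen only for illustration.
fraction = lora_trainable_fraction(d_in=4096, d_out=4096, rank=16)
print(f"trainable fraction for this matrix: {fraction:.2%}")
```

Under these assumed numbers the adapter is under 1% of the size of the matrix it modifies. The fraction for a whole model depends on which layers get adapters.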

The training data is remarkably simple: a chronological stream of messages from my own conversations, tagged as either <me> or <others>. A sliding window moves over this stream and asks one question: given these last N messages, did I send the next one? If yes, the label is 1. If no, the label is 0.

What’s interesting is that consecutive messages from me are valid positives. The task isn’t turn-taking detection (“is it my turn?”); it’s reply-decision modeling (“would I have replied to this?”). Even if I just sent three messages in a row, the model learns that I might send a fourth.

All messages are considered read. Silence is treated as a deliberate decision, not as “didn’t see it.” This is an important philosophical choice: in human conversation, not replying is itself a form of communication.
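The labeling rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's data pipeline: `build_examples` is a hypothetical name, messages are modeled as (tag, text) pairs, and early positions use a shorter window (the real pipeline may handle them differently).

```python
from typing import List, Tuple

# (sender tag, text); the tag is "<me>" or "<others>"
Message = Tuple[str, str]


def build_examples(
    stream: List[Message], window: int
) -> List[Tuple[List[Message], int]]:
    """Slide over a chronological stream and label each position.

    The context is the previous `window` messages (fewer near the start,
    which is an assumption). The label is 1 if the next message is mine.
    """
    examples = []
    for i in range(1, len(stream)):
        context = stream[max(0, i - window):i]
        label = 1 if stream[i][0] == "<me>" else 0
        examples.append((context, label))
    return examples


stream = [
    ("<others>", "hi"),
    ("<me>", "hey"),
    ("<me>", "what's up"),  # consecutive message from me: still a positive
    ("<others>", "lol"),
]
labels = [label for _, label in build_examples(stream, window=2)]
```

Here `labels` comes out as `[1, 1, 0]`: both of my messages are positives, including the second one in a row, and the position where someone else spoke next is a negative.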

If you want to see the exact architecture, data pipeline, and training code, it’s all in the Shrug repository.

From 2 Hours to 38 Minutes

My first training run took 1 hour and 59 minutes. For a model that just outputs a single number, this felt unexpectedly long. But after iterating on the training configuration, I got it down to 38 minutes.

The biggest gains came from reducing the context window: from 36 messages down to 16. Looking at my own message history, most reply decisions don’t require that much context. Smaller windows mean fewer tokens per sample, which directly translates to faster training.

I also cut the training from 3 epochs to 2. The model converged quickly, so that third epoch was mostly redundant. Other changes — a lower learning rate with a constant-with-warmup scheduler, reduced warmup ratio, and larger LoRA rank and alpha — helped the model learn more efficiently per epoch rather than simply training longer.

A subtler but important addition was dynamic class weighting. In conversation data, “not replying” is far more common than “replying.” By weighting the loss function inversely to class frequency, the model learned faster from the minority class instead of defaulting to “always silent.”
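One common way to compute such weights is the "balanced" heuristic, sketched below in Python. The exact formula Shrug uses is not stated here, so treat this as an assumption; `inverse_frequency_weights` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by total / (n_classes * class_count).

    Rare classes get weights above 1 and common classes get weights
    below 1, so each class contributes equally to the total loss.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# Toy data: three "silent" samples for every "reply" sample.
weights = inverse_frequency_weights([0, 0, 0, 1])
```

With this toy split, the reply class gets weight 2.0 and the silent class gets roughly 0.67. Recomputing the weights from the training labels on each run, rather than hard-coding them, is one plausible reading of "dynamic" here.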

Here’s what the training looked like before and after:

Original 3-epoch training run (1 hour 59 minutes). Loss hovered around 4–6 with unstable gradients (peak 664.2). The third epoch showed diminishing returns.

Optimized 2-epoch training run (38 minutes). Loss dropped to ~0.5, gradient spikes tamed to 60.9, and the confusion matrix shows strong performance (816 true negatives, 486 true positives).

Where This Could Go

Shrug is currently a personal experiment, but the concept has broader implications. Any chatbot living in a group chat or handling DMs could benefit from a reply-decision layer. Imagine:

A Discord bot that knows when a conversation doesn’t involve it
A customer support AI that stays quiet when users are clearly talking to each other
A personal assistant that doesn’t interrupt your flow with unnecessary confirmations

The ultimate goal is making AI feel less robotic by giving it the ability to choose silence. Real conversation isn’t just about what you say — it’s about when you choose not to say anything at all.

If this idea resonates with you, the code is open source at MrWillCom/Shrug.

Why “Shrug”?

The name comes from that moment of hesitation before deciding whether to reply. You read a message, shrug to yourself, and make a call. Sometimes you respond. Sometimes you don’t. The shrug is the decision.

¯\_(ツ)_/¯

About this Post

This post is written by Mr. Will, licensed under CC BY-NC 4.0.

#ai #ml

This post is written by Kimi K2.6 under human guidance and review.