TAS · Temporal Attractor Steering for Open-Weight LMs

The Problem

Right knowledge, wrong answer

Language models are trained on corpora spanning many years, so a model often encodes both an earlier fact and its later replacement — an older CEO and a newer CEO, an earlier head of state and a later one. Under standard prompting the model surfaces the outdated answer, but a simple temporal cue (“As of 2024, who is…”) reveals the newer fact is already recoverable from parametric memory. We call this Parametric Temporal Conflict (PTC): the right knowledge is in the network, but the default forward pass favors the older basin.

⊘

Not absence

A knowledge-recovery filter separates true PTC from cases where the model simply never learned the newer fact. Without it, raw PTC rates aggregate two failure modes with opposite implications.

↑

Scales with capacity

Filtered PTC rises monotonically with parameter count (0.041 → 0.103 across Qwen-1.5B, Qwen-7B, Mistral-7B, Llama-3.1-8B). Bigger models retain more outdated facts, not fewer.

◎

Localized in the network

A single-layer single-token activation patch at the conflict-critical layer flips the answer on 72–85% of verified PTC instances — direct evidence of a recoverable temporal circuit.

Method

How TAS works

TAS is a three-stage inference-time pipeline: detect a likely conflict with a lightweight probe, locate a conflict-critical layer once per model via causal activation patching, and steer the hidden state toward the newer-fact direction with a precomputed Δ. No retraining, no retrieval, no weight edits.

TAS architecture overview: Frozen LLM baseline, temporal conflict localization, and temporal steering

Figure 1. Overview of Parametric Temporal Conflict (PTC) and Temporal Attractor Steering (TAS). (A) Standard prompting retrieves an outdated answer from parametric memory. (B) Conflict localization compares standard and temporally cued prompts and identifies a conflict-critical layer ℓ* through activation patching. (C) Temporal steering shifts the hidden state toward the newer-fact representation without retrieval or retraining.

1

Detect

A linear probe trained on verified PTC positives scores the hidden state at ℓ*. If the calibrated score c ≤ τ, the forward pass is left untouched; otherwise the query is passed to the steering stage.

2

Locate

For each layer ℓ we measure the answer-flip rate induced by patching the temporal-prompt activation into the standard-prompt forward pass at the last prompt token. The conflict-critical layer is ℓ* = argmax_ℓ AFR(ℓ).

3

Steer

At ℓ* we apply h′ = h + α · c · Δ, where Δ averages the activation gap between temporal and standard prompts over verified PTC instances. A per-relation Δ (V2) dominates global and per-domain variants.

∎

Theoretical handles

Under Parametric Co-encoding, Local Separability, and a Bounded-Steering condition, there exists α* > 0 such that TAS provably flips the inequality defining PTC, while leaving non-conflict queries unchanged.

Results

Verified evaluation across four open-weight LMs

We evaluate on an 8,746-record verified benchmark mined from Wikidata, spanning five superseding-update relations (head of state, head of government, CEO, head coach, chairperson). All four models exhibit PTC on this benchmark, all four have a recoverable conflict-critical layer, and end-to-end TAS recovers a substantial fraction of PTC cases while preserving non-conflict behavior.

Model	Peak AFR	ℓ*	Recovery	PA (clean)	Δ vs ITI
Qwen-2.5-1.5B	0.723	23 / 28	0.392	0.987	+0.142
Qwen-2.5-7B	0.851	23 / 28	0.569	0.948	+0.152
Mistral-7B-v0.3	0.824	30 / 32	0.288	0.936	−0.056
Llama-3.1-8B	0.816	31 / 32	0.374	0.849	+0.050

Table 1. End-to-end TAS (V2 per-relation Δ) at each model's selected operating point. Peak AFR is single-layer single-token activation patching at ℓ*; Recovery is the fraction of PTC cases TAS flips to the newer fact; PA is preservation accuracy on matched non-conflict records; Δ vs ITI compares Recovery against a matched ITI baseline that shares ℓ* and α-sweep but uses a behavioural-contrast Δ. TAS leads on three of four models.

★

Capacity scaling

Filtered PTC rate scales monotonically with parameter count: 0.041 (Qwen-1.5B) → 0.071 → 0.085 → 0.103 (Llama-3.1-8B). The phenomenon is not a small-model artefact.

⤳

Verified-conflict Δ beats ITI

Building Δ from instances where the model demonstrably defaults to outdated and recovers under temporal elicitation gains +0.05 to +0.15 Recovery over ITI's population-level contrast on three of four models, at matched PA.

⊙

Conflict-gated, not always-on

The detector gate sacrifices only 0.005–0.048 Recovery vs. an oracle that steers every verified PTC case, while keeping non-conflict queries within 1% of clean argmax on the strongest model.

Quick Start

Run TAS in minutes

Load an open-weight base LM, point the pipeline at a PTC record, and apply the precomputed steering vector at the model's conflict-critical layer.

1

Clone & install

Pull the repo and install the inference-time dependencies. No fine-tuning, no retraining.

2

Load the PTC benchmark

Pull the 8,746-record verified dataset from HuggingFace and select a model from configs/models.yaml.

3

Steer at inference

Call tas.evaluate(record, model) — the detector, locator, and steering vector are loaded automatically per model family.

install.shBash

# Clone the repository
git clone https://github.com/eliashossain001/temporal-attractor-steering.git
cd temporal-attractor-steering
pip install -e .

# Pull the verified PTC benchmark
huggingface-cli download EliasHossain/ptc-benchmark \
  --repo-type dataset \
  --local-dir data/ptc-benchmark/

steer.pyPython

from tas import load_model, TASPipeline
from datasets import load_dataset

# Load any of the four evaluated open-weight base LMs
model, tokenizer = load_model("Qwen/Qwen2.5-7B")

# Pipeline auto-loads per-model l*, alpha*, tau*, and Delta
tas = TASPipeline.from_pretrained(
    model, tokenizer,
    variant="V2",   # per-relation Delta
    tau=0.15,
    alpha=2.0,
)

records = load_dataset("EliasHossain/ptc-benchmark", split="test")

for r in records:
    out = tas.evaluate(r)
    # out.steered in {True, False}; out.answer is the post-steering prediction
    print(r["query"], "=>", out.answer,
          "(steered)" if out.steered else "(passthrough)")

Benchmark

The PTC benchmark

8,746 verified records mined from Wikidata, spanning five superseding-update relations across politics, corporate, and sports domains. Each record carries a verified (a_old, a_new, t_update) triple with non-overlapping validity windows and both standard and temporally cued prompts.

Download the benchmarkBash

# Verified PTC benchmark (8,746 records, 5 Wikidata relations)
huggingface-cli download \
  EliasHossain/ptc-benchmark \
  --repo-type dataset \
  --local-dir data/ptc-benchmark/

⚠

License notice. The PTC benchmark is built from publicly available Wikidata superseding-fact triples. By using this dataset you agree to use it solely for research on parametric memory, temporal reasoning, and inference-time intervention, and to respect the dataset card on HuggingFace. See the dataset card for the full release terms.

Citation

Cite TAS

If you find TAS or the PTC benchmark useful in your research, please consider citing the paper.

@unpublished{hossain2026tas, title = {Right Knowledge, Wrong Answer: Test-Time Steering for Temporal Fact Conflicts in Open-Weight Language Models}, author = {Hossain, Elias and Saha, Sourav and Biswas, Umesh Chandra and Jennifer, Sanjeda Sara and Yousefi, Niloofar}, year = {2026}, note = {Preprint. Code: \url{https://github.com/eliashossain001/temporal-attractor-steering} Dataset: \url{https://huggingface.co/datasets/EliasHossain/ptc-benchmark}}, }