---
license: mit
language:
- en
tags:
- attention
- temporal-reasoning
- time-series
- inductive-bias
- plug-and-play
---

# TemporalSelfAttention - A Time-Biased Attention Module

> Give Transformers a sense of time - not by scaling, but by structure.

---

## Why?

Standard attention treats all tokens equally in time.
This works for syntax, but breaks down for:

- Temporal event ordering
- Causal reasoning
- Timeline consistency
- Long-range narrative coherence

💡 Insight: These models *simulate* time via token position. We inject it *structurally* with a tiny inductive bias.

---

## Core Equation

The time-aware attention score is computed as:

$$
\text{score}_{ij} = \frac{Q_i \cdot K_j^\top}{\sqrt{d_k}} + \gamma \cdot f(t_j - t_i)
$$

### Notation

| Symbol | Description |
|-----------------|-------------|
| \\( \text{score}_{ij} \\) | Attention score between query at position \\( i \\) and key at position \\( j \\) |
| \\( Q_i \\) | Query vector for position \\( i \\) |
| \\( K_j \\) | Key vector for position \\( j \\) |
| \\( d_k \\) | Dimension of the key vectors |
| \\( \gamma \\) | Learnable time-bias strength |
| \\( f(\cdot) \\) | Time-difference function (linear or Gaussian, selected via `bias_type`) |
| \\( t_j - t_i \\) | Relative time difference between positions \\( j \\) and \\( i \\) |
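
To make the bias term concrete, here is a minimal PyTorch sketch of how the extra term could enter the raw attention logits. It is illustrative only: the `time_biased_scores` helper, the exact linear (negative absolute gap) and Gaussian forms of \\( f \\), and the `sigma` width are assumptions, not the module's actual implementation.

```python
import torch

def time_biased_scores(q, k, t, gamma, bias_type="linear", sigma=1.0):
    """Add a relative-time bias to the usual scaled dot-product logits (sketch)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5    # (B, T, T): Q_i · K_j / sqrt(d_k)
    dt = t.unsqueeze(-2) - t.unsqueeze(-1)           # dt[b, i, j] = t_j - t_i
    if bias_type == "linear":
        f = -dt.abs()                                # assumed linear form: penalize large |Δt|
    else:                                            # "gaussian"
        f = torch.exp(-dt.pow(2) / (2 * sigma ** 2)) # assumed Gaussian kernel over Δt
    return scores + gamma * f                        # biased logits; softmax over j follows
```

The biased logits would then go through the usual softmax and value aggregation, exactly as in standard attention.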
## How To Use

```python
from temporal_attention import TemporalSelfAttention

model = TemporalSelfAttention(
    embed_dim=64,
    num_heads=1,
    bias_type="linear",  # or "gaussian"
    gamma=1.0,
    causal=False
)

# x: (B, T, D), timestamps: (B, T)
output, weights = model(x, timestamps)
```
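
A minimal end-to-end call might look like the sketch below; the random embeddings, the sorted float timestamps, and the printed shape are illustrative assumptions rather than requirements of the module.

```python
import torch

B, T, D = 2, 16, 64
x = torch.randn(B, T, D)                                   # dummy token embeddings
timestamps = torch.sort(torch.rand(B, T), dim=-1).values   # increasing per-sequence times

output, weights = model(x, timestamps)
print(output.shape)  # expected to match the input shape: (B, T, D)
```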