What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper • arXiv:2404.07129
Modifying activations during training with proper gradient flow