sidkshatriya 12 months ago \| on: Gemma 3 Technical Report [pdf]

Thanks for your answer! So in the 32k global layer, does every token attend to each of the other 32k tokens? [Edit: You answered the question when you said that individual attention layers are always dense.]
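For anyone else reading, here is a rough sketch of what "dense" means for a single layer. This is only an illustration, not code from the report: it assumes a causal decoder, so "every token" really means every earlier token plus the token itself. The function name `causal_mask`, the toy sequence length of 8, and the window of 3 are all made up for the example.

```python
def causal_mask(seq_len, window=None):
    """Return mask[i][j] == True when query position i may attend to key position j.

    window=None models a dense ("global") causal layer.
    An integer window limits each query to its most recent `window`
    positions, which models a sliding-window ("local") layer.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never attend to future tokens
            if window is not None:
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask


# Toy sizes, chosen only to keep the output readable (hypothetical values).
global_mask = causal_mask(8)           # dense causal attention
local_mask = causal_mask(8, window=3)  # sliding window of 3 positions

# How many positions the last token can see in each case.
print(sum(global_mask[-1]))  # 8
print(sum(local_mask[-1]))   # 3
```

In the global case the last token sees every position in the context, so nothing is skipped inside the layer. The cost is that the number of visible pairs grows quadratically with the context length, while the windowed case grows linearly.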