bias embedding Attention-UNet
Bias-based HuggingFace implementation for a recurrent LLM.
- Input
- 6741-dim embedding
- Encoder
- 58 x Attention-UNet with 12 heads
- Output
- ROUGE-L projection
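The dimensions listed above can be collected into a small spec object for sanity checks. This is only a sketch: the class name `ModelSpec`, its field names, and the `per_head_split` helper are hypothetical and not part of the implementation; only the three numbers come from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    # Values copied from the architecture list; field names are hypothetical
    embedding_dim: int = 6741
    num_blocks: int = 58
    num_heads: int = 12

    def per_head_split(self):
        """Return (head_dim, leftover) for an equal split of embedding_dim across heads."""
        return divmod(self.embedding_dim, self.num_heads)


spec = ModelSpec()
head_dim, leftover = spec.per_head_split()
print(head_dim, leftover)
```

A nonzero leftover means the embedding width does not divide evenly across the heads. If the attention blocks operated directly on the embedding width (an assumption, the list does not say), they would need a different internal width or an extra projection.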
Training config
optimizer=LARS, lr=0.507, scheduler=cosine, warmup=1455
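As an illustration of what the scheduler settings might produce, here is a minimal sketch of a cosine schedule with warmup in plain Python. Several things are assumptions rather than facts from the config: that `warmup` is counted in optimizer steps, that the warmup ramp is linear, that the rate decays to zero, and the total step count `TOTAL_STEPS`, which is a placeholder. The LARS optimizer itself is not shown.

```python
import math

BASE_LR = 0.507       # lr from the training config
WARMUP_STEPS = 1455   # warmup from the training config; assumed to be in optimizer steps
TOTAL_STEPS = 10_000  # hypothetical; the config does not state a total


def lr_at(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay to zero."""
    if step < warmup:
        # Assumed linear ramp that reaches base_lr on the last warmup step
        return base_lr * (step + 1) / warmup
    # Fraction of the post-warmup phase completed, clamped to [0, 1]
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Print the rate at the start, at the end of warmup, midway, and at the end
for s in (0, WARMUP_STEPS - 1, (WARMUP_STEPS + TOTAL_STEPS) // 2, TOTAL_STEPS):
    print(s, lr_at(s))
```

Under these assumptions the rate rises to 0.507 by step 1454 and then follows a half cosine down to zero at `TOTAL_STEPS`.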