
Commit 4fd0d79

add source for alibi
1 parent 47a45c8 commit 4fd0d79

File tree

1 file changed: +1 −0 lines changed


_posts/2024-08-07-flexattention.md

Lines changed: 1 addition & 0 deletions
@@ -120,6 +120,7 @@ Note that unlike typical implementations, this does *not* need to materialize a
 ### ALiBi Bias
 
 ![alibi bias](/assets/images/flexattention/fg6.png){:style="max-width:600px; display:block; margin-left: auto; margin-right: auto; width:100%"}
+<p style="text-align: center;"><em>Source: <a href="https://arxiv.org/abs/2108.12409">Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation</a></em></p>
 
 ALiBi was introduced in [Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation](https://arxiv.org/abs/2108.12409), and claims to have beneficial properties for length extrapolation at inference. Notably, MosaicML has pointed to [“lack of kernel support”](https://twitter.com/jefrankle/status/1804567458092605736) as the main reason why they eventually switched from ALiBi to rotary embeddings.
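
The post this commit edits presents biases like ALiBi through FlexAttention's score_mod API rather than a materialized bias tensor. As a rough illustration (not part of this diff), ALiBi's per-head linear bias could be sketched along these lines, assuming PyTorch 2.5+ with torch.nn.attention.flex_attention; the `alibi_slopes` helper and the shapes below are illustrative, not taken from the post:

```python
# Minimal ALiBi-as-score_mod sketch (assumes PyTorch 2.5+; names/shapes are illustrative).
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 1, 8, 128, 64  # batch, heads, sequence length, head dim

# Standard ALiBi slopes: a geometric sequence 2^(-8/H), 2^(-16/H), ..., one per head.
alibi_slopes = torch.exp2(-8.0 * torch.arange(1, H + 1) / H)

def alibi_score_mod(score, b, h, q_idx, kv_idx):
    # Subtract a per-head penalty that grows linearly with query/key distance.
    # In practice this is paired with a causal mask, so q_idx >= kv_idx.
    return score - alibi_slopes[h] * (q_idx - kv_idx)

q, k, v = (torch.randn(B, H, S, D) for _ in range(3))
out = flex_attention(q, k, v, score_mod=alibi_score_mod)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```

In real use this would be combined with a causal block mask and wrapped in torch.compile, so no S×S bias tensor is ever materialized.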
