In a separate post, Behrouz claimed that based on internal testing on the BABILong benchmark (needle-in-a-haystack approach), ...