The model is pretrained on a mixture of publicly available datasets, achieving superior zero-shot performance on various evaluation benchmarks of multi-modal comprehension and generation. It can be ...
Some results have been hidden because they may be inaccessible to you