The next SLG meeting will take place in room S5 on Thursday, February 1st, from 12:00 to 13:00.
Ryan Whetten will present his work; you will find a brief introduction below.
———————————————————————
Open Implementation and Study of BEST-RQ for Speech Processing
Abstract:
Self-Supervised Learning (SSL) has proven useful in various speech tasks. However, these methods are generally very demanding in terms of data, memory, and computational resources. Recently, Google introduced a model called BEST-RQ (BERT-based Speech pre-Training with Random-projection Quantizer). Despite BEST-RQ's strong performance and simplicity, details are lacking in the original paper and there is no official, easy-to-use open-source implementation. Furthermore, BEST-RQ has not been evaluated on downstream tasks other than ASR. In this presentation, we will discuss the details of our implementation of BEST-RQ and then present results from our preliminary study on four downstream tasks. Results show that a random-projection quantizer can achieve downstream performance similar to wav2vec 2.0 while reducing training time by more than a factor of two.
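For readers unfamiliar with the core idea, below is a minimal PyTorch sketch of a random-projection quantizer in the spirit of BEST-RQ: input features are passed through a frozen random projection and assigned the index of the nearest codebook vector, which serves as a discrete target for BERT-style masked prediction. The dimensions, the cosine-similarity lookup, and the class name are illustrative assumptions, not the exact choices of the paper or of the implementation presented in the talk.

```python
import torch


class RandomProjectionQuantizer(torch.nn.Module):
    """Sketch of a random-projection quantizer (BEST-RQ style).

    Both the projection matrix and the codebook are randomly initialized
    and kept frozen; no quantizer parameters are learned.
    """

    def __init__(self, input_dim=320, codebook_size=8192, codebook_dim=16):
        super().__init__()
        # Frozen random projection and codebook (buffers, so no gradients).
        self.register_buffer("projection", torch.randn(input_dim, codebook_dim))
        self.register_buffer(
            "codebook",
            torch.nn.functional.normalize(torch.randn(codebook_size, codebook_dim), dim=-1),
        )

    def forward(self, features):
        # features: (batch, time, input_dim) speech features, e.g. stacked filterbanks.
        projected = torch.nn.functional.normalize(features @ self.projection, dim=-1)
        # Each frame's target is the index of the closest codebook vector
        # (cosine similarity on normalized vectors).
        return torch.argmax(projected @ self.codebook.T, dim=-1)


# Hypothetical usage: the returned indices are the labels the SSL model
# is trained to predict for masked frames.
quantizer = RandomProjectionQuantizer()
targets = quantizer(torch.randn(4, 100, 320))  # (batch=4, time=100) integer labels
```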