Automatic speaker recognition systems are vulnerable not only to speech produced artificially by voice synthesis but also to other forms of attack, such as voice identity conversion and replay. The artifacts introduced when these fraudulent signals are generated or manipulated are the traces that voice synthesis algorithms leave in the signal, and they make it possible to distinguish the original, genuine voice from a usurped one.
Self-Supervised Learning (SSL) has recently emerged as a highly promising artificial intelligence (AI) method. It allows AI systems to exploit colossal amounts of unannotated data and surpass previously known performance levels. In particular, the field of automatic speech processing (TAP) is being rapidly transformed by the arrival of SSL, thanks in part to massive industrial investment and an explosion of data, both provided by a handful of companies. The performance gains are impressive, but the complexity of SSL models demands extraordinary computing capacity from researchers and industry professionals, which drastically reduces access to fundamental research on the topic and hinders its deployment in everyday products. For instance, a significant portion of the work that uses an SSL model for TAP relies on a system maintained and provided by a single company (wav2vec 2.0). The entire lifecycle of the technology, from its theoretical foundations to its practical deployment, including the analysis of its societal aspects, depends solely on institutions with the physical and financial means to sustain the pace of its development. The E-SSL project aims to return control over self-supervised learning to the scientific community and industry professionals in TAP, ensuring its continued evolution and equitable deployment by facilitating both academic research and its transfer to industry.
In practice, E-SSL takes a holistic approach to three key problems of self-supervised learning for TAP: its computational efficiency, its societal impact, and the feasibility of extending it to future products.