Learning Interpretable Word Embeddings via Bidirectional Alignment of Dimensions with Semantic Concepts

In collaboration with the Center for Information and Language Processing (CIS) of Ludwig Maximilian University of Munich (LMU) and ASELSAN Research Center, Koc Lab and Icon Lab of UMRAM have published the paper entitled “Learning Interpretable Word Embeddings via Bidirectional Alignment of Dimensions with Semantic Concepts” in Information Processing & Management.

The paper proposes the bidirectional imparting (BiImp) method, which aligns both the positive and negative directions of word embedding dimensions with concepts in order to obtain interpretable word embeddings. The bidirectional nature of BiImp increases the interpretability capacity of word embeddings by aligning more concepts with embedding dimensions.
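As a rough illustration of this idea, the toy sketch below assumes a BiImp-style embedding in which the positive direction of one dimension has been aligned with the concept "male" and its negative direction with "female". The vectors, words, and threshold here are invented for illustration and are not taken from the paper:

```python
import numpy as np

# Toy interpretable embeddings (illustrative values, not from the paper):
# assume dimension 0 was aligned so that its positive direction encodes
# "male" and its negative direction encodes "female".
embeddings = {
    "king":  np.array([ 0.8, 0.1,  0.3]),
    "queen": np.array([-0.7, 0.2,  0.3]),
    "table": np.array([ 0.0, 0.9, -0.4]),
}

def concept_of(word, dim=0, threshold=0.5):
    """Read off a concept from the sign of the aligned dimension."""
    value = embeddings[word][dim]
    if value > threshold:
        return "male"
    if value < -threshold:
        return "female"
    return "neutral"

print(concept_of("king"))   # positive direction of dim 0
print(concept_of("queen"))  # negative direction of dim 0
print(concept_of("table"))  # neither concept dominates
```

Because each direction of a dimension carries its own concept, a d-dimensional space can represent up to 2d concepts rather than d.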

Aligning concepts with both the positive and negative directions of word embedding dimensions opens several opportunities to improve the quality of word embeddings. For example, in interpretable embedding models it is easier to remove redundant or irrelevant dimensions, resulting in reduced computation and memory requirements. Additionally, biased information (e.g., gender, race) can be gathered under a specific dimension; by removing that dimension, unbiased word embeddings can be obtained. The paper demonstrates this notion in the scope of gender bias. Moreover, BiImp does not cause any performance degradation on semantic tasks, and it has the flexibility to be adapted to different learning scenarios.

The paper can be accessed here.

Abstract:

We propose bidirectional imparting or BiImp, a generalized method for aligning embedding dimensions with concepts during the embedding learning phase. While preserving the semantic structure of the embedding space, BiImp makes dimensions interpretable, which has a critical role in deciphering the black-box behavior of word embeddings. BiImp separately utilizes both directions of a vector space dimension: each direction can be assigned to a different concept. This increases the number of concepts that can be represented in the embedding space. Our experimental results demonstrate the interpretability of BiImp embeddings without making compromises on semantic task performance. We also use BiImp to reduce gender bias in word embeddings by encoding gender-opposite concepts (e.g., male-female) in a single embedding dimension. These results highlight the potential of BiImp in reducing biases and stereotypes present in word embeddings. Furthermore, task- or domain-specific interpretable word embeddings can be obtained by adjusting the corresponding word groups in embedding dimensions according to the task or domain. As a result, BiImp offers wide liberty in studying word embeddings without any further effort.