Paidiom has been designed to carry out three main tasks in MWE preprocessing:
- Detection of MWEs: Paidiom performs token-based MWE detection following a Lexicon Lookup Method. In this first step, both the continuous and the discontinuous forms of the MWEs defined in the lexicon are detected.
- Conversion of discontinuous MWEs into their continuous forms: if a flexible discontinuous MWE has been detected, Paidiom moves the gap (the tokens that interrupt the MWE) to the end of the MWE, thereby converting it to its continuous form.
- Translemmatisation of MWEs: Translemmatisation is the process of converting a source-text lexical unit into its target-text equivalent; together, the two form a translemma or bitextual unit. In this last step, Paidiom translemmatises the source MWE into its target equivalent.
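
The three tasks above can be illustrated with a minimal Python sketch. It is not Paidiom's actual implementation: the toy lexicon, the function names `find_mwe` and `preprocess`, the `max_gap` parameter, the restriction to a single gap, and the English equivalent chosen for the entry are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: source MWE surface form -> illustrative target equivalent.
# The English rendering is an assumption, not taken from Paidiom's lexicon.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("hay", "gato", "encerrado"): "there is something fishy going on",
}


def find_mwe(tokens: List[str], mwe: Tuple[str, ...],
             max_gap: int = 3) -> Optional[Tuple[int, int, int]]:
    """Task 1: lexicon lookup allowing at most one gap inside the MWE.

    Returns (start, split, gap): the MWE starts at `start`, and `gap` tokens
    interrupt it after its first `split` components (gap == 0 means continuous).
    Assumes the MWE has at least two components.
    """
    m = len(mwe)
    for start in range(len(tokens) - m + 1):
        for gap in range(max_gap + 1):
            for split in range(1, m):
                head = tuple(tokens[start:start + split])
                tail_start = start + split + gap
                tail = tuple(tokens[tail_start:tail_start + m - split])
                if head == mwe[:split] and tail == mwe[split:]:
                    return start, split, gap
    return None


def preprocess(tokens: List[str],
               lexicon: Dict[Tuple[str, ...], str] = LEXICON,
               max_gap: int = 3) -> List[str]:
    """Run detection, gap relocation and translemmatisation on a token list."""
    for mwe, target in lexicon.items():
        match = find_mwe([t.lower() for t in tokens], mwe, max_gap)
        if match is None:
            continue
        start, split, gap = match
        end = start + len(mwe) + gap
        gap_tokens = tokens[start + split:start + split + gap]
        # Task 2: continuous form of the MWE, with the gap moved to its end.
        continuous = (tokens[start:start + split]
                      + tokens[start + split + gap:end]
                      + gap_tokens)
        # Task 3: replace the source MWE with its target equivalent.
        tokens = (tokens[:start] + target.split()
                  + continuous[len(mwe):] + tokens[end:])
    return tokens


print(preprocess("Aquí hay siempre gato encerrado".split()))
```

In this sketch the output is a mixed-language token list in which the MWE has been replaced by its target equivalent and the former gap token (`siempre`) follows it. A real system would also need lemmatisation or inflected-form handling, which this sketch sidesteps by listing a single surface form.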
The ultimate goal of Paidiom is to enhance the performance of NMT systems. The following is an example of NMT-orientated MWE preprocessing from Spanish into English for the MWE haber gato encerrado: