Using FrameIt for languages other than English

FrameIt supports any language for which a spaCy 2 model exists. As of January 2019, spaCy provides models for English, Spanish, German, Italian, Portuguese, French, and Dutch.

Install language files

You can install the spaCy model for another language the same way you installed the English spaCy model, replacing "en" with the code for the language you need.

$ python -m spacy download en

The model names for various languages are as follows:

'de': 'de_core_news_sm',
'es': 'es_core_news_sm',
'pt': 'pt_core_news_sm',
'it': 'it_core_news_sm',
'nl': 'nl_core_news_sm',
'fr': 'fr_core_news_sm',
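
As a minimal sketch, the table above can be turned into a small lookup that builds the download command for a given language code. The helper names (SPACY_MODELS, model_name, download_command) are hypothetical and not part of FrameIt, and the English entry is an assumption based on spaCy 2's small English model.

```python
# Model names copied from the list above. The "en" entry is an assumption
# (spaCy 2's small English model); it is not taken from this document.
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
    "es": "es_core_news_sm",
    "pt": "pt_core_news_sm",
    "it": "it_core_news_sm",
    "nl": "nl_core_news_sm",
    "fr": "fr_core_news_sm",
}


def model_name(lang="en"):
    """Return the spaCy model name for a two-letter language code."""
    try:
        return SPACY_MODELS[lang]
    except KeyError:
        raise ValueError("No spaCy model listed for language %r" % lang)


def download_command(lang="en"):
    """Build the shell command that downloads the model for `lang`."""
    return "python -m spacy download " + model_name(lang)


if __name__ == "__main__":
    # Print the download command for every listed language.
    for code in sorted(SPACY_MODELS):
        print(download_command(code))
```

Failing early on an unknown code avoids a harder-to-read error later, when a Corpus or SRL tries to load a model that was never installed.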

Running FrameIt in other languages

Language-dependent spaCy models are used when instantiating new Utterances, so a language value must be passed whenever you initialize a new Utterance or a Corpus (which in turn creates Utterances). In most cases, you will only need to initialize a Corpus, as shown in the frame training notebook tutorials. However, you may also want to generate spaCy embeddings for individual sentences, as in the lambda_rule notebook.

To set the language, pass the "lang" parameter with the two-letter code of the language you want. If no language is provided, English is used by default. A language code must also be provided when creating an SRL.

Corpus initialization example

corpus = Corpus(corpus_file, build_index=False, lang='de')

Sentence initialization example

tp = TextProcessing()
sent = tp.nlp['de']("Friedrich hat mir gestern geholfen")

SRL initialization example

srl = SRL(lang="de")

Note that Frames and SRLs are designed to be language specific: loading frames built for multiple different languages into the same SRL is not advised.
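
One way to guard against mixing languages is to check the language of each frame before loading it into an SRL. The sketch below is self-contained and does not use FrameIt's API: check_single_language and the frame-name-to-language mapping are hypothetical bookkeeping that you would maintain yourself.

```python
def check_single_language(frame_langs, srl_lang):
    """Raise ValueError if any frame's language differs from the SRL's.

    frame_langs: dict mapping a frame name to its two-letter language code
                 (hypothetical bookkeeping, not a FrameIt structure).
    srl_lang:    the language code the SRL was created with.
    """
    mismatched = sorted(
        name for name, lang in frame_langs.items() if lang != srl_lang
    )
    if mismatched:
        raise ValueError(
            "Frames not built for %r: %s" % (srl_lang, ", ".join(mismatched))
        )
    return True


if __name__ == "__main__":
    # Example frame names are made up for illustration.
    frames = {"hilfe": "de", "reise": "de"}
    check_single_language(frames, "de")
    print("all frames match the SRL language")
```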