What languages can Dcipher Analytics be used for?

An overview of the languages supported by each NLP feature.

Written by Tomas Larsson
Updated over 10 months ago

Keywords: language coverage

Many Dcipher Analytics features cover a wide range of languages. The languages supported by each feature are listed below. If you're missing a language that is important to you, don't hesitate to let us know.

Tokenization

Tokenization (185 languages): Abkhazian, Afar, Afrikaans, Akan, Albanian, Amharic, Arabic, Aragonese, Armenian, Assamese, Avaric, Avestan, Aymara, Azerbaijani, Bambara, Bashkir, Basque, Belarusian, Bengali, Bihari languages, Bislama, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chamorro, Chechen, Chewa, Chinese, Church Slavic (Old Bulgarian), Chuvash, Cornish, Corsican, Cree, Croatian, Czech, Danish, Divehi, Dutch, Dzongkha, English, Esperanto, Estonian, Ewe, Faroese, Fijian, Finnish, French, Fulah, Galician, Ganda, Georgian, German, Greek, Guarani, Gujarati, Haitian Creole, Hausa, Hebrew, Herero, Hindi, Hiri Motu, Hungarian, Icelandic, Ido, Igbo, Indonesian, Interlingua, Interlingue (Occidental), Inuktitut, Inupiaq, Irish, Italian, Japanese, Javanese, Kalaallisut (Greenlandic), Kannada, Kanuri, Kashmiri, Kazakh, Khmer, Kikuyu, Kinyarwanda, Komi, Kongo, Korean, Kurdish, Kwanyama, Kyrgyz, Lao, Latin, Latvian, Limburgish, Lingala, Lithuanian, Luba-Katanga, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Manx, Maori, Marathi, Marshallese, Mongolian, Nauru, Navajo, Ndonga, Nepali, North Ndebele, Northern Sami, Norwegian, Norwegian Bokmål, Norwegian Nynorsk, Nuosu, Occitan, Ojibwa, Oriya, Oromo, Ossetian, Pali, Pashto, Persian, Polish, Portuguese, Punjabi, Quechua, Romanian, Romansh, Rundi, Russian, Samoan, Sango, Sanskrit, Sardinian, Scottish Gaelic, Serbian, Serbo-Croatian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, South Ndebele, Southern Sotho, Spanish, Sundanese, Swahili, Swati, Swedish, Tagalog, Tahitian, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Tigrinya, Tonga, Tsonga, Tswana, Turkish, Turkmen, Twi, Ukrainian, Urdu, Uyghur, Uzbek, Venda, Vietnamese, Volapük, Walloon, Welsh, Western Frisian, Wolof, Xhosa, Yiddish, Yoruba, Zhuang, Zulu

Part-of-speech tagging

Part-of-speech tagging (28 languages): Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Lithuanian, Macedonian, Norwegian Bokmål, Polish, Portuguese, Romanian, Russian, Slovenian, Spanish, Swedish, Turkish

Stopword removal

Stopword removal (63 languages): Afrikaans, Albanian, Arabic, Armenian, Basque, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Kannada, Korean, Kyrgyz, Latvian, Lithuanian, Luxembourgish, Malayalam, Marathi, Nepali, Norwegian, Norwegian Bokmål, Persian, Polish, Portuguese, Romanian, Russian, Sanskrit, Serbian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Tatar, Telugu, Thai, Tswana, Turkish, Ukrainian, Urdu, Vietnamese, Yoruba

Lemmatization

Lemmatization (27 languages): Catalan, Croatian, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Indonesian, Italian, Japanese, Korean, Lithuanian, Luxembourgish, Macedonian, Norwegian Bokmål, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Urdu

Named entity recognition

Named entity recognition (41 languages): Arabic, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Norwegian, Norwegian Bokmål, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese

Phrase detection

Phrase detection: Language independent

Topic detection

Topic detection: Language independent

Concept detection

Concept detection (45 languages): Afrikaans, Albanian, Arabic, Armenian, Basque, Bengali, Bosnian, Breton, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese

Document and word-level sentiment analysis

Document and word-level sentiment analysis (115 languages): Afrikaans, Albanian, Amharic, Arabic, Aragonese, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chechen, Chinese, Chuvash, Croatian, Czech, Danish, Divehi, Dutch, English, Esperanto, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hebrew, Hindi, Hungarian, Icelandic, Ido, Indonesian, Interlingua, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Latin, Latvian, Limburgish, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Manx, Marathi, Mongolian, Nepali, Northern Sami, Norwegian, Norwegian Bokmål, Norwegian Nynorsk, Occitan, Oriya, Ossetian, Pashto, Persian, Polish, Portuguese, Punjabi, Quechua, Romanian, Romansh, Russian, Sanskrit, Scottish Gaelic, Serbian, Serbo-Croatian, Sinhala, Slovak, Slovenian, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uyghur, Uzbek, Vietnamese, Volapük, Walloon, Welsh, Western Frisian, Yiddish, Yoruba

Entity-level sentiment analysis

Entity-level sentiment analysis (38 languages): Arabic, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Norwegian, Norwegian Bokmål, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese

Emojization

Emojization (1 language): English

Word-level machine translation

Word-level machine translation (58 languages): Afrikaans, Albanian, Arabic, Armenian, Basque, Bengali, Bosnian, Breton, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese

Text-level machine translation

Text-level machine translation via Google Translate (109 languages): Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Catalan, Chewa, Chinese, Corsican, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Kinyarwanda, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Oriya, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Southern Sotho, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Turkish, Turkmen, Ukrainian, Urdu, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish, Yoruba, Zulu

Pre-trained text vectorizers

Pre-trained text vectorizers (11 languages): Chinese, Dutch, English, French, German, Italian, Portuguese, Russian, Spanish, Swedish, Turkish
