Keywords: language coverage
Many Dcipher Analytics features cover a wide range of languages. The languages supported by each feature are listed below. If you're missing a language that is important to you, don't hesitate to let us know.
Tokenization
Tokenization (185 languages): Abkhazian, Afar, Afrikaans, Akan, Albanian, Amharic, Arabic, Aragonese, Armenian, Assamese, Avaric, Avestan, Aymara, Azerbaijani, Bambara, Bashkir, Basque, Belarusian, Bengali, Bihari languages, Bislama, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chamorro, Chechen, Chewa, Chinese, Church Slavic (Old Bulgarian), Chuvash, Cornish, Corsican, Cree, Croatian, Czech, Danish, Divehi, Dutch, Dzongkha, English, Esperanto, Estonian, Ewe, Faroese, Fijian, Finnish, French, Fulah, Galician, Ganda, Georgian, German, Greek, Guarani, Gujarati, Haitian Creole, Hausa, Hebrew, Herero, Hindi, Hiri Motu, Hungarian, Icelandic, Ido, Igbo, Indonesian, Interlingua, Interlingue (Occidental), Inuktitut, Inupiaq, Irish, Italian, Japanese, Javanese, Kalaallisut (Greenlandic), Kannada, Kanuri, Kashmiri, Kazakh, Khmer, Kikuyu, Kinyarwanda, Komi, Kongo, Korean, Kurdish, Kwanyama, Kyrgyz, Lao, Latin, Latvian, Limburgish, Lingala, Lithuanian, Luba-Katanga, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Manx, Maori, Marathi, Marshallese, Mongolian, Nauru, Navajo, Ndonga, Nepali, North Ndebele, Northern Sami, Norwegian, Norwegian Bokmål, Norwegian Nynorsk, Nuosu, Occitan, Ojibwa, Oriya, Oromo, Ossetian, Pali, Pashto, Persian, Polish, Portuguese, Punjabi, Quechua, Romanian, Romansh, Rundi, Russian, Samoan, Sango, Sanskrit, Sardinian, Scottish Gaelic, Serbian, Serbo-Croatian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, South Ndebele, Southern Sotho, Spanish, Sundanese, Swahili, Swati, Swedish, Tagalog, Tahitian, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Tigrinya, Tonga, Tsonga, Tswana, Turkish, Turkmen, Twi, Ukrainian, Urdu, Uyghur, Uzbek, Venda, Vietnamese, Volapük, Walloon, Welsh, Western Frisian, Wolof, Xhosa, Yiddish, Yoruba, Zhuang, Zulu
Part-of-speech tagging
Part-of-speech tagging (28 languages): Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Lithuanian, Macedonian, Norwegian Bokmål, Polish, Portuguese, Romanian, Russian, Slovenian, Spanish, Swedish, Turkish
Stopword removal
Stopword removal (63 languages): Afrikaans, Albanian, Arabic, Armenian, Basque, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Kannada, Korean, Kyrgyz, Latvian, Lithuanian, Luxembourgish, Malayalam, Marathi, Nepali, Norwegian, Norwegian Bokmål, Persian, Polish, Portuguese, Romanian, Russian, Sanskrit, Serbian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Tatar, Telugu, Thai, Tswana, Turkish, Ukrainian, Urdu, Vietnamese, Yoruba
Lemmatization
Lemmatization (27 languages): Catalan, Croatian, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Indonesian, Italian, Japanese, Korean, Lithuanian, Luxembourgish, Macedonian, Norwegian Bokmål, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Urdu
Named entity recognition
Named entity recognition (41 languages): Arabic, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Norwegian, Norwegian Bokmål, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese
Phrase detection
Phrase detection: Language independent
Topic detection
Topic detection: Language independent
Concept detection
Concept detection (58 languages): Afrikaans, Albanian, Arabic, Armenian, Basque, Bengali, Bosnian, Breton, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese
Document and word-level sentiment analysis
Document and word-level sentiment analysis (115 languages): Afrikaans, Albanian, Amharic, Arabic, Aragonese, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chechen, Chinese, Chuvash, Croatian, Czech, Danish, Divehi, Dutch, English, Esperanto, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hebrew, Hindi, Hungarian, Icelandic, Ido, Indonesian, Interlingua, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Latin, Latvian, Limburgish, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Manx, Marathi, Mongolian, Nepali, Northern Sami, Norwegian, Norwegian Bokmål, Norwegian Nynorsk, Occitan, Oriya, Ossetian, Pashto, Persian, Polish, Portuguese, Punjabi, Quechua, Romanian, Romansh, Russian, Sanskrit, Scottish Gaelic, Serbian, Serbo-Croatian, Sinhala, Slovak, Slovenian, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uyghur, Uzbek, Vietnamese, Volapük, Walloon, Welsh, Western Frisian, Yiddish, Yoruba
Entity-level sentiment analysis
Entity-level sentiment analysis (38 languages): Arabic, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Norwegian, Norwegian Bokmål, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese
Emojization
Emojization (1 language): English
Word-level machine translation
Word-level machine translation (58 languages): Afrikaans, Albanian, Arabic, Armenian, Basque, Bengali, Bosnian, Breton, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese
Text-level machine translation
Text-level machine translation via Google Translate (109 languages): Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Catalan, Chewa, Chinese, Corsican, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Kinyarwanda, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Oriya, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Southern Sotho, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Turkish, Turkmen, Ukrainian, Urdu, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish, Yoruba, Zulu
Pre-trained text vectorizers
Pre-trained text vectorizers (11 languages): Chinese, Dutch, English, French, German, Italian, Portuguese, Russian, Spanish, Swedish, Turkish
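Checking by hand which features cover a given language means scanning every list above. The sketch below shows one way to turn the lists into a lookup. It is only an illustration: the `COVERAGE` table and the `features_for` and `common_languages` helpers are hypothetical names, not part of any Dcipher Analytics API, and only three of the lists are copied in to keep the example short.

```python
# Hypothetical lookup table built from three of the lists on this page.
# The structure and helper names are assumptions made for illustration.
COVERAGE = {
    "Lemmatization": {
        "Catalan", "Croatian", "Danish", "Dutch", "English", "Finnish",
        "French", "German", "Greek", "Hungarian", "Indonesian", "Italian",
        "Japanese", "Korean", "Lithuanian", "Luxembourgish", "Macedonian",
        "Norwegian Bokmål", "Polish", "Portuguese", "Romanian", "Russian",
        "Serbian", "Spanish", "Swedish", "Turkish", "Urdu",
    },
    "Pre-trained text vectorizers": {
        "Chinese", "Dutch", "English", "French", "German", "Italian",
        "Portuguese", "Russian", "Spanish", "Swedish", "Turkish",
    },
    "Emojization": {"English"},
}


def features_for(language, coverage=COVERAGE):
    """Return the sorted names of the features whose list includes `language`."""
    return sorted(name for name, langs in coverage.items() if language in langs)


def common_languages(*features, coverage=COVERAGE):
    """Return the sorted languages that every named feature covers."""
    return sorted(set.intersection(*(coverage[f] for f in features)))


if __name__ == "__main__":
    print(features_for("Swedish"))
    print(common_languages("Lemmatization", "Pre-trained text vectorizers"))
```

Sets keep the membership test cheap and make the overlap between two features a plain intersection, which is handy when a workflow chains several features and every step must support the same language.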