Keywords: language coverage
Many Dcipher Analytics features cover a wide range of languages. The languages supported by each feature are listed below. If a language you need is missing, let us know.
Tokenization (185 languages): Abkhazian, Afar, Afrikaans, Akan, Albanian, Amharic, Arabic, Aragonese, Armenian, Assamese, Avaric, Avestan, Aymara, Azerbaijani, Bambara, Bashkir, Basque, Belarusian, Bengali, Bihari languages, Bislama, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chamorro, Chechen, Chewa, Chinese, Church Slavic (Old Bulgarian), Chuvash, Cornish, Corsican, Cree, Croatian, Czech, Danish, Divehi, Dutch, Dzongkha, English, Esperanto, Estonian, Ewe, Faroese, Fijian, Finnish, French, Fulah, Galician, Ganda, Georgian, German, Greek, Guarani, Gujarati, Haitian Creole, Hausa, Hebrew, Herero, Hindi, Hiri Motu, Hungarian, Icelandic, Ido, Igbo, Indonesian, Interlingua, Interlingue (Occidental), Inuktitut, Inupiaq, Irish, Italian, Japanese, Javanese, Kalaallisut (Greenlandic), Kannada, Kanuri, Kashmiri, Kazakh, Khmer, Kikuyu, Kinyarwanda, Komi, Kongo, Korean, Kurdish, Kwanyama, Kyrgyz, Lao, Latin, Latvian, Limburgish, Lingala, Lithuanian, Luba-Katanga, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Manx, Maori, Marathi, Marshallese, Mongolian, Nauru, Navajo, Ndonga, Nepali, North Ndebele, Northern Sami, Norwegian, Norwegian Bokmål, Norwegian Nynorsk, Nuosu, Occitan, Ojibwa, Oriya, Oromo, Ossetian, Pali, Pashto, Persian, Polish, Portuguese, Punjabi, Quechua, Romanian, Romansh, Rundi, Russian, Samoan, Sango, Sanskrit, Sardinian, Scottish Gaelic, Serbian, Serbo-Croatian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, South Ndebele, Southern Sotho, Spanish, Sundanese, Swahili, Swati, Swedish, Tagalog, Tahitian, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Tigrinya, Tonga, Tsonga, Tswana, Turkish, Turkmen, Twi, Ukrainian, Urdu, Uyghur, Uzbek, Venda, Vietnamese, Volapük, Walloon, Welsh, Western Frisian, Wolof, Xhosa, Yiddish, Yoruba, Zhuang, Zulu
Part-of-speech tagging (28 languages): Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Lithuanian, Macedonian, Norwegian Bokmål, Polish, Portuguese, Romanian, Russian, Slovenian, Spanish, Swedish, Turkish
Stopword removal (63 languages): Afrikaans, Albanian, Arabic, Armenian, Basque, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Kannada, Korean, Kyrgyz, Latvian, Lithuanian, Luxembourgish, Malayalam, Marathi, Nepali, Norwegian, Norwegian Bokmål, Persian, Polish, Portuguese, Romanian, Russian, Sanskrit, Serbian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Tatar, Telugu, Thai, Tswana, Turkish, Ukrainian, Urdu, Vietnamese, Yoruba
Lemmatization (27 languages): Catalan, Croatian, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Indonesian, Italian, Japanese, Korean, Lithuanian, Luxembourgish, Macedonian, Norwegian Bokmål, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Urdu
Named Entity Recognition (41 languages): Arabic, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Norwegian, Norwegian Bokmål, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese
Phrase detection: Language independent
Topic detection: Language independent
Concept detection (45 languages): Afrikaans, Albanian, Arabic, Armenian, Basque, Bengali, Bosnian, Breton, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese
Document and word-level sentiment analysis (115 languages): Afrikaans, Albanian, Amharic, Arabic, Aragonese, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chechen, Chinese, Chuvash, Croatian, Czech, Danish, Divehi, Dutch, English, Esperanto, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hebrew, Hindi, Hungarian, Icelandic, Ido, Indonesian, Interlingua, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Latin, Latvian, Limburgish, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Manx, Marathi, Mongolian, Nepali, Northern Sami, Norwegian, Norwegian Bokmål, Norwegian Nynorsk, Occitan, Oriya, Ossetian, Pashto, Persian, Polish, Portuguese, Punjabi, Quechua, Romanian, Romansh, Russian, Sanskrit, Scottish Gaelic, Serbian, Serbo-Croatian, Sinhala, Slovak, Slovenian, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uyghur, Uzbek, Vietnamese, Volapük, Walloon, Welsh, Western Frisian, Yiddish, Yoruba
Entity-level sentiment analysis (38 languages): Arabic, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Norwegian, Norwegian Bokmål, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese
Emojization (1 language): English
Word-level machine translation (58 languages): Afrikaans, Albanian, Arabic, Armenian, Basque, Bengali, Bosnian, Breton, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Sinhala, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese
Text-level machine translation via Google Translate (109 languages): Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Catalan, Chewa, Chinese, Corsican, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Kinyarwanda, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Oriya, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Southern Sotho, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Turkish, Turkmen, Ukrainian, Urdu, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish, Yoruba, Zulu
Pre-trained text vectorizers (11 languages): Chinese, Dutch, English, French, German, Italian, Portuguese, Russian, Spanish, Swedish, Turkish
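When planning a workflow that chains several features, it can help to check which features cover a given language and which languages are covered by every feature in the chain. The snippet below is a minimal sketch of that lookup. It copies three of the shorter lists above into a plain Python dictionary. The names `COVERAGE`, `features_supporting`, and `languages_supported_by_all` are hypothetical helpers for illustration only and are not part of any Dcipher Analytics API.

```python
# Illustrative only: a hand-copied subset of the lists above.
# COVERAGE and both helper functions are hypothetical names,
# not part of Dcipher Analytics.
COVERAGE = {
    "Part-of-speech tagging": {
        "Bulgarian", "Catalan", "Chinese", "Czech", "Danish", "Dutch",
        "English", "Finnish", "French", "German", "Greek", "Hungarian",
        "Indonesian", "Irish", "Italian", "Japanese", "Korean",
        "Lithuanian", "Macedonian", "Norwegian Bokmål", "Polish",
        "Portuguese", "Romanian", "Russian", "Slovenian", "Spanish",
        "Swedish", "Turkish",
    },
    "Lemmatization": {
        "Catalan", "Croatian", "Danish", "Dutch", "English", "Finnish",
        "French", "German", "Greek", "Hungarian", "Indonesian", "Italian",
        "Japanese", "Korean", "Lithuanian", "Luxembourgish", "Macedonian",
        "Norwegian Bokmål", "Polish", "Portuguese", "Romanian", "Russian",
        "Serbian", "Spanish", "Swedish", "Turkish", "Urdu",
    },
    "Pre-trained text vectorizers": {
        "Chinese", "Dutch", "English", "French", "German", "Italian",
        "Portuguese", "Russian", "Spanish", "Swedish", "Turkish",
    },
}


def features_supporting(language):
    """Return the names of the features whose list contains `language`."""
    return sorted(name for name, langs in COVERAGE.items() if language in langs)


def languages_supported_by_all(features):
    """Return the languages covered by every feature named in `features`."""
    return sorted(set.intersection(*(COVERAGE[name] for name in features)))


if __name__ == "__main__":
    print(features_supporting("Urdu"))
    print(languages_supported_by_all(COVERAGE))
```

With the three lists used here, Chinese is covered by part-of-speech tagging and the pre-trained vectorizers but not by lemmatization, so it drops out of the set common to all three features.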