What functionality does Dcipher Analytics offer?

Preprocessing
└── Case modifications
└── URL cleaning
└── XML/HTML tag cleaning
└── Tab/new line cleaning
└── Hashtag cleaning
└── Punctuation cleaning
└── Spell checking
└── Stop-word cleaning
└── Substring/repeat cleaning
└── Emoji cleaning
└── Changing field types
└── Changing date formats
└── Automated text subtype detection
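Several of the cleaning steps listed above are usually rule-based. As a rough illustration only, the Python sketch below chains generic versions of XML/HTML tag cleaning, URL cleaning, hashtag cleaning, tab/new line cleaning, and case modification. The function `clean_text` and its regular expressions are hypothetical and do not describe how Dcipher Analytics implements these steps.

```python
import re

# Hypothetical patterns for illustration; robust URL/HTML handling needs more care.
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
WHITESPACE_RE = re.compile(r"[\t\r\n ]+")


def clean_text(text: str) -> str:
    """Apply a simple, hypothetical cleaning pipeline to one text."""
    text = TAG_RE.sub(" ", text)         # XML/HTML tag cleaning (before URLs, so tags are not swallowed)
    text = URL_RE.sub(" ", text)         # URL cleaning: drop links
    text = HASHTAG_RE.sub(r"\1", text)   # Hashtag cleaning: keep the word, drop '#'
    text = WHITESPACE_RE.sub(" ", text)  # Tab/new line cleaning: collapse runs of whitespace
    return text.strip().lower()          # Case modification: lowercase everything


print(clean_text("<p>Loving the new #DataViz tools!\nSee https://example.com</p>"))
# loving the new dataviz tools! see
```

The order matters: stripping tags before URLs keeps a closing tag such as `</p>` from being treated as part of the link.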

Text segmentation
└── Split text into words
└── Lemmatization
└── Part-of-speech tagging
└── Named Entity Recognition
└── Phrase detection
└── Split text into sentences
└── Split text into paragraphs
    └── Rule-based paragraph boundary detection
    └── Context-aware paragraph boundary detection
└── Content extraction
    └── Extract content based on keywords
    └── Extract content based on pattern
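The rule-based parts of text segmentation can be sketched with regular expressions alone. The example assumes paragraphs are separated by blank lines and sentences end in `.`, `!` or `?`, which is a deliberate simplification; lemmatization, part-of-speech tagging, named entity recognition and context-aware boundary detection need trained language models and are not shown. All function names are hypothetical.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Rule-based paragraph boundaries: one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph: str) -> list[str]:
    """Naive sentence split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def split_words(sentence: str) -> list[str]:
    """Lowercased word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", sentence.lower())


def extract_by_keywords(text: str, keywords: set[str]) -> list[str]:
    """Keyword-based content extraction: keep sentences containing any keyword."""
    hits = []
    for paragraph in split_paragraphs(text):
        for sentence in split_sentences(paragraph):
            if keywords & set(split_words(sentence)):
                hits.append(sentence)
    return hits


doc = "Prices rose in May. Demand was flat.\n\nAnalysts expect prices to fall."
print(split_paragraphs(doc))
print(extract_by_keywords(doc, {"prices"}))
# ['Prices rose in May.', 'Analysts expect prices to fall.']
```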

Vectorizers
└── GPT
└── Cohere
└── Sentence Transformers
└── Word2Vec
└── Doc2Vec
└── FastText
└── GloVe
└── BERT
└── ELMo
└── One-hot encoding
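Most vectorizers in this list depend on pretrained models, but one-hot encoding is simple enough to show directly: each categorical value becomes a vector with a single 1 at that category's position. This is a generic sketch with a hypothetical `one_hot_encode` function; variants in real libraries may, for example, drop one category or reserve a slot for unseen values, which is not modelled here.

```python
def one_hot_encode(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Map each categorical value to a vector with a single 1 at its category's index."""
    categories = sorted(set(values))  # fixed, sorted category order
    position = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[position[value]] = 1
        vectors.append(vector)
    return categories, vectors


categories, vectors = one_hot_encode(["news", "blog", "news", "forum"])
print(categories)  # ['blog', 'forum', 'news']
print(vectors)     # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```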

Enrichment
└── Sentiment analysis
└── Emojization & Emoji extraction
└── Key phrase detection
└── Language detection
└── Concept detection
└── Quotation detection
└── Text statistics
└── Date extraction
└── Text summarization
└── Topic detection

Semantic similarity modelling
└── Document to document similarity calculations
└── Document scoring based on words
└── Word scoring based on words
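Document-to-document similarity and word-based document scoring are commonly computed with cosine similarity between embedding vectors. The sketch below assumes one common approach, scoring each document against the mean vector of the query words; the three-dimensional vectors are made up for illustration, and in practice they would come from a vectorizer such as those listed earlier. This is not a description of Dcipher Analytics' internals.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Toy embeddings, invented for this example.
word_vectors = {
    "battery": [0.9, 0.1, 0.0],
    "charging": [0.8, 0.2, 0.1],
    "price": [0.0, 0.1, 0.9],
}
doc_vectors = {
    "doc_a": [0.85, 0.15, 0.05],
    "doc_b": [0.05, 0.10, 0.95],
}

# Document scoring based on words: compare each document with the query words' mean.
query = mean_vector([word_vectors["battery"], word_vectors["charging"]])
scores = {doc: cosine(query, vec) for doc, vec in doc_vectors.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

# Document-to-document similarity uses the same measure directly.
doc_similarity = cosine(doc_vectors["doc_a"], doc_vectors["doc_b"])
print(ranked)  # ['doc_a', 'doc_b']
```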

Contextual analysis
└── Cosine similarity on case-term frequency vectors
└── CKC similarity on case-term frequency vectors
└── Co-occurrence similarity
└── Burst-based over-representation
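One way to read "cosine similarity on case-term frequency vectors" is to represent each term by its counts across cases (for example posts or survey answers) and compare terms by the cosine of those vectors; co-occurrence similarity can then be counted as the number of cases containing both terms. The sketch assumes that interpretation and uses invented data. CKC similarity and burst-based over-representation are not shown because their exact definitions are not given here.

```python
import math

# Toy "cases", invented for illustration.
cases = [
    "battery drains fast",
    "battery charging slow",
    "charging cable broken",
    "screen too bright",
]
tokenized = [case.split() for case in cases]
vocabulary = sorted({term for tokens in tokenized for term in tokens})

# Case-term frequency vectors: for each term, its count in every case.
term_vectors = {term: [tokens.count(term) for tokens in tokenized] for term in vocabulary}


def cosine(u: list[int], v: list[int]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cooccurrence(a: str, b: str) -> int:
    """Number of cases in which both terms appear."""
    return sum(1 for tokens in tokenized if a in tokens and b in tokens)


# Rank the other terms by contextual similarity to "battery".
related = sorted(
    (t for t in vocabulary if t != "battery"),
    key=lambda t: cosine(term_vectors["battery"], term_vectors[t]),
    reverse=True,
)
print(cosine(term_vectors["battery"], term_vectors["charging"]))  # 0.5
print(cooccurrence("battery", "charging"))                        # 1
```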

Temporal analysis
└── Momentum
└── Burst detection
└── Topic evolution analysis
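Momentum and burst detection both compare recent activity with earlier activity in a time series of counts. The definitions in this sketch are hypothetical stand-ins chosen for clarity: momentum as the relative change between the last two windows, and a burst as a period whose count exceeds the trailing mean by more than `threshold` standard deviations. This simple z-score rule is not Kleinberg's burst model and not necessarily Dcipher Analytics' method. Topic evolution analysis is not shown.

```python
from statistics import mean, pstdev


def momentum(counts: list[int], window: int) -> float:
    """Hypothetical: relative change of the last window's total vs. the window before it."""
    recent = sum(counts[-window:])
    previous = sum(counts[-2 * window:-window])
    return (recent - previous) / previous if previous else float("inf")


def detect_bursts(counts: list[int], window: int = 3, threshold: float = 2.0) -> list[int]:
    """Return indices whose count exceeds trailing mean + threshold * std."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # Floor sigma at 1.0 so a perfectly flat history does not flag tiny upticks.
        if counts[i] > mu + threshold * max(sigma, 1.0):
            bursts.append(i)
    return bursts


daily_mentions = [4, 5, 4, 6, 5, 19, 7, 5]
print(detect_bursts(daily_mentions))          # [5]
print(momentum(daily_mentions, window=2))     # -0.5
```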

Regular expression operations
└── Split by pattern
└── Extract properties by pattern
└── Replace pattern
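The three regular expression operations correspond closely to functions in Python's standard `re` module. The sample text and patterns below are invented for illustration; named groups are one convenient way to turn matches into extracted properties.

```python
import re

text = "Order #123 shipped 2023-05-01; order #456 pending"

# Split by pattern: break the text on semicolons and surrounding whitespace.
parts = re.split(r"\s*;\s*", text)

# Extract properties by pattern: each named group becomes a field.
orders = [
    m.groupdict()
    for m in re.finditer(r"#(?P<order_id>\d+) (?P<status>\w+)", text)
]

# Replace pattern: mask order numbers.
masked = re.sub(r"#\d+", "#***", text)

print(parts)   # ['Order #123 shipped 2023-05-01', 'order #456 pending']
print(orders)  # [{'order_id': '123', 'status': 'shipped'}, {'order_id': '456', 'status': 'pending'}]
print(masked)  # Order #*** shipped 2023-05-01; order #*** pending
```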

3rd party integrations
└── OpenAI
└── Cohere
└── Anthropic
└── Google Vertex AI
└── Google Translate
└── Google NLP
└── IBM Watson Natural Language Understanding

Supervised learning
└── Classification
    └── Logistic Regression
    └── Decision Trees
    └── Random Forests
    └── Multilayer Neural Networks
    └── Support Vector Machines
    └── Naive Bayes
└── Regression
    └── Linear Regression
    └── Decision Trees
    └── Random Forests
    └── Gradient Boosted Regressors
    └── Isotonic Regressors
    └── AFT Survival Regressors

Unsupervised learning
└── Clustering
    └── K-means clustering
    └── Gaussian mixture clustering
    └── Power iteration clustering
    └── DBSCAN clustering
└── Outlier scoring
    └── One-class SVM
    └── Robust Covariance
    └── Isolation Forest
    └── Local Outlier Factor
    └── Robust PCA
└── Dimensionality reduction
    └── Principal component analysis
    └── Singular value decomposition
    └── t-distributed Stochastic Neighbor Embedding
    └── Uniform Manifold Approximation and Projection
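K-means clustering alternates between assigning points to the nearest centroid and moving each centroid to the mean of its members (Lloyd's algorithm). The sketch below uses a naive initialization (the first `k` points) and a fixed iteration count to stay short; production libraries typically use smarter initialization such as k-means++ and a convergence check. The function name and data are illustrative; the outlier scoring and dimensionality reduction methods are not shown.

```python
import math


def kmeans(points, k, iterations=10):
    """Lloyd's algorithm with naive initialization (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, clusters


points = [(1, 1), (1.5, 2), (8, 8), (9, 9.5), (1, 0.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 9.5), (8.5, 9)]]
```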

Numeric preprocessing
└── Formula-based transformations
└── Discretization
└── Normalization
└── Imputation
└── Convert numbers to words
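Imputation, normalization and discretization of numeric fields can be illustrated with three small functions. The sketch assumes mean imputation, min-max normalization to [0, 1], and equal-width binning; these are common choices but other strategies (median imputation, z-score scaling, quantile bins) are equally valid, and the function names are hypothetical. Formula-based transformations and number-to-word conversion are not shown.

```python
def impute_mean(values):
    """Replace missing values (None) with the mean of the present ones."""
    present = [v for v in values if v is not None]
    fill = sum(present) / len(present)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def discretize(values, bins):
    """Equal-width binning; returns a bin index from 0 to bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # min(..., bins - 1) puts the maximum into the last bin instead of an extra one.
    return [min(int((v - lo) / width), bins - 1) if width else 0 for v in values]


ages = [23, None, 35, 47, 59]
filled = impute_mean(ages)
print(filled)                        # [23, 41.0, 35, 47, 59]
print(min_max_normalize(filled))
print(discretize(filled, bins=3))    # [0, 1, 1, 2, 2]
```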

Grouping and aggregations
└── SQL aggregation functions
└── Standard deviation
└── Variance
└── Median
└── Vector sum
└── Vector mean
└── Entropy
└── Gini-index
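Entropy and the Gini index summarize how mixed the values in a group are, while vector mean averages embeddings element-wise. The sketch below groups rows by a key (similar in spirit to SQL `GROUP BY`) and then applies these aggregations; the data and function names are invented for illustration. Entropy is computed in bits (base 2), which is an assumption since other bases are also common.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of the label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared label proportions."""
    counts = Counter(labels)
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def vector_mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


rows = [
    ("positive", "twitter"), ("negative", "twitter"),
    ("positive", "forum"), ("positive", "forum"),
]
groups = {}
for sentiment, source in rows:
    groups.setdefault(source, []).append(sentiment)

for source, labels in sorted(groups.items()):
    print(source, entropy(labels), gini_index(labels))
# forum 0.0 0.0
# twitter 1.0 0.5
```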

Numeric filters
└── Greater than
└── Greater than or equal to
└── Less than
└── Less than or equal to
└── In between

Value filters
└── Contains
└── Does not contain
└── Is one of
└── Is not one of
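The numeric and value filters above amount to row predicates. This sketch assumes rows are dictionaries, that "In between" is inclusive at both ends, and that "Contains" ignores case; all three are assumptions, and the function and field names are hypothetical.

```python
import operator

NUMERIC_OPS = {
    "greater_than": operator.gt,
    "greater_or_equal": operator.ge,
    "less_than": operator.lt,
    "less_or_equal": operator.le,
}


def numeric_filter(rows, field, op, value):
    """Keep rows where row[field] <op> value holds."""
    compare = NUMERIC_OPS[op]
    return [r for r in rows if compare(r[field], value)]


def between_filter(rows, field, low, high):
    """'In between', assumed inclusive on both ends."""
    return [r for r in rows if low <= r[field] <= high]


def contains_filter(rows, field, substring, negate=False):
    """'Contains' / 'Does not contain', assumed case-insensitive."""
    needle = substring.lower()
    return [r for r in rows if (needle in r[field].lower()) != negate]


def one_of_filter(rows, field, allowed, negate=False):
    """'Is one of' / 'Is not one of'."""
    return [r for r in rows if (r[field] in allowed) != negate]


posts = [
    {"text": "Great battery life", "likes": 12, "lang": "en"},
    {"text": "Batterie schwach", "likes": 3, "lang": "de"},
    {"text": "Battery died fast", "likes": 7, "lang": "en"},
]
print(numeric_filter(posts, "likes", "greater_than", 5))
print(one_of_filter(posts, "lang", {"en"}, negate=True))
```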

Import data
└── From file
    └── JSON/JSONL file
    └── Delimiter-separated file (CSV, TSV, etc.)
    └── PDF file(s)
    └── Word file(s)
    └── Excel file
    └── TXT file(s)
└── From social media
    └── Facebook
    └── Twitter
    └── Instagram
    └── YouTube
    └── Forums
    └── Blogs
└── From news media
└── From SurveyMonkey
└── From Miro

Export results
└── As JSON, XML
└── As Table (CSV/TSV, Excel)
└── As Document (PDF, DOCX)
└── As Image (SVG)
