Knowledge extraction

Enrich your data by extracting knowledge in the form of triplets

Written by Zafer Çavdar
Updated over a week ago

Extract knowledge is an operation that extracts the who and what from text in the form of informative triplets (subject, action, object). These triplets are the basis for constructing a knowledge graph, which represents textual information in a structured way.
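To make the triplet format concrete, here is a minimal Python sketch of how a triple could be represented. The `Triple` type and the hand-written triples are illustrative assumptions, not the operation's actual output format; the second triple assumes "he" has already been resolved to "Michael" (see Coreference Resolution).

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One extracted fact: who (subject) did what (action) to what (object)."""
    subject: str
    action: str
    object: str

# Hand-written example for "Michael was born in Dubai, and he lives in Spain".
# Illustrative only; these are not produced by the Extract knowledge operation.
triples = [
    Triple("Michael", "was born in", "Dubai"),
    Triple("Michael", "lives in", "Spain"),  # assumes "he" was resolved to "Michael"
]

for t in triples:
    print(f"({t.subject}, {t.action}, {t.object})")
```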

Coreference Resolution: Coreference resolution finds all expressions in a text that refer to the same entity. For example, with this option the pronoun “he” in “Michael was born in Dubai, and he lives in Spain” is resolved to “Michael”. Enable this option to perform coreference resolution on the input text before triples are extracted.
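As a rough illustration of the idea, the Python sketch below replaces the subject pronouns "he" and "she" with the most recently mentioned person. This is a toy heuristic, not the algorithm the operation uses: the `KNOWN_PEOPLE` list and the `resolve_pronouns` function are hypothetical, the sketch does not check gender agreement, and real coreference resolution relies on trained models that handle many more cases.

```python
import re

# Hypothetical list of person names; a real system would detect people
# with a named-entity recognizer instead of a fixed list.
KNOWN_PEOPLE = frozenset({"Michael", "Anna"})

# Only subject pronouns are handled in this toy example.
SUBJECT_PRONOUNS = frozenset({"he", "she"})

def resolve_pronouns(text: str, people: frozenset[str] = KNOWN_PEOPLE) -> str:
    """Replace "he"/"she" with the most recently mentioned known person."""
    last_person = None
    pieces = []
    # Split into alternating word / non-word tokens so "".join() restores the text.
    for token in re.findall(r"\w+|\W+", text):
        if token in people:
            last_person = token
            pieces.append(token)
        elif token.lower() in SUBJECT_PRONOUNS and last_person is not None:
            pieces.append(last_person)
        else:
            # No antecedent yet, or an ordinary word: keep the token unchanged.
            pieces.append(token)
    return "".join(pieces)

print(resolve_pronouns("Michael was born in Dubai, and he lives in Spain"))
# Michael was born in Dubai, and Michael lives in Spain
```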

Entity Linking and Normalization: This option maps different versions of an entity to a single standard form and adds entity-related information stored in external knowledge bases as separate triples. For example, entity normalization maps “Barack Obama” and “President Obama” to “President Barack Obama”, and the entity linker adds the relation “is a politician” to the graph.
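The following Python sketch illustrates the two steps on already-extracted triples: an alias table maps each surface form to a canonical name, and a small lookup table stands in for an external knowledge base. `ALIASES`, `KNOWLEDGE_BASE`, and `normalize_and_link` are hypothetical names for illustration; the operation's actual normalization and linking rely on its own models and knowledge sources.

```python
# Hypothetical alias table: each surface form maps to one canonical name.
ALIASES = {
    "Barack Obama": "President Barack Obama",
    "President Obama": "President Barack Obama",
}

# Hypothetical stand-in for an external knowledge base:
# canonical entity -> list of (relation, value) facts.
KNOWLEDGE_BASE = {
    "President Barack Obama": [("is a", "politician")],
}

Triple = tuple[str, str, str]

def normalize(entity: str) -> str:
    """Return the canonical form of an entity, or the entity unchanged."""
    return ALIASES.get(entity, entity)

def normalize_and_link(triples: list[Triple]) -> list[Triple]:
    """Normalize subjects and objects, then append knowledge-base facts
    as extra triples (once per linked entity)."""
    result: list[Triple] = []
    linked: set[str] = set()
    for subject, action, obj in triples:
        subject, obj = normalize(subject), normalize(obj)
        result.append((subject, action, obj))
        for entity in (subject, obj):
            if entity in KNOWLEDGE_BASE and entity not in linked:
                linked.add(entity)
                result.extend((entity, rel, value) for rel, value in KNOWLEDGE_BASE[entity])
    return result

triples = [
    ("Barack Obama", "gave", "a speech"),
    ("President Obama", "signed", "the bill"),
]
for t in normalize_and_link(triples):
    print(t)
```

Both surface forms collapse onto one node, and the linked fact is added only once even though the entity appears in two triples.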

Step-by-step guide

1. Open the operation configuration window

Select the text field you want to extract knowledge from and click the "Add operation" button at the top of the workspace.

Search for "Extract knowledge" or find the operation under "Text enrichment" and click it.

2. Specify the language

In the "Language" drop-down, select the language of your input text. Currently, English is the only language available.

3. Enrich the data

You can enrich the result with two optional settings: "Coreference Resolution" and "Entity Linking and Normalization". See the description of each option above and enable the ones relevant to your data.

4. Name the output field

Under "Output field name", type a name for the output field.

5. Apply the operation

Click "Apply" to run the operation. Informative triples are extracted from the unstructured text, and the main components of the knowledge graph, nodes and edges, are created. As shown in the Schema, the Triples collection contains the nodes and edges of the created graph: node1 holds the subjects (actors), node2 holds the objects, and edge_name holds the relations between them. In this way, free-text expressions are turned into structured, identifiable knowledge about your data.

The created triples are added to your collection and shown in the Table View at the top right. The collection is then brought into the workbench, where you can inspect the extracted triples in detail.
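A minimal Python sketch of how triples map onto graph components, assuming one row per triple with the node1, node2, and edge_name fields described above. The sample triples and the adjacency-list structure are illustrative assumptions, not how the product stores or displays the graph.

```python
from collections import defaultdict

# Sample triples (subject, action, object); illustrative only.
triples = [
    ("Michael", "was born in", "Dubai"),
    ("Michael", "lives in", "Spain"),
]

# One row per triple, using the field names described above.
rows = [{"node1": s, "edge_name": a, "node2": o} for s, a, o in triples]

# Nodes: every distinct subject or object.
nodes = sorted({r["node1"] for r in rows} | {r["node2"] for r in rows})

# Adjacency list: outgoing labelled edges for each subject node.
graph = defaultdict(list)
for r in rows:
    graph[r["node1"]].append((r["edge_name"], r["node2"]))

print(nodes)             # ['Dubai', 'Michael', 'Spain']
print(graph["Michael"])  # [('was born in', 'Dubai'), ('lives in', 'Spain')]
```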
