A Knowledge Base (KB) is a collection of source data that powers an Insight Booster (IB) project.
A news-based Knowledge Base allows you to collect articles from a wide range of news sources. By clicking 'Create', you can start building a KB that pulls in relevant news based on your search query, selected geographies, and date range.
In this article, you’ll learn how to:
Create a Knowledge Base using news data
Define a focused search strategy adapted to news
To create a Knowledge Base, navigate to the Knowledge Base icon in the blue ribbon at the top of the page. Alternatively, you can open an existing Insight Booster project and create a Knowledge Base directly within that project. More details can be found in the article titled “Creating a Knowledge Base”.
Click “Select input data” to define the search query to extract news articles.
In this section, you define the search space through keywords and other restrictions.
Number 1 - Import keywords: Use this button to upload keywords from an Excel spreadsheet. This method lets you create detailed search queries using Boolean operators.
Number 2 - All of these words/phrases: Add keywords here that must be in every article. You can copy and paste them from an Excel column, with each row becoming a separate keyword, or enter them one by one. In Boolean terms, these terms are connected with "AND".
Number 3 - Any of these words/phrases: At least one of these keywords must appear in each article. For example, if you enter four terms here, the search returns articles containing any one of them. You can copy and paste them from an Excel column, with each row becoming a separate keyword, or add them individually. In Boolean terms, these are connected with "OR". If you fill in both keyword sections, the query combines them: each article must contain every “All of these words/phrases” keyword and at least one “Any of these words/phrases” keyword.
Number 4 - None of these words/phrases: Articles containing any of these terms are excluded from your data, even if they match the other search terms. Use this field for keywords that tend to bring in irrelevant content. For example, if you want to download data about a company whose name is also the last name of an unrelated celebrity, you can add the celebrity’s name here to avoid getting articles about that celebrity.
Number 5 - Language: You can specify the language of the articles you want to scan so that it matches the terms used in your search query. Choosing the appropriate language helps avoid ambiguity caused by false friends, which are words that look the same but have different meanings in different languages. For example, the word "gift" means "present" in English but "poison" or "married" in Swedish. Setting the language properly gives more accurate and relevant search results.
Number 6 - Country: You can choose the countries from which to download data using the provided list.
Number 7 - Posted after: Select the earliest date to be included in your analysis.
Number 8 - Posted before: Select the latest date to be included in your analysis.
Number 9 - Post order: This is set to weekly sampling by default.
Number 10 - Show advanced settings: Display source-related settings.
Number 11 - Sites: If you want to restrict your analysis to specific sites, you can use this setting.
Number 12 - Lowest site rank (globally): If you set the lowest rank to 10, only the top 10 sites worldwide are included, so the analysis draws on at most 10 sites.
Number 13 - Lowest site rank (within country): This works similarly to the global setting but applies to each country individually. If the analysis involves 3 countries and the rank is set to 10, it will include the top 10 sites from each country, resulting in a total of 30 sites.
Number 14 - Content provider type: You can choose the content provider types from the available options, which include, among others, editorial media, magazines, and local broadcasts.
Number 15 - Media types: You can also choose the media type, such as web, blog, print, TV, or podcast.
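The keyword fields and the two site-rank settings above can be sketched in a few lines of Python. This is only an illustration of the logic, not the product's implementation: the class and function names are hypothetical, matching is simplified to case-insensitive substring search, and the sketch assumes that filling in both “All” and “Any” means an article must satisfy both.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordQuery:
    """Hypothetical model of the three keyword fields (not the product's API)."""
    all_of: list[str] = field(default_factory=list)   # joined with AND
    any_of: list[str] = field(default_factory=list)   # joined with OR
    none_of: list[str] = field(default_factory=list)  # exclusions

    def matches(self, text: str) -> bool:
        # Simplifying assumption: case-insensitive substring matching.
        lowered = text.lower()

        def has(term: str) -> bool:
            return term.lower() in lowered

        if any(has(t) for t in self.none_of):
            return False  # an excluded term overrides every other match
        if not all(has(t) for t in self.all_of):
            return False
        # An empty "any" field places no restriction on the article.
        return not self.any_of or any(has(t) for t in self.any_of)


def max_sites(rank: int, countries: int = 1, per_country: bool = False) -> int:
    """Upper bound on included sites for a given lowest-rank setting."""
    return rank * countries if per_country else rank


query = KeywordQuery(
    all_of=["electric vehicle"],
    any_of=["battery", "charging"],
    none_of=["recall"],
)
print(query.matches("New electric vehicle charging network announced"))
print(max_sites(10, countries=3, per_country=True))
```

With a rank of 10 and three countries, the per-country setting gives the upper bound of 30 sites described above, while the global setting stays at 10.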
Once you are set, click “Find articles”.
From the total number of articles identified that match your search criteria, you can specify how many articles you would like to work with. The maximum number you can enter is equal to the number of matching entries. After setting the desired amount of data and providing a file name, click “Import” to proceed to the workflow settings.
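The limit on the import size can be expressed as a simple check. The function below is a hypothetical sketch of that rule; the name and the error messages are assumptions, not part of the product.

```python
def resolve_import_count(requested: int, matching: int) -> int:
    """Validate a requested article count against the number of matches."""
    if requested < 1:
        raise ValueError("request at least one article")
    if requested > matching:
        # The maximum you can enter equals the number of matching entries.
        raise ValueError(f"only {matching} articles match the search")
    return requested


print(resolve_import_count(500, matching=1200))
```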
If there’s something specific you’d like to focus on within the data, you can enter it here. The workflow will then extract the parts of the articles that relate to your area of interest. You can also choose a language for the summaries, regardless of the original data's language.
You can create a Knowledge Base for one-time use, set it to update on a scheduled basis, or trigger it through an API call for dynamic data integration.
Immediate single run
This is the default option, allowing you to run the workflow once. The results will automatically be converted into a Knowledge Base. Note that a workflow can be scheduled only after it has first been created through an immediate single run.
Click “Continue” to configure further settings.
You can either create a new Knowledge Base or update an existing one. If you are creating a new one, provide a clear name for your Knowledge Base, and it will appear under this name in the Knowledge Base list.
Next, enter a name for the workflow, which will be displayed under "My Workflows". You may also add a description for the workflow if desired.
If you are updating an existing Knowledge Base, select “Update” as the action type and select the Knowledge Base name from the list underneath.
Once you have filled out these details, click “Create” to finalize the setup.
Once your Knowledge Base has been created, you’ll see it under “My Knowledge Bases”, accessed through the blue ribbon at the top of the page. Successfully created KBs have a green tick mark next to their name.
At scheduled intervals
You can also set a Knowledge Base to update at scheduled intervals. This option is available only for existing Knowledge Bases created through the steps explained in the previous section, “Immediate single run”.
At this stage, you can select your preferred frequency for updating. The settings exemplified below would create a workflow that updates a Knowledge Base on the first day of each month with data covering the past month.
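To make the monthly example concrete, the date window such a run would cover can be computed as follows. This sketch assumes that "the past month" means the previous calendar month; the product may instead use a rolling window, and the function name is hypothetical.

```python
from datetime import date, timedelta


def previous_month_window(run_date: date) -> tuple[date, date]:
    """Return (first_day, last_day) of the calendar month before run_date."""
    first_of_this_month = run_date.replace(day=1)
    # Stepping back one day from the 1st lands on the previous month's last day.
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


posted_after, posted_before = previous_month_window(date(2024, 3, 1))
print(posted_after, posted_before)
```

For a run on 1 March 2024, the window is 1 February to 29 February 2024, which would correspond to the “Posted after” and “Posted before” values for that update.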
Click “Continue” to proceed.
Select the Knowledge Base you want to automate from the dropdown menu.
If you want the results in the Insight Booster project to update automatically, enable the setting indicated by the arrow to regenerate the results.
Provide a name and optionally a description for the workflow, and click “Create”.
Trigger through API call
Select the “Trigger through API call” option, then click the plus sign to configure the variables you wish to modify for each API run.
You can choose a parameter from the dropdown menu to change the search query (i.e., the downloaded data) and assign a name to the variable. For example, if you want to modify the country for each run, simply select “country” from this menu. When you initiate a new run through the API, you will be prompted to configure a value for the country variable.
You can also define settings through operations. Click on “Add operation”, select the relevant operation, and configure the associated variables. These operations represent the steps your data undergo to transform into a Knowledge Base.
For instance, if you select the node that displays the “Sample” operation and click “Select”, you’ll be able to configure parameters related to the sample, in addition to the country.
Here, you can choose the sample size and set it for the upcoming run. Once you have selected your variables, click “Continue”.
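The idea of declaring variables once and supplying their values on each run can be sketched as below. The API's endpoint, authentication, and request format are not described in this article, so the sketch makes no network call; the payload shape, variable names, and parameter names are all hypothetical and only illustrate checking that every declared variable receives a value.

```python
import json


def build_run_payload(declared: dict[str, str], values: dict[str, object]) -> str:
    """Build a JSON body for one run from declared variables and their values.

    `declared` maps a variable name to the parameter it overrides,
    e.g. {"target_country": "country"}. All names here are hypothetical.
    """
    missing = sorted(set(declared) - set(values))
    unknown = sorted(set(values) - set(declared))
    if missing:
        raise ValueError(f"no value supplied for: {', '.join(missing)}")
    if unknown:
        raise ValueError(f"undeclared variables: {', '.join(unknown)}")
    return json.dumps({"variables": values}, sort_keys=True)


# Variables configured in the workflow (hypothetical names).
declared = {"target_country": "country", "sample_size": "sample size"}

# Values supplied for one particular run.
body = build_run_payload(declared, {"target_country": "SE", "sample_size": 500})
print(body)
```

Validating the values before sending them keeps a run from starting with a variable left unset.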
Choose the Knowledge Base where you want to trigger runs through API, enter a workflow name, and click “Create”.
Once your Knowledge Base has been created, you’ll see it under “My Knowledge Bases”, accessed through the blue ribbon at the top of the page. You will also receive an email once the Knowledge Base has been created. Successfully created KBs have a green tick mark next to their name.