Web Search Block
The Web Search Block in Scout allows users to perform web searches and extract content from search engine results. This lesson will guide you through configuring and using the Web Search Block effectively.
Learning Objectives:
- Understand the configuration options of the Web Search Block.
- Learn how to filter and process web search results.
- Explore best practices for using the Web Search Block in workflows.
1. Introduction to the Web Search Block:
- The block runs a search engine query, scrapes the top results, and returns the extracted content. It's particularly useful for data collection and analysis tasks.
2. Configuration Options:
- Search Engine Query: Enter the main query used to search the web. This is the core of your search operation.
- For a basic use case, this will often be the search term entered by the user, such as {{inputs.message}}.
- Be sure to check dependencies if you're using a multistep workflow and want to query the output of another block instead of your "Input" block.
- Search Results To Scrape: Set the maximum number of search results to process. Default is 1. Adjust based on the volume of data you need.
- Time Filter: Filter search results by time range. Options include Any time, Past hour, Past 24 hours, Past week, Past month, and Past year. Default is Any time.
- Include Domains: Specify domains to include in the search results. Default is an empty list, which includes results from all domains.
- Exclude Domains: Specify domains to exclude from the search results. Default is an empty list.
- Split Page Text: Toggle whether the extracted text is chunked into smaller sections. Default is true.
- This can significantly affect recall; filtering through smaller chunks often yields better outputs.
- Splitter Strategy: Choose the strategy for splitting text when Split Page Text is enabled. Default is Smart Splitter.
- Max Results to Return: Set the maximum number of results to return after processing. Default is 0, which returns all processed results.
- Content Capture Mode: Choose between Thorough (processes everything including JavaScript, more complete but slower) or Quick (basic HTML only, faster). Default is Quick.
- Minimum Similarity Score: Set the minimum similarity score for a result to be considered relevant. Default is 0.0, which includes all results.
- Use this to dial in the results. When inspecting returned chunks, you will see a similarity score assigned to each chunk. If you want more relevant data, raise the minimum similarity score to exclude results that may not be relevant.
- You can inspect the returned chunks in the workflow console by clicking into the Web Search Block once it has completed; all results from the search are listed there.

- Page Search Term: Term to search for inside the top search results. Defaults to the Search Engine Query if not provided.
- This is another setting that helps dial in what is returned.
- For example, in the Scout GTM Blog gen template, the keyword from the input is passed in as the page search term to ensure the pages and chunks pulled in are highly relevant to the topic.
- In that template, the page search term is set to {{inputs.key_word}}.

- Text Extractor: Method used to extract text from web pages. Default is readability; more extractors are coming soon.
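To keep the defaults listed above in one place, here is a minimal Python sketch that records them as a dataclass. The class name and every field name are hypothetical and chosen only for illustration; they are not Scout's API. Only the default values and the Page Search Term fallback come from the descriptions above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WebSearchConfig:
    """Illustrative summary of the block's options (field names are hypothetical)."""

    search_engine_query: str                      # e.g. "{{inputs.message}}"
    search_results_to_scrape: int = 1             # max search results to process
    time_filter: str = "Any time"                 # or "Past hour", "Past 24 hours", ...
    include_domains: List[str] = field(default_factory=list)  # empty = all domains
    exclude_domains: List[str] = field(default_factory=list)
    split_page_text: bool = True                  # chunk extracted text
    splitter_strategy: str = "Smart Splitter"
    max_results_to_return: int = 0                # 0 = return all processed results
    content_capture_mode: str = "Quick"           # or "Thorough"
    minimum_similarity_score: float = 0.0         # 0.0 = include all results
    page_search_term: Optional[str] = None
    text_extractor: str = "readability"

    def effective_page_search_term(self) -> str:
        # Falls back to the Search Engine Query when no term is provided.
        return self.page_search_term or self.search_engine_query


config = WebSearchConfig(search_engine_query="{{inputs.message}}")
```

Writing the options down this way makes it easy to see which settings you have changed from their defaults when comparing two test runs.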
3. Outputs:
- The block outputs a list of extracted web page results, each containing text, similarity score, canonical URL, and metadata.
- These can then be passed into subsequent blocks for analysis, summarization or content creation.
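As a rough illustration of how Minimum Similarity Score and Max Results to Return interact, the sketch below applies both to a list of results shaped like the output described above. The key names (`text`, `similarity_score`, `canonical_url`, `metadata`), the sample data, and the choice to sort by score before truncating are all assumptions for this example; the block's actual output schema and ordering may differ.

```python
from typing import Any, Dict, List

Result = Dict[str, Any]


def select_results(
    results: List[Result], min_score: float = 0.0, max_results: int = 0
) -> List[Result]:
    """Keep results at or above min_score; max_results == 0 means keep all."""
    kept = [r for r in results if r["similarity_score"] >= min_score]
    # Assumption: highest-scoring chunks first, so truncation keeps the best ones.
    kept.sort(key=lambda r: r["similarity_score"], reverse=True)
    return kept if max_results == 0 else kept[:max_results]


# Made-up sample data for illustration only.
results = [
    {"text": "a", "similarity_score": 0.82, "canonical_url": "https://example.com/a", "metadata": {}},
    {"text": "b", "similarity_score": 0.35, "canonical_url": "https://example.com/b", "metadata": {}},
    {"text": "c", "similarity_score": 0.61, "canonical_url": "https://example.com/c", "metadata": {}},
]

top = select_results(results, min_score=0.5, max_results=1)
print([r["canonical_url"] for r in top])  # ['https://example.com/a']
```

With both settings left at their defaults (0.0 and 0), every processed result is passed through to the next block.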
4. Best Practices:
- Ensure that the query is specific to obtain relevant search results.
- Use the time filter to narrow down results to a specific time range if needed.
- Specify include or exclude domains to refine the search scope.
- Set an appropriate minimum similarity score to filter out less relevant results.
- Consider the content capture mode based on the need for thoroughness versus speed.
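To make the include/exclude domain practice concrete, here is a small sketch of one way such a filter could behave. The function name is hypothetical, and the matching rules (subdomains match their parent domain, exclusions take precedence over inclusions) are assumptions; the source does not specify how Scout matches domains.

```python
from typing import List
from urllib.parse import urlparse


def domain_allowed(url: str, include: List[str], exclude: List[str]) -> bool:
    """Return True if url passes the include/exclude lists (assumed semantics)."""
    host = (urlparse(url).hostname or "").lower()

    def matches(domain: str) -> bool:
        domain = domain.lower()
        # Assumption: "example.com" also covers "docs.example.com".
        return host == domain or host.endswith("." + domain)

    if any(matches(d) for d in exclude):
        return False
    # An empty include list means results from all domains are allowed.
    return not include or any(matches(d) for d in include)


print(domain_allowed("https://docs.example.com/page", ["example.com"], []))  # True
print(domain_allowed("https://example.org/", ["example.com"], []))           # False
```

Testing a few URLs against your lists this way can help confirm that an include list is not narrower than you intended.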
Web Search template playground:
- Use this simple template to test what you learned and home in on the optimal configuration for this block: