Getting started

Create Your First Collection

Collections are where you store documents, data, and information that your AI can search through and reference. Think of it as building your AI's personal library.

Building Your AI's Knowledge Base 📚

Now that you've built your first workflow, let's give your AI some specific knowledge to work with!

🎯 What We're Building

A searchable knowledge base that your AI can use to answer questions with accurate, source-backed information. Whether it's company docs, product information, or research papers – your AI will be able to find and reference the right content every time.

📋 Step-by-Step: Create Your Collection

Step 1: Create Your Collection

  1. Navigate to Collections in the main menu
  2. Click "+ New Collection"
  3. Give it a descriptive name like "Company Knowledge Base" or "Product Documentation"
  4. Add a description (optional but helpful for team organization)
  5. Click "Create"

Scout will take a few seconds to provision your vector database.

Step 2: Understanding Table Structure

Once your collection is ready, you'll see a default table with a Content column where your searchable text will live.

The table structure you'll need depends on your data source:

  • PDFs: Add columns for document metadata like title, upload date, and anything else you find valuable.
  • CSVs: Your table structure mirrors your CSV columns.
  • Web scrapes: Add columns for URLs, page titles, and site organization.

To modify your table: Click the "+" button to add columns, or click existing column headers to edit them.

Don't worry about setting up columns now; we'll show you the optimal structure for each upload method below. Scout makes it easy to add and modify columns as you go.

📄 Method 1: Upload PDF Documents

Perfect for: Reports, manuals, research papers, contracts, guides

How to Upload PDFs:

  1. Add a "Title" column in your table
  2. Add your first "Row"
  3. Enter a Title for the Document
  4. Next to "Content", click the upload button and select your PDF
  5. Scout automatically:
    • Extracts all text content
    • Breaks it into searchable chunks (see the sketch after this list)
    • Generates embeddings for semantic search
  6. Click "Save"
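
Scout does all of this for you, but if you're curious what "searchable chunks" means in practice, here's a minimal conceptual sketch in Python. The chunk size and overlap values are illustrative assumptions, not Scout's actual settings:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size, slightly overlapping chunks.

    The overlap means a sentence cut at one boundary still appears
    whole in the neighboring chunk, which helps retrieval quality.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Each chunk is then embedded and indexed for semantic search.
document = "Scout extracts the text from your PDF before chunking it. " * 20
print(f"{len(chunk_text(document))} chunks")
```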

Recommended Table Structure for PDFs:

  • Click the "+" button to add columns
  • Recommended column: Document Title (Single Line Text) – File name or custom title

Pro Tips for PDFs:

  • Large files: Scout handles files up to 25 MB.
  • Text quality: Works best with text-based PDFs, not scanned images (see the check below)
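
Not sure whether a PDF is text-based or scanned? A quick local check with the open-source pypdf library (our choice for illustration; not part of Scout) shows whether any text can be extracted:

```python
from pypdf import PdfReader  # pip install pypdf

reader = PdfReader("manual.pdf")  # hypothetical file name
extracted = "".join(page.extract_text() or "" for page in reader.pages)

if extracted.strip():
    print(f"Text-based PDF: extracted {len(extracted)} characters.")
else:
    print("No extractable text; likely a scanned image. Consider OCR first.")
```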

📊 Method 2: Import CSV Data

Perfect for: Product catalogs, customer data, FAQ lists, structured information

How to Upload CSVs:

  1. First add columns to match your CSV's headers.
  2. Click "Sources"
  3. Choose your CSV
  4. Scout will:
    • Map CSV columns to your table columns
    • Create one document per row
    • Use one column as the main "content" for search

  5. Confirm proper mapping

Table Structure Matches Your CSV:

Your table columns will need to match your CSV headers. For example, if uploading a product catalog CSV (a sample snippet follows this list):

Example columns from your CSV:

  • Product Name (Single Line Text) – From CSV "Name" column
  • Category (Single Select) – From CSV "Category" column
  • Price (Number) – From CSV "Price" column
  • SKU (Single Line Text) – From CSV "SKU" column
  • In Stock (Checkbox) – From CSV "Available" column
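
For reference, the first few rows of such a product catalog CSV might look like this (the file and values are hypothetical):

```csv
Name,Category,Price,SKU,Available
Trail Runner 2,Footwear,89.99,TR2-BLK-42,true
Summit Jacket,Outerwear,149.00,SJ-RED-M,false
```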

CSV Best Practices:

  • Headers required: First row should contain column names
  • Content column: Designate which column contains the main searchable text
  • Clean data: Remove empty rows and fix formatting issues before uploading (see the cleanup sketch below)
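
If you want to clean a file before uploading, here's a minimal pass using pandas (one option among many; the file name and columns refer to the hypothetical catalog above):

```python
import pandas as pd

df = pd.read_csv("products.csv")  # hypothetical file; first row must be headers

df = df.dropna(how="all")                      # remove fully empty rows
df.columns = [c.strip() for c in df.columns]   # trim stray whitespace in headers
df["Price"] = pd.to_numeric(df["Price"], errors="coerce")  # flag malformed prices

print(df[df["Price"].isna()])  # review rows that need fixing before upload
df.dropna(subset=["Price"]).to_csv("products_clean.csv", index=False)
```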

🌐 Method 3: Web Scraping

Perfect for: Documentation sites, blogs, knowledge bases, product pages. For a full guide on web scraping, take a look at this guide.

Single Page Scrape:

  1. Click "+ Add Source" in your collection
  2. Select "Single Web Page Crawl"
  3. Enter the URL you want to scrape
  4. Choose scraper type:
    • HTTP: Faster, good for simple sites
    • Playwright: Slower, but handles complex, JavaScript-heavy sites (see the check after this list)

  5. Select text extraction method (Readability is usually best)
  6. Map which columns the "Page Title", "URL", and "Content" should populate. You can do this manually by selecting from the dropdown, or use the "Auto-Mapping" feature.
  7. Click "Create" and then "Run"
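
If you're not sure whether a site needs the Playwright scraper, one quick local test is to fetch the raw HTML and check whether the content you care about is actually in it; if it isn't, the page is probably rendered by JavaScript. A minimal sketch using the requests library (the URL and phrase are placeholders):

```python
import requests

url = "https://example.com/docs/getting-started"  # placeholder URL
phrase = "Create Your First Collection"           # text you expect on the page

html = requests.get(url, timeout=10).text
if phrase in html:
    print("Content is in the raw HTML; the faster HTTP scraper should work.")
else:
    print("Content missing from raw HTML; the page likely needs Playwright.")
```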

Recommended Table Structure for Web Scrapes:

  • Click the "+" button to add columns
  • Recommended columns:
    • Page Title (Single Line Text) – HTML title tag
    • URL (URL) – Source page URL
    • Content – Default column in Scout where the scraped page content will live

Full Site Scrapes:

Sitemap Method (Recommended)

Perfect for documentation sites and blogs:

  1. Find the website's sitemap (usually website.com/sitemap.xml)
  2. Select "Sources" then "Add Source" and finally select "Sitemap Crawl"
  3. Enter the sitemap URL
  4. Select the "Scraper" and "Text Extractor"
  5. Map which columns the "Page Title", "URL", and "Content" should populate. You can do this manually by selecting from the dropdown, or use the "Auto-Mapping" feature.
  6. Click "Create" and then "Run"

Scout will scrape all pages listed in the sitemap.
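
Before kicking off a full crawl, it can help to preview how many pages the sitemap actually lists. A small sketch using requests and Python's standard XML parser (the sitemap URL is a placeholder):

```python
import requests
import xml.etree.ElementTree as ET

sitemap_url = "https://example.com/sitemap.xml"  # placeholder
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
urls = [loc.text for loc in root.findall(".//sm:loc", ns)]

print(f"{len(urls)} pages listed")
for u in urls[:5]:  # spot-check the first few before crawling everything
    print(u)
```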

Website Crawl

Great for smaller sites, but use it carefully on large domains:

  1. Select "Sources" then "Add Source" and finally select "Website Crawl"
  2. Set crawl depth and page limits
  3. Select the "Scraper" and "Text Extractor"
  4. Map which columns the "Page Title", "URL", and "Content" should populate. You can do this manually by selecting from the dropdown, or use the "Auto-Mapping" feature.
  5. Click "Create" and then "Run"

Scraping Pro Tips:

  • Test first: Try a single page before bulk scraping
  • Respect robots.txt: Scout follows standard web scraping ethics; you can verify permissions yourself with the sketch below
  • Update regularly: Set up scheduled re-scrapes to keep your collection current as site content changes
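
Scout handles robots.txt for you, but if you'd like to check permissions yourself, Python's standard library includes a parser (the URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")  # placeholder domain
rp.read()

page = "https://example.com/docs/getting-started"
print(rp.can_fetch("*", page))  # True if generic agents may fetch this page
```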


🔍 Your Collection in Action

Once you've added content, your collection becomes immediately searchable:

Automatic Processing:

  • ✅ Text is chunked into optimal search segments
  • ✅ Embeddings are generated for semantic search
  • ✅ Metadata is indexed for filtering
  • ✅ Everything is ready for RAG workflows

Search Quality Features:

  • Semantic search: Finds content by meaning, not just keywords (illustrated in the sketch after this list)
  • Metadata filtering: Search within specific categories or sources
  • Relevance scoring: Returns the most contextually relevant results
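
To see why semantic search differs from keyword matching: documents and queries are converted to embedding vectors, and relevance is scored by vector similarity rather than shared words. A toy illustration with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up embeddings: "refund policy" and "money-back guarantee" land close
# together even though they share no keywords.
docs = {
    "refund policy":        np.array([0.90, 0.10, 0.20]),
    "money-back guarantee": np.array([0.85, 0.15, 0.25]),
    "shipping times":       np.array([0.10, 0.90, 0.30]),
}
query = np.array([0.88, 0.12, 0.22])  # embedding of "can I get my money back?"

for name, vec in sorted(docs.items(), key=lambda kv: -cosine_similarity(query, kv[1])):
    print(f"{cosine_similarity(query, vec):.3f}  {name}")
```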

🚀 Next Step: Connect to Workflows

Your collection is now ready to power intelligent workflows!
