Overview
Knowledge base connectors allow you to automatically sync content from external sources into your AI agent’s knowledge base. Instead of manually uploading files, connectors can fetch and update content programmatically, ensuring your agents always have access to the latest information.
Available Connectors
Website Scraper Connector
The Website Scraper connector fetches and extracts content from web pages, making it easy to keep your agent informed about documentation, help articles, or any web-based content.
Use Cases
- Documentation Sites: Keep your agent updated with the latest product documentation
- Help Centers: Sync FAQ pages and support articles
- Blog Posts: Include recent blog content in your agent’s knowledge
- Company Pages: Pull content from About Us, Terms of Service, or other key pages
Configuration
To use the Website Scraper connector, you need to configure it with the following parameters:
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| url | string | The web page URL to scrape (must be HTTP or HTTPS) |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| timeout | integer | 30 | Request timeout in seconds |
| selectors | object | null | Custom CSS or tag selectors for targeted content extraction |
Credentials (Optional)
| Field | Type | Description |
|---|---|---|
| headers | object | Custom HTTP headers for authenticated requests (e.g., API keys, auth tokens) |
Basic Example
```json
{
  "connector_type": "website",
  "config": {
    "url": "https://docs.example.com/api-guide"
  },
  "credentials": {}
}
```
Advanced Example with Selectors
For more control over what content gets extracted, you can specify custom selectors:
```json
{
  "connector_type": "website",
  "config": {
    "url": "https://docs.example.com/api-guide",
    "timeout": 60,
    "selectors": {
      "Main Content": {
        "selector": "article.documentation",
        "type": "css"
      },
      "Code Examples": {
        "selector": "pre.code-block",
        "type": "css"
      },
      "Headers": {
        "selector": "h2",
        "type": "tag"
      }
    }
  },
  "credentials": {}
}
```
Authenticated Requests Example
If the website requires authentication or custom headers:
```json
{
  "connector_type": "website",
  "config": {
    "url": "https://internal-docs.example.com/guide"
  },
  "credentials": {
    "headers": {
      "Authorization": "Bearer your-api-token",
      "X-Custom-Header": "custom-value"
    }
  }
}
```
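When credentials are supplied, the connector sends these headers with its outgoing HTTP request. A minimal sketch of the idea in Python, using only the standard library (illustrative only; the connector's actual request handling may differ, and no request is actually sent here):

```python
from urllib.request import Request

# Hypothetical illustration: merge connector credentials into an HTTP request.
config = {"url": "https://internal-docs.example.com/guide"}
credentials = {
    "headers": {
        "Authorization": "Bearer your-api-token",
        "X-Custom-Header": "custom-value",
    }
}

# Build the request with the custom headers attached (no network call is made).
request = Request(config["url"], headers=credentials.get("headers", {}))

print(request.get_header("Authorization"))  # Bearer your-api-token
```

Header names and values are passed through as-is, so any scheme the target site accepts (bearer tokens, API keys, cookies) can be expressed this way.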
The Website Scraper automatically:
- Validates URLs: Only HTTP/HTTPS schemes are allowed, and private IPs/localhost are blocked for security
- Cleans Content: Removes script tags, styles, navigation, footers, and other non-content elements
- Formats Text: Extracts clean, readable text with proper line breaks
Default Extraction Strategy (when no selectors are provided):
- First tries to find an `<article>` tag
- Falls back to `<main>` or `<div class="content">`
- If neither exists, extracts all text from `<body>`
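The fallback chain above can be sketched with Python's standard-library HTML parser. This is a simplified illustration, not the connector's implementation: the `<div class="content">` fallback and the content-cleaning steps are omitted for brevity.

```python
from html.parser import HTMLParser

class ContentExtractor(HTMLParser):
    """Collect text found inside <article>, <main>, and <body> separately."""

    def __init__(self):
        super().__init__()
        self._stack = []
        self.text = {"article": [], "main": [], "body": []}

    def handle_starttag(self, tag, attrs):
        if tag in self.text:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        # Text is attributed to every tracked container currently open.
        for tag in self._stack:
            self.text[tag].append(data)

def extract_main_text(html: str) -> str:
    """Prefer <article>, then <main>, then fall back to all <body> text."""
    parser = ContentExtractor()
    parser.feed(html)
    for tag in ("article", "main", "body"):
        text = " ".join(part.strip() for part in parser.text[tag] if part.strip())
        if text:
            return text
    return ""
```

For example, `extract_main_text("<body><nav>Menu</nav><article>Guide text.</article></body>")` would prefer the article text over the navigation text.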
Custom Selectors (when provided):
- Extracts content matching each selector
- Supports both CSS selectors and HTML tag names
- Each section is labeled with the selector name
Security Features
The Website Scraper includes built-in protections against SSRF (Server-Side Request Forgery) attacks:
- Blocks requests to private IP ranges (10.x.x.x, 172.16.x.x, 192.168.x.x)
- Blocks localhost and loopback addresses
- Blocks link-local addresses (e.g., AWS metadata service at 169.254.169.254)
- Only allows HTTP and HTTPS protocols
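These checks amount to validating the URL before any request is made. A sketch of that kind of guard in Python (illustrative only: it checks IP literals and `localhost`, but omits the DNS-resolution step a production SSRF guard would also need, since a public hostname can resolve to a private address):

```python
import ipaddress
from urllib.parse import urlparse

def is_url_allowed(url: str) -> bool:
    """Reject URLs that could be used for SSRF (simplified sketch)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # only HTTP/HTTPS protocols are allowed
    host = parsed.hostname
    if host is None or host == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; a real guard would resolve the name and
        # re-check every resulting address. Omitted in this sketch.
        return True
    # Block private, loopback, and link-local ranges
    # (the last covers the AWS metadata service at 169.254.169.254).
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)

print(is_url_allowed("http://169.254.169.254/latest/meta-data"))  # False
```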
Limitations
- Minimum content length: 100 characters (pages yielding less extracted text will fail)
- Does not execute JavaScript (static HTML only)
- Cannot handle pages requiring complex authentication flows
- Cannot scrape content behind CAPTCHAs or bot protection
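The minimum-length rule is essentially a guard like the following sketch (the function name and error message are illustrative, not the connector's actual API):

```python
MIN_CONTENT_LENGTH = 100  # characters of extracted text required for a sync

def check_content_length(text: str) -> None:
    """Raise if the scraped text is too short to be useful."""
    cleaned = text.strip()
    if len(cleaned) < MIN_CONTENT_LENGTH:
        raise ValueError(
            f"Scraped content too short: {len(cleaned)} characters "
            f"(minimum {MIN_CONTENT_LENGTH})"
        )
```

A JavaScript-rendered page often fails this check because its static HTML contains little or no text, which is why the troubleshooting steps below suggest checking for JavaScript dependence.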
BigQuery Connector
Documentation for the BigQuery connector coming soon.
When to Use Connectors vs. File Uploads
| Scenario | Recommended Approach |
|---|---|
| Content changes frequently | Use connectors with scheduled syncing |
| Static documents (PDFs, docs) | Direct file upload |
| Web-based documentation | Website Scraper connector |
| Database queries | BigQuery connector |
| One-time knowledge addition | Direct file upload |
| Multiple related web pages | Website Scraper with multiple configurations |
Best Practices
- Start Simple: Begin with basic URL configuration, then add selectors if needed
- Test Selectors: Use browser dev tools to test CSS selectors before configuring
- Set Appropriate Timeouts: Increase timeout for slow-loading pages
- Monitor Content Length: Ensure scraped content meets the 100-character minimum
- Schedule Regular Syncs: Keep knowledge base fresh by scheduling periodic syncs
- Use Specific Selectors: Target main content areas to avoid extracting navigation and footers
Troubleshooting
“Scraped content appears empty or too short”
- Check if the URL is correct and publicly accessible
- Verify selectors are matching the expected elements
- Try removing custom selectors to use default extraction
- Check if the page requires JavaScript (not supported)
“URL validation failed”
- Ensure the URL uses HTTP or HTTPS
- Check that the URL doesn’t point to a private IP or localhost
- Verify the hostname can be resolved
“Network error while fetching URL”
- Increase the timeout value for slow-loading pages
- Check if the website requires authentication headers
- Verify the URL is accessible from your network
API Reference
For programmatic access to knowledge base management, see:
Next Steps
- Set up your first connector
- Schedule automated syncs
- Monitor sync status and errors
- Combine multiple connectors for comprehensive knowledge bases