
How It Works

Learn how Wryn's intelligent scraping platform works under the hood.

Architecture Overview

Wryn combines multiple technologies to provide reliable, scalable web scraping:

  1. REST API - Simple interface for making scrape requests
  2. Browser Pool - Managed headless Chrome instances
  3. Proxy Network - Residential and datacenter proxies worldwide
  4. AI Extraction - Machine learning models for data extraction
  5. Data Pipeline - Cleaning, validation, and formatting

Request Flow

1. API Request

You send a simple HTTP POST request:

curl -X POST https://api.wryn.io/v1/<end_point> \
  -H "x-api-key: wryn_live_1234567890abcdefghijklmnopqrstuvwxyz" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/",
    "action": "auto_listing",
    "engine": "stealth_mode",
    "timeout_ms": 45000,
    "retries": 2,
    "extract_main_content": true
  }'
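The same request can be built from Python with only the standard library. This is a minimal sketch: the endpoint path is left as the `<end_point>` placeholder from the curl example, and the helper name `build_scrape_request` is hypothetical.

```python
import json
import urllib.request

# Placeholder path copied from the curl example; substitute the real endpoint.
API_URL = "https://api.wryn.io/v1/<end_point>"

def build_scrape_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "url": "https://example.com/product/",
        "action": "auto_listing",
        "engine": "stealth_mode",
        "timeout_ms": 45000,
        "retries": 2,
        "extract_main_content": True,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=60)` sends it. Reading the key from an environment variable such as `WRYN_API_KEY` (a hypothetical name) keeps it out of source code.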

2. Smart Routing

Wryn analyzes the target website and selects settings for the request based on:

  • Website fingerprint - Domain, technology stack, anti-bot measures
  • Request priority - Urgent vs. batch processing
  • Resource allocation - Browser type, proxy location, concurrency
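To make the routing inputs concrete, here is a hypothetical decision function that maps a site fingerprint and a priority flag to resource choices. The field names and values (`"browser"`, `"http"`, `"priority"`, `"batch"`) are invented for illustration; they are not Wryn's `engine` options or its actual routing logic.

```python
from dataclasses import dataclass

@dataclass
class SiteProfile:
    """Hypothetical fingerprint of a target site."""
    domain: str
    renders_with_js: bool
    has_anti_bot: bool
    region: str  # e.g. "us", "de"

@dataclass
class RoutingDecision:
    engine: str
    proxy_type: str
    proxy_region: str
    queue: str

def route(site: SiteProfile, urgent: bool) -> RoutingDecision:
    # A full browser is only worth its cost when the page needs JavaScript
    # or is protected; plain HTTP fetches are cheaper otherwise.
    needs_browser = site.renders_with_js or site.has_anti_bot
    return RoutingDecision(
        engine="browser" if needs_browser else "http",
        # Residential IPs for protected sites, datacenter IPs otherwise.
        proxy_type="residential" if site.has_anti_bot else "datacenter",
        proxy_region=site.region,
        queue="priority" if urgent else "batch",
    )
```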

3. Browser Execution

A headless Chrome instance is allocated from our pool:

  • Unique fingerprint - Randomized browser profiles to avoid detection
  • Proxy selection - Residential proxy from target region
  • JavaScript rendering - Full page load with AJAX requests
  • Screenshot capture - Optional visual verification
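Allocating from a pool usually means borrowing an idle instance and guaranteeing it is returned. The sketch below shows that pattern with a thread-safe queue; the string handles stand in for real headless Chrome instances, and `BrowserPool` is a hypothetical name.

```python
import queue
from contextlib import contextmanager

class BrowserPool:
    """Fixed-size pool; the string handles stand in for browser instances."""

    def __init__(self, size: int):
        self._idle: queue.Queue = queue.Queue()
        for i in range(size):
            self._idle.put(f"browser-{i}")

    @contextmanager
    def acquire(self, timeout: float = 30.0):
        # Blocks until an instance is free; raises queue.Empty on timeout.
        browser = self._idle.get(timeout=timeout)
        try:
            yield browser
        finally:
            # Return the instance even if the scrape raised.
            self._idle.put(browser)

    def available(self) -> int:
        return self._idle.qsize()
```

Usage: `with pool.acquire() as browser: ...` releases the instance when the block exits, including on errors.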

4. Page Interaction

The browser navigates and interacts with the page:

  • Wait for elements - Smart waiting for dynamic content
  • Handle popups - Automatic cookie consent, modals, ads
  • Scroll & pagination - Auto-scroll for infinite scroll pages
  • Form filling - Login, search, filters (when configured)
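"Smart waiting" generally means polling for a condition with a deadline instead of sleeping for a fixed time. This generic sketch assumes a `check` callable that, in a real browser, would query the DOM (for example, look up a selector) and return `None` until the element exists.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def wait_for(check: Callable[[], Optional[T]], timeout: float = 10.0,
             interval: float = 0.25) -> T:
    """Poll `check` until it returns a non-None value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```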

5. Data Extraction

AI-powered extraction finds and structures your data:

  • Field detection - Automatically locate requested fields
  • Schema inference - Understand data types and relationships
  • Multi-page extraction - Follow links for complete datasets
  • Validation - Ensure data quality and completeness
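Schema inference can be illustrated with a simple rule-based type guesser. Wryn describes its extraction as AI-powered; this regex heuristic is a swapped-in technique that only shows what "understanding data types" produces, and the type names are hypothetical.

```python
import re
from datetime import date
from urllib.parse import urlparse

def infer_type(value: str) -> str:
    """Guess a coarse type for one raw extracted string (heuristic)."""
    v = value.strip()
    if re.fullmatch(r"-?\d+", v):
        return "integer"
    if re.fullmatch(r"-?\d+\.\d+", v):
        return "number"
    if re.fullmatch(r"[$€£]\s?\d[\d,]*(\.\d+)?", v):
        return "price"
    try:
        date.fromisoformat(v)  # ISO 8601 dates such as 2025-12-06
        return "date"
    except ValueError:
        pass
    parsed = urlparse(v)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "string"

def infer_schema(record: dict) -> dict:
    """Map each field of one extracted record to its inferred type."""
    return {field: infer_type(str(value)) for field, value in record.items()}
```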

6. Response Delivery

Cleaned data is returned in your preferred format:

{
  "status": "success",
  "data": {
    "title": "Premium Wireless Headphones",
    "price": "$299.99",
    "description": "High-quality noise-canceling headphones..."
  },
  "metadata": {
    "scraped_at": "2025-12-06T10:30:00Z",
    "response_time": 2.4
  }
}
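A client can unpack a response like this one as follows. This sketch assumes that any `status` other than `"success"` means the scrape failed, and it leaves `response_time` untouched because its unit is not stated here.

```python
import json
from datetime import datetime

def parse_scrape_response(body: str) -> tuple[dict, datetime]:
    """Return (data, scraped_at) or raise if the scrape did not succeed."""
    response = json.loads(body)
    if response.get("status") != "success":
        raise RuntimeError(f"scrape failed with status {response.get('status')!r}")
    # Before Python 3.11, fromisoformat rejects a trailing "Z", so normalize it.
    scraped_at = datetime.fromisoformat(
        response["metadata"]["scraped_at"].replace("Z", "+00:00"))
    return response["data"], scraped_at
```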

Anti-Bot Bypass

Wryn uses multiple techniques to bypass anti-bot protection:

Residential Proxies

  • 10M+ residential IPs across 195 countries
  • Automatic rotation on detection
  • Geographic targeting for regional content
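Rotation on detection can be sketched as a round-robin over proxies that advances whenever a response looks blocked. Treating HTTP 403 and 429 as "detected" is an assumption, as are the names `ProxyRotator` and `fetch_with_rotation`; `fetch` is any callable returning `(status, body)`.

```python
import itertools

BLOCK_STATUSES = {403, 429}  # assumption: these responses signal detection

class ProxyRotator:
    """Round-robin over a proxy list."""

    def __init__(self, proxies: list[str]):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._cycle = itertools.cycle(proxies)
        self.current = next(self._cycle)

    def rotate(self) -> str:
        self.current = next(self._cycle)
        return self.current

def fetch_with_rotation(fetch, url: str, rotator: ProxyRotator, max_attempts: int = 3):
    """Call fetch(url, proxy); switch to the next proxy on a block status."""
    for _ in range(max_attempts):
        status, body = fetch(url, rotator.current)
        if status not in BLOCK_STATUSES:
            return status, body
        rotator.rotate()
    raise RuntimeError(f"still blocked after {max_attempts} attempts")
```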

Browser Fingerprinting

  • Randomized user agents, screen resolutions, languages
  • Canvas, WebGL, and audio fingerprint randomization
  • Realistic mouse movements and timing
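A randomized profile and human-like timing can be sketched as below. The tiny value pools are illustrative only; real fingerprints also need to be internally consistent (an operating system in the user agent that matches the screen size and fonts, for example), which this sketch does not attempt.

```python
import random
from dataclasses import dataclass

# Illustrative pools only; not the values Wryn uses.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]
SCREENS = [(1920, 1080), (1536, 864), (1440, 900)]
LANGUAGES = ["en-US", "en-GB", "de-DE"]

@dataclass(frozen=True)
class BrowserProfile:
    user_agent: str
    screen: tuple[int, int]
    language: str

def random_profile(rng: random.Random) -> BrowserProfile:
    return BrowserProfile(rng.choice(USER_AGENTS), rng.choice(SCREENS),
                          rng.choice(LANGUAGES))

def human_delays(rng: random.Random, n: int, mean: float = 0.35,
                 jitter: float = 0.15) -> list[float]:
    # Gaussian jitter around a mean, floored so no delay is below 50 ms.
    return [max(0.05, rng.gauss(mean, jitter)) for _ in range(n)]
```

Passing an explicit `random.Random` makes a given profile reproducible, which helps when debugging a blocked session.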

CAPTCHA Solving

  • Automatic CAPTCHA detection
  • Integration with solving services
  • Smart retry strategies
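CAPTCHA detection can start with a markup check for the widget classes and script hosts that common CAPTCHA providers embed. This only illustrates detection; it does not show how Wryn solves CAPTCHAs or which services it integrates with.

```python
from typing import Optional

# Class names and script hosts that popular CAPTCHA widgets place in the page.
CAPTCHA_MARKERS = {
    "recaptcha": ("g-recaptcha", "www.google.com/recaptcha"),
    "hcaptcha": ("h-captcha", "hcaptcha.com"),
    "turnstile": ("cf-turnstile", "challenges.cloudflare.com"),
}

def detect_captcha(html: str) -> Optional[str]:
    """Return the CAPTCHA kind found in the HTML, or None."""
    lowered = html.lower()
    for kind, markers in CAPTCHA_MARKERS.items():
        if any(marker in lowered for marker in markers):
            return kind
    return None
```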

Rate Limit Handling

  • Automatic request throttling
  • Distributed request timing
  • Session reuse for efficiency
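A token bucket is one common way to implement request throttling: it allows short bursts while capping the average rate. The page does not say which algorithm Wryn uses, so treat this as an illustration of the idea; the injectable `clock` exists only to make the behavior testable.

```python
import time

class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```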

JavaScript Rendering

Modern websites rely heavily on JavaScript. Wryn handles this automatically:

Single Page Applications (SPAs)

  • React, Vue, Angular applications
  • Wait for AJAX requests to complete
  • Handle dynamic routing

Infinite Scroll

  • Auto-scroll to load more content
  • Detect end of content
  • Capture all items
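End-of-content detection for infinite scroll can be approximated by scrolling until the item count stops growing for a few consecutive attempts. In this sketch, `scroll` and `count_items` are assumed callables standing in for browser actions such as scrolling the window and counting item elements.

```python
from typing import Callable

def scroll_to_end(scroll: Callable[[], None], count_items: Callable[[], int],
                  max_scrolls: int = 50, patience: int = 2) -> int:
    """Scroll until the item count stops growing for `patience` scrolls in a row."""
    seen = count_items()
    stalled = 0
    for _ in range(max_scrolls):
        scroll()
        now = count_items()
        if now > seen:
            seen, stalled = now, 0
        else:
            stalled += 1
            if stalled >= patience:
                break  # repeated no-growth is treated as end of content
    return seen
```

`patience` trades speed for robustness: a value above 1 tolerates one slow page load before giving up, and `max_scrolls` bounds truly endless feeds.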

Dynamic Content

  • Observe DOM mutations
  • Wait for elements to appear
  • Handle lazy-loaded images
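In a live browser, scrolling usually triggers lazy images to load their real `src`. For static HTML, a fallback is to read the attributes lazy-loading scripts keep the real URL in. The attribute list below reflects conventions used by some such scripts; it is illustrative, not exhaustive, and not taken from Wryn.

```python
from html.parser import HTMLParser

class LazyImageCollector(HTMLParser):
    """Collect image URLs, preferring lazy-load attributes over a placeholder src."""

    LAZY_ATTRS = ("data-src", "data-lazy-src", "data-original")

    def __init__(self):
        super().__init__()
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        url = next((a[k] for k in self.LAZY_ATTRS if a.get(k)), a.get("src"))
        if url:
            self.images.append(url)
```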

Error Handling & Retries

Wryn automatically handles failures:

Automatic Retries

  • Network errors: 3 retries with exponential backoff
  • Rate limits: Intelligent throttling and retry
  • Blocked requests: Proxy rotation and retry
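The network-error policy (3 retries with exponential backoff) can be sketched as follows. The base delay, the cap, and the "full jitter" randomization are assumptions, since the page gives only the retry count; `TransientError` stands in for whatever errors are considered retryable.

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a network error that is worth retrying."""

def with_retries(op, retries: int = 3, base_delay: float = 0.5, max_delay: float = 8.0,
                 sleep=time.sleep, rng=random.random):
    """Call `op`, retrying TransientError up to `retries` times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return op()
        except TransientError:
            if attempt == retries:
                raise
            # Delay ceiling doubles per attempt (0.5s, 1s, 2s, ...), capped at max_delay;
            # multiplying by a random factor spreads retries from many clients apart.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * rng())
```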

Failure Classification

  • Temporary - Network issues, rate limits (auto-retry)
  • Permanent - Invalid URL, content not found (return error)
  • Partial - Some data extracted (return with warnings)
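The three categories can be expressed as a small classifier. The mapping from HTTP status codes, and the choice to treat zero extracted fields as permanent, are assumptions made for illustration; `None` as a status represents a network error with no response.

```python
from enum import Enum
from typing import Optional

class Failure(Enum):
    TEMPORARY = "temporary"  # auto-retry
    PERMANENT = "permanent"  # return error
    PARTIAL = "partial"      # return data with warnings

def classify(status_code: Optional[int], fields_found: int,
             fields_requested: int) -> Optional[Failure]:
    """Map one scrape outcome to a category; None means full success."""
    if status_code is None or status_code in (403, 429) or status_code >= 500:
        return Failure.TEMPORARY  # network error, block, rate limit, server error
    if status_code in (400, 404, 410):
        return Failure.PERMANENT  # bad request or content not found
    if fields_found < fields_requested:
        return Failure.PARTIAL if fields_found > 0 else Failure.PERMANENT
    return None
```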

Webhook Notifications

Get notified of scrape completion or errors:

{
  "event": "scrape.completed",
  "scrape_id": "scr_abc123",
  "status": "success",
  "url": "https://example.com/product/123"
}
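On the receiving side, a webhook handler parses the JSON body and dispatches on `event`. This sketch handles only `scrape.completed`, the one event name shown above, and logs anything else; it does not verify the sender, since webhook authentication is not described on this page. The function name is hypothetical.

```python
import json

def handle_webhook(raw_body: bytes) -> str:
    """Process one webhook payload and return a short log line."""
    event = json.loads(raw_body)
    if event.get("event") == "scrape.completed" and event.get("status") == "success":
        # A real handler might enqueue a job to fetch results for this scrape_id.
        return f"done {event['scrape_id']} {event['url']}"
    # Other event names are not documented here, so keep them for inspection.
    return f"unhandled {event.get('event')} {event.get('scrape_id')}"
```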

Data Quality

Wryn validates, cleans, and enriches extracted data before returning it:

Validation

  • Type checking (numbers, dates, URLs)
  • Required field verification
  • Format normalization
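Type checking, required-field verification, and format normalization can be sketched against a small schema. The schema below (field names, types, and which fields are required) is hypothetical, and `normalize_price` assumes `.` is the decimal separator.

```python
from datetime import date
from urllib.parse import urlparse

# Hypothetical schema: field name -> (expected type, required?)
SCHEMA = {
    "title": (str, True),
    "price": (float, True),
    "released": (date, False),
    "url": ("url", False),
}

def validate(record: dict, schema: dict = SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    errors = []
    for field, (kind, required) in schema.items():
        value = record.get(field)
        if value is None:
            if required:
                errors.append(f"{field}: missing")
            continue
        if kind == "url":
            ok = isinstance(value, str) and urlparse(value).scheme in ("http", "https")
        else:
            ok = isinstance(value, kind)
        if not ok:
            name = kind if isinstance(kind, str) else kind.__name__
            errors.append(f"{field}: expected {name}")
    return errors

def normalize_price(raw: str) -> float:
    """'$1,299.99' -> 1299.99"""
    return float(raw.replace(",", "").strip().lstrip("$€£").strip())
```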

Cleaning

  • HTML tag removal
  • Whitespace normalization
  • Character encoding fixes
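A basic text-cleaning pass might look like this. The regex tag stripper is deliberately naive (an HTML parser is more robust), and Unicode NFKC normalization covers only part of "encoding fixes": it folds non-breaking spaces and ligatures but does not repair mis-decoded bytes.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")

def clean_text(raw: str) -> str:
    text = TAG_RE.sub(" ", raw)                  # remove HTML tags
    text = html.unescape(text)                   # &amp; -> &, &nbsp; -> U+00A0
    text = unicodedata.normalize("NFKC", text)   # U+00A0 -> space, ligatures -> letters
    return WS_RE.sub(" ", text).strip()          # collapse runs of whitespace
```

Replacing tags with a space keeps words from being glued together across block elements such as `</p><p>`, at the cost of splitting a word whose letters are wrapped in separate inline tags.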

Enrichment

  • Currency conversion
  • Date parsing and timezone handling
  • Image URL resolution
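Two of these enrichment steps map directly onto the standard library: relative image URLs resolve against the page they were found on, and timestamps convert to UTC. Currency conversion is omitted because it needs a live exchange-rate source. Treating naive timestamps as UTC by default is an assumption.

```python
from datetime import datetime, timezone
from urllib.parse import urljoin

def resolve_image(page_url: str, src: str) -> str:
    """Turn a relative or protocol-relative image src into an absolute URL."""
    return urljoin(page_url, src)

def to_utc(timestamp: str, assume: timezone = timezone.utc) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to UTC."""
    dt = datetime.fromisoformat(timestamp)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=assume)  # naive input: caller states its zone
    return dt.astimezone(timezone.utc)
```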

Scalability

Wryn scales automatically with request volume:

Request Queueing

  • Priority queue for urgent requests
  • Batch processing for large jobs
  • Automatic load balancing
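A priority queue that serves urgent requests first while keeping first-in, first-out order within each priority can be built on `heapq`. The two-level priority scheme and class name are hypothetical.

```python
import heapq
import itertools

class RequestQueue:
    """Urgent requests first; FIFO within the same priority."""

    URGENT, BATCH = 0, 1

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserves insertion order

    def push(self, url: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

The sequence counter also prevents the heap from ever comparing two URLs directly, so equal-priority entries never depend on string ordering.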

Parallel Processing

  • Concurrent browser instances
  • Distributed across regions
  • Smart resource allocation
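On the client side, sending many scrape requests in parallel while capping how many are in flight is straightforward with a thread pool. `scrape_one` is an assumed callable (for example, one that builds and sends a single API request), and the default limit of 8 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

R = TypeVar("R")

def scrape_all(urls: Iterable[str], scrape_one: Callable[[str], R],
               max_concurrency: int = 8) -> List[R]:
    """Run scrape_one over urls with at most max_concurrency calls in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # map() returns results in input order, regardless of completion order.
        return list(pool.map(scrape_one, urls))
```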

Caching

  • Intelligent caching for repeated requests
  • Configurable TTL
  • Cache invalidation options
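Caching with a configurable TTL and explicit invalidation can be sketched as a dictionary of `(value, expiry)` pairs keyed by URL. This illustrates the concept only; it is not how Wryn's cache is implemented, and the injectable `clock` exists for testing.

```python
import time

class TTLCache:
    """Cache results per URL for `ttl` seconds."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}

    def get(self, url):
        entry = self._store.get(url)
        if entry is None:
            return None
        value, expires = entry
        if self.clock() >= expires:
            del self._store[url]  # expired entries are evicted lazily on read
            return None
        return value

    def set(self, url, value):
        self._store[url] = (value, self.clock() + self.ttl)

    def invalidate(self, url):
        self._store.pop(url, None)
```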

Security

Your data and credentials are protected:

Encryption

  • TLS 1.3 for all API requests
  • Encrypted storage for sensitive data
  • Secure credential management
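Clients can additionally refuse anything older than TLS 1.3 when calling the API. This is optional client-side hardening, shown with Python's `ssl` module; the default context already verifies certificates and hostnames.

```python
import ssl

def tls13_context() -> ssl.SSLContext:
    ctx = ssl.create_default_context()            # certificate + hostname checks on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # reject TLS 1.2 and older
    return ctx
```

Pass it as `urllib.request.urlopen(req, context=tls13_context())`.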

Privacy

  • No data retention beyond processing
  • GDPR compliant
  • Credential isolation per account

Authentication

  • API key authentication
  • IP whitelisting (optional)
  • Request signing (enterprise)
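Request signing schemes commonly use an HMAC over a timestamp and the request body, with the timestamp bounding replay. Wryn's enterprise scheme is not described on this page, so everything below is hypothetical: the signed string format and the `X-Signature` / `X-Signature-Timestamp` header names are invented for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional

def _mac(secret: bytes, ts: str, body: bytes) -> str:
    # Hypothetical scheme: HMAC-SHA256 over "<timestamp>.<body>".
    return hmac.new(secret, ts.encode() + b"." + body, hashlib.sha256).hexdigest()

def sign_request(secret: bytes, body: bytes, timestamp: Optional[int] = None) -> dict:
    ts = str(int(time.time()) if timestamp is None else timestamp)
    # Header names are invented for this example.
    return {"X-Signature-Timestamp": ts, "X-Signature": _mac(secret, ts, body)}

def verify(secret: bytes, body: bytes, headers: dict, now: int,
           tolerance: int = 300) -> bool:
    ts = headers["X-Signature-Timestamp"]
    if abs(now - int(ts)) > tolerance:
        return False  # reject stale or replayed requests
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(_mac(secret, ts, body), headers["X-Signature"])
```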

Next Steps