Scraping Basics

Essential concepts and best practices for web scraping with Wryn.

Understanding Web Scraping

Web scraping extracts data from websites by:

  1. Fetching HTML content
  2. Parsing the structure
  3. Extracting specific data points
  4. Formatting the output

Wryn automates this entire process.
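The four steps above can be sketched with only the standard library. Fetching is stubbed with a hardcoded page so the sketch runs offline; the class and field names here are illustrative, not part of Wryn's API.

```python
from html.parser import HTMLParser
import json

# Step 1 (fetching) is stubbed; in practice an HTTP client retrieves the page.
html_doc = '<h1 class="product-title">Widget</h1><span class="price-value">$9.99</span>'

class FieldExtractor(HTMLParser):
    """Steps 2-3: parse the structure and extract tagged fields."""
    def __init__(self, targets):
        super().__init__()
        self.targets = targets   # {(tag, class): field_name}
        self.current = None
        self.data = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        self.current = self.targets.get((tag, cls))

    def handle_data(self, data):
        if self.current:
            self.data[self.current] = data.strip()
            self.current = None

parser = FieldExtractor({("h1", "product-title"): "title",
                         ("span", "price-value"): "price"})
parser.feed(html_doc)

# Step 4: format the output.
print(json.dumps(parser.data))
```

Wryn collapses all of this into a single `client.scrape` call.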

Key Concepts

URLs

The target webpage address:

https://example.com/products/item-123

Fields

Data points you want to extract:

fields = ["title", "price", "description"]

Selectors

CSS selectors targeting specific elements:

selectors = {
    "title": "h1.product-title",
    "price": "span.price-value"
}
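For intuition, a simple `tag.class` selector like the ones above decomposes into a tag name and a class name. The helper below is a toy illustration, not Wryn's selector engine, which supports the full CSS grammar:

```python
def split_selector(selector: str):
    # Split "tag.class" into its two parts; a bare tag yields an empty class.
    tag, _, cls = selector.partition(".")
    return tag, cls

print(split_selector("h1.product-title"))
```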

Best Practices

1. Start Simple

Begin with a basic request before adding selectors or pagination:

result = client.scrape(url="https://example.com")

2. Inspect the Page

Use browser DevTools to identify:

  • Element selectors
  • Dynamic content
  • Required wait times

3. Handle Errors

Always implement error handling:

try:
    result = client.scrape(url)
except Exception as e:
    log_error(e)
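Transient failures often succeed on a second attempt, so retries with backoff are a common extension. A hedged sketch, where `scrape` is a stand-in that fails twice before succeeding; substitute the real client call:

```python
import time

attempts = {"n": 0}

def scrape(url):
    # Stand-in for client.scrape: simulates two transient failures.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return {"url": url, "status": "ok"}

def scrape_with_retry(url, retries=3, delay=0.01):
    for attempt in range(1, retries + 1):
        try:
            return scrape(url)
        except Exception:
            if attempt == retries:
                raise            # give up after the last attempt
            time.sleep(delay * attempt)   # back off between tries

result = scrape_with_retry("https://example.com")
print(result["status"])
```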

4. Respect Rate Limits

Add delays between requests:

import time
time.sleep(1) # 1 second delay
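A fixed `sleep` works, but enforcing a minimum gap between calls is more robust when requests arrive in bursts. A sketch, with `polite_scrape` as a hypothetical wrapper and a short interval so it runs quickly (use around one second in practice):

```python
import time

MIN_INTERVAL = 0.05   # seconds between requests (illustrative; ~1s in practice)
_last_call = 0.0

def polite_scrape(url):
    global _last_call
    # Sleep only for whatever remains of the minimum interval.
    wait = MIN_INTERVAL - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()
    return url   # placeholder for the real request

start = time.monotonic()
for u in ["https://example.com/a", "https://example.com/b", "https://example.com/c"]:
    polite_scrape(u)
elapsed = time.monotonic() - start
```

Three calls must span at least two full intervals, however fast the loop itself runs.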

5. Test Thoroughly

Verify on multiple pages before scaling.
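One concrete check before scaling: confirm that every sampled page yields all required fields. The results below are canned stand-ins for real `client.scrape` calls:

```python
REQUIRED = {"title", "price"}

# Stand-ins for scrape results from a handful of sample pages.
sample_results = {
    "https://store.com/product/1": {"title": "A", "price": "$1"},
    "https://store.com/product/2": {"title": "B", "price": "$2"},
    "https://store.com/product/3": {"title": "C"},   # missing price
}

# Flag any page whose result lacks a required field.
failures = [url for url, data in sample_results.items()
            if not REQUIRED <= data.keys()]
print(failures)
```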

Common Patterns

Product Pages

product = client.scrape(
    url="https://store.com/product/123",
    fields=["title", "price", "rating", "reviews"]
)

List Pages

items = client.scrape(
    url="https://store.com/category",
    list_item={
        "selector": "div.product-card",
        "fields": {...}
    }
)
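Conceptually, list scraping matches every element that fits the item selector and extracts fields from each. In this sketch a regex stands in for real selector matching on a tiny fixed page (acceptable for illustration, not for production HTML):

```python
import re

html_doc = """
<div class="product-card"><h2>Alpha</h2></div>
<div class="product-card"><h2>Beta</h2></div>
"""

# Find every "product card" and pull out its heading text.
cards = re.findall(r'<div class="product-card"><h2>(.*?)</h2></div>', html_doc)
print(cards)
```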

Pagination

results = client.scrape(
    url="https://store.com/products?page=1",
    pagination={
        "type": "next_button",
        "selector": "a.next-page"
    }
)
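Under the hood, `next_button` pagination follows the next-page link until none remains. A sketch with faked pages in a dict standing in for fetched HTML:

```python
# Each fake page has its items plus the link the "a.next-page" selector
# would resolve; None marks the last page.
pages = {
    "page=1": {"items": ["a", "b"], "next": "page=2"},
    "page=2": {"items": ["c"], "next": None},
}

def crawl(start):
    results, url = [], start
    while url:
        page = pages[url]             # stands in for fetch + parse
        results.extend(page["items"])
        url = page["next"]            # follow the next-page link
    return results

all_items = crawl("page=1")
print(all_items)
```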

Next Steps