Scraping Basics
Essential concepts and best practices for web scraping with Wryn.
Understanding Web Scraping
Web scraping extracts data from websites by:
- Fetching HTML content
- Parsing the structure
- Extracting specific data points
- Formatting the output
Wryn automates this entire process.
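To see what Wryn is automating, the four steps above can be sketched with Python's standard library alone. This is a minimal illustration, not how Wryn works internally; the HTML string stands in for a fetched page.

```python
from html.parser import HTMLParser

# Step 1 (fetch) is simulated: in practice this HTML would come
# from an HTTP request to the target URL.
HTML = '<html><body><h1 class="title">Widget</h1><span class="price">9.99</span></body></html>'

class FieldExtractor(HTMLParser):
    """Steps 2-3: parse the structure and extract specific data points."""
    def __init__(self):
        super().__init__()
        self._current = None
        self.data = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in ("title", "price"):
            self._current = cls

    def handle_data(self, data):
        if self._current:
            self.data[self._current] = data
            self._current = None

parser = FieldExtractor()
parser.feed(HTML)
print(parser.data)  # step 4: output as a dict, e.g. {'title': 'Widget', 'price': '9.99'}
```

Even this toy version shows why a dedicated client helps: real pages need robust fetching, selector matching, and error handling on top of the parsing.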
Key Concepts
URLs
The target webpage address:
https://example.com/products/item-123
Fields
Data points you want to extract:
fields = ["title", "price", "description"]
Selectors
CSS selectors targeting specific elements:
selectors = {
    "title": "h1.product-title",
    "price": "span.price-value"
}
Best Practices
1. Start Simple
Begin with basic requests:
result = client.scrape(url="https://example.com")
2. Inspect the Page
Use browser DevTools to identify:
- Element selectors
- Dynamic content
- Required wait times
3. Handle Errors
Always implement error handling:
try:
    result = client.scrape(url)
except Exception as e:
    log_error(e)
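Beyond logging, transient network failures often deserve a retry. The sketch below is an illustrative pattern, not part of Wryn's API: `retry_scrape`, the attempt count, and the backoff schedule are all assumptions, and `scrape_fn` stands in for `client.scrape`.

```python
import time

def retry_scrape(scrape_fn, url, attempts=3, base_delay=1.0):
    """Retry a scrape with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return scrape_fn(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)

# Demo with a fake scraper that fails once, then succeeds.
calls = {"n": 0}
def flaky(url):
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("temporary failure")
    return {"url": url, "ok": True}

print(retry_scrape(flaky, "https://example.com", base_delay=0.01))
```

Exponential backoff keeps retries from hammering a struggling server, which complements the rate-limit advice below.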
4. Respect Rate Limits
Add delays between requests:
import time
time.sleep(1) # 1 second delay
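For batches of URLs, the delay can be wrapped into a small throttle so no request fires sooner than the chosen interval. This is a minimal sketch; the generator and its name are illustrative, not a Wryn feature.

```python
import time

def throttled(urls, delay=1.0):
    """Yield URLs no faster than one per `delay` seconds."""
    last = 0.0
    for url in urls:
        wait = delay - (time.monotonic() - last)
        if wait > 0:
            time.sleep(wait)
        last = time.monotonic()
        yield url

start = time.monotonic()
for url in throttled(["https://example.com/a", "https://example.com/b"], delay=0.2):
    pass  # a scrape call for each url would go here
elapsed = time.monotonic() - start  # at least one full delay for the second URL
```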
5. Test Thoroughly
Verify on multiple pages before scaling.
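A spot-check like the following helps catch selector drift before scaling. The required fields, sample URLs, and faked results are all illustrative; in practice the results would come from scrape calls against a handful of representative pages.

```python
REQUIRED_FIELDS = {"title", "price"}

def validate(result, required=REQUIRED_FIELDS):
    """Return the required fields that are missing or empty in a result."""
    return [f for f in required if not result.get(f)]

# Faked scrape results standing in for real calls on sample pages
sample_results = {
    "https://store.com/product/1": {"title": "Widget", "price": "9.99"},
    "https://store.com/product/2": {"title": "Gadget", "price": ""},
}

problems = {url: validate(r) for url, r in sample_results.items() if validate(r)}
print(problems)  # pages whose selectors need fixing before scaling up
```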
Common Patterns
Product Pages
product = client.scrape(
    url="https://store.com/product/123",
    fields=["title", "price", "rating", "reviews"]
)
List Pages
items = client.scrape(
    url="https://store.com/category",
    list_item={
        "selector": "div.product-card",
        "fields": {...}
    }
)
Pagination
results = client.scrape(
    url="https://store.com/products?page=1",
    pagination={
        "type": "next_button",
        "selector": "a.next-page"
    }
)