E-commerce Scraping
Extract product data, pricing, reviews, and inventory from any e-commerce website.
Use Cases
Price Monitoring
Track competitor prices across multiple retailers to stay competitive.
Inventory Tracking
Monitor stock levels and availability in real-time.
Product Research
Analyze product catalogs, specifications, and variations.
Review Analysis
Collect customer reviews and ratings for sentiment analysis.
Market Intelligence
Track trends, bestsellers, and category performance.
Common Data Points
Typical e-commerce data you can extract:
| Field | Description | Example |
|---|---|---|
| title | Product name | "Apple iPhone 15 Pro Max" |
| price | Current price | "$1,199.00" |
| original_price | Price before discount | "$1,299.00" |
| currency | Price currency | "USD" |
| availability | Stock status | "In Stock" |
| sku | Product SKU | "IPHONE15PM-256-TI" |
| brand | Brand name | "Apple" |
| rating | Average rating | 4.8 |
| reviews_count | Number of reviews | 1523 |
| images | Product images | Array of URLs |
| description | Full description | "The most advanced..." |
| features | Key features | Array of strings |
| specifications | Tech specs | Object |
| variants | Size/color options | Array |
| shipping | Shipping info | "Free shipping" |
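For orientation, a scraped product record containing these fields looks roughly like the dictionary below. This is a hypothetical payload with invented values; the fields actually returned depend on the target page and on what you request.

# Hypothetical example of extracted product data (values are illustrative only)
product_data = {
    "title": "Apple iPhone 15 Pro Max",
    "price": "$1,199.00",
    "original_price": "$1,299.00",
    "currency": "USD",
    "availability": "In Stock",
    "rating": 4.8,
    "reviews_count": 1523,
    "images": ["https://example.com/img/front.jpg", "https://example.com/img/back.jpg"],
    "variants": [{"size": "256GB", "color": "Natural Titanium", "price": "$1,199.00"}],
}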
Example: Product Details
Want to extract listings and their fields without analyzing the website template?
Use Auto Extract to pull a listing and its fields without knowing the site's schema or template. Because the extraction is not tied to specific selectors, it won't break when the website template changes.
result = client.auto_listing(url="https://www.amazon.com/s?k=iphone+17+pro+max", engine=Engine.STEALTH_MODE)
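The exact shape of the auto_listing response is not shown above; assuming it exposes the detected items as a list of dictionaries on the result data (an assumption, not a documented contract), iterating over them might look like this:

# Assumes auto-extracted items are returned as a list of dicts.
# Key names are auto-detected from the page, so guard lookups with .get().
for item in result.data.get("items", []):
    print(item.get("title"), "-", item.get("price"))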
Extracting specific fields if needed
from wrynai import WrynAI, Engine
client = WrynAI(api_key="your_api_key")
# Scrape product page
product = client.scrape(
    url="https://amazon.com/dp/B0CHXDXZ42",
    fields=[
        "title",
        "price",
        "rating",
        "reviews_count",
        "images",
        "description",
        "features",
        "availability"
    ],
    options={
        "render_js": True,
        "wait_for": "#productTitle"
    }
)
print(f"Product: {product.data['title']}")
print(f"Price: {product.data['price']}")
print(f"Rating: {product.data['rating']}/5 ({product.data['reviews_count']} reviews)")
print(f"In stock: {product.data['availability']}")
Example: Search Results
Scrape product listings from search or category pages:
# Scrape search results page
results = client.scrape(
    url="https://ebay.com/sch/i.html?_nkw=wireless+headphones",
    list_item={
        "selector": "li.s-item",
        "fields": {
            "title": "h3.s-item__title",
            "price": "span.s-item__price",
            "url": "a.s-item__link@href",
            "image": "img.s-item__image-img@src",
            "shipping": "span.s-item__shipping"
        }
    },
    pagination={
        "type": "next_button",
        "selector": "a.pagination__next",
        "max_pages": 5
    }
)

for item in results.data['items']:
    print(f"{item['title']}: {item['price']}")
Example: Price Monitoring
Track prices over time:
import schedule
import time
from datetime import datetime
def monitor_price():
    product = client.scrape(
        url="https://target.com/p/product/-/A-12345",
        fields=["title", "price", "availability"]
    )

    # Save to database
    save_price_history({
        "product_id": "12345",
        "title": product.data['title'],
        "price": product.data['price'],
        "available": product.data['availability'],
        "timestamp": datetime.now()
    })

    # Alert if price drops
    if price_dropped(product.data['price']):
        send_alert(f"Price drop! {product.data['title']} now {product.data['price']}")

# Run every hour
schedule.every(1).hours.do(monitor_price)

while True:
    schedule.run_pending()
    time.sleep(60)
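The monitor above calls save_price_history and price_dropped without defining them (send_alert is left as a placeholder). One possible sketch of those two helpers, using SQLite for storage with a hypothetical schema and drop threshold:

import sqlite3

DB_PATH = "prices.db"  # hypothetical local database file

def save_price_history(record):
    # Append one observation to a simple price_history table.
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS price_history "
            "(product_id TEXT, title TEXT, price TEXT, available TEXT, timestamp TEXT)"
        )
        conn.execute(
            "INSERT INTO price_history VALUES (?, ?, ?, ?, ?)",
            (record["product_id"], record["title"], record["price"],
             record["available"], record["timestamp"].isoformat()),
        )

def price_dropped(current_price_str, threshold=0.95):
    # Compare against the observation recorded before the one just saved.
    current = float(current_price_str.replace("$", "").replace(",", ""))
    with sqlite3.connect(DB_PATH) as conn:
        row = conn.execute(
            "SELECT price FROM price_history ORDER BY timestamp DESC LIMIT 1 OFFSET 1"
        ).fetchone()
    if row is None:
        return False
    previous = float(row[0].replace("$", "").replace(",", ""))
    return current <= previous * threshold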
Example: Review Scraping
Collect and analyze customer reviews:
# Scrape reviews
reviews = client.scrape(
    url="https://amazon.com/product-reviews/B0CHXDXZ42",
    list_item={
        "selector": "div[data-hook='review']",
        "fields": {
            "rating": "i[data-hook='review-star-rating']",
            "title": "a[data-hook='review-title']",
            "author": "span.a-profile-name",
            "date": "span[data-hook='review-date']",
            "verified": "span[data-hook='avp-badge']",
            "text": "span[data-hook='review-body']",
            "helpful_count": "span[data-hook='helpful-vote-statement']"
        }
    },
    pagination={
        "type": "next_button",
        "selector": "li.a-last a",
        "max_pages": 10
    }
)
# Analyze sentiment
positive_reviews = [r for r in reviews.data['items'] if float(r['rating']) >= 4]
print(f"Positive reviews: {len(positive_reviews)}/{len(reviews.data['items'])}")
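For downstream analysis it is often convenient to persist the reviews. A minimal sketch using Python's built-in csv module, assuming each scraped item carries the fields configured above:

import csv

# Write the scraped reviews to a CSV file for later analysis.
fieldnames = ["rating", "title", "author", "date", "verified", "text", "helpful_count"]
with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(reviews.data["items"])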
Platform-Specific Examples
Amazon
amazon_product = client.scrape(
    url="https://amazon.com/dp/B0CHXDXZ42",
    fields=[
        "title",
        "price",
        "rating",
        "reviews_count",
        "prime_eligible",
        "delivery_date",
        "seller",
        "buybox_winner"
    ],
    options={
        "country": "US",  # Target specific marketplace
        "render_js": True
    }
)
Shopify Stores
shopify_product = client.scrape(
    url="https://store.com/products/example",
    fields=[
        "title",
        "price",
        "compare_at_price",
        "variants",
        "images",
        "description",
        "vendor"
    ],
    options={
        "wait_for": "product-json"
    }
)
Etsy
etsy_listing = client.scrape(
    url="https://etsy.com/listing/12345/handmade-item",
    fields=[
        "title",
        "price",
        "quantity_available",
        "favorites",
        "shop_name",
        "materials",
        "shipping_info",
        "reviews"
    ]
)
Walmart
walmart_product = client.scrape(
    url="https://walmart.com/ip/12345",
    fields=[
        "title",
        "price",
        "was_price",
        "savings",
        "rating",
        "pickup_available",
        "delivery_available",
        "seller_name"
    ]
)
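With per-platform results in hand, you can normalize prices and compare the same product across retailers. A small sketch reusing the amazon_product and walmart_product results above, assuming both return a price string:

def to_float(price_str):
    # Strip currency symbols and thousands separators before converting.
    return float(price_str.replace("$", "").replace(",", ""))

prices = {
    "amazon": to_float(amazon_product.data["price"]),
    "walmart": to_float(walmart_product.data["price"]),
}
cheapest = min(prices, key=prices.get)
print(f"Cheapest retailer: {cheapest} at ${prices[cheapest]:.2f}")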
Bulk Product Scraping
Scrape multiple products efficiently:
# List of product URLs
product_urls = [
    "https://example.com/product-1",
    "https://example.com/product-2",
    "https://example.com/product-3",
    # ... hundreds more
]

# Batch scrape
results = client.scrape_batch(
    requests=[
        {"url": url, "fields": ["title", "price", "rating"]}
        for url in product_urls
    ],
    options={
        "async": True,
        "webhook_url": "https://your-app.com/webhook"
    }
)
# Or use pagination for category pages
all_products = client.scrape(
    url="https://example.com/category/electronics",
    list_item={
        "selector": "div.product-card",
        "fields": {
            "title": "h3.product-title",
            "price": "span.price",
            "url": "a.product-link@href"
        }
    },
    pagination={
        "type": "infinite_scroll",
        "max_items": 1000
    }
)
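The async batch call above delivers results to webhook_url instead of returning them inline. The webhook payload format is not documented in this section, so treat the following Flask receiver as a sketch under an assumed structure:

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_results():
    # Payload structure is an assumption; adjust to the actual webhook format.
    payload = request.get_json(force=True)
    for result in payload.get("results", []):
        print(result.get("url"), result.get("data", {}).get("price"))
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)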
Handling Variants
Extract size, color, and other product variants:
product = client.scrape(
    url="https://example.com/t-shirt",
    fields=[
        "title",
        "base_price",
        "variants"
    ],
    options={
        "extract_variants": True
    }
)

# Result includes all variants
for variant in product.data['variants']:
    print(f"{variant['size']} / {variant['color']}: {variant['price']}")
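A common follow-up is picking the cheapest variant that is still purchasable. Assuming each variant dict carries a price string and an availability flag (the 'available' key below is hypothetical), one way to do it:

# Filter to in-stock variants (the 'available' key is an assumption), then sort by price.
in_stock = [v for v in product.data["variants"] if v.get("available", True)]
if in_stock:
    cheapest = min(in_stock, key=lambda v: float(str(v["price"]).replace("$", "").replace(",", "")))
    print(f"Cheapest option: {cheapest['size']} / {cheapest['color']} at {cheapest['price']}")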
Competitor Analysis
import os

from wrynai import WrynAI, CountryCode, WrynAIError

def competitor_analysis_example():
    """Competitor analysis example (PRO feature)."""
    api_key = os.environ.get("WRYNAI_API_KEY", "your-api-key-here")

    with WrynAI(api_key=api_key) as client:
        print("=" * 60)
        print("Competitor Analysis Example (PRO)")
        print("=" * 60)

        try:
            result = client.competitor_analysis(
                keywords=[
                    "web scraping api",
                    "data extraction service",
                    "serp api",
                ],
                competitors=[
                    "scraperapi.com",
                    "scrapingbee.com",
                    "brightdata.com",
                ],
                country_code=CountryCode.US,
                language="en",
                timeout_ms=120000,
            )

            # Analysis Summary
            print("\nAnalysis Summary:")
            keywords_analyzed = result.analysis_summary.get("keywords_analyzed", 0)
            total_competitors = result.analysis_summary.get(
                "total_competitors_found", 0
            )
            keywords_failed = result.analysis_summary.get("keywords_failed", 0)
            print(f" Keywords Analyzed: {keywords_analyzed}")
            print(f" Competitors Found: {total_competitors}")
            print(f" Keywords Failed: {keywords_failed}")
            print()

            # Keyword Analysis
            print("Keyword Analysis:")
            for keyword, analysis in result.keyword_analysis.items():
                print(f"\n '{keyword}':")
                print(
                    f" Competitors in Top 10: {analysis.competitor_count_in_top_10}"
                )
                print(f" SERP Features: {analysis.serp_features}")
                if analysis.competitor_rankings:
                    print(" Competitor Rankings:")
                    for domain, ranking in analysis.competitor_rankings.items():
                        print(f" - {domain}: Position {ranking.position}")

            # Competitor Insights
            if result.competitor_insights.get("top_competitors"):
                print("\n\nTop Competitor Insights:")
                for competitor in result.competitor_insights["top_competitors"]:
                    domain = competitor.get("domain", "Unknown")
                    avg_pos = competitor.get("average_position", "N/A")
                    total_keywords = competitor.get("total_keywords_ranking", 0)
                    top_10_rate = competitor.get("top_10_rate", 0)
                    perf_score = competitor.get("performance_score", 0)
                    print(f"\n {domain}:")
                    print(f" Average Position: {avg_pos}")
                    print(f" Keywords Ranking: {total_keywords}")
                    print(f" Top 10 Rate: {top_10_rate}%")
                    print(f" Performance Score: {perf_score}")

            # Ranking Opportunities
            if result.competitor_insights.get("ranking_opportunities"):
                print("\n\nRanking Opportunities:")
                for opp in result.competitor_insights["ranking_opportunities"]:
                    print(f" Keyword: {opp.get('keyword', 'Unknown')}")
                    print(f" Opportunity: {opp.get('opportunity', 'N/A')}")
                    print(
                        f" Potential Gain: {opp.get('potential_gain', 0)} positions"
                    )

        except WrynAIError as e:
            print(f"Competitor analysis failed: {e}")
Best Practices
1. Respect Rate Limits
E-commerce sites often have strict rate limits:
import time
for url in product_urls:
    result = client.scrape(url)
    time.sleep(2)  # 2 second delay
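A fixed delay works for small jobs; for larger runs you may want retries with exponential backoff when a request fails. This is a generic Python sketch, not a built-in client feature:

import random
import time

def scrape_with_backoff(url, max_retries=4):
    # Retry with exponentially growing, jittered delays between attempts.
    for attempt in range(max_retries):
        try:
            return client.scrape(url=url, fields=["title", "price"])
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)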
2. Use Realistic User Agents
Mimic real browsers:
result = client.scrape(
    url="https://example.com/product",
    options={
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
        "headers": {
            "Accept-Language": "en-US,en;q=0.9"
        }
    }
)
3. Handle Price Formatting
Normalize prices for analysis:
def parse_price(price_str):
    # Remove currency symbols and commas
    price = price_str.replace('$', '').replace(',', '')
    return float(price)

price = parse_price(product.data['price'])  # "$1,299.99" -> 1299.99
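The parser above assumes US-style dollar prices. A slightly more defensive variant, which strips any currency symbol and tolerates missing values (still assuming '.' as the decimal separator), might look like this:

import re

def parse_price_safe(price_str):
    # Keep digits, separators, and sign; return None when nothing numeric is found.
    if not price_str:
        return None
    cleaned = re.sub(r"[^\d.,-]", "", price_str).replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None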
4. Monitor for Changes
Track when product pages change:
result = client.scrape(
    url="https://example.com/product",
    options={
        "webhook_url": "https://your-app.com/webhook",
        "monitor": True,
        "check_interval": "1h"
    }
)
5. Handle Out of Stock
Check availability before processing:
product = client.scrape(url="https://example.com/product")

if "out of stock" in product.data.get('availability', '').lower():
    print("Product unavailable")
else:
    process_product(product.data)
Legal Considerations
Always review the target website's Terms of Service and robots.txt before scraping. Some websites explicitly prohibit automated access. Wryn provides the technology, but you are responsible for compliance.
Best Practices:
- Check robots.txt files
- Review Terms of Service
- Respect noindex meta tags
- Add reasonable delays
- Use data responsibly
Next Steps
- API Reference - Complete endpoint documentation
- Guides - Pagination - Handle multi-page scraping
- Integrations - Connect to your tools
- Pricing - Scale your scraping operations
Need Help?
- FAQ - Common questions
- Support - Contact our team
- Email: support@wryn.io