E-commerce Scraping

Extract product data, pricing, reviews, and inventory from any e-commerce website.

Use Cases

Price Monitoring

Track competitor prices across multiple retailers to stay competitive.

Inventory Tracking

Monitor stock levels and availability in real-time.

Product Research

Analyze product catalogs, specifications, and variations.

Review Analysis

Collect customer reviews and ratings for sentiment analysis.

Market Intelligence

Track trends, bestsellers, and category performance.

Common Data Points

Typical e-commerce data you can extract:

Field             Description           Example
title             Product name          "Apple iPhone 15 Pro Max"
price             Current price         "$1,199.00"
original_price    Before discount       "$1,299.00"
currency          Price currency        "USD"
availability      Stock status          "In Stock"
sku               Product SKU           "IPHONE15PM-256-TI"
brand             Brand name            "Apple"
rating            Average rating        4.8
reviews_count     Number of reviews     1523
images            Product images        Array of URLs
description       Full description      "The most advanced..."
features          Key features          Array of strings
specifications    Tech specs            Object
variants          Size/color options    Array
shipping          Shipping info         "Free shipping"
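
As a concrete illustration, a scraped product record built from these fields might look like the Python dict below (values are made up, and the exact shape depends on which fields you request):

product_data = {
    "title": "Apple iPhone 15 Pro Max",
    "price": "$1,199.00",
    "original_price": "$1,299.00",
    "currency": "USD",
    "availability": "In Stock",
    "sku": "IPHONE15PM-256-TI",
    "brand": "Apple",
    "rating": 4.8,
    "reviews_count": 1523,
    "images": ["https://example.com/img-front.jpg", "https://example.com/img-back.jpg"],
    "features": ["A17 Pro chip", "Titanium design"],
    "variants": [{"size": "256GB", "color": "Natural Titanium"}],
}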

Example: Product Details

PRO Usage

Want to extract listings and their fields without analyzing the website template?

Use Auto Extract to extract a listing and its fields without knowing the website's schema or template. It won't break when the website template changes.


result = client.auto_listing(url="https://www.amazon.com/s?k=iphone+17+pro+max", engine=Engine.STEALTH_MODE)
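
Assuming the Auto Extract result exposes items in the same shape as the scrape results elsewhere in this guide (an assumption; check the response schema), you could consume it like this:

for item in result.data['items']:
    # Field names are inferred automatically by Auto Extract
    print(item.get('title'), item.get('price'))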

Extracting fields if needed

from wrynai import WrynAI, Engine

client = WrynAI(api_key="your_api_key")

# Scrape product page
product = client.scrape(
    url="https://amazon.com/dp/B0CHXDXZ42",
    fields=[
        "title",
        "price",
        "rating",
        "reviews_count",
        "images",
        "description",
        "features",
        "availability"
    ],
    options={
        "render_js": True,
        "wait_for": "#productTitle"
    }
)

print(f"Product: {product.data['title']}")
print(f"Price: {product.data['price']}")
print(f"Rating: {product.data['rating']}/5 ({product.data['reviews_count']} reviews)")
print(f"In stock: {product.data['availability']}")

Example: Search Results

Scrape product listings from search or category pages:

# Scrape search results page
results = client.scrape(
    url="https://ebay.com/sch/i.html?_nkw=wireless+headphones",
    list_item={
        "selector": "li.s-item",
        "fields": {
            "title": "h3.s-item__title",
            "price": "span.s-item__price",
            "url": "a.s-item__link@href",
            "image": "img.s-item__image-img@src",
            "shipping": "span.s-item__shipping"
        }
    },
    pagination={
        "type": "next_button",
        "selector": "a.pagination__next",
        "max_pages": 5
    }
)

for item in results.data['items']:
    print(f"{item['title']}: {item['price']}")

Example: Price Monitoring

Track prices over time:

import schedule
import time
from datetime import datetime

def monitor_price():
    product = client.scrape(
        url="https://target.com/p/product/-/A-12345",
        fields=["title", "price", "availability"]
    )

    # Save to database
    save_price_history({
        "product_id": "12345",
        "title": product.data['title'],
        "price": product.data['price'],
        "available": product.data['availability'],
        "timestamp": datetime.now()
    })

    # Alert if price drops
    if price_dropped(product.data['price']):
        send_alert(f"Price drop! {product.data['title']} now {product.data['price']}")

# Run every hour
schedule.every(1).hours.do(monitor_price)

while True:
    schedule.run_pending()
    time.sleep(60)
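
The example assumes save_price_history, price_dropped, and send_alert already exist. A minimal sketch of the first two, assuming SQLite for storage (these helpers mirror the example above; they are not part of the WrynAI SDK):

import sqlite3

def save_price_history(record):
    # Append one observation to a local SQLite table
    conn = sqlite3.connect("prices.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS price_history "
        "(product_id TEXT, title TEXT, price TEXT, available TEXT, ts TEXT)"
    )
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?, ?, ?)",
        (record["product_id"], record["title"], record["price"],
         record["available"], record["timestamp"].isoformat()),
    )
    conn.commit()
    conn.close()

def price_dropped(price_str):
    # Compare against the previous observation; OFFSET 1 skips the
    # row just written by save_price_history
    conn = sqlite3.connect("prices.db")
    row = conn.execute(
        "SELECT price FROM price_history ORDER BY ts DESC LIMIT 1 OFFSET 1"
    ).fetchone()
    conn.close()
    if row is None:
        return False
    to_float = lambda s: float(s.replace("$", "").replace(",", ""))
    return to_float(price_str) < to_float(row[0])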

Example: Review Scraping

Collect and analyze customer reviews:

# Scrape reviews
reviews = client.scrape(
    url="https://amazon.com/product-reviews/B0CHXDXZ42",
    list_item={
        "selector": "div[data-hook='review']",
        "fields": {
            "rating": "i[data-hook='review-star-rating']",
            "title": "a[data-hook='review-title']",
            "author": "span.a-profile-name",
            "date": "span[data-hook='review-date']",
            "verified": "span[data-hook='avp-badge']",
            "text": "span[data-hook='review-body']",
            "helpful_count": "span[data-hook='helpful-vote-statement']"
        }
    },
    pagination={
        "type": "next_button",
        "selector": "li.a-last a",
        "max_pages": 10
    }
)

# Analyze sentiment (star-rating text is typically "4.0 out of 5 stars")
positive_reviews = [r for r in reviews.data['items'] if float(r['rating'].split()[0]) >= 4]
print(f"Positive reviews: {len(positive_reviews)}/{len(reviews.data['items'])}")

Platform-Specific Examples

Amazon

amazon_product = client.scrape(
    url="https://amazon.com/dp/B0CHXDXZ42",
    fields=[
        "title",
        "price",
        "rating",
        "reviews_count",
        "prime_eligible",
        "delivery_date",
        "seller",
        "buybox_winner"
    ],
    options={
        "country": "US",  # Target specific marketplace
        "render_js": True
    }
)

Shopify Stores

shopify_product = client.scrape(
    url="https://store.com/products/example",
    fields=[
        "title",
        "price",
        "compare_at_price",
        "variants",
        "images",
        "description",
        "vendor"
    ],
    options={
        "wait_for": "product-json"
    }
)

Etsy

etsy_listing = client.scrape(
    url="https://etsy.com/listing/12345/handmade-item",
    fields=[
        "title",
        "price",
        "quantity_available",
        "favorites",
        "shop_name",
        "materials",
        "shipping_info",
        "reviews"
    ]
)

Walmart

walmart_product = client.scrape(
    url="https://walmart.com/ip/12345",
    fields=[
        "title",
        "price",
        "was_price",
        "savings",
        "rating",
        "pickup_available",
        "delivery_available",
        "seller_name"
    ]
)

Bulk Product Scraping

Scrape multiple products efficiently:

# List of product URLs
product_urls = [
    "https://example.com/product-1",
    "https://example.com/product-2",
    "https://example.com/product-3",
    # ... hundreds more
]

# Batch scrape
results = client.scrape_batch(
    requests=[
        {"url": url, "fields": ["title", "price", "rating"]}
        for url in product_urls
    ],
    options={
        "async": True,
        "webhook_url": "https://your-app.com/webhook"
    }
)

# Or use pagination for category pages
all_products = client.scrape(
    url="https://example.com/category/electronics",
    list_item={
        "selector": "div.product-card",
        "fields": {
            "title": "h3.product-title",
            "price": "span.price",
            "url": "a.product-link@href"
        }
    },
    pagination={
        "type": "infinite_scroll",
        "max_items": 1000
    }
)
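
With "async": True, batch results are delivered to your webhook_url instead of being returned inline. A minimal receiver sketch, using Flask and an assumed payload shape (check the webhook documentation for the real schema):

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_batch_results():
    # Payload shape is an assumption: a list of per-URL results
    payload = request.get_json()
    for item in payload.get("results", []):
        print(item.get("url"), item.get("data", {}).get("price"))
    return "", 204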

Handling Variants

Extract size, color, and other product variants:

product = client.scrape(
    url="https://example.com/t-shirt",
    fields=[
        "title",
        "base_price",
        "variants"
    ],
    options={
        "extract_variants": True
    }
)

# Result includes all variants
for variant in product.data['variants']:
    print(f"{variant['size']} / {variant['color']}: {variant['price']}")

Competitor Analysis

A full PRO example that compares competitor rankings across a set of keywords:

import os

# Import path for CountryCode and WrynAIError assumed to match the
# WrynAI SDK used in the examples above
from wrynai import WrynAI, CountryCode, WrynAIError

def competitor_analysis_example():
    """Competitor analysis example (PRO feature)."""
    api_key = os.environ.get("WRYNAI_API_KEY", "your-api-key-here")

    with WrynAI(api_key=api_key) as client:
        print("=" * 60)
        print("Competitor Analysis Example (PRO)")
        print("=" * 60)

        try:
            result = client.competitor_analysis(
                keywords=[
                    "web scraping api",
                    "data extraction service",
                    "serp api",
                ],
                competitors=[
                    "scraperapi.com",
                    "scrapingbee.com",
                    "brightdata.com",
                ],
                country_code=CountryCode.US,
                language="en",
                timeout_ms=120000,
            )

            # Analysis summary
            print("\nAnalysis Summary:")
            keywords_analyzed = result.analysis_summary.get("keywords_analyzed", 0)
            total_competitors = result.analysis_summary.get(
                "total_competitors_found", 0
            )
            keywords_failed = result.analysis_summary.get("keywords_failed", 0)
            print(f"  Keywords Analyzed: {keywords_analyzed}")
            print(f"  Competitors Found: {total_competitors}")
            print(f"  Keywords Failed: {keywords_failed}")
            print()

            # Per-keyword analysis
            print("Keyword Analysis:")
            for keyword, analysis in result.keyword_analysis.items():
                print(f"\n  '{keyword}':")
                print(
                    f"    Competitors in Top 10: {analysis.competitor_count_in_top_10}"
                )
                print(f"    SERP Features: {analysis.serp_features}")

                if analysis.competitor_rankings:
                    print("    Competitor Rankings:")
                    for domain, ranking in analysis.competitor_rankings.items():
                        print(f"      - {domain}: Position {ranking.position}")

            # Competitor insights
            if result.competitor_insights.get("top_competitors"):
                print("\n\nTop Competitor Insights:")
                for competitor in result.competitor_insights["top_competitors"]:
                    domain = competitor.get("domain", "Unknown")
                    avg_pos = competitor.get("average_position", "N/A")
                    total_keywords = competitor.get("total_keywords_ranking", 0)
                    top_10_rate = competitor.get("top_10_rate", 0)
                    perf_score = competitor.get("performance_score", 0)

                    print(f"\n  {domain}:")
                    print(f"    Average Position: {avg_pos}")
                    print(f"    Keywords Ranking: {total_keywords}")
                    print(f"    Top 10 Rate: {top_10_rate}%")
                    print(f"    Performance Score: {perf_score}")

            # Ranking opportunities
            if result.competitor_insights.get("ranking_opportunities"):
                print("\n\nRanking Opportunities:")
                for opp in result.competitor_insights["ranking_opportunities"]:
                    print(f"  Keyword: {opp.get('keyword', 'Unknown')}")
                    print(f"  Opportunity: {opp.get('opportunity', 'N/A')}")
                    print(
                        f"  Potential Gain: {opp.get('potential_gain', 0)} positions"
                    )

        except WrynAIError as e:
            print(f"Competitor analysis failed: {e}")

Best Practices

1. Respect Rate Limits

E-commerce sites often have strict rate limits:

import time

for url in product_urls:
    result = client.scrape(url)
    time.sleep(2)  # 2 second delay
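
For stricter sites, a fixed delay may not be enough. A retry-with-exponential-backoff sketch, catching WrynAIError (the error type used elsewhere in these docs; narrow it to a rate-limit-specific exception if the SDK provides one):

import time

def scrape_with_backoff(client, url, max_retries=5):
    delay = 2
    for attempt in range(max_retries):
        try:
            return client.scrape(url)
        except WrynAIError:
            # Back off exponentially: 2s, 4s, 8s, ...
            time.sleep(delay)
            delay *= 2
    raise RuntimeError(f"Giving up on {url} after {max_retries} attempts")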

2. Use Realistic User Agents

Mimic real browsers:

result = client.scrape(
    url="https://example.com/product",
    options={
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
        "headers": {
            "Accept-Language": "en-US,en;q=0.9"
        }
    }
)

3. Handle Price Formatting

Normalize prices for analysis:

def parse_price(price_str):
    # Remove currency symbols and commas
    price = price_str.replace('$', '').replace(',', '')
    return float(price)

price = parse_price(product.data['price'])  # "$1,299.99" -> 1299.99
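
The simple version assumes US-style dollar prices. A slightly more defensive variant (still assuming "." as the decimal separator) pulls the first number-like token regardless of currency symbol or surrounding text:

import re
from decimal import Decimal

def parse_price_robust(price_str):
    # Matches e.g. "1,299.99" inside "From $1,299.99 + tax"
    match = re.search(r'[\d,]+(?:\.\d+)?', price_str)
    if not match:
        raise ValueError(f"No price found in {price_str!r}")
    return Decimal(match.group().replace(',', ''))

parse_price_robust("From $1,299.99 + tax")  # Decimal('1299.99')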

4. Monitor for Changes

Track when product pages change:

result = client.scrape(
    url="https://example.com/product",
    options={
        "webhook_url": "https://your-app.com/webhook",
        "monitor": True,
        "check_interval": "1h"
    }
)

5. Handle Out of Stock

Check availability before processing:

product = client.scrape(url="https://example.com/product")

if "out of stock" in product.data.get('availability', '').lower():
    print("Product unavailable")
else:
    process_product(product.data)

Legal Notice

Always review the target website's Terms of Service and robots.txt before scraping. Some websites explicitly prohibit automated access. Wryn provides the technology, but you are responsible for compliance.

Best Practices:

  • Check robots.txt files (see the sketch after this list)
  • Review Terms of Service
  • Respect noindex meta tags
  • Add reasonable delays
  • Use data responsibly
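
A quick way to check robots.txt programmatically, using only the Python standard library:

import urllib.robotparser
from urllib.parse import urlparse

def is_allowed(url, user_agent="*"):
    # Fetch and parse the site's robots.txt, then check this URL
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch(user_agent, url)

if is_allowed("https://example.com/product"):
    result = client.scrape(url="https://example.com/product")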
