Job Description
We are looking for a Python automation engineer to build a fully automated data pipeline that gathers AI company data from multiple sources (APIs + web scraping), deduplicates it intelligently, and outputs clean structured data to Airtable or Notion on a weekly schedule. You must have proven experience building production-grade scrapers, not basic scripts.

Required:
- Strong Python (Scrapy, BeautifulSoup, requests)
- API integrations (REST, authenticated APIs)
- Experience automating recurring pipelines (cron jobs, scheduled tasks, etc.)
- Data cleaning, deduplication logic, CSV/JSON handling
- Ability to write clean, well-structured code

Nice to have (not required):
- Selenium or Playwright
- Experience with the Airtable/Notion API
- Experience with LLMs for data enrichment

Deliverables:
- Scrapers for multiple AI-related sources (APIs + websites)
- Deduplication + merging logic across sources (see the rough sketch after this list)
- Weekly automated update pipeline
- Output to Airtable/Notion in structured columns
- Clear documentation so we can maintain it long-term
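To make the deduplication deliverable concrete, here is a minimal, illustrative sketch of one way records from different sources could be merged on a normalized company domain. It is not a required design, and the field names used (name, website, source) are placeholders, not part of our schema:

# Illustrative sketch only (Python 3.9+). Assumes each source yields dicts
# with placeholder fields "name", "website", and "source".
from urllib.parse import urlparse


def normalize_key(record: dict) -> str:
    """Build a dedup key from the company's domain, falling back to its name."""
    website = record.get("website") or ""
    domain = urlparse(website if "://" in website else f"https://{website}").netloc
    domain = domain.removeprefix("www.").lower()
    return domain or record.get("name", "").strip().lower()


def merge_records(records: list[dict]) -> list[dict]:
    """Collapse records that share a key, keeping the first non-empty value
    for each field and tracking every source a company was seen in."""
    merged: dict[str, dict] = {}
    for record in records:
        key = normalize_key(record)
        if not key:
            continue
        existing = merged.setdefault(key, {"sources": set()})
        for field, value in record.items():
            if field == "source":
                existing["sources"].add(value)
            elif value and not existing.get(field):
                existing[field] = value
    # Convert source sets to sorted lists for CSV/JSON/Airtable output.
    return [{**r, "sources": sorted(r["sources"])} for r in merged.values()]


if __name__ == "__main__":
    sample = [
        {"name": "Acme AI", "website": "https://www.acme.ai", "source": "crunchbase_api"},
        {"name": "ACME AI Inc.", "website": "acme.ai", "funding": "$10M", "source": "scraper"},
    ]
    print(merge_records(sample))

Keying on a normalized domain rather than the raw company name avoids splitting one company across spelling variants; your actual matching approach can differ, as long as it is documented.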
This project should take 2–3 weeks to build, with optional monthly maintenance.
If you’ve built multi-source scrapers before, please apply with examples.