Data Collection Engineer
Your Role: Data Collection Engineer
As a Data Collection Engineer, you'll play a critical role in acquiring and structuring high-value external data that powers our core products. Your work will fuel our knowledge graph of millions of entities and directly support our mission to deliver transparency and insight into complex global networks.
You’ll work closely with engineering, research, and product teams to identify new data sources, develop reliable pipelines to gather, ingest, and structure that data, and continuously improve our ability to scale and adapt. You'll have ownership over how information flows into our platform — from design and architecture to reliability and performance — and help shape the systems that underpin our next generation of features and products.
What you'll do
- Design and implement systems to collect, extract, and normalize external data from a variety of sources.
- Collaborate with researchers and analysts to identify new sources of valuable company data and define integration strategies.
- Build robust, scalable pipelines that ingest structured and semi-structured data into our database.
- Ensure high levels of accuracy, coverage, and freshness across incoming data streams.
- Contribute to the evolution of our data platform and internal tooling.
- Improve system reliability, observability, and performance over time.
Who you are
- 3+ years of experience as a backend or full-stack software engineer, ideally working with data ingestion or ETL systems.
- Intimate knowledge of how to crawl the internet at scale.
- Strong programming skills, especially in Python.
- Experience working with structured and unstructured data from diverse external systems.
- Comfortable debugging complex issues involving networking, content rendering, or inconsistent source data.
- Proficient with SQL and relational databases.
- A clear communicator who collaborates effectively with both technical and non-technical teammates.
- Passionate about turning raw data into meaningful insight, and eager to work on technically nuanced challenges.
Ideally you'll have
- Familiarity with headless browser automation or techniques for collecting data from dynamic content sources.
- Expertise in the architecture, technologies, and tools that run the modern internet, such as DNS, networking, CDNs, WAFs, proxies, and reverse proxies.
- Experience with event-driven architecture.
- Eagerness to incorporate new technologies and validate their usefulness using structured experiments and thorough testing.
- Experience building health monitoring and observability tools for consumption by automated tools, engineers, and non-technical stakeholders.