Data Collection Engineer
Your Role: Data Collection Engineer
As a Data Collection Engineer, you'll play a critical role in acquiring and structuring high-value external data that powers our core products. Your work will fuel our knowledge graph of millions of entities and directly support our mission to deliver transparency and insight into complex global networks.
You’ll work closely with engineering, research, and product teams to identify new data sources, develop reliable pipelines to gather, ingest, and structure that data, and continuously improve our ability to scale and adapt. You'll have ownership over how information flows into our platform — from design and architecture to reliability and performance — and help shape the systems that underpin our next generation of features and products.
What you'll do
- Design and implement systems to collect, extract, and normalize external data from a variety of sources.
- Collaborate with researchers and analysts to identify new sources of valuable company data and define integration strategies.
- Build robust, scalable pipelines that ingest structured and semi-structured data into our database.
- Ensure high levels of accuracy, coverage, and freshness across incoming data streams.
- Contribute to the evolution of our data platform and internal tooling.
- Improve system reliability, observability, and performance over time.
Who you are
- 3+ years of experience as a backend or full-stack software engineer, ideally working with data ingestion or ETL systems.
- Intimate knowledge of how to crawl the internet at scale.
- Strong programming skills, especially in Python.
- Experience working with structured and unstructured data from diverse external systems.
- Comfortable debugging complex issues involving networking, content rendering, or inconsistent source data.
- Proficient with SQL and relational databases.
- A clear communicator who collaborates effectively with both technical and non-technical teammates.
- Passionate about turning raw data into meaningful insight, and eager to work on technically nuanced challenges.
Ideally you'll have
- Familiarity with headless browser automation or techniques for collecting data from dynamic content sources.
- Expertise in the architecture, technologies, and tools that run the modern internet, such as DNS, networking, CDNs, WAFs, proxies, and reverse proxies.
- Experience with event-driven architecture.
- Eagerness to incorporate new technologies and validate their usefulness using structured experiments and thorough testing.
- Experience building health monitoring and observability tools for consumption by automated tools, engineers, and non-technical stakeholders.