Senior Software Engineer, Text and Data Quality
Backend Engineer with experience in data pipelines and algorithm design.
About this role
At Littlebird, we're building systems that understand the world through text. The quality of our data is the foundation of our intelligence. We're looking for a pragmatic, algorithm-focused engineer to take on the critical challenge of cleaning and refining our raw text data at scale. This isn't a standard data-plumbing role. You will design and build the core systems that transform noisy, semi-structured text into clean, coherent documents. This involves tackling complex problems such as:
- Content De-duplication: Architecting systems to identify and merge near-duplicate or overlapping text content using techniques such as shingling, MinHash, and other similarity algorithms.
- Signal & Noise Separation: Developing robust methods to strip non-essential content (e.g., UI boilerplate, ads, navigation) from raw inputs, using a combination of heuristics, pattern matching, and lightweight models.
- Text Normalization: Creating and optimizing high-performance pipelines that clean and structure text for downstream consumption by our core product and ML models.
The ideal candidate is a strong backend engineer, fluent in Python, who enjoys reasoning from first principles and has a deep appreciation for the craft of writing efficient, well-tested code. You should be comfortable designing algorithms, managing data pipelines with caching (Redis) and asynchronous processing, and making pragmatic trade-offs to solve ambiguous, real-world data problems.
If you are passionate about the foundational challenge of creating pristine data from messy inputs, let's chat.