Job Description

About Centific
Centific is a frontier AI data foundry that curates diverse, high-quality data, using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We harness the power of an integrated solution ecosystem (comprising industry-leading partnerships and 1.8 million vertical domain experts in more than 230 markets) to create contextual, multilingual, pre-trained datasets; fine-tuned, industry-specific LLMs; and RAG pipelines supported by vector databases. Our zero-distance innovation™ solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.
Our mission is to bridge the gap between AI creators and industry leaders by bringing best practices in GenAI to unicorn innovators and enterprise customers. We aim to help these organizations unlock significant business value by deploying GenAI at scale, helping to ensure they stay at the forefront of technological advancement and maintain a competitive edge in their respective markets.

About Job
AI Research Engineer: Vision AI / VLM / Physical AI
Company: Centific
Location: Seattle, WA (or Remote)
Type: Full-time

Build the Future of Perception & Embodied Intelligence
Are you pushing the frontier of computer vision, multimodal large models, and embodied/physical AI, and have the publications to show it? Join us to translate cutting-edge research into production systems that perceive, reason, and act in the real world.

The Mission
We are building state-of-the-art Vision AI across 2D/3D perception, egocentric/360° understanding, and multimodal reasoning. As an AI Research Engineer, you will own high-leverage experiments from paper → prototype → deployable module in our platform.
We are seeking passionate engineers to join our cutting-edge labs. You could be part of:
The Computer Vision team, where as a Research Engineer you'll dive into the world of 3D reconstruction, scene understanding, and visual AI. You'll explore innovative techniques like those used to transform real-world spaces into immersive 3D models (such as the 3D Reconstruction projects) and work with cutting-edge architectures like VGG-T (Visual Geometry Grounded Transformers), known for advancing deep learning in vision tasks. This role is perfect for those excited to develop AI systems that interpret, reconstruct, and interact with the visual world, using state-of-the-art tools and methodologies.
The Physical AI Robotics team, where you'll work at the intersection of simulation, robotics, and AI. You'll leverage NVIDIA's Omniverse for advanced 3D simulation and collaboration, Isaac Sim for robotics training and testing, and GR00T for foundation models in robotics. Experience with Holoscan SDK for real-time medical and industrial robotics pipelines, Newton Physics for dynamic simulation, and NVIDIA's NERD for neural robot dynamics will be a plus. This role is ideal for those eager to push the boundaries of AI-driven robotics using state-of-the-art tools and frameworks.

What You'll Do
Advance Visual Perception: Build and fine-tune models for detection, tracking, segmentation (2D/3D), pose & activity recognition, and scene understanding (incl. 360° and multi-view).
Multimodal Reasoning with VLMs: Train/evaluate vision-language models (VLMs) for grounding, dense captioning, temporal QA, and tool use; design retrieval-augmented and agentic loops for perception-action tasks.
Physical AI & Embodiment: Prototype perception-in-the-loop policies that close the gap from pixels to actions (simulation + real data). Integrate with planners and task graphs for manipulation, navigation, or safety workflows.
Data & Evaluation at Scale: Curate datasets, author high-signal evaluation protocols/KPIs, and run ablations that make irreproducible results impossible.
Systems & Deployment: Package research into reliable services on a modern stack (Kubernetes, Docker, Ray, FastAPI), with profiling, telemetry, and CI for reproducible science.
Agentic Workflows: Orchestrate multi-agent pipelines (e.g., LangGraph-style graphs) that combine perception, reasoning, simulation, and code generation to self-check and self-correct.

Example Problems You Might Tackle
Long-horizon video understanding (events, activities, causality) from egocentric or 360° video.
3D scene grounding: linking language queries to objects, affordances, and trajectories.
Fast, privacy-preserving perception for on-device or edge inference.
Robust multimodal evaluation: temporal consistency, open-set detection, uncertainty.
Vision-conditioned policy evaluation in sim (Isaac/MuJoCo) with sim2real stress tests.

Minimum Qualifications
Master's/Ph.D. in CS/EE/Robotics (or related), actively publishing in CV/ML/Robotics (e.g., CVPR/ICCV/ECCV, NeurIPS/ICML/ICLR, CoRL/RSS).
Strong PyTorch (or JAX) and Python; comfort with CUDA profiling and mixed-precision training.
Demonstrated research in computer vision and at least one of: VLMs (e.g., LLaVA-style, video-language models), embodied/physical AI, 3D perception.
Proven ability to move from paper → code → ablation → result with rigorous experiment tracking.

Preferred Qualifications
Experience with video models (e.g., TimeSFormer/MViT/VideoMAE), diffusion or 3D GS/NeRF pipelines, or SLAM/scene reconstruction.
Prior work on multimodal grounding (referring expressions, spatial language, affordances) or temporal reasoning.
Familiarity with ROS2, DeepStream/TAO, or edge inference optimizations (TensorRT, ONNX).
Scalable training: Ray, distributed data loaders, sharded checkpoints.
Strong software craft: testing, linting, profiling, containers, and reproducibility.
Public code artifacts (GitHub) and first-author publications or strong open-source impact.

Our Stack (you'll touch a subset)
Modeling: PyTorch, torchvision/lightning, Hugging Face, OpenMMLab, xFormers
Perception: YOLO/Detectron/MMDet, SAM/Mask2Former, CLIP-style backbones, optical flow
VLM / LLM: Vision encoders + LLMs, RAG for video, toolformer/agent loops
3D / Sim: Open3D, PyTorch3D, Isaac/MuJoCo, COLMAP/SLAM, NeRF/3DGS
Systems: Python, FastAPI, Ray, Kubernetes, Docker, Triton/TensorRT, Weights & Biases
Pipelines: LangGraph-like orchestration, data versioning, artifact stores

What Success Looks Like
A publishable or open-sourced outcome (with company approval) or a production-ready module that measurably moves a product KPI (latency, accuracy, robustness).
Clean, reproducible code with documented ablations and an evaluation report that a teammate can rerun end-to-end.
A demo that clearly communicates capabilities, limits, and next steps.

Why Centific
Real impact: Your research ships, powering core features in our MVPs and products.
Mentorship: Work closely with our Principal Architect and senior engineers/researchers.
Velocity + Rigor: We balance top-tier research practices with pragmatic product focus.

Salary: $140K - $150K annually

Centific is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, citizenship status, age, mental or physical disability, medical condition, sex (including pregnancy), gender identity or expression, sexual orientation, marital status, familial status, veteran status, or any other characteristic protected by applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.