Software Architect, Agent Evaluation & Core Framework

🌍 Remote, USA 🎯 Full-time 🕐 Posted Recently

Job Description

About Datagrid

Forget everything you know about AI assistants. At Datagrid, we’re building AI agents that actually do the work. We’re a team of passionate, hard-working builders, thinkers, and problem-solvers who are genuinely excited about what we do. Our mission is to supercharge the workday by turning complex data and tedious workflows into simple, automated actions. It’s an incredibly exciting time to join us: we’re growing fast, expanding our platform’s capabilities, and partnering with enterprise customers who want to 10x their teams’ output.

We thrive on collaboration and are looking for people who are ready to make a tangible impact. If you want to be part of a team that’s not just talking about the future of AI but actively creating it, you’ve come to the right place.

Our Values

At Datagrid, our values guide how we work, build, and grow together.

• Act with Purpose: Everything we do is tied to our mission. You’ll see the impact of your work as we move quickly to solve meaningful problems for our customers.
• Own the Outcome: We believe in true ownership. You’ll take responsibility for your projects and see them through to success, empowered to make decisions that drive real results.
• Clarity without Ego: We value honesty, transparency, and trust. You can expect and provide direct feedback in an environment where candor sharpens our ideas and strengthens our team.
• Creativity with Purpose: Innovation is central to our culture. Your creative thinking will be valued and directed toward solving real-world challenges and creating lasting impact.

About the role

Datagrid Agents operate where our customers work: across Teams, Slack, and even SMS. Agents make multistep plans, leverage vectorized data from 100+ sources, use tools like Docusign, and manipulate the Datagrid app. The Software Architect, Agent Evaluation & Core Framework, is crucial because we cannot manually test the vast array of agent interactions and capabilities. You will own and drive the extension of our evaluation harness to provide actionable reports on agent regressions and improvements, directly impacting strategic direction and customer experience. A key part of this will be incorporating the best open-source benchmarks into our evaluation set and figuring out how to agentically generate evaluations that are representative of customer use cases.

As you become established, you will also have the opportunity to make fundamental changes to the Core Framework to improve the way Agents reason, use tools, and collaborate with humans.

What you'll do:

• Work closely with an ex-Googler who built Gemini evals to create a harness for evaluating Agent performance, make that harness available both for local development and in the CI/CD pipeline, and set up alerts when Agents misbehave.
• Influence and contribute to the extension of Datagrid’s Agentic capabilities.
• Choose the best open- or closed-source components to build out the testing infrastructure.
• Integrate publicly available benchmarks such as RAGBench into the testing system.
• Grant subject matter experts the ability to add to the test library using customer queries, manually authored cases, and synthetically generated questions.
• Expose evaluation performance via alerts and dashboards.

What we're looking for:

• Proven track record of building test harnesses for Chat Agents from 0 ⇒ 1.
• 10+ years of B2B software engineering experience.
• Ability to write effective LLM prompts without assistance.
• Proficiency with Node.js and server-side frameworks such as NestJS or Next.js.
• Familiarity with JavaScript frameworks such as React and AngularJS.
• Experience with databases such as Weaviate and BigQuery.
• Experience working with GCP or similar cloud providers.

Who we're looking for:

• Experience with any LLM evaluation platform (Galileo, Arize, LangSmith, Orq)
• Background in B2B SaaS automation tools
• Contributions to open-source AI projects or published research
• Familiarity with prompt engineering or model evaluation

Pay Range and Benefits

• Salary Range: $200,000 - $240,000
• Generous equity compensation
• Flexible vacation/time-off policy
• All U.S. federal holidays observed, plus an additional company-wide Week of Rest in December
• Competitive benefits package: 100% premium coverage for employees and generous coverage for dependents
• Work-from-home stipend to support your ideal setup
• 401(k) plan

The base pay range for the role seniority described in this job description is between $200,000 and $240,000. Final offer amounts depend on multiple factors such as candidate experience and expertise, geographic location, total compensation, and market data.

In addition to cash pay, full-time regular positions are eligible for equity, 401(k), health benefits, and other benefits; some of these benefits may be available for part-time or temporary positions.