Job Description
- Job Description:
- Analyze real-time telemetry alerts and operational datasets to validate incidents, identify anomalies, and assess operational impact.
- Differentiate true events from false positives, duplicate alerts, and known recurring conditions.
- Correlate trends, timestamps, logs, and event data to support severity assessment and operational decision-making.
- Prepare clear, data-backed incident summaries to support client and stakeholder response activities.
- Provide analytical context and data-driven insights to clients and external vendors during incident investigations.
- Track remediation actions, response timelines, and outcomes using structured operational data.
- Support escalation and prioritization decisions through evidence-based analysis.
- Communicate findings clearly and concisely to operations, service management, and project stakeholders.
- Identify recurring incident patterns, systemic operational risks, and vendor performance trends.
- Highlight gaps in telemetry coverage, data quality, alert logic, or monitoring effectiveness.
- Recommend opportunities for alert tuning, automation, and improved monitoring or reporting practices.
- Support continuous improvement initiatives by surfacing actionable insights from operational data.
- Maintain accurate, auditable records of analyses, incidents, and findings in approved client and company tools.
- Contribute analytical insights to improve runbooks, alert definitions, and operational standards.
- Support initiatives aimed at reducing Mean Time to Resolution (MTTR) and minimizing repeat incidents.
- Requirements:
- Bachelor’s degree in Data Analytics, Data Science, Computer Science, Engineering (Electrical, Mechanical, Computer, Network), Information Systems, Information Technology, or a related technical field.
- 10+ years of experience in a data analyst, operations analyst, or telemetry/monitoring-focused role.
- Hands-on experience working with time-series data, logs, alerts, or event-driven datasets.
- Strong analytical skills with the ability to interpret noisy, incomplete, or real-time data.
- Excellent documentation skills and strong written and verbal communication skills.
- Ability to operate independently in fast-paced, incident-driven environments.
- Comfort working across vendors and cross-functional technical teams.
- Experience supporting data centers, infrastructure operations, NOC/SOC environments, or other critical systems (preferred).
- Familiarity with incident management, problem management, or operational analytics practices (preferred).
- Exposure to telemetry, observability, or monitoring platforms (preferred).
- Understanding of root cause analysis concepts and service management frameworks (e.g., ITIL) (preferred).
- Benefits:
- Paid time off
- Performance-based bonuses
- Excellent medical, vision, and dental insurance