Summary of Responsibilities:
- Supervise 24x7 Tier1 Network Operations Center (NOC) shift operations and personnel
- Manage production incidents, outages, SLAs, uptime, service availability, and root cause analysis
- Act as Ops program manager, working closely with the remote counterpart Ops program manager in the US to deliver projects on time.
- Manage the process to restore normal service operation as quickly as possible and minimize the impact on business operations.
- Oversee day-to-day Tier1 NOC operations, escalations, ticketing, and communications with all stakeholders, primarily Tier2 SRE.
- The NOC is a fast-paced environment where critical thinking is essential and lessons from one situation must be applied to another. Strict adherence to defined SLAs, production runbooks and MOPs (Methods of Procedure), and company policies, along with confidentiality and mature judgment, must be demonstrated at all times.
Primary Responsibilities & Objectives:
- Manage the outsourced NOC team and ensure it runs operations in accordance with defined SLAs, policies, and procedures.
- Lead technical bridges and coordinate service restoration efforts with stakeholders including SRE, Customer Support, Engineering, vendors, and third-party service providers to ensure continued QoS and stable operations.
- Own the quality and integrity of the Incident Management process and the end-to-end lifecycle of major incidents, including identifying and capturing impact and urgency.
- Manage break-fix activities during incidents to provide workarounds and drive incident resolution
- Experience with root cause analysis of critical business and production issues
- Manage the Ops program cycle from initiation through delivery, interfacing with external customers and Vendors as needed.
- Formulate and monitor execution plans for interconnected Ops initiatives and projects. Work closely with the remote counterpart Ops program manager in the US to deliver projects on time.
- Run scrum meetings, track milestones, and keep stakeholders updated on progress
- Review, refine, and further develop support documentation, processes, procedures, and system requirements within the NOC.
- Generate key reports for management, including but not limited to: system availability, service level agreements, ticket resolution, production incidents, and root cause analysis
- Manage releases, hotfixes, and other production deployments, including critical script runs, and ensure the NOC team adheres to release deployment MOPs.
- Report deployment status to leadership. Work closely with SRE and Engineering to ensure successful release deployments in production.
- Expert in proactive monitoring, alerting, trend analysis and self-healing systems
- Continuously improve and manage the systems that proactively monitor the infrastructure; this demands deep troubleshooting and scripting skills to improve the availability, capacity, and security of BlueJeans Cloud Services
- Participate in on-call rotations, driving restoration and repair of service-impacting issues
- Strong mentoring and coaching skills that encourage growth for more junior members
- Clear understanding of SRE and NOC best practices, and the product development lifecycle.
- Subject to call 24 hours a day, 7 days a week
Education, Experience, & Skills Required:
- 9-12 years of experience as a manager working on a SaaS product running in private and public data centers
- 5+ years of experience managing a NOC environment
- Bachelor’s Degree or equivalent
- Hands-on experience in the public cloud, specifically Amazon Web Services (AWS)
- Strong scripting experience in Python/Bash and good understanding of scripts
- Experience in Linux System Administration
- Experience in technical program management
- Experience with monitoring solutions (e.g. DataDog, Icinga, New Relic)
- Experience in infrastructure as code using Chef, Terraform, Ansible, CloudFormation
- Experience with Continuous Integration and Continuous Delivery concepts using Jenkins and Rundeck
- Experience with elastic scalability, fault tolerance, and other cloud architecture patterns
- Experience with modern cloud development practices (Microservices architectures, REST interfaces, etc.)
- Ability to design roadmaps and relevant technical documentation
- Awareness of standard network best practices and integration of all tools
- Knowledge of H.323, SIP, Microsoft Lync, and other video/voice conferencing technologies
- Virtualization management and integration (ESX, OpenStack)
- Familiarity with containerization and orchestration concepts such as Docker, containers, and Kubernetes, as well as NoSQL
- CCNA/CCNP/CCIE or AWS/Azure certification is a big plus
Soft Skills Required:
- Good communicator and highly adaptive
- Ability to interact efficiently with peers and customers is required
- Ability to multitask effectively and be an effective mentor and technical leader to team members
- Proven ability to work cross-functionally, demonstrating effective teamwork and interpersonal skills
- Takes responsibility and ownership for decisions, actions and results. Accountable for both how and what is accomplished
- Be a self-starter and quick learner with strong attention to detail who works well independently
- Proactiveness and Resourcefulness
- Possess strong mentoring and coaching skills