Site Reliability Engineer
What You Will Do:-
Maintain a record of system incidents, provide data-driven analysis to identify patterns, and offer recommendations for preventive measures
Identify components of the applications susceptible to performance and scalability issues.
You will have an opportunity to innovate in an emerging field
Each day you can make important contributions to the development and architecture of automated solutions, helping to build and optimize our cloud and deployment infrastructure while growing your own skills and the product
Building software to help DevOps, ITOps & support teams
Fixing support escalation issues
Optimizing on-call rotations & processes
Documenting "tribal" knowledge
Act on incidents when they occur, and investigate and fix issues in production systems
Conducting post-incident reviews
Develop scripts, metrics, and alerts to maintain the health of the systems.
You will be valued for your contributions in a rapidly growing organization with dynamic opportunities and have an opportunity to learn the latest SDN technologies.
You will learn directly from, and be mentored by, expert developers in the field
Required Technical and Professional Expertise:-
Experience in establishing, following, and improving upon procedures within a mission-critical environment
Must be proficient in writing scripts in Perl and Python
Experience with configuration management systems (Salt / Ansible / Chef)
Experience using Splunk and/or ELK
Must be extremely comfortable using and navigating within a Linux environment
Must have the ability to do high-level debugging and problem analysis by examining logs and running Unix commands
Experience with GitHub
Excellent written and verbal communication skills
Hands-on experience in production administration of large system environments
Comfortable operating in a fast-paced environment
Understanding of how DNS and DHCP work
Experience in virtualization environments such as AWS / SoftLayer / Xen / VMware
Preferred Technical and Professional Expertise:-
Experience with monitoring tools such as Zabbix or Nagios
Proven experience in driving the stability of cloud platforms
Experience with Jira