Site Reliability Engineer

Programming | Posted on: January 14, 2021

Full Remote
50,000 to 100,000 US dollars
Bachelor or Master
40

Job Details

Job description

We are growing, and our Operations Department is looking for a Site Reliability Engineer to join our international team!

Responsibilities

  • Work daily to keep systems across multiple geographic locations healthy and well maintained, ensuring that hardware, software, applications, and networks operate at peak performance
  • Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes
  • Troubleshoot issues across the entire stack: hardware, software, applications, and networks
  • Drive standardization efforts across multiple disciplines and services in conjunction with SREs throughout the organization
  • Identify and drive opportunities to improve automation across the company; scope and build automation for the deployment, management, and visibility of our services
  • Represent the SRE organization in design reviews and operational readiness exercises for new and existing services
  • Work with software engineers to improve deployment processes
  • Participate in the on-call rotation for production systems

Requirements

  • Sound fundamentals in operating systems, networking, and distributed systems
  • Strong familiarity with Linux systems administration and management best practices
  • Familiarity with container technologies: Kubernetes, CRI, Docker, namespaces, and cgroups
  • Strong understanding of Ethernet, VLANs, IPv4/IPv6, ARP, DHCP, DNS, and TCP
  • Familiarity with distributed-systems problems such as leader election and Raft consensus
  • Solid understanding of systems and application design, including the operational trade-offs of various designs
  • Expert-level understanding of at least one public or private cloud technology, such as Amazon Web Services (AWS), Google Kubernetes Engine (GKE), or OpenStack
  • Practical knowledge of various aspects of service design, including messaging protocols and behavior, caching strategies, and software design practices
  • Practical, intermediate-level shell scripting skills; some Ruby experience is a plus
  • Demonstrable knowledge of TCP/IP, HTTP, and web application security, and experience supporting multi-tier web application architectures
  • Excellent knowledge of Linux/UNIX systems administration and performance tuning
  • Comfortable configuring DNS, DHCP, and LAN/WAN technologies
  • Minimum of 5 years managing services in an internet-scale *nix environment
  • Must be able to communicate well with both technical and non-technical colleagues to achieve business goals
  • Must be adaptable and able to focus on the simplest, most efficient, and most reliable solutions
  • A track record of successful, practical problem solving; excellent written and interpersonal communication in English; and documentation skills
  • Curiosity and an interest in networking, systems software, and distributed systems
  • Experience as a systems administrator or operations engineer
  • Experience with a 24/7 production environment
  • Experience with managed deployments that provide software, platforms, or infrastructure as a service
  • Experience with Mellanox- and Vyatta-based networking gear is a plus
  • Experience with Supermicro server and storage gear is a plus

Job Type

Employment
Full-Time

Category