We are looking for a talented engineer who is passionate about building fault-tolerant data services and analytics tools. Your work will be used by hundreds of engineers and have global impact by improving the quality and value of end clients' products.
- Design and implement fault-tolerant data pipelines to integrate large amounts of data from many diverse storage systems.
- Promote a culture of self-serve data analytics by minimizing technical barriers to data access and understanding.
- Execute complex data engineering projects that have a significant impact on clients' global business.
- Share knowledge by clearly articulating results and ideas to customers, managers, and key decision makers.
- Stay current with the latest research and technology and communicate your knowledge throughout the enterprise.
- Take responsibility for preparing data for analysis and provide critical feedback on issues of data integrity.
- Up to 10% travel may be required.
- Degree Level: BS or MS in Computer Science or another engineering discipline
- 3+ years industry experience building and operating distributed data systems in production
- Very strong programming skills in Scala or Java
- Strong understanding of tuning and performance optimization of Apache Spark jobs
- Experience with integration of data from multiple data sources
- Experience with various messaging systems, such as Kafka or RabbitMQ
- Ability to manage and resolve ongoing issues with a Spark/Hadoop cluster
- Familiarity with distributed machine learning frameworks like Spark MLlib
- General understanding of machine learning / deep learning methods