Job Details
PySpark (3 to 9 years)

Requirements:
- Strong hands-on experience with PySpark
- Good experience with AWS services, Oozie, and Airflow
- Good understanding of and experience with Hadoop, Hive, Oozie, HDFS, YARN, and Sqoop
- Experience with, and understanding of, AWS design and architectural concepts
- Clear communication: able to understand and articulate solutions clearly

Responsibilities:
- Building ETL/ELT jobs for batch data with HiveQL and Spark (Scala/Java/PySpark), scheduling jobs with Oozie, and loading/extracting data with Sqoop (a batch sketch follows this list)
- Building real-time ingestion with Kafka and Spark Streaming, and data flow pipelines with NiFi (a streaming sketch follows this list)
- Building data pipelines within big data ecosystems over large structured and unstructured datasets from multiple sources
- Implementing large-scale data platforms, ingestion automation frameworks, and DataOps to industry standards; delivering reusable data products and business-ready data using modern open-source technologies
- Building and automating the end-to-end data lifecycle through CI/CD processes and tools, using Docker and GitHub
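For the batch ETL/ELT responsibility above, a minimal PySpark sketch assuming a Hive-enabled cluster; the database, table, and column names (raw_db.events, curated_db.daily_events, event_ts, event_type) are hypothetical placeholders, not part of the posting.

```python
# Minimal PySpark batch ETL sketch (hypothetical table and column names).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-events-etl")
    .enableHiveSupport()  # read/write Hive tables through the metastore
    .getOrCreate()
)

# Extract: read a raw Hive table (assumed name).
raw = spark.table("raw_db.events")

# Transform: basic cleansing plus a daily aggregate.
daily = (
    raw.filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "event_type")
       .agg(F.count("*").alias("event_count"))
)

# Load: overwrite the curated table (assumed name).
daily.write.mode("overwrite").saveAsTable("curated_db.daily_events")

spark.stop()
```

In practice a job like this would be packaged and triggered by an Oozie or Airflow workflow on a schedule, with Sqoop handling any relational-database loads or extracts upstream of it.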
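For the real-time ingestion responsibility, a minimal sketch using Spark Structured Streaming's Kafka source; the broker address, topic name, and output/checkpoint paths are placeholders, and the spark-sql-kafka connector must be supplied at submit time (e.g. via --packages).

```python
# Minimal Kafka -> Spark Structured Streaming ingestion sketch
# (broker, topic, and paths are placeholder values).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

# Source: subscribe to a Kafka topic.
stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
    .option("subscribe", "events")                      # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the payload to string.
events = stream.select(
    F.col("key").cast("string"),
    F.col("value").cast("string"),
    "timestamp",
)

# Sink: append to Parquet, with checkpointing for fault tolerance.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/landing/events")             # placeholder path
    .option("checkpointLocation", "/checkpoints/events")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```

The checkpoint location is what lets the stream recover its Kafka offsets after a restart; a NiFi flow would typically sit upstream, routing and landing raw feeds into the Kafka topic this job consumes.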