Responsibilities:
- Design, develop, and implement big data engineering projects in the Hadoop ecosystem.
- Engineer high-quality solutions on Cloudera, MapR, or HDP for both batch and streaming data, with a sense of urgency.
- Develop applications and custom integration solutions using Spark Streaming and Hive.
- Understand specifications; plan, design, and develop software solutions adhering to process, either individually or within a project team.
- Work in state-of-the-art programming languages and use object-oriented approaches in designing, coding, testing, and debugging programs.
- Work with support teams to resolve operational and performance issues.
- Select and integrate the big data tools and frameworks required to provide requested capabilities.
- Integrate data from multiple sources, implementing ETL processes using Apache NiFi.
- Monitor performance and advise on any necessary infrastructure changes.
- Manage the Hadoop cluster and its services, such as Hive, HBase, MapReduce, and Sqoop.
- Clean data per business requirements using streaming APIs or user-defined functions.
- Build distributed, reliable, and scalable data pipelines to ingest and process data in real time, defining Hadoop job flows.
- Manage Hadoop jobs using a scheduler.
- Apply HDFS file formats and structures such as Parquet and Avro to speed up analytics.
- Work with Hadoop ecosystem tools such as Hive, Pig, HBase, and Spark.
- Review and manage Hadoop log files.
- Assess the quality of datasets for a Hadoop data lake.
- Fine-tune Hadoop applications for high performance and throughput.
- Troubleshoot and debug any Hadoop ecosystem runtime issues.
- Participate in POC efforts to help build new Hadoop clusters.
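The data-cleaning responsibility above is often implemented as a user-defined function. A minimal pure-Python sketch is shown below; the field names and cleaning rules are hypothetical, and in a real pipeline this logic would be registered as a Hive or Spark UDF rather than run standalone:

```python
import re

def clean_record(record: dict) -> dict:
    """Normalize one raw record per hypothetical business rules:
    trim whitespace from names, lowercase emails, and keep only
    digits in phone numbers."""
    return {
        "name": record.get("name", "").strip(),
        "email": record.get("email", "").strip().lower(),
        "phone": re.sub(r"\D", "", record.get("phone", "")),
    }

# Example usage with a made-up raw record:
raw = {"name": "  Ada Lovelace ", "email": "ADA@Example.COM", "phone": "(904) 555-0101"}
print(clean_record(raw))
```

Keeping the function a plain dict-to-dict transform makes it easy to unit-test before wiring it into a streaming job.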
Education:
Bachelor's degree or higher in Computer Science, Information Systems, or a related engineering discipline
General Knowledge, Skills & Abilities:
- Detail-oriented data engineer.
- Strong systematic and organizational skills.
- Committed to completing deliverables on time.
Preferred Qualifications:
- Must have experience with Spark, Hive, and Scala or PySpark.
- Experience with NiFi, Kafka, or other streaming technologies preferred.
- 3+ years of data engineering experience building ETL pipelines using Java, Python, or Scala.
- Proficient in Pig and Hive scripting.
- Solid understanding of HDFS.
- Work experience in a data warehousing, business intelligence, or data analytics group, with hands-on Hadoop experience.
- Ability to create tables/views in Hive or another relevant scripting language.
- Experience with Agile development methodologies.
- Experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
Experience architecting solutions using any of the following:
- Java, Python, or Scala programming languages
- NiFi, Kafka, or other streaming technologies
- Parquet, Avro, ORC, XML, JSON, CSV, or TXT formats
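As a toy illustration of moving between two of the text formats listed above (using a made-up two-column dataset), CSV rows can be rewritten as JSON lines with only the Python standard library; columnar formats like Parquet or ORC would instead require a library such as PySpark:

```python
import csv
import io
import json

# Hypothetical sample data: a small CSV document held in memory.
csv_text = "order_id,amount\n1,9.99\n2,14.50\n"

# Parse each CSV row into a dict, then serialize it as one JSON line.
reader = csv.DictReader(io.StringIO(csv_text))
json_lines = [json.dumps(row) for row in reader]

for line in json_lines:
    print(line)
```

Note that `csv.DictReader` yields all values as strings; a real pipeline would also cast types before writing downstream.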
Location: Jacksonville, FL