Qualifications:
Preferred skills and experience include:
• Bachelor's degree in mathematics, statistics, computer science, information management, finance, or economics
• At least 3 years' experience integrating data into analytical platforms using patterns such as APIs, flat files, XML, JSON, Hadoop file formats, and cloud file formats
• Experience with ingestion technologies (e.g. Sqoop, NiFi, Flume), processing technologies (Spark/Scala), and storage technologies (e.g. HDFS, HBase, Hive) is essential
• Experience in designing and building data pipelines using cloud platform solutions and native tools
• Experience with Python and JVM-compatible languages, and with CI/CD tools such as Jenkins, Bitbucket, Nexus, and SonarQube
• Experience in data profiling, source-to-target mappings, ETL development, SQL optimisation, testing, and implementation
• Expertise in streaming frameworks (Kafka, Spark Streaming, Storm) is essential
• Experience managing structured and unstructured data types
• Experience in requirements engineering, solution architecture, design, and development/deployment
• Experience in creating big data or analytics IT solutions
• Track record of implementing databases, data-access middleware, and high-volume batch and (near) real-time processing
Job Description:
• Implement requests for the ingestion, creation, and preparation of data sources
• Develop and execute jobs to import data from external sources periodically or in (near) real time
• Set up streaming data sources to ingest data into the platform
• Deliver the data-sourcing approach and data sets for analysis, including data staging, ETL, data quality, and archiving
• Design solution architectures on both on-premises and cloud platforms to meet business, technical, and user requirements
• Profile source data and validate that it is fit for purpose
• Work with the Delivery Lead and Solution Architect to agree on pragmatic means of data provision to support use cases
• Understand and document end-user usage models and requirements