Certified Big Data Engineer


HRDF INDCERT: 50% funding (applicable to HRDF members & classroom-based training)

VILT (Virtual Instructor-Led Training): RM4,200 / pax


The Big Data Engineer track comprises five modules. The final module consists of a series of lab exercises that require participants to apply their knowledge from the preceding modules to fulfill project requirements and solve real-world problems. Completing these courses as part of a virtual or on-site workshop earns each participant an official digital Certificate of Completion, as well as a digital Training Badge from Acclaim/Credly.

A Certified Big Data Engineer has demonstrated proficiency in designing and utilizing Big Data solutions (using Hadoop, MapReduce and other tools), with an emphasis on Big Data mechanisms used to enable data processing, data storage and the establishment of Big Data pipelines. Depending on the exam format chosen, attaining the Big Data Engineer Certification can require passing a single exam or multiple exams. Those who achieve this certification receive an official digital Certificate of Excellence, as well as a digital Certification Badge from Acclaim/Credly with an account that supports the online verification of certification status.


Duration: 5 days

Course Outline:
Module 1: Fundamental Big Data

This foundational module provides a high-level overview of essential Big Data topic areas. A basic understanding of Big Data from business and technology perspectives is provided, along with an overview of common benefits, challenges, and adoption issues. The course content is divided into a series of modular sections, each of which is accompanied by one or more hands-on exercises.

Primary topics covered are:
– Understanding Big Data
– Fundamental Terminology & Concepts
– Big Data Business & Technology Drivers
– Traditional Enterprise Technologies Related to Big Data
– Characteristics of Data in Big Data Environments
– Dataset Types in Big Data Environments
– Fundamental Analysis and Analytics
– Machine Learning Types
– Business Intelligence & Big Data
– Data Visualization & Big Data
– Big Data Adoption & Planning Considerations

Module 2: Big Data Analysis & Technology Concepts

This module explores a range of the most relevant topics that pertain to contemporary analysis practices, technologies and tools for Big Data environments. The course content does not get into implementation or programming details, but instead keeps coverage at a conceptual level, focusing on topics that enable participants to develop a comprehensive understanding of the common analysis functions and features offered by Big Data solutions, as well as a high-level understanding of the back-end components that enable these functions.

Primary topics covered are:
– Big Data Analysis Lifecycle (from business case evaluation to data analysis and visualization)
– A/B Testing, Correlation
– Regression, Heat Maps
– Time Series Analysis
– Traditional Enterprise
– Network Analysis
– Spatial Data Analysis
– Classification, Clustering
– Filtering (including collaborative filtering & content-based filtering)
– Sentiment Analysis, Text Analytics
– Processing Workloads, Clusters
– Cloud Computing & Big Data
– Foundational Big Data Technology Mechanisms

Module 3: Fundamental Big Data Engineering

This module covers engineering-related concepts, techniques and technologies for the processing and storage of Big Data datasets. It highlights the unique challenges faced when processing and storing large, volatile and disparate sets of data. NoSQL is covered and the MapReduce data processing engine is explained in detail as a base framework for high-volume batch data processing.

Primary topics covered are:
– Big Data Engineering Techniques and Challenges
– Big Data Storage, including Sharding, Replication, CAP Theorem, ACID and BASE
– Master-Slave, Peer-to-Peer Replication, Combining Replication with Sharding
– Big Data Storage Requirements, Scalability, Redundancy and Availability
– Fast Access, Long-term Storage, Schema-less Storage and Inexpensive Storage
– On-Disk Storage, including Distributed File System and Databases
– Introduction to NoSQL and NewSQL
– NoSQL Rationale and Characteristics
– NoSQL Database Types, including Key-Value, Document, Column-Family and Graph Databases
– Big Data Processing Engines
– Distributed/Parallel Data Processing, Schema-less Data Processing
– Multi-Workload Support, Linear Scalability and Fault-Tolerance
– Big Data Processing Requirements, including Batch, Cluster and Realtime Modes
– MapReduce for Big Data Processing, including Map, Combine, Partition, Shuffle and Sort, and Reduce
– MapReduce Algorithm Design
– Task Parallelism, Data Parallelism
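
To give a feel for the MapReduce stages named in this module, here is a minimal single-process sketch in Python. It is an illustration only and not taken from the course material: the function names, the toy hash used for partitioning and the sample input are all assumptions, and a real engine such as Hadoop would run these stages across many machines.

```python
from collections import Counter, defaultdict

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def combine(pairs):
    """Combine: pre-aggregate counts on the mapper side to reduce shuffle traffic."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def partition(key, num_reducers):
    """Partition: pick the reducer for a key (deterministic toy hash, an assumption)."""
    return sum(key.encode("utf-8")) % num_reducers

def shuffle_and_sort(mapper_outputs, num_reducers):
    """Shuffle and sort: route pairs to reducers, then group values by sorted key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]

def reduce_phase(grouped):
    """Reduce: sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped}

def word_count(splits, num_reducers=2):
    """Run all stages in sequence over a list of text splits."""
    mapper_outputs = [combine(map_phase(s)) for s in splits]
    result = {}
    for grouped in shuffle_and_sort(mapper_outputs, num_reducers):
        result.update(reduce_phase(grouped))
    return result

# Hypothetical sample input: each string stands in for one input split.
splits = ["big data big pipelines", "data pipelines move big data"]
counts = word_count(splits)
print(counts)
```

Each split is mapped and combined independently, which is the data-parallel part of the job. The partition function decides which reducer sees which key, so the final counts do not depend on the number of reducers.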

Module 4: Advanced Big Data Engineering

This module builds upon Module 3 by exploring advanced engineering topics pertaining primarily to the storage and processing of Big Data datasets. Specifically, it covers advanced Big Data engineering mechanisms, in-memory data storage and realtime data processing.

The course presents further considerations for building MapReduce algorithms and also introduces the Bulk Synchronous Parallel (BSP) processing engine, along with a discussion of graph data processing. It also explores the Big Data mechanisms required for developing Big Data pipelines, the stages of those pipelines and the design process involved in building Big Data processing solutions.

Primary topics covered are:
– Advanced Big Data Engineering Mechanisms
– Serialization and Compression Engines
– In-Memory Storage Devices
– In-Memory Data Grids and In-Memory Databases
– Read-Through, Read-Ahead, Write-Through and Write-Behind Integration Approaches
– Polyglot Persistence
– Explanation, Issues and Recommendations
– Realtime Big Data Processing
– Speed Consistency Volume (SCV)
– Event Stream Processing (ESP)
– Complex Event Processing (CEP)
– The SCV Principle
– General Realtime Big Data Processing and MapReduce
– Advanced MapReduce Algorithm Designs
– Bulk Synchronous Parallel (BSP) Processing Engine
– BSP vs. MapReduce
– BSP Synchronous Parallel
– Graph Data and Graph Data Processing using BSP (Supersteps)
– Big Data Pipelines, including Definition and Stages
– Big Data with Extract-Load-Transform (ELT)
– Big Data Solution Characteristics, Design Considerations and Design Process
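
As a rough illustration of graph data processing with BSP supersteps, the Python sketch below propagates the largest vertex value through a small graph. It is a simplified single-process model, not course material: the function name, the graph and the starting values are assumptions, and a real BSP engine would distribute the vertices across workers.

```python
def bsp_max_value(graph, values):
    """Propagate the maximum vertex value through an undirected graph.

    graph:  dict mapping each vertex to a list of neighbouring vertices
    values: dict mapping each vertex to its initial numeric value
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)  # work on a copy so the caller's data is untouched

    # Superstep 0: every vertex sends its value to all of its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])
    supersteps = 1

    # Later supersteps run until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            # Local computation: adopt a larger value if one arrived.
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                # Communication: tell neighbours about the new value.
                for n in graph[v]:
                    outbox[n].append(values[v])
        # Barrier synchronisation: messages become visible only in the next superstep.
        inbox = outbox
        supersteps += 1
    return values, supersteps

# Hypothetical four-vertex chain: a - b - c - d
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
initial = {"a": 3, "b": 6, "c": 2, "d": 1}
final, steps = bsp_max_value(graph, initial)
print(final, steps)
```

Every superstep follows the same pattern of local computation, communication and a barrier. A vertex only reads messages sent in the previous superstep, which is what separates this model from the batch-oriented flow of MapReduce.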

Module 5: Big Data Engineering Lab

This module covers a series of exercises and problems designed to test the participant’s ability to apply knowledge of topics covered previously in course modules 3 and 4. Completing this lab will help highlight areas that require further attention, and will further prove hands-on proficiency in Big Data engineering practices as they are applied and combined to solve real-world problems.

As a hands-on lab, this course incorporates a set of detailed exercises that require participants to solve various inter-related problems, with the goal of fostering a comprehensive understanding of how different data engineering technologies, mechanisms and techniques can be applied to solve problems in Big Data environments.

For instructor-led delivery of this lab course, the Certified Trainer works closely with participants to ensure that all exercises are carried out completely and accurately. Attendees can voluntarily have exercises reviewed and graded as part of the class completion.

More about Arcitura Certifications here

For more details, contact us