Certified Big Data Scientist


HRDF INDCERT: 50% funding (applicable to HRDF members & classroom-based training)

VILT (Virtual Instructor-Led Training): RM4,200 / pax



The Big Data Scientist track comprises five modules. The final course module consists of a series of lab exercises that require participants to apply their knowledge of the preceding courses in order to fulfill project requirements and solve real-world problems. Completion of these courses as part of a virtual or on-site workshop results in each participant receiving an official digital Certificate of Completion, as well as a digital Training Badge from Acclaim/Credly.

A Certified Big Data Scientist has demonstrated proficiency in the application of techniques, principles and processes required for exploring and analyzing large volumes of complex data with the goals of discovering novel insights, developing data products and communicating analytic results to drive decision-making. Depending on the exam format chosen, attaining the Big Data Scientist Certification can require passing a single exam or multiple exams. Those who achieve this certification receive an official digital Certificate of Excellence, as well as a digital Certification Badge from Acclaim/Credly with an account that supports the online verification of certification status.


Duration: 5 days

Course Outline:
Module 1: Fundamental Big Data

This foundational module provides a high-level overview of essential Big Data topic areas. A basic understanding of Big Data from business and technology perspectives is provided, along with an overview of common benefits, challenges, and adoption issues. The course content is divided into a series of modular sections, each of which is accompanied by one or more hands-on exercises.

Primary topics covered are:
– Understanding Big Data
– Fundamental Terminology & Concepts
– Big Data Business & Technology Drivers
– Traditional Enterprise & Technologies Related to Big Data
– Characteristics of Data in Big Data Environments
– Dataset Types in Big Data Environments
– Fundamental Analysis and Analytics
– Machine Learning Types
– Business Intelligence & Big Data
– Data Visualization & Big Data
– Big Data Adoption & Planning Considerations

Module 2: Big Data Analysis & Technology Concepts

This module explores a range of the most relevant topics that pertain to contemporary analysis practices, technologies and tools for Big Data environments. The course content does not get into implementation or programming details, but instead keeps coverage at a conceptual level, focusing on topics that enable participants to develop a comprehensive understanding of the common analysis functions and features offered by Big Data solutions, as well as a high-level understanding of the back-end components that enable these functions.

Primary topics covered are:
– Big Data Analysis Lifecycle (from business case evaluation to data analysis and visualization)
– A/B Testing, Correlation
– Regression, Heat Maps
– Time Series Analysis
– Traditional Enterprise
– Network Analysis
– Spatial Data Analysis
– Classification, Clustering
– Filtering (including collaborative filtering & content-based filtering)
– Sentiment Analysis, Text Analytics
– Processing Workloads, Clusters
– Cloud Computing & Big Data
– Foundational Big Data Technology Mechanisms

Module 3: Fundamental Big Data Analysis & Science

This course provides an in-depth overview of essential data science and analysis techniques that are relevant and unique to Big Data. It emphasizes how analysis and analytics need to be carried out individually and collectively to address the distinct characteristics, requirements and challenges of Big Data datasets.

Primary topics covered are:
– Data Science, Data Mining & Data Modeling
– Big Data Dataset Categories
– High-Volume, High-Velocity, High-Variety, High-Veracity, High-Value Datasets
– Exploratory Data Analysis (EDA)
– EDA Numerical Summaries, Rules and Data Reduction
– EDA analysis types, including Univariate, Bivariate and Multivariate
– Essential Statistics, including Variable Categories and Relevant Mathematics
– Statistics Analysis, including Descriptive, Inferential, Covariance, Hypothesis Testing, etc.
– Measures of Variation or Dispersion, Interquartile Range & Outliers, Z-Score, etc.
– Probability, Frequency, Statistical Estimators, Confidence Interval, etc.
– Data Munging and Machine Learning
– Variables and Basic Mathematical Notations
– Statistical Measures and Statistical Inference
– Confirmatory Data Analysis (CDA)
– CDA Hypothesis Testing, Null Hypothesis, Alternative Hypothesis, Statistical Significance, etc.
– Distributions and Data Processing Techniques
– Data Discretization, Binning and Clustering
– Visualization Techniques, including Bar Graph, Line Graph, Histogram, Frequency Polygons, etc.
– Prediction: Linear Regression, Mean Squared Error and Coefficient of Determination (R²), etc.
– Clustering: k-means, Cluster Distortion, Missing Feature Values, etc.
– Numerical Summaries
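
As an informal illustration of two topics listed above (Z-Score and Interquartile Range & Outliers), the following Python sketch may help prospective participants picture the kind of calculation involved. It is not taken from the official courseware; the function names, the sample values and the conventional 1.5 × IQR cut-off are assumptions chosen for this example.

```python
import statistics

def z_scores(values):
    """Return the z-score of each value: (value - mean) / population std dev."""
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)
    if std_dev == 0:
        raise ValueError("z-scores are undefined when all values are identical")
    return [(v - mean) / std_dev for v in values]

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: six typical readings and one extreme reading.
sample = [10, 12, 11, 13, 12, 11, 95]
scores = z_scores(sample)
outliers = iqr_outliers(sample)
```

In this sample, the extreme reading is the only value flagged by the IQR rule, and it also has the largest z-score.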

Module 4: Advanced Big Data Analysis & Science

This course delves into a range of advanced data analysis practices and analysis techniques that are explored within the context of Big Data. The course content focuses on topics that enable participants to develop a thorough understanding of statistical, modeling, and analysis techniques for data patterns, clusters and text analytics, as well as the identification of outliers and errors that affect the significance and accuracy of predictions made on Big Data datasets.

Primary topics covered are:
– Modeling, Model Evaluation, Model Fitting and Model Overfitting
– Statistical Models, Model Evaluation Measures
– Cross-Validation, Bias-Variance, Confusion Matrix and F-Score
– Machine Learning Algorithms and Pattern Identification
– Association Rules and Apriori Algorithm
– Data Reduction, Dimensionality Feature Selection
– Feature Extraction, Data Discretization (Binning and Clustering)
– Advanced Statistical Techniques
– Parametric vs. Non-Parametric, Clustering vs. Non-Clustering
– Distance-Based, Supervised vs. Semi-Supervised
– Linear Regression and Logistic Regression for Big Data
– Classification Rules for Big Data
– Logistic Regression, Naïve Bayes, Laplace Smoothing, etc.
– Decision Trees for Big Data
– Tree Pruning, Feature Splitting, One Rule (1R) Algorithm
– Pattern Identification, Association Rules, Apriori Algorithm
– Time Series Analysis, Trend, Seasonality
– k-Nearest Neighbor (kNN), k-means
– Text Analytics for Big Data
– Bag of Words, Term Frequency, Inverse Document Frequency, Cosine Distance, etc.
– Outlier Detection for Big Data
– Statistical, Distance-Based, Supervised and Semi-Supervised Techniques
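
To give a flavor of the text analytics topics listed above (Bag of Words, Term Frequency, Inverse Document Frequency and Cosine Distance), here is a minimal Python sketch. It is an illustration only and is not drawn from the course materials; the whitespace tokenizer, the unsmoothed log(N / df) weighting, the function names and the three sample sentences are all assumptions made for this example.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive bag of words
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical mini-corpus for illustration.
docs = [
    "big data needs big storage",
    "big data needs fast analysis",
    "cats chase mice",
]
vecs = tf_idf_vectors(docs)
related = cosine_similarity(vecs[0], vecs[1])
unrelated = cosine_similarity(vecs[0], vecs[2])
```

Cosine distance is simply 1 minus the cosine similarity, so the first two sentences, which share several terms, end up closer together than the first and third, which share none.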

Module 5: Big Data Analysis & Science Lab

This course module covers a series of exercises and problems designed to test the participant’s ability to apply knowledge of topics covered previously in course modules 3 and 4. Completing this lab will help highlight areas that require further attention, and will further prove hands-on proficiency in Big Data analysis and science practices as they are applied and combined to solve real-world problems.

As a hands-on lab, this course incorporates a set of detailed exercises that require participants to solve various inter-related problems, with the goal of fostering a comprehensive understanding of how different data analysis techniques can be applied to solve problems in Big Data environments and used to make significant, relevant predictions that offer increased business value.

For instructor-led delivery of this lab course, the Certified Trainer works closely with participants to ensure that all exercises are carried out completely and accurately. Attendees can voluntarily have exercises reviewed and graded as part of the class completion.

More about Arcitura Certifications here

For more details, contact us