Google Cloud Exam Syllabus

Professional Data Engineer syllabus, skills measured, and exam topics

A Professional Data Engineer empowers data-driven decisions by collecting, transforming, storing, and delivering data for diverse applications, and designs and builds robust data infrastructure, optimizing for performance and security.

Skills measured by domain

Use the weighting table to decide where to spend the most study time.

Domain                                                Weight
Section 1: Designing data processing systems             22%
Section 2: Ingesting and processing the data             25%
Section 3: Storing the data                              20%
Section 4: Preparing and using data for analysis         15%
Section 5: Maintaining and automating data workloads     18%

Detailed outline

Scan each section as a working study checklist instead of one long wall of text.

Section 1: Designing data processing systems (~22% of the exam)

  • 1.1 Designing for security and compliance. Considerations include:
  • Identity and Access Management (e.g., Cloud IAM and organization policies)
  • Data security (encryption and key management)
  • Privacy (e.g., strategies to handle personally identifiable information)
  • Regional considerations (data sovereignty) for data access and storage
  • Legal and regulatory compliance
  • Designing the project, dataset, and table architecture to ensure proper data governance
  • Multi-environment use cases (development vs. production)
  • 1.2 Designing for reliability and fidelity. Considerations include:
  • Preparing and cleaning data (e.g., Dataform, Dataflow, Cloud Data Fusion, and prompting LLMs for query generation)
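One practical privacy strategy for personally identifiable information (1.1) is pseudonymization: replacing raw identifiers with a keyed hash before the data lands in the warehouse. A minimal sketch in plain Python, assuming a hypothetical HMAC key that in production would live in Cloud KMS or Secret Manager:

```python
import hashlib
import hmac

# Hypothetical key for illustration only; in production, fetch it from
# Cloud KMS or Secret Manager rather than hard-coding it.
SECRET_KEY = b"rotate-me-via-secret-manager"

def pseudonymize(pii_value: str) -> str:
    """Keyed hash: the same input always maps to the same token,
    so joins still work, but the raw value is never stored."""
    return hmac.new(SECRET_KEY, pii_value.encode("utf-8"), hashlib.sha256).hexdigest()

row = {"user_email": "alice@example.com", "purchase_total": 42.50}
safe_row = {**row, "user_email": pseudonymize(row["user_email"])}
```

Because the hash is deterministic, analysts can still group and join on the token; rotating the key breaks linkability when a regulation requires it.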

Section 2: Ingesting and processing the data (~25% of the exam)

  • 2.1 Planning the data pipelines. Considerations include:
  • Defining data sources and sinks
  • Defining data transformation and orchestration logic
  • Networking fundamentals
  • Data encryption
  • 2.2 Building the pipelines. Considerations include:
  • Data cleansing
  • Identifying the services (e.g., Dataflow, Apache Beam, Dataproc, Cloud Data Fusion, BigQuery, Pub/Sub, Apache Spark, Hadoop ecosystem, and Apache Kafka)
  • Transformations
  • ○ Batch
  • ○ Streaming (e.g., windowing, late arriving data)
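Fixed (tumbling) windows are the simplest of the streaming windowing strategies listed above. The timestamp arithmetic can be sketched in plain Python without Apache Beam; the event data here is made up for illustration:

```python
from collections import defaultdict

def assign_fixed_windows(events, window_size_s):
    """Group (event_time_s, value) pairs into tumbling windows:
    each event lands in the window covering [start, start + window_size_s)."""
    windows = defaultdict(list)
    for event_time, value in events:
        window_start = event_time - (event_time % window_size_s)
        windows[window_start].append(value)
    return dict(windows)

# Events arrive out of order -- event time, not arrival order, decides the window.
events = [(2, "a"), (14, "d"), (7, "b"), (12, "c")]
result = assign_fixed_windows(events, 10)
```

Late-arriving data is handled by the same arithmetic: an event stamped 7 that shows up long after window 0 has fired still maps to window 0. Real streaming engines layer watermarks and allowed-lateness policies on top to decide whether to re-emit that window.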

Section 3: Storing the data (~20% of the exam)

  • 3.1 Selecting storage systems. Considerations include:
  • Analyzing data access patterns
  • Choosing managed services (e.g., BigQuery, BigLake, AlloyDB, Bigtable, Spanner, Cloud SQL, Cloud Storage, Firestore, Memorystore)
  • Planning for storage costs and performance
  • Lifecycle management of data
  • 3.2 Planning for using a data warehouse. Considerations include:
  • Designing the data model
  • Deciding the degree of data normalization
  • Mapping business requirements
  • Defining architecture to support data access patterns
  • 3.3 Using a data lake. Considerations include:
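Lifecycle management (3.1) usually means moving colder objects to cheaper storage classes as they age. Cloud Storage expresses this as declarative lifecycle rules; the sketch below mirrors the decision such a rule encodes, with age thresholds that are illustrative assumptions, not recommendations:

```python
def storage_class_for_age(age_days: int) -> str:
    """Pick a Cloud Storage class by object age.
    Thresholds here are hypothetical; tune them to real access patterns."""
    if age_days < 30:
        return "STANDARD"   # frequently accessed
    if age_days < 90:
        return "NEARLINE"   # roughly monthly access
    if age_days < 365:
        return "COLDLINE"   # roughly quarterly access
    return "ARCHIVE"        # rarely, if ever, read

classes = [storage_class_for_age(d) for d in (1, 45, 200, 400)]
```

The cost trade-off runs both ways: colder classes are cheaper to store but charge retrieval fees and minimum storage durations, which is why the thresholds should follow measured access patterns rather than defaults.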

Section 4: Preparing and using data for analysis (~15% of the exam)

  • 4.1 Preparing data for visualization. Considerations include:
  • Connecting to tools
  • Precalculating fields
  • BigQuery features for business intelligence (e.g., BI Engine, materialized views)
  • Troubleshooting poorly performing queries
  • Security, data masking, Identity and Access Management (IAM), and Cloud Data Loss Prevention (Cloud DLP)
  • 4.2 Preparing data for AI and ML. Considerations include:
  • Preparing data for feature engineering, training, and serving machine learning models (e.g., BigQuery ML)
  • Preparing unstructured data for embeddings and retrieval-augmented generation
  • 4.3 Sharing data. Considerations include:
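Precalculating fields (4.1) trades storage for query speed: the dashboard reads one pre-aggregated row per day instead of rescanning raw events, which is the same idea BigQuery materialized views automate. A toy sketch with made-up rows:

```python
def precalculate_daily_totals(rows):
    """Pre-aggregate revenue per day so BI tools read a small summary
    table instead of scanning every raw event at query time."""
    totals = {}
    for row in rows:
        totals[row["day"]] = totals.get(row["day"], 0.0) + row["amount"]
    return totals

raw_events = [
    {"day": "2024-05-01", "amount": 10.0},
    {"day": "2024-05-01", "amount": 5.0},
    {"day": "2024-05-02", "amount": 7.5},
]
daily = precalculate_daily_totals(raw_events)
```

The difference from doing this by hand is that a materialized view keeps the summary incrementally fresh as new rows arrive, rather than requiring a scheduled rebuild.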

Section 5: Maintaining and automating data workloads (~18% of the exam)

  • 5.1 Optimizing resources. Considerations include:
  • Minimizing costs per required business need for data
  • Ensuring that enough resources are available for business-critical data processes
  • Deciding between persistent or job-based data clusters (e.g., Dataproc)
  • 5.2 Designing automation and repeatability. Considerations include:
  • Creating directed acyclic graphs (DAGs) for Cloud Composer
  • Scheduling and orchestrating jobs in a repeatable way
  • 5.3 Organizing workloads based on business requirements. Considerations include:
  • Capacity management (e.g., BigQuery Editions and reservations)
  • Interactive or batch query jobs
  • 5.4 Monitoring and troubleshooting processes. Considerations include:
  • Observability of data processes (e.g., Cloud Monitoring, Cloud Logging, BigQuery
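Cloud Composer DAGs (5.2) are Airflow DAGs: tasks plus dependency edges, executed in dependency order. Stripped of the Airflow API, the scheduling core can be sketched with the standard library's `graphlib`; the task names are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key depends on the tasks in its set.
dag = {
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"validate"},
}

# A scheduler may run a task once all of its predecessors have finished;
# static_order() yields one valid execution order.
execution_order = list(TopologicalSorter(dag).static_order())
```

In a real Composer DAG the same structure is declared with Airflow operators and `>>` dependency arrows; the acyclicity that makes a valid ordering possible is what `TopologicalSorter` enforces (it raises `CycleError` otherwise).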