Databricks Exam Syllabus

Generative AI Engineer Associate syllabus, skills measured, and exam topics

The purpose of this exam guide is to give you an overview of the exam and what it covers so you can determine your exam readiness. This document is updated whenever an exam changes (and notes when those changes take effect) so that you can prepare accordingly.

Exam details

Quick facts pulled from the official source for faster scanning.

Number of items: 45 multiple-choice or multiple-selection questions
Time limit: 90 minutes
Registration fee: $200
Delivery method: Online proctored
Test aids: None allowed
Prerequisites: None required; course attendance and six months of relevant hands-on experience are recommended
Validity: 2 years
Recertification: Required every two years to maintain certified status

What to know before you study

The sections below list the skills measured in each exam domain; use them to frame your study plan.

Section 1: Design Applications

  • Design a prompt that elicits a specifically formatted response
  • Select model tasks to accomplish a given business requirement
  • Select chain components for a desired model input and output
  • Translate business use case goals into a description of the desired inputs and outputs for the AI pipeline
  • Define and order tools that gather knowledge or take actions for multi-stage reasoning
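The first objective above, designing a prompt that elicits a specifically formatted response, can be illustrated with a minimal sketch. The JSON schema and field names (`sentiment`, `product`, `summary`) are hypothetical examples, not prescribed by the exam guide:

```python
def build_extraction_prompt(ticket_text: str) -> str:
    """Build a prompt that constrains the model to a fixed JSON shape.

    The schema below is illustrative; any stable, machine-parseable
    format serves the same purpose.
    """
    return (
        "You are a support-ticket triage assistant.\n"
        "Respond ONLY with a JSON object using exactly these keys:\n"
        '{"sentiment": "positive|neutral|negative", '
        '"product": "<string>", "summary": "<one sentence>"}\n\n'
        f"Ticket: {ticket_text}"
    )

prompt = build_extraction_prompt("The new dashboard keeps crashing on load.")
```

Spelling out both the keys and the allowed values narrows the response space, which makes downstream parsing far more reliable than free-form instructions.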

Section 2: Data Preparation

  • Apply a chunking strategy for a given document structure and model constraints
  • Filter extraneous content in source documents that degrades quality of a RAG application
  • Choose the appropriate Python package to extract document content from provided source data and format
  • Define operations and sequence to write given chunked text into Delta Lake tables in Unity Catalog
  • Identify needed source documents that provide necessary knowledge and quality for a given RAG application
  • Identify prompt/response pairs that align with a given model task
  • Use tools and metrics to evaluate retrieval performance
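One common baseline for the chunking objective above is fixed-size chunking with overlap. This is a sketch only; the right chunk size and overlap depend on the document structure and the embedding model's context limit, and the values below are illustrative:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Overlap preserves context that would otherwise be cut at chunk
    boundaries; structure-aware splitting (by heading or paragraph)
    is often preferable for well-structured documents.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = chunk_text("x" * 1200, chunk_size=500, overlap=50)
```

Each chunk would then be embedded and written to a Delta Lake table in Unity Catalog to back a vector index.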

Section 3: Application Development

  • Create tools needed to extract data for a given data retrieval need
  • Select LangChain or similar tools for use in a Generative AI application
  • Identify how prompt formats can change model outputs and results
  • Qualitatively assess responses to identify common issues such as quality and safety
  • Select chunking strategy based on model & retrieval evaluation
  • Augment a prompt with additional context from a user's input based on key fields, terms, and intents
  • Create a prompt that adjusts an LLM's response from a baseline to a desired output
  • Implement LLM guardrails to prevent negative outcomes
  • Write metaprompts that minimize hallucinations or leaking private data
  • Build agent prompt templates exposing available functions
  • Select the best LLM based on the attributes of the application to be developed
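The prompt-augmentation and hallucination-mitigation objectives above can be combined in one template. The section layout (Context / User intent / Question) and the closing instruction are an illustrative convention, not a prescribed Databricks format:

```python
def augment_prompt(question: str, retrieved_docs: list[str], user_intent: str) -> str:
    """Prepend retrieved context and a detected intent to the user question.

    The final instruction acts as a lightweight metaprompt that
    discourages the model from answering beyond the supplied context.
    """
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        f"Context:\n{context}\n\n"
        f"User intent: {user_intent}\n\n"
        f"Question: {question}\n"
        "Answer using ONLY the context above. If the context is "
        "insufficient, say so instead of guessing."
    )

aug = augment_prompt(
    "How do I reset my token?",
    ["Tokens expire after 90 days.", "Reset tokens via the admin console."],
    "account_management",
)
```

Grounding the answer in retrieved context and explicitly permitting "I don't know" are two of the simplest levers for reducing hallucinated responses.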

Detailed outline

Scan each section as a working study checklist instead of one long wall of text.

Section 4: Assembling and Deploying Applications

  • Code a chain using a pyfunc model with pre- and post-processing
  • Control access to resources from model serving endpoints
  • Code a simple chain according to requirements
  • Code a simple chain using LangChain
  • Choose the basic elements needed to create a RAG application: model flavor, embedding model, retriever, dependencies, input examples, and model signature
  • Register the model to Unity Catalog using MLflow
  • Sequence the steps needed to deploy an endpoint for a basic RAG application
  • Create and query a Vector Search index
  • Identify how to serve an LLM application that leverages Foundation Model APIs
  • Identify resources needed to serve features for a RAG application
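The pre- and post-processing objective above can be sketched as a plain chain class. In a real deployment this logic would typically live inside an `mlflow.pyfunc.PythonModel` subclass so it can be registered to Unity Catalog and served from an endpoint; here a stand-in `fake_llm` replaces the actual model call to keep the sketch self-contained:

```python
class SimpleChain:
    """A toy chain: pre-process input, call a model, post-process output."""

    def __init__(self, llm):
        self.llm = llm  # any callable taking a prompt string

    def preprocess(self, raw_input: str) -> str:
        # Normalize whitespace before the text reaches the model.
        return " ".join(raw_input.split())

    def postprocess(self, raw_output: str) -> str:
        # Trim the response and guarantee terminal punctuation.
        out = raw_output.strip()
        return out if out.endswith(".") else out + "."

    def predict(self, raw_input: str) -> str:
        return self.postprocess(self.llm(self.preprocess(raw_input)))

def fake_llm(prompt: str) -> str:
    # Placeholder for a Foundation Model API or served-model call.
    return f"Echo: {prompt}"

chain = SimpleChain(fake_llm)
result = chain.predict("  what   is   RAG ")
```

Wrapping the same `predict` logic in a pyfunc model is what lets MLflow capture the signature, dependencies, and input examples needed for serving.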

Section 5: Governance

  • Use masking techniques as guard rails to meet a performance objective
  • Select guardrail techniques to protect against malicious user inputs to a Gen AI application
  • Recommend an alternative for problematic text mitigation in a data source feeding a RAG application
  • Use legal/licensing requirements for data sources to avoid legal risk
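The masking objective above can be illustrated with a simple regex pass over inputs. This is only one layer of a guardrail stack; production systems usually combine masking with input validation and policy filters, and the redaction token below is an arbitrary choice:

```python
import re

# A deliberately simple email pattern; real PII detection needs more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def mask_pii(text: str) -> str:
    """Mask email addresses before text reaches the model or its logs."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

masked = mask_pii("Contact jane.doe@example.com for access.")
```

Applying the mask both on ingestion (so PII never enters the vector index) and at inference time (so it never appears in prompts or inference logs) covers the two main leakage paths.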

Section 6: Evaluation and Monitoring

  • Select an LLM (size and architecture) based on a set of quantitative evaluation metrics
  • Select key metrics to monitor for a specific LLM deployment scenario
  • Evaluate model performance in a RAG application using MLflow
  • Use inference logging to assess deployed RAG application performance
  • Use Databricks features to control LLM costs for RAG applications
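As a concrete example of a retrieval-quality metric for the evaluation objectives above, recall@k measures what fraction of the known-relevant documents appear in the top-k retrieved results. In practice you might compute this through an MLflow evaluation run rather than by hand; this standalone sketch just shows the arithmetic:

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)

# One relevant doc ("d1") of two appears in the top 3 retrieved results.
score = recall_at_k(["d3", "d1", "d7"], {"d1", "d2"}, k=3)
```

Tracking a metric like this across chunking strategies or embedding models is what turns the "select a chunking strategy based on evaluation" objective into a measurable comparison.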

Sample Questions

These questions are similar to actual question items and give you a general sense of how questions are asked on this exam. They include exam objectives as they are stated on the exam guide and give you a sample question that aligns to the objective. The exam guide lists all of the objectives that could be covered on an exam. The best way to prepare for a certification exam is to review the exam outline in the exam guide.