Free Sample — 15 Practice Questions
Preview 15 of 90 questions from the Generative AI Engineer Associate exam.
Try before you buy — purchase the full study guide for all 90 questions with answers and explanations.
Question 26
A Generative AI Engineer is building an LLM-based application that has an important transcription (speech-to-text) task. Speed is essential for the success of the application.
Which open Generative AI model should be used?
A. DBRX
B. MPT-30B-Instruct
C. Llama-2-70b-chat-hf
D. whisper-large-v3 (1.6B)
Show Answer
Correct Answer: D
Explanation:
The task is speech-to-text transcription where speed is critical. Among the options, whisper-large-v3 is the only open generative AI model specifically designed for automatic speech recognition. DBRX, MPT-30B-Instruct, and Llama-2-70b-chat-hf are large language models for text generation, not transcription. Whisper-large-v3 provides state-of-the-art transcription accuracy with efficient inference, making it the correct choice.
Question 52
What is the most suitable library for building a multi-step LLM-based workflow?
A. Pandas
B. TensorFlow
C. PySpark
D. LangChain
Show Answer
Correct Answer: D
Explanation:
LangChain is purpose-built for constructing multi-step workflows with large language models, offering abstractions for chaining prompts, managing state and memory, integrating tools and external data, and orchestrating agent-like behaviors. The other options are general data processing or ML libraries and are not designed specifically for LLM workflow orchestration.
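The chaining idea behind this answer can be sketched in plain Python. This is not LangChain's actual API; the step names are hypothetical stand-ins that show the prompt-to-LLM-to-postprocess pattern LangChain packages up for you.

```python
# A minimal sketch of a multi-step LLM workflow, in plain Python.
# LangChain provides these abstractions (prompts, chains, memory) out of
# the box; the step names here are illustrative, not its API.

def build_prompt(question: str, context: str) -> str:
    # Step 1: template the user question with retrieved context.
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

def fake_llm(prompt: str) -> str:
    # Step 2: stand-in for a real LLM call.
    return "42" if "meaning of life" in prompt else "unknown"

def postprocess(answer: str) -> str:
    # Step 3: clean up the raw model output.
    return answer.strip()

def run_chain(question: str, context: str) -> str:
    # Chain the steps: prompt -> LLM -> postprocess.
    return postprocess(fake_llm(build_prompt(question, context)))

print(run_chain("What is the meaning of life?", "Douglas Adams trivia"))
```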
Question 57
A Generative AI Engineer is tasked with improving RAG quality by addressing its inflammatory outputs.
Which action would be most effective in mitigating the problem of offensive text outputs?
A. Increase the frequency of upstream data updates
B. Inform the user of the expected RAG behavior
C. Restrict access to the data sources to a limited number of users
D. Curate upstream data properly that includes manual review before it is fed into the RAG system
Show Answer
Correct Answer: D
Explanation:
Offensive or inflammatory outputs in a RAG system most often originate from problematic source data. Properly curating upstream data with manual review allows removal or filtering of toxic, biased, or inappropriate content before it is indexed and retrieved, directly reducing the likelihood of offensive generations. The other options do not address the root cause of toxic content in the knowledge base.
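A rough sketch of what option D looks like in practice: screen candidate documents against a blocklist and queue borderline ones for human review before anything reaches the index. The blocklist terms and the review heuristic are toy assumptions for illustration.

```python
# Minimal upstream-curation sketch: decide, per document, whether to
# reject it, flag it for manual review, or accept it into the RAG index.
# The blocklist and heuristic below are placeholders, not a real lexicon.

BLOCKLIST = {"slur1", "slur2"}  # illustrative placeholder terms

def triage_document(text: str) -> str:
    """Return 'reject', 'review', or 'accept' for a candidate document."""
    words = set(text.lower().split())
    if words & BLOCKLIST:
        return "reject"        # clearly toxic: never index
    if "opinion" in words:     # crude stand-in for a borderline signal
        return "review"        # route to a human curator
    return "accept"            # safe to feed into the RAG system

corpus = ["normal tech article", "an opinion piece", "contains slur1"]
print([triage_document(d) for d in corpus])
```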
Question 32
A Generative AI Engineer interfaces with an LLM with instruction-following capabilities trained on customer calls inquiring about product availability. The LLM should output “Success” if the product is available or “Fail” if not.
Which prompt allows the engineer to receive call classification labels correctly?
A. You are a helpful assistant that reads customer call transcripts. Walk through the transcript and think step-by-step if the customer’s inquiries are addressed successfully. Answer “Success” if yes; otherwise, answer “Fail”.
B. You will be given a customer call transcript where the customer asks about product availability. Classify the call as “Success” if the product is available and “Fail” if the product is unavailable.
C. You will be given a customer call transcript where the customer asks about product availability. The outputs are either “Success” or “Fail”. Format the output in JSON, for example: {"call_id": "123", "label": "Success"}.
D. You will be given a customer call transcript. Answer “Success” if the customer call has been resolved successfully. Answer “Fail” if the call is redirected or if the question is not resolved.
Show Answer
Correct Answer: B
Explanation:
Option B precisely defines the task and labels: it limits the scope to product availability and clearly maps availability to “Success” and unavailability to “Fail.” It avoids extraneous reasoning instructions, unrelated success criteria, or unnecessary output formatting that could confuse classification, ensuring reliable label output.
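In an application, the option B prompt would typically be wrapped in a template, and the model's reply parsed into a strict label so chatty outputs cannot leak through. The transcript and the parsing rule below are illustrative assumptions.

```python
# Sketch: wrap the option B instruction in a template and map the raw
# model reply onto exactly one of the two allowed labels.

PROMPT = (
    "You will be given a customer call transcript where the customer asks "
    'about product availability. Classify the call as "Success" if the '
    'product is available and "Fail" if the product is unavailable.\n\n'
    "Transcript: {transcript}"
)

def parse_label(reply: str) -> str:
    """Guard against free-form replies: collapse to 'Success' or 'Fail'."""
    return "Success" if "success" in reply.lower() else "Fail"

prompt = PROMPT.format(transcript="Do you have the X200 in stock? Yes, we do.")
print(parse_label("Success"))
```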
Question 5
A Generative AI Engineer at a legal firm is designing a RAG system to analyze historical legal case precedents. The system needs to process millions of court opinions and legal documents, already organized by time and topic, to track how interpretations of specific laws have evolved over time. All of these documents are in plain-text. The engineer needs to choose a chunking method that would most effectively preserve continuity and the temporal nature of the cases.
Which method do they choose?
A. Implement windowed summarization with overlapping chunks.
B. Implement a hierarchical tree structure, like RAPTOR, to group similar legal concepts.
C. Implement paragraph level embeddings with each chunk.
D. Implement sentence level embeddings with each chunk tagged with the time to enable metadata filtering.
Show Answer
Correct Answer: A
Explanation:
The goal is to preserve continuity and the temporal evolution of legal reasoning across long, chronologically organized documents. Windowed chunking with overlap maintains narrative flow across chunk boundaries and avoids losing context when arguments span multiple sections, which is critical for tracking how interpretations evolve over time. Hierarchical grouping optimizes semantic clustering rather than temporal continuity, and sentence- or paragraph-level embeddings fragment the reasoning too much even if timestamps are added.
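The overlap mechanism in option A can be sketched in a few lines: each chunk shares a fixed number of tokens with its neighbor, so reasoning that spans a chunk boundary is never cut in half. The window and overlap sizes below are toy values.

```python
# Windowed chunking with overlap: consecutive chunks share `overlap`
# tokens, preserving continuity across chunk boundaries.

def window_chunks(tokens, size=6, overlap=2):
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

doc = "the court held in 1990 that the statute applies then narrowed it in 2005".split()
for chunk in window_chunks(doc, size=6, overlap=2):
    print(" ".join(chunk))
```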
Question 73
What is an effective method to preprocess prompts using custom code before sending them to an LLM?
A. Directly modify the LLM’s internal architecture to include preprocessing steps
B. It is better not to introduce custom code to preprocess prompts as the LLM has not been trained with examples of the preprocessed prompts
C. Rather than preprocessing prompts, it’s more effective to postprocess the LLM outputs to align the outputs to desired outcomes
D. Write a MLflow PyFunc model that has a separate function to process the prompts
Show Answer
Correct Answer: D
Explanation:
An effective way to preprocess prompts with custom logic is to wrap the LLM in an interface that applies preprocessing before inference. Writing an MLflow PyFunc model with a dedicated prompt-processing function allows systematic, reusable, and deployable preprocessing without altering the LLM itself. The other options are ineffective or incorrect approaches.
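The wrapper pattern behind option D can be sketched without MLflow installed. In MLflow proper, the class below would subclass `mlflow.pyfunc.PythonModel` and be logged and served as a single unit; the names here are illustrative.

```python
# The PyFunc pattern, sketched in plain Python: a wrapper whose predict()
# runs a dedicated preprocessing function before calling the model.

class PromptWrappedModel:
    def __init__(self, llm):
        self.llm = llm  # any callable: str -> str

    def _preprocess(self, prompt: str) -> str:
        # Custom preprocessing: normalize whitespace, prepend instructions.
        cleaned = " ".join(prompt.split())
        return f"[system: be concise] {cleaned}"

    def predict(self, prompt: str) -> str:
        return self.llm(self._preprocess(prompt))

echo = lambda p: p  # stand-in LLM that echoes its input
model = PromptWrappedModel(echo)
print(model.predict("  what   is RAG? "))
```

Because preprocessing lives inside the wrapper rather than in caller code, every deployment of the model applies it consistently.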
Question 38
A Generative AI Engineer is building a RAG application that answers questions about technology-related news articles. The source documents may contain a significant amount of irrelevant content, such as advertisements, sports news, or entertainment news.
Which approach is NOT advisable for building a RAG application focused on answering technology-only questions?
A. Include in the system prompt that the application is not supposed to answer any questions unrelated to technology.
B. Filter out irrelevant news articles in the retrieval process.
C. Keep all news articles because the RAG application needs to understand non-technological content to avoid answering questions about them.
D. Filter out irrelevant news articles in the upstream document database.
Show Answer
Correct Answer: C
Explanation:
Keeping all news articles, including irrelevant non-technology content, adds noise to retrieval and increases the risk of irrelevant context being surfaced. A focused RAG system should reduce noise by filtering irrelevant content upstream or during retrieval; it does not need non-technical articles to avoid answering non-technical questions.
Question 22
A Generative AI Engineer wants their finetuned LLMs in their prod Databricks workspace available for testing in their dev workspace as well. All of their workspaces are Unity Catalog enabled and they are currently logging their models into the Model Registry in MLflow.
What is the most cost-effective and secure option for the Generative AI Engineer to accomplish their goal?
A. Use an external model registry which can be accessed from all workspaces.
B. Use MLflow to log the model directly into Unity Catalog, and enable READ access in the dev workspace to the model.
C. Setup a duplicate training pipeline in dev, so that an identical model is available in dev.
D. Setup a script to export the model from prod and import it to dev.
Show Answer
Correct Answer: B
Explanation:
With Unity Catalog enabled across workspaces, registering the model in the Unity Catalog–backed MLflow Model Registry allows secure, centralized governance. Granting READ access to the dev workspace enables testing without duplicating models or maintaining export/import pipelines, making it the most cost‑effective and secure option.
Question 10
All of the following are Python APIs used to query Databricks foundation models. When running in an interactive notebook, which library does not automatically use the current session credentials?
A. OpenAI client
B. REST API via requests library
C. MLflow Deployments SDK
D. Databricks Python SDK
Show Answer
Correct Answer: B
Explanation:
Using the REST API via the requests library involves making raw HTTP calls, which do not automatically inherit the Databricks notebook's session credentials. Authentication headers (e.g., PAT or service principal tokens) must be manually provided. The other options are Databricks-aware clients that automatically use the current session credentials in interactive notebooks.
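The difference can be made concrete: a raw REST call has to carry its own Authorization header, which the SDK clients attach for you in a notebook. The endpoint URL and token lookup below are illustrative placeholders.

```python
# Sketch of a raw REST invocation payload: the Bearer token must be
# supplied manually, unlike the Databricks-aware clients.
import os

def build_rest_call(payload: dict) -> dict:
    token = os.environ.get("DATABRICKS_TOKEN", "<PAT or SP token>")
    return {
        "url": "https://<workspace-host>/serving-endpoints/<name>/invocations",
        "headers": {
            "Authorization": f"Bearer {token}",   # manual step option B requires
            "Content-Type": "application/json",
        },
        "json": payload,
    }

call = build_rest_call({"messages": [{"role": "user", "content": "hi"}]})
print(call["headers"]["Authorization"].startswith("Bearer "))
```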
Question 69
After changing the response generating LLM in a RAG pipeline from GPT-4 to a model with a shorter context length that the company self-hosts, the Generative AI Engineer is getting the following error:
What TWO solutions should the Generative AI Engineer implement without changing the response generating model? (Choose two.)
A. Use a smaller embedding model to generate embeddings
B. Reduce the maximum output tokens of the new model
C. Decrease the chunk size of embedded documents
D. Reduce the number of records retrieved from the vector database
E. Retrain the response generating model using ALiBi
Show Answer
Correct Answer: C, D
Explanation:
The error arises because the new self-hosted model has a shorter context window, causing prompt token overflow. Decreasing the chunk size of embedded documents reduces the number of tokens per retrieved chunk, and reducing the number of records retrieved from the vector database lowers the total context included in the prompt. Both approaches shrink the prompt size without changing the response-generating model.
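A back-of-the-envelope budget shows why both C and D work: total prompt tokens are roughly the question plus chunk size times the number of retrieved records, and shrinking either factor pulls the total under the window. The token counts and window size below are made-up illustrative numbers.

```python
# Prompt token budget sketch: question tokens + chunk_size * top_k must
# fit inside the model's context window.

def prompt_tokens(question_tokens: int, chunk_size: int, top_k: int) -> int:
    return question_tokens + chunk_size * top_k

CONTEXT_WINDOW = 4096  # hypothetical shorter window of the new model

before = prompt_tokens(100, chunk_size=1000, top_k=5)          # overflows
smaller_chunks = prompt_tokens(100, chunk_size=500, top_k=5)   # option C
fewer_records = prompt_tokens(100, chunk_size=1000, top_k=3)   # option D
print(before > CONTEXT_WINDOW,
      smaller_chunks <= CONTEXT_WINDOW,
      fewer_records <= CONTEXT_WINDOW)
```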
Question 70
A Generative AI Engineer is developing a patient-facing healthcare-focused chatbot. If the patient’s question is not a medical emergency, the chatbot should solicit more information from the patient to pass to the doctor’s office and suggest a few relevant pre-approved medical articles for reading. If the patient’s question is urgent, the chatbot should direct the patient to call their local emergency services.
Given the following user input:
“I have been experiencing severe headaches and dizziness for the past two days.”
Which response is most appropriate for the chatbot to generate?
A. Here are a few relevant articles for your browsing. Let me know if you have questions after reading them.
B. Please call your local emergency services.
C. Headaches can be tough. Hope you feel better soon!
D. Please provide your age, recent activities, and any other symptoms you have noticed along with your headaches and dizziness.
Show Answer
Correct Answer: B
Explanation:
The chatbot’s logic requires distinguishing urgent from non-urgent situations. Severe headaches combined with dizziness persisting for two days can indicate a potentially serious neurological condition (e.g., stroke, concussion, intracranial issues). In such ambiguous but possibly urgent cases, safety-first design in healthcare chatbots dictates directing the patient to emergency services rather than gathering more information. Therefore, advising the patient to call local emergency services is the most appropriate response.
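The safety-first routing can be sketched as a guard that runs before any information gathering: if the input matches an urgent-symptom list, escalate; otherwise solicit details. The symptom list is a toy illustration, not medical guidance.

```python
# Triage-first routing sketch: urgent signals always win over the
# information-gathering branch.

URGENT_SIGNS = ("severe headache", "dizziness", "chest pain", "numbness")

def route(message: str) -> str:
    text = message.lower()
    if any(sign in text for sign in URGENT_SIGNS):
        return "Please call your local emergency services."
    return ("Could you share your age, recent activities, and any other "
            "symptoms you have noticed?")

print(route("I have been experiencing severe headaches and dizziness."))
```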
Question 43
A Generative AI Engineer has just deployed an LLM application at a manufacturing company that assists with answering customer service inquiries. They need to identify the key enterprise metrics to monitor the application in production.
Which is NOT a metric they will implement for their customer service LLM application in production?
A. Massive Multi-task Language Understanding (MMLU) score
B. Number of customer inquiries processed per unit of time
C. Factual accuracy of the response
D. Time taken for LLM to generate a response
Show Answer
Correct Answer: A
Explanation:
MMLU is a research benchmark used to compare and evaluate general-purpose LLMs during development. It is not a practical production metric for a deployed customer service application. In production, enterprise metrics focus on operational performance and user experience, such as throughput (number of inquiries processed), factual accuracy of responses, and response latency.
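The three production metrics named here (throughput, accuracy, latency) can be tracked with a simple accumulator. The factual-accuracy flag below stands in for a real evaluation step, which in practice would come from an LLM judge or human review.

```python
# Sketch of production monitoring for the customer service application:
# count inquiries, average latency, and an accuracy rate per response.

class ServiceMetrics:
    def __init__(self):
        self.count = 0
        self.latencies = []
        self.accurate = 0

    def record(self, latency_s: float, factually_accurate: bool):
        self.count += 1
        self.latencies.append(latency_s)
        self.accurate += int(factually_accurate)

    def summary(self):
        return {
            "inquiries": self.count,
            "avg_latency_s": sum(self.latencies) / self.count,
            "accuracy_rate": self.accurate / self.count,
        }

m = ServiceMetrics()
m.record(0.8, True)
m.record(1.2, False)
print(m.summary())
```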
Question 8
A Generative AI Engineer is using LangGraph to define multiple tools in a single agentic application. They want to enable the main orchestrator LLM to decide on its own which tools are most appropriate to call for a given prompt. To do this, they must determine the general flow of the code.
Which sequence will do this?
A. 1. Define or import the tools 2. Add tools and LLM to the agent 3. Create the ReAct agent
B. 1. Define or import the tools 2. Define the agent 3. Initialize the agent with ReAct, the LLM, and the tools
C. 1. Define the tools 2. Load each tool into a separate agent 3. Instruct the LLM to use ReAct to call the appropriate agent
D. 1. Define the tools inside the agents 2. Load the agents into the LLM 3. Instruct the LLM to use CoT reasoning to determine the appropriate agent
Show Answer
Correct Answer: B
Explanation:
To let an orchestrator LLM autonomously decide which tools to call, the correct flow is to first define or import the tools, then define the agent structure, and finally initialize the agent using a ReAct-style setup that combines the LLM with the available tools. This enables the LLM to reason about the prompt and select tools dynamically.
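Sequence B can be mirrored in plain Python: define the tools, define the agent, then initialize it with the LLM and tools. The keyword-matching "LLM" below is a stub; LangGraph's prebuilt ReAct agent does this reasoning with a real model, and all names here are illustrative.

```python
# 1. Define or import the tools.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def get_booking(ref: str) -> str:
    return f"Booking {ref} confirmed"

TOOLS = {"get_weather": get_weather, "get_booking": get_booking}

# 2. Define the agent: it asks the LLM which tool to act with (ReAct's
# reason-then-act loop, collapsed to a single step for illustration).
class ReActAgent:
    def __init__(self, llm, tools):
        self.llm, self.tools = llm, tools

    def run(self, prompt: str) -> str:
        tool_name, arg = self.llm(prompt, list(self.tools))
        return self.tools[tool_name](arg)

# Stub "LLM" that picks a tool from keywords in the prompt.
def stub_llm(prompt, tool_names):
    if "weather" in prompt:
        return "get_weather", "Paris"
    return "get_booking", "R123"

# 3. Initialize the agent with the LLM and the tools.
agent = ReActAgent(stub_llm, TOOLS)
print(agent.run("what is the weather today?"))
```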
Question 14
A Generative AI Engineer is building a RAG application that will rely on context retrieved from source documents that are currently in HTML format. They want to develop a solution using the fewest lines of code.
Which Python package should be used to extract the text from the source documents?
A. pytesseract
B. numpy
C. pypdf2
D. beautifulsoup
Show Answer
Correct Answer: D
Explanation:
The source documents are in HTML format, and BeautifulSoup is specifically designed to parse HTML/XML and extract text with minimal code. It can easily strip tags and return clean text content. The other options are for OCR (pytesseract), numerical computing (numpy), or PDF parsing (pypdf2), which are not suitable for HTML text extraction.
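The "minimal code" claim is easy to demonstrate: parsing and text extraction with BeautifulSoup is essentially two lines. The sample document below is made up.

```python
# BeautifulSoup text extraction from HTML in two lines.
from bs4 import BeautifulSoup

html = "<html><body><h1>Release notes</h1><p>Bug fixes and updates.</p></body></html>"
text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
print(text)
```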
Question 74
A Generative AI Engineer wants to build an LLM-based solution to help a restaurant improve its online customer experience with bookings by automatically handling common customer inquiries. The goal of the solution is to minimize escalations to human intervention and phone calls while maintaining a personalized interaction. To design the solution, the Generative AI Engineer needs to define the input data to the LLM and the task it should perform.
Which input/output pair will support their goal?
A. Input: Online chat logs; Output: Group the chat logs by users, followed by summarizing each user’s interactions
B. Input: Online chat logs; Output: Buttons that represent choices for booking details
C. Input: Customer reviews; Output: Classify review sentiment
D. Input: Online chat logs; Output: Cancellation options
Show Answer
Correct Answer: B
Explanation:
The goal is to automatically handle common booking inquiries, reduce human escalation, and keep interactions personalized. Using online chat logs as input and generating interactive buttons for booking details enables the LLM to guide customers through reservations in a conversational, self-service way. This directly supports task completion (bookings), minimizes phone calls, and still feels personalized. The other options focus on analysis or narrow tasks rather than actively handling customer inquiries.
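The chat-log-to-buttons design of option B can be sketched as a function from the latest chat turn to a list of button labels the UI renders. The intent rules and slot choices below are toy assumptions for illustration.

```python
# Sketch: infer the next booking step from the chat log and emit button
# choices, keeping the interaction self-service and structured.

def next_buttons(chat_log: list[str]) -> list[str]:
    last = chat_log[-1].lower()
    if "book" in last and "table" in last:
        return ["Today 7pm", "Today 8pm", "Tomorrow 7pm"]  # time slots
    if "cancel" in last:
        return ["Cancel booking", "Keep booking"]
    return ["Make a booking", "Ask a question"]             # default menu

print(next_buttons(["Hi!", "I'd like to book a table for two."]))
```

Structured choices like these keep the LLM's output constrained to actions the booking system can actually execute, which is what minimizes escalations.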