
DAS-C01 Online Practice Questions and Answers

Question 4

Three teams of data analysts use Apache Hive on an Amazon EMR cluster with the EMR File System (EMRFS) to query data stored within each team's Amazon S3 bucket. The EMR cluster has Kerberos enabled and is configured to authenticate users from the corporate Active Directory. The data is highly sensitive, so access must be limited to the members of each team.

Which steps will satisfy the security requirements?

A. For the EMR cluster Amazon EC2 instances, create a service role that grants no access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the additional IAM roles to the cluster's EMR role for the EC2 trust policy. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.

B. For the EMR cluster Amazon EC2 instances, create a service role that grants no access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the additional IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.

C. For the EMR cluster Amazon EC2 instances, create a service role that grants full access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the additional IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.

D. For the EMR cluster Amazon EC2 instances, create a service role that grants full access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the base IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
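The options that map per-team IAM roles to Active Directory groups rely on EMRFS role mappings inside an EMR security configuration. Below is a minimal boto3 sketch of what such a configuration might look like; the role ARNs, group names, and configuration name are hypothetical placeholders, not values from the question.

```python
import json
import boto3

emr = boto3.client("emr")

# Hypothetical mapping: each team's Active Directory group is mapped to an IAM
# role that only grants access to that team's S3 bucket (ARNs are placeholders).
security_config = {
    "AuthorizationConfiguration": {
        "EmrFsConfiguration": {
            "RoleMappings": [
                {
                    "Role": "arn:aws:iam::111122223333:role/team-a-s3-access",
                    "IdentifierType": "Group",
                    "Identifiers": ["team-a"],
                },
                {
                    "Role": "arn:aws:iam::111122223333:role/team-b-s3-access",
                    "IdentifierType": "Group",
                    "Identifiers": ["team-b"],
                },
            ]
        }
    }
}

emr.create_security_configuration(
    Name="per-team-emrfs-roles",
    SecurityConfiguration=json.dumps(security_config),
)
```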

Question 5

A mortgage company has a microservice for accepting payments. This microservice uses the Amazon DynamoDB encryption client with AWS KMS managed keys to encrypt the sensitive data before writing the data to DynamoDB. The finance team should be able to load this data into Amazon Redshift and aggregate the values within the sensitive fields. The Amazon Redshift cluster is shared with other data analysts from different business units.

Which steps should a data analyst take to accomplish this task efficiently and securely?

A. Create an AWS Lambda function to process the DynamoDB stream. Decrypt the sensitive data using the same KMS key. Save the output to a restricted S3 bucket for the finance team. Create a finance table in Amazon Redshift that is accessible to the finance team only. Use the COPY command to load the data from Amazon S3 to the finance table.

B. Create an AWS Lambda function to process the DynamoDB stream. Save the output to a restricted S3 bucket for the finance team. Create a finance table in Amazon Redshift that is accessible to the finance team only. Use the COPY command with the IAM role that has access to the KMS key to load the data from S3 to the finance table.

C. Create an Amazon EMR cluster with an EMR_EC2_DefaultRole role that has access to the KMS key. Create Apache Hive tables that reference the data stored in DynamoDB and the finance table in Amazon Redshift. In Hive, select the data from DynamoDB and then insert the output to the finance table in Amazon Redshift.

D. Create an Amazon EMR cluster. Create Apache Hive tables that reference the data stored in DynamoDB. Insert the output to the restricted Amazon S3 bucket for the finance team. Use the COPY command with the IAM role that has access to the KMS key to load the data from Amazon S3 to the finance table in Amazon Redshift.
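Several options end with the Redshift COPY command loading the restricted S3 data under an IAM role. A minimal sketch of issuing such a COPY through the Redshift Data API is shown below; the cluster, database, table, bucket, and role ARN are hypothetical.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Placeholder identifiers for the finance team's restricted resources.
copy_sql = """
COPY finance.payments
FROM 's3://finance-restricted-bucket/exported/'
IAM_ROLE 'arn:aws:iam::111122223333:role/finance-copy-role'
FORMAT AS JSON 'auto';
"""

redshift_data.execute_statement(
    ClusterIdentifier="shared-analytics-cluster",
    Database="analytics",
    DbUser="finance_loader",
    Sql=copy_sql,
)
```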

Question 6

A company has collected more than 100 TB of log files in the last 24 months. The files are stored as raw text in a dedicated Amazon S3 bucket. Each object has a key of the form year-month-day_log_HHmmss.txt where HHmmss represents the time the log file was initially created. A table was created in Amazon Athena that points to the S3 bucket. One-time queries are run against a subset of columns in the table several times an hour.

A data analyst must make changes to reduce the cost of running these queries. Management wants a solution with minimal maintenance overhead.

Which combination of steps should the data analyst take to meet these requirements? (Choose three.)

A. Convert the log files to Apache Avro format.

B. Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.

C. Convert the log files to Apache Parquet format.

D. Add a key prefix of the form year-month-day/ to the S3 objects to partition the data.

E. Drop and recreate the table with the PARTITIONED BY clause. Run the ALTER TABLE ADD PARTITION statement.

F. Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement.
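For reference, the partitioning steps named in the options correspond to an Athena DDL statement with PARTITIONED BY followed by MSCK REPAIR TABLE. The sketch below shows one way to submit those statements with boto3; the database, table, column, and bucket names are illustrative assumptions, not values from the question.

```python
import boto3

athena = boto3.client("athena")

def run_query(sql: str) -> str:
    """Submit a statement to Athena; bucket and database names are placeholders."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "logs_db"},
        ResultConfiguration={"OutputLocation": "s3://query-results-bucket/athena/"},
    )
    return response["QueryExecutionId"]

# Recreate the table partitioned by date over columnar (Parquet) data.
run_query("""
CREATE EXTERNAL TABLE logs_parquet (
    message string
)
PARTITIONED BY (`date` string)
STORED AS PARQUET
LOCATION 's3://log-bucket/parquet/';
""")

# Discover partitions laid out under date=year-month-day/ prefixes.
run_query("MSCK REPAIR TABLE logs_parquet;")
```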

Question 7

A retail company stores order invoices in an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster. Indices on the cluster are created monthly. Once a new month begins, no new writes are made to any of the indices from the previous months. The company has been expanding the storage on the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster to avoid running out of space, but the company wants to reduce costs. Most searches on the cluster are on the most recent 3 months of data, while the audit team requires infrequent access to older data to generate periodic reports. The most recent 3 months of data must be quickly available for queries, but the audit team can tolerate slower queries if the solution saves on cluster costs.

Which of the following is the MOST operationally efficient solution to meet these requirements?

A. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to store the indices in Amazon S3 Glacier. When the audit team requires the archived data, restore the archived indices back to the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.

B. Archive indices that are older than 3 months by taking manual snapshots and storing the snapshots in Amazon S3. When the audit team requires the archived data, restore the archived indices back to the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.

C. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to migrate the indices to Amazon OpenSearch Service (Amazon Elasticsearch Service) UltraWarm storage.

D. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to migrate the indices to Amazon OpenSearch Service (Amazon Elasticsearch Service) UltraWarm storage. When the audit team requires the older data, migrate the indices in UltraWarm storage back to hot storage.
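The UltraWarm options hinge on an Index State Management (ISM) policy that migrates indices once they reach a given age. Below is a rough sketch of such a policy document; the policy description, index pattern, and 90-day threshold are illustrative assumptions, and the JSON would be sent to the cluster's ISM policies endpoint (for example, PUT _plugins/_ism/policies/<policy-name>).

```python
import json

# Sketch of an ISM policy that keeps indices hot for 3 months and then
# migrates them to UltraWarm storage.
ism_policy = {
    "policy": {
        "description": "Move monthly invoice indices to UltraWarm after 90 days",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [
                    {"state_name": "warm", "conditions": {"min_index_age": "90d"}}
                ],
            },
            {
                "name": "warm",
                "actions": [{"warm_migration": {}}],
                "transitions": [],
            },
        ],
        "ism_template": {"index_patterns": ["invoices-*"], "priority": 100},
    }
}

print(json.dumps(ism_policy, indent=2))
```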

Question 8

A financial company uses Amazon Athena to query data from an Amazon S3 data lake. Files are stored in the S3 data lake in Apache ORC format. Data analysts recently introduced nested fields in the data lake ORC files and noticed that queries are taking longer to run in Athena. A data analyst discovered that more data than required is being scanned for the queries.

What is the MOST operationally efficient solution to improve query performance?

A. Flatten nested data and create separate files for each nested dataset.

B. Use the Athena query engine V2 and push the query filter to the source ORC file.

C. Use Apache Parquet format instead of ORC format.

D. Recreate the data partition strategy and further narrow down the data filter criteria.
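The options that involve rewriting the data (flattening it or changing its format) are commonly done in place with an Athena CTAS statement. A minimal sketch, assuming hypothetical database, table, and bucket names:

```python
import boto3

athena = boto3.client("athena")

# Hypothetical CTAS that rewrites an ORC table as Parquet; all names are
# placeholders, not values from the question.
athena.start_query_execution(
    QueryString="""
    CREATE TABLE datalake_parquet
    WITH (
        format = 'PARQUET',
        external_location = 's3://data-lake-bucket/datalake_parquet/'
    ) AS
    SELECT * FROM datalake_orc;
    """,
    QueryExecutionContext={"Database": "finance_db"},
    ResultConfiguration={"OutputLocation": "s3://query-results-bucket/athena/"},
)
```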

Question 9

A telecommunications company needs to send customer call data records from its on-premises database to AWS to generate near-real-time insights. The solution captures and loads continuously changing updates from the operational data stores that run in PostgreSQL databases. A data analyst has configured an AWS Database Migration Service (AWS DMS) ongoing replication task to read changes in near-real time from the PostgreSQL source database transaction logs for each table and send the data to an Amazon Redshift cluster for further processing.

The data analytics team has reported latency issues during the change data capture (CDC) of the AWS DMS task. The team thinks that the PostgreSQL databases are causing the high latency.

Which set of actions will confirm that the PostgreSQL databases are the source of high latency?

A. Enable Amazon CloudWatch for the AWS DMS task and look for the CDCIncomingChanges metric to identify delays in capturing the changes from the source database.

B. Verify that logical replication is configured for the source database using the postgresql.conf configuration file.

C. Enable Amazon CloudWatch Logs for the AWS DMS endpoint of the source database and check for error messages.

D. Enable Amazon CloudWatch for the AWS DMS task and look for the CDCLatencySource metric to identify delays in capturing the changes from the source database.
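Options A and D differ only in which DMS CloudWatch metric is inspected. A minimal sketch of pulling a task metric such as CDCLatencySource with boto3 is shown below; the replication task and instance identifiers are placeholders.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Retrieve source-capture latency for the last hour (identifiers are placeholders).
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/DMS",
    MetricName="CDCLatencySource",
    Dimensions=[
        {"Name": "ReplicationTaskIdentifier", "Value": "postgres-to-redshift-task"},
        {"Name": "ReplicationInstanceIdentifier", "Value": "dms-replication-instance"},
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```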

Question 10

A company's marketing and finance departments are storing data in Amazon S3 in their respective AWS accounts managed by AWS Organizations. Both departments use AWS Lake Formation to catalog and secure their data in Amazon S3. The finance department needs to share some tables with the marketing department for reporting purposes.

Which steps are required to complete this process? (Choose two.)

A. The finance department grants Lake Formation permissions for the shared tables to the marketing department's AWS account.

B. The finance department creates a cross-account IAM role with permission to access the shared tables.

C. Users from the marketing department account assume a cross-account IAM role in the finance department account that has permission to access the shared tables.

D. The marketing department creates a resource link to access the shared tables from the finance department.

E. The finance department creates and shares a resource link with the marketing department to access the shared tables.
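Several options describe the Lake Formation cross-account pattern: a grant from the producing account plus a resource link in the consuming account. A minimal sketch of the grant side with boto3 is shown below; the account ID, database, and table names are hypothetical.

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Hypothetical grant from the finance account: SELECT on a catalog table is
# granted to the marketing department's AWS account (placeholder values).
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "222233334444"},
    Resource={
        "Table": {
            "DatabaseName": "finance_db",
            "Name": "quarterly_results",
        }
    },
    Permissions=["SELECT"],
)
```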

Question 11

A company has customer data in CSV format. The company stores the data in Amazon S3 and catalogs the data in an AWS Glue Data Catalog. The company has an Amazon Redshift cluster that contains historical call center data. The cluster has a heavy load, and the company does not want to load any new data into the cluster.

Data analysts want to JOIN the customer data that is in Amazon S3 with the historical call center data that is in Amazon Redshift. The data analysts will use a daily batch process that takes multiple hours to run.

Which solution will meet these requirements with the LEAST operational overhead?

A. Unload the historical call center data from Amazon Redshift to Amazon S3 by using an AWS Lambda function. Perform the JOIN with the customer data that resides in Amazon S3 by using AWS Glue ETL scripts.

B. Export the historical call center data from Amazon Redshift to an Amazon EC2 instance by using the AWS CLI. Perform the JOIN with the customer data that resides in Amazon S3 by using AWS Glue ETL scripts.

C. Create an external table by using Amazon Redshift Spectrum for the customer data that resides in Amazon S3. Perform the JOIN with the historical call center data by using Amazon Redshift.

D. Export the historical call center data from Amazon Redshift to an Amazon EMR cluster by using Apache Sqoop. Perform the JOIN with the customer data that resides in Amazon S3 by using Apache Hive.
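The Redshift Spectrum approach maps the Glue Data Catalog database that holds the S3 customer data to an external schema and joins it against the local table. A minimal sketch via the Redshift Data API is below; the cluster, database, schema, role ARN, and table names are hypothetical.

```python
import boto3

redshift_data = boto3.client("redshift-data")

def run(sql: str) -> None:
    """Submit a statement through the Redshift Data API (placeholders below)."""
    redshift_data.execute_statement(
        ClusterIdentifier="call-center-cluster",
        Database="analytics",
        DbUser="analyst",
        Sql=sql,
    )

# Expose the S3 customer data cataloged in AWS Glue as an external schema.
run("""
CREATE EXTERNAL SCHEMA IF NOT EXISTS customer_ext
FROM DATA CATALOG
DATABASE 'customer_catalog_db'
REGION 'us-east-1'
IAM_ROLE 'arn:aws:iam::111122223333:role/spectrum-role';
""")

# Join the external (S3) customer table with the local call center history.
run("""
SELECT c.customer_id, h.call_duration
FROM customer_ext.customers c
JOIN call_center_history h ON h.customer_id = c.customer_id;
""")
```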

Question 12

A company stores financial performance records of its various portfolios in CSV format in Amazon S3. A data analytics specialist needs to make this data accessible in the AWS Glue Data Catalog for the company's data analysts. The data analytics specialist creates an AWS Glue crawler in the AWS Glue console.

What must the data analytics specialist do next to make the data accessible for the data analysts?

A. Create an IAM role that includes the AWSGlueExecutionRole policy. Associate the role with the crawler. Specify the S3 path of the source data as the crawler's data store. Create a schedule to run the crawler. Point to the S3 path for the output.

B. Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 path of the source data as the crawler's data store. Create a schedule to run the crawler. Specify a database name for the output.

C. Create an IAM role that includes the AWSGlueExecutionRole policy. Associate the role with the crawler. Specify the S3 path of the source data as the crawler's data store. Allocate data processing units (DPUs) to run the crawler. Specify a database name for the output.

D. Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 path of the source data as the crawler's data store. Allocate data processing units (DPUs) to run the crawler. Point to the S3 path for the output.
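For reference, the crawler setup the options describe (role, S3 data store, schedule, output database) maps directly onto the Glue CreateCrawler API. A minimal boto3 sketch with hypothetical names follows.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical crawler: the role would carry the AWSGlueServiceRole managed
# policy, the S3 path is the data store, the schedule runs daily, and a
# database name is specified for the catalog output (all names are placeholders).
glue.create_crawler(
    Name="portfolio-csv-crawler",
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",
    DatabaseName="portfolio_db",
    Targets={"S3Targets": [{"Path": "s3://portfolio-records-bucket/csv/"}]},
    Schedule="cron(0 2 * * ? *)",
)

glue.start_crawler(Name="portfolio-csv-crawler")
```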

Question 13

A marketing company has an application that stores event data in an Amazon RDS database. The company is replicating this data to Amazon Redshift for reporting and business intelligence (BI) purposes. New event data is continuously generated and ingested into the RDS database throughout the day and captured by a change data capture (CDC) replication task in AWS Database Migration Service (AWS DMS). The company requires that the new data be replicated to Amazon Redshift in near-real time.

Which solution meets these requirements?

A. Use Amazon Kinesis Data Streams as the destination of the CDC replication task in AWS DMS. Use an AWS Glue streaming job to read changed records from Kinesis Data Streams and perform an upsert into the Redshift cluster.

B. Use Amazon S3 as the destination of the CDC replication task in AWS DMS. Use the COPY command to load data into the Redshift cluster.

C. Use Amazon DynamoDB as the destination of the CDC replication task in AWS DMS. Use the COPY command to load data into the Redshift cluster.

D. Use Amazon Kinesis Data Firehose as the destination of the CDC replication task in AWS DMS. Use an AWS Glue streaming job to read changed records from Kinesis Data Firehose and perform an upsert into the Redshift cluster.
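The streaming options route the DMS CDC output to a target endpoint. A minimal sketch of defining a Kinesis Data Streams target endpoint for DMS with boto3 is below; the endpoint identifier, stream ARN, and service role ARN are placeholders.

```python
import boto3

dms = boto3.client("dms")

# Sketch of a Kinesis target endpoint for the CDC task (ARNs are placeholders).
dms.create_endpoint(
    EndpointIdentifier="rds-cdc-to-kinesis",
    EndpointType="target",
    EngineName="kinesis",
    KinesisSettings={
        "StreamArn": "arn:aws:kinesis:us-east-1:111122223333:stream/event-cdc-stream",
        "MessageFormat": "json",
        "ServiceAccessRoleArn": "arn:aws:iam::111122223333:role/dms-kinesis-role",
    },
)
```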

Exam Code: DAS-C01
Exam Name: AWS Certified Data Analytics - Specialty (DAS-C01)
Last Update: Mar 20, 2024
Questions: 285