Use Comprehend with Aurora

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. No machine learning experience is required. This lab walks you through integrating Aurora with the Comprehend sentiment analysis API and running sentiment analysis inferences with SQL commands.

This lab contains the following tasks:

  1. Create an IAM role to allow Aurora to interface with Comprehend
  2. Associate the IAM role with the Aurora DB cluster
  3. Add the IAM role to the DB cluster parameter group and apply it
  4. Run Comprehend inferences from Aurora

This lab requires the following prerequisites:

1. Create an IAM role to allow Aurora to interface with Comprehend

If you are not already connected to the Session Manager workstation, connect by following these instructions. Once connected, run the commands below, which create an IAM role and attach an inline access policy to it.

aws iam create-role --role-name auroralab-comprehend-access \
--assume-role-policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"rds.amazonaws.com\"},\"Action\":\"sts:AssumeRole\"}]}"

aws iam put-role-policy --role-name auroralab-comprehend-access --policy-name inline-policy \
--policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Action\":[\"comprehend:DetectSentiment\",\"comprehend:BatchDetectSentiment\"],\"Resource\":\"*\"}]}"
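The escaped JSON in the two commands above is hard to read and easy to mistype. As a minimal sketch of an alternative, the same two policy documents can be written to files and passed to the AWS CLI with the file:// prefix. The file names trust-policy.json and comprehend-policy.json and the helper create_comprehend_role are assumptions made for illustration; they are not part of the lab. The helper is only defined here, not called, so nothing is created until you invoke it.

```shell
# Trust policy: allows the RDS service to assume the role.
# (File name is an assumption for this sketch.)
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "rds.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Access policy: allows the two Comprehend sentiment actions used in this lab.
cat > comprehend-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "comprehend:DetectSentiment",
        "comprehend:BatchDetectSentiment"
      ],
      "Resource": "*"
    }
  ]
}
EOF

# Hypothetical helper: creates the role and attaches the inline policy
# from the files written above. Defined only; call it to apply the changes.
create_comprehend_role() {
  aws iam create-role --role-name auroralab-comprehend-access \
    --assume-role-policy-document file://trust-policy.json
  aws iam put-role-policy --role-name auroralab-comprehend-access \
    --policy-name inline-policy \
    --policy-document file://comprehend-policy.json
}
```

The policy content is the same as in the lab commands; only the way it is passed to the CLI differs.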

2. Associate the IAM role with the Aurora DB cluster

Associate the role with the DB cluster using the following command:

aws rds add-role-to-db-cluster --db-cluster-identifier auroralab-mysql-cluster \
--role-arn $(aws iam list-roles --query 'Roles[?RoleName==`auroralab-comprehend-access`].Arn' --output text)

Run the following command, repeating it until the output shows available, before moving on to the next step.

aws rds describe-db-clusters --db-cluster-identifier auroralab-mysql-cluster \
--query 'DBClusters[*].[Status]' --output text
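Instead of re-running the status command by hand, the check can be wrapped in a small polling loop. The following is a sketch only: the function names get_cluster_status and wait_until_available, the 15-second interval, and the limit of 40 attempts are assumptions chosen for illustration, not values prescribed by the lab.

```shell
# Hypothetical helper: print the current status of the given DB cluster.
get_cluster_status() {
  aws rds describe-db-clusters --db-cluster-identifier "$1" \
    --query 'DBClusters[0].Status' --output text
}

# Hypothetical helper: poll until the cluster status is "available".
# $1 = cluster identifier, $2 = seconds between polls (default 15),
# $3 = maximum number of attempts (default 40).
# Returns 0 once available, 1 if the attempts run out.
wait_until_available() {
  cluster_id="$1"
  interval="${2:-15}"
  max_attempts="${3:-40}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status="$(get_cluster_status "$cluster_id")"
    echo "attempt $attempt: status=$status"
    if [ "$status" = "available" ]; then
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  return 1
}
```

With these definitions loaded in the shell, wait_until_available auroralab-mysql-cluster would block until the cluster reports available or the attempts are exhausted.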


3. Add the IAM role to the DB cluster parameter group and apply it

Set the aws_default_comprehend_role cluster-level parameter to the ARN of the IAM role we created in the first step of this lab. Run the following command:

aws rds modify-db-cluster-parameter-group \
--db-cluster-parameter-group-name $DBCLUSTERPG \
--parameters "ParameterName=aws_default_comprehend_role,ParameterValue=$(aws iam list-roles --query 'Roles[?RoleName==`auroralab-comprehend-access`].Arn' --output text),ApplyMethod=pending-reboot"
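Before rebooting, it can be useful to confirm that the parameter group now holds the expected role ARN. The sketch below reads the value back with aws rds describe-db-cluster-parameters. The helper names get_comprehend_role_param and verify_comprehend_role_param are hypothetical, and the check only looks at the value stored in the parameter group; it does not prove that the cluster has applied it yet.

```shell
# Hypothetical helper: print the aws_default_comprehend_role value stored
# in the given DB cluster parameter group.
get_comprehend_role_param() {
  aws rds describe-db-cluster-parameters \
    --db-cluster-parameter-group-name "$1" \
    --query "Parameters[?ParameterName=='aws_default_comprehend_role'].ParameterValue" \
    --output text
}

# Hypothetical helper: succeed only if the stored value equals the expected ARN.
# $1 = parameter group name, $2 = expected role ARN.
verify_comprehend_role_param() {
  actual="$(get_comprehend_role_param "$1")"
  if [ "$actual" = "$2" ]; then
    echo "aws_default_comprehend_role is set to $actual"
  else
    echo "unexpected aws_default_comprehend_role value: '$actual'" >&2
    return 1
  fi
}
```

In this lab the first argument would be $DBCLUSTERPG and the second the ARN of the auroralab-comprehend-access role.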

Reboot the DB cluster for the change to take effect. To minimize downtime, use the manual failover process to trigger the reboot:

aws rds failover-db-cluster --db-cluster-identifier auroralab-mysql-cluster

Run the following command, repeating it until the output shows available, before moving on to the next step:

aws rds describe-db-clusters --db-cluster-identifier auroralab-mysql-cluster \
--query 'DBClusters[*].[Status]' --output text


4. Run Comprehend inferences from Aurora

Run the command below to connect to the database, replacing the [clusterEndpoint] placeholder with the cluster endpoint of your DB cluster:

mysql -h[clusterEndpoint] -u$DBUSER -p"$DBPASS" mltest
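If you would rather not paste the endpoint by hand, it can be looked up with the AWS CLI and passed straight to the mysql client. This is a sketch under assumptions: the helper names get_cluster_endpoint and connect_to_cluster are hypothetical, and DBUSER and DBPASS are expected to be set in the workstation environment, as in the command above.

```shell
# Hypothetical helper: print the writer (cluster) endpoint of a DB cluster.
get_cluster_endpoint() {
  aws rds describe-db-clusters --db-cluster-identifier "$1" \
    --query 'DBClusters[0].Endpoint' --output text
}

# Hypothetical helper: look up the endpoint, then open a mysql session
# on the mltest schema with the lab credentials.
connect_to_cluster() {
  endpoint="$(get_cluster_endpoint "$1")"
  mysql -h"$endpoint" -u"$DBUSER" -p"$DBPASS" mltest
}
```

For this lab, the call would be connect_to_cluster auroralab-mysql-cluster.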

Aurora has a built-in Comprehend function, aws_comprehend_detect_sentiment, which calls the Comprehend service. It passes the function's inputs, in this case the values of the comment_text column in the comments table, to Comprehend and retrieves the sentiment analysis results.

Run the following SQL query to run sentiment analysis on the comments table.

SELECT comment_text,
aws_comprehend_detect_sentiment(comment_text, 'en') AS sentiment,
aws_comprehend_detect_sentiment_confidence(comment_text, 'en') AS confidence
FROM comments;

You should see a result like the one shown in the screenshot below. Observe the sentiment and confidence columns. Together, these two columns give the inferred sentiment for the text in the comment_text column and the confidence score of that inference.

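The per-row query above can also be summarized. As a rough sketch, the following helper runs a grouped variant of the same query from the workstation shell, counting comments per sentiment and averaging the confidence score. The helper name run_sentiment_summary and the derived-table alias scored are assumptions for illustration, and note that every row still triggers calls to Comprehend, so the cost grows with the size of the table.

```shell
# Hypothetical helper: summarize sentiment across the comments table.
# $1 = cluster endpoint; DBUSER and DBPASS are assumed to be set, as in the lab.
run_sentiment_summary() {
  mysql -h"$1" -u"$DBUSER" -p"$DBPASS" mltest <<'SQL'
SELECT sentiment,
       COUNT(*) AS comments,
       ROUND(AVG(confidence), 3) AS avg_confidence
FROM (
  -- Same two Comprehend functions as the lab query, one call per row.
  SELECT aws_comprehend_detect_sentiment(comment_text, 'en') AS sentiment,
         aws_comprehend_detect_sentiment_confidence(comment_text, 'en') AS confidence
  FROM comments
) AS scored
GROUP BY sentiment;
SQL
}
```

Comprehend reports sentiment as one of POSITIVE, NEGATIVE, NEUTRAL, or MIXED, so the summary contains at most four rows.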

Disconnect from the DB cluster, using:

quit;