Created the Global Tables module #48

Open · wants to merge 3 commits into base: newlabs
55 changes: 55 additions & 0 deletions content/design-patterns/ex9globaltables/Step1.en.md
@@ -0,0 +1,55 @@
+++
title = "Step 1 - Create the recommendations table as a global table"
date = 2019-12-02T10:50:03-08:00
weight = 1
+++


Run the following AWS CLI command to create the `recommendations` table in US West (Oregon).
```bash
aws dynamodb create-table --table-name recommendations \
--attribute-definitions AttributeName=customer_id,AttributeType=S AttributeName=category_id,AttributeType=S \
--key-schema AttributeName=customer_id,KeyType=HASH AttributeName=category_id,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST \
--stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES \
--region us-west-2 \
--tags Key=workshop-design-patterns,Value=targeted-for-cleanup
```
Next, add a replica in US East (N. Virginia). The following `update-table` command creates an identical `recommendations` replica table in that Region.
```bash
aws dynamodb update-table --table-name recommendations --region us-west-2 --cli-input-json \
'{
"ReplicaUpdates":
[
{
"Create": {
"RegionName": "us-east-1"
}
}
]
}'
```
Run the following command to wait until the replica table in US East (N. Virginia) becomes active.
```bash
aws dynamodb wait table-exists --table-name recommendations --region us-east-1
```
You can view the list of replicas by using the `describe-table` command.
```bash
aws dynamodb describe-table --table-name recommendations --region us-west-2
```
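If you prefer to check replica status programmatically, here is a minimal boto3 sketch (the helper names are ours; it assumes the `Replicas` list that `DescribeTable` returns for a version 2019.11.21 global table, and the demo function requires AWS credentials):

```python
def replica_statuses(describe_response):
    """Extract (Region, status) pairs from a DescribeTable response."""
    replicas = describe_response["Table"].get("Replicas", [])
    return [(r["RegionName"], r["ReplicaStatus"]) for r in replicas]

def print_replica_statuses(table_name="recommendations"):
    """Call DescribeTable in us-west-2 and print each replica's status."""
    import boto3  # requires AWS credentials
    client = boto3.client("dynamodb", region_name="us-west-2")
    response = client.describe_table(TableName=table_name)
    for region, status in replica_statuses(response):
        print(region, status)
```

Once the replica is active, calling `print_replica_statuses()` from a configured environment should print something like `us-east-1 ACTIVE`.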
Let's take a closer look at the `create-table` command. You are creating a table named `recommendations`. The partition key on the table is `customer_id`. The sort key is `category_id`, which contains the movie genre, such as Drama or Comedy.

#### Table: `recommendations`

- Key schema: HASH, RANGE (partition and sort key)
- Table is created in on-demand capacity mode

| Attribute Name (Type) | Special Attribute? | Attribute Use Case | Sample Attribute Value |
| ------------- |:-------------:|:-------------:| -----:|
| customer_id (STRING) | Partition Key | Customer ID | `1` |
| category_id (STRING) | Sort Key | Category ID | `Drama` |
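The same table definition can also be built with boto3. A minimal sketch, mirroring the CLI flags above (the helper names are ours; the demo function requires AWS credentials):

```python
def create_table_params(table_name):
    """Build CreateTable parameters matching the CLI command in Step 1."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "category_id", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "category_id", "KeyType": "RANGE"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
        "Tags": [{"Key": "workshop-design-patterns", "Value": "targeted-for-cleanup"}],
    }

def create_recommendations_table():
    """Create the table in US West (Oregon)."""
    import boto3  # requires AWS credentials
    client = boto3.client("dynamodb", region_name="us-west-2")
    return client.create_table(**create_table_params("recommendations"))
```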

Review the `recommendations` table in the DynamoDB console (as shown in the following screenshot) by choosing the **recommendations** table and then choosing the **Global tables** tab.

![Recommendations table](/images/awsconsole9a.png)

70 changes: 70 additions & 0 deletions content/design-patterns/ex9globaltables/Step2.en.md
@@ -0,0 +1,70 @@
+++
title = "Step 2 - Load data into the global table and query the replica"
date = 2019-12-02T10:50:03-08:00
weight = 2
+++


Insert a new item to the `recommendations` table in US West (Oregon).

```bash
aws dynamodb put-item \
--table-name recommendations \
--item '{"customer_id": {"S":"99"},"category_id": {"S":"Drama"}}' \
--region us-west-2
```
Wait for a second, and then query the replica in US East (N. Virginia).

```bash
aws dynamodb get-item \
--table-name recommendations \
--key '{"customer_id": {"S":"99"},"category_id": {"S":"Drama"}}' \
--region us-east-1
```
Now, run the script that sequentially writes items to the local Region and queries the remote Region, measuring the replication time. It does this for each of 10 items.

```bash
python load_recommendations_sequentially.py recommendations ./data/recommendations.csv
```

The sample `recommendations.csv` record looks like the following:
```csv
001,Drama, Argo
```
In addition to the `customer_id` and `category_id`, each record now includes a movie title. The script reads each record from the CSV file and puts the item into the DynamoDB table in the US West (Oregon) Region. Immediately afterward, it queries the replica table in the US East (N. Virginia) Region for that key, which returns an empty result. It waits for a second and tries again; this time the replica returns the item for the newly inserted customer ID. The following output shows this pattern for a few items.
Output:
```txt
88e9fe579ead:design-patterns ssarma$ python load_recommendations_sequentially.py recommendations ./data/recommendations.csv
[]
Current time: 1611813327.91749
[{'category_id': 'Drama', 'customer_id': '001', 'title': ' Argo'}]
Current time: 1611813329.044519

[]
Current time: 1611813329.2009711
[{'category_id': 'Thriller', 'customer_id': '002', 'title': 'The Last Seven'}]
Current time: 1611813330.320935

[]
Current time: 1611813330.476702
[{'category_id': 'Comedy', 'customer_id': '003', 'title': "The Night They Raided Minsky's"}]
Current time: 1611813331.594492

[]
Current time: 1611813331.7503822
[{'category_id': 'Thriller', 'customer_id': '004', 'title': 'The Final Destination'}]
Current time: 1611813332.870115
```
The output confirms that all 10 items have been inserted into the table.

You can review the replication metrics for the `recommendations` table in the DynamoDB console (as shown in the following screenshot) by choosing the **recommendations** table and then choosing the **Monitor** tab.

![Recommendations table](/images/awsconsole9b.png)

Scroll down to the Latency section to see the Get, Put, and Query latency metrics.

![Recommendations table](/images/awsconsole9c.png)

You can use Amazon CloudWatch to monitor the behavior and performance of a global table. DynamoDB publishes the `ReplicationLatency` metric for each replica in the global table.

`ReplicationLatency` is the elapsed time between when an item is written to a replica table and when that item appears in another replica in the global table. It is expressed in milliseconds and is emitted for every source and destination Region pair.

During normal operation, `ReplicationLatency` should be fairly constant. An elevated value could indicate that updates from one replica are not propagating to the other replica tables in a timely manner. Over time, this could result in the other replica tables falling behind because they no longer receive updates consistently. In this case, verify that the read capacity units (RCUs) and write capacity units (WCUs) are identical for each of the replica tables. In addition, when choosing WCU settings, follow the recommendations in Best Practices and Requirements for Managing Capacity.

`ReplicationLatency` can also increase if an AWS Region becomes degraded and you have a replica table in that Region. In this case, you can temporarily redirect your application's read and write activity to a different AWS Region.
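As a sketch of pulling that metric yourself: `ReplicationLatency` lives in the `AWS/DynamoDB` CloudWatch namespace with `TableName` and `ReceivingRegion` dimensions. The parameter-builder helper below is ours, and the demo function requires AWS credentials.

```python
from datetime import datetime, timedelta, timezone

def replication_latency_query(table_name, receiving_region):
    """Build GetMetricStatistics parameters for the ReplicationLatency metric."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": "ReplicationLatency",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "ReceivingRegion", "Value": receiving_region},
        ],
        "StartTime": end - timedelta(minutes=30),
        "EndTime": end,
        "Period": 60,               # one datapoint per minute
        "Statistics": ["Average"],  # milliseconds
    }

def print_replication_latency(table_name="recommendations"):
    """Print recent average replication latency from Oregon to N. Virginia."""
    import boto3  # requires AWS credentials
    cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")
    stats = cloudwatch.get_metric_statistics(
        **replication_latency_query(table_name, "us-east-1"))
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Average"])
```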

For more information, see DynamoDB Metrics and Dimensions.
45 changes: 45 additions & 0 deletions content/design-patterns/ex9globaltables/Step3.en.md
@@ -0,0 +1,45 @@
+++
title = "Step 3 - Write to both regions and see the occasional conflict resolution"
date = 2019-12-02T10:50:04-08:00
weight = 3
+++

You can run the following `parallel` command to write to both Regions at the same time. After replication settles, the `region` attribute on each of the 10 items records which Region won the conflict resolution process.

```bash
parallel --jobs 2 < tasks.txt
```

The script should give you output that looks like the following.
```txt
88e9fe579ead:design-patterns ssarma$ parallel --jobs 2 < tasks.txt
[{'category_id': 'Drama', 'customer_id': '001', 'region': 'West', 'title': ' Argo'}]
Current time: 1611816863.0019908
[{'category_id': 'Drama', 'customer_id': '001', 'region': 'East', 'title': ' Argo'}]
Current time: 1611816864.047831

[{'category_id': 'Thriller', 'customer_id': '002', 'region': 'West', 'title': 'The Last Seven'}]
Current time: 1611816864.1282911
[{'category_id': 'Thriller', 'customer_id': '002', 'region': 'East', 'title': 'The Last Seven'}]
Current time: 1611816865.172729

[{'category_id': 'Comedy', 'customer_id': '003', 'region': 'West', 'title': "The Night They Raided Minsky's"}]
Current time: 1611816865.252855
[{'category_id': 'Comedy', 'customer_id': '003', 'region': 'West', 'title': "The Night They Raided Minsky's"}]
Current time: 1611816866.297246

[{'category_id': 'Thriller', 'customer_id': '004', 'region': 'West', 'title': 'The Final Destination'}]
Current time: 1611816866.377374
[{'category_id': 'Thriller', 'customer_id': '004', 'region': 'West', 'title': 'The Final Destination'}]
Current time: 1611816867.41737
```
You can review the transaction conflict errors metrics for the `recommendations` table in the DynamoDB console (as shown in the following screenshot) by choosing the **recommendations** table and then choosing the **Monitor** tab.

![Recommendations table](/images/awsconsole9b.png)

Scroll down to the Transactions section to see the Transaction conflict errors chart. The chart should say No data available, because DynamoDB performs the conflict resolution automatically: global tables reconcile concurrent writes with a last-writer-wins strategy rather than surfacing transaction conflicts.
![Recommendations table](/images/awsconsole9d.png)

#### Summary

Congratulations, you have completed this exercise and demonstrated how global tables replicate data across Regions and resolve conflicts. Use DynamoDB global tables to run applications that read and write in multiple AWS Regions. In the next exercise, you will learn how transactions work in DynamoDB.
20 changes: 20 additions & 0 deletions content/design-patterns/ex9globaltables/_index.en.md
@@ -0,0 +1,20 @@
+++
title = "Global Tables"
date = 2019-12-02T10:17:33-08:00
weight = 5
chapter = true
pre = "<b>Exercise 9: </b>"
description = "Explore how to create global tables and how the replication works across regions."
+++

A DynamoDB global table is a collection of one or more replica tables, one replica per Region, all owned by a single AWS account, that DynamoDB treats as a single unit. Every replica has the same table name and the same primary key schema, and stores the same set of data items. When an application writes data to a replica table in one Region, DynamoDB propagates the write to the replica tables in the other AWS Regions automatically. In a global table, a newly written item is usually propagated to all replica tables within a second. You can add replica tables to the global table so that it becomes available in additional Regions.

Use Version 2019.11.21 (Current) of global tables along with on-demand capacity. Using on-demand capacity ensures that you always have sufficient capacity to perform replicated writes to all Regions of the global table. The number of replicated write request units is equal in all Regions of the global table. For example, if you expect 10 writes per second to your replica table in N. Virginia, you should also expect to consume 10 replicated write request units in each of the other Regions of the global table.
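The capacity math above can be sketched as follows (a simplification that assumes items of 1 KB or less; larger items consume proportionally more write units):

```python
def replicated_write_units(writes_per_second, num_regions):
    """Every Region of the global table performs each replicated write,
    so each Region consumes the same write rate; the total consumed
    across the table scales with the number of Regions."""
    per_region = writes_per_second
    return {"per_region": per_region, "table_total": per_region * num_regions}

# 10 writes/second to a two-Region global table (Oregon + N. Virginia):
print(replicated_write_units(10, 2))  # {'per_region': 10, 'table_total': 20}
```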

When you use provisioned capacity mode, you manage your auto scaling policy with `UpdateTableReplicaAutoScaling`. Minimum and maximum throughput and target utilization are established globally for the table and passed to all replicas of the table. For details about auto scaling and DynamoDB, see Managing Throughput Capacity Automatically with DynamoDB Auto Scaling.

When you are using Version 2019.11.21 (Current) of global tables and you also use the Time to Live feature, DynamoDB replicates TTL deletes to all replica tables. The initial TTL delete does not consume write capacity in the Region in which the TTL expiry occurs. However, the replicated TTL delete consumes a replicated write capacity unit (provisioned capacity) or a replicated write request unit (on-demand capacity) in each of the replica Regions, and applicable charges apply.


Transactional operations provide atomicity, consistency, isolation, and durability (ACID) guarantees only within the Region where the write is originally made. Transactions are not supported across Regions in global tables. For example, if you have a global table with replicas in the US West (Oregon) and US East (N. Virginia) Regions and perform a TransactWriteItems operation in the US West (Oregon) Region, you may observe partially completed transactions in the US East (N. Virginia) Region as changes are replicated. Changes are replicated to other Regions only after they have been committed in the source Region.
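A minimal sketch of that single-Region scope (the builder helper is ours; the demo function issues `TransactWriteItems` in US West (Oregon) only and requires AWS credentials — replicas in other Regions receive the items through ordinary replication, not transactionally):

```python
def transact_put_request(table_name, items):
    """Build TransactWriteItems input for a batch of puts; the all-or-nothing
    guarantee holds only in the Region where the call is made."""
    return {
        "TransactItems": [
            {"Put": {"TableName": table_name, "Item": item}} for item in items
        ]
    }

def put_recommendations_transactionally(items):
    """Write a batch of recommendation items atomically in us-west-2."""
    import boto3  # requires AWS credentials
    client = boto3.client("dynamodb", region_name="us-west-2")
    return client.transact_write_items(**transact_put_request("recommendations", items))
```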


If the customer managed key (CMK) used to encrypt a replica is inaccessible, DynamoDB removes this replica from the replication group 20 hours after detecting the AWS KMS key as inaccessible. The replica is not deleted, but replication to and from this Region stops.
Similarly, if you disable an AWS Region, DynamoDB removes the replica in that Region from the replication group 20 hours after detecting the Region as inaccessible. The replica is not deleted, but replication to and from this Region stops.
3 changes: 2 additions & 1 deletion content/reference-materials/_index.en.md
@@ -26,5 +26,6 @@ DynamoDB Related Tools:
- **[EMR-DynamoDB-Connector: Access data stored in Amazon DynamoDB with Apache Hadoop, Apache Hive, and Apache Spark](https://github.com/awslabs/emr-dynamodb-connector)**

Online Training Courses:
- **[Linux Academy: Amazon DynamoDB Deep Dive](https://linuxacademy.com/course/dynamo-db-deep-dive/)**
- **[A Cloud Guru: Amazon DynamoDB Deep Dive](https://acloudguru.com/course/amazon-dynamodb-deep-dive/)**
- **[A Cloud Guru: Amazon DynamoDB Data Modeling](https://acloudguru.com/course/amazon-dynamodb-data-modeling/)**
- **[edX: Amazon DynamoDB: Building NoSQL Database-Driven Applications](https://www.edx.org/course/amazon-dynamodb-building-nosql-database-driven-app)**
3 changes: 3 additions & 0 deletions design-patterns/cloudformation/UserData.sh
@@ -114,6 +114,9 @@ function configure_python_and_install
yum install -y python36
alternatives --set python /usr/bin/python3.6

# This is used for the exercise with Global Tables
yum install -y parallel

log Installing workshop requirements.
/usr/bin/pip-3.6 install -r /home/ec2-user/workshop/requirements.txt

10 changes: 10 additions & 0 deletions design-patterns/data/recommendations.csv
@@ -0,0 +1,10 @@
001,Drama, Argo
002,Thriller,The Last Seven
003,Comedy,The Night They Raided Minsky's
004,Thriller,The Final Destination
005,Comedy,Page Miss Glory
006,Mystery,Sauna
007,Drama,The Last Kiss
008,Comedy,The Monster
009,Thriller,L: Change the World
010,Fantasy,Toys
61 changes: 61 additions & 0 deletions design-patterns/load_recommendations_sequentially.py
@@ -0,0 +1,61 @@
from __future__ import print_function # Python 2/3 compatibility
import boto3
import time
from boto3.dynamodb.conditions import Key, Attr
import csv
import sys
from lab_config import boto_args

def import_csv(tableName, fileName):
dynamodb = boto3.resource(**boto_args)
dynamodb_table = dynamodb.Table(tableName)
dynamodb_gt = boto3.resource('dynamodb', region_name='us-east-1')
global_table = dynamodb_gt.Table(tableName)
count = 0

time1 = time.time()
with open(fileName, 'r', encoding="utf-8") as csvfile:
myreader = csv.reader(csvfile, delimiter=',')
for row in myreader:
count += 1
newRecommendation = {}
#primary keys
newRecommendation['customer_id'] = row[0]
newRecommendation['category_id'] = row[1]
newRecommendation['title'] = row[2]

item = dynamodb_table.put_item(Item=newRecommendation)

response = global_table.query(
KeyConditionExpression=Key('customer_id').eq(row[0]) & Key('category_id').eq(row[1])
)

print(response['Items'])
print("Current time: %s" % time.time())

time.sleep(1)

response = global_table.query(
KeyConditionExpression=Key('customer_id').eq(row[0]) & Key('category_id').eq(row[1])
)

print(response['Items'])
print("Current time: %s\n" % time.time())

if count % 100 == 0:
time2 = time.time() - time1
print("recommendations count: %s in %s" % (count, time2))
time1 = time.time()
return count

if __name__ == "__main__":
args = sys.argv[1:]
tableName = args[0]
fileName = args[1]

begin_time = time.time()
count = import_csv(tableName, fileName)

# print summary
print('RowCount: %s, Total seconds: %s' %(count, (time.time() - begin_time)))

2 changes: 2 additions & 0 deletions design-patterns/tasks.txt
@@ -0,0 +1,2 @@
python write_recommendations_to_west.py recommendations ./data/recommendations.csv
python write_recommendations_to_east.py recommendations ./data/recommendations.csv
59 changes: 59 additions & 0 deletions design-patterns/write_recommendations_to_east.py
@@ -0,0 +1,59 @@
from __future__ import print_function # Python 2/3 compatibility
import boto3
import time
from boto3.dynamodb.conditions import Key, Attr
import csv
import sys
from lab_config import boto_args

def import_csv(tableName, fileName):
dynamodb_east = boto3.resource('dynamodb', region_name='us-east-1')
east_table = dynamodb_east.Table(tableName)
count = 0

time1 = time.time()
with open(fileName, 'r', encoding="utf-8") as csvfile:
myreader = csv.reader(csvfile, delimiter=',')
for row in myreader:
count += 1
newRecommendation = {}
#primary keys
newRecommendation['customer_id'] = row[0]
newRecommendation['category_id'] = row[1]
newRecommendation['title'] = row[2]

newRecommendation['region'] = 'East'
item = east_table.put_item(Item=newRecommendation)

response = east_table.query(
KeyConditionExpression=Key('customer_id').eq(row[0]) & Key('category_id').eq(row[1])
)

print(response['Items'])
print("Current time: %s" % time.time())

time.sleep(1)

response = east_table.query(
KeyConditionExpression=Key('customer_id').eq(row[0]) & Key('category_id').eq(row[1])
)

print(response['Items'])
print("Current time: %s\n" % time.time())

if count % 100 == 0:
time2 = time.time() - time1
print("recommendations count: %s in %s" % (count, time2))
time1 = time.time()
return count

if __name__ == "__main__":
args = sys.argv[1:]
tableName = args[0]
fileName = args[1]

begin_time = time.time()
count = import_csv(tableName, fileName)

# print summary
print('RowCount: %s, Total seconds: %s' %(count, (time.time() - begin_time)))