    Getting started with Apache Iceberg write support in Amazon Redshift – Part 2

    By Admin | April 16, 2026


    In Getting started with Apache Iceberg write support in Amazon Redshift – part 1, you learned how to create Apache Iceberg tables and write data directly from Amazon Redshift to your data lake. You set up external schemas, created tables in both Amazon Simple Storage Service (Amazon S3) and S3 Tables, and performed INSERT operations while maintaining ACID (Atomicity, Consistency, Isolation, Durability) compliance.

    Amazon Redshift now supports DELETE, UPDATE, and MERGE operations for Apache Iceberg tables stored in Amazon S3 and Amazon S3 table buckets. With these operations, you can modify data at the row level, implement upsert patterns, and manage the data lifecycle while maintaining transactional consistency using familiar SQL syntax. You can run complex transformations in Amazon Redshift and write results to Apache Iceberg tables that other analytics engines like Amazon EMR or Amazon Athena can immediately query.

    In this post, you work with the customer and orders datasets created in Part 1 to demonstrate these capabilities in a data synchronization scenario.

    Solution overview

    This solution demonstrates DELETE, UPDATE, and MERGE operations for Apache Iceberg tables in Amazon Redshift using a common data synchronization pattern: maintaining customer records and orders data across staging and production tables. The workflow includes three key operations:

    • DELETE – Remove customer records based on opt-out requests
    • UPDATE – Modify existing customer information
    • MERGE – Synchronize order data between staging and production tables using upsert patterns

    Figure 1: solution overview

    The solution uses a staging table (orders_stg) stored in an S3 table bucket for incoming data and reference tables (customer_opt_out) in Amazon Redshift for managing data lifecycle operations. With this architecture, you can process changes efficiently while maintaining ACID compliance across both storage types.

    Prerequisites

    For this walkthrough, you should have completed the setup steps from Getting started with Apache Iceberg write support in Amazon Redshift – part 1, including:

    • Create an Amazon Redshift data warehouse (provisioned or Serverless)
    • Set up the required IAM role (RedshifticebergRole) with appropriate permissions
    • Create an Amazon S3 bucket and S3 Table bucket
    • Configure the AWS Glue Data Catalog database and set up access
    • Set up AWS Lake Formation permissions
    • Create the customer Apache Iceberg table in Amazon S3 standard buckets with sample customer data
    • Create the orders Apache Iceberg table in Amazon S3 Table buckets with sample order data
    • An Amazon Redshift data warehouse on patch version p200 or higher

    Data preparation

    In this section, you set up the sample data needed to demonstrate MERGE, UPDATE, and DELETE operations. To prepare your data, complete the following steps:

    1. Log in to Amazon Redshift using Query Editor V2 with the Federated user option.
    2. Create the orders_stg and customer_opt_out tables with sample data:
    CREATE TABLE "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg
    (
    customer_id BIGINT,
    order_id BIGINT,
    total_order_amt DECIMAL(10,2),
    total_order_tax_amt REAL,
    tax_pct DOUBLE PRECISION,
    order_date DATE,
    order_created_at_tz TIMESTAMPTZ,
    is_active_ind BOOLEAN
    )
    USING ICEBERG;

    INSERT INTO "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg
    (order_date, order_id, customer_id, total_order_amt, total_order_tax_amt, tax_pct, order_created_at_tz, is_active_ind)
    VALUES
    ('2024-11-11', 1016, 10, 167.45, 13.40, 0.08, '2024-11-11 06:55:00-06:00', true),
    ('2024-11-12', 1017, 15, 34.99, 2.80, 0.08, '2024-11-12 23:30:30-06:00', true),
    ('2024-11-09', 1014, 9, 500.60, 56.80, 0.09, '2024-11-09 16:20:55-06:00', true),
    ('2024-11-10', 1015, 5, 329.85, 33.51, 0.08, '2024-11-10 11:45:30-06:00', true);

    select * from "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg;

    Figure 2: orders_stg result set

    CREATE TABLE dev.public.customer_opt_out
    (
    customer_id bigint,
    customer_name varchar,
    opt_out_ind char(1),
    cust_rec_upd_ind char(1)
    );
    INSERT INTO dev.public.customer_opt_out VALUES
    (9, 'Customer9 Martinez', 'Y', 'N'),
    (12, 'Customer12 Thomas', 'Y', 'N'),
    (13, 'Customer13 Albon', 'N', 'Y'),
    (14, 'Customer14 Oscar', 'N', 'Y');
    select * from dev.public.customer_opt_out;

    Figure 3: customer_opt_out result set

    You can now use the orders_stg and customer_opt_out tables to demonstrate data manipulation operations on the orders and customer tables created in the prerequisite section.

    MERGE

    MERGE conditionally inserts, updates, or deletes rows in a target table based on the results of a join with a source table. You can use MERGE to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.

    To perform a MERGE operation:

    1. Verify the current data in the orders table for order IDs 1014, 1015, 1016, and 1017. You loaded this sample data in Part 1:
    select * from "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders
    where order_id in (1014,1015,1016,1017);

    Figure 4: existing orders data for the order IDs in orders_stg

    The orders table contains existing rows for order IDs 1014 and 1015.

    2. Run the following MERGE operation using order_id as the key column to match rows between the orders and orders_stg tables:
    MERGE INTO "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders
    USING "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders_stg
    ON orders.order_id = orders_stg.order_id
    WHEN MATCHED THEN UPDATE 
    SET
    customer_id         = orders_stg.customer_id,
    total_order_amt     = orders_stg.total_order_amt,
    total_order_tax_amt = orders_stg.total_order_tax_amt,
    tax_pct             = orders_stg.tax_pct,
    order_date          = orders_stg.order_date,
    order_created_at_tz = orders_stg.order_created_at_tz,
    is_active_ind       = orders_stg.is_active_ind
    WHEN NOT MATCHED THEN INSERT
    VALUES
    (orders_stg.customer_id, orders_stg.order_id, orders_stg.total_order_amt,
     orders_stg.total_order_tax_amt, orders_stg.tax_pct, orders_stg.order_date,
     orders_stg.order_created_at_tz, orders_stg.is_active_ind);

    The operation updates existing rows (1014 and 1015) and inserts new rows for order IDs that don’t exist in the orders table (1016 and 1017).

    3. Verify the updated data in the orders table:
    select * from "iceberg-write-blog@s3tablescatalog".iceberg_write_namespace.orders
    where order_id in (1014,1015,1016,1017);

    Figure 5: merged data on orders from orders_stg

    The MERGE operation performs the following changes:

    • Updates existing rows – Order IDs 1014 and 1015 have updated total_order_amt and total_order_tax_amt values from the orders_stg table
    • Inserts new rows – Order IDs 1016 and 1017 are inserted because they don’t exist in the orders table

    This demonstrates the upsert pattern, where MERGE conditionally updates or inserts rows based on the matching key column.
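    The upsert semantics of MERGE can be sketched in plain Python. This is a hypothetical in-memory model, not Redshift itself; the pre-merge amounts for orders 1014 and 1015 below are illustrative placeholders, not the actual Part 1 values:

    ```python
    def merge_upsert(target, staging, key="order_id"):
        """Mirror MERGE: matched keys are updated, unmatched keys are inserted."""
        by_key = {row[key]: row for row in target}
        for row in staging:
            # WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
            by_key[row[key]] = dict(row)
        return sorted(by_key.values(), key=lambda r: r[key])

    # Illustrative pre-merge state of the orders table (placeholder amounts)
    orders = [
        {"order_id": 1014, "total_order_amt": 450.00},
        {"order_id": 1015, "total_order_amt": 300.00},
    ]
    # Staging rows: 1014 exists (updated), 1016 is new (inserted)
    orders_stg = [
        {"order_id": 1014, "total_order_amt": 500.60},
        {"order_id": 1016, "total_order_amt": 167.45},
    ]

    merged = merge_upsert(orders, orders_stg)
    ```

    After the merge, order 1014 carries the staging amount, order 1015 is untouched, and order 1016 appears as a new row, matching the behavior described above.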

    UPDATE

    UPDATE modifies existing rows in a table based on specified conditions or values from another table.

    Update the customer Apache Iceberg table using data from the customer_opt_out Amazon Redshift native table. The UPDATE operation uses the cust_rec_upd_ind column as a filter, updating only rows where the value is ‘Y’.

    To perform an UPDATE operation:

    1. Verify the current customer_name values for customer IDs 13 and 14 in the customer_opt_out and customer tables (you loaded this sample data in Part 1):
    select * from dev.public.customer_opt_out
    where cust_rec_upd_ind = 'Y';

    Figure 6: verify existing customer data for customers from customer_opt_out

    select customer_id,customer_name from dev.demo_iceberg.customer
    where customer_id in(13,14);

    Figure 7: verify existing customer name for customers from customer_opt_out

    2. Run the following UPDATE operation to modify customer names based on the cust_rec_upd_ind flag from customer_opt_out:
    UPDATE dev.demo_iceberg.customer
    SET customer_name = customer_opt_out.customer_name
    FROM dev.public.customer_opt_out
    WHERE customer_opt_out.cust_rec_upd_ind = 'Y'
    AND customer.customer_id = customer_opt_out.customer_id;

    3. Verify the changes for customer IDs 13 and 14:
    select customer_id,customer_name from dev.demo_iceberg.customer where customer_id in(13,14) order by 1;

    Figure 8: updated customer names in customer table

    The UPDATE operation modifies the customer_name values based on the join condition with the customer_opt_out table. Customer IDs 13 and 14 now have updated names (Customer13 Albon and Customer14 Oscar).
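    The join-driven update can be sketched in plain Python as well. This is an illustrative in-memory model (the "Old" names are placeholders, not the post's actual pre-update data): only target rows whose customer_id matches an opt-out row flagged cust_rec_upd_ind = 'Y' receive the new name.

    ```python
    def update_from(customers, opt_out_rows):
        """Mirror UPDATE ... FROM: copy names for matching rows flagged 'Y'."""
        new_names = {r["customer_id"]: r["customer_name"]
                     for r in opt_out_rows if r["cust_rec_upd_ind"] == "Y"}
        for row in customers:
            if row["customer_id"] in new_names:
                row["customer_name"] = new_names[row["customer_id"]]
        return customers

    customers = [
        {"customer_id": 13, "customer_name": "Customer13 Old"},   # placeholder
        {"customer_id": 14, "customer_name": "Customer14 Old"},   # placeholder
        {"customer_id": 15, "customer_name": "Customer15 Kept"},  # no match
    ]
    opt_out = [
        {"customer_id": 13, "customer_name": "Customer13 Albon", "cust_rec_upd_ind": "Y"},
        {"customer_id": 14, "customer_name": "Customer14 Oscar", "cust_rec_upd_ind": "Y"},
        {"customer_id": 9,  "customer_name": "Customer9 Martinez", "cust_rec_upd_ind": "N"},
    ]

    updated = update_from(customers, opt_out)
    ```

    Rows 13 and 14 take the names from the reference table; row 15 and the 'N'-flagged row 9 are left alone, matching the filtered join above.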

    DELETE

    DELETE removes rows from a table based on specified conditions. Without a WHERE clause, DELETE removes all rows from the table.

    Delete rows from the customer Apache Iceberg table using data from the customer_opt_out Amazon Redshift native table. The DELETE operation uses the opt_out_ind column as a filter, removing only rows where the value is ‘Y’.

    To perform a DELETE operation:

    1. Verify the opt-out indicator data in the customer_opt_out table:
    select * from dev.public.customer_opt_out
    where opt_out_ind = 'Y';

    Figure 9: verify customer records for opt out

    2. Verify the current customer data for customer IDs 9 and 12:
    select * from dev.demo_iceberg.customer
    where customer_id in(9,12);

    Figure 10: verify existing customer data in the customer table for opt out

    3. Review the query execution plan:
    EXPLAIN
    DELETE FROM demo_iceberg.customer
    USING public.customer_opt_out
    WHERE customer.customer_id = customer_opt_out.customer_id
    AND customer_opt_out.opt_out_ind = 'Y';

    Figure 11: query plan for the DELETE query

    The execution plan shows Amazon S3 scans for the Apache Iceberg tables, indicating that Amazon Redshift removes rows directly from the Amazon S3 bucket.

    4. Run the following DELETE operation:
    DELETE FROM demo_iceberg.customer
    USING public.customer_opt_out
    WHERE customer.customer_id = customer_opt_out.customer_id
    AND customer_opt_out.opt_out_ind = 'Y';

    5. Verify that the rows were removed:
    select * from dev.demo_iceberg.customer where customer_id in(9,12);

    Figure 12: result set from the customer table after deleting opted-out customers

    The query returns no rows, confirming that customer IDs 9 and 12 were successfully deleted from the customer table.
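    The DELETE ... USING pattern can likewise be sketched in plain Python (a hypothetical in-memory model, not Redshift): the reference table drives which target rows are removed.

    ```python
    def delete_using(customers, opt_out_rows):
        """Mirror DELETE ... USING: drop target rows whose key matches a 'Y' row."""
        opted_out = {r["customer_id"] for r in opt_out_rows
                     if r["opt_out_ind"] == "Y"}
        return [row for row in customers if row["customer_id"] not in opted_out]

    # Customer IDs drawn from the walkthrough; only 9 and 12 are flagged 'Y'
    customers = [{"customer_id": i} for i in (5, 9, 12, 13)]
    opt_out = [
        {"customer_id": 9,  "opt_out_ind": "Y"},
        {"customer_id": 12, "opt_out_ind": "Y"},
        {"customer_id": 13, "opt_out_ind": "N"},
    ]

    remaining = delete_using(customers, opt_out)
    ```

    Customers 9 and 12 are removed while 5 and 13 survive, mirroring the join-filtered delete above.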

    Best practices

    After performing multiple UPDATE or DELETE operations, consider running table maintenance to optimize read performance:

    • For AWS Glue tables – Use AWS Glue table optimizers. For more information, see Table optimizers in the AWS Glue Developer Guide.
    • For S3 Tables – Use S3 Tables maintenance operations. For more information, see S3 Tables maintenance in the Amazon S3 User Guide.

    Table maintenance merges and compacts deletion files generated by Merge-on-Read operations, improving query performance for subsequent reads.
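    The merge-on-read mechanics behind this recommendation can be sketched conceptually: each row-level DELETE writes a small delete file of removed positions, every read must subtract those positions, and compaction rewrites the data file so later reads skip that work. The row values and positions below are illustrative, not the post's actual data.

    ```python
    # A data file and two positional delete files from two DELETE operations
    data_file = ["row0", "row1", "row2", "row3"]
    delete_files = [{1}, {3}]

    def read_with_deletes(data, deletes):
        """Every merge-on-read scan must subtract deleted positions."""
        dead = set().union(*deletes) if deletes else set()
        return [row for pos, row in enumerate(data) if pos not in dead]

    def compact(data, deletes):
        """Compaction applies deletes once, leaving a clean file and no delete files."""
        return read_with_deletes(data, deletes), []

    compacted_data, compacted_deletes = compact(data_file, delete_files)
    ```

    Before compaction, each scan pays the subtraction cost against every delete file; after compaction, reads touch only the rewritten data file, which is why maintenance improves read performance after many UPDATE or DELETE operations.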

    Conclusion

    You can use Amazon Redshift support for DELETE, UPDATE, and MERGE operations on Apache Iceberg tables to build data architectures that combine warehouse performance with data lake scalability. You can modify data at the row level while maintaining ACID compliance, giving you the same flexibility with Apache Iceberg tables as you have with native Amazon Redshift tables.

    To get started, try these operations on your own Apache Iceberg tables in Amazon Redshift.
    About the authors

    Sanket Hase

    Sanket is an Engineering Manager with the Amazon Redshift team, leading query execution teams in the areas of data lake analytics, hardware-software co-design, and vectorized query execution.

    Raghu Kuppala

    Raghu is an Analytics Specialist Solutions Architect experienced working in the databases, data warehousing, and analytics space. Outside of work, he enjoys trying different cuisines and spending time with his family and friends.

    Ritesh Sinha

    Ritesh is an Analytics Specialist Solutions Architect based out of San Francisco. He has helped customers build scalable data warehousing and big data solutions for over 16 years. He loves to design and build efficient end-to-end solutions on AWS. In his spare time, he loves reading, walking, and doing yoga.

    Sundeep Kumar

    Sundeep is a Sr. Specialist Solutions Architect at Amazon Web Services (AWS), helping customers build data lake and analytics platforms and solutions. When not building and designing data lakes, Sundeep enjoys listening to music and playing guitar.


