    Streamline large binary object migrations: A Kafka-based solution for Oracle to Amazon Aurora PostgreSQL and Amazon S3



    Customers migrating from on-premises Oracle databases to AWS face a challenge: efficiently relocating large object data types (LOBs) to object storage while maintaining data integrity and performance. This challenge originates from the traditional enterprise database design where LOBs are stored alongside structured data, leading to storage capacity constraints, backup complexity, and performance bottlenecks during data retrieval and processing. LOBs, which can include images, videos, and other large files, often cause traditional data migrations to suffer from slow speeds and LOB truncation issues. These issues are particularly problematic for long-running migrations that can span several years.

    In this post, we present a scalable solution that uses Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Aurora PostgreSQL-Compatible Edition, and Amazon MSK Connect. The data streaming enables data replication where modifications are sent and received in a continuous flow, allowing the target database to access and apply the changes in real time. This solution generates events for database actions such as insert, update, and delete, triggering AWS Lambda functions to download LOBs from the source Oracle database and upload them to Amazon Simple Storage Service (Amazon S3) buckets. Simultaneously, the streaming events migrate the structured data from the Oracle database to the target database while maintaining proper linking with their respective LOBs.

    The complete implementation is available on GitHub, including AWS Cloud Development Kit (AWS CDK) deployment code, configuration files, and setup instructions.
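To give a sense of the wiring, the following is a minimal Python CDK sketch of the central pattern: subscribing a Lambda function to an MSK topic and granting it access to an S3 bucket. It is an illustration under assumed names, not the repository's actual code; the cluster ARN, topic name, handler path, and environment variable are all placeholders.

    from aws_cdk import Stack, aws_lambda as lambda_, aws_s3 as s3
    from aws_cdk import aws_lambda_event_sources as sources
    from constructs import Construct

    class BlobMigrationStack(Stack):
        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)

            # Bucket that receives the migrated BLOBs (logical ID is a placeholder).
            blob_bucket = s3.Bucket(self, "BlobBucket")

            # BLOB Downloader function; handler code lives under ./lambda (placeholder path).
            downloader = lambda_.Function(
                self, "BlobDownloader",
                runtime=lambda_.Runtime.PYTHON_3_12,
                handler="handler.on_event",
                code=lambda_.Code.from_asset("lambda"),
                environment={"BLOB_BUCKET": blob_bucket.bucket_name},
            )
            blob_bucket.grant_put(downloader)

            # Subscribe the function to the Debezium source topic on the MSK cluster.
            downloader.add_event_source(sources.ManagedKafkaEventSource(
                cluster_arn="arn:aws:kafka:us-east-1:123456789012:cluster/example/uuid",  # placeholder
                topic="oracle.APP.DOCUMENTS",  # placeholder topic name
                starting_position=lambda_.StartingPosition.TRIM_HORIZON,
                batch_size=100,
            ))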

    Solution overview

Although traditional Oracle database migrations handle structured data effectively, they struggle with LOBs such as images, videos, and documents. These migrations often fail due to size limitations and truncation issues, creating significant business risks, including data loss, extended downtime, and project delays that can stall your cloud transformation initiatives. The problem becomes more acute during long-running migrations spanning several years, where maintaining operational continuity is critical. This solution addresses the key challenges of LOB migration, enabling continuous, long-term operations without compromising performance or reliability.

    By removing the size limitations associated with traditional migration technologies, our solution provides a robust framework that helps you seamlessly relocate LOBs while facilitating data integrity throughout the process.

    Our approach uses a modern streaming architecture to alleviate the traditional constraints of Oracle LOB migration. The solution includes the following core components:

    • Amazon MSK – Provides the streaming infrastructure.
    • Amazon MSK Connect – Uses two connectors (a sample source connector configuration follows this list):
      • Debezium Connector for Oracle as a source connector to capture row-level changes that occur in the Oracle database. The connector emits change events and publishes them to a Kafka source topic.
      • Debezium Connector for JDBC as a sink connector to consume events from the Kafka source topic and write them to Aurora PostgreSQL-Compatible using a JDBC driver.
    • Lambda function – Triggered by an event source mapping to Amazon MSK. The function processes events from the Kafka source topic, extracting the Oracle row primary key from each event payload. It uses this key to download the corresponding BLOB data from the source Oracle database and uploads it to Amazon S3, organizing files by primary key folders to maintain simple linking with the relational database records.
    • Amazon RDS for Oracle – Amazon Relational Database Service (Amazon RDS) for Oracle is used as the source database to simulate an on-premises Oracle database.
    • Aurora PostgreSQL-Compatible – Used as the target database for migrated data.
    • Amazon S3 – Used as object storage for the BLOB data from the source database.
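
    For illustration, the Oracle source connector configuration might resemble the following sketch, written as the Python dictionary you would supply to MSK Connect when creating the connector. The property names are standard Debezium Oracle connector settings; the hosts, credentials, and table and column names are placeholders, and the exact configuration in the GitHub repository may differ.

        # Minimal sketch of a Debezium Oracle source connector configuration.
        # Hosts, credentials, and table/column names below are placeholders.
        source_connector_config = {
            "connector.class": "io.debezium.connector.oracle.OracleConnector",
            "tasks.max": "1",
            "database.hostname": "oracle.example.internal",
            "database.port": "1521",
            "database.user": "cdc_user",
            "database.password": "********",
            "database.dbname": "ORCL",
            "topic.prefix": "oracle",
            # Capture only the tables in scope for the migration.
            "table.include.list": "APP.DOCUMENTS",
            # Exclude the BLOB column so it never travels through Kafka; the
            # Lambda function fetches the BLOB directly from Oracle instead.
            "column.exclude.list": "APP.DOCUMENTS.CONTENT",
            "schema.history.internal.kafka.bootstrap.servers": "broker-1.example:9092",
            "schema.history.internal.kafka.topic": "schema-changes.oracle",
        }

    Excluding the BLOB column at the source is what keeps the Kafka payload small: the sink connector replicates only the structured columns, and the binary data takes the separate Lambda-to-S3 path.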

    The following diagram shows the solution architecture for the Oracle LOB data migration.

    Message flow

    When data changes occur in the source Amazon RDS for Oracle database, the solution executes the following sequence, moving through event detection and publication, BLOB processing with Lambda, and structured data processing:

    1. The Oracle source connector captures the change data capture (CDC) events, including changes to the BLOB data column. The connector is configured to exclude the BLOB data column from the Kafka event to keep the payload small.
    2. The connector publishes this event to an MSK topic.
      1. The MSK event triggers the BLOB Downloader Lambda function for the CDC events.
        1. The Lambda function examines two key conditions: the Debezium event code (specifically checking for create (c) or update (u)) and the configured list of Oracle BLOB table names along with their column names. When a Kafka message matches both the configured table list and a valid Debezium event, the Lambda function initiates the BLOB data download from the Oracle source using the primary key and table name; otherwise, the function bypasses the BLOB download process. This selective approach makes sure the Lambda function executes SQL queries only when processing Kafka messages for tables containing BLOB data, optimizing database interactions (a sketch of this handler logic follows the list).
        2. The Lambda function uploads the BLOB to Amazon S3, organizing by primary key folders with unique object names, which enables linking between structured database records and their corresponding BLOB data in Amazon S3.
      2. The PostgreSQL sink connector receives the event from the MSK topic.
        1. The connector applies the Oracle database changes to the Aurora PostgreSQL database, except the BLOB data column, which the Oracle source connector already excluded.
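
    To make the filtering and download logic concrete, here is a minimal Python sketch of a BLOB Downloader handler. It assumes the python-oracledb driver, the event shape produced by Lambda's MSK event source mapping, and a Debezium JSON envelope with schemas disabled; the table, column, and environment variable names are invented for illustration and will differ from the repository's implementation.

        import base64
        import json
        import os

        import boto3
        import oracledb  # python-oracledb driver, packaged with the function

        # Tables in scope: table name -> (BLOB column, primary key column).
        # These names are placeholders.
        BLOB_TABLES = {"DOCUMENTS": ("CONTENT", "DOC_ID")}

        s3 = boto3.client("s3")
        BUCKET = os.environ["BLOB_BUCKET"]

        def on_event(event, context):
            # The MSK event source mapping delivers base64-encoded Kafka
            # records grouped by topic-partition.
            for records in event["records"].values():
                for record in records:
                    # Assumes the JSON converter with schemas disabled, so the
                    # Debezium envelope sits at the top level of the value.
                    payload = json.loads(base64.b64decode(record["value"]))
                    # Only create (c) and update (u) events carry new BLOB data.
                    if payload.get("op") not in ("c", "u"):
                        continue
                    table = payload["source"]["table"]
                    if table not in BLOB_TABLES:
                        continue  # not a BLOB table; skip the SQL round trip
                    blob_col, pk_col = BLOB_TABLES[table]
                    pk = payload["after"][pk_col]
                    body = fetch_blob(table, blob_col, pk_col, pk)
                    # Organize objects by primary key so database rows and
                    # their S3 objects stay linked.
                    s3.put_object(Bucket=BUCKET, Key=f"{table}/{pk}/{blob_col}.bin", Body=body)

        def fetch_blob(table, blob_col, pk_col, pk):
            # Connection details come from environment variables (placeholders).
            # Table and column names come from the fixed allowlist above.
            with oracledb.connect(
                user=os.environ["ORA_USER"],
                password=os.environ["ORA_PASSWORD"],
                dsn=os.environ["ORA_DSN"],
            ) as conn:
                with conn.cursor() as cur:
                    cur.execute(
                        f"SELECT {blob_col} FROM {table} WHERE {pk_col} = :pk", pk=pk
                    )
                    row = cur.fetchone()
                    return row[0].read() if row and row[0] is not None else b""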

    Key benefits

    The solution offers the following key advantages:

    • Cost optimization and licensing – By decoupling LOB storage from the database and using Amazon S3, you can reduce your overall database footprint and the costs associated with traditional database licensing and replication technologies. The streaming architecture also minimizes your infrastructure overhead during long-running migrations.
    • Avoids size constraints and migration failures – Traditional migration tools often impose size limitations on LOB transfers, leading to truncation issues and failed migrations. This solution removes those constraints entirely, so you can migrate LOBs regardless of size while maintaining data integrity. The event-driven architecture enables near real-time data replication, allowing your source systems to remain operational during migration.
    • Business continuity and operational excellence – Changes flow continuously to your target environment, allowing for business continuity. The solution preserves relationships between structured database records and their corresponding LOBs through primary key-based organization in Amazon S3, allowing for referential integrity while providing the flexibility of object storage for large files.
    • Architectural advantages – Storing LOBs in Amazon S3 while maintaining structured data in Aurora PostgreSQL-Compatible creates a clear separation. This architecture simplifies your backup and recovery operations, improves query performance on structured data, and provides flexible access patterns for binary objects through Amazon S3.

    Implementation best practices

    Consider the following best practices when implementing this solution:

    • Start small and scale gradually – To implement this solution, start with a pilot project using non-production data to validate your approach before committing to full-scale migration. This gives you a chance to work out issues in a controlled environment and refine your configuration without impacting production systems.
    • Monitoring – Set up comprehensive monitoring through Amazon CloudWatch to track key metrics like Kafka consumer lag, Lambda function errors, and replication latency. Establish alerting thresholds early so you can catch and resolve issues before they impact your migration timeline (see the alarm sketch after this list). Size your MSK cluster based on expected CDC volume, and configure Lambda reserved concurrency to handle peak loads during initial data synchronization.
    • Security – For security, use encryption in transit and at rest for both structured data and LOBs, and follow the principle of least privilege when setting up AWS Identity and Access Management (IAM) roles and policies for your MSK cluster, Lambda functions, S3 buckets, and database instances. Document your schema mappings between Oracle and Aurora PostgreSQL-Compatible, including how database records link to their corresponding LOBs in Amazon S3.
    • Testing and preparation – Before you go live, test your failover and recovery procedures thoroughly. Validate scenarios like Lambda function failures, MSK cluster issues, and network connectivity problems to ensure you’re prepared for potential issues. Finally, remember that this streaming architecture maintains eventual consistency between your source and target systems, so there might be brief lag times during high-volume periods. Plan your cutover strategy with this in mind.
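
    As one concrete example of the monitoring guidance above, the following boto3 sketch creates a CloudWatch alarm on the BLOB Downloader function's error count; the function name, threshold, and SNS topic ARN are placeholders.

        import boto3

        cloudwatch = boto3.client("cloudwatch")

        # Alarm when the BLOB Downloader function reports any errors in a
        # 5-minute window. Function name and SNS topic ARN are placeholders.
        cloudwatch.put_metric_alarm(
            AlarmName="blob-downloader-errors",
            Namespace="AWS/Lambda",
            MetricName="Errors",
            Dimensions=[{"Name": "FunctionName", "Value": "BlobDownloader"}],
            Statistic="Sum",
            Period=300,
            EvaluationPeriods=1,
            Threshold=0,
            ComparisonOperator="GreaterThanThreshold",
            AlarmActions=["arn:aws:sns:us-east-1:123456789012:migration-alerts"],
        )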

    Limitations and considerations

    Although this solution provides a robust approach for migrating Oracle databases with LOBs to AWS, there are several inherent constraints to understand before implementation.

    This solution requires network connectivity between your source Oracle database and AWS environment. For on-premises Oracle databases, you must establish AWS Direct Connect or VPN connectivity before deployment. Network bandwidth directly impacts replication speed and overall migration performance, so your connection must be able to handle the expected volume of CDC events and LOB transfers.

    The solution uses Debezium Connector for Oracle as the source connector and Debezium Connector for JDBC as the sink connector. This architecture is specifically designed for Oracle-to-PostgreSQL migrations; other database combinations require different connector configurations or might not be supported by the current implementation. Migration throughput is also constrained by your MSK cluster capacity and Lambda concurrency limits. Large-scale migrations can also exceed AWS service quotas, in which case you might need to request quota increases through AWS Support.

    Conclusion

    In this post, we presented a solution that addresses the critical challenge of migrating your large binary objects from Oracle to AWS by using a streaming architecture that separates LOB storage from structured data. This approach avoids size constraints, reduces Oracle licensing costs, and preserves data integrity throughout extended migration periods.

    Ready to transform your Oracle migration strategy? Visit the GitHub repository, where you will find the complete AWS CDK deployment code, configuration files, and step-by-step instructions to get started.


    About the authors

    Naresh Dhiman

    Naresh is a Sr. Solutions Architect at AWS supporting US federal customers. He has over 25 years of experience as a technology leader and is a recognized inventor with six patents. He specializes in containers, machine learning, and generative AI on AWS.

    Archana Sharma

    Archana is a Sr. Database Specialist Solutions Architect, working with Worldwide Public Sector customers. She has years of experience in relational databases, and is passionate about helping customers in their journey to the AWS Cloud with a focus on database migration and modernization.

    Ron Kolwitz

    Ron is a Sr. Solutions Architect supporting US Federal Government Sciences customers including NASA and the Department of Energy. He is especially passionate about aerospace and advancing the use of GenAI and quantum-based technologies for scientific research. In his free time, he enjoys spending time with his family of avid water-skiers.

    Karan Lakhwani

    Karan is a Sr. Customer Solutions Manager at Amazon Web Services. He specializes in generative AI technologies and is an AWS Golden Jacket recipient. Outside of work, Karan enjoys finding new restaurants and skiing.


