SAA-C03 CERTIFICATION NOTE – DAY 10


AWS DataSync: Discover and move your data between on-premises, AWS, and other cloud storage with end-to-end security, including data encryption and integrity validation. AWS DataSync is a fully managed data transfer service that simplifies, automates, and accelerates transferring data between on-premises storage systems and AWS storage services such as Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server. To transfer data from on-premises storage, you first deploy an AWS DataSync agent in the on-premises data center. The agent is a virtual machine appliance that runs in your environment and has network access to the source storage. It reads from (or writes to) that storage and communicates with the AWS DataSync service to transfer data between the source and destination locations.
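As a rough illustration of what happens after the agent is in place, the sketch below creates a DataSync task between two locations and starts one execution. It assumes the agent is already activated and that both locations already exist; the function names (build_task_request, run_transfer) and all ARNs are hypothetical, and datasync is assumed to be a boto3 DataSync client.

```python
def build_task_request(source_location_arn, destination_location_arn, name):
    """Build keyword arguments for the DataSync CreateTask call."""
    return {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": name,
        # Integrity validation: verify the files transferred in this run.
        "Options": {"VerifyMode": "ONLY_FILES_TRANSFERRED"},
    }


def run_transfer(datasync, source_location_arn, destination_location_arn, name):
    """Create a task between two existing locations and start one execution.

    `datasync` is assumed to be a boto3 DataSync client, for example
    boto3.client("datasync"). The location ARNs are placeholders that you
    would get from calls such as create_location_nfs and create_location_s3.
    """
    request = build_task_request(source_location_arn, destination_location_arn, name)
    task_arn = datasync.create_task(**request)["TaskArn"]
    execution = datasync.start_task_execution(TaskArn=task_arn)
    return execution["TaskExecutionArn"]
```

With real credentials, this would be called as run_transfer(boto3.client("datasync"), source_arn, destination_arn, "nightly-copy"). Keeping the request builder separate from the API call makes the parameters easy to inspect without touching AWS.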

Workflow of AWS Glue

Define a crawler to populate the Data Catalog
Create the ETL job
AWS Glue generates a script for the ETL job, or you can provide or write your own
Run the job on demand, or define a schedule or trigger to start it
The job extracts data from the data source, transforms it, and loads it into the data target
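The steps above can be sketched with the AWS SDK for Python. This is a minimal sketch, not a complete pipeline: the helper names, role ARN, bucket paths, and cron expression are all hypothetical, glue is assumed to be a boto3 Glue client, and the ETL script is assumed to already exist at the given S3 location.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Step 1: a crawler that scans an S3 path and writes tables to a database."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def build_job_request(name, role_arn, script_location):
    """Steps 2 and 3: an ETL job that runs a script stored in S3."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark-based ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
    }


def build_schedule_trigger_request(name, job_name, cron_expression):
    """Step 4: a scheduled trigger that starts the job."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": cron_expression,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def set_up_pipeline(glue, role_arn, database, s3_path, script_location):
    """Create the crawler, job, and trigger, then start one on-demand run.

    `glue` is assumed to be boto3.client("glue"). start_crawler is
    asynchronous, so a real pipeline would wait for the crawler to finish
    before relying on the catalog tables it creates.
    """
    glue.create_crawler(**build_crawler_request("demo-crawler", role_arn, database, s3_path))
    glue.start_crawler(Name="demo-crawler")
    glue.create_job(**build_job_request("demo-job", role_arn, script_location))
    glue.create_trigger(
        **build_schedule_trigger_request("demo-nightly", "demo-job", "cron(0 2 * * ? *)")
    )
    # Step 5 happens inside the job run: extract, transform, load.
    return glue.start_job_run(JobName="demo-job")["JobRunId"]
```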

What makes up the AWS Glue Data Catalog

Databases: A logical group of tables
Tables: Metadata definitions that represent data
Crawlers & Classifiers: Detect and infer schemas and store them in the Data Catalog
Connections: An object that contains the properties to connect to a particular data store
AWS Glue Schema Registry: A registry for managing and evolving the schemas of streaming data
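To make the database and table relationship concrete, here is a small sketch that walks the catalog and lists each table's columns. The function names are hypothetical, glue is assumed to be a boto3 Glue client, and pagination is ignored for brevity, so large catalogs would need the NextToken handling that this sketch leaves out.

```python
def summarize_tables(get_tables_response):
    """Map each table name to its (column name, column type) pairs.

    Expects the dictionary shape returned by the Glue GetTables call,
    where tables are listed under the "TableList" key.
    """
    summary = {}
    for table in get_tables_response.get("TableList", []):
        columns = table.get("StorageDescriptor", {}).get("Columns", [])
        summary[table["Name"]] = [(c["Name"], c.get("Type", "")) for c in columns]
    return summary


def list_catalog(glue):
    """Return {database name: {table name: [(column, type), ...]}}.

    `glue` is assumed to be boto3.client("glue"). Only the first page of
    each response is read.
    """
    catalog = {}
    for database in glue.get_databases().get("DatabaseList", []):
        name = database["Name"]
        catalog[name] = summarize_tables(glue.get_tables(DatabaseName=name))
    return catalog
```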

AWS Glue Job bookmarks are used to track the source data that has already been processed, preventing the reprocessing of old data. Job bookmarks can be used with JDBC data sources and some Amazon Simple Storage Service (Amazon S3) sources. Job bookmarks are tied to jobs. If you delete a job, then its job bookmark is also deleted.
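Bookmark behavior is controlled per run through the --job-bookmark-option job argument. The sketch below shows one way to pass it when starting a run and how to reset a bookmark so that old data is processed again. The helper names and job name are hypothetical, and glue is assumed to be a boto3 Glue client.

```python
# Values accepted by the --job-bookmark-option job argument.
BOOKMARK_OPTIONS = {
    "enable": "job-bookmark-enable",    # track processed data, skip it next run
    "disable": "job-bookmark-disable",  # always process the full data set
    "pause": "job-bookmark-pause",      # process new data without advancing the bookmark
}


def build_job_run_request(job_name, bookmark_mode="enable"):
    """Build keyword arguments for the Glue StartJobRun call."""
    return {
        "JobName": job_name,
        "Arguments": {"--job-bookmark-option": BOOKMARK_OPTIONS[bookmark_mode]},
    }


def reprocess_from_scratch(glue, job_name):
    """Reset the job's bookmark, then start a run that reads all source data.

    `glue` is assumed to be boto3.client("glue").
    """
    glue.reset_job_bookmark(JobName=job_name)
    return glue.start_job_run(**build_job_run_request(job_name))["JobRunId"]
```

Inside the ETL script itself, bookmarks also depend on the job calling job.init and job.commit and on the sources being read with a transformation_ctx value, which this sketch does not show.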
