Data Engineering on AWS Cloud
Build Data Engineering Pipelines on AWS with Data Analytics Services: Glue,
EMR, Athena, Kinesis, Lambda, Redshift
Week 1-2: Introduction to Big Data and AWS
Big Data Basics:
Day 1-2: Define Big Data
- Introduction to Big Data concepts
- Historical context and growth of Big Data
Day 3-4: Explore Big Data Challenges
- Volume, Velocity, Variety, and Veracity
- Scalability and storage solutions
AWS Cloud Fundamentals:
Day 5: AWS Infrastructure
- Regions, Availability Zones (AZs), and Edge Locations
Day 6: AWS Global Infrastructure
- Data centers, networking, and global reach
Day 7: AWS Identity and Access Management (IAM)
- IAM roles, policies, users, and groups
- Security best practices
Week 3-4: AWS Data Analytics Fundamentals
AWS Glue:
Day 1-2: AWS Glue Introduction
- ETL basics and role of Glue
Day 3-4: Creating Glue Jobs
- Extracting, transforming, and loading data
- Data catalog and schema inference
Amazon Redshift:
Day 5-6: Redshift Setup and Configuration
- Creating and configuring Redshift clusters
Day 7: Querying Data with Redshift
- SQL queries for data analysis
Amazon Athena and QuickSight:
Day 8-9: Amazon Athena
- Setting up Athena, querying data in S3
Day 10-11: Amazon QuickSight
- Creating visualizations, dashboards, and sharing insights
Week 5-6: Big Data Technologies
Apache Hadoop and HDFS:
Day 1-3: Hadoop Fundamentals
- Hadoop architecture and ecosystem
Day 4-5: Hadoop Distributed File System (HDFS)
- File storage, replication, and data access
Apache Spark:
Day 6-8: Spark Introduction
- Spark RDDs and transformations
Day 9-10: Spark Applications
- Developing Spark applications, data processing
Apache Kafka:
Day 11-12: Kafka Introduction
- Pub-Sub messaging, Kafka architecture
Day 13-14: Kafka Setup and Usage
- Creating topics, producers, consumers
Week 7-8: AWS Big Data Services
Amazon EMR:
Day 1-3: Amazon EMR Setup
- Creating EMR clusters, cluster configuration
Day 4-5: Running Jobs on EMR
- Using Hadoop, Spark, Hive on EMR clusters
Amazon Kinesis:
Day 6-8: Kinesis Data Streams
- Creating and managing data streams
Day 9-10: Kinesis Data Firehose
- Data delivery to AWS services
Day 11-12: Kinesis Data Analytics
- Real-time data processing and analysis
AWS Lambda and Step Functions:
Day 13-14: AWS Lambda
- Serverless computing, Lambda functions
Day 15-16: AWS Step Functions
- Orchestrating serverless workflows, state machines
Week 9-10: Data Engineering with AWS
AWS Data Pipeline:
Day 1-3: Data Pipeline Introduction
- Data movement, transformation, and automation
Day 4-5: Creating Data Pipelines
- Defining data sources, destinations, and activities
AWS Glue DataBrew:
Day 6-8: Glue DataBrew Introduction
- Visual data preparation and transformation
Day 9-10: Data Cleaning and Transformation
- Using DataBrew for data cleaning and enrichment
Hands-on Practice & Project:
Day 11-14: Application of AWS services
- Building ETL pipelines, data workflows
- Real-world data engineering project
Instructor
