Azure Data Engineering and Data bricks with power bi
Chapter 1: Cloud Basics, Azure SQL
Cloud Introduction and Azure Basics:
- Azure Implementation: IaaS, PaaS, SaaS
- ADE Job Roles; Azure Storage Components
- Azure ETL & Streaming Components; Need for Azure Data Factory (ADF)
- Need for Azure Synapse Analytics; Azure Resources and Resource Types
- Azure Account, Subscription (Free); Azure SQL Server [Logical Server]; Firewall Rules
- Azure SQL Database & SQL Pool Deployment; DTU Versus DWU; SSMS Connections
Chapter 2: Synapse SQL Pools (DWH)
- Dedicated SQL Pools in Azure; Data Warehouse with Synapse
- Massively Parallel Processing (MPP); Control Nodes and Compute Nodes
- DMS: Data Movement Service; Start/Resume/Pause& Scaling
- SQL Pool Config @ TSQL Scripts; Start/Resume/Pause, Scaling Options
- Table Creations@ TSQL Scripts; Table Partitions: Left & Right
- Distributions: Round Robin, Hash
- Distributions: Replicate and Usage; Auto Indexing & Column Store; Planning for Big Data Loads; Need for ADF
Chapter 3: Azure Data Factory, Pipelines
ADF Concepts; ADF Pipelines
- Architecture; Integration Runtime (IR) & Use
- Linked Services and Datasets; Pipeline Activities: Copy Data Tool; DIU : Data
Integration Units; DTU Vs DWUs Vs DIU; ADF Pipeline with Copy Data Tool - Azure SQL DB to Synapse Data Loads
- Multi Tables Data Loads with ADF; Bulk Insert, Data Copy Methods
- ETL Staging: Storage Account; Staging Container Connections; DIU Allocations; ETL Pipeline Monitoring, Runs
Chapter 4: OnPremise Data Loads, Upsert
Copy Data Tool
- Incremental Loads; On-Premise Data Sources with Azure; Self Hosted Integration Runtime (IR); Access Keys, Remote Linked Service
- Synapse SQL Pool (DW), on Premise; ETL Staging with Storage Account
- Copy Method: Polybase – Tuning; Polybase : Big Data Loads
- ETL Pipelines for Incremental Loads; Business Keys For Table Upsert; Pipeline Schedules with ADF; ETL Logging with Storage Account
- Copy Method: UPSERT; DIU, DOCP & Publish; Manual Pipeline Executions in ADF
Chapter 5: File Incremental Loads in ADF
Incremental Loads with Files (BLOB); ETL Schedules
- Tumbling Window; Execution Retry and Delay Options; Binary Copy,
- Structural Data Loads; Incremental Loads Verification Tests
- Incompatible Rows & Fault Tolerance; Pipeline Compression & Tuning; Pipeline Publish, Monitor Options
- Azure Monitor Resource : Metrics; ADF Metrics; Pipeline Monitoring; Synapse:
Storage Monitoring, Alerts; Conditions, Signal Rules and Metrics; Alerts & Action
Groups: Emails
Chapter 6: ADF Data Flow – 1
- Data Flow Task, Data Flow Activity; Transformations with Data Flow
- Spark Cluster for Debugging; Cluster Node Configurations; Spark Cluster Types & Sizing; Transaction Optimized – Capacity
- Memory Optimized – Capacity; Data Cleansing with ADF; Data Orchestration with Data Flow
SELECT Transformation; Conditional Split Transformation; UNION, SELECT
Transformation; Spark Cluster For Pipeline Executions; Pipeline Monitoring & Run IDs;
Adding Data Flow
Chapter 7: ADF Data Flow – 2
- ADF Pipelines For ETL Operations; Data Flow Tasks, Activities in Synapse; JOIN &
EXISTS Transformations; Aggregate & Group By Transformations; - Window Functions, Rank in Data Flow; Rank / Dense Rank / Row Number; Derived
Column Transformation; Lookup, Surrogate Key, Parse; - Type Convert, Cast Transformations; Reusing Data Flow Tasks in Synapse; Pipeline
Validations & Executions; - Inline Datasets, Schema Drift; Data De-duplication with ADF; DFT
Optimization Techniques; Data Flow Task – Staging, Logging
Chapter 8: Azure Synapse Analytics
- Azure Synapse Analytics Resource; Azure Synapse Analytics Workspace; Managed
Resource Group, SQL Account; Synapse Workspace & Synapse Studio; Operations with
Synapse Workspace; DLS Gen 2 Storage Account, Container; - Synapse Studio: Scripts & Pipelines; Dedicated SQL Pools: Creation, Use; Synapse
Tables, Data Loads with TSQL; COPY INTO Statements with T-SQL; Row Terminator
and Compression; T-SQL Queries and Aggregations; Aggregation Data Loads; Synapse
Pipelines with TSQL; Stored Procedure Activity & Triggers
Chapter 9: Synapse Analytics with Spark
- Synapse Pipelines: Performance Advantage; Pivot Transformation For Normalization;
Generate Pivot Column, Aggregations; Pivot Transformation & Pivot Setting; Pivot Key
Selection, Value and Nulls - Pivoted Columns & Column Pattern; Column Prefix, Help Graphic, Metadata;
Denormalized Data and Aggregations; Apache Spark Pool in Azure Synapse - Spark Cluster Nodes: Vcores,Memory; Notebooks : Purpose, Usage Options; Python
Notebooks; Databases in Apache Spark Pool; Data Loads from Dedicated SQL Pools;
PySpark Code for Data Operations, Writes
Chapter 10: Synapse Security & Parameters
- Azure Active Directory (AAD) Users, Groups; IAM: Identity & Access Management;
Synapse Workspace Security with RBAC; ADF Security: RBAC, Owner, Contributor;
Azure Synapse SQL Pool Security: Logins; Creating SQL Logins & Users : master; SQL
Users in Azure SQL DB and SQL Pool - Grant, Control, Revoke: Security Roles; Parameters – Creation and Use in Pipelines;
Dynamic Connections with Credentials; User Name and Password Connectivity;
Dynamic Dataset Configurations; Pipeline Expressions with Parameters; Resource
Classes
Chapter 11: Change Data Capture (CDC)
- Change Data Capture (CDC) Data Loads; Incremental Loads with CDC Types; SQL
Server CDC : ETL Load Dates; Pipeline Expression, Data Window; JSON Parameters,
Pipeline Scheduling; ETL Optimization Techniques; Serverless Pool in Azure Synapse;
Connections, Use with Serverless Pool; Using Azure Open Datasets in Synapse; - OPENROWSET and BULK Data Loads; Working with Parquet Files in Synapse; Python
Notebooks (Pyspark) in Synapse
AZURE DATABRICKS
- Azure Cloud : SaaS, PaaS, PaaS & IaaS; Azure Cloud : Storage, ETL Resources; Azure
Databricks; Compute Resources; Need for Azure Databricks (ADB) - Azure Databricks :Purpose & Config; ;Azure Databricks Service Creation; Azure
Databricks Component; Azure Databricks Workspace, Usage; Spark Cluster
Configurations, Capacity; Driver Nodes, Worker Nodes in Spark - Cluster Types : Personal, Unrestricted; CPU, Memory & IO Resources; Virtual Machines
(VM) for Clusters; Databricks : Runtime & DBFS Storage; DBFS : Files, Tables with
Spark DB
Chapter 2: SparkDatabase, SQL Notebooks
- DBFS : File Uploads from ON-Premise; Creating Spark Tables; Spark DB; Data
Explorer: HIVE Metastore; Data Explorer; Spark Database, Tables; Notebooks: SQL,
Python and Scala; Creating SQL Notebooks in Databricks - Creating User Defined Spark Databases; Connecting / Using Spark Databases; Spark
SQL : Big Data Loads; Spark SQL : Database & Table List; Spark SQL; Data
Aggregations, Jobs; Spark SQL : Data Analytics, Reports; Analytics: X, Y Axis, Group
By; Notebooks : Export, Import, Clone; Notebooks : Storage & Versions
Chapter 3: Python Intro, Data Loads
- Python : Introduction, Real-time Use; Python For ETL and DWH; Python For Azure:
Data Engineer; Python Data Frames & Purpose; Python Dataframes – Pandas; Python
with Spark Integrations; PySpark for DDL and ETL; PySpark Versus SQL Notebooks;
Reading DBFS Data into Spark; Creating Dataframes for ETL; Temporary Views &
Dataframes; Spark Temp Views: Aggregations; Spark Table Loads, HIVE Data;
dataframe.write.format(); Spark Parquet Tables
Chapter 4: PySpark with ADLS
- Azure Storage Account : Creation; Azure Data Lake Storage : HNS; Creating Containers
in ADLS; BLOB File Uploads / Generation; Account Key : Access Key / SAS Key;
BLOB Access URL for Databricks; WASBS URL for PySpark Notebook; Generating
PySpark Script; PySpark Connection Variables; Databricks : Data Import Scripts; Config
Options with ADLS, Spark; spark.config (), Session Context; DataFrames with Temp
Tables; Escape Sequence ; HIVE & Spark DB
Chatper 5: PySpark Widgets & Spark
- Widgets : Notebook Parameters; dbutils.widget module : Text, Combo; Dropdown, Multi
Select Parameters; dbutils help(), get() & remove() - Dataframes, Spark SQL @ Variables; Python Data Frames, Spark SQL; Reading
Parameters Values; Parameters Versus Variables; Using Parameters For Temp Tables;
Using Parameters for Spark Tables - Data Storage and HIVE Metastore; Reading Parameterized Data; Format Strings with
PySpark; Dynamic Queries with Spark SQL; Aggregations and f Strings
Chapter 6: Architecture, Workflows
- Driver Nodes, Worker Nodes, DBUs; RDD : Resilent Data Distribution; DAG : Directed
Acyclic Graph; Hadoop HDES and Spot Instance; Cluster Manager, Master Node;
RDDS, Worker, Excecutor & Slave; Hadoop HDES & Databricks Runtime; - Databricks Optimization Techniques; Spot Instance, Photon Acceleration; All Purpose
Cluster, Job Cluster; Databricks Jobs: Creation & Tasks; Jobs with Parameters,
Executions; Task Dependency & Notifications; Continuous & Manual Schedules; Active
Jobs, Recent Run Jobs, Monitor
Chapter 7: Databricks Security, Scala
- Azure Databricks Security Operations; Azure Active Directory (Azure AD); AD Users
and RBAC with IAM; Owner, Contributor & Reader Roles; Workspace Admin
Permissions; Notebook Permissions & Share; Workflow Security, HTTP Path; User
Tokens & ServerName; Scala : Differences with PySpark; Scala - Variables Declaration, Usage; SparkSQL with Scala Notebooks; Temp Views with Scala
Notebooks; Aggregations with Scala Notebooks; Visual Data Analytics with Scala;
PySpark to Scala Conversions
Chapter 8: Scala with ADLS, Azure SQL
- Data Imports with Azure SQL DB; Using Scala for Big Data Loads; Spark SQL Queries
@ Temp Views; Variables, display(), spark.read(); Scala Transformations, display();
JSON, AVRO and DBFS Mounts; fs.azure.sas.container @ ADLS; dataframe.write.jdbc()
& JVM; JDBC Connection, DataframeWriter; Data Extraction, SQLContext - Spark Context and Spark Session; SQLServerDriver with Scala; ADLS with Scala
Notebooks; ;Parameters (Widgets) with Scala
Chapter 9: DeltaLake Incr Loads, DWH
- Azure DeltaLake Implementation; ACID Properties, Upsert Advantages; Delta Engine
Optimizations & Uses; Pipeline Creation: JSON Files in DBFS; Delta Tables Creation,
Data Loads; Spark Cluster Settings: Auto Optimize; Auto Compact, Delta Table
Optimize; JSON Files, Delta Streaming Location; Joins and Merge with Delta Tables - Incremental Loads, Delta Tables; Create & Use DWH with Databricks; Upsert (Merge)
with Spark Tables; Big Data & Jupyter Notebooks; Databricks with Data Factory (ADF);
End to End Implementations
POWERBI
Ch 1: POWER BI INTRODUCTION
- Power BI : Introduction to Analytics
- Power BI Tools Suite, Advantages
- Power BI : Career Options, Plan
- Power BI Developer Job Role
- Microsoft Data Analyst Job Role
- Big Data Analyst Job Role
- Power BI Data Analyst (PL 300)
- Data Engineer*, Power BI (DP 500 *)
- Artificial Intelligence (AI) Visuals
- AI Enabled Power BI Features
- Course – Lab Plan with Design Tools
- Need for Power Query & DAX
- Power BI Licensing Types
- Power BI Cloud – Advantages
- Power BI Report Server Advantages
Ch 2: Basic Report Design
- Power BI Eco System: Architecture
- Data Sources & Types in Real-world
- Report Types: Interactive, Paginated
- Analytical Reports & Mobile Reports
- Data Sources : File, Database, Web
- Visualizations : Report Shapes
- Power BI Design Tools, Requirements
- Power BI Desktop Tool : Installation
- Desktop Interface: Overview, Canvas
- Get Data, Data View, Report View
- In-Memory Xvelocity Database
- Basic Visuals: Table, Tree Map
- Data Labels, Legend, Category
- Local Store: PBIX & PBIT Files
- Data Points and Tooltips
Ch 3: Visual Interaction, Visual Sync
- Visual Interaction with Data Points, Disabling / Enabling Interactions
- Edit Interactions: Format Options, Spotlight and Focus Mode
- Report Export to CSV, PDF, Tooltip Options and Usage
- Working with Pages in PBI, Rename, Duplicate, Hide Pages
- Slicer Visual : Real-time Usage, Orientation, Selection Properties
- Slicer Settings : Tiles & Slider, Single & Multi Select, Header
- Number, Text, Show Summary,Date Slicer and Value Selections,
- Slicer List, Dropdowns & Clear
Ch 4: Grouping & Hierarchies
- Grouping : Visuals with Pdf Sources, List Grouping and Binning Options
- Grouping Static / Fixed Data Values, Grouping Dynamic / Changing Data
- Bin Size and Bin Limits (Max, Min),Bin Count and Grouping Options
- Group with Bins & Clustering, Group, Layer with Selection Pane
- Creating Hierarchies in Power BI, Independent, Dependant Drill-Down
- Drill-Down with Interactive Reports,Conditional Drilldowns, Data Points
- Drill Up Buttons and Operations,Expand & Show Next Level
- Dynamic Data Drills Limitations
Ch 5: Filters & Bookmarks
- Filters : Types and Usage in Real-time, Visual Filter, Page Filter, Report Filter
- Basic, Advanced and TOP N Filters,Category and Summary Level Filters
- Data / Drill Options, DrillThru Filters, Keep All Filters” Options in DrillThru
- CrossReport Filters, Include, Exclude, Drill-thru Filters, Page Navigations
- Bookmarks : Report Navigations, Buttons, Images with Actions
- Selection Pane, Actions, Text URLs, Show Data and See Records
- Custom Tooltips, Table Visual, Table Vs Matrix : Drill-downs
- Styles, Cell Properties, Databars, Conditional Formatting, Divergent
Ch 6: Big Data Access, Visuals
- OLTP Databases, Big Data Sources, Azure Database Access, Reports
- Import, Direct Query & Dual Mode, Data Modeling: Do Not Summarize
- Data Modeling: Currency, Relations,Power BI Archtiecture, Eco System
- Power BI Interface for Reports, Stacked Chart, Clustered Chart
- Line Chart, Area Chart, Bar Chart, 100% Stacked Bar & Column Chart
- Map Visuals: Tree, Filled, Bubble, Small Multiples, Legends, Axis
- Cards, Funnel, Table, Matrix, Scatter Chart : Play Axis, Labels
- Waterfall Chart, Multi Row Cards
Ch 7: POWER QUERY LEVEL 1
- Power Query M Language Purpose
- Power Query Architecture and ETL
- Data Types, Literals and Values
- Power Query Transformation Types
- Table & Column Transformations
- Text & Number Transformations
- Date, Time and Structured Data
- let, source, in statements @ M Lang
- Get Data, Table Creations and Edit
- ETL Operations with Power Query
- Merge Transformations in Power BI
- Join Kinds: Inner, Outer & Apply
- Union All Transformation & Appends
- Power Query Editor, Step Edits
- Close & Apply Options. Report Design
Ch 8: POWER QUERY LEVEL 2
- Query Duplicate, Query Reference
- Group By and Advanced Options
- Aggregations with Power Query
- Transpose, Header Promotion
- Reverse Rows and Row Count
- Data Type Changes & Detection
- Replace Columns: Text, NonText
- Advanced Query Edit Options
- Replace Nulls: Fill Up, Fill Down
- Pivot, Unpivot Transformations
- Move Column and Split Column
- Date & Time Transformations
- Derive Year, Quarter, Month, Day
- Add Column : Query Expressions
- Query Step Inserts and Step Edits
Ch 9: POWER QUERY LEVEL 3
- Big Data Loads : Parameter Queries
- Creating Parameters in Power Query
- Parameter Data Types, Default Lists
- Static & Dynamic Lists: List Queries
- Convert Tables to Lists, Use Cases
- Linking Parameters to Queries
- Testing Parameters with Canvas
- Multi-Valued Parameter Lists
- Creating Lists in Power Query
- Converting Lists to Table Data
- Invoke Function, Type Conversions
- Function Query & Parameter List
- Columns From Examples, Indexes
- Conditional Columns, Expressions
- Disable / Enable Data Loads
Ch 10: POWER BI CLOUD – 1
- Power BI Cloud Components
- App Workspaces, Report Publish
- Reports & Related Datasets Cloud
- Creating New Reports in Cloud
- Report Publish, Report Uploads
- Report Edits and New Reports
- Report Actions: Downloads
- Dataset Usage Options in Cloud
- Dashboards Creation and Usage
- Pining Visuals and Report Pages
- Visual Pin Actions in Dashboards
- Dashboard & LIVE Interactions
- Media Tiles: Images, Custom Links
- Q & A Option with Dashboards
- Pin with Q & A; Standard Visuals
Ch 11: POWER BI CLOUD – 2
- Report Actions : Share, Subscribe
- Report Actions : Lineage, Embed
- Report Actions : Export Options
- Report Actions : Public User Access
- Dashboard Actions : Share, Subscribe
- Dashboard Actions : Themes, Lineage
- Dashboard Actions : Share, Subscribe
- Favorite, Insights, Embed Code
- Gateways Configuration, PBI Service
- Gateway Types, Cloud Connections
- Gateway Cluster, Add Data Sources
- Data Refresh : Manual, Scheduled
- Power Query Parameters, Gateways
- DataFlows, Power Query in Cloud
- Lineage, Share, Subscribe, Insights
- Performance Inspector& Gateways
Ch 12: POWER BI CLOUD – 3
- Workbooks : Excel Online & Pins
- Power BI Apps: Creation & Usage
- Power BI Segments, Content
- Navigation Screens, Audience
- App Publish, Verification & Edits
- Export, Share & Subscribe
- List View & Lineage View Options
- Power BI Scorecards: Realtime Use
- Paginated Reports – Design & Usage
- Power BI Report Builder Tool
- Microsoft Report Builder Tool
- Report Builder : Datasets, Charts
- Report Builder : Bar Charts, Fields
- Report Builder : Creating RDL Files
- Paginated Reports : Deployments
Ch 13: DAX Functions – Level 1
- DAX : Importance in Real-time
- DAX Data Types, Syntax Rules
- DAX Measures and Columns
- ROW Context and Filter Context
- Operators, Special Characters
- DAX Functions, Vertipaq Engine
- DAX Cheat Sheet : Expressions
- Data Analytics with DAX
- DAX Measures : Expressions
- ISBLANK, IF, IN, SUM
- SUMX, AVG, AVERAGEX
- Data Models: Fact, Dimensions
- Detecting Relations for DAX
- Star & Snowflake Schemas
- Data Modeling Options in DAX
Ch 14: DAX Functions – Level 2
- Quick Measures in Power BI
- Average and Filtered Average
- Running Totals, EARLIER( )
- RELATED, COUNTROWS
- CALCULATE Function Conditions
- ALL Members Scope & IN
- Account and Time Calculations
- Star Rating, DAX Expressions
- Data Modeling Options in DAX
- 1:1, 1:M and M:1 Relations
- Working with Facts & Measures
- Modeling : Missing Relations
- Relationships & Importance
- Modeling : Relation Management
- Modeling with Multiple Keys
Ch 15: DAX Functions – Level 3
- DAX : Variables and Expressions
- Dynamic Expressions, RETURN
- Current Value, Previous Value
- SELECTED VALUE, Joins
- FORMAT Function with DAX
- RELATED, Joins in DAX
- DAX Expressions with SQL DB
- Time Intelligence Functions
- Date Dimension : Generation
- CALENDAR(), DATESYTD()
- TOTALYTD, TOTALQTD
- TODAY, DATE, DAY with DAX
- SELECTEDVALUE, FORMAT
- Date, Time and Text Functions
Ch 16: DAX Functions – Level 4
- RLS: Row Level Security
- Data Models in Power BI Desktop
- DAX Roles Creation and Testing
- DAX Expressions & Operators
- PBIX Uploads: Power BI Cloud
- Dataset Security with DAX Roles
- Entity Sets and Slicing in DAX
- Dataflows with Power BI
- Analytical Reports – DAX Usage
- Creating Data Models with DAX
- Datasets in Excel and Dashboards
- Using Excel Analyzer in Power BI
- Power BI Data Source in Excel
- Connection Strings and Refresh
- Analytical Reports – Limitations
Ch 17: Power BI Report Server
- Power BI Report Server Config
- SQL Server Instance Verifications
- Report Server DB, Temp Database
- WebService & WebPortal URL
- Uploading Interactive Reports
- End User Report Share (pdf)
- Power BI Desktop RS Tool
- Interactive Reports: Report Server
- Mobile Reports : Design Options
- Mobile Reports : Grids, Elements
- Mobile Reports : Uploads, Edits
- Paginated Reports : Deployments
- Paginated Vs Interactive Reports
- Paginated Vs Analytical Reports
- Paginated Vs Mobile Reports
- Power BI Report Server Vs Cloud
Ch 18: Power BI Admin & AI
- Power BI Cloud Management, Power BI Admin : Alerts
- Workspace Management, Users
- Security: Report, Dataset Levels, Security: Dataset, App Levels
- Security: Workspace Options
- PBI Performance Inspector
- Power BI & Artificial Intelligence, Power BI & CoPilot Add-Ins
- AI Visuals & Big Data Analytics
- Smart Narrative and Q & A
- Infographics, Icons and Labels
- Key Influencer Visual in Power BI, Metrics Visual, Performance
- Paginated Reports Visual
Instructor
