[AZ-305] Design Data Storage Solutions

These are roughly the topics I noted down from MS Learn.

Design a data storage solution for non-relational data

- Design for data storage

Types of data: Structured (tables) / Semi-structured (JSON/XML) / Unstructured (streams / video / images)
See also DP-900: Identify data formats
Choosing data storage
- Azure Blob Storage - store vast amounts of unstructured data
- Azure Files - file shares over SMB / Network File System (NFS); oh, and there is a REST API too
- Azure managed disks - used with Azure VMs
- Azure Queue Storage - a backlog of work to process asynchronously
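
As a quick way to remember the list above, here is a minimal lookup sketch. The workload labels are hypothetical names made up for illustration; only the service names come from the notes.

```python
# Hypothetical workload labels mapped to the services listed in the notes.
STORAGE_BY_WORKLOAD = {
    "unstructured_objects": "Azure Blob Storage",
    "shared_file_system": "Azure Files",
    "vm_disk": "Azure managed disks",
    "async_work_backlog": "Azure Queue Storage",
}


def suggest_storage(workload: str) -> str:
    """Return the storage service noted for a workload label."""
    try:
        return STORAGE_BY_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"unknown workload label: {workload!r}") from None
```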

- Design for Azure storage accounts

Azure storage account
- groups together all of your Azure Storage services
- provides a unique namespace for access
- the storage account manages settings such as location, replication strategy, and access tier
- there are 4 types: Standard general-purpose v2 / Premium block blobs / Premium file shares / Premium page blobs
Things to consider when choosing storage account types
- storage locations
- compliance requirements
- data storage costs: depends on the service chosen (Standard / Premium); storage accounts can also be split per cost center
- replication scenarios
- administrative overhead: with many storage accounts, admins have to keep policies and configuration in line for each one
- data sensitivity: public data and sensitive data should be split into separate storage accounts
- data isolation

- Design for data redundancy

How they differ: Microsoft Azure LRS vs ZRS vs GRS vs GZRS
- GRS family: LRS in the primary region > LRS in another region
- GZRS family: ZRS in the primary region > LRS in another region
When using data redundancy, go through this checklist
- access requirements: primary only, or must data replicate to other zones / regions
- primary replication options: within the primary region there is LRS / ZRS; pick based on cost vs HA
- locally redundant storage: low cost, limited durability; replicas stay within one datacenter
- zone-redundant storage: does not protect against a region failure
- secondary regions
-> with GRS and GZRS you must fail over before the secondary can be read
-> if you need read access to the secondary, use RA-GRS / RA-GZRS
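
The checklist above can be sketched as a small decision function. This is only an illustration of the reasoning in these notes; the function and parameter names are my own, and a real decision would also weigh cost and service support.

```python
def pick_redundancy(zone_protection: bool, region_protection: bool,
                    read_secondary: bool = False) -> str:
    """Map the checklist answers to a redundancy option name."""
    if read_secondary and not region_protection:
        raise ValueError("read access to a secondary needs a secondary region")
    # Primary region: ZRS if a zone failure must be survived, else LRS.
    primary = "ZRS" if zone_protection else "LRS"
    if not region_protection:
        return primary
    # Secondary region: GRS builds on LRS, GZRS builds on ZRS.
    sku = "GZRS" if zone_protection else "GRS"
    # The RA- prefix allows reads from the secondary without a failover.
    return f"RA-{sku}" if read_secondary else sku
```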

Design for Azure Blob Storage

Access tiers: Premium Blob Storage (best performance) / Hot access tier / Cool access tier (at least 30 days) / Cold access tier (at least 90 days) / Archive access tier (slow but cheap / 180+ days). Each one affects availability and latency.
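
One way to act on the tiers is a lifecycle management policy that moves blobs down as they age. The sketch below builds such a policy as a Python dict in the JSON shape used by storage lifecycle rules. The rule name, the prefix, and the choice to reuse 30 / 90 / 180 as move-after thresholds are assumptions for illustration; in the notes those numbers are the tiers' day figures, not recommended thresholds.

```python
import json


def build_lifecycle_policy(prefix: str, cool_days: int = 30,
                           cold_days: int = 90, archive_days: int = 180) -> dict:
    """Build a tier-down rule for block blobs under a prefix."""
    if not cool_days < cold_days < archive_days:
        raise ValueError("thresholds must increase: cool < cold < archive")

    def after(days: int) -> dict:
        return {"daysAfterModificationGreaterThan": days}

    return {
        "rules": [{
            "enabled": True,
            "name": "tier-down-by-age",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                "actions": {"baseBlob": {
                    "tierToCool": after(cool_days),
                    "tierToCold": after(cold_days),
                    "tierToArchive": after(archive_days),
                }},
            },
        }]
    }


if __name__ == "__main__":
    print(json.dumps(build_lifecycle_policy("logs/"), indent=2))
```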

I apparently wrote a blog post on this in late 2022 (the Cold tier did not exist yet).

WORM = Write Once Read Many
When implementing Azure Blob Storage, look at
- Blob Storage availability: check the access tier
- Blob Storage latency: the required time to access data; premium blob storage is the best
- Blob Storage costs
- immutable storage: prevents modification/deletion, using time-based retention policies or legal hold policies (a legal hold has no fixed period; data is kept until the legal hold is removed)

- Design for Azure Files

Azure Files
- for moving up from an on-premises NAS / file share
- encrypted in transit and at rest
- Access methods
-> Direct mount: SMB access
-> Azure File Sync (caches Azure file shares on-premises)
Performance depends on the choice of
- Storage account: Standard / Premium
- Storage tier: Premium / Transaction optimized (transaction-heavy workloads that don't need Premium-level latency) / Hot access tier / Cool access tier
NOTE: this affects latency / IOPS / bandwidth
Whether to use Azure Files, Blob Storage, or something else, see
Compare NFS access to Azure Files, Blob Storage, and Azure NetApp Files

- Design for Azure managed disks

Azure managed disks - for VMs

When using managed disks

scenarios, throughput, and IOPS
- Ultra disk - SAP / DB servers
- Premium SSD - production + performance-sensitive workloads
- Standard SSD - web servers / dev / test
- Standard HDD - backup
Look at data caching, which acts as a buffer to speed things up: OS disk (Read/Write) / data disk (Read), but the disk must be smaller than 4 TiB
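
The caching rule of thumb above can be written down as a tiny helper. This is a sketch of the note, not an Azure API: the function name and the `role` labels are hypothetical, and the size cut-off assumes that disks of 4 TiB (4096 GiB) and larger cannot use host caching.

```python
# Assumption: host caching is unavailable from 4096 GiB upward.
MAX_CACHEABLE_GIB = 4095


def default_host_caching(role: str, size_gib: int) -> str:
    """Return the caching mode suggested by the notes for a disk."""
    if role not in ("os", "data"):
        raise ValueError("role must be 'os' or 'data'")
    if size_gib > MAX_CACHEABLE_GIB:
        return "None"  # too large for host caching
    # OS disks default to read/write caching, data disks to read-only.
    return "ReadWrite" if role == "os" else "ReadOnly"
```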
Look at encryption
- Azure Disk Encryption (ADE): encryption at the VM (OS) level
- Server-Side Encryption (SSE): encryption at rest
- Encryption at host: for what SSE alone does not cover (temp disks and disk caches on the VM host)

- Design for storage security

implementing storage security

Azure security baseline options
shared access signatures - Define the access permissions for resources. Configure how long the SAS remains valid.
firewall policies and rules
service endpoints - restrict access to selected virtual networks rather than public IP addresses

private endpoints

secure transfer - enable secure transfer on storage accounts
customer-managed keys - manage the encryption keys for your storage account in Azure Key Vault
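
To make the "firewall policies and rules" item a bit more concrete, here is a conceptual sketch of an IP allow-list check using only the Python standard library. It illustrates the idea of allowing selected address ranges and denying the rest; it is not how the storage firewall is implemented, and the function name is my own.

```python
import ipaddress


def is_allowed(client_ip: str, allowed_ranges: list[str],
               default_allow: bool = False) -> bool:
    """Return True when client_ip falls inside any allowed CIDR range."""
    ip = ipaddress.ip_address(client_ip)
    for cidr in allowed_ranges:
        # strict=False accepts ranges written with host bits set.
        if ip in ipaddress.ip_network(cidr, strict=False):
            return True
    # With no matching rule, fall back to the default action (deny here).
    return default_allow
```

For example, `is_allowed("203.0.113.7", ["203.0.113.0/24"])` passes the check, while an address outside that range is denied by default.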

Knowledge check: Design a data storage solution for non-relational data

Design a data storage solution for relational data

SQL Server on Azure comes in 3 forms: Azure SQL Database / Azure SQL Managed Instance / SQL Server on Azure VM

See the table above

Azure SQL Database suits large databases, up to 100 TB, and supports
- vCore (recommended; can use Azure Hybrid Benefit or reserved capacity)
- DTU: no need to think about CPU / RAM / storage separately; it is priced as a bundle per Database Transaction Unit, a bit like engine horsepower; you can go check the prices yourself
- serverless
- elastic database pools: for several databases that should share resources, so peak resource usage is capped at the pool level
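
Why pooling helps can be shown with a bit of arithmetic: databases rarely peak at the same time, so the peak of the combined load is usually lower than the sum of the individual peaks. The sketch below uses made-up usage numbers and hypothetical database names.

```python
def pool_vs_individual(usage_by_db: dict[str, list[int]]) -> tuple[int, int]:
    """Return (sum of per-database peaks, peak of combined usage)."""
    series = list(usage_by_db.values())
    if not series or len({len(s) for s in series}) != 1:
        raise ValueError("need at least one database and equal-length series")
    # Sizing each database alone means paying for every individual peak.
    sum_of_peaks = sum(max(s) for s in series)
    # A shared pool only has to cover the busiest combined moment.
    combined_peak = max(sum(step) for step in zip(*series))
    return sum_of_peaks, combined_peak


# Made-up resource usage per time slot for three databases.
usage = {
    "db_a": [10, 80, 10],
    "db_b": [70, 10, 20],
    "db_c": [10, 10, 90],
}
```

With these numbers, sizing each database separately needs 240 units in total, while a shared pool needs 120.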
Azure SQL Managed Instance suits migrations from on-premises to the cloud, but watch out for instance-scoped features such as Service Broker, CLR, and SQL Server Agent; check them all before migrating. It uses these models
- vCore: scale CPU + disk
- instance pool
SQL Server on Azure VM is for cases that cannot move to Azure SQL Managed Instance. If you use it, you should
- manage server access / automated management: it is a VM, so you have to look after it yourself
- use Azure Hybrid Benefit

- Recommend a solution for database scalability

Azure SQL / SQL MI: dynamic scalability

vertical scaling (scaling up): change CPU / storage for the instance; with DTU / vCore you set a max // elastic pools can set min + max
horizontal scaling (scaling out) uses
- sharding to partition data
- or read scale-out provisioning (Always On availability group)

Which kind of scaling?

Elastic database pools and vertical scaling - raise the spec
- Multiple Azure SQL databases + scaling + unpredictable resource usage
Horizontal scaling and sharding - add databases
- Different sections of a database reside in different geographic locations for compliance reasons
- sharding to split your data into several databases and scale them independently
Elastic database tools and elastic query
- Dependency support for commercial BI or data integration tools, where multiple databases contribute rows into a single overall result for use in Excel, Power BI, or Tableau
- T-SQL that spans multiple databases in Azure SQL Database. Run cross-database queries to access remote tables
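
A minimal sketch of the sharding idea: pin each customer to a region for compliance, then spread customers across that region's databases with a stable hash. This is application-side routing written for illustration, not the elastic database tools themselves; the database names and function names are hypothetical.

```python
import hashlib

# Hypothetical base names for the databases in each region.
REGION_SHARDS = {"eu": "sqldb-eu", "us": "sqldb-us"}


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a hash that is stable across runs."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def database_for(customer_id: str, region: str, shards_per_region: int = 4) -> str:
    """Pick the database for a customer: region first, then hash within it."""
    base = REGION_SHARDS[region]  # raises KeyError for unknown regions
    return f"{base}-{shard_for(customer_id, shards_per_region)}"
```

The built-in `hash()` is avoided on purpose because it is randomized per process for strings, which would send the same customer to different shards after a restart.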

- Recommend a solution for database availability

Why does this feel like a repeat of earlier topics?

General Purpose - tempdb on locally attached SSD; data and log files are stored in Azure Premium Storage (remote)
Business Critical - Always On availability group + direct-attached SSD to cut latency
Hyperscale - Azure SQL Database only

See more in Recommend a solution for database availability

A VM can do this too, but you do everything yourself: create VMs in several regions, then set up Always On AG and backups on your own. Also, some features used on-premises are not available in cloud Azure SQL, for example
- Log Shipping (Huawei / AWS apparently can do it)
- Replication: Azure SQL MI supports it

- Design security for data at rest, data in motion, and data in use

AZ-900 / SC-900 already describe 3 data states: data at rest / data in transit / data in process

Data at rest (TDE, at page level)
- TDE uses a Database Encryption Key (DEK), managed either as service-managed TDE or customer-managed TDE
- Always On AG: the DEK (its protecting certificate/key) must be set up on every server
Data in transit/motion (SSL/TLS)
- across sites, also look at the network // HTTPS between DB and storage
- or use VPN > ExpressRoute
Data in process (dynamic data masking, applied only to sensitive data)
- the Dynamic data masking feature is applied at query time and can be configured from the Azure portal for SQL
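
To show what masking at query time means, here is a small Python imitation of two mask shapes: an email mask that keeps only the first letter, and a partial mask that keeps a prefix and suffix around a padding string. The real feature is configured on the database column; these function names are mine, and edge cases such as very short values are not handled the way the database would handle them.

```python
def mask_email(value: str) -> str:
    """Keep the first character and replace the rest with a fixed pattern."""
    return f"{value[:1]}XXX@XXXX.com"


def mask_partial(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Keep `prefix` leading and `suffix` trailing characters around padding."""
    head = value[:prefix]
    tail = value[-suffix:] if suffix > 0 else ""
    return f"{head}{padding}{tail}"


def read_email(value: str, can_unmask: bool) -> str:
    """Stored data is unchanged; only the query result differs per caller."""
    return value if can_unmask else mask_email(value)
```

The key point is in `read_email`: the stored value never changes, and a caller with unmask rights still sees the original.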

data classification - cataloguing data and labelling it Public, Confidential, or Restricted

- Design for Azure SQL Edge

Azure SQL Edge is a small containerized SQL Server for IoT. It has 2 deployment modes, Connected / Disconnected (installed separately by yourself). Before deploying, consider platform and system security / authentication and authorization / database object security (TDE) / application security (the IoT app)
Other things to consider
- network connectivity limitations
- slow or intermittent broadband connections, which affect sync
- data security and privacy concerns
- synchronization and connectivity to back-end systems
- code and skill familiarity: this ties into deployment and application security

Note: it will be gone in 2025: beginning September 30th, 2025, the Azure SQL Edge service will be retired

Personally I like pulling the Azure SQL Edge Docker image to play with; it is nice and small. Let's see whether the container is still around in 2025.

- Design for Azure Cosmos DB and Table Storage

See the separate topic for this one. And what does it have to do with relational data?

For the Table API itself there is a summary table (table image not included).

Knowledge check: Design a data storage solution for relational data
Summary Resource: Summary and resources

Design data integration

- Design a data integration solution with Azure Data Factory

Azure Data Factory - a cloud-based data integration service that goes through 1. Connect and collect / 2. Transform and enrich / 3. CI/CD for the ETL process / 4. Monitor

Recap of ADF components: Pipelines and activities / Datasets / Linked services / Data flows (transformation logic without writing code) / Integration runtimes (the bridge between activities and linked services)
Things to consider before using Azure Data Factory
- requirements for data integration
- coding resources: build pipelines / data flows with either code or low code
- data sources you need: there are connectors for them
- serverless infrastructure
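
To see how the components recapped above fit together, here is a rough sketch of a pipeline definition with a single Copy activity, built as a Python dict in the general JSON shape that pipelines use. The pipeline, activity, and dataset names are hypothetical, and the source/sink types are an assumption; they depend on the connector actually used.

```python
import json


def copy_pipeline(name: str, source_dataset: str, sink_dataset: str) -> dict:
    """Build a one-activity pipeline that copies between two datasets."""
    return {
        "name": name,
        "properties": {
            "activities": [{
                "name": "CopyRawToStaging",  # hypothetical activity name
                "type": "Copy",
                # Datasets point at data; linked services (not shown) hold
                # the connection details the datasets rely on.
                "inputs": [{"referenceName": source_dataset,
                            "type": "DatasetReference"}],
                "outputs": [{"referenceName": sink_dataset,
                             "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "BlobSource"},  # assumed connector
                    "sink": {"type": "BlobSink"},      # assumed connector
                },
            }]
        },
    }


if __name__ == "__main__":
    print(json.dumps(copy_pipeline("IngestDaily", "RawBlobs", "StagingBlobs"),
                     indent=2))
```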

- Design a data integration solution with Azure Data Lake

A data lake is a repository of data stored in its natural format, usually as blobs or files

Compared with Azure Data Factory, I see it as
🚀 Azure Data Factory (pre-processing)
🧱 and Azure Data Lake (storage optimized for big data), which supports
1. ingesting real-time data directly from multiple sources
2. Apache Hadoop Distributed File System (HDFS) compatibility
3. many file formats (JSON, CSV, logs) plus keeping raw data
Steps
- Ingest data
- Access stored data through several tools: Azure CLI / Azure Storage Explorer / SDKs, etc.
- Configure access control: RBAC / POSIX-style ACLs
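
The POSIX-style ACLs mentioned in the last step are written as short entries such as `user::rwx,group::r-x,other::---`. The sketch below parses that text form and checks one permission. It is a simplification for illustration: it ignores the mask, default entries, and the order in which owner, group, and other are evaluated, and the function names are my own.

```python
def parse_acl(acl: str) -> dict[tuple[str, str], str]:
    """Parse 'scope:qualifier:perms' entries into a lookup table."""
    entries = {}
    for entry in acl.split(","):
        scope, qualifier, perms = entry.split(":")
        ok = (len(perms) == 3 and perms[0] in "r-"
              and perms[1] in "w-" and perms[2] in "x-")
        if not ok:
            raise ValueError(f"bad permission string: {perms!r}")
        entries[(scope, qualifier)] = perms
    return entries


def can(acl: str, scope: str, qualifier: str, permission: str) -> bool:
    """Check whether one ACL entry grants 'r', 'w', or 'x'."""
    if permission not in ("r", "w", "x"):
        raise ValueError("permission must be 'r', 'w', or 'x'")
    # A missing entry grants nothing in this simplified model.
    perms = parse_acl(acl).get((scope, qualifier), "---")
    return permission in perms
```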

choosing Azure Blob Storage or Azure Data Lake


- Design a data integration and analytic solution with Azure Databricks

Databricks = Unified Data Analytics Platform, a full solution for AI that covers everything from data preparation to training, with built-in core APIs for SQL, Java, Python, R, and Scala. It can be managed in terms of a control plane / data plane
Its tools cover three angles: SQL / Data Science and Engineering / Machine Learning

Things to consider when using it

data science preparation of data - size the data clusters to fit the job
insights in the data - recommendation engines / churn analysis / intrusion detection, plus Databricks SQL, though that is not suited to unstructured data
productivity across data and analytics teams - shared workspaces let teams work together better
big data workloads - pick the right tools and integrate with Azure services
machine learning programs - an integrated end-to-end ML environment that helps with tracking, model training, feature development and management, and feature and model serving

- Design a data integration and analytic solution with Azure Synapse Analytics

Azure Synapse Analytics is a toolset that makes data analytics easier, covering data ingestion, exploration, and transformation, and supporting massively parallel processing (MPP) via PolyBase across SQL / NoSQL data. Azure Synapse Analytics has these components

Azure Synapse SQL pool - serverless (unplanned) and dedicated (planned) resources that support MPP
Azure Synapse Spark pool - Apache Spark clusters for custom logic in Python, Scala, SQL, and C#
Azure Synapse Pipelines - cloud-based ETL and data integration service (Azure Data Factory)
Azure Synapse Link - near real-time analytics on data in Azure Cosmos DB
Azure Synapse Studio - web-based IDE for managing and doing analytics work

Azure Data Factory or Azure Synapse Analytics

variety of data sources - code-free ETL
Machine Learning - can use built-in Azure ML support
data lake integration - if you already have a data lake, connect it to Azure Synapse
real-time analytics - use Azure Synapse Link

- Design strategies for hot, warm, and cold data paths

Reading this one left me confused: what exactly is a data path? lol

- Design an Azure Stream Analytics solution for data analysis

Azure Stream Analytics is a fully managed (PaaS offering), real-time analytics and complex event-processing engine // I only learned this exists while studying for AZ-305, or maybe I just never paid attention before. Stream Analytics supports

Data streams - help us understand change over time
Event processing - temporal information at a specific time, such as the time a car passes a toll booth
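
The toll booth example maps nicely onto a tumbling window: fixed, non-overlapping time slices with a count per slice. The sketch below shows the idea in plain Python rather than in a Stream Analytics query. The function name is my own, and the boundary handling (an event exactly on a boundary goes to the later window) is a simplification that may differ from the service.

```python
from collections import Counter


def tumbling_counts(event_times: list[float], window_seconds: int) -> dict[int, int]:
    """Count events per fixed window, keyed by the window start time."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Each timestamp belongs to exactly one window: floor to its start.
    counts = Counter(int(t // window_seconds) * window_seconds
                     for t in event_times)
    return dict(sorted(counts.items()))


# Seconds since the start of observation when each car passed the booth.
passes = [5, 20, 59, 60, 61, 130]
```

With a 60 second window, `tumbling_counts(passes, 60)` gives three cars in the first minute, two in the second, and one in the third.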

Which use cases is Azure Stream Analytics used for?

Analyze real-time telemetry streams from IoT devices.
Build web logs and clickstream analytics.
Create geospatial analytics.
Execute remote monitoring and predictive maintenance of high value assets.
Perform real-time analytics on point-of-sale data.

Things to consider when using Azure Stream Analytics

provisioning requirements - since it is PaaS, there is no maintenance cost to track / focus on the business
costs - Streaming Units (SUs), scaling up/down
implementation
performance - complex queries can be parallelized and executed on multiple streaming nodes
security - TLS 1.2 and in-memory processing

Summary: Summary and resources

Knowledge check: Design data integration

Reference

AZ-305: Design data storage solutions - Training | Microsoft Learn

Discover more from naiwaen@DebuggingSoft
