AlwaysOn SQL, Assaf Frenkel

Mission-Critical Availability
• Detect failures reliably
• Able to withstand multiple failures
• Unified solution
• Easy to configure, manage, and monitor
• Reuse existing investments
• SAN/DAS environments
• Allow using HA hardware resources
• Fast seamless failover
AlwaysOn
SQL Server HA/DR Technologies
Failover Cluster Instances (for servers) | Availability Groups (for groups of databases)
Pre-existent | New
Server failover | Multi-database failover
Useful in consolidation scenarios | DBs that the app depends on
Shared storage (SAN / SMB) | Direct attached storage
Depends on storage redundancy | Log synchronization
Failover takes minutes | Failover takes seconds
Server restart | Secondary replicas are online
Multi-node instance | Multiple secondary replicas
Passive secondary nodes | Active secondary replicas
Failover Cluster Instances
Enhancements in SQL Server 2012
• Multi-Site Clusters
Clusters across subnets without a stretched VLAN
• TempDB on local disk
Improved performance, better SAN utilization
• Indirect Checkpoints
More predictable DB recovery (failover) times
• Flexible Failover Policy
Configurable sensitivity to failures for automatic failover
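The flexible failover policy can be pictured as a threshold on failure severity. The sketch below is a simplified model, not SQL Server code. It assumes the documented failure condition levels (1 = server down, 2 = server unresponsive, 3 = critical server errors and the default, 4 = moderate server errors, 5 = any qualified failure condition, with 0 disabling automatic failover for an FCI). The `Condition` enum and `should_fail_over` helper are hypothetical names.

```python
from enum import IntEnum


class Condition(IntEnum):
    """Hypothetical failure conditions, numbered by the lowest failure
    condition level that reacts to them (assumed, simplified model)."""

    SERVER_DOWN = 1
    SERVER_UNRESPONSIVE = 2
    CRITICAL_SERVER_ERROR = 3
    MODERATE_SERVER_ERROR = 4
    OTHER_QUALIFIED_FAILURE = 5


def should_fail_over(failure_condition_level: int, observed: Condition) -> bool:
    """Return True if a policy at this level triggers automatic failover.

    Levels are cumulative: level N reacts to every condition numbered 1..N.
    Level 0 (FCI only) disables automatic failover entirely.
    """
    if not 0 <= failure_condition_level <= 5:
        raise ValueError("failure condition level must be between 0 and 5")
    return observed <= failure_condition_level


# A default-level (3) policy fails over on a crash but tolerates moderate errors.
print(should_fail_over(3, Condition.SERVER_DOWN))            # True
print(should_fail_over(3, Condition.MODERATE_SERVER_ERROR))  # False
```

Raising the level makes automatic failover react to more kinds of failures; lowering it avoids failovers for problems that a restart or retry might have handled.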
Availability Groups
Introduced in SQL Server 2012
Integrated, efficient:
• Multi-database failover
• Seamless app connectivity
• Active secondaries: read workloads, backups
• Multiple secondaries (4)
• Sync (max 2) / async
• Compression & encryption
• Manual/automatic failover
• Flexible failover policy
• Automatic page repair
• Configuration wizard
• Monitoring dashboard
• Diagnostics infrastructure
• System Center integration
• Full cross-feature support: Contained Databases, FileStream, FileTable, Service Broker
• PowerShell automation
• Fast failover
An Availability Group Deployment
(Diagram: sync log synchronization and async log synchronization)
AlwaysOn
SQL Server HA/DR Technologies: New in SQL Server 2014
Failover Cluster Instances (for servers)
• Support for Windows Cluster Shared Volumes
Availability Groups (for groups of databases)
• Increased Number of Secondaries
• Increased Availability of Readable Secondaries
• Add Azure Replica Wizard
Both
• Enhanced Diagnostics
Availability Groups
Increased Number of Secondaries
• SQL Server 2012: Customers use up to 4 readable secondaries to offload read workloads
• Single technology to configure and manage
• Higher throughput (~7x) than replication
• Customers asking for more replicas
• Reduce query latency in geo-distributed environments (e.g. Bwin)
• Scale out read workloads (e.g. Baltika)
• SQL Server 2014: Max 8 secondaries
• Max 2 synchronous secondaries for high availability
• Secondary delay depends on network latency and I/O
• ~1 s within a data center, ~5 s between data centers
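A toy model of the SQL Server 2014 limits above, plus the role synchronous secondaries play when a transaction commits: a commit waits for its log to be hardened on synchronous secondaries, never on asynchronous ones. This is a sketch under simplifying assumptions, not SQL Server's implementation: `Replica`, `validate_secondaries`, and `commit_acknowledged` are hypothetical names, log positions are plain integers, and session timeouts and replicas that are not currently synchronized are ignored.

```python
from dataclasses import dataclass

MAX_SECONDARIES = 8       # SQL Server 2014 limit (4 in SQL Server 2012)
MAX_SYNC_SECONDARIES = 2  # synchronous-commit secondaries


@dataclass(frozen=True)
class Replica:
    name: str
    synchronous: bool
    hardened_lsn: int = 0  # highest log position hardened on this replica (hypothetical)


def validate_secondaries(secondaries: list[Replica]) -> list[str]:
    """Return a list of limit violations (empty if the layout is valid)."""
    problems = []
    if len(secondaries) > MAX_SECONDARIES:
        problems.append(
            f"{len(secondaries)} secondaries exceeds the limit of {MAX_SECONDARIES}"
        )
    sync_count = sum(1 for r in secondaries if r.synchronous)
    if sync_count > MAX_SYNC_SECONDARIES:
        problems.append(
            f"{sync_count} synchronous secondaries exceeds the limit of {MAX_SYNC_SECONDARIES}"
        )
    return problems


def commit_acknowledged(commit_lsn: int, primary_hardened_lsn: int,
                        secondaries: list[Replica]) -> bool:
    """A commit completes once its log is hardened on the primary and on every
    synchronous secondary; asynchronous secondaries are never waited for."""
    if primary_hardened_lsn < commit_lsn:
        return False
    return all(r.hardened_lsn >= commit_lsn for r in secondaries if r.synchronous)


layout = [Replica("dc1-sync", True, hardened_lsn=100),
          Replica("dr-async", False, hardened_lsn=40)]
print(validate_secondaries(layout))           # []
print(commit_acknowledged(100, 100, layout))  # True: the lagging async replica does not block
```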
Availability Groups
Increased Number of Secondaries
• Minimal performance impact
• Commits don’t wait for async secondaries
• Log sender threads share the log pool
• Added transaction latency with 8 async secondaries: <1%
• Out of scope: load balancing via the connection string
• Read-only connections are still routed to the first available readable secondary
• Load balancing is possible via DNS round-robin or specialized load balancers (e.g. NLB)
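Because read-only routing in this release still sends every read-only connection to the first available readable secondary, spreading reads is left to DNS round-robin or a load balancer. The sketch below illustrates the round-robin idea on the client side only; the `RoundRobinReadRouter` class, the host names, and the availability callback are hypothetical.

```python
from itertools import cycle
from typing import Callable, Iterable


class RoundRobinReadRouter:
    """Rotate read connections across readable secondaries, skipping any
    host the caller reports as unavailable (hypothetical helper)."""

    def __init__(self, readable_secondaries: Iterable[str]):
        hosts = list(readable_secondaries)
        if not hosts:
            raise ValueError("need at least one readable secondary")
        self._count = len(hosts)
        self._ring = cycle(hosts)

    def next_host(self, is_available: Callable[[str], bool] = lambda host: True) -> str:
        # Try each host at most once per call before giving up.
        for _ in range(self._count):
            host = next(self._ring)
            if is_available(host):
                return host
        raise RuntimeError("no readable secondary is available")


router = RoundRobinReadRouter(["sql-read-1", "sql-read-2", "sql-read-3"])
print([router.next_host() for _ in range(4)])
# ['sql-read-1', 'sql-read-2', 'sql-read-3', 'sql-read-1']
print(router.next_host(lambda h: h != "sql-read-2"))  # sql-read-3
```

DNS round-robin gives the same rotation without client code, but it does not check replica health on its own, which is why the sketch takes an availability callback.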
Availability Groups
Increased Readable Secondaries Availability
• SQL Server 2012: Read workloads are killed during network failures
• Geo-distributed environments (e.g. failure/upgrade of network equipment, ISP failures)
• Hybrid (on-premises to Azure) deployments
• SQL Server 2014: Read workloads are not impacted during network failures, while the primary is down, or during cluster quorum loss
• Readable secondaries remain available during the “Resolving” state
• Requires direct connections to readable secondaries (read-only routing not supported yet)
• Replica state and last commit time available in the DMV/Dashboard
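Because readable secondaries may keep serving reads while the primary is unreachable, a client may want to judge how stale the data is. A minimal sketch, assuming the caller has already obtained the `last_commit_time` values for the primary and secondary replicas of a database (for example from `sys.dm_hadr_database_replica_states`; the query and connection code are omitted). `estimate_lag`, `fresh_enough`, and the 5-second threshold are hypothetical choices, the threshold echoing the cross-data-center delay quoted earlier.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_ACCEPTABLE_LAG = timedelta(seconds=5)  # hypothetical threshold


def estimate_lag(secondary_last_commit: datetime,
                 primary_last_commit: Optional[datetime] = None,
                 now: Optional[datetime] = None) -> timedelta:
    """Estimate how far a readable secondary trails the primary.

    Uses the primary's last commit time when it is reachable; otherwise falls
    back to `now`, which overstates the lag if the workload was idle.
    """
    reference = primary_last_commit if primary_last_commit is not None else now
    if reference is None:
        raise ValueError("need the primary's last commit time or the current time")
    return max(reference - secondary_last_commit, timedelta(0))


def fresh_enough(lag: timedelta, limit: timedelta = MAX_ACCEPTABLE_LAG) -> bool:
    return lag <= limit


secondary = datetime(2014, 1, 1, 12, 0, 0)
primary = datetime(2014, 1, 1, 12, 0, 3)
lag = estimate_lag(secondary, primary)
print(lag, fresh_enough(lag))  # 0:00:03 True
```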
Availability Groups
(Diagram: sync log synchronization and async log synchronization)
Availability Groups
“The increased readable secondaries availability means our users can still find answers online and the world keeps spinning.” - StackOverflow
http://nickcraver.com/blog/2013/11/18/running-stack-overflow-sql-2014-ctp-2/
Availability Groups
StackOverflow can now:
• Offload more critical read workloads to a readable secondary in the main data center
• Network glitches can happen even within the same DC
• Use the readable secondary in the DR site while the main data center is down (70% reads)
• Simpler to change DNS than to force failover and fail back
• Doesn’t result in data loss
Demo
Availability Groups – Increased Availability of Readable Secondaries

Availability Groups
Add Azure Replica Wizard
• Many customers can’t afford a DR site
• Site rent + maintenance, hardware, Ops
• SQL Server 2012: Started supporting replicas on Windows Azure VMs this year
• Offload read workloads
• Offload backups (policy compliance)
• Disaster recovery
• In the best-suited region
• West US, East US, East Asia, Southeast Asia, North Europe, West Europe
• Latency / political considerations
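Choosing the Azure region for a replica comes down to a filter (where the data may reside, for legal or political reasons) followed by a minimum (lowest latency). A hedged sketch: the region names are the ones listed above, while the latency figures, the allowed set, and `pick_replica_region` are hypothetical inputs a customer would supply.

```python
AZURE_REGIONS = ("West US", "East US", "East Asia",
                 "Southeast Asia", "North Europe", "West Europe")


def pick_replica_region(latency_ms: dict[str, float], allowed: set[str]) -> str:
    """Return the allowed region with the lowest measured round-trip latency."""
    candidates = {
        region: ms
        for region, ms in latency_ms.items()
        if region in allowed and region in AZURE_REGIONS
    }
    if not candidates:
        raise ValueError("no allowed region has a latency measurement")
    return min(candidates, key=candidates.__getitem__)


# Hypothetical measurements from an on-premises data center in Europe.
measured = {"West Europe": 18.0, "North Europe": 25.0, "East US": 95.0}
print(pick_replica_region(measured, allowed={"North Europe", "East US"}))  # North Europe
```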
Availability Groups
(Diagram: sync log synchronization and async log synchronization)
Availability Groups
• Low TCO
• VM and storage
• Free ingress traffic
• Case studies
• Lufthansa, Thomson Reuters, Buffalo Hospital Supply
• SQL Server 2014: “Add Replica Wizard” supports Windows Azure
• End-to-end: from provisioning the VM to starting log synchronization
• Validates environment
• Handles failures
• Does cleanup
Demo
Screenshots
Availability Groups – Add Azure Replica Wizard

Availability Groups & Failover Cluster Instances
Enhanced Diagnostics
• 24 enhancements to the Dashboard, error messages, DMVs, and XEvents
• Simplify troubleshooting and prevent issues
• Based on feedback from customers and CSS
Availability Groups & Failover Cluster Instances
Enhanced Diagnostics
Title | Component
Show timestamps in XEL output in UTC (not adjusted to client computer) | SSMS XEvents Viewer
Warning about log synchronization behavior when primary replica is async | Dashboard
System function IsPrimaryReplica(database_name) | System function
Add AG name (and replica name and DB name if relevant) to many more XEvents to allow better data correlation between the logs | XEvents
Report major HADRON Manager transitions to AlwaysOn XEvent session | XEvents
Add replica name context to connection established error log entry | Error Log
Dump relevant output from sys.dm_hadr_database_replica_states to SQL error log when replicas change to resolving state | XEvents
Add new error message to detect AG startup failure when quorum is forced | Error Log
Separate error msg 41142 (replica can't become primary), raised for two importantly different reasons | Error Log
AlwaysOn functions/DMVs should also support FCIs where applicable | DMVs
Improve the CREATE AG error message “AG already exists”, to say “It’s possible that a previous DROP AG operation, executed during cluster quorum loss, didn’t delete the AG from the cluster. If so, please retry the DROP operation” | Error Message
Remove FCI setup dependency on cluster.exe (deprecated); use PowerShell | Error Log
Support for Windows Cluster Shared Volumes (Windows Server 2012 & 2012 R2)
• Cluster Shared Volume (CSV)
• Shared disk accessible to all nodes (over SMB)
• One or more per physical drive
• Failover Cluster Instances on CSV
• Improves SAN utilization
Removes the limitation of 24 drives
• Increases I/O resiliency
Reads/writes are retried via other nodes
• Increases failover resiliency
Disks don’t need to be unmounted/remounted
Support for Windows Cluster Shared Volumes
AlwaysOn and Windows Server
Windows Cluster Enhancements
• Windows Server 2012
• Dynamic Quorum
Removes votes from unavailable nodes
Enables “last man standing”
• Increased network resiliency
Handles more exceptions
Avoids node evictions
• Windows Server 2012 R2
• Network names without Active Directory
Avoids listener issues: permissions, collisions
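Why dynamic quorum helps with one-at-a-time failures can be seen with a small vote count. This is a simplified model, not the Windows algorithm: one vote per node, no witness, failures spaced far enough apart for the cluster to recalculate votes in between, and it leaves out how Windows handles the final two-node step. `node_failures_survived` is a hypothetical helper.

```python
def node_failures_survived(node_count: int, dynamic_quorum: bool) -> int:
    """Count sequential node failures a cluster rides through before losing quorum.

    Simplified model: one vote per node, no witness, and quorum means the
    surviving votes are a strict majority of the votes currently counted.
    """
    counted_votes = node_count
    up = node_count
    survived = 0
    while up > 1:
        up -= 1                      # the next node fails
        if up * 2 <= counted_votes:  # no strict majority left: cluster stops
            break
        survived += 1
        if dynamic_quorum:
            counted_votes = up       # dynamic quorum drops the failed node's vote
    return survived


for dynamic in (False, True):
    print(dynamic, node_failures_survived(5, dynamic))
# False 2  (static quorum: the third failure leaves 2 of 5 nodes and stops the cluster)
# True 3   (dynamic quorum: keeps running on 2 of the original 5 nodes)
```

The gap widens with cluster size: static quorum needs a majority of all configured nodes, while dynamic quorum only needs a majority of the votes still counted.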
Summary:
• The second version of AlwaysOn adds:
• Stability
• Features
• Diagnostics
• Recommended with Windows Server 2012 R2
• Integrates well with Azure

assaff@microsoft.com
