Install a Server 2012 Failover Cluster (with SQL Server)

Last week my colleagues and I tried to install a failover cluster on Server 2012. It wasn’t the first time, but this time we did it for an upcoming production SQL Server Installation. Here are some hint’s you should consider and steps you must go trough when setting up a cluster.

Naming Convention for this Article

Nodes: SRV01c1, SRV01c2
Cluster Name: SRV01Win
SQL Cluster Name: SRV01

Preparation Checklist

Computer Accounts for the Cluster, on…

…Server 2008:

  • DO NOT PREPARE accounts for the cluster
  • just move them to the right OU afterwards

…Server 2012

  • Prepare AD Accounts like SRV01, SRV01win, SRV01dtc, SRV01c1, SRV01c2
  • disable the prepared accounts in AD
  • logged on user must have full rights on accounts to join domain

Additional Steps

  • get IP addresses, one for each hostname

Installation Checklist

  • Install Windows 2012 on SRV01C1 and SRV01c2
  • connect DTC & Quorum disks to c1 & c2
  • Validate Cluster, check things and details like Network Adapter bindings and order
  • create Cluster SRV01win using Failover Management GUI
  • set Quorum to Node & Disk majority (make sure you’re using the right disk for that)
  • if necessary, install DTC (you only need that for active/active clustering)
  • check Eventlog for errors and solve them

ISSUE: DNS FQDN missing / could not register

After setting up the Windows Failover Cluster using the Failover Clustering Feature GUI, the Installation ended successfully but we weren’t able to ping the Host SERVER-WIN. Everythin seemed to be okay, but Eventlog listed this errors:

*** ERROR1: Eventid 1228 ***

Cluster network name resource ‘Cluster Name’ encountered an error enabling the network name on this node. The reason for the failure was: ‘Unable to obtain a logon token’. The error code was ‘1326’.

You may take the network name resource offline and online again to retry.

*** ERROR2: Eventid 1196 ***

Cluster network name resource ‘Cluster Name’ failed registration of one or more associated DNS name(s) for the following reason: The handle is invalid. .

Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server.

*** ERROR3: Eventid 1205 ***

The Cluster service failed to bring clustered service or application ‘SQL Server (MSSQLSERVER)’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.

After troubleshooting a while and asking Google various questions, we tried to solved that using ipconfig /registerdns on the active node. This didn’t help, but some minutes later we found that there’s a quick and easy way to let Windows re-try the IP registration by just telling the Cluster GUI “repair” on the Host’s name:

2-25-2013 9-33-49 AM

Set up SQL Server

  • start SQL Setup from installation media / ISO file
  • do not install SQL as usual, choose the cluster installation option
  • Provide setup neccessary information like Clustername “SRV01”, IP and the Features you want to install
  • if everything is successful, repeat the installation on node 2; setup will now detect the installed features and repeat the installation the same way as on node 1

Setup Test Cluster using VM’s

If you’re planning to setup a Cluster, you maybe like to test everything first. There’s a VMware setup guide on Windows Clustering using VM’s in vSphere [1] and their limitations [2]. If you decide to prepare a test environment with two VM’s, consider the following advantages and disadvantages.

Advantage

  • no additional SAN Storage / LUN needed
  • no extra SCSI connection must be built up
  • works the same way like real

Disadvantage

  • VM’s using physical SCSI are not able to be vMotion’ed to another Host

Sources:

[1] Microsoft Cluster Service (MSCS) support on ESXi/ESX

[2] Microsoft Clustering on VMware vSphere: Guidelines for Supported Configurations

Advertisements

Cluster Shared Volumes (CSV)

Server 2008 R2 / 2012 came with a new failover Cluster feature called Cluster Shared Volumes (abbrev: CSV). This is a new feature that enables accessing a LUN from multiple a Windows Failover Cluster Nodes at the same time.  In past, this was not possible on Windows Failover Clusters.

Lets take a look on to the details.

Advantages

  • all nodes in a cluster can access the LUN at the same time, no failover needed
  • if a node’s storage connection fails / has issues, the node can send the write/read requests over LAN to another Node to write/read the needed stuff for him (“The cluster will re-route the communication through an intact part of the SAN or network”, Technet [1] )

Disadvantages

  • From Technet: “Be sure to review carefully what your backup application backs up. Also, for management operating-system based backup, ask your backup application vendor about the compatibility of your backup application with Hyper-V and with Cluster Shared Volumes.” [3]
  • a MUST for Hyper-V, but no advantage for Applications that don’t run more than one instance on the same volume at the same time (e.g. SQL failover cluster, one instance, two servers)
  • NOT SUPPORTED for SQL Server clustered Workloads [4]

Manuals

add storage to Clustered Shared Volumes in Windows Server 2012

Sources

[1] Understanding Cluster Shared Volumes in a Failover Cluster
[2] Recommendations for Using Cluster Shared Volumes in a Failover Cluster
[3] Backing Up Cluster Shared Volumes

[4] Use Cluster Shared Volumes in a Windows Server 2012 Failover Cluster

.

Some Notes to SQL and Performance

Some months ago, I tried to start a own Wiki site to maintain Know-How. Unfortunately I stopped filling it with updates as soon as I started this Blog 🙂

To not loose the small but interesting Knowledge, I want just put it in here:

RAID Layouts

Dell uses only a 2-Disk RAID1 LUN for Database Logfiles, while using a 10-Disk RAID10 for Database.

Performance Tests

There are three basic parameters to alter when running the I/O tests: block size, read/write, and serial/random. For SQL Server, the minimum tests to run are 8-KB and 64-KB block sizes, both serial and random, and both read and write. Most of the ratings you find on the Web are stated in throughput of bytes/sec for the 64-KB serial read test (which gives the highest throughput rating). A value of over 250 MB/sec per core for today’s computers is typically sufficient.

external Links

Scaling Up Your Data Warehouse with SQL Server 2008
Performance of SQL Server 2005 on ESX Server 3.5 

Benchmark Tools

SQLIO Disk Subsystem Benchmark Tool www.microsoft.com
IOmeter Project www.iometer.org

IBM VMware Veranstaltung

Datenbank konfig von Emmener Firma empfohlene Diskkonfig:

dddd|dddd|dddd|...
 R5   R5   R10
Sys  DBF  LOG

Legende:
d=Disk, R5=Raid5, R10=Raid10
Sys=System Disk, DBF=Data Files Disk, LOG=Log Disk

Auch mehrere Systeme auf 1 Host möglich wenn:

  • genögend RAM vorhanden
  • keine CPU’s mehrfach benutzt werden müssen (je Server 2 Stück)

Hyper-V hat ab mehr als 1 Gast OS einbussen von 40% Leistung je Gast, VMware büsst nur 20% ein; sagt IBM.

SAP on VMware Training: AD310 (SAP Kurs); siehe auch SAP Videos on Youtube.

Performance_Tests

  • SAPS Werte z.B. 46’000 bei 8×4 Core
  • SAP SD Standard Application Benchmark
  • IBM Rational Performance tester

www.ibm.com/redbooks

Transactional Workload für DB’s, ex. 50GB

Online Transaction Processing (OLTP)

  • kleine SQL Abfragen auf welche innert einer Sekunde reagiert werden kann
  • viele gleichzeitige Sessions
  • Beispiele: ATM, sale terminal, Tel. sales etc.

Bei solchen DB’s kann erwartet werden

  • Reaktion auf Query innert 2sek
  • Diskplatz muss mit 3x DB Grösse gerechnet werden (index, temp)

Enterprise Ressource Plannung (ERP)

  • komplexe DB mit tausenden Tabellen
  • somit komplexe DB Abfragen im Alltag
  • grosse Anzahl Transaktionen gemischt aus OLTP + OLAP

Hier kann erwartet werden:

  • fünffacher Diskplatz wie DB grösse
  • Antwortzeiten kleiner 4sek

Data Warehouse (DWH)

  • braucht sechs weitere DB’s, somit kann der Plattenplatz mit zehn multipliziert und verdreifacht werden (x*3*10)

Zu erwartende Fakten

  • Antwortzeiten 1h – 1 Woche

Online Analytical Processing (OLAP)

  • multidimensionale oder aggregated views der DAten, multidimensional = cube
  • meist für analytische Fragen wie “welche Region hat die meisten Verkäufe”
  • typischerweise arbeiten auf einer solchen DB wenige Benutzer
  • viele komplexe SQL Statements

Somit kann erwartet werden:

  • fünffache DB Grösse als Diskspace
  • 20-50 Users
  • Antwortzeiten <20sek
  • grosse Abfragen max 15min

Reporting

  • ähnliche wie OLTP, enthält aber die ausgewerteten Daten der OLTP-DB
  • Beispiel: eine DB enthält Werte von vielen Sensoren, mit Reporting werden die Auswertungen gefahren wie Analyse des Gesamtstatus, Probleme welche Reaktion erfordern oder Spot’s und Peaks in den Daten aufzeigen

Hier ist zu beachten:

  • dreifache DB Grösse = Diskplatz
  • einer bis 40 User
  • Reaktionszeiten je nach Natur des Reports, bis zu acht Stunden und mehr