This month’s T-SQL Tuesday topic by Wayne Sheffield (t) is “Tell me about your Brick Wall and how you overcame it.” I hadn’t originally planned this post for this reason, but it’s right on topic.
I recently discovered an issue while adding an Integration Services Catalog to a SQL Server 2016 Availability Group. SQL Server 2016 introduced support for having a SSIS Catalog (SSISDB) in an Always On Availability Group. This provides a high-availability and disaster-recovery solution for the packages, projects, executions logs, and environments found within this database. After a failover occurs, and your packages are configured to use the listener, processes can continue (in the case a failover occurs and a package is running, they do NOT restart or resume). See here for more details. Great!
In most cases this works just as specified. However, what if you have the following scenario?
Instance | Availability Group | Includes SSISDB? | Availability Replicas |
---|---|---|---|
SQLSRV01 | AG1 | No | SQLSRV01, SQLSRV02, SQLSRV03, SQLSRV04 |
SQLSRV01 | AG2 | Yes | SQLSRV01, SQLSRV02 |
SQLSRV03 | AG3 | Yes | SQLSRV03, SQLSRV04 |
In the above example, we follow the instructions and add the SSIS Catalog to the AG2 availability group. Once added, we complete Step 3 (Enable SSIS support for Always On) http://bit.ly/2OmQMvn. Everything works as expected, and Auto-failovers can now occur. Wait. What does this step do any way?
- Determines the secondary replica server names. Notice, this is cluster wide, not specific to any one AG. More on this later.
1 2 3 4 5 6 |
SELECT dharc.[replica_server_name] FROM [sys].[dm_hadr_availability_replica_cluster_states] dharc INNER JOIN [sys].[dm_hadr_availability_replica_states] dhars ON dharc.[replica_id] = dhars.[replica_id] WHERE dharc.[group_id] = (SELECT group_id FROM [sys].[availability_databases_cluster] WHERE database_name = 'SSISDB') AND dhars.[role] = 2 |
- Creates the dbo.sp_ssis_startup stored procedure, within the master database, on each replica.
- Set the dbo.sp_ssis_startup job to run whenever SQL Server starts.
- Sets ‘clr enabled’ to 1 on all secondary replicas.
- Adds replica information to SSISDB.internal.alwayson_support_state.
- Creates two logins on all secondary replicas.
- ##MS_SQLEnableSystemAssemblyLoadingUser##
- Login is created using the MS_SQLEnableSystemAssemblyLoadingKey asymmetric key. This key created from the Microsoft.SqlServer.IntegrationServices.Server.dll file.
- Login is granted the UNSAFE ASSEMBLY permission.
- ##MS_SSISServerCleanupJobLogin##
- ##MS_SQLEnableSystemAssemblyLoadingUser##
- Creates two SQL Server Agent jobs on all replicas hosting the SSIS Catalog.
- SSIS Failover Monitor Job – This job runs every 2 minutes and executes the master.dbo.sp_ssis_start procedure if it detects the host replica is now the primary.
- SSIS Server Maintenance Job – Runs every day. Job removes operation records from the SSISDB database that are outside the retention window and maintains a maximum number of versions per project.
The AG1 Availability Group is pictured below.
Now that we have an understanding of the process happening when we click “Enable Always On Support” let’s try creating a new availability group between SQLSRV03 and SQLSRV04. We’ll attempt to add the SSISDB database and enable Always On support. We’ll name this availability group AG3.
We are prompted with a reminder “to enable Always On support under Integration Services Catalogs after adding SSISDB into Always On availability group.” Click ok, next, and then finish. AG3 has now been created.
Next step, enable Always On support for the Integration Services catalog. This is done on the primary replica (currently SQLSRV03).
This results in the following error and we’re unable to proceed further.
Problem
One of the first steps the wizard does is to run the following query:
1 2 3 4 5 6 |
SELECT dharc.[replica_server_name] FROM [sys].[dm_hadr_availability_replica_cluster_states] dharc INNER JOIN [sys].[dm_hadr_availability_replica_states] dhars ON dharc.[replica_id] = dhars.[replica_id] WHERE dharc.[group_id] = (SELECT group_id FROM [sys].[availability_databases_cluster] WHERE database_name = 'SSISDB') AND dhars.[role] = 2 |
The query above results in the subquery error if the following is true:
- Multiple Availability Groups exist within the Windows Server Failover Cluster. Two of which have a SSISDB database included (although each have separate primary and secondary replicas).
-
123SELECT group_id, group_database_id, database_nameFROM sys.availability_databases_clusterWHERE database_name = 'SSISDB'
- The sys.availability_databases_cluster view contains one row for each availability database on the instance of SQL Server that is hosting an availability replica for any Always On availability group in the Windows Server Failover Clustering cluster, regardless of whether the local copy database has been joined to the availability group yet(http://bit.ly/2OswMHI). The query will now return two rows resulting in the error: “Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <=, >, >= or when the subquery is used as an express. (Microsoft SQL Server, Error:512).”
-
By the definition of this query, you can only have one Availability Group, within a WSFC, hosting an Integrations Services Catalog. This is unfortunate because there are scenarios in which you need high availability for multiple catalogs and different replicas.
To support this scenario the query would need to:
- Join to the sys.availability_replicas view and filter on the replica_server_name column (ar.replica_server_name = @@servername).
- It only needs to identify secondary replicas, within the availability groups the local server is a member of, that includes the SSISDB database (instead of cluster wide).
1 2 3 4 5 6 |
SELECT dharc.[replica_server_name] FROM [sys].[dm_hadr_availability_replica_cluster_states] dharc INNER JOIN [sys].[dm_hadr_availability_replica_states] dhars ON dharc.[replica_id] = dhars.[replica_id] WHERE dharc.[group_id] = (SELECT adc.group_id FROM [sys].[availability_databases_cluster] adc INNER JOIN sys.availability_replicas AS ar ON adc.group_id = ar.group_id WHERE adc.database_name = 'SSISDB' and ar.replica_server_name = @@servername ) AND dhars.[role] = 2 |
Workaround
We’ve identified the issue, how do we work around it? The following steps can be taken to enable Always On support without using the built in wizard.
Primary Replica
The following scripts need to run on the primary replica.
- Add replica information and state to [internal].[alwayson_support_state].
-
1234567891011USE SSISDB--Add replica info. Run on the primary replicaEXEC [internal].[add_replica_info]@server_name = 'SQLSRV03', --primary@state = 1--secondariesEXEC [internal].[add_replica_info]@server_name = 'SQLSRV04', --secondary@state = 2
-
- Modify permissions for the ##MS_SSISServerCleanupJobUser## user.
-
12345--Setup permissions for the ##MS_SSISServerCleanupJobUser## user.USE SSISDBALTER ROLE ssis_failover_monitoring_agent ADD MEMBER ##MS_SSISServerCleanupJobUser##GRANT EXECUTE ON internal.refresh_replica_status TO ##MS_SSISServerCleanupJobUser##GRANT EXECUTE ON internal.update_replica_info TO ##MS_SSISServerCleanupJobUser##
-
- Create the SSIS Failover Monitor job.
-
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253--Create the SSIS Failover Monitor JobUSE msdbGOEXEC dbo.sp_add_job@job_name = N'SSIS Failover Monitor Job',@enabled = 1,@owner_login_name = '##MS_SSISServerCleanupJobLogin##',@description = N'Runs every 2 minutes. This job execute master.dbo.sp_ssis_startup if detect AlwaysOn failover on SSISDB.'DECLARE @IS_server_name NVARCHAR(30)SELECT @IS_server_name = CONVERT(NVARCHAR, SERVERPROPERTY('ServerName'))DECLARE @commandScript NVARCHAR(max)SELECT @commandScript = N'DECLARE @role intDECLARE @status tinyintSET @role = (SELECT [role] FROM [sys].[dm_hadr_availability_replica_states] hars INNER JOIN [sys].[availability_databases_cluster] adc ON hars.[group_id] = adc.[group_id] WHERE hars.[is_local] = 1 AND adc.[database_name] =''SSISDB'')IF @role = 1BEGINEXEC [SSISDB].[internal].[refresh_replica_status] @server_name = N''' + @IS_server_name + ''', @status = @status OUTPUTIF @status = 1EXEC [SSISDB].[catalog].[startup]END'EXEC sp_add_jobserver @job_name = N'SSIS Failover Monitor Job',@server_name = @IS_server_nameEXEC sp_add_jobstep@job_name = N'SSIS Failover Monitor Job',@step_name = N'AlwaysOn Failover Monitor',@subsystem = N'TSQL',@command = @commandScript,@database_name = N'msdb',@on_success_action = 1,@retry_attempts = 3,@retry_interval = 3;--scheduleEXEC sp_add_jobschedule@job_name = N'SSIS Failover Monitor Job',@name = 'Monitor Scheduler',@enabled = 1,@freq_type = 4, /*daily*/@freq_interval = 1,/*every day*/@freq_subday_type = 4, /*minutes*/@freq_subday_interval=2, /*every 2 minutes*/@freq_relative_interval=0, /*further defines the frequency_interval when frequency_type is set to 32 */@freq_recurrence_factor=0, /*number of weeks or months between the scheduled execution of the job*/@active_start_date = 20001231,@active_end_date = 99991231,@active_start_time = 0,@active_end_time = 235959
-
Secondary Replicas
Next, run the following scripts on all secondaries within the Availability Group. SSISDB should have already been added to the Availability Group. If any additional replicas are added at a later date, these would need to be applied to them as well.
- Enable CLR.
-
123--enable CLREXEC sys.sp_configure N'clr enabled', N'1'RECONFIGURE
-
- Create the dbo.sp_ssis_startup stored procedure, within the master database. Set this stored procedure to run whenever SQL Server starts.
-
123456789101112131415161718192021222324USE MasterGO--CREATE PROCEDURE [dbo].[sp_ssis_startup]ASSET NOCOUNT ON/* Currently, the IS Store name is 'SSISDB' */IF DB_ID('SSISDB') IS NULLRETURNIF NOT EXISTS(SELECT name FROM [SSISDB].sys.procedures WHERE name=N'startup')RETURN/*Invoke the procedure in SSISDB *//* Use dynamic sql to handle AlwaysOn non-readable mode*/DECLARE @script nvarchar(500)SET @script = N'EXEC [SSISDB].[catalog].[startup]'EXECUTE sp_executesql @scriptGO/* Run sp_ssis_startup when Sql Server restarts */EXEC sp_procoption N'sp_ssis_startup','startup','on'GO
-
- Create the MS_SQLEnableSystemAssemblyLoadingKey asymmetric key and the ##MS_SQLEnableSystemAssemblyLoadingUser## login.
-
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748USE masterGODECLARE @productVersion NVARCHAR(128)DECLARE @majorVersion intDECLARE @minorVersion intDECLARE @buildVersion intDECLARE @revisionVersion intSET @productVersion = CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(128))SET @majorVersion = CONVERT(int, PARSENAME(@productVersion, 4))SET @minorVersion = CONVERT(int, PARSENAME(@productVersion, 3))SET @buildVersion = CONVERT(int, PARSENAME(@productVersion, 2))SET @revisionVersion = CONVERT(int, PARSENAME(@productVersion, 1))/* 14.0.800.11 is an intermediate version between RC0 and RC1. This feature should be enabled in RC1 *//* So all versions before SQL 14 RC1 will use asymmetric key and login *//* So all versions from SQL 14 RC1 will use trusted assembly */IF NOT(@majorVersion > 14 OR (@majorVersion = 14 AND (@minorVersion > 0 OR (@minorVersion = 0 AND (@buildVersion > 800 OR (@buildVersion = 800 AND @revisionVersion > 11))))))BEGINIF EXISTS(SELECT [name]FROM sys.server_principalswhere name='##MS_SQLEnableSystemAssemblyLoadingUser##')DROP LOGIN ##MS_SQLEnableSystemAssemblyLoadingUser##IF EXISTS(SELECT *FROM sys.asymmetric_keysWHERE name='MS_SQLEnableSystemAssemblyLoadingKey')DROP ASYMMETRIC KEY MS_SQLEnableSystemAssemblyLoadingKeyCREATE ASYMMETRIC KEY MS_SQLEnableSystemAssemblyLoadingKey FROM EXECUTABLE FILE = 'C:\Program Files\Microsoft SQL Server\130\DTS\Binn\Microsoft.SqlServer.IntegrationServices.Server.dll'CREATE LOGIN ##MS_SQLEnableSystemAssemblyLoadingUser## FROM ASYMMETRIC KEY MS_SQLEnableSystemAssemblyLoadingKeyGRANT UNSAFE ASSEMBLY TO ##MS_SQLEnableSystemAssemblyLoadingUser##ENDELSEBEGINDECLARE @asm_bin VARBINARY(max);DECLARE @isServerHashCode VARBINARY(64)SELECT @asm_bin = BulkColumnFROM OPENROWSET (BULK 'C:\Program Files\Microsoft SQL Server\130\DTS\Binn\Microsoft.SqlServer.IntegrationServices.Server.dll',SINGLE_BLOB) AS dllSELECT @isServerHashCode=HASHBYTES('SHA2_512', @asm_bin)IF NOT EXISTS(SELECT *FROM sys.trusted_assembliesWHERE hash=@isServerHashCode)EXEC sys.sp_add_trusted_assembly @isServerHashCode, N'C:\Program Files\Microsoft SQL Server\130\DTS\Binn\Microsoft.SqlServer.IntegrationServices.Server.dll'END
-
- Create the ##MS_SSISServerCleanupJobLogin## login. First, we’ll need to grab the SID of the login on the primary. Second, we’ll create the login, using the SID from the first step, on all secondary replicas.
-
12345--Execute on primary replica.USE MasterSELECT sidFROM sys.server_principalsWHERE name = '##MS_SSISServerCleanupJobLogin##'
-
123456789101112USE masterGODECLARE @loginPassword nvarchar(256)SELECT @loginPassword = REPLACE (CONVERT( nvarchar(256), CRYPT_GEN_RANDOM( 64 )), N'''', N'''''')EXEC ('CREATE LOGIN ##MS_SSISServerCleanupJobLogin## WITH PASSWORD =''' + @loginPassword + ''', sid= 0x9B7718EA99283649983F6BEB2A2C96B2, CHECK_POLICY = OFF')--disableALTER LOGIN ##MS_SSISServerCleanupJobLogin## DISABLE--grant view server stateGRANT VIEW SERVER STATE TO ##MS_SSISServerCleanupJobLogin##
-
- Create the SSIS Server Maintenance job.
-
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253USE msdbGOEXEC dbo.sp_add_job@job_name = N'SSIS Server Maintenance Job',@enabled = 1,@owner_login_name = '##MS_SSISServerCleanupJobLogin##',@description = N'Runs every day. The job removes operation records from the database that are outside the retention window and maintains a maximum number of versions per project.'DECLARE @IS_server_name NVARCHAR(30)SELECT @IS_server_name = CONVERT(NVARCHAR, SERVERPROPERTY('ServerName'))EXEC sp_add_jobserver @job_name = N'SSIS Server Maintenance Job',@server_name = @IS_server_nameEXEC sp_add_jobstep@job_name = N'SSIS Server Maintenance Job',@step_name = N'SSIS Server Operation Records Maintenance',@subsystem = N'TSQL',@command = N'DECLARE @role intSET @role = (SELECT [role] FROM [sys].[dm_hadr_availability_replica_states] hars INNER JOIN [sys].[availability_databases_cluster] adc ON hars.[group_id] = adc.[group_id] WHERE hars.[is_local] = 1 AND adc.[database_name] =''SSISDB'')IF DB_ID(''SSISDB'') IS NOT NULL AND (@role IS NULL OR @role = 1)EXEC [SSISDB].[internal].[cleanup_server_retention_window]',@database_name = N'msdb',@on_success_action = 3,@retry_attempts = 3,@retry_interval = 3;EXEC sp_add_jobstep@job_name = N'SSIS Server Maintenance Job',@step_name = N'SSIS Server Max Version Per Project Maintenance',@subsystem = N'TSQL',@command = N'DECLARE @role intSET @role = (SELECT [role] FROM [sys].[dm_hadr_availability_replica_states] hars INNER JOIN [sys].[availability_databases_cluster] adc ON hars.[group_id] = adc.[group_id] WHERE hars.[is_local] = 1 AND adc.[database_name] =''SSISDB'')IF DB_ID(''SSISDB'') IS NOT NULL AND (@role IS NULL OR @role = 1)EXEC [SSISDB].[internal].[cleanup_server_project_version]',@database_name = N'msdb',@retry_attempts = 3,@retry_interval = 3;EXEC sp_add_jobschedule@job_name = N'SSIS Server Maintenance Job',@name = 'SSISDB Scheduler',@enabled = 1,@freq_type = 4, /*daily*/@freq_interval = 1,/*every day*/@freq_subday_type = 0x1,@active_start_date = 20001231,@active_end_date = 99991231,@active_start_time = 0,@active_end_time = 120000
-
- Create the SSIS Failover Monitor Job.
-
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253--Create the SSIS Failover Monitor JobUSE msdbGOEXEC dbo.sp_add_job@job_name = N'SSIS Failover Monitor Job',@enabled = 1,@owner_login_name = '##MS_SSISServerCleanupJobLogin##',@description = N'Runs every 2 minutes. This job execute master.dbo.sp_ssis_startup if detect AlwaysOn failover on SSISDB.'DECLARE @IS_server_name NVARCHAR(30)SELECT @IS_server_name = CONVERT(NVARCHAR, SERVERPROPERTY('ServerName'))DECLARE @commandScript NVARCHAR(max)SELECT @commandScript = N'DECLARE @role intDECLARE @status tinyintSET @role = (SELECT [role] FROM [sys].[dm_hadr_availability_replica_states] hars INNER JOIN [sys].[availability_databases_cluster] adc ON hars.[group_id] = adc.[group_id] WHERE hars.[is_local] = 1 AND adc.[database_name] =''SSISDB'')IF @role = 1BEGINEXEC [SSISDB].[internal].[refresh_replica_status] @server_name = N''' + @IS_server_name + ''', @status = @status OUTPUTIF @status = 1EXEC [SSISDB].[catalog].[startup]END'EXEC sp_add_jobserver @job_name = N'SSIS Failover Monitor Job',@server_name = @IS_server_nameEXEC sp_add_jobstep@job_name = N'SSIS Failover Monitor Job',@step_name = N'AlwaysOn Failover Monitor',@subsystem = N'TSQL',@command = @commandScript,@database_name = N'msdb',@on_success_action = 1,@retry_attempts = 3,@retry_interval = 3;--scheduleEXEC sp_add_jobschedule@job_name = N'SSIS Failover Monitor Job',@name = 'Monitor Scheduler',@enabled = 1,@freq_type = 4, /*daily*/@freq_interval = 1,/*every day*/@freq_subday_type = 4, /*minutes*/@freq_subday_interval=2, /*every 2 minutes*/@freq_relative_interval=0, /*further defines the frequency_interval when frequency_type is set to 32 */@freq_recurrence_factor=0, /*number of weeks or months between the scheduled execution of the job*/@active_start_date = 20001231,@active_end_date = 99991231,@active_start_time = 0,@active_end_time = 235959
-
At this point, review the two jobs created above and ensure they run successfully. I also deploy a test SSIS project to each replica for verification (deploy project, failover, and redeploy again using the AG listener).
Thanks for reading!