Tuesday, January 30, 2007

SQL Server 2005 CLR Integration

SQL Server 2005 CLR Integration

This article describes the new CLR integration features of SQL Server 2005. CLR-based programming against existing programming models supported in SQL Server, such as Transact-SQL and extended stored procedures; highlights the advantages and disadvantages of each technique.

Contents:-
-Introduction
-Benefits of CLR Integration
-Creating Managed Objects in SQL Server 2005
-Using managed Code
-SQL CLR Alternatives
-XPs Vs SQL CLR
-Middle Tier-Business Logic Vs CLR
-T-SQL VS CLR
- Conclusion

Introduction
SQL Server supports hosting of .Net Language Runtime inside SQL Server Process. This enables database programmers to write business logic as database stored procedures, functions, triggers, data types and aggregates. The ability to run application code inside the database adds flexibility to the design of application architectures that require business logic to execute close to the data and cannot afford the cost of shipping data to a middle tier process to perform computations outside the database.
Database programmers can now take full advantage of the Microsoft .NET Framework class library and modern programming languages to implement functionality within the server.
Using CLR Integration developers can write procedures, triggers, and functions in any of the CLR languages, particularly Microsoft Visual C# .NET, Microsoft Visual Basic .NET, and Microsoft Visual C++. It also allows creation of new types. Managed code is used to do the task that is not possible with T-SQL. Prior to SQL Server 2005 XPs or middle tier were used to do task that was not possible with T-SQL.
SQL Server and the CLR are 2 different runtimes with different model of threading, scheduling and memory management. SQL Server supports non-preemptive threading model in which the DBMS threads voluntarily yield execution periodically or when they are waiting on locks or I/O whereas CLR supports preemptive threading model. If the user code in DBMS can directly call operating system primitives, then it does not work well with SQL Server task scheduler and can degrade the scalability of SQL Server. CLR cannot differentiate between virtual and physical memory. SQL Server directly manages physical memory and is required to use this memory within a configurable limit.
The different model for threading, scheduling, and memory management presents an integration challenge for a DBMS that scales to support concurrent users. SQL Server solves this challenge by becoming the operating system for the CLR when it runs inside the SQL Server process. The CLR calls the low-level APIs implemented by SQL Server for threading, scheduling, and memory management. It uses low level APIs for running user code and for its own internal use such as garbage collector. In order to synchronize between multiple threads, the CLR calls SQL Server synchronization object. This allows SQL Server scheduler to schedule other tasks when a thread is waiting on synchronization object. For instance, when CLR initiates a garbage collection, all of its thread waits for garbage collection to finish. Since CLR threads and the synchronization object they are waiting on are known to the SQL Server scheduler, it can schedule other thread that are running other database tasks not involving CLR. The SQL Server has the ability to detect and stop threads that have not yielded for long time The ability to hook CLR threads to SQL Server threads implies that the SQL Server scheduler can identify runaway threads in the CLR and manage their priority so that they do not consume significant CPU resources, thereby affecting the throughput of the system. Such runaway threads are suspended and put back in the queue.
The common memory management between SQL Server and CLR makes CLR call SQL Server primitives for allocating and de-allocating memory. Since the memory used by the CLR accounted for in the total memory usage of the system, SQL Server can stay within its configured memory limits and ensures that SQL Server and CLR are not competing with each other for memory. Also SQL Server can reject CLR memory request when the system is constrained and asks CLR to reduce its memory use when other tasks need memory.

Benefits of CLR Integration

Better programming model:
Programming languages that are compatible with the .NET Framework are richer than Transact-SQL, offering constructs and capabilities that were previously not available to SQL Server developers.
Managed code is suitable for procedural, complex numerical computations.
CLR languages offer object-oriented capabilities such as encapsulation, inheritance, and polymorphism. A large amount of code can be better organized into classes and namespaces. Net Framework also provides a huge class library for regular expressions, string handling, numerical processing, date functions, data encryption etc.

Ensures Type Safety and Security:
Managed code runs in a CLR environment, hosted by the database engine.
Before managed code is executed, the CLR performs several checks, through a process known as verification, to ensure that the code is safe to run. This allows .NET Framework database objects to be safer and more secure than the extended stored procedures available in earlier versions of SQL Server.

Allows new User-Defined Types and Aggregates:
SQL Server 2005 allows creation of database objects UDTs and Aggregates using any CLR language. These objects expand the storage and querying capabilities of SQL Server.

Provides Common development environment:
Database development is integrated into the Microsoft Visual Studio 2005 development environment. The same tool is also used for developing & debugging database objects.

Performance and scalability:
Because managed code compiles to native code prior to execution, a significant performance increase can be achieved.

Creating Managed objects within SQL Server 2005 requires three steps:

  1. Write the managed code & compile it into an assembly.

  2. Install the managed code’s assemblies into SQL Server 2005.

  3. Use DDL statement to tie the managed code to named objects (Stored Procedures, functions etc.)


    For managed code that is supported, SQL Server has introduced a new namespace
    System.SqlServer.Server
    Classes belonging to this namespace are:-
    1) SQLContext - Like an HTTPContext, SQlContext is associated with client’s request.
    2) SQLPipe - Can send results to calling environment.
    3) SQLTriggerContext - It is used with Managed Triggers.

    For each type of managed code that is supported, there are related attributes that are used to decorate the code to help SQL Server discover about specific behaviors about the managed code. These attributes include:

    -Sqlprocedure
    -SqlFunctions
    - SqluserDefinedAggregates,
    -SqlUserDefinedType
    -SqlTrigger

    Creating Managed Objects in SQL Server 2005
    The above tasks can be performed in 2 ways as below
    A) Manual deployment
    B) Using visual Studio 2005

    A) Manual Deployment :-
    1) The developer writes a managed program as a set of class definitions. SQL Server routines—stored procedures, functions, or triggers—are written as static (or Shared in Microsoft Visual Basic .NET) methods of a class. User-defined types and aggregates are written as entire classes. The code is compiled into an assembly.
    2) The assembly is uploaded into a SQL Server database, where it is stored in the system catalogs using the CREATE ASSEMBLY data definition language (DDL) statement.
    3) DDL statements are used to tie managed code to named objects in the database. This is accomplished using the CREATE PROCEDURE/FUNCTION/TRIGGER/TYPE/AGGREGATE statements.

    B) Using Visual Studio 2005 :-
    1) Visual Studio 2005 supports development, deployment, and debugging of managed code in SQL Server 2005. A new SQL Server Project provides code templates that make it easy for developers to get started writing code for CLR-based database routines, types and aggregates. This project also allows developers to add references to other assemblies in the database.
    2) When a SQL Server Project is built, it is compiled into an assembly. Deploying the project uploads the assembly binary into the SQL Server database that is associated with the project. The deploy operation also automatically creates the routines, types, and aggregates defined in the assembly in the database based on the custom attributes (SqlProcedure, SqlFunction, SqlTrigger, etc.) in the code. Deploying also uploads the source code and debugging symbols (the .pdb file) associated with the assembly.

    Using managed Code :-
    After routines are created, they can be used like T-SQL routines by applications. For example, CLR functions can be called from T-SQL queries and CLR procedures can be called from a client application writing ADO.Net Code.


    SQL CLR Alternatives :-
    1) Extended Stored procedures-XPs.
    2) Middle Tier-Business Logic

    1) XPs Vs SQL CLR
    Extended Stored Procedure can do the task that was not possible with T-SQL,
    But CLR Integration is a better choice over XPs for the following reasons :-

    A) Better Control and Programming Model :-
    T-SQL cannot control what code written in XPs can do. But CLR Integration can control what managed code can or can not do by placing the assemblies in one pf the three permission buckets like SAFE, UNSAFE and EXTERNAL.

    SAFE Permission Level:-
    SAFE permission level can use pre-defined .Net classes, can call any COM Component. It cannot use Pinvoke and cannot access external resources :-network resources.

    UNSAFE Permission Level:-
    With this permission level, any code can be run inside SQL Server.

    EXTERNAL Permission Level:-
    Its like SAFE, but it allows to access network resources like files from network, file system, DNS system , event viewer etc.

    So CLR Integration provides a more reliable programming model than XPs do. One of the biggest advantage is type-safety. It checks whether the code is safe to run.

    B) Scalability:
    The managed APIs that expose resources such as memory, threads, and synchronization are implemented on top of the SQL Server resource manager, allowing SQL Server to manage these resources for CLR code. But SQL Server has no view or control over the resource usage of an XP. If an XP consumes too much CPU time or memory, there is no way to detect or control this from within SQL Server. With CLR code, SQL Server can detect that a given thread has not yielded for a long period of time and force the task to yield so that other work can be scheduled. Consequently, using managed code provides for better scalability and robustness.

    C) Better Support for datatypes :-
    The managed APIs support new data types (such as XML, (n)varchar(max), and varbinary(max)) introduced in SQL Server 2005

    2) Middle Tier-Business Logic Vs CLR
    Developers may choose Middle Tier-Business Logic to place logic outside database.
    The following considerations can favor the database tier as the preferred code location:

    A) Data validation:

    Keeping the data validation logic in the database offers better encapsulation of this logic with the data, avoiding duplication of validation logic across different data touch points, such as back-end processing, bulk upload, data updates from the middle tier, etc.



    B) Network traffic reduction:
    Placing logic in the database may be appropriate for data processing tasks that involve processing a large amount of data while producing a very small percentage of that data.
    The ability to run application code inside the database adds flexibility to the design of application architectures that require business logic as in-process code that execute close to the data and cannot afford the cost of shipping data to a middle tier process to perform computations outside the database.

    T-SQL Vs Managed Code :-

T-SQL
Use T-SQL for basically Data-Centric set based operations.

For data access and data manipulation-Declarative SQL statements.

TSQL cannot control access to external resources by XPs.

No full fledged Business Logic can be defined in UDTs.

Managed Code :-

Programming languages that are compatible with the .NET Framework are best suited for computationally intensive functions and procedures that feature complex logic, or for situations in which you want to take advantage of the .NET Framework class library.
For data access and data manipulation-Declarative SQL statements.

Provides improved string handling functions, math functions, date operations, access to system resources, advanced encryption algorithms, file access, image processing, and XML data manipulation is easily accessible from managed stored procedures, functions, triggers, and aggregates.

Managed Code allows creation of Assemblies which access external resources with Permission buckets like SAFE, UNSAFE and EXTERNAL.

Managed code can help us to create full fledged data types-UDTs. Business logic can reside in UDTs.

Conclusion :-
-SQL Server 2005 is a new Host of the .Net Runtime.

-Many tasks that were awkward or difficult to perform in Transact SQL can be better accomplished by using managed code. So CLR Integration is built to compliment TSQL.

-Developer can better use acquired knowledge and skills to write in-process Managed Code.

- SQL Server 2005 enables you to extend the database server to more easily perform complex procedural computation and operations on the back end.

- Provides enhanced programming model.

-Supports type safety and security.

-Allows User defined types and aggregates.

SQL Server 2005 introduces a number of new features that allow developers to significantly extend their application capabilities through judicious use of database Services.