Santanu Datta: Few Points in SQL Server

on VarChar and NVarchar

First of all, the basic difference between a varchar and a nvarchar datatype is that for each variable the latter one takes double the space needed by the former one. Now why double? Because of the extra "n", which means that it *can* store Unicode characters as well.

Unicode characters represent a wide variety of foreign locales and many times they need an extra byte for the same char storage. Hence nvarchar can store an extended character set.

In SQL Server 2000, varchar has the maximum limit of 8000 characters (when you need to give a particular size), and nvarchar has 4000 characters only (remember it needs double the space needed by a varchar, hence the storage capacity becomes half). Infact a given row in a SQL Server 2000 table cannot exceed 8000 characters in size.

But if you need to store more than 8000 characters? There was no option other than to use TEXT or the NTEXT datattypes, common called as BLOBS (Binary Large Objects), both of which have their own limitations.

Hence in SQL Server 2005, the MAX identifier was introduced, which allows us to go beyond 8000 characters and store upto 2^31- 1 bytes,which comes around 2 GB! But note that you still cannnot specify a size greater than 8000 characters in n/varchar (like varchar(12000) is still not allowed, need to use MAX for data greater than 8000 bytes).

We can use them as: varchar (MAX) / nvarchar (MAX)

So as a general practice, avoid using TEXT/NTEXT datattypes and use varchar(MAX) and nvarchar(MAX) instead.

just watch out for the + (plus) operator (or string concatination operator) when using varchar(max) or nvarchar(max). it truncates ones data after the total number of page size bytes. rather use the replace function;

               declare @result nvarchar(max)
               set @result = N'the text which would have been used as the first operand before the plus operator that also exceeds 4000 characters FANCYPLACEHOLDER'
               set @result = replace(@result, N'FANCYPLACEHOLDER', ' ...some more text which may also exceed 4000 characters...')

Using Large-Value Data Types

Microsoft SQL Server 2005 introduces the max specifier. This specifier expands the storage capabilities of the varchar, nvarchar, and varbinary data types. varchar(max), nvarchar(max), and varbinary(max) are collectively called large-value data types. You can use the large-value data types to store up to 2^31-1 bytes of data.
________________________________________________________________________________
Note:
When the sp_tableoption stored procedure 'large value types out of row' option is set to OFF, the in-row storage limit for large-value types is 8,000 bytes. A 16-byte root is stored in-row when this option is set to ON.
_______________________________________________________________________________

The large-value data types are similar in behavior to their smaller counterparts, varchar, nvarchar and varbinary. This similarity enables SQL Server to store and retrieve large character, Unicode, and binary data more efficiently.

With large-value data types you can work with SQL Server in a way that was not possible using the text, ntext and image data types from earlier versions of SQL Server. For example, in SQL Server 2005, you can define variables that can store large amounts of data, up to 2^31 bytes of character, binary, and Unicode data.

The following table shows the relationship between the large -value data types and their counterparts from earlier versions of SQL Server.

Large-value data types              LOB from earlier versions

varchar(max)                              text*
nvarchar(max)                            ntext*
varbinary(max)                           image

* SQL Server version 6.5 clients do not support the ntext data type; therefore, they do not recognize nvarchar(max).
_______________________________________________________________________________
Important:
Use varchar(max), nvarchar(max), and varbinary(max) data types instead of text, ntext, and image data types.
_______________________________________________________________________________

Large-value data types exhibit the same behavior as their smaller counterparts, varchar(n), nvarchar(n), varbinary(n). The following describes using large-value data types in some particular scenarios:

     Cursors
     Because large-value data type variables can be defined, data from large-value data type columns of a FETCH can be put into local variables.
     Using large-value data types does not affect cursor usage that forces cursor type conversion.

     Chunked updates
     The UPDATE statement now supports a .WRITE( ) clause to perform partial updates to the underlying large-value data columns. This is similar to the text pointer operations, WRITETEXT and UPDATETEXT, supported on text, ntext, image data types in earlier versions of SQL Server.

     Triggers
     AFTER triggers are allowed on large-value data type column references in the inserted and deleted tables.

     String functions
     The built-in string functions that can operate on character and binary data are enhanced to support large-value data types as arguments. These functions include the following:
          COL_LENGTH
          CHARINDEX
          PATINDEX
          LEN
          DATALENGTH
          SUBSTRING

Using text and image Data

_________________________________________________________________________________
Important:
This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature. Use varchar(max), nvarchar(max) and varbinary(max) data types instead.
_________________________________________________________________________________
Microsoft SQL Server 2005 stores character strings longer than 8,000 characters and binary data longer than 8,000 bytes in special data types named text and image. Unicode strings longer than 4,000 characters are stored in the ntext data type.

For example, a large text file (.txt) of customer information must be imported into your SQL Server database. This data should be stored as one piece of data rather than integrated into the multiple columns of your data tables. You can create a column with the text data type for this purpose. However, if you must store company logos currently stored as Tagged Image File Format (TIFF) images (.tif) that are 10 KB each, create a column with the image data type.

If the textual data to be stored is in Unicode format, use the ntext data type. For example, a form letter created for international customers is likely to contain international spellings and characters used in various different languages. Store this data in an ntext column.

Each text and ntext data value has a collation. Collations define attributes such as comparison rules and sensitivity to case or accenting. The collations for text values also specify a code page that defines the bit patterns used to represent each character. Each ntext value uses the Unicode code page, which is the same for all the collations. Each database has a default collation. When a text or ntext column is created, it is assigned the default collation of the database unless you assign a specific collation using the COLLATE clause. When two text or ntext values having different collations are combined or compared, collation precedence rules determine which collation is used for the operation.

Data in an image data is stored as a string of bits and is not interpreted by SQL Server. Any interpretation of the data in an image column must be made by the application. For example, an application could store data in an image column using a BMP, TIFF, GIF, or JPEG format. The application that reads the data from the image column must recognize the format of the data and display it correctly. All an image column does is provide a location to store the stream of bits that make up the image data value.

Using text in row to Store text, ntext, and image Values

Usually, text, ntext, or image strings are large, a maximum of 2GB, character or binary strings stored outside a data row. The data row contains only a 16-byte text pointer that points to the root node of a tree built of internal pointers that map the pages in which the string fragments are stored.

With SQL Server, you can store small to medium text, ntext, and image values in a data row, thereby increasing the speed of queries accessing these values.

When the text, ntext, or image string is stored in the data row, SQL Server does not have to access a separate page or set of pages to read or write the string. This makes reading and writing the text, ntext, or image in-row strings about as fast as reading or writing varchar, nvarchar, or varbinary strings.

To store text, ntext, or image strings in the data row, enable the text in row option using the sp_tableoption stored procedure.
sp_tableoption N'MyTable', 'text in row', 'ON'

Optionally, you can specify a maximum limit, from 24 through 7000 bytes, for the length of a text, ntext, and image string stored in a data row:

sp_tableoption N'MyTable', 'text in row', '1000'

If you specify 'ON' instead of a specific limit, the limit defaults to 256 bytes. This default value provides most of the performance benefits: It is large enough to ensure that small strings and the root text pointers can be stored in the rows but not so large that it decreases the rows per page enough to affect performance.

Although in general, you should not set the value below 72, you also should not set the value too high, especially for tables where most statements do not reference the text, ntext, and image columns or there are multiple text, ntext, and image columns.

You can also use sp_tableoption to turn the option off by specifying an option value of either 'OFF' or 0:
sp_tableoption N'MyTable', 'text in row', 'OFF'

Using uniqueidentifier Data

The uniqueidentifier data type stores 16-byte binary values that operate as globally unique identifiers (GUIDs).

A GUID is a unique binary number; no other computer in the world will generate a duplicate of that GUID value. The main use for a GUID is for assigning an identifier that must be unique in a network that has many computers at many sites.

A GUID value for a uniqueidentifier column is usually obtained by one of the following ways:

In a Transact-SQL statement, batch, or script by calling the NEWID() function.
In application code by calling an application API function or method that returns a GUID.

The Transact-SQL NEWID() function and the application API functions and methods generate new uniqueidentifier values from the identification number of their network card plus a unique number from the CPU clock. Each network card has a unique identification number. The uniqueidentifier value that is returned by NEWID() is generated by using the network card on the server. The uniqueidentifier value returned by application API functions and methods is generated by using the network card on the client.

A uniqueidentifier value is not typically defined as a constant. You can specify a uniqueidentifier constant in the following ways:

Character string format: '6F9619FF-8B86-D011-B42D-00C04FC964FF'
Binary format: 0xff19966f868b11d0b42d00c04fc964ff

The uniqueidentifier data type does not automatically generate new IDs for inserted rows in the way the IDENTITY property does. For example, to obtain new uniqueidentifier values, a table must have a DEFAULT() clause specifying the NEWID() or NEWSEQUENTIALID() function, or INSERT statements must use the NEWID() function.

                  CREATE TABLE MyUniqueTable
                  (UniqueColumn UNIQUEIDENTIFIER DEFAULT NEWID(),
                  Characters VARCHAR(10) )
                  GO

                  INSERT INTO MyUniqueTable(Characters) VALUES ('abc')
                  INSERT INTO MyUniqueTable VALUES (NEWID(), 'def')
                  GO
_________________________________________________________________________________
Note:
You can use NEWSEQUENTIALID to generate GUIDs to reduce page contention at the leaf level of indexes. NEWSEQUENTIALID can be used only with DEFAULT constraints on table columns of type uniqueidentifier.
_________________________________________________________________________________

uniqueidentifier columns may contain multiple occurrences of an individual uniqueidentifier value, unless the UNIQUE or PRIMARY KEY constraints are also specified for the column.
A foreign key column that references a uniqueidentifier primary key in another table will have multiple occurrences of individual uniqueidentifier values when multiple rows reference the same primary key in the source table.

A table can have multiple uniqueidentifier columns. One uniqueidentifier column for each table may be specified with the ROWGUIDCOL property. The ROWGUIDCOL property indicates that the uniqueidentifier values in the column uniquely identify rows in the table. However, the property does not do anything to enforce this. The uniqueness must be enforced by using other mechanisms, such as specifying the PRIMARY KEY constraint for the column. The ROWGUIDCOL property is primarily used by Microsoft SQL Server 2005 replication. Merge replication and transactional replication that have updating subscriptions use uniqueidentifier columns to make sure that rows are uniquely identified across multiple copies of the table.

The uniqueidentifier data type has the following disadvantages:

The values are long and obscure. This makes them difficult for users to type correctly, and more difficult for users to remember.
The values are random and cannot accept any patterns that may make them more meaningful to users.
There is no way to determine the sequence in which uniqueidentifier values were generated. They are not suited for existing applications that depend on incrementing key values serially.
At 16 bytes, the uniqueidentifier data type is relatively larger than other data types, such as 4-byte integers. This means indexes that are built using uniqueidentifier keys might be relatively slower than indexes using an int key.

Consider using the IDENTITY property when global uniqueness is not required, or when having a serially incrementing key is preferred.

Using Binary Data

The binary and varbinary data types store strings of bits. Although character data is interpreted based on the Microsoft SQL Server 2005 code page, binary and varbinary data is simply a stream of bits.

binary data can store a maximum of 8,000 bytes.
varbinary, using the max specifier, can store a maximum of 2^31 bytes.

Binary constants have a leading 0x (a zero and the lowercase letter x) followed by the hexadecimal representation of the bit pattern. For example, 0x2A specifies the hexadecimal value of 2A, which is equivalent to a decimal value of 42 or a one-byte bit pattern of 00101010.

Use binary data when storing hexadecimal values such as a security identification number (SID), a GUID (using the uniqueidentifier data type), or a complex number that can be stored using hexadecimal shorthand.

The following Transact-SQL example stores a SID and hexadecimal literal:

                     USE AdventureWorks;
                     GO
                     CREATE TABLE MyCcustomerTable
                     (
                                   user_login varbinary(85) DEFAULT SUSER_SID(),
                                   data_value varbinary(1)
                     );
                     GO

                     INSERT MyCustomerTable (data_value) VALUES (0x4F);
                     GO

Santanu Datta

Wednesday, September 23, 2009

Few Points in SQL Server

No comments:

View Tenant (ULS) Logs in SharePoint Online using CSOM