I would recommend NVARCHAR for any column which will have user-entered data in it and which is relatively unconstrained.

I would recommend VARCHAR for any column which is a natural key (like a vehicle license plate, SSN, serial number, service tag, order number, airport callsign, etc.), typically defined and constrained by a standard, legislation, or convention. Also VARCHAR for user-entered, very constrained data (like a phone number) or a code (ACTIVE/CLOSED, Y/N, M/F, M/S/D/W, etc.). There is absolutely no reason to use NVARCHAR for those.

Both of the two most upvoted answers are wrong. The choice should have nothing to do with "storing different/multiple languages"; use VARCHAR when the data is guaranteed to be constrained to characters its encoding supports. You can support Spanish characters like ñ, and English, with just a common varchar field and the Latin1_General_CI_AS COLLATION, for example.

You should use NVARCHAR/NCHAR whenever the ENCODING, which is determined by the COLLATION of the field, doesn't support the characters needed. Also, depending on the SQL Server version, you can use specific COLLATIONs, like Latin1_General_100_CI_AS_SC_UTF8, available since SQL Server 2019. Setting this collation on a VARCHAR field (or an entire table/database) will use the UTF-8 ENCODING for storing and handling the data on that field, allowing full support for UNICODE characters, and hence for any language embraced by it.

To fully understand what I'm about to explain, it's mandatory to have the concepts of UNICODE, ENCODING and COLLATION all extremely clear in your head. If you don't, first take a look below at my humble and simplified explanation in the "What is UNICODE, ENCODING, COLLATION and UTF-8, and how they are related" section and the supplied documentation links. Also, everything I say here is specific to Microsoft SQL Server and how it stores and handles data in char/nchar and varchar/nvarchar fields.

Let's say we want to store a peculiar text in our MSSQL Server database. It could be an Instagram comment such as "I love stackoverflow! □". The plain English part would be perfectly supported even by ASCII, but since there is also an emoji, which is a character specified in the UNICODE standard, we need an ENCODING that supports this Unicode character.

MSSQL Server uses the COLLATION to determine what ENCODING is used on char/nchar/varchar/nvarchar fields. So, differently from what a lot of people think, COLLATION is not only about sorting and comparing data, but also about ENCODING, and by consequence about how our data will be stored!

So, HOW DO WE KNOW WHAT ENCODING IS USED BY OUR COLLATION? With this:

SELECT COLLATIONPROPERTY('Latin1_General_CI_AI', 'CodePage')

This simple SQL returns the Windows Code Page for a COLLATION. A Windows Code Page is nothing more than another mapping to ENCODINGs. For the Latin1_General_CI_AI COLLATION it returns Windows Code Page 1252, which maps to the Windows-1252 ENCODING.

So a varchar column with the Latin1_General_CI_AI COLLATION will handle its data using the Windows-1252 ENCODING, and will only correctly store characters supported by that encoding. If we check the Windows-1252 ENCODING specification (Character List for Windows-1252), we will find out that this encoding won't support our emoji character.

OK, SO HOW CAN WE SOLVE THIS? Actually, it depends, and that is GOOD!

Before SQL Server 2019, all we had were NCHAR and NVARCHAR fields. Again, it depends on the field's COLLATION and also on the SQL Server version. Microsoft's "nchar and nvarchar (Transact-SQL)" documentation specifies it perfectly: "Starting with SQL Server 2012 (11.x), when a Supplementary Character (SC) enabled collation is used, these data types store the full range of Unicode character data and use the UTF-16 character encoding. If a non-SC collation is specified, then these data types store only the subset of character data supported by the UCS-2 character encoding." In other words, if we use a SQL Server older than 2012, like SQL Server 2008 R2 for example, those fields will use the UCS-2 ENCODING, which supports only a subset of UNICODE.
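As a sketch of the collation-to-encoding check discussed above: `COLLATIONPROPERTY` is real T-SQL, but the table and column names here are illustrative, and the `Latin1_General_100_CI_AS_SC_UTF8` column requires SQL Server 2019 or later.

```sql
-- Which code page (and therefore which encoding) backs each collation?
-- Legacy Latin1 collation reports code page 1252 (Windows-1252);
-- UTF-8 enabled collations report code page 65001 (UTF-8).
SELECT
    COLLATIONPROPERTY('Latin1_General_CI_AI', 'CodePage')             AS LegacyCodePage,
    COLLATIONPROPERTY('Latin1_General_100_CI_AS_SC_UTF8', 'CodePage') AS Utf8CodePage;

-- Hypothetical table contrasting the storage options:
CREATE TABLE dbo.CommentDemo (
    LegacyComment varchar(100)  COLLATE Latin1_General_CI_AI,             -- Windows-1252: no emoji
    Utf8Comment   varchar(100)  COLLATE Latin1_General_100_CI_AS_SC_UTF8, -- UTF-8: full Unicode
    WideComment   nvarchar(100)                                           -- UTF-16 (or UCS-2, per collation)
);
```

With this layout, the emoji comment would survive intact in `Utf8Comment` and `WideComment`, but not in `LegacyComment`.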
If you treat the database as dumb storage, it is perfectly possible to store wide strings and different (even variable-length) encodings in VARCHAR (for instance UTF-8). The problem comes when you are attempting to encode and decode, especially if the code page is different for different rows. It also means that SQL Server will not be able to deal with the data easily for purposes of querying within T-SQL on (potentially variably) encoded columns. The real reason you want to use NVARCHAR is when you have different languages in the same column, you need to address the columns in T-SQL without decoding, you want to be able to see the data "natively" in SSMS, or you want to standardize on Unicode.
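The trade-off above can be made concrete. This is a minimal sketch, assuming the session's default collation maps to code page 1252 (Windows-1252): characters outside the varchar column's code page are silently replaced with '?', while the nvarchar variable keeps them.

```sql
-- Cyrillic text is not representable in code page 1252, so the
-- varchar assignment degrades each character to '?'; the nvarchar
-- assignment stores it as UTF-16 and preserves it.
DECLARE @narrow varchar(20)  = N'Привет';  -- becomes '??????' under a 1252 collation
DECLARE @wide   nvarchar(20) = N'Привет';  -- preserved intact

SELECT @narrow AS NarrowResult, @wide AS WideResult;
```

Note the `N''` prefix on the literals: without it, the literal itself is already narrowed before assignment, so even the nvarchar variable would receive the degraded text.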