Cannot insert Bassa Vah characters into SQL Server database

Dukaw LiberiaInfo 20 Reputation points
2025-05-08T02:15:30.54+00:00

I am using SQL Server and Visual Studio 2022 for a language database, but I am unable to insert characters from this page for the Bassa Vah script (https://scriptsource.org/cms/scripts/page.php?item_id=character_list&key=16AD0).

I tried an NVARCHAR column, an NTEXT column, and the N prefix before the string literal, but none of that worked. I can insert characters from other languages: using someone else's example below, all of the other languages worked except Bassa, which showed boxes and question marks. Does SQL Server not support Bassa? Please advise.

CREATE TABLE TestLang
(
    LangName VARCHAR(100),
    Value VARCHAR(1000),
    NValue NVARCHAR(1000)
)
GO

INSERT INTO TestLang (LangName, Value, NValue)
VALUES ('English', 'Welcome to GFG', N'Welcome to GFG');

INSERT INTO TestLang (LangName, Value, NValue)
VALUES ('Gujarati', 'GFG માં આપનું સ્વાગત છે', N'GFG માં આપનું સ્વાગત છે');

INSERT INTO TestLang (LangName, Value, NValue)
VALUES ('Hindi', 'GFG में आपका स्वागत है', N'GFG में आपका स्वागत है');

INSERT INTO TestLang (LangName, Value, NValue)
VALUES ('Bassa', '𖫨𖫐', N'𖫬𖫐');

I can't even paste the letters here; they paste as boxes.

SQL Server Transact-SQL

Accepted answer
  1. Viorel 121.6K Reputation points
    2025-05-08T06:26:38.6366667+00:00

    For further investigations, consider this test in Management Studio:

    CREATE TABLE TestLang
    (
        LangName VARCHAR(100),
        NValue NVARCHAR(1000) collate Latin1_General_100_CI_AS_SC
    )
    GO
    
    INSERT INTO TestLang (LangName, NValue)
    VALUES ('Bassa', N'𖫬𖫐');
    
    select *, 
        len(NValue) as Length,
        format(unicode(substring(NValue, 1, 1)), 'X') as Code1,
        format(unicode(substring(NValue, 2, 1)), 'X') as Code2  
    from TestLang
    order by NValue
    

    You will probably see that the results (length and character codes) are correct; therefore, the specified collation seems to accept the letters.

    However, the letters appear as boxes due to limitations of the font, or of Management Studio. Try to find and use a font that supports the language.

    Or use Azure Data Studio instead of Management Studio.

    Therefore, SQL Server seems at least able to store such texts.
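    Since the letters cannot even be pasted reliably, one option is to build the string from its UTF-16 bytes instead of typing it. This is a minimal sketch, assuming the two letters are U+16AEC and U+16AD0 (the pair in the question); the temporary table #BassaTest is a hypothetical name. It repeats the length and code checks from the test above.

    ```sql
    -- Sketch only. Assumption: the letters are U+16AEC and U+16AD0, the pair used
    -- in the question. NVARCHAR stores UTF-16LE, where each of these letters is a
    -- surrogate pair: U+16AEC = D81A DEEC, U+16AD0 = D81A DED0.
    CREATE TABLE #BassaTest  -- hypothetical temporary table
    (
        LangName VARCHAR(100),
        NValue   NVARCHAR(1000) COLLATE Latin1_General_100_CI_AS_SC
    );

    -- Reinterpret the little-endian bytes as NVARCHAR: no font or keyboard needed.
    INSERT INTO #BassaTest (LangName, NValue)
    VALUES ('Bassa', CAST(0x1AD8ECDE1AD8D0DE AS NVARCHAR(10)));

    -- Same checks as the test above; expected: Length = 2, Code1 = 16AEC, Code2 = 16AD0.
    SELECT LangName,
           LEN(NValue)                                   AS Length,
           FORMAT(UNICODE(SUBSTRING(NValue, 1, 1)), 'X') AS Code1,
           FORMAT(UNICODE(SUBSTRING(NValue, 2, 1)), 'X') AS Code2
    FROM #BassaTest;

    DROP TABLE #BassaTest;
    ```

    If the pasted row and the byte-built row report the same codes, the paste worked and only the display is at fault.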


2 additional answers

  1. Olaf Helper 46,641 Reputation points
    2025-05-08T05:51:16.5833333+00:00

    The Bassa character set is UTF-16 and needs at least 5 bytes per character; you use Unicode, which can store characters of at most 4 bytes, so this is not possible.


  2. Erland Sommarskog 120.4K Reputation points MVP Moderator
    2025-05-08T21:05:08.78+00:00

    If you see boxes, that is a font issue. That is, your font does not support Bassa.

    When I look at your post in my web browser, the Bassa characters display fine.

    But when I copy the text to SSMS, I only see boxes. The font I use both in the query window and the results grid is Courier New, and apparently this font does not include Bassa characters.

    The good news is that boxes usually do not mean data loss: the actual characters are most likely still there. (To be sure, you would need to cast to binary and look at the hex digits.) You only need to find a way to display them. Note that SQL Server is not involved here. SQL Server only stores the data; display is done by client programs such as SSMS.
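    A minimal sketch of the binary check, using a hypothetical table variable with one intact sample row and one damaged sample row (the '??' value is an illustration of data loss, not captured output). Against a real table, the same CAST(NValue AS VARBINARY(...)) expression applies.

    ```sql
    -- Sketch: tell "font problem" from "data loss" by looking at the stored bytes.
    -- Hypothetical sample rows; 0x1AD8ECDE1AD8D0DE is U+16AEC U+16AD0 in UTF-16LE.
    DECLARE @samples TABLE (Label VARCHAR(30), NValue NVARCHAR(100));

    INSERT INTO @samples (Label, NValue)
    VALUES ('intact', CAST(0x1AD8ECDE1AD8D0DE AS NVARCHAR(10))),
           ('lost',   N'??');

    SELECT Label,
           CAST(NValue AS VARBINARY(200)) AS StoredBytes,  -- UTF-16LE, as NVARCHAR stores it
           DATALENGTH(NValue)             AS ByteCount
    FROM @samples;
    -- intact: 0x1AD8ECDE1AD8D0DE (8 bytes); lost: 0x3F003F00 (4 bytes)
    ```

    Every letter in the Bassa Vah block (U+16AD0 to U+16AFF) shows up in this little-endian dump as a 4-byte group of the form 1AD8xxDE, while a question mark shows up as 3F00.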

    On the other hand, if you see question marks, there has been data loss. Typically, this happens if you don't include the N before the string literal. Without the N, the string literal is varchar, and which characters it can represent depends on the code page of the collation. If the code page is not UTF-8, many characters cannot be represented, and they are replaced by a fallback character. A fallback character can be a lookalike; for instance, á is replaced by a. When there is no suitable fallback, the question mark is used as a general fallback character.
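    The code-page effect can be sketched by storing the same text in varchar columns with different collations. A literal without N goes through the same kind of conversion, using the database's default code page. This assumes SQL Server 2019 or later for the _UTF8 collation; @bassa and @t are hypothetical names, and the string is built from bytes (U+16AEC U+16AD0) so nothing has to be pasted.

    ```sql
    -- Sketch: the same Bassa text stored through three different encodings.
    DECLARE @bassa NVARCHAR(10) = CAST(0x1AD8ECDE1AD8D0DE AS NVARCHAR(10));

    DECLARE @t TABLE
    (
        V1252 VARCHAR(20)  COLLATE Latin1_General_100_CI_AS_SC,       -- code page 1252
        VUtf8 VARCHAR(20)  COLLATE Latin1_General_100_CI_AS_SC_UTF8,  -- UTF-8 (SQL Server 2019+)
        NV    NVARCHAR(20) COLLATE Latin1_General_100_CI_AS_SC        -- UTF-16
    );

    INSERT INTO @t (V1252, VUtf8, NV) VALUES (@bassa, @bassa, @bassa);

    SELECT CAST(V1252 AS VARBINARY(40)) AS Bytes1252,  -- question marks (0x3F): data loss
           CAST(VUtf8 AS VARBINARY(40)) AS BytesUtf8,  -- 0xF096ABACF096AB90
           CAST(NV    AS VARBINARY(40)) AS BytesUtf16  -- 0x1AD8ECDE1AD8D0DE
    FROM @t;
    ```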

    As for what font you should use to get proper display of Bassa characters, I don't know. I must admit that I had never heard of Bassa Vah before seeing your post.

    I should add that for proper handling of Bassa characters, you need a collation with SC in the name, since the Bassa characters are outside the Unicode base plane. If you use a collation without SC in the name, you can get some confusing results, even more so if you use a collation without a version number in the name, like Latin1_General_CI_AS.
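    A minimal sketch of what "confusing results" can look like, assuming the same two letters (U+16AEC U+16AD0) built from bytes; @bassa is a hypothetical variable name. Each Bassa letter occupies 4 bytes (one surrogate pair) in NVARCHAR; an SC collation counts that pair as one character, while a non-SC collation counts each half separately and can split a letter in two.

    ```sql
    -- Sketch: SC vs non-SC collations on the same two Bassa letters.
    DECLARE @bassa NVARCHAR(10) = CAST(0x1AD8ECDE1AD8D0DE AS NVARCHAR(10));

    SELECT LEN(@bassa COLLATE Latin1_General_100_CI_AS_SC) AS LenSC,     -- 2: one per letter
           LEN(@bassa COLLATE Latin1_General_100_CI_AS)    AS LenNonSC,  -- 4: one per UTF-16 code unit
           DATALENGTH(@bassa)                              AS Bytes,     -- 8 with either collation
           CAST(SUBSTRING(@bassa COLLATE Latin1_General_100_CI_AS_SC, 1, 1)
                AS VARBINARY(8))                           AS FirstSC,    -- 0x1AD8ECDE: a whole letter
           CAST(SUBSTRING(@bassa COLLATE Latin1_General_100_CI_AS, 1, 1)
                AS VARBINARY(8))                           AS FirstNonSC; -- 0x1AD8: half a letter
    ```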

