UChars plugin for Take Command / TCC / TCC/LE

Version 1.4.2     2024-09-03

Charles Dye

Purpose:

This plugin adds a few new functions to support Unicode characters. They behave similarly to TCC’s familiar native functions, but support high-order Unicode characters — characters outside the Basic Multilingual Plane, with values greater than 0xFFFF. These functions also expect and return hexadecimal character values, as God and the Unicode Consortium intended.

This plugin has been largely obsoleted by Take Command v33, which supports high-order characters natively in @CHAR and @UNICODE.

Installation:

To use this plugin, copy UChars.dll to some known location on your hard drive. (If you are still using the 32-bit version of Take Command, take UChars-x86.dll instead of UChars.dll.) Load the .DLL file with a PLUGIN /L command, for example:

plugin /l c:\bin\tcmd\test\uchars.dll

If you copy the .DLL file to a subdirectory named PlugIns within your Take Command program directory, the plugin will be loaded automatically when TCC starts.

Plugin Features:

New commands:

Command:        Purpose:
CHARENCODING    show UTF-16 and UTF-8 encodings for characters
TOCLIP          copy the text on the command line to the clipboard

New functions:

Function:    Like:       Returns:
@CAPS2       @CAPS       the input string, with initial caps
@LOWER2      @LOWER      the input string in lowercase
@SMILEY                  characters from the Emoticons block
@UCHAR       @CHAR       characters with the specified hex values
@UCODE       @UNICODE    hex values of characters in a string (no prefix)
@UCODEX      @UNICODE    hex values of characters in a string (leading 0x)
@ULEN        @LEN        the number of characters in a string
@UPPER2      @UPPER      the input string in uppercase
@UREVERSE    @REVERSE    a string with characters in reverse order

Syntax Note:

The syntax definitions in the following text use these conventions for clarity:

BOLD CODE         indicates text which must be typed exactly as shown.
CODE              indicates optional text, which may be typed as shown or omitted.
Bold italic       names a required argument; a value must be supplied.
Regular italic    names an optional argument.
ellipsis…         after an argument means that more than one may be given.

New Commands:

CHARENCODING — Show UTF-16 and UTF-8 encodings for characters.

Syntax:
CHARENCODING /16 /8 /C /D /K /N /X value "string"

/16         show UTF-16 encoding
/8          show UTF-8 encoding
/C          show characters
/D          show decimal values
/K          show character class
/N          show character name if available
/X          expand C-style character escapes in quoted strings
value       hex character value; leading 0x or U+ is optional
"string"    string literal between quotes

You may enter characters as quoted string literals, character values, HTML 4 character entities, or any combination. You may prefix hex values with 0x or U+ but neither is required. With or without either prefix, hexadecimal is assumed. As of version 1.4.2, a value prefixed with 0o is read as octal instead. Separate values with spaces. If you specify neither /16 nor /8, the default is to show both.

/K displays a one-letter code to indicate the type of character:

K    Class
A    alphabetic
D    digit
P    punctuation
W    whitespace
C    control character
B    Byte Order Mark
N    noncharacter
H    unpaired surrogate (high), not a character
L    unpaired surrogate (low), not a character
-    anything else

/N displays the official Unicode name of a character, if it is available. This feature requires Windows 10 build 1703 or later; it will not work in earlier versions.

/X expands any escapes in quoted strings after the /X on the command line. Strings before the /X will not be expanded.

charencoding /c "Hello, world. %@smiley[56]"
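
For readers curious about what the /16 and /8 output represents, here is a minimal Python sketch of the two encodings for a single character value. It is an illustration only, not the plugin's code; the function names utf16_units and utf8_bytes are made up for this example.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one code point."""
    if cp < 0x10000:
        return [cp]                      # BMP: a single 16-bit unit
    v = cp - 0x10000                     # 20 bits remain
    return [0xD800 + (v >> 10),          # high surrogate: top 10 bits
            0xDC00 + (v & 0x3FF)]        # low surrogate: bottom 10 bits


def utf8_bytes(cp: int) -> list[int]:
    """Return the UTF-8 bytes for one code point.

    Python refuses to encode surrogate values (D800 to DFFF) here.
    """
    return list(chr(cp).encode("utf-8"))


for cp in (0x41, 0x3A3, 0x1F638):
    units = " ".join(f"{u:04X}" for u in utf16_units(cp))
    octets = " ".join(f"{b:02X}" for b in utf8_bytes(cp))
    print(f"U+{cp:04X}  UTF-16: {units}  UTF-8: {octets}")
```

Characters above U+FFFF need two UTF-16 units (a surrogate pair) and four UTF-8 bytes, which is why they are awkward for functions that assume one unit per character.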



TOCLIP — copy the text on the command line to the clipboard.

Syntax:
TOCLIP /A /H /Q /X text

/A      append to text already on the clipboard
/H      expand HTML entities in the text
/Q      quietly
/X      expand C-style character escapes in the text
text    the text to write to the clipboard

This command works much like "echo text > clip:" except that it uses Unicode throughout, so it won’t mangle Unicode characters.

toclip Hello, world! %@uchar[1f30e]



New Functions:

@CAPS2 — Returns the input string, forced to initial caps.

Syntax:
%@CAPS2[string]

string    the string to capitalize

By default, whitespace characters and the ASCII hyphen are treated as word separators. You can specify a different list of word separator characters by creating an environment variable Caps2Separators.

This function uses CharUpperW() to handle many non-ASCII letters, including accented Latin-1 letters and Cyrillic.

echo %@caps2[æther à la carte]

See also: @LOWER2 and @UPPER2.



@LOWER2 — Returns the input string, forced to lowercase.

Syntax:
%@LOWER2[string]

string    the string to lowercase

This function uses CharLowerW() to handle many non-ASCII letters, including accented Latin-1 letters and Cyrillic.

echo %@lower2[CRÈME BRÛLÉE €3.50]

See also: @CAPS2 and @UPPER2.



@SMILEY — Returns Unicode characters from the Emoticons block.

Syntax:
%@SMILEY[value value…]

value    decimal, or hexadecimal with a leading 0x

Values must be in the range of 0 to 79. This function returns characters in the range of U+1F600 to U+1F64F. Separate values with spaces.

echo %@smiley[7 8 7]
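
The mapping from value to character is a simple offset from U+1F600. A minimal Python sketch, assuming nothing about the plugin's internals (the function name smiley is only for illustration):

```python
def smiley(values: str) -> str:
    """Map space-separated values 0..79 to characters U+1F600..U+1F64F."""
    out = []
    for token in values.split():
        # Decimal by default; hexadecimal only with a leading 0x.
        if token.lower().startswith("0x"):
            n = int(token, 16)
        else:
            n = int(token, 10)
        if not 0 <= n <= 79:
            raise ValueError(f"value out of range: {token}")
        out.append(chr(0x1F600 + n))
    return "".join(out)


print(smiley("7 8 7"))
```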



@UCHAR — Returns Unicode characters with the specified values.

Syntax:
%@UCHAR[value value…]

value    a hexadecimal number

This function behaves like @CHAR, except that the input values are assumed to be hexadecimal, and characters outside the Basic Multilingual Plane are supported. You may prefix values with 0x or U+ but neither is required. With or without either prefix, each value will be parsed as hexadecimal. As of version 1.4.2, a value prefixed with 0o is parsed as octal instead. Separate values with spaces.

Values must be in the range of 1 to 10FFFF.

echo %@uchar[16a6 16d6 16eb 16bb 16a9 16d2 16d2 16c1 16cf]
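
The parsing rules above can be sketched in Python as follows. This is a hedged illustration, not the plugin's implementation; the name uchar is borrowed for convenience, and the 0o octal prefix mentioned in the change log is deliberately left out.

```python
def uchar(values: str) -> str:
    """Turn space-separated hex values into a string of characters."""
    chars = []
    for token in values.split():
        digits = token
        # An optional 0x or U+ prefix is dropped; the rest is always hex.
        if token[:2].lower() in ("0x", "u+"):
            digits = token[2:]
        cp = int(digits, 16)
        if not 1 <= cp <= 0x10FFFF:
            raise ValueError(f"value out of range: {token}")
        chars.append(chr(cp))
    return "".join(chars)


print(uchar("16a6 16d6 16eb 16bb 16a9 16d2 16d2 16c1 16cf"))
print(uchar("U+1F30E 0x41 42"))
```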



@UCODE — Returns the hexadecimal values of characters in a string.

Syntax:
%@UCODE[string]

string    the string to examine

This function behaves like @UNICODE, except that it returns values as hexadecimal (without any prefix), and characters outside the Basic Multilingual Plane are supported. A few characters, including the backquote and the closing square bracket, will need to be escaped.

echo %@ucode[This is a test.]

echo %@ucode[😎]



@UCODEX — Returns the hexadecimal values of characters in a string.

Syntax:
%@UCODEX[string]

string    the string to examine

This function behaves like @UNICODE, except that it returns values as hexadecimal with a leading 0x, and characters outside the Basic Multilingual Plane are supported. A few characters, including the backquote and the closing square bracket, will need to be escaped.

echo %@ucodex[This is a test.]

echo %@ucodex[💀]
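
A rough Python sketch of what @UCODE and @UCODEX compute: one hexadecimal value per character, where a character outside the Basic Multilingual Plane yields a single value rather than two surrogates. The output formatting here (uppercase digits, no padding, one space between values) is an assumption for the example and may not match the plugin exactly.

```python
def ucode(text: str, prefix: str = "") -> str:
    """Return the hex value of each character, separated by spaces."""
    # Python strings are sequences of code points, so a character such as
    # U+1F60E is one item here, not a surrogate pair.
    return " ".join(f"{prefix}{ord(ch):X}" for ch in text)


print(ucode("This is a test."))
print(ucode("\U0001F60E"))          # like @UCODE: no prefix
print(ucode("\U0001F480", "0x"))    # like @UCODEX: leading 0x
```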



@ULEN — Returns the number of Unicode characters in a string.

Syntax:
%@ULEN[string]

string    the string to be counted

This function is almost the same as @LEN, except that it counts surrogate pairs as single characters. (Surrogates which are not properly paired will be counted as separate ‘characters’.)

echo %@ulen[😺]

echo %@ulen[%@char[0xd83d 0xde00]]
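
The counting rule described above can be sketched in Python over a list of UTF-16 code units. This is an illustration under that description, not the plugin's code; count_characters is a made-up name.

```python
def count_characters(units: list[int]) -> int:
    """Count characters in UTF-16 code units; a proper pair counts once."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A high surrogate followed by a low surrogate is one character;
        # anything else, including a stray surrogate, counts by itself.
        i += 2 if is_high and has_low else 1
        count += 1
    return count


print(count_characters([0xD83D, 0xDE00]))   # a proper pair
print(count_characters([0xDE00, 0xD83D]))   # wrong order: two stray surrogates
```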



@UPPER2 — Returns the input string, forced to uppercase.

Syntax:
%@UPPER2[string]

string    the string to uppercase

This function uses CharUpperW() to handle many non-ASCII letters, including accented Latin-1 letters and Cyrillic.

echo %@upper2[crème brûlée €3.50]

See also: @CAPS2 and @LOWER2.



@UREVERSE — Returns a string with characters in reverse order.

Syntax:
%@UREVERSE[string]

string    the string to reverse

This function is almost the same as @REVERSE, except that it does not mangle high-order Unicode characters by swapping surrogate characters.

set test=This is a test %@uchar[1f638]

echo %@ureverse[%test]

echo %@reverse[%test]
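
To see why a unit-by-unit reversal mangles high-order characters, here is a small Python sketch that reverses a list of UTF-16 code units both ways. It illustrates the idea only and makes no claim about how the plugin does it; both function names are invented for the example.

```python
def reverse_naive(units: list[int]) -> list[int]:
    """Reverse unit by unit; surrogate pairs end up low-then-high."""
    return units[::-1]


def reverse_keeping_pairs(units: list[int]) -> list[int]:
    """Reverse, but keep each high/low surrogate pair in its original order."""
    out = []
    i = len(units) - 1
    while i >= 0:
        is_low = 0xDC00 <= units[i] <= 0xDFFF
        if is_low and i > 0 and 0xD800 <= units[i - 1] <= 0xDBFF:
            out += [units[i - 1], units[i]]   # emit the pair intact
            i -= 2
        else:
            out.append(units[i])
            i -= 1
    return out


units = [0x41, 0xD83D, 0xDE38]                             # "A" then U+1F638
print([f"{u:04X}" for u in reverse_naive(units)])          # pair is swapped
print([f"{u:04X}" for u in reverse_keeping_pairs(units)])  # pair survives
```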



Character Escapes:

These may be used in CHARENCODING or TOCLIP with the /X option.

Escape:       Expands to:                          Example:
\b            backspace
\e            ASCII escape (27 decimal)
\k            grave accent
\n            newline
\p            percent sign
\q            double quote
\r            carriage return
\t            ASCII horizontal tab
\uxxxx        Unicode character, up to U+FFFF      \u03a3 → Σ
\Uxxxxxxxx    Unicode character, up to U+10FFFF    \U1f63a → 😺
\nnn          octal value, up to 777               \101 → A
\xnnnn        hexadecimal value, up to FFFF        \x41 → A
\#nnnnn       decimal value, up to 65535           \#65 → A
\\            backslash
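
As a rough illustration of how such escapes might be expanded, here is a Python sketch built from the table above. It is not the plugin's parser. In particular, how many digits each numeric escape consumes is an assumption (this sketch takes as many as the table allows), and unknown escapes are simply left alone.

```python
import re

# Single-letter escapes from the table above.
SIMPLE = {"b": "\b", "e": "\x1b", "k": "`", "n": "\n", "p": "%",
          "q": '"', "r": "\r", "t": "\t", "\\": "\\"}

ESCAPE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{1,4})"      # \uxxxx
    r"|U([0-9A-Fa-f]{1,8})"          # \Uxxxxxxxx
    r"|x([0-9A-Fa-f]{1,4})"          # \xnnnn
    r"|#([0-9]{1,5})"                # \#nnnnn
    r"|([0-7]{1,3})"                 # \nnn (octal)
    r"|(.))",                        # any other single character
    re.DOTALL,
)


def expand_escapes(text: str) -> str:
    """Expand the escapes listed in the table; assumed digit-count rules."""
    def replace(m: re.Match) -> str:
        hex4, hex8, hexx, dec, octal, letter = m.groups()
        for digits, base in ((hex4, 16), (hex8, 16), (hexx, 16),
                             (dec, 10), (octal, 8)):
            if digits is not None:
                # chr() raises ValueError for values above 10FFFF.
                return chr(int(digits, base))
        return SIMPLE.get(letter, m.group(0))   # unknown escapes pass through
    return ESCAPE.sub(replace, text)


print(expand_escapes(r"\u03a3 \101 \x41 \#65 \U1f63a"))
```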

Startup Message:

This plugin displays an informational line when it initializes. The message will be suppressed in transient or pipe shells. You can disable it for all shells by defining an environment variable named NOLOADMSG, for example:

set /e /u noloadmsg=1

Changes:


1.4.2      2024-09-03    StringToUnicode() and f_uchar() use PLUGIN_BUFFER_MAX for the buffer size.
                         CHARENCODING and @UCHAR allow octal values prefixed with 0o.
1.4.1.1    2024-03-15    Fixed CHARENCODING /N to correctly handle characters outside the BMP.
1.4.1      2024-03-15    Added /N to CHARENCODING.
1.4.0.4    2023-10-15    Tweaked ShowCmdHelp() to include VER_PATCH.
1.4.0.3    2023-10-12    Updated the plugin’s web address.
1.4.0.1    2023-07-24    Updated ParseArgs.cpp to the current version.
1.4.0      2023-04-07    Added @CAPS2, @LOWER2, and @UPPER2.
1.3.0      2022-11-16    Added /H to TOCLIP to expand any HTML entities in the text.
1.2.0      2022-11-16    Added /X to both CHARENCODING and TOCLIP to support C-style character escapes.
1.1.0      2022-06-02    Added @UREVERSE. Also, CHARENCODING now supports all HTML 4 character entities.
1.0.0      2021-07-19    First release.

Status and Licensing:

This plugin is © Copyright 2024, Charles Dye. Unaltered copies of the binary and documentation files may be freely distributed without restriction. I make no guarantee and give no warranty for its operation. If you find a problem, you can report it in the JP Software support forum.

Download:

You can download the current version of the plugin from https://charlesdye.net/dl/uchars.zip.