UChars plugin for Take Command / TCC / TCC/LE
Version 1.4.2 2024-09-03
Charles Dye
Purpose:
This plugin adds a few new functions to support Unicode characters. They behave similarly to TCC’s familiar native functions, but support high-order Unicode characters — characters outside the Basic Multilingual Plane, with values greater than 0xFFFF. These functions also expect and return hexadecimal character values, as God and the Unicode Consortium intended.
This plugin has been largely obsoleted by Take Command v33, which supports high-order characters natively in @CHAR and @UNICODE.
Installation:
To use this plugin, copy UChars.dll to some known location on your hard drive. (If you are still using the 32-bit version of Take Command, take UChars-x86.dll instead of UChars.dll.) Load the .DLL file with a PLUGIN /L command, for example:
plugin /l c:\bin\tcmd\test\uchars.dll
If you copy the .DLL file to a subdirectory named PlugIns within your Take Command program directory, the plugin will be loaded automatically when TCC starts.
Plugin Features:
New commands:
Command: | Purpose: |
---|---|
CHARENCODING | show UTF-16 and UTF-8 encodings for characters |
TOCLIP | copy the text on the command line to the clipboard |
New functions:
Function: | Like: | Returns: |
---|---|---|
@CAPS2 | @CAPS | the input string, with initial caps |
@LOWER2 | @LOWER | the input string in lowercase |
@SMILEY | | characters from the Emoticons block |
@UCHAR | @CHAR | characters with the specified hex values |
@UCODE | @UNICODE | hex values of characters in a string (no prefix) |
@UCODEX | @UNICODE | hex values of characters in a string (leading 0x ) |
@ULEN | @LEN | the number of characters in a string |
@UPPER2 | @UPPER | the input string in uppercase |
@UREVERSE | @REVERSE | a string with characters in reverse order |
Syntax Note:
The syntax definitions in the following text use these conventions for clarity:
BOLD CODE | indicates text which must be typed exactly as shown. |
CODE | indicates optional text, which may be typed as shown or omitted. |
Bold italic | names a required argument; a value must be supplied. |
Regular italic | names an optional argument. |
ellipsis… | after an argument means that more than one may be given. |
New Commands:
CHARENCODING — Show UTF-16 and UTF-8 encodings for characters.
Syntax:
CHARENCODING /16 /8 /C /D /K /N /X value "string" …
/16 | show UTF-16 encoding |
/8 | show UTF-8 encoding |
/C | show characters |
/D | show decimal values |
/K | show character class |
/N | show character name if available |
/X | expand C-style character escapes in quoted strings |
value | hex character value; leading 0x or U+ is optional |
"string" | string literal between quotes |
You may enter characters as quoted string literals, character values, HTML 4 character entities, or any combination. You may prefix hex values with 0x or U+, but neither is required; with or without a prefix, values are parsed as hexadecimal. As of version 1.4.2, octal values prefixed with 0o are also accepted. Separate values with spaces. If you specify neither /16 nor /8, the default is to show both.
/K displays a one-letter code to indicate the type of character:
K | Class |
---|---|
A | alphabetic |
D | digit |
P | punctuation |
W | whitespace |
C | control character |
B | Byte Order Mark |
N | noncharacter |
H | unpaired surrogate (high) — not a character |
L | unpaired surrogate (low) — not a character |
- | anything else |
/N displays the official Unicode name of a character, if it is available. This feature requires Windows 10 version 1703 or later; it will not work in earlier versions.
/X expands any escapes in quoted strings that follow the /X on the command line. Strings before the /X will not be expanded.
charencoding /c "Hello, world. %@smiley[56]"
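As a cross-check on what CHARENCODING reports, the two encodings can be reproduced outside TCC. The Python sketch below is not part of the plugin and makes no claim about the plugin's output format. The helper name encodings() is hypothetical; it simply asks Python for the UTF-16 code units and UTF-8 bytes of a single code point.

```python
def encodings(code_point):
    """Return (UTF-16 code units, UTF-8 bytes) for one code point, as hex strings."""
    ch = chr(code_point)
    # Big-endian without a BOM, so each 2-byte slice reads as one code unit.
    utf16 = ch.encode("utf-16-be")
    units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
    octets = [f"{b:02X}" for b in ch.encode("utf-8")]
    return units, octets


# U+1F638 lies outside the BMP, so UTF-16 needs a surrogate pair.
print(encodings(0x1F638))
# U+0041 fits in a single UTF-16 unit and a single UTF-8 byte.
print(encodings(0x41))
```

For U+1F638 this yields the surrogate pair D83D DE38 and the four UTF-8 bytes F0 9F 98 B8. Unpaired surrogate values cannot be encoded this way; Python raises an error for them.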
TOCLIP — Copy the text on the command line to the clipboard.
Syntax:
TOCLIP /A /H /Q /X text
/A | append to text already on the clipboard |
/H | expand HTML entities in the text |
/Q | quietly |
/X | expand C-style character escapes in the text |
text | the text to write to the clipboard |
This command works much like echo text > clip: but it uses Unicode throughout, so it won’t mangle Unicode characters.
toclip Hello, world! %@uchar[1f30e]
New Functions:
@CAPS2 — Returns the input string, forced to initial caps.
Syntax:
%@CAPS2[string]
string | the string to capitalize |
By default, whitespace characters and the ASCII hyphen are treated as word separators. You can specify a different list of word separator characters by creating an environment variable named Caps2Separators.
This function uses CharUpperW() to handle many non-ASCII letters, including accented Latin-1 letters and Cyrillic.
echo %@caps2[æther à la carte]
See also: @LOWER2 and @UPPER2.
@LOWER2 — Returns the input string, forced to lowercase.
Syntax:
%@LOWER2[string]
string | the string to lowercase |
This function uses CharLowerW() to handle many non-ASCII letters, including accented Latin-1 letters and Cyrillic.
echo %@lower2[CRÈME BRÛLÉE €3.50]
@SMILEY — Returns Unicode characters from the Emoticons block.
Syntax:
%@SMILEY[value value…]
value | decimal, or hexadecimal with a leading 0x |
Values must be in the range of 0 to 79. This function returns characters in the range of U+1F600 to U+1F64F. Separate values with spaces.
echo %@smiley[7 8 7]
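The mapping from a value to a character can be pictured as an offset into the Emoticons block. The Python sketch below is only an illustration, not the plugin's code: it assumes the value is a plain offset from U+1F600, and the names smiley(), EMOTICONS_FIRST and EMOTICONS_COUNT are hypothetical.

```python
EMOTICONS_FIRST = 0x1F600  # first code point of the Emoticons block
EMOTICONS_COUNT = 80       # U+1F600 through U+1F64F


def smiley(values):
    """Map space-separated indexes (decimal, or hex with a leading 0x) to emoticons."""
    chars = []
    for token in values.split():
        base = 16 if token.lower().startswith("0x") else 10
        index = int(token, base)
        if not 0 <= index < EMOTICONS_COUNT:
            raise ValueError(f"index out of range 0..79: {token}")
        # Assumption: the index is a simple offset from the start of the block.
        chars.append(chr(EMOTICONS_FIRST + index))
    return "".join(chars)


# Print code points rather than glyphs so the output is console-safe.
print([f"U+{ord(c):X}" for c in smiley("7 8 7")])
```

Under that assumption, the values 7 8 7 correspond to U+1F607, U+1F608 and U+1F607.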
@UCHAR — Returns Unicode characters with the specified values.
Syntax:
%@UCHAR[value value…]
value | a hexadecimal number |
This function behaves like @CHAR, except that the input values are assumed to be hexadecimal, and characters outside the Basic Multilingual Plane are supported. You may prefix values with 0x or U+, but neither is required; with or without a prefix, each value is parsed as hexadecimal. As of version 1.4.2, octal values prefixed with 0o are also accepted. Separate values with spaces.
Values must be in the range of 1 to 10FFFF.
echo %@uchar[16a6 16d6 16eb 16bb 16a9 16d2 16d2 16c1 16cf]
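The parsing rule described above (hexadecimal with an optional 0x or U+ prefix, range 1 to 10FFFF) can be sketched in a few lines of Python. This is an illustration under those stated rules, not the plugin's implementation. The function name uchar() is hypothetical, and the 0o octal prefix mentioned in the change log is deliberately left out.

```python
def uchar(values):
    """Turn space-separated hex values (optional 0x or U+ prefix) into characters."""
    out = []
    for token in values.split():
        digits = token
        # Strip an optional prefix; the value is hexadecimal either way.
        if digits[:2].lower() in ("0x", "u+"):
            digits = digits[2:]
        code_point = int(digits, 16)
        if not 1 <= code_point <= 0x10FFFF:
            raise ValueError(f"value out of range 1..10FFFF: {token}")
        out.append(chr(code_point))
    return "".join(out)


# All three spellings name the same character, U+1F30E.
print([f"U+{ord(c):X}" for c in uchar("1f30e 0x1F30E U+1f30e")])
```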
@UCODE — Returns the hexadecimal values of characters in a string.
Syntax:
%@UCODE[string]
string | the string to examine |
This function behaves like @UNICODE, except that it returns values as hexadecimal (without any prefix), and characters outside the Basic Multilingual Plane are supported. A few characters, including the backquote and the closing square bracket, will need to be escaped.
echo %@ucode[This is a test.]
echo %@ucode[😎]
@UCODEX — Returns the hexadecimal values of characters in a string.
Syntax:
%@UCODEX[string]
string | the string to examine |
This function behaves like @UNICODE, except that it returns values as hexadecimal with a leading 0x, and characters outside the Basic Multilingual Plane are supported. A few characters, including the backquote and the closing square bracket, will need to be escaped.
echo %@ucodex[This is a test.]
echo %@ucodex[💀]
@ULEN — Returns the number of Unicode characters in a string.
Syntax:
%@ULEN[string]
string | the string to be counted |
This function is almost the same as @LEN, except that it counts surrogate pairs as single characters. (Surrogates which are not properly paired will be counted as separate ‘characters’.)
echo %@ulen[😺]
echo %@ulen[%@char[0xd83d 0xde00]]
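The counting rule can be made concrete with a small Python sketch that works on UTF-16 code units. It is an illustration of the rule as described above, not the plugin's source; the helper names to_units() and ulen() are hypothetical.

```python
def to_units(text):
    """Split a string into its 16-bit UTF-16 code units."""
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def ulen(units):
    """Count characters, treating each well-formed surrogate pair as one."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A high surrogate followed by a low surrogate is one character;
        # anything else, including a stray surrogate, counts by itself.
        i += 2 if is_high and has_low else 1
        count += 1
    return count


cat = to_units("\U0001F63A")
print(len(cat), ulen(cat))          # code units versus characters
print(ulen([0xD83D, 0xDE00]))       # a proper pair
print(ulen([0xDE00, 0xD83D]))       # wrong order, so not a pair
```

A single cat emoji occupies two code units but counts as one character; two surrogates in the wrong order count as two.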
@UPPER2 — Returns the input string, forced to uppercase.
Syntax:
%@UPPER2[string]
string | the string to uppercase |
This function uses CharUpperW() to handle many non-ASCII letters, including accented Latin-1 letters and Cyrillic.
echo %@upper2[crème brûlée €3.50]
@UREVERSE — Returns a string with characters in reverse order.
Syntax:
%@UREVERSE[string]
string | the string to reverse |
This function is almost the same as @REVERSE, except that it does not mangle high-order Unicode characters by swapping the two halves of a surrogate pair.
set test=This is a test %@uchar[1f638]
echo %@ureverse[%test]
echo %@reverse[%test]
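Why a unit-by-unit reversal damages high-order characters can be shown with a short Python sketch. It is not the plugin's code and does not claim to reproduce either function exactly. The helper names are hypothetical; the sketch only contrasts reversing 16-bit code units with reversing whole code points.

```python
def reverse_code_units(text):
    """Reverse the 16-bit units of the UTF-16 form; returns raw UTF-16LE bytes."""
    data = text.encode("utf-16-le")
    units = [data[i:i + 2] for i in range(0, len(data), 2)]
    return b"".join(reversed(units))


def reverse_code_points(text):
    """Reverse whole code points; returns UTF-16LE bytes for comparison."""
    return text[::-1].encode("utf-16-le")


def is_valid_utf16(data):
    """True if the bytes decode as well-formed UTF-16LE."""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


sample = "test \U0001F638"
# Unit-wise reversal puts the low surrogate before the high one.
print(is_valid_utf16(reverse_code_units(sample)))
# Reversing by code point keeps each surrogate pair in order.
print(is_valid_utf16(reverse_code_points(sample)))
```

The first result is no longer well-formed UTF-16, because the low surrogate now precedes the high one; the second keeps the pair intact.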
Character Escapes:
These may be used in CHARENCODING or TOCLIP with the /X option.
Escape: | Expands to: | Example: |
---|---|---|
\b | backspace | |
\e | ASCII escape (27 decimal) | |
\k | grave accent | |
\n | newline | |
\p | percent sign | |
\q | double quote | |
\r | carriage return | |
\t | ASCII horizontal tab | |
\u xxxx | Unicode character, up to U+FFFF | \u03a3 → Σ |
\U xxxxxxxx | Unicode character, up to U+10FFFF | \U1f63a → 😺 |
\ nnn | octal value, up to 777 | \101 → A |
\x nnnn | hexadecimal value, up to FFFF | \x41 → A |
\# nnnnn | decimal value, up to 65535 | \#65 → A |
\\ | backslash | |
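To experiment with these escapes outside TCC, the table can be approximated with one regular expression in Python. This is a rough sketch, not the plugin's parser. The names SIMPLE, PATTERN and expand() are hypothetical. The digit counts (up to 8 hex digits for \U, up to 4 for \u and \x, up to 5 decimal digits for \#, up to 3 octal digits) and the greedy matching are assumptions read off the limits in the table.

```python
import re

# Single-character escapes from the table above.
SIMPLE = {"b": "\b", "e": "\x1b", "k": "`", "n": "\n", "p": "%",
          "q": '"', "r": "\r", "t": "\t", "\\": "\\"}

# Assumed digit counts; each numeric form matches as many digits as allowed.
PATTERN = re.compile(
    r"\\(?:"
    r"U(?P<big>[0-9A-Fa-f]{1,8})"
    r"|u(?P<small>[0-9A-Fa-f]{1,4})"
    r"|x(?P<hexa>[0-9A-Fa-f]{1,4})"
    r"|#(?P<dec>[0-9]{1,5})"
    r"|(?P<octal>[0-7]{1,3})"
    r"|(?P<simple>[beknpqrt\\])"
    r")"
)


def expand(text):
    """Expand the escapes this sketch recognizes; other text is left alone."""
    def repl(m):
        if m["simple"] is not None:
            return SIMPLE[m["simple"]]
        if m["octal"] is not None:
            return chr(int(m["octal"], 8))
        if m["dec"] is not None:
            return chr(int(m["dec"], 10))
        hex_digits = m["big"] or m["small"] or m["hexa"]
        return chr(int(hex_digits, 16))  # raises ValueError above 10FFFF
    return PATTERN.sub(repl, text)


# ascii() keeps the output printable on any console.
print(ascii(expand(r"\u03a3 \U1f63a \101 \x41 \#65")))
```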
Startup Message:
This plugin displays an informational line when it initializes. The message will be suppressed in transient or pipe shells. You can disable it for all shells by defining an environment variable named NOLOADMSG, for example:
set /e /u noloadmsg=1
Changes:
1.4.2 | 2024-09-03 | StringToUnicode() and f_uchar() use PLUGIN_BUFFER_MAX for the buffer size. CHARENCODING and @UCHAR allow octal values prefixed with 0o. |
1.4.1.1 | 2024-03-15 | Fixed CHARENCODING /N to correctly handle characters outside the BMP. |
1.4.1 | 2024-03-15 | Added /N to CHARENCODING. |
1.4.0.4 | 2023-10-15 | Tweaked ShowCmdHelp() to include VER_PATCH. |
1.4.0.3 | 2023-10-12 | Updated the plugin’s web address. |
1.4.0.1 | 2023-07-24 | Updated ParseArgs.cpp to the current version. |
1.4.0 | 2023-04-07 | Added @CAPS2, @LOWER2, and @UPPER2. |
1.3.0 | 2022-11-16 | Added /H to TOCLIP to expand any HTML entities in the text. |
1.2.0 | 2022-11-16 | Added /X to both CHARENCODING and TOCLIP to support C-style character escapes. |
1.1.0 | 2022-06-02 | Added @UREVERSE. Also, CHARENCODING now supports all HTML 4 character entities. |
1.0.0 | 2021-07-19 | First release. |
Status and Licensing:
This plugin is © Copyright 2024, Charles Dye. Unaltered copies of the binary and documentation files may be freely distributed without restriction. I make no guarantee and give no warranty for its operation. If you find a problem, you can report it in the JP Software support forum.
Download:
You can download the current version of the plugin from https://charlesdye.net/dl/uchars.zip.