TextUtils plugin for Take Command / TCC / TCC/LE
beta version 0.85.2 2024-11-05
Charles Dye
Purpose:
This plugin implements a variety of text-related features. There are new commands to count words, sentences, and paragraphs in English text; find words in text and display them in context; replace words in text; generate random passwords; display the lines of a text file in reverse order; wrap text to a desired width; and save an entire array to disk and reload it later. New functions allow you to obscure text to make it unreadable, and restore it later; determine the character encoding and text format of text files; generate Metaphone codes; remove accents from text strings; and count vowels in a string.
Installation:
To use this plugin, copy TextUtils.dll and TextUtils.chm to some known location on your hard drive. (If you are still using the 32-bit version of Take Command, use TextUtils-x86.dll instead of TextUtils.dll.) Load the plugin with a PLUGIN /L command, for example:
plugin /l c:\bin\tcmd\test\textutils.dll
If you copy these files to a subdirectory named PlugIns within your Take Command program directory, the plugin will be loaded automatically when TCC starts.
Plugin Features:
Syntax Note:
The syntax definitions in the following text use these conventions for clarity:
BOLD CODE | indicates text which must be typed exactly as shown. |
CODE | indicates optional text, which may be typed as shown or omitted. |
Bold italic | names a required argument; a value must be supplied. |
Regular italic | names an optional argument. |
ellipsis… | after an argument means that more than one may be given. |
New Commands:
CHARENCODING — Show UTF-16 and UTF-8 encodings for characters.
Syntax:
CHARENCODING /16 /8 /C /D /K /N /X value "string" …
/16 | show UTF-16 encoding |
/8 | show UTF-8 encoding |
/C | show characters |
/D | show decimal values |
/K | show character class |
/N | show character names if available |
/X | expand C-style character escapes in quoted strings |
value | hex character value; leading 0x or U+ is optional |
"string" | string literal between quotes |
You may enter characters as quoted string literals, character values, HTML 4 character entities, or any combination. You may prefix hex values with 0x or U+, but neither is required; with or without either prefix, hexadecimal is assumed. Separate values with spaces. If you specify neither /16 nor /8, the default is to show both.
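The UTF-16 and UTF-8 encodings that CHARENCODING reports are defined by the Unicode standard, so they can be checked independently. The Python sketch below is a hypothetical illustration, not part of the plugin; it assumes only the value syntax described above (hexadecimal, with an optional 0x or U+ prefix), and the function names are invented.

```python
def parse_value(token: str) -> int:
    """Parse a character value: always hexadecimal, optional 0x or U+ prefix."""
    t = token.strip()
    if t[:2].lower() in ("0x", "u+"):
        t = t[2:]
    return int(t, 16)


def utf16_units(cp: int) -> list[int]:
    """UTF-16 code units: one unit below U+10000, otherwise a surrogate pair."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def utf8_bytes(cp: int) -> list[int]:
    """UTF-8 bytes for a code point (Python rejects lone surrogates here)."""
    return list(chr(cp).encode("utf-8"))


for token in ("41", "0xE9", "U+1F638"):
    cp = parse_value(token)
    units = " ".join(f"{u:04X}" for u in utf16_units(cp))
    octets = " ".join(f"{b:02X}" for b in utf8_bytes(cp))
    print(f"U+{cp:04X}  UTF-16: {units}  UTF-8: {octets}")
```

For U+1F638 (the character in the example command at the end of this section), this prints the surrogate pair D83D DE38 and the UTF-8 bytes F0 9F 98 B8.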
/K displays a one-letter code to indicate the type of character:
K | Class |
---|---|
A | alphabetic |
D | digit |
P | punctuation |
W | whitespace |
C | control character |
B | Byte Order Mark |
N | noncharacter |
H | unpaired surrogate (high) — not a character |
L | unpaired surrogate (low) — not a character |
- | anything else |
/N displays the official Unicode name of a character, if it is available. This feature requires Windows 10 build 1703 or later; it will not work in earlier versions.
/X expands any escapes in quoted strings that follow the /X on the command line. Strings before the /X will not be expanded.
charencoding /c "Hello, world. %@uchar[1f638]"
CLIP2TEXT — Copy text from the clipboard to a file or standard output.
Syntax:
CLIP2TEXT /A /NB /O /P /T /UTF8 /UTF16 filename
/A | append to an existing file |
/NB | do not write a Byte Order Mark |
/O | overwrite an existing file |
/P | page output (useful only when output is to stdout) |
/T | quietly |
/UTF8 | write file in the UTF-8 encoding |
/UTF16 | write file in the UTF-16 encoding (default) |
Only one filename is allowed. If no filename is specified, CLIP2TEXT will dump the clipboard to standard output.
See also: the TEXT2CLIP command.
CONTEXT — Search for words in English text and display them in context.
Syntax:
CONTEXT /A:attribs /C:n /CP:n /F:n /H:n /K:n /P /N /S /V /W:base /X:word /Y:word filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/C: n | specifies the number of sentences of context to display, before and after |
/CP: n | interpret non-Unicode input text using code page n |
/F: n | specifies the format of the input text; n is one of: |
0 — best guess (default) | |
1 — unformatted (line breaks are used only to end paragraphs) | |
2 — prewrapped (line breaks are used to wrap text) | |
3 — unformatted, with blank lines between paragraphs | |
/H: n | set highlight colors for matching words |
/K: n | output columns for word-wrap |
/P | page output |
/N | disable features |
/S | search in subdirectories for matching filenames |
/V | verbose; report counts of found items after each file and at the end |
/W: base | search for forms of a word |
/W:"base base…" | search for a series of word forms |
/X: word | search for an exact word |
/X:"word word…" | search for a series of exact words |
/Y: word | search for words that sound like word |
… | Range options are also supported. |
CONTEXT can read from disk files or from a pipe. If you want to pipe to CONTEXT, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to search for words on the clipboard.
Note: This command was created specifically to search through English text. I make many Anglocentric assumptions about what constitutes a ‘word’, a ‘sentence’, a ‘paragraph’, ‘forms’ of a word, and so on. These assumptions are probably not useful for any other language.
Word search: /W:base searches for forms of a word; this will probably be your most frequently used option. Specify the base form of a word, and CONTEXT will attempt to match variations of it. For example, /W:DOG will match dog, dogs, dog’s, doggy, and even doggedly.
A word in the input text is considered a ‘form’ of the specified base word if (1) its beginning matches the entire length of base, except that a final Y in the base word may match an I in the word from the text; and (2) the remainder of the word contains no more than one vowel other than Y. Case is not significant, and most common accents are ignored; /W:garcon will match garçon, /W:"deja vu" will match Déjà vu, and so on.
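Rules (1) and (2) are compact enough to express directly. The Python sketch below is a hypothetical reimplementation for experimenting with the rules, not the plugin's code: it assumes that "vowel" means A, E, I, O, or U, approximates "most common accents are ignored" with Unicode NFD decomposition, and skips the hyphen and word-series handling described next.

```python
import unicodedata

# Assumption: vowels are A, E, I, O, U; Y is excluded, as rule (2) states.
VOWELS = set("AEIOU")


def fold(s: str) -> str:
    """Uppercase and drop combining accents, e.g. 'garçon' -> 'GARCON'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).upper()


def is_form_of(word: str, base: str) -> bool:
    """Rough test of whether `word` is a 'form' of `base` under rules (1) and (2)."""
    w, b = fold(word), fold(base)
    if len(w) < len(b):
        return False
    head = w[:len(b)]
    # Rule (1): the head must equal the base, except that a final Y in the
    # base may match an I in the word (LADY -> LADIES).
    if head != b and not (b.endswith("Y") and head == b[:-1] + "I"):
        return False
    # Rule (2): the remainder may hold at most one vowel other than Y.
    rest = w[len(b):]
    return sum(c in VOWELS for c in rest) <= 1


print(is_form_of("doggedly", "dog"), is_form_of("dogmatic", "dog"))  # True False
```

With these assumptions, DOG accepts doggedly (one vowel, E, after the base) but rejects dogmatic (A and I), and LADY accepts ladies through the Y-to-I rule.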
If a word from the input text contains a hyphen, the /W: search will also look for the specified base word on either side of the hyphen; /W:LEVEL will match level-headed, sub-level, and even poorly-levelled.
Word series: You can search for a series of words with /W:"base base…". To match, a series of words must appear within the same sentence in the input text; a word series cannot span the end of a sentence. Matching words must be consecutive, and may be separated by spaces, tabs, or other punctuation. CONTEXT will check for forms of each base word as above, but will not look for the base within hyphenated words. For instance, /W:"LITTLE OLD LADY" will match little, old ladies.
Exact-word search: /X:word searches for a word without checking for variant forms. /X: does not look for the specified word within hyphenated words. Case and accents are still ignored. You can search for a series of exact words with /X:"word word…".
Sound-alike search: /Y:word searches for words which sound similar to the specified word. The comparison uses a Metaphone-like algorithm to guess at a word’s pronunciation. (This type of search does not support word series.)
Surrounding context: By default, CONTEXT displays one sentence before, and one sentence after, each sentence containing any of the specified search words. You can adjust this value with /C:n; legal values are 0 to 15. Note that you may see more than 2n sentences between found words that are close together; CONTEXT will display a little extra text rather than introduce a very short break. You may also see fewer than n sentences near the start or the end of a file.
Highlighting: If CONTEXT’s output is to the screen (i.e. stdout is not redirected), text which matches your search words will be highlighted in a different color. By default, CONTEXT picks a highlight color which contrasts with the current console colors. You can specify your own highlight color either with the option /H:n, or by setting an environment variable named HIGHLIGHT. Either way, the value should be a decimal number from 1 to 254, or a hexadecimal value from 0x01 to 0xFE. The high four bits set the background color, and the low four bits set the foreground color; the two values must be different. The command-line option takes precedence over the environment variable. You can disable highlighting with /NC. Text is not highlighted if the command’s output is redirected.
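To see how one number splits into the two colors, here is a small Python sketch that applies the rules above: decimal or 0x-prefixed hex, a range of 1 to 254, high nibble for background and low nibble for foreground, and the two must differ. The function name is hypothetical, and the mapping from the numbers 0 to 15 to actual console colors is not shown.

```python
def parse_highlight(value: str) -> tuple[int, int]:
    """Split a /H: or HIGHLIGHT value into (background, foreground) nibbles.

    Accepts decimal 1-254 or hex 0x01-0xFE, per the rules above.
    """
    v = value.strip()
    n = int(v, 16) if v.lower().startswith("0x") else int(v, 10)
    if not 0x01 <= n <= 0xFE:
        raise ValueError("highlight value must be 1-254 (0x01-0xFE)")
    background, foreground = n >> 4, n & 0x0F   # high nibble, low nibble
    if background == foreground:
        raise ValueError("background and foreground colors must differ")
    return background, foreground


print(parse_highlight("0x1E"), parse_highlight("30"))  # (1, 14) (1, 14)
```

Because 0x1E and decimal 30 are the same number, both spellings select background 1 and foreground 14.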
Reports: If /V is specified, CONTEXT will also report the number of times each search word was found within a file. If more than one file is processed, it will also show a final report for all files, giving the number of times each search word was found in total, and in how many files.
Text encoding: CONTEXT automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /CP:n. Most single-byte (i.e., alphabetic) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.
Text format: Text files use line-break characters in different ways. In some files, line breaks are used only to mark where a line must end: the end of a paragraph. In other files, line breaks are used to wrap text to some desired width. You can use /F:n to tell CONTEXT how to handle line breaks. /F:1 indicates that the text is unformatted, with line breaks only at the ends of paragraphs; CONTEXT will honor all line breaks, and add an extra blank line after each paragraph. /F:2 means that the input text is prewrapped, having line breaks within paragraphs and even within sentences; CONTEXT will skip single line breaks, honoring only sequences of two or more in a row. /F:3 is also for unformatted text and acts like /F:1, but does not insert a blank line after each paragraph. If you specify /F:0 or do not specify any /F:n, CONTEXT will attempt to guess how the input text is formatted. (Guessing is not reliable when there isn’t much input text.)
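The difference between /F:1 and /F:2 is easiest to see in code. This Python sketch is a hypothetical illustration of the rules above, not the plugin's implementation; it ignores the extra blank line that /F:1 adds on output and does not attempt the /F:0 guess.

```python
import re


def paragraphs(text: str, fmt: int) -> list[str]:
    """Split text into paragraphs following the /F:1, /F:2 and /F:3 rules.

    /F:1 and /F:3: every line break ends a paragraph.
    /F:2: single line breaks are joined with a space; only runs of two or
    more line breaks end a paragraph. (Lines holding only spaces are not
    treated as paragraph breaks here, a simplification.)
    """
    text = text.replace("\r\n", "\n")
    if fmt in (1, 3):
        chunks = text.split("\n")
    elif fmt == 2:
        chunks = [" ".join(p.split("\n")) for p in re.split(r"\n{2,}", text)]
    else:
        raise ValueError("/F:0 guessing is not modelled in this sketch")
    return [c.strip() for c in chunks if c.strip()]


print(paragraphs("Alice was\nbeginning to get\n\nvery tired", 2))
# ['Alice was beginning to get', 'very tired']
```

The same input read with fmt=1 would instead yield three paragraphs, one per line.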
Word wrap: Text output by CONTEXT will be word-wrapped. If output is to the screen, it will be wrapped to the screen width. If output has been redirected, the default width is 100 columns. You can set a different width using the /K:n option; the value must be between 40 and 512.
Disabling features: /N disables features:
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
C:\> context https://www.gutenberg.org/files/11/11-0.txt /w:paint
File "D:\download\pg11.txt" :
CHAPTER VIII. The Queen's Croquet-Ground
A large rose-tree stood near the entrance of the garden: the roses growing on it
were white, but there were three gardeners at it, busily painting
them red. Alice thought this a very curious thing, and she went nearer to watch
them, and just as she came up to them she heard one of them say, 'Look out now,
Five! Don't go splashing paint over me like that!'
'I couldn't help it,' said Five, in a sulky tone; 'Seven jogged my elbow.'
* * *
Seven flung down his brush, and had just begun 'Well, of all the unjust things--'
when his eye chanced to fall upon Alice, as she stood watching them, and he
checked himself suddenly: the others looked round also, and all of them bowed low.
'Would you tell me,' said Alice, a little timidly, 'why you are painting
those roses?'
Five and Seven said nothing, but looked at Two.
C:\>
COPYCHARS — Put characters on the clipboard.
Syntax:
COPYCHARS /A /Q value entity "string" …
/A | append to current clipboard text |
/Q | quietly |
Character values may be specified in decimal, or in hexadecimal with a leading 0x.
Entities are as in HTML 4; the leading ampersand may be omitted. Entities are case sensitive.
rem A non-breaking space, an em dash, and a space:
copychars nbsp; mdash; 32
rem Text in fancy quotes:
copychars ldquo; "This is a test." rdquo;
rem High-order characters are supported:
copychars 0x1f603
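The argument forms above can be modelled in a few lines of Python. This is a hypothetical sketch, not the plugin's parser: it assumes the quotes arrive as part of the argument, and it uses the standard library's entity table (html.entities.name2codepoint), which covers the HTML 4 entity set.

```python
from html.entities import name2codepoint


def resolve_arg(arg: str) -> str:
    """Turn one COPYCHARS-style argument into the text it stands for.

    Assumed rules, taken from the description above: a quoted string is
    literal text; 0x... is a hex character value; a plain number is decimal;
    anything else is an entity name with optional leading '&' and trailing ';'.
    """
    if len(arg) >= 2 and arg[0] == arg[-1] == '"':
        return arg[1:-1]
    if arg.lower().startswith("0x"):
        return chr(int(arg, 16))
    if arg.isdigit():
        return chr(int(arg))
    name = arg.removeprefix("&").removesuffix(";")
    return chr(name2codepoint[name])  # case-sensitive: 'Eacute' != 'eacute'


args = ["nbsp;", "mdash;", "32"]
print([f"U+{ord(resolve_arg(a)):04X}" for a in args])
# ['U+00A0', 'U+2014', 'U+0020']
```

The three arguments from the first example above resolve to a non-breaking space, an em dash, and an ordinary space.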
COUNTCHARS — Count characters in text files.
Syntax:
COUNTCHARS /C:x-y /CP:n /O /P /R /RO /S /U /V /W /X filespec…
/C: x-y | specify a range of characters to count |
/CP: n | interpret non-Unicode input text using code page n |
/O | sort by frequency |
/P | page output |
/R | report counts for ranges as well as individual characters |
/RO | report range counts only, not counts of individual characters |
/S | search in subdirectories for matching files |
/U | force characters to uppercase |
/V | do not automatically merge overlapping ranges |
/W | do not report count of ‘other’ characters |
/X | do not report total characters count |
/ASCII | short for /C:0-127 |
/BMP | short for /C:0-0xFFFF |
/HI | short for /C:0x10000-0x10FFFF |
… | Range options are also supported. |
Input filenames may be specified on the command line, or text may be redirected or piped into COUNTCHARS. If you want to pipe to COUNTCHARS, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read text from the clipboard.
Specify ranges of characters to count with /C:x-y. The start and end characters x and y may be given as decimal, hexadecimal with a leading 0x, or as literal characters:
rem These three are all the same:
countchars /c:65-90 myfile.txt
countchars /c:0x41-0x5a myfile.txt
countchars /c:A-Z myfile.txt
To specify a literal digit, wrap it in apostrophes:
countchars /c:'0'-'9' myfile.txt
You may specify up to 32 ranges. If you do not specify any ranges, the default is /C:0-127 (ASCII characters).
All values, both in character ranges and in COUNTCHARS’s reports, refer to Unicode code points. If the text uses an 8-bit or OEM encoding, the values reported are the values of the Unicode characters that the OEM characters are translated into, not the OEM character values.
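To make the distinction concrete, here is a short Python illustration assuming a file in OEM code page 437. Under the rule above, a /C: range meant to catch this letter would have to include U+00E9 (or é), not 0x82.

```python
# Byte 0x82 in a file saved in OEM code page 437 (assumed for this example).
raw = bytes([0x82])
ch = raw.decode("cp437")  # OEM byte -> Unicode character
print(f"OEM byte 0x{raw[0]:02X} is {ch!r}, code point U+{ord(ch):04X}")
# OEM byte 0x82 is 'é', code point U+00E9
```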
How many letters are in Engine Summer.txt?
countchars /c:A-Z /u /ro "Engine Summer.txt"
File "C:\Bin\JPSDK\TextUtils\Engine Summer.txt" :
0041 - 005A : 343
Other : 161
TOTAL : 504
/C:A-Z defines a range of characters from A to Z. /U converts lowercase letters to uppercase so they will also be counted in the same range. /RO reports only the total number of characters in the range; we only want the total number of letters, not the number of As, Bs, Cs, and so on. There are 343 letters in this file.
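The arithmetic in this report is simple: everything outside the range is counted as Other, so 343 + 161 = 504. A hypothetical Python equivalent of /C:A-Z /U /RO follows; it assumes every code point, including line-break characters, counts toward the total, which the report above does not confirm.

```python
def count_range(text: str, lo: str, hi: str, upper: bool = False) -> dict[str, int]:
    """Count characters inside [lo, hi], everything else, and the total.

    Roughly mirrors: countchars /c:lo-hi [/u] /ro
    Note: str.upper() can change the length of some strings (e.g. 'ß' -> 'SS'),
    so this sketch's total may differ from the plugin's for such text.
    """
    if upper:
        text = text.upper()           # like /U: fold lowercase into the range
    in_range = sum(lo <= ch <= hi for ch in text)
    return {"range": in_range, "other": len(text) - in_range, "total": len(text)}


print(count_range("Hello, World!", "A", "Z", upper=True))
# {'range': 10, 'other': 3, 'total': 13}
```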
How many Cyrillic letters? Most Cyrillic letters fall in the range of U+0400 to U+04FF:
countchars /c:0x0400-0x04ff /ro "Engine Summer.txt"
File "C:\Bin\JPSDK\TextUtils\Engine Summer.txt" :
0400 - 04FF : 0
Other : 504
TOTAL : 504
Mr. Crowley is not writing in Russian.
DEDUP — Dump text files to standard output, merging repeated lines.
Syntax:
DEDUP /A:attribs /B /C /CP:n /D /H /I /M /N /P /S /T /U filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/B | discard blank lines |
/C | show line repeat counts |
/CP: n | interpret non-Unicode input text using code page n |
/D | show only repeating lines |
/H | display filenames |
/I | ignore case when comparing lines |
/M | merge repeating lines (default) |
/N | disable features |
/P | page output |
/S | search in subdirectories for matching files |
/T | trim leading and trailing whitespace |
/U | show only lines which do not repeat |
… | Range options are also supported. |
Input filenames may be specified on the command line, or text may be redirected or piped into DEDUP. If you want to pipe to DEDUP, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read text from the clipboard.
Options /D, /M, and /U select the operating mode. If you don’t specify one, the default is /M. If you specify more than one, the last one wins.
/N disables features:
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
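The three DEDUP modes are easiest to compare side by side. The Python sketch below is an illustration under a stated assumption, not DEDUP's actual algorithm: it treats only adjacent duplicate lines as repeats (as Unix uniq does), and the count format used for /C is invented.

```python
from itertools import groupby


def dedup(lines, mode="M", ignore_case=False, trim=False, drop_blank=False,
          counts=False):
    """Sketch of DEDUP's /M, /D and /U modes plus /I, /T, /B and /C.

    ASSUMPTION: like Unix uniq, only adjacent identical lines count as
    repeats; the description above does not say whether DEDUP also merges
    repeats separated by other lines. The /C count format is invented.
    """
    if trim:                                           # /T
        lines = [ln.strip() for ln in lines]
    if drop_blank:                                     # /B
        lines = [ln for ln in lines if ln.strip()]
    key = str.casefold if ignore_case else None        # /I
    out = []
    for _, group in groupby(lines, key=key):
        group = list(group)
        n = len(group)
        if mode == "D" and n < 2:                      # /D: repeating lines only
            continue
        if mode == "U" and n > 1:                      # /U: non-repeating lines only
            continue
        out.append(f"{n:>4} {group[0]}" if counts else group[0])  # /C
    return out


print(dedup(["a", "a", "b", "c", "c", "c"], mode="D"))  # ['a', 'c']
```

Under this model /M keeps one copy of every run, /D keeps only runs of two or more, and /U keeps only the lines that appear once.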
DEGAS — Remove excess spaces and blank lines from text.
Syntax:
DEGAS /A:attribs /B:n /CP:n /E:n /H /L /N /P /R /S /T /W filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/B: n | maximum whitespace characters |
/CP: n | interpret non-Unicode input text using code page n |
/E: n | maximum blank lines |
/H | display filenames |
/L | display line numbers |
/N | disable features |
/P | page output |
/R | remove all blank lines at the start and end of the file |
/RS | remove all blank lines at the start of the file |
/RE | remove all blank lines at the end of the file |
/S | search in subdirectories for matching files |
/T | trim all leading and trailing whitespace from each line |
/W | convert all whitespace characters to ASCII spaces |
… | Range options are also supported. |
The contents of the files will be dumped to standard output, with excess spaces and blank lines removed.
Input filenames may be specified on the command line, or text may be redirected or piped into DEGAS. If you want to pipe to DEGAS, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to dump the clipboard.
/B: lets you specify the maximum number of whitespace characters in a row. For example, /B:4 allows no more than four whitespace characters in a row. DEGAS allows for the convention of spacing twice at the end of a sentence: specify two numbers separated by a comma, /B:n,m. The first sets the maximum number of whitespace characters after a period, question mark, or exclamation point; the second is the maximum after any other character. /B:2,1 allows up to two spaces at the end of a sentence, but only one elsewhere.
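As an illustration of the two-number form, here is a Python sketch of the rule as stated above. It is hypothetical: it works one line at a time, and it leaves leading indentation alone because the text does not say how DEGAS treats it.

```python
import re

# A non-whitespace character followed by a run of whitespace (not newlines).
RUN = re.compile(r"(\S)([^\S\n]+)")


def squeeze_spaces(line: str, after_stop: int, elsewhere: int) -> str:
    """Cap runs of whitespace in one line, roughly like DEGAS /B:n,m.

    A run following '.', '?' or '!' keeps at most `after_stop` characters;
    any other run keeps at most `elsewhere`. Runs are truncated, not
    rewritten, so tabs survive (converting them is /W's job). Leading
    indentation has no preceding character and is left untouched.
    """

    def cap(m):
        prev, run = m.group(1), m.group(2)
        limit = after_stop if prev in ".?!" else elsewhere
        return prev + run[:limit]

    return RUN.sub(cap, line)


print(repr(squeeze_spaces("One.   Two    three!  Four", 2, 1)))
# 'One.  Two three!  Four'
```

A single-number setting such as /B:4 corresponds to calling the function with the same value twice.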
/E: specifies the maximum number of blank lines in a row. (A line containing only whitespace characters is considered a ‘blank line’.) /E:3 allows no more than three blank lines together. /E:0 removes all blank lines; /E:0 can be abbreviated to /E.
You can remove all blank lines at the start of a file with /RS. Likewise, you can remove all blank lines at the end of a file with /RE. /R does both. These options are independent of the /E: compression of blank lines.
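The interaction of /E: with /RS and /RE can be sketched as follows. This Python function is a hypothetical illustration, not the plugin's code; it trims the ends first and then caps each run of blank lines, an ordering this document does not specify, although trimming and capping give the same result in either order.

```python
def limit_blank_lines(lines, max_blank, strip_start=False, strip_end=False):
    """Sketch of DEGAS /E:n together with /RS and /RE.

    A line holding only whitespace counts as blank, as described above.
    """

    def is_blank(ln):
        return not ln.strip()

    lines = list(lines)
    if strip_start:                       # /RS (or /R)
        while lines and is_blank(lines[0]):
            lines.pop(0)
    if strip_end:                         # /RE (or /R)
        while lines and is_blank(lines[-1]):
            lines.pop()
    out, run = [], 0
    for ln in lines:
        if is_blank(ln):
            run += 1
            if run > max_blank:           # /E:n keeps at most n blanks in a row
                continue
        else:
            run = 0
        out.append(ln)
    return out


print(limit_blank_lines(["", "a", "", "", "", "b", " ", ""], 1,
                        strip_start=True, strip_end=True))
# ['a', '', 'b']
```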
/T strips all leading and trailing whitespace from each line. This is a separate operation from the /B: compression of spaces, and happens earlier.
If none of /B:, /E:, /R, /RS, /RE, or /W is specified, the default is /B:2,1 /E:1, which allows a maximum of two spaces at the end of a sentence, one space elsewhere, and no more than one blank line in a row.
/N disables features:
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
DEHTML — Strip HTML tags from a file and dump the contents to standard output.
Syntax:
DEHTML /A:attribs /B /C /CP:n /E /H /M /N /N: /O:n /P /R /S filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/B | exclude text outside the body and title |
/C | include text in <!-- comments --> |
/CP: n | interpret non-Unicode input text using code page n |
/E | omit empty (blank) lines |
/H | display filenames |
/M | look in <meta> tags for charset info |
/N | by itself: include text in <noscript> or <applet> tags |
/N: | with suboptions: disable features |
/O: n | include text inside <option> tags: |
0 — don’t include any (the default) | |
1 — include only the first <option> | |
2 — include all <option> text | |
/P | page output |
/R | remove title |
/S | search in subdirectories for matching files |
… | Range options are also supported. |
Input filenames may be specified on the command line, or text may be redirected or piped into DEHTML. If you want to pipe to DEHTML, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to dump the clipboard if it contains HTML.
DEHTML will strip HTML tags from the file and replace HTML entities with the corresponding characters; most of the remaining text will be dumped to stdout. This command also discards any text in the header which does not appear within <title> tags; anything in <script> or <style> tags; anything within an HTML comment unless you specify /C; anything in <noscript> or <applet> tags unless you specify /N; and anything in <option> tags within a <select> block unless you specify /O:1 or /O:2.
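For readers who want to reproduce the basic idea, Python's standard html.parser module can do a rough version of the tag stripping and entity replacement. This is a simplified sketch, not DEHTML's algorithm; it covers only the <script>/<style> and comment rules from the list above, and the class name is invented.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script>/<style>, dropping comments.

    A much-simplified stand-in for DEHTML's defaults: <title>, <noscript>,
    <applet>, <option> and the /B rule are not handled.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &eacute; etc. -> characters
        self.parts = []
        self.depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.parts.append(data)

    # handle_comment() is inherited as a no-op, so comments are discarded.


p = VisibleText()
p.feed("<p>Caf&eacute; <!-- hidden --><script>x=1</script>&mdash; open</p>")
p.close()
text = "".join(p.parts)
print(text)
```

The comment and the script body disappear, and both entities come out as their Unicode characters.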
If you specify /M, DEHTML will look in <meta> tags in the header for information about the document’s character encoding. This only works if the file is not in Unicode; /M has no effect with Unicode files.
/N with suboptions disables features:
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
• Note: HTML files often include some unusual characters like non-breaking spaces, bullets, em dashes, ellipses, and guillemets. If you want to pipe or redirect the output from this command, it’s a good idea to enable Unicode output with OPTION //UNICODEOUTPUT=YES. If Unicode output is disabled, some characters may be mangled in translation.
FFIELDS — Read a file and print fields in a specified format.
Syntax:
FFIELDS /A:attribs /C /CP:n /E /F:"format" /H /K:n /L:string /N /P /Q /S /T /W /X filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/C | separate fields at commas |
/CP: n | interpret non-Unicode input text using code page n |
/E | separate fields at first unquoted equals sign |
/F:"format" | format string; see below |
/H | display filenames |
/K: n | output line width (columns) |
/L: string | insert line numbers on the left |
/N | disable features |
/P | page output |
/Q | remove quotes (the default is to retain them) |
/S | search in subdirectories for matching files |
/T | separate fields at tabs |
/W | separate fields at whitespace |
/X | perform variable expansion on each line |
… | Range options are also supported. |
The FFIELDS command reads a file, divides each line into fields (blank lines are skipped), and then prints the fields using a format string.
FFIELDS can read from disk files or from a pipe. If you want to pipe to FFIELDS, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard instead of a file.
The format string may contain $n to print field n, or $n=wf to print field n truncated to length w. The final letter f is L to left-justify the field if it contains fewer than w characters, R to right-justify it, C to center it, or T to simply truncate the field without padding it to length w. For example, a field specifier of $4=10L would print field 4, left-justified to 10 characters. Use $$ to print a literal dollar sign, or $N to insert a line break. Fields are numbered starting from 0.
set |! ffields /e /f:"$0=20l $1=58t"
…displays variable names truncated to 20 characters, followed by a space and the variables’ values truncated to 58 characters.
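The field-specifier rules above are compact enough to restate as code. This Python sketch is a hypothetical model for experimenting with format strings, not the plugin's parser; it assumes the justification letter is case-insensitive (as the lowercase l and t in the example suggest) and that a missing field prints as empty text.

```python
import re

# Matches $$, $N, $n, and $n=w followed by L, R, C or T.
# Assumption: the justification letter is case-insensitive (the example above
# uses lowercase l and t); $N is matched in uppercase only.
SPEC = re.compile(r"\$(\$|N|(\d+)(?:=(\d+)([LRCTlrct]))?)")


def apply_format(fmt: str, fields: list[str]) -> str:
    """Expand an FFIELDS-style format string against one line's fields."""

    def expand(m):
        if m.group(1) == "$":
            return "$"                       # $$ -> literal dollar sign
        if m.group(1) == "N":
            return "\n"                      # $N -> line break
        idx = int(m.group(2))
        text = fields[idx] if idx < len(fields) else ""  # assumption: missing -> empty
        if m.group(3) is None:
            return text                      # $n -> field n unchanged
        width, how = int(m.group(3)), m.group(4).upper()
        text = text[:width]                  # always truncate to w characters
        if how == "L":
            return text.ljust(width)
        if how == "R":
            return text.rjust(width)
        if how == "C":
            return text.center(width)
        return text                          # T: truncate without padding

    return SPEC.sub(expand, fmt)


print(repr(apply_format("$0=6L|$1=4T|$1=10R$$", ["PATH", "abcdefgh"])))
# 'PATH  |abcd|  abcdefgh$'
```

Under these assumptions the example above, /f:"$0=20l $1=58t", pads each name to 20 characters and cuts each value at 58 without padding.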
If you include /L on the command line, FFIELDS will insert line numbers to the left of each output line. Lines are numbered starting at 0. If you include the optional string argument, FFIELDS will perform variable expansion on it before prepending it to each output line; use the variable _LINE to get the current line number. For example, /L:"%%@FORMAT[03,%%_LINE]" will prepend the line number, zero-padded to at least three digits.
If you don’t specify a format string, FFIELDS will invent one at random:
alias |! ffields /e
/N disables features:
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
/X does variable expansion on each line before displaying it. You could, for example, count the characters in each alias definition:
alias |! ffields /e /f:"$0 = (%%@len[$1]) $1" /x
FILTERFILES — Pass files through a text filter command.
Syntax:
FILTERFILES /B:.ext /C /J /N /P /Q /S /UTF8 /UTF16 filespec… : command args…
/B:. ext | extension for backups; the default is .original |
/C | do not abort if the command exits with errorlevel 3 |
/N | by itself: not really (display matching files only) |
/N | with suboptions: disable features |
/J | redirect input |
/P | prompt for each file |
/Q | quietly |
/S | search in subdirectories for matching files |
/UTF8 | redirect output as UTF-8 |
/UTF16 | redirect output as UTF-16 |
… | Range options are also supported. |
filespec… | the files to process; at least one filespec is required |
command | a filter command which writes to stdout |
At least one filespec is required. Anything after the first unquoted colon is the command to execute; this also is required.
Matching files will be renamed with a .original extension, or with the extension given by /B. Then the specified command will be called, passing the renamed file’s name on its command line after any args, with its output redirected to a new file under the original name.
This command only supports local files. CLIP:, URLs, standard input, and so on are not supported.
/N by itself prevents FILTERFILES from doing anything. Matching files will be displayed but not renamed, and the command will not be executed.
/N with suboptions disables features:
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
/P causes FILTERFILES to prompt before processing each file. You can press:
Y | to filter the file |
N or Esc | to skip the file |
A | to stop prompting and filter all remaining files |
Q | to exit immediately |
/UTF8 and /UTF16 let you set the output encoding. They call OPTION //UnicodeOutput= and OPTION //UTF8Output= before processing files, and then restore the original settings before FILTERFILES exits. Note that //UTF8Output does not actually work in TCC/LE.
By default, FILTERFILES passes the name of each renamed file to the command on its command line:
filtercmd "file.original" > "file.txt"
If you specify /J, it will use input redirection instead:
filtercmd < "file.original" > "file.txt"
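The rename-then-redirect sequence can be modelled in Python to make the data flow explicit. This is a hypothetical sketch, not FILTERFILES itself: it handles one file, raises if the filter exits with a nonzero code (the plugin's own errorlevel handling, including /C, is not modelled), and does not restore the original file on failure.

```python
import os
import subprocess


def filter_file(path, command, backup_ext=".original", use_stdin=False):
    """Model, for one file, the rename-then-filter step described above.

    1. Rename file.txt to file.original (or whatever backup_ext says).
    2. Run `command` with stdout redirected to a new file.txt, passing the
       backup name as the last argument or, like /J, on standard input.
    Raises CalledProcessError if the filter exits with a nonzero code and
    leaves the backup in place.
    """
    backup = os.path.splitext(path)[0] + backup_ext
    os.rename(path, backup)
    with open(path, "wb") as out:
        if use_stdin:                                      # like /J
            with open(backup, "rb") as src:
                subprocess.run(command, stdin=src, stdout=out, check=True)
        else:                                              # default
            subprocess.run(command + [backup], stdout=out, check=True)
```

For example, filter_file("notes.txt", ["sort"]) would leave the old text in notes.original and write the sorted lines to notes.txt; the file name is hypothetical.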
FILTERFILES is mainly intended for use with the filters in this plugin: DEDUP, DEGAS, DEHTML, WRAP, and so on. But you can use it with any command that either accepts a filename on its command line or reads from standard input, and that writes text to standard output.
rem Convert all .TXT files in the current directory to Pig Latin:
filterfiles *.txt : oink
rem Add line numbers to MyFile.txt:
filterfiles myfile.txt : type /L
LOADARRAY — Load data from a file into an array variable.
Syntax:
LOADARRAY /Q filename arrayname
/Q | quietly |
filename | a file created by SAVEARRAY |
arrayname | an array variable name |
The arrayname must begin with a letter. It may contain only letters, digits, underscores, and dollar signs; it should not be more than 31 characters long. If you don’t specify an arrayname, the name of the original array saved in the file will be used. The array will be created (or recreated) automatically, with the correct dimensions to hold the data from the file.
All elements in the file will be loaded. There is no provision for loading a partial array.
• Note: This command is not available in TCC/LE, in 4NT, or in older versions of TCC which don’t support array variables.
See also: the SAVEARRAY command.
OINK — Translate a text file to Pig Latin.
Syntax:
OINK /A:attribs /CP:n /D /H /N /P /Q /S filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/CP: n | interpret non-Unicode input text using code page n |
/D | disable highlight |
/H | display filenames |
/N | disable features |
/P | page output |
/Q | replace ASCII quotes and apostrophes with Unicode open and close quotes |
/S | search in subdirectories for matching files |
… | Range options are also supported. |
If standard input (stdin) is redirected, OINK will read from stdin before any filenames specified on the command line. If no filenames are specified, OINK will read from stdin whether it is redirected or not.
Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard.
If you want to pipe to OINK, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
/N disables features:
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
(Yes, this is silly. It was a simple test driver to generate gribble for testing some of the other commands and functions in this plugin. It’s very small — most of the code is shared with other commands — so I left it in.)
See also: the @OINK function, which renders a string as Pig Latin.
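For readers unfamiliar with the game, here is a rough Python sketch of one common Pig Latin rule. It is not the plugin's code, and the exact rules OINK uses are not documented here. This version assumes the textbook convention (move any leading consonants to the end and add "ay"; a word that starts with a vowel just gets "way"), keeps an initial capital, and leaves punctuation alone.

```python
import re

VOWELS = "aeiouAEIOU"

def pig_latin_word(word: str) -> str:
    """Translate one alphabetic word using a common (assumed) Pig Latin rule."""
    if word[0] in VOWELS:
        return word + "way"
    # Move the consonant cluster before the first vowel to the end.
    for i, ch in enumerate(word):
        if ch in VOWELS:
            head, tail = word[:i], word[i:]
            break
    else:
        return word + "ay"          # no vowel at all, e.g. "rhythm"
    result = tail + head.lower() + "ay"
    if word[0].isupper():           # keep an initial capital
        result = result[0].upper() + result[1:]
    return result

def pig_latin(text: str) -> str:
    """Translate every run of ASCII letters; spaces and punctuation are untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: pig_latin_word(m.group(0)), text)

print(pig_latin("Hello, world! Eat apples."))   # Ellohay, orldway! Eatway applesway.
```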
PARSEARGS
— Divide a string into arguments.
Syntax:
PARSEARGS
/A:
array /F:
flags /Q /V:
var !
string
/A: array | name of an array to receive the arguments; the default is ARG |
/F: flags | parse flags; bitmapped, see below; the default is 1 |
/Q | quiet; don’t display arguments to stdout |
/V: var | name of an environment variable containing the string to parse |
! string | the string to parse |
This command exposes the plugin’s internal ParseArgs() function, which divides a string into command-line arguments. Its operation can be changed in various ways with the /F:flags option.
The string to be parsed may be passed in two different ways. You can pass the string on the command line, immediately following an exclamation point. The string must be the last item on the command line; everything following the exclamation point is considered the string to parse. Alternatively, you can store the string in an environment variable, and pass the name of the variable with the /V:var option.
The resulting arguments will be stored in an array. You can specify the name of the array with the /A:array option. The array name must begin with a letter. It may contain only letters, digits, underscores, and dollar signs; it should not be more than 31 characters long. If you don’t specify an array name, the default is ARG.
The number of arguments found will be stored in an environment variable; the name of this variable is the name of the array with _N appended, for example ARG_N.
Parse flags: | |
1 | divide the string at unquoted spaces |
2 | divide the string at unquoted commas |
4 | slashes kludge: treat /A/B like /A /B |
8 | quotes kludge: treat /A"foo" like /A:"foo" |
16 | equals kludge: break at the first unquoted equals sign |
32 | one-arg kludge: allow unquoted spaces in arg not beginning with / |
64 | don’t swallow double quotes |
128 | force all arguments to uppercase |
256 | don’t trim spaces from the end of args |
512 | disable special handling of double quotes |
You should specify at least one of 1, 2, or 16; specifying more than one is allowed. If you don’t specify any, then 1 is assumed. Note that if you include a value of 2 (break at commas), then empty arguments are possible.
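To make flags 1, 2, and 64 concrete, here is a small Python sketch of splitting at unquoted spaces and commas. It is an approximation written for this page, not the plugin's internal ParseArgs() function: the kludges (4, 8, 16, 32), uppercasing (128), and the trimming and quote-handling switches (256, 512) are not modelled, and the constant names are made up.

```python
SPACES, COMMAS, KEEP_QUOTES = 1, 2, 64   # hypothetical names for three flag values

def parse_args(s: str, flags: int = SPACES) -> list:
    """Split s at unquoted spaces and/or commas (a sketch of flags 1, 2 and 64)."""
    args, cur = [], []
    in_quotes = False
    started = False      # has the current argument begun?
    space_seen = False   # did an unquoted space follow the current argument?
    for ch in s:
        if ch == " " and not in_quotes and flags & SPACES:
            if started:
                space_seen = True        # runs of spaces never make empty args
            continue
        if ch == "," and not in_quotes and flags & COMMAS:
            args.append("".join(cur))    # a comma always ends an argument,
            cur, started, space_seen = [], False, False  # so ",," gives an empty one
            continue
        if space_seen:                   # a space ended the previous argument
            args.append("".join(cur))
            cur, started, space_seen = [], False, False
        if ch == '"':
            in_quotes = not in_quotes
            started = True               # "" is a real, empty argument
            if flags & KEEP_QUOTES:      # flag 64: don't swallow double quotes
                cur.append(ch)
        else:
            cur.append(ch)
            started = True
    if started:
        args.append("".join(cur))
    return args
```

For example, parse_args('a,,b', SPACES | COMMAS) returns ['a', '', 'b'], showing the empty argument that comma splitting can produce.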
A value of 4 causes a slash to terminate an argument beginning with a slash followed by a letter. It treats an argument like /A/B as two separate arguments.
A value of 8 checks for arguments beginning with a slash followed by a single letter and then a double quote. If this kind of construction is found, the missing colon is supplied, changing /A"foo" into /A:foo.
If you only expect one argument which does not begin with a slash, and if that argument will always be the last one in the string, you can add 32 to flags. This allows the (only) argument to contain spaces without the necessity of double quotes.
A value of 16 is useful for commands that, like SET or ASSOC, expect a name=value pair. This mode has a number of peculiar quirks. It splits arguments at the first unquoted equals sign in an argument which does not begin with a slash. Spaces around the equals sign are dropped. Spaces in the value part (after the equals sign) are retained even if they are not quoted; the name=value pair is expected to be the last item on the command line. The equals sign is retained as the first character in the value argument; this allows you to distinguish a name= construction (to clear or reset the value for name, perhaps) from a name alone (to report the value for name without changing it).
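The name=value behavior of flag 16 can be approximated like this. Again, this is a Python sketch under stated assumptions, not the plugin's parser: it ignores double quotes entirely, and treats any argument that starts with a slash as an ordinary space-delimited switch.

```python
def equals_kludge(s: str) -> list:
    """Approximate PARSEARGS flag 16; quoting is not handled (assumption)."""
    args = []
    rest = s.strip()
    while rest:
        if rest.startswith("/"):
            # Switches are never split at '=': take everything up to the next space.
            token, _, rest = rest.partition(" ")
            args.append(token)
        else:
            # The name ends at the first space or equals sign.
            ends = [i for i in (rest.find(" "), rest.find("=")) if i >= 0]
            cut = min(ends) if ends else len(rest)
            name, after = rest[:cut], rest[cut:].lstrip(" ")
            args.append(name)
            if after.startswith("="):
                # Keep '=' as the first character of the value; keep the
                # value's inner spaces, since the pair must come last.
                args.append("=" + after[1:].lstrip(" "))
                break
            rest = after
        rest = rest.lstrip(" ")
    return args
```

With this sketch, "name = some value" yields ["name", "=some value"], "name=" yields ["name", "="], and a bare "name" yields ["name"], which is the distinction described above.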
Normal behavior is to remove double quotes from the string. Typically the double quotes are not part of the filename, value, etc. per se, but a syntactic mechanism for escaping spaces; once the string has been parsed there is no further need for them. If you want to retain double quotes, add 64 to the value of flags.
• Note: This command is not available in TCC/LE, in 4NT, or in older versions of TCC which don’t support array variables.
PASSWORD
— Generate random strings suitable for use as passwords.
Syntax:
PASSWORD /A:min,max /C:n /D:min,max /E:min,max /F /L:min,max /N:n /P:min,max /S:min,max /Y
/A: min, max | the number of alphabetic characters to use |
/C: n | specify the case of the alphabetic characters: |
0: random | |
1: lowercase | |
2: uppercase | |
3: word case | |
5: alternating | |
6: leet (vowels lower, consonants upper) | |
7: unleet (reverse of the above) | |
/D: min, max | the number of digits to use |
/E: min, max | the number of extended characters to use |
/F | make the first character a letter if possible |
/L: min, max | the total length of the password, in characters |
/N: n | the number of strings to generate |
/P: min, max | the number of punctuation characters to use |
/S: min, max | the number of syllables to use |
/Y | also copy the password to the clipboard |
This command displays proposed passwords to standard output. Output can be redirected.
The default behavior is to generate a password from 7 to 10 characters long. You can specify the desired length with /L:min,max. The allowed range is 4 to 1024 characters. If you specify only one value after the /L:, it will be used as both the minimum and the maximum. (All the other options which accept a min,max range behave the same way.)
/A:min,max sets the number of alphabetic characters to include. ‘Alphabetic characters’ are the unaccented Latin letters, A to Z. The legal range is from 0 to 512 alpha characters.
/D:min,max specifies the number of digits to include; digits are of course 0 to 9. The legal range is from 0 to 128 digits.
Punctuation is by default limited to standard ASCII punctuation marks with no special meaning to TCC:
!@#$*()-_=+;:,./?{}~
You can specify a custom set of punctuation characters by setting an environment variable named PUNCTUATION_CHARACTERS. You may include from 0 to 64 punctuation characters.
‘Extended characters’ are the Unicode code points from U+00C0 through U+00FF: accented Latin letters, thorn, eth, æsc, eszett, and a few other hard-to-type glyphs. These characters are not included unless you specify a nonzero value using /E:. You can include up to 64 extended characters.
‘Syllables’ are series of four letters, alternating consonant and vowel sounds. They are intended to be somewhat pronounceable, and perhaps more memorable than an entirely random letter salad. Syllables are not guaranteed to be real words; nor are they guaranteed not to be real words. You may include up to 64 syllables.
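A syllable generator along these lines is easy to sketch in Python. The letter sets and the consonant-first order are assumptions; the plugin's own choices (for example, whether it uses digraphs like "th" as consonant sounds) are not documented here. The secrets module is used because passwords should come from a cryptographically strong source.

```python
import secrets

# Assumed letter pools; the plugin's actual sets may differ.
CONSONANTS = "bcdfghjklmnprstvwz"
VOWELS = "aeiou"

def syllable() -> str:
    """One four-letter 'syllable': consonant, vowel, consonant, vowel (assumed order)."""
    pools = (CONSONANTS, VOWELS, CONSONANTS, VOWELS)
    return "".join(secrets.choice(pool) for pool in pools)

# Two syllables make an eight-letter, roughly pronounceable fragment.
print(syllable() + syllable())
```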
The /C:n case option, if specified, is only applied to the regular Latin letters A to Z. It does not affect extended characters. If you specify /C:3 (word case), then the first letter in a run of consecutive letters will be capitalized and the remainder will be in lowercase. These runs are not likely to correspond to actual words. The /C:5 option will give roughly equal numbers of uppercase and lowercase letters.
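The /C:n modes can be sketched as a post-processing pass over a password. This Python function is illustrative only: it touches only the ASCII letters A to Z, as described above; it interprets /C:5 as strict upper/lower alternation (an assumption, since the text only promises roughly equal numbers); and it leaves letters unchanged for any mode it does not recognize.

```python
import secrets

VOWELS = "aeiou"

def apply_case(pw: str, mode: int) -> str:
    """Re-case only the unaccented letters A-Z in pw, per a /C:n mode (sketch)."""
    out = []
    in_run = False      # was the previous character an A-Z letter?
    upper_next = True   # state for mode 5 (alternating; assumed strict)
    for ch in pw:
        if not (ch.isascii() and ch.isalpha()):
            out.append(ch)          # digits, punctuation, extended chars untouched
            in_run = False
            continue
        lo, up = ch.lower(), ch.upper()
        if mode == 0:               # random
            ch = secrets.choice((lo, up))
        elif mode == 1:             # lowercase
            ch = lo
        elif mode == 2:             # uppercase
            ch = up
        elif mode == 3:             # word case: capitalize the first letter of each run
            ch = lo if in_run else up
        elif mode == 5:             # alternating
            ch = up if upper_next else lo
            upper_next = not upper_next
        elif mode == 6:             # leet: vowels lower, consonants upper
            ch = lo if lo in VOWELS else up
        elif mode == 7:             # unleet: the reverse
            ch = up if lo in VOWELS else lo
        out.append(ch)
        in_run = True
    return "".join(out)
```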
rem Generate a 10-character random password, and
rem stash it on the clipboard:
password /l:10 /y
This command also saves its parameters for use by future references to the _PASSWORD variable.
RECASE
— Change the case of text.
Syntax:
RECASE /A:attribs /C /CP:n /H /L /P /S /U filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/C | capitalize the first letter of each word |
/CP: n | interpret non-Unicode input text using code page n |
/H | display filenames |
/L | make text lowercase |
/P | page output |
/S | search in subdirectories for matching files |
/U | make text uppercase |
If standard input (stdin) is redirected, RECASE will read from stdin before any filenames specified on the command line. If no filenames are specified, then RECASE will read from stdin whether it is redirected or not.
Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard.
If you want to pipe to RECASE, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
REPLACETEXT
— Replace strings in text from a file.
Syntax:
REPLACETEXT /A:attribs /C /CP:n /H /N /P /R:from:to /S /W /X:from:to filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/C | replace character escapes (affects following /R: and /X: ) |
/CP: n | interpret non-Unicode input text using code page n |
/H | display filenames |
/N | disable features |
/P | page output |
/R: from: to | specify old and replacement text |
/S | search in subdirectories for matching files |
/W | whole words only (affects following /R: and /X: ) |
/X: from: to | specify old and replacement text (do not auto-capitalize) |
… | Range options are also supported. |
If standard input (stdin) is redirected, REPLACETEXT will read from stdin before any filenames specified on the command line. If no filenames are specified, then REPLACETEXT will read from stdin whether it is redirected or not.
Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard.
If you want to pipe to REPLACETEXT, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
Use /R: or /X: to specify the strings to search for (from) and to substitute (to). You must have at least one of these; you may add as many as you like. The text from each matching file will be dumped to stdout, with every occurrence of from replaced with the corresponding to string. If you give a from string without a matching to, then matching strings will simply be omitted from the output. The difference between the two options is that /R: automatically capitalizes the to string to match the from text which it replaces, but /X: does not. The rules for /R: are simple:
- The first two letters in the matching text are examined.
- If the first letter is lowercase, the to text is used unchanged.
- If the first letter is uppercase but the second is lowercase, only the first letter in to is forced to uppercase.
- If both the first two letters are uppercase, all of the to string is forced to uppercase.
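The four rules above translate almost directly into code. This Python sketch assumes that /R: matching is case-insensitive (which the capitalization rules imply but the text does not state), and when the matched text contains only one letter it applies the "first letter only" rule.

```python
import re

def match_case(matched: str, to: str) -> str:
    """Capitalize `to` to follow `matched`, per the /R: rules (sketch)."""
    letters = [c for c in matched if c.isalpha()][:2]   # first two letters
    if not to or not letters or not letters[0].isupper():
        return to                      # first letter lowercase: use `to` unchanged
    if len(letters) == 2 and letters[1].isupper():
        return to.upper()              # first two letters uppercase: all caps
    return to[0].upper() + to[1:]      # otherwise: capitalize only the first letter

def replace_r(text: str, old: str, new: str) -> str:
    """Replace every case-insensitive match of `old` (an assumption) with re-cased `new`."""
    pattern = re.compile(re.escape(old), re.IGNORECASE)
    return pattern.sub(lambda m: match_case(m.group(0), new), text)

print(replace_r("Winter, WINTER, winter", "winter", "autumn"))  # Autumn, AUTUMN, autumn
```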
/W only affects those /R: and /X: options which follow it on the command line. With /W, text is not matched if it immediately follows or immediately precedes a letter or digit.
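One way to express the /W test is with regular-expression lookarounds: the match must not be preceded or followed by a letter or digit. This Python sketch uses [^\W_], which matches a word character other than underscore, so an underscore next to the match does not block it. Case-insensitive matching is an assumption here.

```python
import re

def whole_word_pattern(old: str):
    """Compile `old` so it only matches when not touching a letter or digit."""
    # [^\W_] means "a letter or a digit" (a word character that isn't '_').
    return re.compile(r"(?<![^\W_])" + re.escape(old) + r"(?![^\W_])", re.IGNORECASE)

text = "but butter; but_2 2but"
print(whole_word_pattern("but").sub("yet", text))   # yet butter; yet_2 2but
```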
/C only affects those /R: and /X: options which follow it on the command line. /C expands character escapes of the form \nnn (decimal) or \Xxx (hexadecimal) in both the from and to text.
Use this option to embed troublesome characters. For example, you could use /C /R:\x22: to strip double-quote marks from a file.
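Here is a Python sketch of the escape expansion that /C performs. The exact digit counts are an assumption: in this version a decimal escape is a backslash followed by one to three digits, and a hexadecimal escape is \x or \X followed by exactly two hex digits.

```python
import re

# Assumed forms: \ + 1-3 decimal digits, or \x / \X + exactly 2 hex digits.
ESCAPE = re.compile(r"\\(?:[xX]([0-9A-Fa-f]{2})|([0-9]{1,3}))")

def expand_escapes(s: str) -> str:
    """Replace each character escape in s with the character it names."""
    def repl(m):
        hex_digits, dec_digits = m.groups()
        return chr(int(hex_digits, 16)) if hex_digits else chr(int(dec_digits))
    return ESCAPE.sub(repl, s)

# The /C /R:\x22: example: the "from" string becomes a double quote, "to" is empty.
print('say "hi"'.replace(expand_escapes(r"\x22"), ""))   # say hi
```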
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
replacetext "Engine Summer.txt" /w /r:winter:autumn /r:but:yet
ROT13
— Encode or decode text with ROT13.
Syntax:
ROT13 /A:attribs /CP:n /H /N /P /S filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/CP: n | interpret non-Unicode input text using code page n |
/H | display filenames |
/P | page output |
/N | disable features |
/S | search in subdirectories for matching files |
… | Range options are also supported. |
If standard input (stdin) is redirected, ROT13 will read from stdin before any filenames specified on the command line. If no filenames are specified, then ROT13 will read from stdin whether it is redirected or not.
Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard.
If you want to pipe to ROT13, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
See also: the @ROT13 function, which transforms a string using ROT13.
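ROT13 rotates each Latin letter 13 places through the alphabet, so applying it twice restores the original text. A standard Python version is a single translation table. Whether the plugin also rotates accented letters is not stated here; this sketch leaves everything except A to Z and a to z alone.

```python
import string

lower, upper = string.ascii_lowercase, string.ascii_uppercase
# Map each letter to the one 13 places later, wrapping around the alphabet.
_ROT13 = str.maketrans(lower + upper, lower[13:] + lower[:13] + upper[13:] + upper[:13])

def rot13(text: str) -> str:
    """Encode or decode text with ROT13 (the same operation both ways)."""
    return text.translate(_ROT13)

print(rot13("Hello, World!"))   # Uryyb, Jbeyq!
```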
SAVEARRAY
— Save data from an array variable to a file.
Syntax:
SAVEARRAY /O /P /Q /X:m,n /Y:m,n /Z:m,n /W:m,n arrayname filename
/O | the command may overwrite an existing file |
/P | save a partial array as if it were the whole thing; only useful with /X: /Y: /Z: /W: |
/Q | quietly |
/X: m, n | save only X index m through n |
/Y: m, n | save only Y index m through n |
/Z: m, n | save only Z index m through n |
/W: m, n | save only W index m through n |
arrayname | an array variable name |
filename | the file to create |
The arrayname should begin with a letter. It should contain only letters, digits, underscores, and dollar signs; it should not be more than 31 characters long.
All non-empty elements in the array will be saved. You can restore the data later with LOADARRAY.
The default behavior is to save the entire array. You can restrict the elements saved using the /X:, /Y:, /Z:, and /W: options. /X: restricts the first dimension of the array, /Y: affects the second, /Z: the third, and /W: the fourth.
• Note: The maximum size for any element in the array is 8,191 characters. Longer elements may cause issues!
• Note: This command is not available in TCC/LE, in 4NT, or in older versions of TCC which don’t support array variables.
See also: the LOADARRAY command.
SHUFFLE
— Dump randomized lines from a text file.
Syntax:
SHUFFLE /A:attribs /B /CP:n /H /J /L /M:n /N /P /S filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/B | discard blank lines |
/CP: n | interpret non-Unicode input text using code page n |
/H | display the filename before each file |
/J | show line numbers (original) |
/L | show line numbers (new) |
/M: n | maximum number of lines to show |
/N | disable features |
/P | page output |
/S | search in subdirectories for matching files |
… | Range options are also supported. |
SHUFFLE randomly reorders lines from the specified file. It can read from disk files or from a pipe. If you want to pipe to SHUFFLE, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
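Conceptually, SHUFFLE numbers the lines, optionally drops blank ones, shuffles them, and truncates the result. The Python sketch below shows that pipeline with /B, /M:n, and the original line numbers that /J would display. It is not the plugin's code; treating lines of only spaces or tabs as blank is an assumption borrowed from how UTYPE and WORDS define "blank".

```python
import random

def shuffle_lines(lines, discard_blank=False, max_lines=None, rng=random):
    """Return (original_line_number, text) pairs in random order (sketch)."""
    numbered = list(enumerate(lines, start=1))      # the numbers /J would show
    if discard_blank:                               # cf. /B; whitespace-only = blank (assumed)
        numbered = [(n, s) for n, s in numbered if s.strip(" \t")]
    rng.shuffle(numbered)
    if max_lines is not None:                       # cf. /M:n
        numbered = numbered[:max_lines]
    return numbered
```

Passing rng=random.Random(seed) makes the order reproducible, which is handy when testing a script that uses it.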
If standard input (stdin) is redirected, SHUFFLE will read from stdin before any filenames specified on the command line. If no filenames are specified, then SHUFFLE will read from stdin whether it is redirected or not.
Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read lines from the clipboard.
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
shuffle /b "engine summer.txt"
TEXT2CLIP
— Copy text from a file onto the clipboard.
Syntax:
TEXT2CLIP /A /CP:n /Q /T filename
/A | append to any text already on the clipboard |
/CP: n | interpret non-Unicode input text using code page n |
/Q | replace ASCII quotes and apostrophes with Unicode open and close quotes |
/T | quietly |
Only one filename is allowed. Text may be piped or redirected into TEXT2CLIP.
See also: the CLIP2TEXT command.
TEXTUTILSHELP
— Open the TextUtils plugin help file.
Syntax:
TEXTUTILSHELP /C /F /S /S:text /V topic
/C | select the ‘Contents’ tab |
/F | select the ‘Favorites’ tab |
/S | select the ‘Search’ tab |
/S: text | select the ‘Search’ tab and search for text |
/V | show detailed plugin version info |
topic | the page to display |
The TEXTUTILSHELP command will locate and open this plugin’s help file. In most cases, the internal HELP command and the F1 and Ctrl-F1 keys will be more convenient. The main advantage of this command is that it can open the help file to any desired topic, not only to the names of commands, functions, and variables.
Note that any /C, /F, or /S option must precede any topic on the command line. (This command has a very simple-minded parser.)
UNICODIFY
— Convert text files to Unicode.
Syntax:
UNICODIFY /A:attribs /CP:n /L /N /O /P /Q /S /T /UTF8 /UTF16 filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/CP: n | interpret non-Unicode input text using code page n |
/L | normalize line endings to CR/LF |
/N | disable features |
/O | overwrite read-only files |
/Q | replace ASCII quotes and apostrophes with Unicode open and close quotes |
/S | search in subdirectories for matching files |
/T | quietly |
/UTF8 | rewrite files using UTF-8 encoding |
/UTF16 | rewrite files using UTF-16 encoding (default) |
… | Range options are also supported. |
UNICODIFY rewrites the contents of text files, changing them to UTF-16 or UTF-8 encoding. By default, it will skip:
- files which already appear to be in the desired encoding
- files with the read-only attribute set (use /O to disable)
- empty files
The original contents of the file will be saved in a new file with the extension .original.
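The conversion itself is simple; the interesting parts are the skip rules. This Python sketch follows the rules listed above for a single file, but several details are assumptions rather than documented behavior: "already in the desired encoding" is judged only by a byte order mark; the backup name replaces the file's extension with .original; the default code page is cp1252; and /L, /Q, and /NB are not modelled.

```python
import os
from pathlib import Path

UTF16_BOM, UTF8_BOM = b"\xff\xfe", b"\xef\xbb\xbf"

def unicodify(path: Path, codepage: str = "cp1252", utf8: bool = False) -> bool:
    """Rewrite one file as UTF-16 LE (default) or UTF-8, with a BOM.

    Returns True if the file was converted, False if it was skipped.
    """
    data = path.read_bytes()
    if not data:
        return False                                  # skip empty files
    if not os.access(path, os.W_OK):
        return False                                  # skip read-only files (cf. /O)
    if data.startswith(UTF8_BOM if utf8 else UTF16_BOM):
        return False                                  # BOM says: already converted
    if data.startswith(UTF8_BOM):
        text = data[len(UTF8_BOM):].decode("utf-8")
    elif data.startswith(UTF16_BOM):
        text = data[len(UTF16_BOM):].decode("utf-16-le")
    else:
        text = data.decode(codepage)                  # cf. /CP:n; strict decoding
    path.with_suffix(".original").write_bytes(data)   # hypothetical backup naming
    if utf8:
        path.write_bytes(UTF8_BOM + text.encode("utf-8"))
    else:
        path.write_bytes(UTF16_BOM + text.encode("utf-16-le"))
    return True
```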
• Note: This command only converts files. Standard input, internet URLs, and the clipboard are not supported. (You can use wildcards, directory aliases, @file lists, and so on.)
OEM characters will be interpreted according to the current Windows code page by default; use the /CP:n option to specify a different code page. To check the translation before you actually convert the file, try UTYPE with the /CP:n option first.
/NB | do not write a Byte Order Mark |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
UPEND
— Display lines from a file in reverse order.
Syntax:
UPEND /A:attribs /B /C /CP:n /E /H /L:string /N /P /R:string /S /T /V /W:n filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/B | discard blank lines |
/C | replace control characters with ^ sequences |
/CP: n | interpret non-Unicode input text using code page n |
/E | expand variables in the /L: and /R: strings |
/H | display the filename before each file |
/L: string | insert string to the left of each line |
/N | disable features |
/P | page output |
/R: string | insert string to the right of each line |
/S | search in subdirectories for matching files |
/T | trim leading and trailing whitespace |
/V | also reverse each line in the file |
/W: n | truncate lines to n characters |
… | Range options are also supported. |
UPEND is a low-budget substitute for the Unix tac command. It can read from disk files or from a pipe. If you want to pipe to UPEND, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
If standard input (stdin) is redirected, UPEND will read from stdin before any filenames specified on the command line. If no filenames are specified, then UPEND will read from stdin whether it is redirected or not.
Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read lines from the clipboard.
If /L: is specified, the given string will be inserted to the left of each line; /R: inserts a string to the right. If /E is also specified, variable expansion will be performed on each string.
Along with TCC’s usual complement of internal variables, functions, and so on, UPEND will set an environment variable _LINE. _LINE will contain the value 0 for the first line listed (i.e. the last line in the file), 1 for the second line listed, and so on. You can massage this value with functions like @INC, @EVAL, and @FORMAT. To prevent the variables from being expanded before UPEND executes, you must either enclose the string in backquotes or double the percent signs.
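In Python terms, UPEND's numbering works like this: lines are emitted last-to-first, and the counter that plays the role of _LINE starts at 0 for the first line shown. The left callback is a hypothetical stand-in for an expanded /L: string. Whether _LINE counts lines discarded by /B is not stated; this sketch does not count them.

```python
def upend(lines, left=None, reverse_each=False, discard_blank=False):
    """Return lines last-to-first, optionally prefixed via left(n) (sketch)."""
    out = []
    for line in reversed(lines):
        if discard_blank and not line.strip():   # cf. /B
            continue
        if reverse_each:                         # cf. /V: reverse the line itself
            line = line[::-1]
        n = len(out)            # plays the role of _LINE: 0 for the first line shown
        out.append((left(n) if left else "") + line)
    return out

# Roughly what /l:"%%@format[4,%%_line] " /e produces: a right-aligned counter.
for row in upend(["one", "two", "three"], left=lambda n: f"{n:4d} "):
    print(row)
```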
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
upend D:\download\pg11.txt /l:"%%@format[4,%%_line] " /e
UTYPE
— Dump text files to standard output.
Syntax:
UTYPE /A:attribs /B /C /CP:n /D /E /F:string /H /HW:n /K:n /L:format /N /P /Q /S /T /U:string /X /Z:n filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/B | discard BEL characters (control-G, ASCII 7) |
/C | replace control characters with ^ sequences |
/CP: n | interpret non-Unicode input text using code page n |
/D | discard blank lines at the start of the file |
/E | discard all empty lines |
/F: string | show only lines following this string; /FF: inclusive |
/H | display the filename before each file |
/HH | display the filename, file size, and encoding before each file |
/HW: n | hex dump width, in bytes; only useful with /X |
/K: n | expand tabs to n columns |
/L: format | insert line numbers on the left |
/N | disable features |
/P | page output |
/Q | replace ASCII quotes and apostrophes with Unicode open and close quotes |
/S | search in subdirectories for matching files |
/T | trim leading and trailing whitespace |
/U: string | show only lines until (before) this string; /UU: inclusive |
/X | dump file in hexadecimal |
/Z: | handling of NUL characters in text: |
/Z:N — treat like end-of-line (default) | |
/Z:I — treat as invalid character | |
/Z:S — skip over (ignore) any NUL characters | |
… | Range options are also supported. |
UTYPE displays files to standard output, much like the internal TYPE command. The primary advantage of UTYPE is that it recognizes and handles UTF-8 text files; you can think of it as a ‘UTF-8 TYPE’.
If you want to pipe to UTYPE, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
If standard input (stdin) is redirected, UTYPE will read from stdin before any filenames specified on the command line. If no filenames are specified, then UTYPE will read from stdin whether it is redirected or not.
Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to display the contents of the clipboard.
If you include /L on the command line, UTYPE will insert line numbers on the left, starting at 1, as TYPE does. If you include the optional format string, UTYPE will perform variable expansion on the string before displaying it; use the variable _LINE to get the current (zero-based) line number. For example, /L:"%%@FORMAT[03,%%_LINE] " will show the line number zero-padded to at least three digits.
/F: and /U: can be used to chop off a simple header or footer. /F: discards all lines up to and including the first line which contains the specified string (case-insensitive); /U: discards the first line which contains the specified string and all lines after it (again, case-insensitive). For example, most Project Gutenberg ebooks include a header which ends in a line beginning with “*** START” and a footer beginning with “*** END”. You can strip them off like this:
utype "https://www.gutenberg.org/cache/epub/11/pg11.txt" /f:"*** start" /u:"*** end" /d | list
If you double the option letter (/FF: or /UU:), the matching line will be included in UTYPE’s output, not discarded.
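The /F: and /U: rules can be sketched in Python as follows, with matching as a case-insensitive substring test as described above. Two behaviors here are assumptions: if the /F: string never appears, nothing is shown (no lines follow it), and the /U: search starts where the /F: cut left off.

```python
def chop(lines, following=None, until=None, inclusive_f=False, inclusive_u=False):
    """Keep lines after `following` and before `until` (cf. /F:, /FF:, /U:, /UU:)."""
    start = 0
    if following is not None:
        key = following.casefold()
        hits = [i for i, s in enumerate(lines) if key in s.casefold()]
        if not hits:
            return []                      # assumed: nothing follows a missing header
        start = hits[0] if inclusive_f else hits[0] + 1
    end = len(lines)
    if until is not None:
        key = until.casefold()
        for i in range(start, len(lines)):
            if key in lines[i].casefold():
                end = i + 1 if inclusive_u else i
                break
    return lines[start:end]

book = ["Title", "*** START OF THE BOOK ***", "Chapter 1", "Text",
        "*** END OF THE BOOK ***", "License"]
print(chop(book, "*** start", "*** end"))   # ['Chapter 1', 'Text']
```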
/E discards all blank lines; /D discards only those at the start of a file. If you specify both, /D wins. If you combine /D with /F:string, UTYPE will discard any blank lines following the header. A line containing only spaces or tabs is considered blank.
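Here is a Python sketch of the blank-line rules, including the precedence of /D over /E. It is an illustration, not UTYPE's code. To mimic /D combined with /F:string, apply it to the lines that remain after the header has been cut off.

```python
def drop_blank(lines, leading_only=False, all_blank=False):
    """Apply /D (leading_only) and /E (all_blank); /D wins if both are set."""
    def blank(s):
        return s.strip(" \t") == ""        # only spaces or tabs counts as blank
    if leading_only:                       # /D: drop blank lines at the start only
        i = 0
        while i < len(lines) and blank(lines[i]):
            i += 1
        return lines[i:]
    if all_blank:                          # /E: drop every blank line
        return [s for s in lines if not blank(s)]
    return list(lines)
```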
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NH | disable the handbrake |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
The handbrake:
When scrolling a long file to the console and /P was not specified, UTYPE watches for the Ctrl and Esc keys. Hold down the Ctrl key to slow the scrolling; press Esc to pause the file as if /P had been specified.
This feature will be disabled automatically if you specify /P or if output is redirected; you can also disable it with /NH.
Quotes replacement:
/Q causes UTYPE to replace generic ASCII apostrophes and quote marks ( ' and " ) with Unicode open and close quote marks ( ‘ ’ and “ ” ). The new quote marks may or may not look different from the originals, depending on how they are displayed and the font used. If the output is displayed in a non-Unicode font, the curly quotes will be lost or mangled. You can set some environment variables to control this feature.
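How UTYPE decides between opening and closing marks is not described here. A common heuristic, shown in this Python sketch, treats a quote as opening when it follows whitespace, an opening bracket, or the start of the text, and as closing otherwise. This also turns apostrophes inside words into ’. Nested quotes will defeat it.

```python
def curl_quotes(text: str) -> str:
    """Replace ASCII ' and " with curly quotes using a simple (assumed) heuristic."""
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i else " "           # treat start of text like a space
        opening = prev.isspace() or prev in "([{"
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")   # “ or ”
        elif ch == "'":
            out.append("\u2018" if opening else "\u2019")   # ‘ or ’
        else:
            out.append(ch)
    return "".join(out)

print(curl_quotes('He said "it\'s fine."'))   # He said “it’s fine.”
```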
utype "Engine Summer.txt"
WORDS
— Count words, sentences, and paragraphs in English text.
Syntax:
WORDS /A:attribs /C /CP:n /D /F:fmt /K /M:n /N /S /U:mode /X filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/C | code mode; words may contain underscores and dollar signs |
/CP: n | interpret non-Unicode input text using code page n |
/D | dumps lists of unique words, sorted by frequency |
/F: fmt | specifies the format for input text; fmt is one of: |
0 — best guess (default) | |
1 — unformatted (line breaks are used only to end paragraphs) | |
2 — prewrapped (line breaks are used to wrap text) | |
/K | keeps hyphens when reassembling split words |
/M: n | minimum number of letters in a word |
/N | by itself: no words containing digits |
/N | with suboptions: disable features |
/S | search in subdirectories for matching files |
/U: mode | controls the counting of unique words; mode is one of: |
0 — do not count unique words (faster for large files) | |
1 — count unique words for each file individually (the default) | |
2 — count unique words for all files together (slower) | |
3 — separate counts for each file and for all files together (double oink!) | |
/X | no words beginning with a digit |
… | Range options are also supported. |
WORDS counts words, sentences, and paragraphs in English text. It can read text from standard input, or from one or more files specified on the command line. A report is written to standard output; this report can be piped or redirected. The results of the last file processed are also saved internally, and can be accessed through internal variables.
Note: This command was designed specifically for use with English text. I make many Anglocentric assumptions about what constitutes a ‘word’, a ‘sentence’, a ‘paragraph’, ‘forms’ of a word, and so on. These assumptions are probably not useful for any other language. WORDS may give strange or undesired results when used on source code, program output, HTML, or whatnot.
If standard input (stdin) is redirected, WORDS will read from stdin before any filenames specified on the command line. If no filenames are specified, then WORDS will read from stdin whether it is redirected or not.
Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to count words on the clipboard.
This command’s definition of a ‘word’ is complex and subject to ongoing tweaking. In general, though, a word may contain only letters, digits (unless /N is specified), periods, apostrophes, and hyphens; at least one character must be a letter. For instance, 20th, 1920s, 1969's, and post-1941 are all considered words, but 1984 is not. The first character must be alphanumeric or (very rarely) an apostrophe.
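The general rule can be approximated with a regular expression plus two checks. This Python sketch is not WORDS's actual tokenizer (which, as noted, keeps changing). It requires an alphanumeric first character, strips trailing periods, apostrophes, and hyphens, rejects tokens with no letter, and rejects tokens containing digits when allow_digits is false (compare /N). The rare leading apostrophe is not handled.

```python
import re

# An alphanumeric first character, then letters, digits, periods, apostrophes, hyphens.
WORD = re.compile(r"[^\W_](?:[^\W_]|['.\-])*")

def find_words(text: str, allow_digits: bool = True) -> list:
    """Return the tokens that count as words under the rules above (sketch)."""
    result = []
    for m in WORD.finditer(text):
        w = m.group(0).rstrip(".'-")              # drop a sentence-ending period, etc.
        if not any(c.isalpha() for c in w):
            continue                              # e.g. "1984": no letter at all
        if not allow_digits and any(c.isdigit() for c in w):
            continue                              # cf. /N
        result.append(w)
    return result

print(find_words("In 1984, the 1920s and post-1941 years (20th century) weren't 1969's."))
```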
If /C is specified, words may also contain underscores and dollar signs, but must not begin with a digit or dollar sign. /C also suppresses the count of sentences and paragraphs in the final report.
Words that differ only in case are counted as the same word. In the phrase polish Polish furniture using Polish furniture polish, this command will find only three ‘unique’ words.
A word is counted as ‘proper’ only if it never occurs in an all-lowercase form; no proper nouns will be found in Polish polish. Acronyms like NATO will be counted as ‘proper nouns’; so will ordinary words capitalized at the start of a sentence. The latter are often common words like articles and prepositions, which tend to be weeded out in longer files as they recur midsentence.
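Both rules fit in a few lines of Python, given a list of words. This illustrates the definitions above and is not the plugin's implementation. Counter.most_common() on the result would also give the frequency order that /D uses.

```python
from collections import Counter

def vocabulary(words):
    """Return (counts of unique words, set of 'proper' words), ignoring case."""
    counts = Counter(w.lower() for w in words)            # "Polish" == "polish"
    lowercase_forms = {w for w in words if w == w.lower()}
    # A word is 'proper' only if it never appears in an all-lowercase form.
    proper = {w for w in counts if w not in lowercase_forms}
    return counts, proper

counts, proper = vocabulary("polish Polish furniture using Polish furniture polish".split())
print(len(counts), proper)   # 3 set()
```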
Note that a hyphenated word is always counted as a single word. Without a dictionary, the command has no way of knowing whether it is composed of actual words (red-eye, half-baked) or not (pre-K, Wi-Fi).
WORDS also gives counts of sentences, paragraphs, lines, characters, and bytes. All counts should be viewed as estimates rather than gospel truth. The sentence count in particular must be taken with a healthy dose of salt; the command has no good way to determine whether a period ends an abbreviation, a sentence, or both.
A line, or a series of lines, which contains one or more sentences is counted as a ‘paragraph’. A line or series of lines which contains one or more words, but no recognized sentences, is instead counted as a ‘title’. It might actually be a title, subtitle, or chapter heading; or it might be a byline, date line, attribution, salutation, signature, line of poetry….
The number of lines reported may differ from the number of carriage returns or line feeds in the text, e.g. if the last line in the file is not terminated. A line containing only whitespace characters (spaces and tabs) is considered blank. The character and byte counts do not include any Unicode byte-order mark at the beginning of the file.
Split words:
If a hyphenated word is split across a line break, WORDS will reassemble it and treat it as a single word. By default, the hyphen is dropped: the command has no way of knowing whether a hyphenated compound word was broken at a hyphen, or whether a normal word was divided between syllables and a hyphen added. The latter seems more common, and I wanted to avoid cluttering the vocabulary list with differently-hyphenated versions of the same word. If /K is specified, the command will instead retain hyphens when reassembling words broken at the end of a line. This option may cause a larger number of ‘unique’ words to be reported.
Vocabularies: In order to count unique words and ‘proper nouns’, WORDS must build a list of all words found. Building this list can slow down the process and use a good deal of memory if the text file involved is large.
/U:mode controls the vocabulary lists. /U:0 disables vocabularies; the command executes faster, but there will be no counts of unique and proper words. /U:1 causes WORDS to build a vocabulary list for each file it processes; this is the default behavior. /U:2 builds a combined vocabulary for all files that WORDS processes; this is slower than the default. Finally, /U:3 builds a vocabulary for each file that WORDS reads, and at the same time builds a master vocabulary for all files together; this is much slower than the default behavior, and devours memory shamelessly.
If you are processing extremely large text files, or files which are not English prose (e.g. output from a program or command), I strongly recommend using /U:0 to disable vocabulary lists.
Dump: If /D is specified, the vocabulary for each file will be dumped to stdout. If /D is combined with /U:2, you’ll instead get a combined vocabulary for all files. The list is sorted by frequency, with more common words appearing first. Note that words may be shown in a different case than they appear in the input text. This is because the command stores all words in lowercase internally for speed (lowercase letters are more streamlined).
Text format: Text files use line-break characters in different ways. In some files, line-break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to wrap text to some desired width. You can use /F:n to tell CONTEXT how to handle line breaks. /F:1 indicates that the text is unformatted, with line breaks only at the ends of paragraphs. CONTEXT will honor all line breaks, and add an extra blank line after each paragraph. /F:2 means that the input text is prewrapped, having line breaks within paragraphs and even within sentences. CONTEXT will skip single line breaks, honoring only sequences of two or more in a row. /F:3 is also for unformatted text and acts like /F:1, but does not insert a blank line after each paragraph. If you specify /F:0 or do not specify any /F:n, CONTEXT will attempt to guess how the input text is formatted. (Guessing is not reliable when there isn’t much input text.)
Text encoding: WORDS automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /CP:n. Most single-byte (i.e., alphabetic) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.
Disabling features:
/N with suboptions disables features:
/NB | do not write a Byte Order Mark |
/NC | disable highlight |
/ND | do not search into hidden directories; only useful with /S |
/NF | suppress the file-not-found error |
/NJ | do not search into junctions; only useful with /S |
/NZ | do not search into system directories; only useful with /S |
You can combine these, e.g. /NDJ.
C:\> type EBS.txt
This is a test. For the next sixty seconds, this station will conduct a test
of the Emergency Broadcast System. This is only a test.
C:\> words /d EBS.txt
File "C:\EBS.txt" :
25 words total, 17 unique, 4 proper. 25 runs of non-blanks.
3 sentences total: 3. 0! 0? Average sentence 8.3 words.
1 paragraph, 0 titles. Average paragraph 3.0 sentences.
2 lines total, 2 not blank; the longest had 77 characters.
137 characters in 137 bytes (OEM, prewrapped).
3: a test this
2: is the
1: Broadcast conduct Emergency For next of only seconds sixty station System will
C:\>
The results from the last file processed are saved, and can be accessed using these internal variables:
_WORDS | _UNIQUEWORDS | _PROPERNOUNS | _WC |
_SENTENCES | _SENTENCESD | _SENTENCESE | _SENTENCESQ |
_SENTENCEWORDS | _PARAGRAPHS | _TITLES | |
_LINES | _NONBLANKLINES | _LONGESTLINE | _CHARACTERS |
The cumulative results from all files processed by the last invocation of WORDS can be accessed through these variables:
WRAP — Word-wrap English text to fit a specified number of columns.
Syntax:
WRAP /A:attribs /C:n /CP:n /D /F:fmt /G:n,m /H /J /N:n /N /P /Q /R /S /T:n /W:width /Z:char filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/C: n | condense repeated spaces in input text |
/CP: n | interpret non-Unicode input text using code page n |
/D | disable special handling of soft hyphens (character 173 / 0xAD) |
/F: fmt | specifies the format for input text; fmt is one of: |
0 — best guess (default) | |
1 — unformatted (line breaks are used only to end paragraphs) | |
2 — prewrapped (line breaks are used to wrap text) | |
3 — unformatted, with blank lines between paragraphs | |
/G: n, m | indent all paragraphs n spaces; if m is specified, it’s the indent for the second and later lines |
/H | display filenames |
/J | justify right margins |
/N: n | minimum characters left on each line to split at a hyphen; 0 disables breaking at hyphens |
/P | page output |
/N | disable features |
/Q | replace ASCII quotes and apostrophes with Unicode open and close quotes |
/R | remove hyphens from line ends |
/S | search in subdirectories for matching filenames |
/T: n | tab stops every n spaces |
/W: width | desired width of output text |
/Z: char | define a forced line-break character |
… | Range options are also supported. |
The WRAP command word-wraps English text to fit a specified width. It can be used as a filter reading from standard input, or it can read from one or more files specified on the command line. The resulting text is written to standard output; it can be piped or redirected.
If you want to pipe to WRAP, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to wrap text from the clipboard.
‘Width’ here refers to a specified number of character positions, or columns. All characters are assumed to have the same width. The word-wrapped output should have neat, reasonably uniform line lengths when viewed or printed in a fixed-pitch font such as Courier, or displayed in a console window. Note that the specified width includes the final newline character; if you specify a width of 80, then up to 79 printable characters may appear on a line.
Note: This command is designed specifically for use with English prose. It may give weird or undesired results when used on source code, program output, HTML, or whatnot. It makes Anglocentric assumptions that may not be appropriate to other languages.
If standard input (stdin) is redirected, WRAP will read from stdin before any filenames specified on the command line. If no filenames are specified, then WRAP will read from stdin whether it is redirected or not. If /H is used, each file’s name will be printed before it is processed. (For standard input, <stdin> will be shown.)
Output width: /W:width sets the desired width in characters for the output text. Width may be from 40 to 512. If no /W:width is specified, the default is the console width if output is to the console, or 100 columns if output is redirected. (You can set an environment variable COLUMNS to change this default.) If you type just a /W without a colon or width, then the current console width is assumed; this is useful if you are redirecting WRAP’s output but want it wrapped to the console width anyway, e.g. for piping to LIST.
Text format: Text files use line-break characters in different ways. In some files, line-break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to wrap text to some desired width. You can use /F:n to tell WRAP how to handle line breaks. /F:1 indicates that the text is unformatted, with line breaks only at the ends of paragraphs. WRAP will honor all line breaks, and add an extra blank line after each paragraph. /F:2 means that the input text is prewrapped, having line breaks within paragraphs and even within sentences. WRAP will skip single line breaks, honoring only sequences of two or more in a row. /F:3 is also for unformatted text and acts like /F:1, but does not insert a blank line after each paragraph; use this option to wrap the output from DEHTML. If you specify /F:0 or do not specify any /F:n, WRAP will attempt to guess how the input text is formatted. (Guessing is not reliable when there isn’t much input text.)
Tab size: The /T:n option controls the expansion of tab characters. By default, tab stops are every four columns (set an environment variable TABSIZE to change this default). /T:8 would make tabs eight columns wide. /T:0 disables special handling of tab characters, treating them like any other character; this will probably bollix word-wrapping and is not recommended. n may be 0 to 20.
Breaking at hyphens: WRAP will usually break lines at spaces. It may also break a line after a hyphen, if all of the following are true: (1) the character before the hyphen is a letter, and the following character is either a letter or a digit; (2) at least three characters, not counting the hyphen, will remain at the end of the line; and (3) at least three characters will move to the start of the following line. So, for example, if the phrase true-blue fell near the end of a line, WRAP might break the line after the hyphen, since true and blue have four letters each. The phrases do-nothing and derring-do would not be divided, however, since splitting either one would leave the two-letter do on one side of the break. You can adjust this behavior with /N:n, which sets the minimum number of characters for both lines. If you specify /N:4, then at least four characters, not counting the hyphen, must remain on each line. /N:0 prevents WRAP from breaking lines after hyphens.
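The three conditions above can be expressed as a small predicate. This is a Python sketch, not WRAP's source. The function name can_break_after_hyphen is hypothetical. The sketch assumes that the character counts in conditions (2) and (3) are measured on the two fragments of the word itself.

```python
def can_break_after_hyphen(word, i, n=3):
    """Model of WRAP's hyphen test for the hyphen at word[i]; n mirrors /N:n."""
    if n == 0:                          # /N:0: never break after a hyphen
        return False
    if word[i] != "-" or i == 0 or i + 1 >= len(word):
        return False
    before, after = word[i - 1], word[i + 1]
    # (1) a letter before the hyphen; a letter or digit after it
    if not (before.isalpha() and (after.isalpha() or after.isdigit())):
        return False
    # (2) and (3): at least n characters, hyphen excluded, on each side
    return len(word[:i]) >= n and len(word[i + 1:]) >= n

for w in ["true-blue", "do-nothing", "derring-do"]:
    print(w, can_break_after_hyphen(w, w.index("-")))
```

With the default of three characters, only true-blue qualifies. Passing n=5 models /N:5, which would keep true-blue together as well.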
Removing hyphens: If /R is used, WRAP may discard a hyphen at the end of a line if the preceding character was a letter, and if the first character on the following line is also a letter. Without /R, WRAP retains all hyphens from line ends.
Forced indentation: The /G:n option forcibly indents each new paragraph n spaces (not tabs). Any indentation in the input text will be lost. n must be 0 to 20. /G:0 will strip all leading whitespace, leaving text flush with the left margin. The optional second value, if present, indents the second and later lines m spaces; m is also 0 to 20. You might use /G:0,4 to produce a hanging indent. If /G: is not specified, any indentation in the input text is preserved.
Condensing spaces: The /C:n option allows you to condense runs of consecutive spaces in the input text. Any sequence of more than n spaces will be truncated to n spaces. Only spaces (character 32) are counted, not other whitespace characters. Spaces generated by the program itself (e.g. by expanding tabs or indenting paragraphs) will not be condensed. n must be 0 to 10; if n is 0, spaces are not condensed (the default). This option might be useful if you want to pack output text just a little more tightly, if the original text file had extra spaces inserted to justify margins, or if you are one of those unfortunates who suffer a violent reaction to the sight of two spaces after a period.
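A minimal Python sketch of the /C:n rule as applied to input text. It assumes that an over-long run is cut to exactly n spaces. It does not model the exemption for spaces WRAP generates itself. The name condense_spaces is hypothetical.

```python
import re

def condense_spaces(text, n):
    """Model of /C:n: any run of more than n spaces is cut to n spaces."""
    if n == 0:                    # /C:0 (the default): leave spaces alone
        return text
    # Only ASCII spaces (character 32) are matched; tabs are untouched.
    return re.sub(" {" + str(n + 1) + ",}", " " * n, text)

print(condense_spaces("End.  Start.", 1))   # one space after the period
```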
Quotes replacement: /Q causes WRAP to replace generic ASCII apostrophes and quote marks ( ' and " ) with Unicode open and close quote marks ( ‘ ’ and “ ” ). The new quote marks may or may not look different from the originals, depending on how they are displayed and the font used. If the output is displayed in a non-Unicode font, the curly quotes will be lost or mangled. You can set some environment variables to control this feature.
Text encoding: WRAP automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /CP:n. Most single-byte (i.e., Western) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.
Forced line break: /Z:char defines a forced line-break character. char may be entered either as a single character, or as a decimal or hexadecimal (prefixed with 0x) character code. If a matching character is found in the input file or stream, WRAP will end the current line and begin a new one.
Disabling features:
/N with suboptions disables features:
/NB | do not write a Byte Order Mark |
/ND | do not search into hidden directories; only useful with /S |
/NH | do not add a hyphen when breaking a word |
/NJ | do not search into junctions; only useful with /S |
You can combine these, e.g. /NDJ.
These variables may be set to a numeric value to modify the command’s default behavior:
COLUMNS : | sets the default width when output is redirected and /W is not specified. Legal values are 40 to 512. |
TABSIZE : | sets the default number of columns between tab stops when /T is not specified. Legal values are 1 to 20. |
wrap /w:100 "Fishy Story.txt"
XFILTER — Process lines of a file using variable expansion.
Syntax:
XFILTER /A:attribs /B /CP:n /F:"format" /H /N /P /S /T filename…
/A: attribs | attributes mask; valid flags are -ACEHIORS |
/B | discard blank lines |
/CP: n | interpret non-Unicode input text using code page n |
/F:" format" | format string: required; see below |
/H | display filenames |
/N | disable features |
/P | page output |
/S | search in subdirectories for matching files |
/T | trim leading and trailing whitespace |
… | Range options are also supported. |
The required format string contains TCC variables and functions, which will be expanded for each line in the file. Double all percent signs to prevent variables from being expanded before the command is executed. An asterisk in the format string will be replaced with each line from the file. The current (zero-based) line number is also available in the variable _LINE.
XFILTER can be used as a filter reading from standard input, or it can read from one or more files specified on the command line. The resulting text is written to standard output; it can be piped or redirected. If you want to pipe to XFILTER, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory, or else use temporary files or an in-process pipe.
You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to process text from the clipboard instead of from a file.
To prevent problems caused by troublesome characters in the input text, certain ‘dangerous’ characters from the file will be temporarily replaced with safe alternatives from Unicode’s Halfwidth and Fullwidth Forms block. They will be restored to ASCII after variable expansion. This shuffle prevents issues when characters with special meanings to TCC are inadvertently present in the input text, but it might be confusing if you want to find or replace any of the remapped characters. The characters which are temporarily replaced are:
Character | ASCII | Hex | Remapped to |
" | 34 | 22 | U+FF02 |
% | 37 | 25 | U+FF05 |
( | 40 | 28 | U+FF08 |
) | 41 | 29 | U+FF09 |
, | 44 | 2C | U+FF0C |
[ | 91 | 5B | U+FF3B |
] | 93 | 5D | U+FF3D |
^ | 94 | 5E | U+FF3E |
` | 96 | 60 | U+FF40 |
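Every entry in the table is the ASCII code plus 0xFEE0; for example, 0x22 + 0xFEE0 = 0xFF02. The round trip can be sketched in Python under one assumption: the restore step maps exactly these nine fullwidth forms back to ASCII. The names protect and restore are hypothetical.

```python
# The nine characters XFILTER temporarily remaps (from the table above).
DANGEROUS = '"%(),[]^`'
OFFSET = 0xFEE0   # ASCII code + 0xFEE0 = Halfwidth and Fullwidth Forms code

def protect(line):
    """Swap dangerous ASCII characters for their fullwidth look-alikes."""
    return "".join(chr(ord(c) + OFFSET) if c in DANGEROUS else c for c in line)

RESTORE = {chr(ord(c) + OFFSET): c for c in DANGEROUS}

def restore(text):
    """Map the fullwidth forms back to ASCII after expansion. In this model,
    a fullwidth character that was already in the input is converted too."""
    return "".join(RESTORE.get(c, c) for c in text)

print([hex(ord(c)) for c in protect('"%`')])   # ['0xff02', '0xff05', '0xff40']
```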
rem Dump a file in uppercase:
xfilter /f:"%%@upper[*]" "Engine Summer.txt"
rem Display the length of each line:
xfilter /f:"Line %%_line has %%@len[*] characters." "Engine Summer.txt"
New Functions:
@B85TOBIN — Decodes a base-85 string into a binary buffer.
Syntax:
%@B85TOBIN[handle,start,string]
handle | the handle to a binary buffer, as returned by @BALLOC |
start | the offset in bytes at which to begin decoding; defaults to 0 |
string | a base-85 encoded string as returned by @BINTOB85 |
This function decodes a base-85 string returned by @BINTOB85 and stores the resulting data in a binary buffer. Note that there is no option to control the number of bytes written; the entire string is decoded and written to the buffer. If there is any error in decoding the string, no change will be made to the binary buffer.
Note that the two commas between parameters are both required. You must supply both commas even if you omit the optional start value.
The return value is the number of bytes written to the buffer.
See also: the @BINTOB85 function.
@BETWEEN — Returns the portion of a string between two delimiters.
Syntax:
%@BETWEEN[delims,string]
delims | exactly two characters, one start and one end delimiter |
string | the string to parse |
You generally do not need to quote or escape the delims string; the first two characters found are assumed to be the start and end delimiter characters, and the third must be a comma. (Exception: If you want to use a close bracket as a delimiter, escape it.) To use the same character as both start and end delimiter, type it twice.
The function returns the portion of string between the start and end delimiters. If the start delimiter is not found in the string, an empty string is returned. If the start delimiter occurs more than once, the first one found is used. If the start delimiter is found but the end delimiter is not, everything after the start delimiter is returned.
echo %@between[<>,This is <only> a test.]
only
echo %@between["",Let's parse out a "quoted chunk" of text.]
quoted chunk
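The rules for @BETWEEN can be modeled in a few lines of Python. This is a sketch, not the plugin's code, and the name between is hypothetical. The key detail is that the end delimiter is searched for only after the start delimiter. That is why a doubled character such as "" works as a delimiter pair.

```python
def between(delims, string):
    """Model of @BETWEEN: text after the first start delimiter, up to the
    next end delimiter (or to the end of the string if there is none)."""
    start, end = delims[0], delims[1]
    i = string.find(start)
    if i < 0:
        return ""                    # no start delimiter: empty result
    rest = string[i + 1:]
    # Search for the end delimiter only after the start delimiter.
    j = rest.find(end)
    return rest if j < 0 else rest[:j]

print(between("<>", "This is <only> a test."))   # only
```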
@BINTOB85 — Encodes the contents of a binary buffer as a base-85 string.
Syntax:
%@BINTOB85[handle,start,length]
handle | the handle to a binary buffer, as returned by @BALLOC |
start | the offset in bytes at which to begin encoding; defaults to 0 |
length | the number of bytes to encode; defaults to 128 or the remainder of the buffer |
This function encodes binary data (from a binary buffer) as a string which can be easily handled by TCC. You can store this string in an environment variable, write it to an .INI file, and so on. To restore the original binary data, use the @B85TOBIN function.
Four bytes of data are encoded into five characters; encoding a 1024-byte buffer will result in a 1,281-character-long string (counting the terminal null). Keep in mind that encoding long series of bytes will produce even longer strings! If you don’t specify a length, the default is 128 bytes or to the end of the buffer.
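The size arithmetic can be checked with a small Python sketch. Whole 4-byte groups become five characters each. How the plugin encodes a final partial group is not documented here. The sketch therefore assumes the common base-85 convention of emitting one more character than the number of leftover bytes. The name b85_length is hypothetical.

```python
def b85_length(nbytes):
    """Characters produced for nbytes of data (terminating null not counted)."""
    groups, leftover = divmod(nbytes, 4)
    # Assumption: k leftover bytes (1 to 3) encode to k + 1 characters.
    return groups * 5 + (leftover + 1 if leftover else 0)

print(b85_length(1024) + 1)   # 1281, counting the terminal null
print(b85_length(128))        # 160, for the default length of 128 bytes
```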
This implementation of base-85 differs from others. The set of characters used to encode binary data has been chosen to avoid syntactically troublesome signs like quotes, percent signs, ampersands, carets, and so on. All characters are ASCII, so the string should not be mangled by code page translations.
See also: the @B85TOBIN function.
@CLARIFY — Returns the original text mangled by @OBSCURE.
Syntax:
%@CLARIFY[obscured-text]
obscured-text | obfuscated text |
The input obscured-text should be a string returned by the @OBSCURE function; anything else is very unlikely to return meaningful text.
You probably should not write the restored value into an environment variable, an .INI file, or a registry value, or display it on the screen. Just use it immediately, plugging the @CLARIFY function directly into the command which requires the original text. (The ditzy little example below displays a password on the screen because it’s just a ditzy little example.)
set inifile="%userprofile\Passwords.ini"
set password=%@iniread[%inifile,Personal,Password]
echo Password: %@clarify[%password]
unset inifile password
See also: the @OBSCURE function.
@INIVALUE — Returns a value from an .INI file.
Syntax:
%@INIVALUE[filename,section,entry,index,errorstr,flags]
filename | the file to examine |
section | the name of the section to search for the entry |
entry | the name associated with the desired value |
index | which entry to return; defaults to 0 (the first); -1 returns the number of matching entries |
errorstr | the string to return on any error; defaults to nothing (the empty string) |
flags | a bitmapped integer controlling advanced features: |
1 — bomb out on file errors | |
2 — treat section as a wildcard to match | |
4 — treat entry as a wildcard to match |
This function is essentially @INIREAD without GetPrivateProfileString(). It can handle some things that @INIREAD can’t, such as UTF-8 .INI files, sectionless values, multiple values with the same name, and multiple headers for the same section.
You must specify the full name and extension of the filename. If you do not include a path, the file is assumed to be in the Windows directory, not in the current directory! To force this function to look in the current directory, begin the filename with .\ (dot backslash).
If you do not specify a section, the function will look for a matching entry before the first section header. If section is an asterisk, the function will look for a matching entry throughout the file, ignoring all section headers.
Sometimes an .INI file will contain multiple lines with the same entry name. For example, TCMD.INI may have more than one NormalKey directive. You can loop through multiple entries with the index argument. An index of 0 returns the first matching entry, 1 returns the second, and so on. Set index to -1 to return the number of matching entries.
The default behavior is to return an empty string on any error: file not found, access denied, or no matching section or entry. If you specify an errorstr, then that value will be returned instead. (This is useful if the .INI file can contain empty values.) Additionally, you can set flags to 1, and any error opening the file will result in an error message instead of returning a string value. You can also check the _INIVALUERC internal variable to get information about the last call to @INIVALUE.
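The section, index, and errorstr rules can be modeled with a short Python sketch. It scans the file's text directly rather than calling GetPrivateProfileString(). That is how values before the first header and repeated section headers become reachable. The sketch has these limitations and assumptions:
- Names are matched case-insensitively.
- ; comment lines are skipped.
- The flags argument is not modeled.
- ini_values, ini_value, and the sample data are hypothetical.

```python
def ini_values(text, section, entry):
    """All values for entry in section ('' = before first header, '*' = anywhere)."""
    current = ""                       # no section header seen yet
    values = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue                   # assumed: blank lines and ; comments skipped
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            continue
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        in_section = section == "*" or current.lower() == section.lower()
        if in_section and name.strip().lower() == entry.lower():
            values.append(value.strip())
    return values

def ini_value(text, section, entry, index=0, errorstr=""):
    """Hypothetical model of %@INIVALUE[...]: index -1 returns the match count."""
    values = ini_values(text, section, entry)
    if index == -1:
        return str(len(values))
    if 0 <= index < len(values):
        return values[index]
    return errorstr                    # no such entry: return the error string

sample = "top=1\n[Keys]\nNormalKey=Ctrl-A\n[Other]\nx=2\n[Keys]\nNormalKey=Ctrl-B\n"
print(ini_value(sample, "Keys", "NormalKey", 1))   # Ctrl-B
```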
See also: the _INIVALUERC
variable, which returns an exit code for this function.
@LINEENDS — Reports the line-end characters used in a text file.
Syntax:
%@LINEENDS[filename,n]
filename | the file to scan |
n | what to report: |
1 : the number of lines ending in CR/LF pairs | |
2 : the number of lines ending in LF/CR pairs | |
3 : the number of lines ending in CR not followed by LF | |
4 : the number of lines ending in LF not followed by CR | |
5 : the number of lines ending in NEL | |
10 : the total number of line-end sequences in the file |
If n is zero or not present, @LINEENDS returns a string describing the file’s format:
Empty | The file contains no data. |
None | No line-end characters were found. |
CR/LF | The file uses CR/LF line ends. |
LF/CR | The file uses LF/CR line ends. (Who does this?) |
CR | The file uses CR line ends. |
LF | The file uses LF line ends. |
NEL | The file uses NEL line ends. |
Mixed | The file uses more than one line-end sequence. |
ERROR | There was an error reading from the file. |
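A minimal Python sketch of how such a report could be computed. It works on already-decoded text and scans left to right. It pairs a CR with a following LF (or an LF with a following CR) before counting lone characters. That greedy pairing, and treating NEL as U+0085, are assumptions rather than a description of the plugin's scanner. The names count_line_ends and describe are hypothetical.

```python
KINDS = ["CR/LF", "LF/CR", "CR", "LF", "NEL"]

def count_line_ends(text):
    """Count each line-end sequence, pairing greedily from left to right."""
    counts = dict.fromkeys(KINDS, 0)
    i = 0
    while i < len(text):
        c = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if c == "\r" and nxt == "\n":
            counts["CR/LF"] += 1
            i += 2
            continue
        if c == "\n" and nxt == "\r":
            counts["LF/CR"] += 1
            i += 2
            continue
        if c == "\r":
            counts["CR"] += 1
        elif c == "\n":
            counts["LF"] += 1
        elif c == "\x85":             # NEL (NEXT LINE, U+0085)
            counts["NEL"] += 1
        i += 1
    return counts

def describe(text):
    """Rough equivalent of @LINEENDS with n omitted (no ERROR case)."""
    if not text:
        return "Empty"
    used = [k for k, v in count_line_ends(text).items() if v]
    if not used:
        return "None"
    return used[0] if len(used) == 1 else "Mixed"

print(describe("one\r\ntwo\r\n"), describe("one\ntwo\r\n"))   # CR/LF Mixed
```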
See also: the @TEXTENCODING and @TEXTFORMAT functions.
@METAPHONE — Returns a roughly phonetic code for an English word.
Syntax:
%@METAPHONE[word,length,flags]
word | the word or words to process |
length | the maximum length of the codes to return (8) |
flags | set to 1 for better compatibility |
Metaphone codes are meant to roughly approximate the pronunciation of a word. Words that sound similar should have similar Metaphone codes. You can use this function to compare the sounds of words, to suggest similar words, or to group words by pronunciation.
If you pass more than one word, separate them with spaces. The resulting codes will also be separated by spaces.
rem Compare two words:
set word1=cougher
set word2=coffer
if %@metaphone[%word1] == %@metaphone[%word2] echo "%word1" may sound like "%word2".
By default, this function returns Metaphone codes up to eight characters long. You can specify a different length with the length parameter, e.g. %@metaphone[word,10] to return ten-character Metaphone codes. Legal values are 4 to 20.
• Note: Values returned by this function are not guaranteed to match those generated by any other implementation. Documentation of the Metaphone algorithm is invariably unclear and self-contradictory, and never seems to agree with the corresponding code. This is my attempt to implement Lawrence Philips’s original algorithm to the best of my limited understanding, with a few additional tweaks thrown in.
More specifically, comparing against assertFull_v1.1.txt, dated 2011-11-25, by the Metaphone-standards project, @METAPHONE produces different codes for 40 out of 2753 words: about 98.5% agreement. If flags is set to 1, there are no mismatches, but I still cannot guarantee perfect agreement with any other implementation.
@MKENTITIES — Replaces characters in a string with HTML entities.
Syntax:
%@MKENTITIES[string]
@MKENTITIES will replace these characters with HTML entities:
Character: | Replaced with: |
" (double quote) | &quot; |
% (percent sign) | % |
& (ampersand) | &amp; |
< (less-than sign) | &lt; |
> (greater-than sign) | &gt; |
- ASCII control codes (U+0001 through U+001F, U+007F)
- C1 control codes (U+0080 through U+009F) — will be replaced with Windows-1252 characters if possible
- high-order characters (U+10000 and up)
• Note: This function can return ampersands in its output. You will need to quote it, or use SETDOS /C to temporarily change the command separator character.
@OBSCURE — Mangles a text string, making it difficult to read.
Syntax:
%@OBSCURE[text]
text | text to be obfuscated |
The input text should be reasonably short, preferably not more than a kilobyte or two. The resulting mangled string will be longer than the original string, usually by about one-third. The same input text can return different obfuscated text; you cannot meaningfully compare the output from two calls to @OBSCURE.
Do not edit or alter the returned text in any way.
If the input text comes from an environment variable, it’s probably a good idea to remove or overwrite that variable as soon as possible after calling @OBSCURE. One way to do this would be to simply store the returned string back in the original variable.
set inifile="%userprofile\Passwords.ini"
input /p Enter password: %%password
set password=%@obscure[%password]
set rv=%@iniwrite[%inifile,Personal,Password,%password]
unset inifile password
• Note: This function does not provide secure cryptography! It was designed for ease of use, not for real security. Using @OBSCURE to muddle text will discourage casual snooping, but a sophisticated user can recover the original data easily by passing the obscured text to @CLARIFY. (A determined attacker could also reverse-engineer the algorithm, although that would be a pointless waste of time when the plugin itself is readily available.)
See also: the @CLARIFY function.
@OINK — Translates text to Pig Latin.
Syntax:
%@OINK[text]
echo %@oink[This is only a test.]
See also: the OINK command, which Pig Latinizes text files.
@ROT13 — Transforms a string using ROT13.
Syntax:
%@ROT13[text]
echo %@rot13[This is only a test.]
See also: the ROT13 command, which encodes or decodes text files.
@ROUGHLYSIMILAR — Compares words in two text strings.
Syntax:
%@ROUGHLYSIMILAR[string1,string2]
string1 | the first string to compare |
string2 | the second string to compare |
Both strings are simplified before comparing them:
- Any leading or trailing spaces are removed.
- Both strings are forced to uppercase.
- Any Latin-1 accents are stripped, as per @STRIPACCENTS.
- Any - and _ are converted to spaces; all other punctuation is removed.
- All whitespace characters are converted to ASCII spaces.
- Any repeated spaces are collapsed into one.
After both strings have been simplified, they are compared. %@ROUGHLYSIMILAR returns 1 if the two strings match, 0 if they differ.
echo %@roughlysimilar[THIS IS A TEST!,This-is-a-test.]
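The simplification steps can be sketched in Python. Treat the sketch as an approximation, with these assumptions:
- ‘Punctuation’ means Unicode categories beginning with P.
- Latin-1 accents are stripped by taking the first character of each character's canonical decomposition.
- Accents are stripped before uppercasing, so that ÿ (whose uppercase form lies outside Latin-1) is still handled.
- The final trim is added.
- The helper names are hypothetical.

```python
import re
import unicodedata

def strip_latin1_accents(s):
    """Drop accents from U+00C0..U+00FF letters (Æ, Ø, ß etc. are kept as-is)."""
    return "".join(
        unicodedata.normalize("NFD", ch)[0] if "\u00c0" <= ch <= "\u00ff" else ch
        for ch in s
    )

def simplify(s):
    s = s.strip(" ")                              # leading/trailing spaces
    s = strip_latin1_accents(s).upper()           # accents, then uppercase
    out = []
    for ch in s:
        if ch in "-_":
            out.append(" ")                       # - and _ become spaces
        elif unicodedata.category(ch).startswith("P"):
            continue                              # other punctuation removed
        elif ch.isspace():
            out.append(" ")                       # any whitespace -> ASCII space
        else:
            out.append(ch)
    s = re.sub(" +", " ", "".join(out))           # collapse repeated spaces
    return s.strip(" ")                           # assumption: trim again

def roughly_similar(a, b):
    return 1 if simplify(a) == simplify(b) else 0

print(roughly_similar("THIS IS A TEST!", "This-is-a-test."))   # 1
```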
@STRIPACCENTS — Removes accents from letters.
Syntax:
%@STRIPACCENTS[text]
Only characters in the range U+00C0 through U+00FF, plus U+0152 and U+0153, will be replaced. (This function only recognizes a few accented characters, so it’s fast.)
echo %@stripaccents[Déjà vu]
@TEXTENCODING — Returns a guess at the character encoding of a text file.
Syntax:
%@TEXTENCODING[filename,flags]
filename | the file to examine |
flags | set to 1 to also report presence of a BOM |
If the file begins with a Unicode Byte Order Mark, then it is assumed to be Unicode; the encoding is inferred from the BOM. If the file does not begin with a BOM, the function can only guess at the encoding; the longer the file, the more likely the guess is to be accurate.
Possible return values include:
Empty | There is no data in the file. |
OEM | The file is probably not Unicode. |
UTF-16LE | The file is probably 16-bit Unicode. |
UTF-16BE | The file is probably 16-bit Unicode (big-endian). |
UTF-8 | The file is probably UTF-8 encoded Unicode. |
UTF-32LE | The file looks like UTF-32 (little-endian). |
UTF-32BE | The file looks like UTF-32 (big-endian). |
EBCDIC | The file is probably in some version of EBCDIC. |
If flags is 1, and if the file is Unicode and begins with a Byte Order Mark, the phrase with BOM will be appended.
set filename=myfile.txt
echo File %filename is %@textencoding[%filename].
See also: the @LINEENDS and @TEXTFORMAT functions.
@TEXTFORMAT — Returns a guess at the formatting of a text file.
Syntax:
%@TEXTFORMAT[filename]
filename | the file to examine |
Text files use line-break characters in different ways. In some files, line break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to limit text to a desired width. This function attempts to determine how the specified text file is formatted.
Possible return values include:
Empty | There is no text in the file. |
Unformatted | Line breaks are used to end paragraphs. |
Prewrapped | Line breaks are used to limit line width. |
set filename=myfile.txt
set format=%@textformat[%filename]
if %format == Unformatted echo File %filename is not word-wrapped.
See also: the @LINEENDS and @TEXTENCODING functions.
@UCHAR — Returns Unicode characters with the specified values.
Syntax:
%@UCHAR[value value…]
This function behaves like @CHAR, except that the input values are assumed to be hexadecimal. You may prefix values with 0x or U+, but neither is required. With or without either prefix, each value will be parsed as hexadecimal.
echo %@uchar[16a6 16d6 16eb 16bb 16a9 16d2 16d2 16c1 16cf]
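The parsing rule can be modeled in Python. This is a sketch, and the name uchar is hypothetical. Note that an unprefixed value such as 10 means 0x10, not decimal ten. The sketch assumes values are separated by spaces, as in the example above.

```python
def uchar(values):
    """Model of @UCHAR: space-separated hex values, optional 0x or U+ prefix."""
    chars = []
    for token in values.split():
        if token[:2].lower() == "0x" or token[:2].upper() == "U+":
            token = token[2:]               # the prefix is optional and ignored
        chars.append(chr(int(token, 16)))   # always parsed as hexadecimal
    return "".join(chars)

print(uchar("0x41 U+42 43"))   # ABC
```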
See also: the @UCODE and @UCODEX functions.
@UCODE — Returns the hexadecimal values of characters in a string.
Syntax:
%@UCODE[string]
This function behaves like @UNICODE, except that it returns values as hexadecimal (without any prefix). A few characters, including the backquote and the close square bracket, will need to be escaped.
echo %@ucode[This is a test.]
See also: the @UCHAR and @UCODEX functions.
@UCODEX — Returns the hexadecimal values of characters in a string.
Syntax:
%@UCODEX[string]
This function behaves like @UNICODE, except that it returns values as hexadecimal with a leading 0x. A few characters, including the backquote and the close square bracket, will need to be escaped.
echo %@ucodex[This is a test.]
See also: the @UCHAR and @UCODE functions.
@ULEN
— Returns the number of
Unicode characters in a string.
Syntax:
%@ULEN[
string]
This function is almost the same as @LEN
, except that it
counts properly-paired surrogates as a single character.
echo %@ulen[😺]
echo %@ulen[%@char[0xd83d 0xde00]]
Surrogates which are not properly paired will be counted as separate ‘characters’.
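The pairing rule can be sketched in Python by working on a list of UTF-16 code units, which is how TCC stores strings. The helper names are hypothetical. A high surrogate (U+D800 to U+DBFF) immediately followed by a low surrogate (U+DC00 to U+DFFF) counts once; every other unit, including an unpaired surrogate, counts as one character.

```python
def utf16_units(s):
    """Split a Python string into its UTF-16 code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def ulen(units):
    """Count characters, treating a properly paired surrogate as one."""
    count = i = 0
    while i < len(units):
        if (0xD800 <= units[i] <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            i += 2      # high surrogate followed by low surrogate
        else:
            i += 1      # ordinary unit, or an unpaired surrogate
        count += 1
    return count


print(ulen(utf16_units("a\U0001F63Ab")))  # 3
```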
@UQUOTES
— Replaces ASCII
apostrophes and quote marks with Unicode open and close quotes.
Syntax:
%@UQUOTES[
text]
text | English text containing apostrophes or quotation marks |
Generic ASCII apostrophes ( ' ) and quote marks ( " ) in text will be replaced with Unicode open and close quote marks ( ‘ ’ and “ ” ). Also, any doubled hyphens will be replaced with em dashes.
The modified string may or may not look different from the original, depending on how you use it and the font used to display it. If it is redirected to a file and //UnicodeOutput=No, then the fancy Unicode quotes will be smashed right back into ASCII. (Worse yet, under some versions of Windows the Unicode single open-quote character may be mangled to a grave accent….) If the modified string is ECHOed to the console and the console font doesn’t support the relevant Unicode characters, then again the Unicode quotes may be lost. In Take Command, curly quotes must be supported by both the tab-window font (Options / Configure Take Command / Tabs / Font) and also the underlying console window (detach a tab to check this).
echo %@uquotes["Never use a GUI to do a shell's work!" said Tom commandingly.]
You can set several environment variables to control this feature; see UQuotes Control Variables under Reference Info.
@VOWELS
— Returns the number
of vowels in a string.
Syntax:
%@VOWELS[
string]
string | the text to examine |
Only vowels in the Latin alphabet are counted: A, E, I, O, U, and Y. Accented variants in the range U+00C0 through U+00FF (Unicode’s Latin-1 Supplement) are also recognized.
echo %@vowels[Déjà vu]
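A minimal Python sketch of the counting rule, assuming that an accented letter in U+00C0 through U+00FF counts when its unaccented base letter is a vowel. Letters with no decomposition, such as Æ and Ø, are not counted by this sketch; whether the plugin counts them is not stated above.

```python
import unicodedata


def count_vowels(text):
    """Count A E I O U Y, plus their accented forms in U+00C0..U+00FF."""
    count = 0
    for ch in text:
        base = ch
        if 0xC0 <= ord(ch) <= 0xFF:
            # NFD splits e.g. "é" into "e" + combining acute; keep the base.
            base = unicodedata.normalize("NFD", ch)[0]
        if base in "AEIOUYaeiouy":
            count += 1
    return count


print(count_vowels("D\u00e9j\u00e0 vu"))  # 3
```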
New Variables:
_CHARACTERS
— Returns the
number of characters in the last file processed by
WORDS
.
Syntax:
%_CHARACTERS
This count does not include any Unicode byte-order mark at the beginning of
the file. If the WORDS
command has not been
called, or if there was any error reading the last file, this variable returns
the value N/A
.
_CHARACTERSALL
— Returns
the number of characters in all files processed by the last call to
WORDS
.
Syntax:
%_CHARACTERSALL
This count does not include any Unicode byte-order marks at the beginnings of
files. If the WORDS
command has not been
called, this variable returns the value N/A
.
_GETACP
— Returns the current
Windows code page.
Syntax:
%_GETACP
This variable returns the current Windows code page. (This value is also
traditionally miscalled the ‘ANSI code page’, although it has
nothing to do with ANSI.) Note that this value can and usually does differ
from the OEM code page returned by %_CODEPAGE
.
echo The current Windows code page is %_getacp.
_INIVALUERC
— Returns an
exit code for the last call to @INIVALUE
.
Syntax:
%_INIVALUERC
This variable returns a code indicating the success or failure of the last
call to the @INIVALUE
function, and the
nature of the error if it failed. Possible return values include:
(empty string) | @INIVALUE has not been called |
Syntax error | any error in arguments |
File error n | any error opening the file; n is a Windows error number |
File empty | the file contains no data |
Found n | a matching entry was found at line n |
Count n | successfully counted matching entries |
No section | no matching section header was found |
No entry n | no matching entry, or fewer than n entries found |
If the correct entry was found, the return value is
Found
n. The n
is the line number, starting from zero and not counting any blank lines.
See also: the @INIVALUE
function.
_LINES
— Returns the number of
lines in the last file processed by WORDS
.
Syntax:
%_LINES
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_LINESALL
— Returns the
number of lines in all files processed by the last call to
WORDS
.
Syntax:
%_LINESALL
If the WORDS
command has not been called,
this variable returns the value N/A
.
_LONGESTLINE
— Returns the
number of characters in the longest line of the last file processed by
WORDS
.
Syntax:
%_LONGESTLINE
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_LONGESTLINEALL
— Returns
the number of characters in the longest line in all files processed by the last
call to WORDS
.
Syntax:
%_LONGESTLINEALL
If the WORDS
command has not been called,
this variable returns the value N/A
.
_NONBLANKLINES
— Returns
the number of non-blank lines in the last file processed by
WORDS
.
Syntax:
%_NONBLANKLINES
A line which contains only whitespace characters such as spaces or tabs is
considered blank. Subtract %_NONBLANKLINES
from
%_LINES
to get the number of blank lines.
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_NONBLANKLINESALL
— Returns
the number of non-blank lines in all files processed by the last call to
WORDS
.
Syntax:
%_NONBLANKLINESALL
A line which contains only whitespace characters such as spaces or tabs is
considered blank. Subtract %_NONBLANKLINESALL
from
%_LINESALL
to get the number of blank
lines.
If the WORDS
command has not been called,
this variable returns the value N/A
.
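The line counts above relate to one another as in this Python sketch. It assumes lines are whatever str.splitlines() separates (so a single final line break does not start a new, empty line); the plugin's handling of line endings may differ.

```python
def line_stats(text):
    """Line, non-blank line, blank line and longest-line counts for a text."""
    lines = text.splitlines()
    nonblank = sum(1 for line in lines if line.strip())  # whitespace-only = blank
    return {
        "lines": len(lines),
        "nonblank": nonblank,
        "blank": len(lines) - nonblank,                  # _LINES minus _NONBLANKLINES
        "longest": max((len(line) for line in lines), default=0),
    }


print(line_stats("one\n  \t\nthree lines\n\n"))
# {'lines': 4, 'nonblank': 2, 'blank': 2, 'longest': 11}
```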
_PARAGRAPHS
— Returns the
number of paragraphs in the last file processed by
WORDS
.
Syntax:
%_PARAGRAPHS
A ‘paragraph’ is a line or series of lines which contains at
least one sentence. Divide %_SENTENCES
by %_PARAGRAPHS
to get the average paragraph length in sentences.
Divide %_SENTENCEWORDS
by %_PARAGRAPHS
to get the average paragraph length in words.
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_PARAGRAPHSALL
— Returns
the number of paragraphs in all files processed by the last call to
WORDS
.
Syntax:
%_PARAGRAPHSALL
A ‘paragraph’ is a line or series of lines which contains at
least one sentence. Divide %_SENTENCESALL
by %_PARAGRAPHSALL
to get the average paragraph length in sentences.
Divide %_SENTENCEWORDSALL
by %_PARAGRAPHSALL
to get the average paragraph length in words.
If the WORDS
command has not been called,
this variable returns the value N/A
.
_PASSWORD
— Returns a random
string suitable for use as a password.
Syntax:
%_PASSWORD
You can use the PASSWORD
command to
adjust the parameters used to generate the string.
_PROPERNOUNS
— Returns
the number of proper nouns in the last file processed by
WORDS
.
Syntax:
%_PROPERNOUNS
Counting proper nouns requires WORDS
to
build a vocabulary list for each file. If you disable this step with
/U:0
or /U:2
, the list will not be available and this
variable will return the value N/A
.
For the purposes of this plugin, a ‘proper noun’ is any word
which never appears in an all-lowercase form. If the
WORDS
command has not been called, or if
there was any error reading the last file, this variable returns the value
N/A
.
_PROPERNOUNSALL
— Returns
the number of proper nouns in all files processed by the last call to
WORDS
.
Syntax:
%_PROPERNOUNSALL
Counting proper nouns in all files requires WORDS
to build a vocabulary list for all files processed; this list is not built by
default. Unless you enable the omnibus vocabulary list with /U:2
or /U:3
, this variable will return the value N/A
.
For the purposes of this plugin, a ‘proper noun’ is any word
which never appears in an all-lowercase form. If the
WORDS
command has not been called, this
variable returns the value N/A
.
_SENTENCES
— Returns the
total number of sentences in the last file processed by
WORDS
.
Syntax:
%_SENTENCES
A ‘sentence’ is a word or series of words ending with a period,
exclamation mark, or question mark. Divide
%_SENTENCEWORDS
by
%_SENTENCES
to get the average sentence length.
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_SENTENCESALL
— Returns
the total number of sentences in all files processed by the last call to
WORDS
.
Syntax:
%_SENTENCESALL
A ‘sentence’ is a word or series of words ending with a period,
exclamation mark, or question mark. Divide
%_SENTENCEWORDSALL
by
%_SENTENCESALL
to get the average sentence length.
If the WORDS
command has not been called,
this variable returns the value N/A
.
_SENTENCESD
— Returns the
number of declarative sentences in the last file processed by
WORDS
.
Syntax:
%_SENTENCESD
A ‘declarative sentence’ is a word or series of words ending with a period.
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_SENTENCESDALL
— Returns
the number of declarative sentences in all files processed by the last call to
WORDS
.
Syntax:
%_SENTENCESDALL
A ‘declarative sentence’ is a word or series of words ending with a period.
If the WORDS
command has not been called,
this variable returns the value N/A
.
_SENTENCESE
— Returns the
number of exclamatory sentences in the last file processed by
WORDS
.
Syntax:
%_SENTENCESE
An ‘exclamatory sentence’ is a word or series of words ending with an exclamation mark.
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_SENTENCESEALL
— Returns
the number of exclamatory sentences in all files processed by the last call to
WORDS
.
Syntax:
%_SENTENCESEALL
An ‘exclamatory sentence’ is a word or series of words ending with an exclamation mark.
If the WORDS
command has not been called,
this variable returns the value N/A
.
_SENTENCESQ
— Returns the
number of interrogative sentences in the last file processed by
WORDS
.
Syntax:
%_SENTENCESQ
An ‘interrogative sentence’ is a word or series of words ending with a question mark.
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_SENTENCESQALL
— Returns
the number of interrogative sentences in all files processed by the last call to
WORDS
.
Syntax:
%_SENTENCESQALL
An ‘interrogative sentence’ is a word or series of words ending with a question mark.
If the WORDS
command has not been called,
this variable returns the value N/A
.
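The sentence definitions above can be modelled with a naive Python tally like this one. It treats each whitespace-separated token containing a letter or digit as a word, and ends a sentence at a token whose last character (ignoring closing quotes or brackets) is a period, exclamation mark, or question mark. Abbreviations such as “Mr.” will be miscounted; the plugin's actual rules may be more careful.

```python
import re

CLOSERS = "\"')]\u2019\u201d"          # closing quotes/brackets to ignore
KIND = {".": "declarative", "!": "exclamatory", "?": "interrogative"}


def sentence_stats(text):
    """Naive tally: a sentence is one or more words ending in '.', '!' or '?'."""
    stats = {"declarative": 0, "exclamatory": 0, "interrogative": 0, "words": 0}
    pending = 0  # words seen since the last sentence ended
    for token in text.split():
        core = token.rstrip(CLOSERS)
        if re.search(r"\w", core):
            pending += 1
        if pending and core and core[-1] in KIND:
            stats[KIND[core[-1]]] += 1
            stats["words"] += pending      # only words inside a sentence count
            pending = 0
    stats["sentences"] = sum(stats[k] for k in KIND.values())
    return stats


s = sentence_stats("Hello there. Who are you? Stop! trailing words")
print(s["sentences"], s["words"], s["words"] / s["sentences"])  # 3 6 2.0
```

The last line computes the average sentence length the same way as dividing %_SENTENCEWORDS by %_SENTENCES.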
_SENTENCEWORDS
— Returns
the total number of words in the last file processed by
WORDS
which are part of a recognized
sentence.
Syntax:
%_SENTENCEWORDS
A ‘sentence’ is a word or series of words ending with a period,
exclamation mark, or question mark. Divide %_SENTENCEWORDS
by
%_SENTENCES
to get the average sentence
length.
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_SENTENCEWORDSALL
— Returns
the total number of words in all files processed by the last call to
WORDS
which are part of a recognized
sentence.
Syntax:
%_SENTENCEWORDSALL
A ‘sentence’ is a word or series of words ending with a period,
exclamation mark, or question mark. Divide %_SENTENCEWORDSALL
by
%_SENTENCESALL
to get the average
sentence length.
If the WORDS
command has not been called,
this variable returns the value N/A
.
_TITLES
— Returns the number
of titles in the last file processed by WORDS
.
Syntax:
%_TITLES
A ‘title’ is a line or series of lines which contains one or more words, but no recognized sentences. It might actually be a title, subtitle, or chapter heading; or it might be a byline, date line, attribution, salutation, signature, line of poetry….
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_TITLESALL
— Returns the
number of titles in all files processed by the last call to
WORDS
.
Syntax:
%_TITLESALL
A ‘title’ is a line or series of lines which contains one or more words, but no recognized sentences. It might actually be a title, subtitle, or chapter heading; or it might be a byline, date line, attribution, salutation, signature, line of poetry….
If the WORDS
command has not been called,
this variable returns the value N/A
.
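The difference between a paragraph and a title can be sketched in Python as follows. This sketch assumes that blocks of lines are separated by blank lines, which may not match how the plugin groups lines, and it uses a simple regular expression to decide whether a block contains a sentence.

```python
import re

# A word ending in ., ! or ? (plus optional closing quotes/brackets),
# followed by whitespace or the end of the text.
SENTENCE_END = re.compile(r"\w\S*[.!?][\"')\]\u2019\u201d]*(?=\s|$)")
HAS_WORD = re.compile(r"\w")


def classify_blocks(text):
    """Return (paragraphs, titles) for blank-line-separated blocks of text."""
    paragraphs = titles = 0
    for block in re.split(r"\n\s*\n", text):
        if SENTENCE_END.search(block):
            paragraphs += 1          # at least one recognized sentence
        elif HAS_WORD.search(block):
            titles += 1              # words, but no sentences
    return paragraphs, titles


print(classify_blocks("Chapter One\n\nIt was a dark night. It rained!\n\nThe End\n"))  # (1, 2)
```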
_UNIQUEWORDS
— Returns
the number of unique words in the last file processed by
WORDS
.
Syntax:
%_UNIQUEWORDS
Counting unique words requires WORDS
to
build a vocabulary list for each file. If you disable this step with
/U:0
or /U:2
, the list will not be available and this
variable will return the value N/A
.
Words that differ only in case are counted as the same word. In the phrase “polish Polish furniture using Polish furniture polish”, this plugin will find only three ‘unique’ words.
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_UNIQUEWORDSALL
— Returns
the number of unique words in all files processed by the last call to
WORDS
.
Syntax:
%_UNIQUEWORDSALL
Counting unique words for all files requires WORDS
to build a vocabulary list for all files processed; this list is not built by
default. Unless you enable the omnibus vocabulary list with /U:2
or /U:3
, this variable will return the value N/A
.
Words that differ only in case are counted as the same word. In the phrase “polish Polish furniture using Polish furniture polish”, this plugin will find only three ‘unique’ words.
If the WORDS
command has not been called,
this variable returns the value N/A
.
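Both vocabulary-based counts, unique words and proper nouns, can be derived from one case-folded word list, as in this Python sketch. The word pattern is an assumption; the proper-noun test follows the definition given above (a word whose all-lowercase form never appears).

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+(?:['\u2019]\w+)*")   # assumed word pattern


def vocabulary_stats(text):
    """Return (unique_words, proper_nouns) for a piece of text."""
    spellings = defaultdict(set)              # lowercase key -> forms seen
    for word in WORD.findall(text):
        spellings[word.lower()].add(word)
    unique = len(spellings)                   # case-insensitive distinct words
    # Proper noun: the all-lowercase spelling never occurs in the text.
    proper = sum(1 for key, seen in spellings.items() if key not in seen)
    return unique, proper


print(vocabulary_stats("polish Polish furniture using Polish furniture polish"))  # (3, 0)
```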
_WC
— Returns the number of
contiguous series of non-blank characters in the last file processed by
WORDS
.
Syntax:
%_WC
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
• Note: Unlike the other
variables set by WORDS
, _WC
does include any
Byte Order Mark at the start of a file. A BOM will be treated as a non-blank
character, and therefore count as a ‘word’ unto itself if the
following character is whitespace. This, to my mind, is stupid behavior; a
leading BOM should either be ignored altogether, or else treated as whitespace.
I count it this way only for compatibility with certain ports of the Unix
wc
.
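In Python, str.split() with no arguments gives the same kind of wc-style count, and it too treats U+FEFF as a non-blank character, so this short sketch reproduces the BOM behavior described in the note. Python's idea of whitespace may not match the plugin's in every detail.

```python
def wc_words(text):
    """Count runs of non-whitespace characters, as wc -w does.

    str.split() does not treat U+FEFF as whitespace, so a leading BOM
    followed by a space becomes a 'word' of its own.
    """
    return len(text.split())


print(wc_words("\ufeff hello world"))  # 3
print(wc_words("\ufeffhello world"))   # 2
```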
_WCALL
— Returns the number of
contiguous series of non-blank characters in all files processed by the last
call to WORDS
.
Syntax:
%_WCALL
If the WORDS
command has not been called,
this variable returns the value N/A
.
_WORDFILES
— Returns the
number of files processed by the last call to WORDS
.
Syntax:
%_WORDFILES
_WORDS
— Returns the total
number of words in the last file processed by WORDS
.
Syntax:
%_WORDS
If the WORDS
command has not been called,
or if there was any error reading the last file, this variable returns the
value N/A
.
_WORDSALL
— Returns the
total number of words in all files processed by the last call to
WORDS
.
Syntax:
%_WORDSALL
If the WORDS
command has not been called,
this variable returns the value N/A
.
Reference Info:
Ranges | supported in many commands. |
Code Pages Supported | to interpret non-Unicode text. |
Character Escapes | |
UQuotes Control Variables | modify the translation of ASCII quotes to Unicode. |
Highlight Variable | to choose your colors. |
Startup Message | and how to disable it. |
Acknowledgments | |
Changes | slow march of progress, or just another bug hunt? |
Status and Licensing | |
Ranges:
This plugin supports the following range syntax:
Size range: /[S
smallest,
largest]
You may omit either smallest or largest. You may qualify either with a trailing letter: lowercase k, m, g, etc. to multiply by one thousand, one million, one billion, and so on; or uppercase K, M, G, etc. to multiply by 2^10, 2^20, 2^30, and so on. If largest begins with a + sign, it is an increment over smallest. Use
/![S
smallest,
largest]
to invert the test and return only files not in the given size range.
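The size arithmetic can be checked with a small Python sketch. The accepted suffix letters (k, m, g, t and their uppercase forms) and the defaults for an omitted bound (0 for smallest, no limit for largest) are assumptions; the “etc.” above suggests the plugin accepts more letters.

```python
import re

DECIMAL = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}   # lowercase: powers of 1000
BINARY = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}     # uppercase: powers of 1024


def parse_size(text):
    """Parse a size such as 500, 64K or 2m into a byte count."""
    match = re.fullmatch(r"(\d+)([kmgtKMGT]?)", text.strip())
    if not match:
        raise ValueError(f"bad size: {text!r}")
    number, suffix = int(match.group(1)), match.group(2)
    return number * (DECIMAL.get(suffix) or BINARY.get(suffix) or 1)


def size_range(smallest, largest):
    """Resolve /[Ssmallest,largest]; returns (low, high), high None if open."""
    low = parse_size(smallest) if smallest else 0
    if largest.startswith("+"):
        high = low + parse_size(largest[1:])   # increment over smallest
    elif largest:
        high = parse_size(largest)
    else:
        high = None                            # no upper bound
    return low, high


print(size_range("1K", "+512"))  # (1024, 1536)
print(size_range("", "2m"))      # (0, 2000000)
```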
Date range: /[D
[acw]:
earliest,
latest]
You may omit either earliest or latest; either defaults to the current date. The optional [acw] argument selects the date stamp to check: a for last access, c for creation, or w for last write. (If you want to check more than one date stamp, you must supply more than one date range option.) The colon after the [acw] is optional.
Dates may be given in the local date format, or in yyyy-mm-dd format (with a four-digit year). You may also specify a date as an offset preceded by a + or - sign; the offset is in days relative to today’s date (for earliest) or relative to earliest (for latest). If earliest turns out to be later than latest, the two are exchanged.
You may also give a specific time on either date, preceded
by an @
sign. The time may be in either 24-hour format, or 12-hour
format with a trailing A
or P
.
Use /![D
[acw]:
earliest,
latest]
to invert the test and return only files not in the given date range.
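A Python sketch of how earliest and latest might be resolved under the rules above. Only yyyy-mm-dd dates and day offsets are handled; local date formats and @time suffixes are left out, and today is passed in as a parameter so the result is reproducible.

```python
import datetime as dt


def resolve(spec, base):
    """Turn 'yyyy-mm-dd' or a +/-days offset from base into a date."""
    if not spec:
        return None
    if spec[0] in "+-":
        return base + dt.timedelta(days=int(spec))
    return dt.date.fromisoformat(spec)


def date_range(earliest, latest, today):
    start = resolve(earliest, today) or today      # offset from today
    end = resolve(latest, start) or today          # offset from earliest
    if start > end:
        start, end = end, start                    # out of order: exchange them
    return start, end


start, end = date_range("-7", "+3", today=dt.date(2024, 11, 5))
print(start, end)  # 2024-10-29 2024-11-01
```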
Time range: /[T
[acw]:
earliest,
latest]
You may omit either earliest or
latest.
The optional [acw]
argument selects the time stamp to check. (If you want to check more than one
time stamp, you must supply more than one time range option.) The colon after
the [acw]
is optional. Times may be in either 24-hour format, or 12-hour format with a
trailing A
or P
.
Use /![T
[acw]:
earliest,
latest]
to invert the test and return only files not in the given time range.
Exclusion range: /[!
wildspec]
Filenames matching the wildspec will be excluded. You can supply more than one wildspec by separating them with (unquoted) spaces.
Owner range: /[O
wildspec]
Files whose owners (in domain\user format) do not match the wildspec will be
skipped. Use /![O
wildspec]
to invert the test and return only files which do not match the owner
wildspec.
Description range: /I
wildspec or (alternate syntax) /[I
wildspec]
If a file’s description does not match the wildspec,
it will be skipped. Use /!I
wildspec
to invert the test, returning only files which do not match the description
wildspec.
Day-of-the-week range: /[W
[acw]:
days]
You may specify multiple days separated by commas, e.g. /[W:MON,WED,FRI]. You can also give a range, for example /[W:TUE-FRI]. WEEKENDS is accepted as a synonym for SAT,SUN; WEEKDAYS is a synonym for MON-FRI.
The colon in this syntax is required.
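Day lists and ranges could be expanded into weekday numbers as in this Python sketch (Monday = 0, matching Python's date.weekday()). Whether the plugin accepts wrap-around ranges such as FRI-MON is not stated, so this sketch simply rejects them.

```python
DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]   # index = date.weekday()
SYNONYMS = {"WEEKENDS": "SAT,SUN", "WEEKDAYS": "MON-FRI"}


def parse_days(spec):
    """Expand a spec such as 'MON,WED,FRI', 'TUE-FRI' or 'WEEKENDS'."""
    result = set()
    for part in spec.upper().split(","):
        for item in SYNONYMS.get(part, part).split(","):
            if "-" in item:
                first, last = (DAYS.index(day) for day in item.split("-"))
                if first > last:
                    raise ValueError(f"wrap-around range not handled: {item}")
                result.update(range(first, last + 1))
            else:
                result.add(DAYS.index(item))
    return result


print(sorted(parse_days("TUE-FRI")))   # [1, 2, 3, 4]
print(sorted(parse_days("WEEKENDS")))  # [5, 6]
```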
You may supply multiple ranges. A file must match all given ranges or it will be skipped.
Code Pages Supported:
Many of the commands in this plugin offer a /CP:
n
option to specify a code page. The value determines how non-ASCII characters
in non-Unicode files are interpreted. This option does not affect Unicode files
or ASCII characters. The following code pages are supported:
number | name | number | name |
---|---|---|---|
1252 | Latin I | 775 | Baltic (OEM) |
1250 | Central Europe | 850 | Multilingual Latin I (OEM) |
1251 | Cyrillic | 852 | Latin II |
1253 | Greek | 855 | Cyrillic (OEM) |
1254 | Turkish | 857 | Turkish (OEM) |
1255 | Hebrew | 858 | Latin I with Euro sign (OEM) |
1256 | Arabic | 862 | Hebrew (OEM) |
1257 | Baltic | 866 | Russian (OEM) |
1258 | Vietnam | 874 | Thai |
437 | United States (OEM) | 10000 | Mac OS Roman |
720 | Arabic (OEM) | 20866 | KOI8-R |
737 | Greek (OEM) | 21866 | KOI8-U |
A or ANSI | the current Windows code page | | |
O or OEM | the current OEM code page | | |
The default is the current Windows code page.
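To see why the code page matters, this Python sketch decodes non-ASCII bytes with different code pages. The table mapping plugin numbers to Python codec names covers only a few entries and is hypothetical, and the sketch's default of 1252 merely stands in for “the current Windows code page”.

```python
# Hypothetical mapping from a few of the plugin's code-page numbers
# to the names of Python's built-in codecs.
CODECS = {437: "cp437", 850: "cp850", 866: "cp866", 1250: "cp1250",
          1251: "cp1251", 1252: "cp1252", 10000: "mac_roman",
          20866: "koi8_r", 21866: "koi8_u"}


def decode_with_cp(data, cp=1252):
    """Interpret non-Unicode bytes using the given code page."""
    return data.decode(CODECS[cp])


# The same word needs a different byte for "é" in each code page.
print(decode_with_cp(b"caf\xe9", 1252) == decode_with_cp(b"caf\x82", 437))  # True
```

ASCII bytes decode identically under every code page in the table, which is why the option has no effect on pure ASCII text.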
Character Escapes:
These may be used in CHARENCODING
with the /X
option.
Escape: | Expands to: | Example: |
---|---|---|
\b | backspace | |
\e | ASCII escape (27 decimal) | |
\k | grave accent | |
\n | newline | |
\p | percent sign | |
\q | double quote | |
\r | carriage return | |
\t | ASCII horizontal tab | |
\u xxxx | Unicode character, up to U+FFFF | \u03a3 → Σ |
\U xxxxxxxx | Unicode character, up to U+10FFFF | \U1f63a → 😺 |
\ nnn | octal value, up to 777 | \101 → A |
\x nnnn | hexadecimal value, up to FFFF | \x41 → A |
\# nnnnn | decimal value, up to 65535 | \#65 → A |
\\ | backslash |
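The escape rules in the table can be approximated with a regular expression, as in this Python sketch. It assumes each numeric escape consumes as many digits as it can, up to the limit shown (so \x41BC would be read as U+41BC rather than A followed by BC), and that \n expands to a single line feed; the plugin's exact behavior may differ.

```python
import re

SIMPLE = {"b": "\b", "e": "\x1b", "k": "`", "n": "\n", "p": "%",
          "q": '"', "r": "\r", "t": "\t", "\\": "\\"}

ESCAPE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{1,4})|U([0-9A-Fa-f]{1,8})|x([0-9A-Fa-f]{1,4})"
    r"|#([0-9]{1,5})|([0-7]{1,3})|(.))",
    re.S,
)


def expand_escapes(s):
    """Expand C-style escapes like those accepted by CHARENCODING /X."""
    def replace(m):
        u4, u8, hx, dec, octal, other = m.groups()
        if u4 or u8 or hx:
            code = int(u4 or u8 or hx, 16)     # \u, \U and \x are hexadecimal
        elif dec:
            code = int(dec)                    # \# is decimal
        elif octal:
            code = int(octal, 8)               # \nnn is octal
        else:
            return SIMPLE.get(other, m.group(0))   # unknown escapes left alone
        return chr(code) if code <= 0x10FFFF else m.group(0)
    return ESCAPE.sub(replace, s)


print(expand_escapes(r"\x41\#66\103"))  # ABC
```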
UQuotes Control Variables:
The following environment variables specify a Unicode character used to
replace an ASCII character in the @UQUOTES
function, or in several commands when /Q
is used. The value of
the variable may be a single character; a decimal value 32 through 65533; or a
hexadecimal value 0x20 through 0xFFFD.
OPENQUOTE : | replaces the ASCII double-quote ( " ) at the start of a quotation; the default value is 0x201C ( “ ). |
CLOSEQUOTE : | replaces the ASCII double-quote ( " ) at the end of a quotation; the default is 0x201D ( ” ). |
OPENSQUOTE : | replaces the ASCII apostrophe ( ' ) at the start of a quotation; the default is 0x2018 ( ‘ ). |
CLOSESQUOTE : | replaces the ASCII apostrophe ( ' ) at the end of a quotation; the default is 0x2019 ( ’ ). |
APOSTROPHE : | replaces the ASCII apostrophe ( ' ) within a word; the default is 0x2019 ( ’ ). |
'OKINA : | replaces the ASCII apostrophe ( ' ) between two vowels; the default is 0x2018 ( ‘ ). |
PRIME : | replaces the ASCII apostrophe ( ' ) after a number; the default is 0x27 ( ' ). |
DOUBLEPRIME : | replaces the ASCII double-quote ( " ) after a number; the default is 0x22 ( " ). |
EMDASH : | replaces pairs of ASCII hyphens ( - ); the default is 0x2014 |
Note that the variable name 'OKINA
begins, ironically enough, with
an apostrophe. To disable ‘okinas, SET 'OKINA=0X2019
(or the same value as the apostrophe).
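One plausible reading of the apostrophe rules in the table, sketched in Python: a digit before the apostrophe selects PRIME, vowels on both sides select 'OKINA, letters on both sides select APOSTROPHE, and otherwise the position decides between an opening and a closing quote. The order of the checks is an assumption, not taken from the plugin.

```python
VOWELS = set("aeiouAEIOU")


def classify_apostrophe(text, i):
    """Name the UQuotes variable that would replace the ' at text[i]."""
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    if before.isdigit():
        return "PRIME"                  # 5' (feet, minutes)
    if before in VOWELS and after in VOWELS:
        return "'OKINA"                 # Hawai'i
    if before.isalpha() and after.isalpha():
        return "APOSTROPHE"             # don't
    if not before or before.isspace():
        return "OPENSQUOTE"             # start of a quotation
    return "CLOSESQUOTE"                # end of a quotation


print(classify_apostrophe("Hawai'i", 5), classify_apostrophe("don't", 3))  # 'OKINA APOSTROPHE
```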
These environment variables control the interpretation of some old-fashioned ASCII text conventions:
UQUOTES_DOUBLES : | set to 0 to prevent replacing doubled apostrophes with quotes |
UQUOTES_GRAVES : | set to 0 to prevent replacing grave accents with open quotes |
For example:
rem Use guillemets for quotations:
set openquote=0xab
set closequote=0xbb
echo %@uquotes["Sacré bleu!" he exclaimed.]
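The values in the example above are hexadecimal. A Python sketch of how such a setting might be interpreted, following the rule stated at the start of this section (a single character, a decimal value 32 through 65533, or a hex value 0x20 through 0xFFFD); the function name is hypothetical.

```python
def parse_quote_setting(value):
    """Return the replacement character named by an UQuotes variable value."""
    value = value.strip()
    if len(value) == 1:
        return value                              # a literal character
    base = 16 if value.lower().startswith("0x") else 10
    code = int(value, base)                       # int() accepts the 0x prefix in base 16
    if not 0x20 <= code <= 0xFFFD:                # 32..65533 == 0x20..0xFFFD
        raise ValueError(f"value out of range: {value}")
    return chr(code)


print(parse_quote_setting("0xab") == "\u00ab", parse_quote_setting("8220") == "\u201c")  # True True
```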
Highlight Variable:
Several of the commands in the plugin feature highlighted output. You can
customize this feature by setting an environment variable Highlight
:
rem Disable highlight:
set highlight=none
rem Set the highlight foreground:
set highlight=bright cyan
rem Set both foreground and background:
set highlight=bri whi on blu
rem Numbers are also supported:
set highlight=46
If the Highlight
environment variable is not defined, the plugin will
check the registry for a value named Highlight
of type REG_SZ
.
The plugin will search, in this order:
• HKEY_CURRENT_USER\Software\JPPlugins\TextUtils | (affects this plugin only) |
• HKEY_CURRENT_USER\Software\JPPlugins | (affects several of my plugins) |
Many commands also have a /D
or /NC
option to
disable highlighting.
Startup Message:
This plugin displays an informational line when it initializes. The
message will be suppressed in transient or pipe shells. You can disable it
for all shells by defining an environment variable named NOLOADMSG
,
for example:
set /e /u noloadmsg=1
Acknowledgments:
The original Metaphone algorithm is by Lawrence Philips. The variant implemented in this plugin is my own adaptation (improvement? perversion?). Blame me, not him, for its peculiarities.
Changes:
Version: | Date: | Changes: |
---|---|---|
0.85.2.3 | 2024-11-05 | Bug fix: PLUGIN_BUFFER_MAX is 32K bytes, not 32K characters. |
0.85.2.2 | 2024-10-02 | ParseInt() now supports octal with a leading 0o. |
0.85.2 | 2024-09-03 | StringToUnicode() and f_uchar() use PLUGIN_BUFFER_MAX for the buffer size. CHARENCODING and @UCHAR allow octal values prefixed with 0o. CHARENCODING adds /N for character names. Other tweaks and code cleanup. |
0.85.1 | 2024-08-08 | UTYPE now supports high-order
Unicode characters in /X hex mode. |
0.85.0.3 | 2024-08-07 | FileHandler.cpp v1.0.15.0, NewHelp.cpp v1.0.8.14. |
0.85.0.2 | 2024-03-26 | Minor tweak to support nested directory aliases. |
0.85.0 | 2024-01-05 | Updated to conlist.cpp v1.1 to better support Ctrl-C and Ctrl-Break. Tweaked UTF-16 detection for very small files. |
0.84.0 | 2023-10-17 | DEHTML no
longer smashes whitespace inside <PRE> blocks. |
0.83.0.2 | 2023-10-16 | Tweaked ShowCmdHelp() to report
VER_PATCH. |
0.83.0.1 | 2023-10-12 | Updated the plugin’s web address. |
0.83.0 | 2023-07-28 | Changed DEHTML,
@MKENTITIES, and COPYCHARS to use
HtmlEntities.cpp. Now they should support all HTML 4 entities. Updated
CHARENCODING to the version in
UChars, and documented it —
CHARENCODING was somehow missing from the doc files. Lots of additional bug fixes, code tweaks,
and doc improvements. |
0.82.6 | 2023-07-24 | Updated to the current versions of ParseArgs.cpp, NewHelp.cpp, conlist.cpp, FileHandler.cpp, MMFiles.cpp, and codepages.cpp. |
0.82.5 | 2022-06-09 | Minor tweak to @STRIPACCENTS:
Now Æ æ Œ œ are replaced with AE ae OE oe. |
0.82.4 | 2021-10-20 |
Status and Licensing:
Consider this beta software. It may well have issues. Try it at your own risk. If you do find a problem, you can report it in the JP Software support forum.
TextUtils is currently licensed only for testing purposes. I may make binaries and source code available under some free license once I consider it ready for use.
Download:
You can download the current version of the plugin from https://charlesdye.net/dl/textutils.zip.