TextUtils plugin for Take Command / TCC / TCC/LE

beta version 0.85.0.3 2024-04-02

Charles Dye

Purpose:

This plugin implements a variety of text-related features. There are new commands to count words, sentences, and paragraphs in English text; find words in text and display them in context; replace words in text; generate random passwords; display the lines of a text file in reverse order; wrap text to a desired width; and save an entire array to disk and reload it later. New functions allow you to obscure text to make it unreadable, and restore it later; determine the character encoding and text format of text files; generate Metaphone codes; remove accents from text strings; and count vowels in a string.

Installation:

To use this plugin, copy TextUtils.dll and TextUtils.chm to some known location on your hard drive. (If you are still using the 32-bit version of Take Command, take TextUtils-x86.dll instead of TextUtils.dll.) Load the plugin with a PLUGIN /L command, for example:

plugin /l c:\bin\tcmd\test\textutils.dll

If you copy these files to a subdirectory named PlugIns within your Take Command program directory, the plugin will be loaded automatically when TCC starts.

Plugin Features:

New commands:
`CHARENCODING`	`CLIP2TEXT`	`CONTEXT`	`COPYCHARS`	`COUNTCHARS`
`DEDUP`	`DEGAS`	`DEHTML`	`FFIELDS`	`FILTERFILES`
`LOADARRAY`	`OINK`	`PARSEARGS`	`PASSWORD`	`RECASE`
`REPLACETEXT`	`ROT13`	`SAVEARRAY`	`SHUFFLE`	`TEXT2CLIP`
`TEXTUTILSHELP`	`UNICODIFY`	`UPEND`	`UTYPE`	`WORDS`
`WRAP`	`XFILTER`
New functions:
`@B85TOBIN`	`@BETWEEN`	`@BINTOB85`	`@CLARIFY`	`@INIVALUE`
`@LINEENDS`	`@METAPHONE`	`@MKENTITIES`	`@OBSCURE`	`@OINK`
`@ROT13`	`@ROUGHLYSIMILAR`	`@STRIPACCENTS`	`@TEXTENCODING`	`@TEXTFORMAT`
`@UCHAR`	`@UCODE`	`@UCODEX`	`@ULEN`	`@UQUOTES`
`@VOWELS`
New variables:
`_CHARACTERS`	`_CHARACTERSALL`	`_GETACP`	`_INIVALUERC`	`_LINES`
`_LINESALL`	`_LONGESTLINE`	`_LONGESTLINEALL`	`_NONBLANKLINES`	`_NONBLANKLINESALL`
`_PARAGRAPHS`	`_PARAGRAPHSALL`	`_PASSWORD`	`_PROPERNOUNS`	`_PROPERNOUNSALL`
`_SENTENCES`	`_SENTENCESALL`	`_SENTENCESD`	`_SENTENCESDALL`	`_SENTENCESE`
`_SENTENCESEALL`	`_SENTENCESQ`	`_SENTENCESQALL`	`_SENTENCEWORDS`	`_SENTENCEWORDSALL`
`_TITLES`	`_TITLESALL`	`_UNIQUEWORDS`	`_UNIQUEWORDSALL`	`_WC`
`_WCALL`	`_WORDFILES`	`_WORDS`	`_WORDSALL`

Syntax Note:

The syntax definitions in the following text use these conventions for clarity:

`BOLD CODE`	indicates text which must be typed exactly as shown.
`CODE`	indicates optional text, which may be typed as shown or omitted.
Bold italic	names a required argument; a value must be supplied.
Regular italic	names an optional argument.
ellipsis…	after an argument means that more than one may be given.

New Commands:

CHARENCODING — Show UTF-16 and UTF-8 encodings for characters.

Syntax:
CHARENCODING /16 /8 /C /D /K /X value "string" …

`/16`	show UTF-16 encoding
`/8`	show UTF-8 encoding
`/C`	show characters
`/D`	show decimal values
`/K`	show character class
`/X`	expand C-style character escapes in quoted strings
value	hex character value; leading `0x` or `U+` is optional
`"`string`"`	string literal between quotes

You may enter characters as quoted string literals, character values, HTML 4 character entities, or any combination. You may prefix hex values with 0x or U+ but neither is required. With or without either prefix, hexadecimal is assumed. Separate values with spaces. If you specify neither /16 nor /8, the default is to show both.
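As a rough illustration of what the /16 and /8 displays correspond to, here is a Python sketch (not the plugin's own code). The helper names `parse_value` and `encodings` are hypothetical. It parses a value the way the paragraph above describes (optional 0x or U+ prefix, hexadecimal assumed) and returns the UTF-16 code units and UTF-8 bytes for that code point.

```python
def parse_value(token):
    """Parse a character value; a leading 0x or U+ is optional, hex is assumed."""
    t = token.upper()
    if t.startswith("0X") or t.startswith("U+"):
        t = t[2:]
    return int(t, 16)


def encodings(code_point):
    """Return (UTF-16 code units, UTF-8 bytes) as lists of uppercase hex strings."""
    ch = chr(code_point)
    utf16 = ch.encode("utf-16-be")          # big-endian, so units read naturally
    units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
    utf8 = ["%02X" % b for b in ch.encode("utf-8")]
    return units, utf8


if __name__ == "__main__":
    for token in ("41", "0xE9", "U+1F603"):
        print(token, encodings(parse_value(token)))
```

A code point above U+FFFF yields two UTF-16 code units (a surrogate pair) and four UTF-8 bytes. Python refuses to encode a lone surrogate such as D800, so this sketch does not cover the H and L classes that /K reports.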

/K displays a one-letter code to indicate the type of character:

K	Class
`A`	alphabetic
`D`	digit
`P`	punctuation
`W`	whitespace
`C`	control character
`B`	Byte Order Mark
`N`	noncharacter
`H`	unpaired surrogate (high) — not a character
`L`	unpaired surrogate (low) — not a character
`-`	anything else

/X expands any escapes in quoted strings after the /X on the command line. Strings before the /X will not be expanded.

charencoding /c "Hello, world. %@smiley[56]"

CLIP2TEXT — Copy text from the clipboard to a file or standard output.

Syntax:
CLIP2TEXT /A /NB /O /P /T /UTF8 /UTF16 filename

`/A`	append to an existing file
`/NB`	do not write a Byte Order Mark
`/O`	overwrite an existing file
`/P`	page output (useful only when output is to stdout)
`/T`	quietly
`/UTF8`	write file in the UTF-8 encoding
`/UTF16`	write file in the UTF-16 encoding (default)

Only one filename is allowed. If no filename is specified, CLIP2TEXT will dump the clipboard to standard output.

See also: the TEXT2CLIP command.

CONTEXT — Search for words in English text and display them in context.

Syntax:
CONTEXT /A:attribs /C:n /CP:n /F:n /H:n /K:n /P /N /S /V /W:base /X:word /Y:word filename…

`/A:`attribs	attributes mask; valid flags are `-ACEHIORS`
`/C:`n	specifies the number of sentences of context to display, before and after
`/CP:`n	interpret non-Unicode input text using code page n
`/F:`n	specifies the format of the input text; n is one of:
	`0` — best guess (default)
	`1` — unformatted (line breaks are used only to end paragraphs)
	`2` — prewrapped (line breaks are used to wrap text)
	`3` — unformatted, with blank lines between paragraphs
`/H:`n	set highlight colors for matching words
`/K:`n	output columns for word-wrap
`/P`	page output
`/N`	disable features
`/S`	search in subdirectories for matching filenames
`/V`	verbose; report counts of found items after each file and at the end
`/W:`base	search for forms of a word
`/W:"`base base…`"`	search for a series of word forms
`/X:`word	search for an exact word
`/X:"`word word…`"`	search for a series of exact words
`/Y:`word	search for words that sound like word
…	Range options are also supported.

CONTEXT can read from disk files or from a pipe. If you want to pipe to CONTEXT, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to search for words on the clipboard.

Note: This command was created specifically to search through English text. I make many Anglocentric assumptions about what constitutes a ‘word’, a ‘sentence’, a ‘paragraph’, ‘forms’ of a word, and so on. These assumptions are probably not useful for any other language.

Word search: /W:base searches for forms of a word; this will probably be your most frequently-used option. Specify the base form of a word, and CONTEXT will attempt to match variations of it. For example, /W:DOG will match dog, dogs, dog’s, doggy, and even doggedly.

A word in the input text is considered a ‘form’ of the specified base word if (1) the beginning matches for the entire length of base, except that a final Y at the end of the base word will match an I in the word from the text; and (2) the remainder of the word does not contain more than one vowel other than Y. Case is not significant, and most common accents are ignored; /W:garcon will match garçon, /W:"deja vu" will match Déjà vu, and so on.
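The two-part rule above can be sketched in Python. This is only an illustration of the rule as stated, not the plugin's code. The names `strip_accents` and `is_form` are hypothetical, and removing accents via Unicode decomposition is an assumption about how "most common accents are ignored".

```python
import unicodedata

VOWELS = set("aeiou")  # Y is deliberately excluded, per rule (2)


def strip_accents(text):
    """Assumption: drop combining marks so 'garçon' compares equal to 'garcon'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_form(base, word):
    """True if word counts as a 'form' of base under rules (1) and (2)."""
    base = strip_accents(base).lower()
    word = strip_accents(word).lower()
    if len(word) < len(base):
        return False
    head, rest = word[:len(base)], word[len(base):]
    if head != base:
        # Rule (1) exception: a final Y in the base may match an I in the word.
        y_to_i = (base.endswith("y")
                  and head[:-1] == base[:-1]
                  and head[-1] == "i")
        if not y_to_i:
            return False
    # Rule (2): at most one vowel (other than Y) in the remainder.
    return sum(ch in VOWELS for ch in rest) <= 1


if __name__ == "__main__":
    for w in ("dogs", "doggedly", "dogmatic"):
        print(w, is_form("dog", w))
```

Under this reading, "dogmatic" is rejected because its remainder "matic" contains two vowels, while "doggedly" passes because "gedly" contains only one vowel besides Y.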

If a word from the input text contains a hyphen, the /W: search will also look for the specified base word to either side of the hyphen; /W:LEVEL will match level-headed, sub-level, and even poorly-levelled.

Word series: You can search for a series of words with /W:"base base…". To match, a series of words must appear within the same sentence in the input text; a word series cannot span the end of a sentence. Matching words must be consecutive, and may be separated by spaces, tabs, or other punctuation. CONTEXT will check for forms of each base word as above, but will not look for the base within hyphenated words. For instance, /W:"LITTLE OLD LADY" will match little, old ladies.

Exact-word search: /X:word searches for a word without checking for variant forms. /X: does not look for the specified word within hyphenated words. Case and accents are still ignored. You can search for a series of exact words with /X:"word word…".

Sound-alike search: /Y:word searches for words which sound similar to the specified word. The comparison uses a Metaphone-like algorithm to guess at a word’s pronunciation. (This type of search does not support word series.)

Surrounding context: By default, CONTEXT displays one sentence before, and one sentence after, each sentence containing any of the specified search words. You can adjust this value with /C:n; legal values are 0 to 15. Note that you may see more than 2n sentences between found words that are close together; CONTEXT will display a little extra text rather than introduce a very short break. You may also see fewer than n sentences near the start or the end of a file.

Highlighting: If CONTEXT’s output is to the screen (i.e. stdout is not redirected), text which matches your search words will be highlighted in a different color. By default, CONTEXT picks a highlight color which contrasts with the current console colors. You can specify your own highlight color either with the option /H:n, or by setting an environment variable named HIGHLIGHT. Either way, the value should be a decimal number from 1 to 254, or a hexadecimal value from 0x01 to 0xFE. The high four bits set the background color, and the low four bits set the foreground color; the two values must be different. The command-line option takes precedence over the environment variable. You can disable highlighting with /NC. Text is not highlighted if the command’s output is redirected.
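To make the bit layout of the highlight value concrete, here is a small Python sketch. The function name `split_highlight` is hypothetical and this is not how the plugin itself parses the value. It splits a value into its background and foreground halves and applies the two checks described above.

```python
from typing import Tuple


def split_highlight(value: str) -> Tuple[int, int]:
    """Split a highlight value into (background, foreground) color indexes."""
    text = value.strip().lower()
    n = int(text, 16) if text.startswith("0x") else int(text, 10)
    if not 1 <= n <= 254:
        raise ValueError("highlight value must be 1..254 (0x01..0xFE)")
    background = n >> 4      # high four bits
    foreground = n & 0x0F    # low four bits
    if background == foreground:
        raise ValueError("background and foreground must differ")
    return background, foreground


if __name__ == "__main__":
    print(split_highlight("0x1E"))   # background 1, foreground 14
```

The range limits follow from the second rule: 0 (0x00) and 255 (0xFF) are the two byte values at the ends of the range whose halves are equal, so both fall outside 1 to 254.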

Reports: If /V is specified, CONTEXT will also report the number of times each search word was found within a file. If more than one file is processed it will also show a final report for all files, giving the number of times each search word was found in total, and in how many files.

Text encoding: CONTEXT automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /CP:n. Most single-byte (i.e., alphabetic) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.

Text format: Text files use line-break characters in different ways. In some files, line break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to wrap text to some desired width. You can use /F:n to tell CONTEXT how to handle line breaks. /F:1 indicates that the text is unformatted, with line breaks only at the ends of paragraphs. CONTEXT will honor all line breaks, and add an extra blank line after each paragraph. /F:2 means that the input text is prewrapped, having line breaks within paragraphs and even within sentences. CONTEXT will skip single line breaks, honoring only sequences of two or more in a row. /F:3 is also for unformatted text and acts like /F:1, but does not insert a blank line after each paragraph. If you specify /F:0 or do not specify any /F:n, CONTEXT will attempt to guess how the input text is formatted. (Guessing is not reliable when there isn’t much input text.)
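The difference between the unformatted and prewrapped readings can be sketched in Python. This is an illustration only; the function names are hypothetical, and the plugin's actual handling (including how it guesses for /F:0) is not shown in this document.

```python
import re


def paragraphs_unformatted(text):
    """/F:1-style reading: every line break ends a paragraph."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def paragraphs_prewrapped(text):
    """/F:2-style reading: single line breaks are skipped; only two or
    more line breaks in a row end a paragraph."""
    text = text.replace("\r\n", "\n")
    chunks = re.split(r"\n{2,}", text.strip("\n"))
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    sample = "A rose-tree stood\nnear the garden.\n\nAlice went\nnearer.\n"
    print(paragraphs_unformatted(sample))
    print(paragraphs_prewrapped(sample))
```

The same input yields four paragraphs under the first reading and two under the second, which is why the right /F:n setting matters for sentence detection.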

Word wrap: Text output by CONTEXT will be word-wrapped. If output is to the screen, it will be wrapped to the screen width. If output has been redirected, the default width is 100 columns. You can set a different width using the /K:n option; the value must be between 40 and 512.

Disabling features: /N disables features:

`/NB`	do not write a Byte Order Mark
`/NC`	disable highlight
`/ND`	do not search into hidden directories; only useful with `/S`
`/NF`	suppress the file-not-found error
`/NJ`	do not search into junctions; only useful with `/S`
`/NZ`	do not search into system directories; only useful with `/S`

You can combine these, e.g. /NDJ.

C:\> context http://www.gutenberg.org/files/11/11-0.txt /w:paint

File "D:\download\pg11.txt" :

CHAPTER VIII. The Queen's Croquet-Ground A large rose-tree stood near the entrance of the garden: the roses growing on it were white, but there were three gardeners at it, busily painting them red. Alice thought this a very curious thing, and she went nearer to watch them, and just as she came up to them she heard one of them say, 'Look out now, Five! Don't go splashing paint over me like that!' 'I couldn't help it,' said Five, in a sulky tone; 'Seven jogged my elbow.'

* * *

Seven flung down his brush, and had just begun 'Well, of all the unjust things--' when his eye chanced to fall upon Alice, as she stood watching them, and he checked himself suddenly: the others looked round also, and all of them bowed low. 'Would you tell me,' said Alice, a little timidly, 'why you are painting those roses?' Five and Seven said nothing, but looked at Two.

C:\>

COPYCHARS — Put characters on the clipboard.

Syntax:
COPYCHARS /A /Q value entity "string" …

`/A`	append to current clipboard text
`/Q`	quietly

Character values may be specified in decimal, or in hexadecimal with a leading 0x.

Entities are as in HTML 4; the leading ampersand may be omitted. Entities are case sensitive.

rem A non-breaking space, an em dash, and a space:
copychars nbsp; mdash; 32

rem Text in fancy quotes:
copychars ldquo; "This is a test." rdquo;

rem High-order characters are supported:
copychars 0x1f603

COUNTCHARS — Count characters in text files.

Syntax:
COUNTCHARS /C:x-y /CP:n /O /P /R /RO /S /U /V /W /X filespec…

`/C:`x`-`y	specify a range of characters to count
`/CP:`n	interpret non-Unicode input text using code page n
`/O`	sort by frequency
`/P`	page output
`/R`	report counts for ranges as well as individual characters
`/RO`	report range counts only, not counts of individual characters
`/S`	search in subdirectories for matching files
`/U`	force characters to uppercase
`/V`	do not automatically merge overlapping ranges
`/W`	do not report count of ‘other’ characters
`/X`	do not report total characters count
`/ASCII`	short for `/C:0-127`
`/BMP`	short for `/C:0-0xFFFF`
`/HI`	short for `/C:0x10000-0x10FFFF`
…	Range options are also supported.

Input filenames may be specified on the command line, or text may be redirected or piped into COUNTCHARS. If you want to pipe to COUNTCHARS, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

Specify ranges of characters to count with /C:x-y. The start and end characters x and y may be given as decimal, hexadecimal with a leading 0x, or as literal characters:

rem These three are all the same:
countchars /c:65-90 myfile.txt
countchars /c:0x41-0x5a myfile.txt
countchars /c:A-Z myfile.txt

To specify a literal digit, wrap it in apostrophes:

countchars /c:'0'-'9' myfile.txt

You may specify up to 32 ranges. If you do not specify any ranges, the default is /C:0-127 (ASCII characters).

All values, both in character ranges and in COUNTCHARS’s reports, refer to Unicode code points. If the text uses an 8-bit or OEM encoding, the values reported are the values of the Unicode characters that the OEM characters are translated into — not the OEM character values.

How many letters are in Engine Summer.txt?

countchars /c:A-Z /u /ro "Engine Summer.txt"

File "C:\Bin\JPSDK\TextUtils\Engine Summer.txt" :
0041 - 005A : 343
Other : 161
TOTAL : 504

/C:A-Z defines a range of characters from A to Z. /U converts lowercase letters to uppercase so they will also be counted in the same range. /RO reports only the total number of characters in the range; we only want the total number of letters, not the number of As, Bs, Cs, and so on. There are 343 letters in this file.
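The counting in this example can be mimicked with a short Python sketch. It is an illustration only: `count_in_ranges` is a hypothetical name, and the per-character uppercase folding is an assumption about what /U does. The sketch tallies code points per range, plus the "Other" and "TOTAL" figures.

```python
def count_in_ranges(text, ranges, fold_upper=False):
    """Count code points falling in each (low, high) range.

    Returns (counts per range, count of 'other' characters, total).
    """
    counts = {r: 0 for r in ranges}
    other = 0
    total = 0
    for ch in text:
        # Assumption: fold only when uppercasing yields a single character.
        if fold_upper and len(ch.upper()) == 1:
            ch = ch.upper()
        cp = ord(ch)
        total += 1
        matched = False
        for low, high in ranges:
            if low <= cp <= high:
                counts[(low, high)] += 1
                matched = True
        if not matched:
            other += 1
    return counts, other, total


if __name__ == "__main__":
    counts, other, total = count_in_ranges(
        "Engine Summer!", [(0x41, 0x5A)], fold_upper=True)
    for (low, high), n in counts.items():
        print("%04X - %04X : %d" % (low, high, n))
    print("Other :", other)
    print("TOTAL :", total)
```

Without the uppercase fold, only the two capital letters in the sample string would land in the A-Z range; the lowercase letters would be reported as "Other".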

How many Cyrillic letters? Most Cyrillic letters fall in the range of U+0400 to U+04FF:

countchars /c:0x0400-0x04ff /ro "Engine Summer.txt"

File "C:\Bin\JPSDK\TextUtils\Engine Summer.txt" :
0400 - 04FF : 0
Other : 504
TOTAL : 504

Mr. Crowley is not writing in Russian.

DEDUP — Dump text files to standard output, merging repeated lines.

Syntax:
DEDUP /A:attribs /B /C /CP:n /D /H /I /M /N /P /S /T /U filename…

`/A:`attribs	attributes mask; valid flags are `-ACEHIORS`
`/B`	discard blank lines
`/C`	show line repeat counts
`/CP:`n	interpret non-Unicode input text using code page n
`/D`	show only repeating lines
`/H`	display filenames
`/I`	ignore case when comparing lines
`/M`	merge repeating lines (default)
`/N`	disable features
`/P`	page output
`/S`	search in subdirectories for matching files
`/T`	trim leading and trailing whitespace
`/U`	show only lines which do not repeat
…	Range options are also supported.

Input filenames may be specified on the command line, or text may be redirected or piped into DEDUP. If you want to pipe to DEDUP, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

Options /D, /M, and /U select the operating mode. If you don’t specify one, the default is /M. If you specify more than one, the last one wins.
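The three modes can be compared with a Python sketch. Two loud assumptions: that "repeated lines" means consecutive repeats, as with the Unix uniq utility (this document does not say), and that /I folds case with a simple lowercase comparison. The function name `dedup` is hypothetical and this is not the plugin's code.

```python
from itertools import groupby


def dedup(lines, mode="M", ignore_case=False):
    """Return (repeat count, line) pairs for each run of identical lines.

    mode "M": merge each run into one line (the default)
    mode "D": keep only runs that repeat
    mode "U": keep only lines that do not repeat
    """
    key = str.lower if ignore_case else None
    result = []
    for _, group in groupby(lines, key=key):
        run = list(group)
        keep = (mode == "M"
                or (mode == "D" and len(run) > 1)
                or (mode == "U" and len(run) == 1))
        if keep:
            result.append((len(run), run[0]))
    return result


if __name__ == "__main__":
    sample = ["a", "a", "b", "c", "C"]
    for mode in "MDU":
        print(mode, dedup(sample, mode))
```

The count in each pair corresponds to what /C would display next to the line.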

/N disables features:

`/NB`	do not write a Byte Order Mark
`/NC`	disable highlight
`/ND`	do not search into hidden directories; only useful with `/S`
`/NF`	suppress the file-not-found error
`/NJ`	do not search into junctions; only useful with `/S`
`/NZ`	do not search into system directories; only useful with `/S`

You can combine these, e.g. /NDJ.

DEGAS — Remove excess spaces and blank lines from text.

Syntax:
DEGAS /A:attribs /B:n /CP:n /E:n /H /L /N /P /R /S /T /W filename…

`/A:`attribs	attributes mask; valid flags are `-ACEHIORS`
`/B:`n	maximum whitespace characters
`/CP:`n	interpret non-Unicode input text using code page n
`/E:`n	maximum blank lines
`/H`	display filenames
`/L`	display line numbers
`/N`	disable features
`/P`	page output
`/R`	remove all blank lines at the start and end of the file
`/RS`	remove all blank lines at the start of the file
`/RE`	remove all blank lines at the end of the file
`/S`	search in subdirectories for matching files
`/T`	trim all leading and trailing whitespace from each line
`/W`	convert all whitespace characters to ASCII spaces
…	Range options are also supported.

The contents of the files will be dumped to standard output, with excess spaces and blank lines removed.

Input filenames may be specified on the command line, or text may be redirected or piped into DEGAS. If you want to pipe to DEGAS, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

/B: lets you specify the maximum number of whitespace characters in a row. For example, /B:4 allows no more than four whitespace characters in a row.

DEGAS allows for the convention of spacing twice at the end of a sentence. Specify two numbers separated by a comma: /B:n,m. The first sets the maximum number of whitespace characters after a period, question mark, or exclamation point; the second is the maximum after any other character. /B:2,1 allows up to two spaces at the end of a sentence, but only one elsewhere.

/E: specifies the maximum number of blank lines in a row. (A line containing only whitespace characters is considered a ‘blank line’.) /E:3 allows no more than three blank lines together. /E:0 removes all blank lines; /E:0 can be abbreviated to /E.

You can remove all blank lines at the start of a file with /RS. Likewise, you can remove all blank lines at the end of a file with /RE. /R does both. This option is independent of the /E: compression of blank lines.

/T strips all leading and trailing whitespace from each line. This is a separate operation from the /B: compression of spaces, and happens earlier.

If none of /B: /E: /R /RS /RE or /W are specified, the default is /B:2,1 /E:1 — a maximum of two spaces at the end of a sentence, one space elsewhere; and no more than one blank line in a row.
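The default behavior (/B:2,1 /E:1) can be approximated in Python as follows. This is a sketch under assumptions, not the plugin's implementation: the function names are hypothetical, only spaces and tabs are treated as whitespace, and a run at the start of a line is treated like any other "elsewhere" run.

```python
import re


def squeeze_spaces(line, after_sentence=2, elsewhere=1):
    """Limit runs of whitespace, in the spirit of /B:n,m."""
    def shrink(match):
        start = match.start()
        prev = match.string[start - 1] if start > 0 else ""
        # More room is allowed after a period, question mark, or exclamation point.
        limit = after_sentence if prev and prev in ".?!" else elsewhere
        return match.group(0)[:limit]
    return re.sub(r"[ \t]+", shrink, line)


def squeeze_blank_lines(lines, max_blank=1):
    """Limit runs of blank lines, in the spirit of /E:n.

    A line containing only whitespace counts as blank.
    """
    out, run = [], 0
    for line in lines:
        if line.strip() == "":
            run += 1
            if run > max_blank:
                continue
        else:
            run = 0
        out.append(line)
    return out


if __name__ == "__main__":
    print(repr(squeeze_spaces("One.    Two   three")))
    print(squeeze_blank_lines(["a", "", "  ", "", "b"]))
```

Passing `max_blank=0` drops every blank line, which matches the description of /E:0.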

/N disables features:

`/NB`	do not write a Byte Order Mark
`/NC`	disable highlight
`/ND`	do not search into hidden directories; only useful with `/S`
`/NF`	suppress the file-not-found error
`/NJ`	do not search into junctions; only useful with `/S`
`/NZ`	do not search into system directories; only useful with `/S`

You can combine these, e.g. /NDJ.

DEHTML — Strip HTML tags from a file and dump the contents to standard output.

Syntax:
DEHTML /A:attribs /B /C /CP:n /E /H /M /N /N: /O:n /P /R /S filename…

`/A:`attribs	attributes mask; valid flags are `-ACEHIORS`
`/B`	exclude text outside the body and title
`/C`	include text in <!-- comments -->
`/CP:`n	interpret non-Unicode input text using code page n
`/E`	omit empty (blank) lines
`/H`	display filenames
`/M`	look in <meta> tags for charset info
`/N`	by itself: include text in <noscript> or <applet> tags
`/N:`	with suboptions: disable features
`/O:`n	include text inside <option> tags:
	`0` — don’t include any (the default)
	`1` — include only the first <option>
	`2` — include all <option> text
`/P`	page output
`/R`	remove title
`/S`	search in subdirectories for matching files
…	Range options are also supported.

Input filenames may be specified on the command line, or text may be redirected or piped into DEHTML. If you want to pipe to DEHTML, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

DEHTML will strip HTML tags from the file and replace HTML entities with the corresponding characters; most of the remaining text will be dumped to stdout. This command will also discard: any text in the header which does not appear within <title> tags; anything in <script> or <style> tags; anything within an HTML comment unless you specify /C; anything in <noscript> or <applet> tags unless you specify /N; and anything in <option> tags within a <select> block unless you specify /O:1 or /O:2.

If you specify /M, DEHTML will look in <meta> tags in the header for information about the document’s character encoding. This only works if the file is not in Unicode; /M has no effect with Unicode files.

/N with suboptions disables features:

`/NB`	do not write a Byte Order Mark
`/NC`	disable highlight
`/ND`	do not search into hidden directories; only useful with `/S`
`/NF`	suppress the file-not-found error
`/NJ`	do not search into junctions; only useful with `/S`
`/NZ`	do not search into system directories; only useful with `/S`

You can combine these, e.g. /NDJ.

• Note: HTML files often include some unusual characters like non-breaking spaces, bullets, em dashes, ellipses, and guillemets. If you want to pipe or redirect the output from this command, it’s a good idea to enable Unicode output with OPTION //UNICODEOUTPUT=YES. If Unicode output is disabled, some characters may be mangled in translation.

FFIELDS — Read a file and print fields in a specified format.

Syntax:
FFIELDS /A:attribs /C /CP:n /E /F:"format" /H /K:n /L:string /N /P /Q /S /T /W /X filename…

`/A:`attribs	attributes mask; valid flags are `-ACEHIORS`
`/C`	separate fields at commas
`/CP:`n	interpret non-Unicode input text using code page n
`/E`	separate fields at first unquoted equals sign
`/F:"`format`"`	format string; see below
`/H`	display filenames
`/K:`n	output line width (columns)
`/L:`string	insert line numbers on the left
`/N`	disable features
`/P`	page output
`/Q`	remove quotes (the default is to retain them)
`/S`	search in subdirectories for matching files
`/T`	separate fields at tabs
`/W`	separate fields at whitespace
`/X`	perform variable expansion on each line
…	Range options are also supported.

The FFIELDS command reads a file, divides each line into fields (blank lines are skipped), and then prints the fields using a format string. FFIELDS can read from disk files or from a pipe. If you want to pipe to FFIELDS, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

The format string may contain $n to print field n, or $n=wf to print field n limited to width w. The final letter f is L to left-justify the field if it contains fewer than w characters, R to right-justify it, C to center it, or T to simply truncate the field without padding it to length w. For example, a field specifier of $4=10L would print field 4, left-justified to 10 characters. Use $$ to print a literal dollar sign, or $N to insert a line break.

Fields are numbered starting from 0.

set |! ffields /e /f:"$0=20l $1=58t"

…displays variable names truncated to 20 characters, followed by a space and the variables’ values truncated to 58 characters.
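The format-string rules can be modeled with a short Python sketch. It is only an approximation written from the description above. `render` is a hypothetical name, treating the letters as case-insensitive is inferred from the lowercase example, and printing nothing for a missing field is an assumption.

```python
import re

# $$  |  $N  |  $n  |  $n=wf   (f is one of L, R, C, T)
TOKEN = re.compile(r"\$(\$|[Nn]|(\d+)(?:=(\d+)([LRCTlrct]))?)")


def render(fmt, fields):
    """Expand a FFIELDS-style format string against a list of fields."""
    def expand(match):
        if match.group(1) == "$":
            return "$"
        if match.group(1) in ("N", "n"):
            return "\n"
        index = int(match.group(2))
        # Assumption: a missing field prints as an empty string.
        text = fields[index] if index < len(fields) else ""
        if match.group(3) is None:
            return text                      # plain $n
        width, how = int(match.group(3)), match.group(4).upper()
        text = text[:width]                  # never longer than w
        if how == "L":
            return text.ljust(width)
        if how == "R":
            return text.rjust(width)
        if how == "C":
            return text.center(width)
        return text                          # T: truncate only, no padding
    return TOKEN.sub(expand, fmt)


if __name__ == "__main__":
    name, _, value = "PATH=C:\\bin;C:\\tools".partition("=")
    print(render("$0=20l $1=58t", [name, value]))
```

The usage lines split a NAME=value line at the first equals sign, roughly as /E does for unquoted text, and then apply the same format string as the SET example.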

If you include /L on the command line, FFIELDS will insert line numbers to the left of each output line. Lines are numbered starting at 0. If you include the optional string argument, FFIELDS will perform variable expansion on it before prepending it to each output line; use the variable _LINE to get the current line number. For example, /L:"%%@FORMAT[03,%%_LINE]" will prepend the line number, zero-padded to at least three digits.

If you don’t specify a format string, FFIELDS will invent a default one:

alias |! ffields /e

/N disables features:

`/NB`	do not write a Byte Order Mark
`/NC`	disable highlight
`/ND`	do not search into hidden directories; only useful with `/S`
`/NF`	suppress the file-not-found error
`/NJ`	do not search into junctions; only useful with `/S`
`/NZ`	do not search into system directories; only useful with `/S`

You can combine these, e.g. /NDJ.

/X does variable expansion on each line before displaying it. You could, for example, count the characters in each alias definition:

alias |! ffields /e /f:"$0 = (%%@len[$1]) $1" /x

FILTERFILES — Pass files through a text filter command.

Syntax:
FILTERFILES /B:.ext /C /J /N /P /Q /S /UTF8 /UTF16 filespec… : command args…

`/B:.`ext	extension for backups; the default is .original
`/C`	do not abort if the command exits with errorlevel 3
`/J`	redirect input
`/N`	by itself: not really; display matching files without processing them
`/N`	with suboptions: disable features
`/P`	prompt for each file
`/Q`	quietly
`/S`	search in subdirectories for matching files
`/UTF8`	redirect output as UTF-8
`/UTF16`	redirect output as UTF-16
…	Range options are also supported.
filespec…	the files to process; at least one filespec is required
command	a filter command which writes to stdout

At least one filespec is required. Anything after the first unquoted colon is the command to execute; this also is required.

Matching files will be renamed with a .original extension, or as per /B. Then the specified command will be called, passing the new filename on its command line after any args, and with its output redirected to the original filename.

This command only supports local files. CLIP:, URLs, standard input, and so on are not supported.

/N by itself prevents FILTERFILES from doing anything. Matching files will be displayed but not renamed, and the command will not be executed.

/N with suboptions disables features:

`/NC`	disable highlight
`/ND`	do not search into hidden directories; only useful with `/S`
`/NF`	suppress the file-not-found error
`/NJ`	do not search into junctions; only useful with `/S`
`/NZ`	do not search into system directories; only useful with `/S`

You can combine these, e.g. /NDJ.

/P causes FILTERFILES to prompt before processing each file. You can press:

`Y`	to filter the file
`N` or `Esc`	to skip the file
`A`	to stop prompting and filter all remaining files
`Q`	to exit immediately

/UTF8 and /UTF16 let you set the output encoding. They call OPTION //UnicodeOutput= and OPTION //UTF8Output= before processing files, and then restore the original settings before FILTERFILES exits. Note that //UTF8Output does not actually work in TCC/LE.

By default, FILTERFILES passes each original filename to the command on its command line:

filtercmd "file.original" > "file.txt"

If you specify /J, it will use input redirection instead:

filtercmd < "file.original" > "file.txt"
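The two command forms above can be generated by a few lines of Python, shown here only to make the naming pattern explicit. The function name `filter_command_line` is hypothetical, and forming the backup name by replacing the file's extension is inferred from the file.txt / file.original examples rather than stated by the plugin.

```python
import os


def filter_command_line(command, filename, backup_ext=".original",
                        redirect_input=False):
    """Build the command line used for one file.

    backup_ext mirrors /B; redirect_input mirrors /J.
    """
    # Assumption: the backup name replaces the original extension.
    backup = os.path.splitext(filename)[0] + backup_ext
    source = '< "%s"' % backup if redirect_input else '"%s"' % backup
    return '%s %s > "%s"' % (command, source, filename)


if __name__ == "__main__":
    print(filter_command_line("filtercmd", "file.txt"))
    print(filter_command_line("filtercmd", "file.txt", redirect_input=True))
```

In both forms the filter reads the renamed backup and its output recreates the file under its original name.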

FILTERFILES is mainly intended for use with the filters in this plugin: DEDUP, DEGAS, DEHTML, WRAP, and so on. But you can use it with any command that either accepts a filename on its command line or reads from standard input, and that writes text to standard output.

rem Convert all .TXT files in the current directory to Pig Latin:
filterfiles *.txt : oink

rem Add line numbers to MyFile.txt:
filterfiles myfile.txt : type /L

LOADARRAY — Load data from a file into an array variable.

Syntax:
LOADARRAY /Q filename arrayname

`/Q`	quietly
filename	a file created by `SAVEARRAY`
arrayname	an array variable name

The arrayname must begin with a letter. It may contain only letters, digits, underscores, and dollar signs; it should not be more than 31 characters long. If you don’t specify an arrayname, the name of the original array saved in the file will be used. The array will be created (or recreated) automatically, with the correct dimensions to hold the data from the file.

All elements in the file will be loaded. There is no provision for loading a partial array.

• Note: This command is not available in TCC/LE, in 4NT, or in older versions of TCC which don’t support array variables.

See also: the SAVEARRAY command.

TEXTUTILSHELP — Open the TextUtils plugin help file.

Syntax:
TEXTUTILSHELP /C /F /S /S:text /V topic

`/C`	select the ‘Contents’ tab
`/F`	select the ‘Favorites’ tab
`/S`	select the ‘Search’ tab
`/S:`text	select the ‘Search’ tab and search for text
`/V`	show detailed plugin version info
topic	the page to display

The TEXTUTILSHELP command will locate and open this plugin’s help file. In most cases, the internal HELP command, and the F1 and Ctrl-F1 keys, will be more convenient. The main advantage to this command is that it can be used to open the help file to any desired topic, not only to the names of commands, functions, and variables.

Note that any /C /F or /S must precede any topic on the command line. (This command has a very simple-minded parser.)

UNICODIFY — Convert text files to Unicode.

Syntax:
UNICODIFY /A:attribs /CP:n /L /N /O /P /Q /S /T /UTF8 /UTF16 filename…

`/A:`attribs	attributes mask; valid flags are `-ACEHIORS`
`/CP:`n	interpret non-Unicode input text using code page n
`/L`	normalize line endings to CR/LF
`/N`	disable features
`/O`	overwrite read-only files
`/Q`	replace ASCII quotes and apostrophes with Unicode open and close quotes
`/S`	search in subdirectories for matching files
`/T`	quietly
`/UTF8`	rewrite files using UTF-8 encoding
`/UTF16`	rewrite files using UTF-16 encoding (default)
…	Range options are also supported.

UNICODIFY rewrites the contents of text files, changing them to UTF-16 or UTF-8 encoding. By default, it will skip:

files which already appear to be in the desired encoding
files with the read-only attribute set (use /O to convert them anyway)
empty files

The original contents of the file will be saved in a new file with the extension .original.

• Note: This command only converts files. Standard input, internet URLs, and the clipboard are not supported. (You can use wildcards, directory aliases, @file lists, and so on.)

OEM characters will be interpreted according to the current Windows code page by default; use the /CP:n option to specify a different code page. To check the translation before you actually convert the file, try UTYPE with the /CP:n option first.

/N disables features:

`/NB`	do not write a Byte Order Mark
`/ND`	do not search into hidden directories; only useful with `/S`
`/NF`	suppress the file-not-found error
`/NJ`	do not search into junctions; only useful with `/S`
`/NZ`	do not search into system directories; only useful with `/S`

You can combine these, e.g. /NDJ.

UPEND — Display lines from a file in reverse order.

Syntax:
UPEND /A:attribs /B /C /CP:n /E /H /L:string /N /P /R:string /S /T /V /W:n filename…

`/A:`attribs	attributes mask; valid flags are `-ACEHIORS`
`/B`	discard blank lines
`/C`	replace control characters with ^ sequences
`/CP:`n	interpret non-Unicode input text using code page n
`/E`	expand variables in the `/L:` and `/R:` strings
`/H`	display the filename before each file
`/L:`string	insert string to the left of each line
`/N`	disable features
`/P`	page output
`/R:`string	insert string to the right of each line
`/S`	search in subdirectories for matching files
`/T`	trim leading and trailing whitespace
`/V`	also reverse each line in the file
`/W:`n	truncate lines to n characters
…	Range options are also supported.

UPEND is a low-budget substitute for the Unix tac command. It can read from disk files or from a pipe. If you want to pipe to UPEND, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

If standard input (stdin) is redirected, UPEND will read from stdin before any filenames specified on the command line. If no filenames are specified, then UPEND will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read lines from the clipboard.

If /L: is specified, the given string will be inserted to the left of each line; /R: inserts a string to the right. If /E is also specified, variable expansion will be performed on each string. Along with TCC’s usual complement of internal variables, functions, and so on, UPEND will set an environment variable _LINE. _LINE will contain the value 0 for the first line listed (i.e. the last line in the file), 1 for the second line listed, and so on. You can massage this value with functions like @INC, @EVAL, @FORMAT, and so on. To prevent the variables from being expanded before UPEND executes, you must either enclose the string in backquotes or double the percent signs.
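The numbering described above can be modeled in a few lines of Python. This is only an illustration of the documented behavior, not the plugin's code; the function name upend_lines and its left parameter are invented for the example, and the f-string assumes that @FORMAT[4,n] right-aligns n in four columns.

```python
def upend_lines(lines, left=None):
    """Return lines in reverse order, as UPEND lists them.

    `left` is a hypothetical stand-in for the /L: string after expansion:
    a callable that receives the _LINE value and returns the prefix.
    """
    out = []
    for line_no, text in enumerate(reversed(lines)):
        # line_no plays the role of _LINE: 0 for the last line of the file.
        prefix = left(line_no) if left else ""
        out.append(prefix + text)
    return out


if __name__ == "__main__":
    sample = ["first", "second", "third"]
    # Roughly what /L:"%%@format[4,%%_line] " /E would produce
    # (assuming @FORMAT[4,n] pads on the left to four columns).
    for row in upend_lines(sample, left=lambda n: f"{n:>4} "):
        print(row)
```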

/N disables features:

`/NB`	do not write a Byte Order Mark
`/NC`	disable highlight
`/ND`	do not search into hidden directories; only useful with `/S`
`/NF`	suppress the file-not-found error
`/NJ`	do not search into junctions; only useful with `/S`
`/NZ`	do not search into system directories; only useful with `/S`

You can combine these, e.g. /NDJ.

upend D:\download\pg11.txt /l:"%%@format[4,%%_line] " /e

UTYPE — Dump text files to standard output.

Syntax:
UTYPE /A:attribs /B /C /CP:n /D /E /F:string /H /HW:n /K:n /L:format /N /P /Q /S /T /U:string /X /Z:n filename…

`/A:`attribs	attributes mask; valid flags are `-ACEHIORS`
`/B`	discard BEL characters (control-G, ASCII 7)
`/C`	replace control characters with ^ sequences
`/CP:`n	interpret non-Unicode input text using code page n
`/D`	discard blank lines at the start of the file
`/E`	discard all empty lines
`/F:`string	show only lines following this string; `/FF:` inclusive
`/H`	display the filename before each file
`/HH`	display the filename, file size, and encoding before each file
`/HW:`n	hex dump width, in bytes; only useful with `/X`
`/K:`n	expand tabs to n columns
`/L:`format	insert line numbers on the left
`/N`	disable features
`/P`	page output
`/Q`	replace ASCII quotes and apostrophes with Unicode open and close quotes
`/S`	search in subdirectories for matching files
`/T`	trim leading and trailing whitespace
`/U:`string	show only lines until (before) this string; `/UU:` inclusive
`/X`	dump file in hexadecimal
`/Z:`	handling of NUL characters in text:
	`/Z:N` — treat like end-of-line (default)
	`/Z:I` — treat as invalid character
	`/Z:S` — skip over (ignore) any NUL characters

…	Range options are also supported.

UTYPE displays files to standard output, much like the internal TYPE command. The primary advantage of UTYPE is that it recognizes and handles UTF-8 text files; you can think of it as a ‘UTF-8 TYPE’.

If you want to pipe to UTYPE, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

If standard input (stdin) is redirected, UTYPE will read from stdin before any filenames specified on the command line. If no filenames are specified, then UTYPE will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to display the contents of the clipboard.

If you include /L on the command line, UTYPE will insert line numbers on the left, starting at 1, as TYPE does. If you include the optional format string, UTYPE will perform variable expansion on the string before displaying it; use the variable _LINE to get the current (zero-based) line number. For example, /L:"%%@FORMAT[03,%%_LINE] " will show the line number zero-padded to at least three digits.

/F: and /U: can be used to chop off a simple header or footer. /F: discards all lines up to and including the first line which contains the specified string (case-insensitive); /U: discards all lines including and after a line which contains the specified string (again, case-insensitive). For example, most Project Gutenberg ebooks include a header which ends in a line beginning with “*** START” and a footer beginning with “*** END”. You can strip them off like this:

utype "http://www.gutenberg.org/cache/epub/11/pg11.txt" /f:"*** start" /u:"*** end" /d | list

If you double the option letter — /FF: or /UU: — the matching line will be included in UTYPE’s output, not discarded.
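For readers who find the rule easier to follow as code, here is a small Python model of /F:, /U:, and their doubled forms. It is a sketch under stated assumptions rather than UTYPE's implementation: the name chop and its parameters are invented, and it assumes the /U: string is only looked for after the /F: line has been found.

```python
def chop(lines, start=None, until=None, keep_start=False, keep_until=False):
    """Model of /F: (start) and /U: (until) with their doubled forms.

    keep_start and keep_until stand in for /FF: and /UU:.
    Matching is a case-insensitive substring test.
    """
    out = []
    # With no /F: string, output begins at the first line.
    started = start is None
    for line in lines:
        folded = line.lower()
        if not started:
            if start.lower() in folded:
                started = True
                if keep_start:
                    out.append(line)
            continue
        # Assumption: /U: is only tested on lines after the /F: match.
        if until is not None and until.lower() in folded:
            if keep_until:
                out.append(line)
            break
        out.append(line)
    return out
```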

/E discards all blank lines; /D discards only those at the start of a file. If you specify both, /D wins. If you combine /D with /F:string, UTYPE will discard any blank lines following the header. A line containing only spaces or tabs is considered blank.

/N disables features:

`/NB`	do not write a Byte Order Mark
`/NC`	disable highlight
`/ND`	do not search into hidden directories; only useful with `/S`
`/NF`	suppress the file-not-found error
`/NH`	disable the handbrake
`/NJ`	do not search into junctions; only useful with `/S`
`/NZ`	do not search into system directories; only useful with `/S`

You can combine these, e.g. /NDJ.

The handbrake: When scrolling a long file to the console and /P was not specified, UTYPE watches for the Ctrl and Esc keys. Hold down the Ctrl key to slow the scrolling; press Esc to pause the file as if /P had been specified. This feature will be disabled automatically if you specify /P or if output is redirected; you can also disable it with /NH.

Quotes replacement: /Q causes UTYPE to replace generic ASCII apostrophes and quote marks ( ' and " ) with Unicode open and close quote marks ( ‘ ’ and “ ” ). The new quote marks may or may not look different from the originals, depending on how they are displayed and the font used. If the output is displayed in a non-Unicode font, the curly quotes will be lost or mangled. You can set some environment variables to control this feature.

utype "Engine Summer.txt"

WORDS — Count words, sentences, and paragraphs in English text.

Syntax:
WORDS /A:attribs /C /CP:n /D /F:fmt /K /M:n /N /S /U:mode /X filename…

`/A:`attribs	attributes mask; valid flags are `-ACEHIORS`
`/C`	code mode; words may contain underscores and dollar signs
`/CP:`n	interpret non-Unicode input text using code page n
`/D`	dumps lists of unique words, sorted by frequency
`/F:`fmt	specifies the format for input text; fmt is one of:
	`0` — best guess (default)
	`1` — unformatted (line breaks are used only to end paragraphs)
	`2` — prewrapped (line breaks are used to wrap text)
`/K`	keeps hyphens when reassembling split words
`/M:`n	minimum number of letters in a word
`/N`	by itself: no words containing digits
`/N`	with suboptions: disable features
`/S`	search in subdirectories for matching files
`/U:`mode	controls the counting of unique words; mode is one of:
	`0` — do not count unique words (faster for large files)
	`1` — count unique words for each file individually (the default)
	`2` — count unique words for all files together (slower)
	`3` — separate counts for each file and for all files together (double oink!)
`/X`	no words beginning with a digit
…	Range options are also supported.

WORDS counts words, sentences, and paragraphs in English text. It can read text from standard input, or from one or more files specified on the command line. A report is written to standard output; this report can be piped or redirected. The results of the last file processed are also saved internally, and can be accessed through internal variables.

Note: This command was designed specifically for use with English text. I make many Anglocentric assumptions about what constitutes a ‘word’, a ‘sentence’, a ‘paragraph’, ‘forms’ of a word, and so on. These assumptions are probably not useful for any other language. WORDS may give strange or undesired results when used on source code, program output, HTML, or whatnot.

If standard input (stdin) is redirected, WORDS will read from stdin before any filenames specified on the command line. If no filenames are specified, then WORDS will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to count words on the clipboard.

This command’s definition of a ‘word’ is complex and subject to ongoing tweaking. In general, though, a word may contain only letters, digits (unless /N is specified), periods, apostrophes, and hyphens; at least one character must be a letter. For instance, 20th, 1920s, 1969's, and post-1941 are all considered words, but 1984 is not. The first character must be alphanumeric or (very rarely) an apostrophe.

If /C is specified, words may also contain underscores and dollar signs, but must not begin with a digit or dollar sign. /C also suppresses the count of sentences and paragraphs in the final report.

Words that differ only in case are counted as the same word. In the phrase polish Polish furniture using Polish furniture polish, this command will find only three ‘unique’ words.

A word is counted as ‘proper’ only if it never occurs in an all-lowercase form; no proper nouns will be found in Polish polish. Acronyms like NATO will be counted as ‘proper nouns’; so will ordinary words capitalized at the start of a sentence. The latter are often common words like articles and prepositions, which tend to be weeded out in longer files as they recur midsentence.
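The case-folding and ‘proper’ rules can be sketched in Python as follows. This is a simplified model, not the command's actual logic: the regular expression is a stand-in for the much more involved definition of a word, and the function name vocabulary_counts is invented for the example.

```python
import re


def vocabulary_counts(text):
    """Return (total, unique, proper) word counts for a string.

    The simple regex tokenizer is an assumption made for this sketch.
    """
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    seen_lowercase = {}
    for w in words:
        key = w.lower()
        # Record whether this word ever appears entirely in lowercase.
        seen_lowercase[key] = seen_lowercase.get(key, False) or (w == key)
    unique = len(seen_lowercase)
    # A word is 'proper' only if it never occurred in all-lowercase form.
    proper = sum(1 for was_lower in seen_lowercase.values() if not was_lower)
    return len(words), unique, proper
```

Applied to the phrase polish Polish furniture using Polish furniture polish, this model finds seven words, three of them unique and none proper.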

Note that a hyphenate is always counted as a single word. Without a dictionary, the command has no way of knowing whether it is composed of actual words (red-eye, half-baked) or not (pre-K, Wi-Fi).

WORDS also gives counts of sentences, paragraphs, lines, characters, and bytes. All counts should be viewed as estimates rather than gospel truth. The sentences count in particular must be taken with a healthy dose of salt; the command has no good way to determine whether a period ends an abbreviation, a sentence, or both.

A line, or a series of lines, which contains one or more sentences is counted as a ‘paragraph’. A line or series of lines which contains one or more words, but no recognized sentences, is instead counted as a ‘title’. It might actually be a title, subtitle, or chapter heading; or it might be a byline, date line, attribution, salutation, signature, line of poetry….

The number of lines reported may differ from the number of carriage returns or line feeds in the text, e.g. if the last line in the file is not terminated. A line containing only whitespace characters (spaces and tabs) is considered blank. The character and byte counts do not include any Unicode byte-order mark at the beginning of the file.

Split words: If a hyphenated word is split across a line break, WORDS will reassemble it and treat it as a single word. By default, the hyphen is dropped — the command has no way of knowing whether a hyphenated compound word was broken at a hyphen, or whether a normal word was divided between syllables and a hyphen added. The latter seems more common, and I wanted to avoid cluttering the vocabulary list with differently-hyphenated versions of the same word. If /K is specified, the command will instead retain hyphens when reassembling words broken at the end of a line. This option may cause a larger number of ‘unique’ words to be reported.
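A minimal Python model of the reassembly step follows. The pattern used (letter, hyphen, line break, letter) and the name rejoin_split_words are assumptions made for illustration; the command's own test for a split word may well be stricter.

```python
import re


def rejoin_split_words(text, keep_hyphens=False):
    """Join words broken by a hyphen at a line end.

    keep_hyphens models /K. By default the hyphen is dropped.
    """
    replacement = r"\1-\2" if keep_hyphens else r"\1\2"
    return re.sub(r"([A-Za-z])-\r?\n([A-Za-z])", replacement, text)
```

The sketch shows why /K can inflate the count of unique words: a compound broken at its hyphen comes back as halfbaked by default but as half-baked with /K, while a word divided between syllables comes back with a hyphen it never had.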

Vocabularies: In order to count unique words and ‘proper nouns’, WORDS must build a list of all words found. Building this list can slow down the process and use a good deal of memory if the text file involved is large. /U:mode controls the vocabulary lists. /U:0 disables vocabularies; the command executes faster, but there will be no counts of unique and proper words. /U:1 causes WORDS to build a vocabulary list for each file it processes; this is the default behavior. /U:2 builds a combined vocabulary for all files that WORDS processes; this is slower than the default. Finally, /U:3 builds a vocabulary for each file that WORDS reads, and at the same time builds a master vocabulary for all files together; this is much slower than the default behavior, and devours memory shamelessly.

If you are processing extremely large text files, or files which are not English prose — e.g. output from a program or command — I strongly recommend using /U:0 to disable vocabulary lists.

Dump: If /D is specified, the vocabulary for each file will be dumped to stdout. If /D is combined with /U:2, you’ll instead get a combined vocabulary for all files. The list is sorted by frequency, with more common words appearing first. Note that words may be shown in a different case than they appear in the input text. This is because the command stores all words in lowercase internally for speed (lowercase letters are more streamlined).

Text encoding: WORDS automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /CP:n. Most single-byte (i.e., alphabetic) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.

Disabling features: /N with suboptions disables features:

`/NB`	do not write a Byte Order Mark
`/NC`	disable highlight
`/ND`	do not search into hidden directories; only useful with `/S`
`/NF`	suppress the file-not-found error
`/NJ`	do not search into junctions; only useful with `/S`
`/NZ`	do not search into system directories; only useful with `/S`

You can combine these, e.g. /NDJ.

C:\> type EBS.txt
This is a test. For the next sixty seconds, this station will conduct a test
of the Emergency Broadcast System. This is only a test.

C:\> words /d EBS.txt

File "C:\EBS.txt" :
25 words total, 17 unique, 4 proper.
25 runs of non-blanks.
3 sentences total: 3. 0! 0?
Average sentence 8.3 words.
1 paragraph, 0 titles.
Average paragraph 3.0 sentences.
2 lines total, 2 not blank; the longest had 77 characters.
137 characters in 137 bytes (OEM, prewrapped).

3: a test this
2: is the
1: Broadcast conduct Emergency For next of only seconds sixty station System will

C:\>

The results from the last file processed are saved, and can be accessed using these internal variables:

`_WORDS`	`_UNIQUEWORDS`	`_PROPERNOUNS`	`_WC`
`_SENTENCES`	`_SENTENCESD`	`_SENTENCESE`	`_SENTENCESQ`
`_SENTENCEWORDS`	`_PARAGRAPHS`	`_TITLES`
`_LINES`	`_NONBLANKLINES`	`_LONGESTLINE`	`_CHARACTERS`

The cumulative results from all files processed by the last invocation of WORDS can be accessed through these variables:

`_WORDSALL`	`_UNIQUEWORDSALL`	`_PROPERNOUNSALL`	`_WCALL`
`_SENTENCESALL`	`_SENTENCESDALL`	`_SENTENCESEALL`	`_SENTENCESQALL`
`_SENTENCEWORDSALL`	`_PARAGRAPHSALL`	`_TITLESALL`	`_WORDFILES`
`_LINESALL`	`_NONBLANKLINESALL`	`_LONGESTLINEALL`	`_CHARACTERSALL`

WRAP — Word-wrap English text to fit a specified number of columns.

Syntax:
WRAP /A:attribs /C:n /CP:n /D /F:fmt /G:n,m /H /J /N:n /N /P /Q /R /S /T:n /W:width /Z:char filename…

`/A:`attribs	attributes mask; valid flags are `-ACEHIORS`
`/C:`n	condense repeated spaces in input text
`/CP:`n	interpret non-Unicode input text using code page n
`/D`	disable special handling of soft hyphens (character 173 / 0xAD)
`/F:`fmt	specifies the format for input text; fmt is one of:
	`0` — best guess (default)
	`1` — unformatted (line breaks are used only to end paragraphs)
	`2` — prewrapped (line breaks are used to wrap text)
	`3` — unformatted, with blank lines between paragraphs
`/G:`n`,`m	indent all paragraphs n spaces; if m is specified, it’s the indent for the second and later lines
`/H`	display filenames
`/J`	justify right margins
`/N:`n	minimum characters left on each line to split at a hyphen; 0 disables breaking at hyphens
`/P`	page output
`/N`	disable features
`/Q`	replace ASCII quotes and apostrophes with Unicode open and close quotes
`/R`	remove hyphens from line ends
`/S`	search in subdirectories for matching filenames
`/T:`n	tab stops every n spaces
`/W:`width	desired width of output text
`/Z:`char	define a forced line-break character
…	Range options are also supported.

The WRAP command word-wraps English text to fit a specified width. It can be used as a filter reading from standard input, or it can read from one or more files specified on the command line. The resulting text is written to standard output; it can be piped or redirected.

If you want to pipe to WRAP, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

‘Width’ here refers to a specified number of character positions, or columns. All characters are assumed to have the same width. The word-wrapped output should have neat, reasonably uniform line lengths when viewed or printed in a fixed-pitch font such as Courier, or displayed in a console window. Note that the specified width includes the final newline character; if you specify a width of 80, then up to 79 printable characters may appear on a line.

Note: This command is designed specifically for use with English prose. It may give weird or undesired results when used on source code, program output, HTML, or whatnot. It makes Anglocentric assumptions that may not be appropriate to other languages.

If standard input (stdin) is redirected, WRAP will read from stdin before any filenames specified on the command line. If no filenames are specified, then WRAP will read from stdin whether it is redirected or not. If /H is used, each file’s name will be printed before it is processed. (For standard input, <stdin> will be shown.)

Output width: /W:width sets the desired width in characters for the output text. Width may be from 40 to 512. If no /W:width is specified, the default is the console width when output goes to the console, or 100 columns when output is redirected. (You can set an environment variable COLUMNS to change this default.) If you type just a /W without a colon or width, then the current console width is assumed; this is useful if you are redirecting WRAP’s output but want it wrapped to the console width anyway, e.g. for piping to LIST.
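The order in which these defaults apply can be summarized in a short Python sketch. It is a model of the rules just described, not the plugin's code; the function resolve_width and its parameters are invented, and clamping to the 40 to 512 range is an assumption (the command may instead reject out-of-range values).

```python
def resolve_width(w_option, console_width, to_console, columns_env=None):
    """Pick the output width for a WRAP-like command.

    w_option:      None if /W is absent, "" for a bare /W, else the text after /W:
    console_width: current console width in columns
    to_console:    True if output goes to the console, False if redirected
    columns_env:   value of the COLUMNS environment variable, if set
    """
    if w_option is None:
        if to_console:
            width = console_width
        elif columns_env:
            width = int(columns_env)
        else:
            width = 100
    elif w_option == "":
        # Bare /W: use the console width even when output is redirected.
        width = console_width
    else:
        width = int(w_option)
    # Assumption: out-of-range values are clamped rather than rejected.
    return max(40, min(512, width))
```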

Text format: Text files use line-break characters in different ways. In some files, line break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to wrap text to some desired width. You can use /F:n to tell WRAP how to handle line breaks. /F:1 indicates that the text is unformatted, with line breaks only at the ends of paragraphs. WRAP will honor all line breaks, and add an extra blank line after each paragraph. /F:2 means that the input text is prewrapped, having line breaks within paragraphs and even within sentences. WRAP will skip single line breaks, honoring only sequences of two or more in a row. /F:3 is also for unformatted text and acts like /F:1, but does not insert a blank line after each paragraph; use this option to wrap the output from DEHTML. If you specify /F:0 or do not specify any /F:n, WRAP will attempt to guess how the input text is formatted. (Guessing is not reliable when there isn’t much input text.)

Tab size: The /T:n option controls the expansion of tab characters. By default, tab stops are every four columns (set an environment variable TABSIZE to change this default). /T:8 would make tabs eight columns wide. /T:0 disables special handling of tab characters, treating them like any other character; this will probably bollix word-wrapping and is not recommended. n may be 0 to 20.

Breaking at hyphens: WRAP will usually break lines at spaces. It may also break a line after a hyphen, if all of the following are true: (1) the character before the hyphen is a letter, and the following character is either a letter or a digit; (2) at least three characters, not counting the hyphen, will remain at the end of the line; and (3) at least three characters will move to the start of the following line. So, for example, if the phrase true-blue fell near the end of a line, WRAP might break the line after the hyphen, since true and blue have four letters each. The phrases do-nothing and derring-do would not be divided, however, since splitting either one would leave a two-letter do on a line by itself. You can adjust this behavior with /N:n, which sets the minimum number of characters for both lines. If you specify /N:4 then at least four characters, not counting the hyphen, must remain on each line. /N:0 prevents WRAP from breaking lines after hyphens.
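The three conditions can be expressed as a small Python predicate. This is a sketch of the rule as worded above, not WRAP's implementation: the function name is invented, and it assumes that the characters ‘remaining’ and ‘moving’ are the runs of non-space characters on either side of the hyphen.

```python
def may_break_after_hyphen(text, hyphen_index, min_chars=3):
    """Apply the three hyphen-break conditions to text[hyphen_index].

    min_chars models /N:n; 0 disables hyphen breaks entirely.
    """
    if min_chars == 0 or text[hyphen_index] != "-":
        return False
    if hyphen_index == 0 or hyphen_index + 1 >= len(text):
        return False
    before, after = text[hyphen_index - 1], text[hyphen_index + 1]
    # (1) a letter before the hyphen, a letter or digit after it
    if not (before.isalpha() and (after.isalpha() or after.isdigit())):
        return False
    # Assumption: count the non-space run on each side of the hyphen.
    left = text[:hyphen_index].split(" ")[-1]
    right = text[hyphen_index + 1:].split(" ")[0]
    # (2) enough characters stay behind, (3) enough move to the next line
    return len(left) >= min_chars and len(right) >= min_chars
```

With the default of three, true-blue passes while do-nothing and derring-do fail; with min_chars=4, red-eye fails as well.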

Removing hyphens: If /R is used, WRAP may discard a hyphen at the end of a line if the preceding character was a letter, and if the first character on the following line is also a letter. Without /R, WRAP retains all hyphens from line ends.

Forced indentation: The /G:n option forcibly indents each new paragraph n spaces (not tabs.) Any indentation in the input text will be lost. n must be 0 to 20. /G:0 will strip all leading whitespace, leaving text flush with the left margin. The optional second value, if present, indents the second and later lines m spaces; m is also 0 to 20. You might use /G:0,4 to produce a hanging indent. If /G: is not specified, any indentation in the input text is preserved.

Condensing spaces: The /C:n option allows you to condense runs of consecutive spaces in the input text. Any sequence of more than n spaces will be truncated. Only spaces (character 32) are counted, not other whitespace characters. Spaces generated by the program itself (e.g. by expanding tabs or indenting paragraphs) will not be condensed. n must be 0 to 10; if n is 0, spaces are not condensed (the default.) This option might be useful for packing output text just a little more tightly; if the original text file had extra spaces inserted to justify margins; or if you are one of those unfortunates who suffer a violent reaction to the sight of two spaces after a period.
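In Python terms, the effect of /C:n is roughly the following. The sketch assumes that an overlong run is cut down to exactly n spaces, which the description implies but does not state outright; condense_spaces is an invented name.

```python
import re


def condense_spaces(text, n):
    """Truncate runs of more than n spaces to n spaces; n == 0 changes nothing."""
    if n == 0:
        return text
    # Only character 32 is matched; tabs and other whitespace are untouched.
    return re.sub(" {%d,}" % (n + 1), " " * n, text)
```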

Quotes replacement: /Q causes WRAP to replace generic ASCII apostrophes and quote marks ( ' and " ) with Unicode open and close quote marks ( ‘ ’ and “ ” ). The new quote marks may or may not look different from the originals, depending on how they are displayed and the font used. If the output is displayed in a non-Unicode font, the curly quotes will be lost or mangled. You can set some environment variables to control this feature.

Text encoding: WRAP automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /CP:n. Most single-byte (i.e., Western) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.

Forced line break: /Z:char defines a forced line-break character. char may be entered as either a single character, or as a decimal or hexadecimal (prefixed with 0x) character code. If a matching character is found in the input file or stream, WRAP will end the current line and begin a new one.

Disabling features: /N with suboptions disables features:

`/NB`	do not write a Byte Order Mark
`/ND`	do not search into hidden directories; only useful with `/S`
`/NH`	do not add a hyphen when breaking a word
`/NJ`	do not search into junctions; only useful with `/S`

You can combine these, e.g. /NDJ.

These variables may be set to a numeric value to modify the command’s default behavior:

`COLUMNS`:	sets the default width when output is redirected and `/W` is not specified. Legal values are 40 to 512.
`TABSIZE`:	sets the default number of columns between tab stops when `/T` is not specified. Legal values are 1 to 20.

wrap /w:100 "Fishy Story.txt"

XFILTER — Process lines of a file using variable expansion.

Syntax:
XFILTER /A:attribs /B /CP:n /F:"format" /H /N /P /S /T filename…

`/A:`attribs	attributes mask; valid flags are `-ACEHIORS`
`/B`	discard blank lines
`/CP:`n	interpret non-Unicode input text using code page n
`/F:"`format`"`	format string: required; see below
`/H`	display filenames
`/N`	disable features
`/P`	page output
`/S`	search in subdirectories for matching files
`/T`	trim leading and trailing whitespace
…	Range options are also supported.

The required format string contains TCC variables and functions, which will be expanded for each line in the file. Double all percent signs to prevent variables from being expanded before the command is executed. An asterisk in the format string will be replaced with each line from the file. The current (zero-based) line number is also available in the variable _LINE.

XFILTER can be used as a filter reading from standard input, or it can read from one or more files specified on the command line. The resulting text is written to standard output; it can be piped or redirected. If you want to pipe to XFILTER, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

To prevent problems caused by troublesome characters in the input text, certain ‘dangerous’ characters from the file will be temporarily replaced with safe alternatives from Unicode’s Halfwidth and Fullwidth Forms block. They will be restored to ASCII after variable expansion. This shuffle prevents issues when characters with special meanings to TCC are inadvertently present in the input text, but it might be confusing if you want to find or replace any of the remapped characters. The characters which are temporarily replaced are:

Character	ASCII	Hex	Remapped to
"	34	22	U+FF02
%	37	25	U+FF05
(	40	28	U+FF08
)	41	29	U+FF09
,	44	2C	U+FF0C
[	91	5B	U+FF3B
]	93	5D	U+FF3D
^	94	5E	U+FF3E
`	96	60	U+FF40
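Every entry in the table is offset from its ASCII code by the same amount (0xFEE0), so the round trip is easy to model. The Python sketch below illustrates the mapping only; the names protect and restore are invented, and nothing here claims to show how the plugin performs the substitution internally.

```python
DANGEROUS = '"%(),[]^`'

# Each fullwidth form sits 0xFEE0 above its ASCII counterpart,
# e.g. '%' (0x25) maps to U+FF05.
TO_FULLWIDTH = {ord(c): ord(c) + 0xFEE0 for c in DANGEROUS}
TO_ASCII = {full: plain for plain, full in TO_FULLWIDTH.items()}


def protect(line):
    """Replace the nine troublesome characters before variable expansion."""
    return line.translate(TO_FULLWIDTH)


def restore(line):
    """Map the fullwidth stand-ins back to ASCII after expansion."""
    return line.translate(TO_ASCII)
```

This also shows why searching for one of these characters inside the format string can be confusing: while expansion runs, the line holds the fullwidth form, not the ASCII one.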

rem Dump a file in uppercase:
xfilter /f:"%%@upper[*]" "Engine Summer.txt"

rem Display the length of each line:
xfilter /f:"Line %%_line has %%@len[*] characters." "Engine Summer.txt"

New Functions:

@B85TOBIN — Decodes a base-85 string into a binary buffer.

Syntax:
%@B85TOBIN[handle,start,string]

handle	the handle to a binary buffer, as returned by `@BALLOC`
start	the offset in bytes within the buffer at which to begin writing the decoded data; defaults to 0
string	a base-85 encoded string as returned by `@BINTOB85`

This function decodes a base-85 string returned by @BINTOB85 and stores the resulting data in a binary buffer. Note that there is no option to control the number of bytes written; the entire string is decoded and written to the buffer. If there is any error in decoding the string, no change will be made to the binary buffer.

Note that the two commas between parameters are both required. You must supply both commas even if you omit the optional start value.

The return value is the number of bytes written to the buffer.

New Variables:

_CHARACTERS — Returns the number of characters in the last file processed by WORDS.

Syntax:
%_CHARACTERS

This count does not include any Unicode byte-order mark at the beginning of the file. If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.

_CHARACTERSALL — Returns the number of characters in all files processed by the last call to WORDS.

Syntax:
%_CHARACTERSALL

This count does not include any Unicode byte-order marks at the beginnings of files. If the WORDS command has not been called, this variable returns the value N/A.

_GETACP — Returns the current Windows code page.

Syntax:
%_GETACP

This variable returns the current Windows code page. (This value is also traditionally miscalled the ‘ANSI code page’, although it has nothing to do with ANSI.) Note that this value can and usually does differ from the OEM code page returned by %_CODEPAGE.

echo The current Windows code page is %_getacp.

_INIVALUERC — Returns an exit code for the last call to @INIVALUE.

Syntax:
%_INIVALUERC

This variable returns a code indicating the success or failure of the last call to the @INIVALUE function, and the nature of the error if it failed. Possible return values include:

	an empty string if `@INIVALUE` has not been called
`Syntax error`	any error in arguments
`File error` n	any error opening the file; n is a Windows error number
`File empty`	the file contains no data
`Found` n	a matching entry was found at line n
`Count` n	successfully counted matching entries
`No section`	no matching section header was found
`No entry` n	no matching entry, or fewer than n entries found

If the correct entry was found, the return value is Found n. The n is the line number, starting from zero and not counting any blank lines.

Reference Info:

Ranges	supported in many commands.
Code Pages Supported	to interpret non-Unicode text.
Character Escapes
UQuotes Control Variables	modify the translation of ASCII quotes to Unicode.
Highlight Variable	to choose your colors.
Startup Message	and how to disable it.
Acknowledgments
Changes	slow march of progress, or just another bug hunt?
Status and Licensing

Ranges:

This plugin supports the following range syntax:

Size range: /[Ssmallest,largest]

You may omit either smallest or largest. You may qualify either with a trailing letter: lowercase k, m, g, etc. to multiply by one thousand, one million, one billion, and so on; or uppercase K, M, G, etc. to multiply by 2¹⁰, 2²⁰, 2³⁰, and so on. If largest begins with a + sign, it is an increment over smallest. Use /![Ssmallest,largest] to invert the test and return only files not in the given size range.
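The suffix and increment rules can be modeled like this in Python. It is a sketch, not the plugin's parser: the function names are invented, only the k, m, g and K, M, G suffixes named above are handled, and an omitted bound is represented as None.

```python
DECIMAL = {"k": 10**3, "m": 10**6, "g": 10**9}
BINARY = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_size(token):
    """Convert a size such as '10k' or '10K' to a number of bytes."""
    suffix = token[-1:]
    if suffix in DECIMAL:
        return int(token[:-1]) * DECIMAL[suffix]
    if suffix in BINARY:
        return int(token[:-1]) * BINARY[suffix]
    return int(token)


def parse_size_range(spec):
    """Parse 'smallest,largest' into (low, high); None marks an omitted bound."""
    smallest, _, largest = spec.partition(",")
    low = parse_size(smallest) if smallest else None
    if not largest:
        high = None
    elif largest.startswith("+"):
        # A leading + makes largest an increment over smallest.
        high = (low or 0) + parse_size(largest[1:])
    else:
        high = parse_size(largest)
    return low, high
```

Note the difference the letter case makes: 10k is 10,000 bytes, while 10K is 10,240.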

Date range: /[D[acw]:earliest,latest]

You may omit either earliest or latest; either defaults to the current date. The optional [acw] argument selects the date stamp to check. (If you want to check more than one date stamp, you must supply more than one date range option.) The colon after the [acw] is optional.

Dates may be given in the local date format, or in yyyy-mm-dd format (with a four-digit year). You may also specify a date as an offset preceded with a + or - sign; the offset is in days relative to today’s date (for earliest) or relative to earliest (in the case of latest). If earliest turns out to be later than latest then the two are exchanged.
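The way offsets and defaults combine is easier to see in code. The Python sketch below models only the yyyy-mm-dd and offset forms; local date formats and @time suffixes are left out, the function names are invented, and today is passed in explicitly so the result is reproducible.

```python
from datetime import date, timedelta


def parse_date(token, base):
    """Parse yyyy-mm-dd, or a +n / -n offset in days relative to base."""
    if token[0] in "+-":
        return base + timedelta(days=int(token))
    return date.fromisoformat(token)


def resolve_date_range(earliest, latest, today):
    """Return (start, end) for a date range; empty strings mean 'omitted'."""
    # An omitted bound defaults to the current date.
    start = parse_date(earliest, today) if earliest else today
    if not latest:
        end = today
    else:
        # An offset for latest is measured from earliest, not from today.
        end = parse_date(latest, start)
    # If earliest turns out to be later than latest, exchange them.
    if start > end:
        start, end = end, start
    return start, end
```

Under this model, a range of -30 with no latest value covers the thirty days up to and including today.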

You may also give a specific time on either date, preceded by an @ sign. The time may be in either 24-hour format, or 12-hour format with a trailing A or P.

Use /![D[acw]:earliest,latest] to invert the test and return only files not in the given date range.
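The way offsets and defaults combine can be sketched in Python. This is only an illustration of the rules described above, with hypothetical function names; it handles yyyy-mm-dd dates and signed day offsets, and leaves out local date formats and @ times.

```python
from datetime import date, timedelta

def _resolve(text, base):
    """Turn one date argument into a date; 'base' anchors +/- offsets."""
    if text[0] in "+-":
        return base + timedelta(days=int(text))
    return date.fromisoformat(text)

def resolve_date_range(earliest, latest, today):
    """Resolve the two halves of /[D earliest,latest] to concrete dates."""
    # An omitted value defaults to the current date.
    start = _resolve(earliest, today) if earliest else today
    # An offset for latest is relative to earliest, not to today.
    end = _resolve(latest, start) if latest else today
    # If earliest turns out to be later than latest, exchange them.
    if start > end:
        start, end = end, start
    return start, end
```

For example, with today taken as 2024-04-02, the arguments `-7` and an empty latest resolve to 2024-03-26 through 2024-04-02.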

Time range: /[T[acw]:earliest,latest]

You may omit either earliest or latest. The optional [acw] argument selects the time stamp to check. (If you want to check more than one time stamp, you must supply more than one time range option.) The colon after the [acw] is optional. Times may be in either 24-hour format, or 12-hour format with a trailing A or P.

Use /![T[acw]:earliest,latest] to invert the test and return only files not in the given time range.

Exclusion range: /[!wildspec]

Filenames matching the wildspec will be excluded. You can supply more than one wildspec by separating them with (unquoted) spaces.

Owner range: /[Owildspec]

Files whose owners (in domain\user format) do not match the wildspec will be skipped. Use /![Owildspec] to invert the test and return only files which do not match the owner wildspec.

Description range: /Iwildspec or (alternate syntax) /[Iwildspec]

If a file’s description does not match the wildspec, it will be skipped. Use /!Iwildspec to invert the test, returning only files which do not match the description wildspec.

Day-of-the-week range: /[W[acw]:days]

You may specify multiple days separated by commas, e.g. /[W:MON,WED,FRI]. You can also give a range, for example /[W:TUE-FRI]. WEEKENDS is accepted as a synonym for SAT,SUN; WEEKDAYS is a synonym for MON-FRI. The colon in this syntax is required.
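A small Python sketch of how a days list might expand. The function name is hypothetical, and two behaviors here are assumptions rather than documented facts: day names are matched without regard to case, and a range such as FRI-MON wraps past Saturday.

```python
DAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]
SYNONYMS = {"WEEKENDS": "SAT,SUN", "WEEKDAYS": "MON-FRI"}

def parse_days(spec):
    """Expand the text after the colon in /[W:days] to a set of day names."""
    selected = set()
    for item in spec.upper().split(","):
        item = SYNONYMS.get(item, item)
        for part in item.split(","):
            first, _, last = part.partition("-")
            start = DAYS.index(first)
            stop = DAYS.index(last) if last else start
            # Assumption: walk forward through the week, wrapping if needed.
            i = start
            while True:
                selected.add(DAYS[i])
                if i == stop:
                    break
                i = (i + 1) % 7
    return selected
```

With this sketch, `parse_days("TUE-FRI")` yields TUE, WED, THU, and FRI, and `parse_days("WEEKENDS")` yields SAT and SUN.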

You may supply multiple ranges. A file must match all given ranges or it will be skipped.

Code Pages Supported:

Many of the commands in this plugin offer a /CP:n option to specify a code page. The value determines how non-ASCII characters in non-Unicode files are interpreted. This option does not affect Unicode files or ASCII characters. The following code pages are supported:

number	name	number	name
1252	Latin I	775	Baltic (OEM)
1250	Central Europe	850	Multilingual Latin I (OEM)
1251	Cyrillic	852	Latin II
1253	Greek	855	Cyrillic (OEM)
1254	Turkish	857	Turkish (OEM)
1255	Hebrew	858	Latin I with Euro sign (OEM)
1256	Arabic	862	Hebrew (OEM)
1257	Baltic	866	Russian (OEM)
1258	Vietnam	874	Thai
437	United States (OEM)	10000	Mac OS Roman
720	Arabic (OEM)	20866	KOI8-R
737	Greek (OEM)	21866	KOI8-U
`A` or `ANSI`	the current Windows code page
`O` or `OEM`	the current OEM code page

The default is the current Windows code page.

Character Escapes:

These may be used in CHARENCODING with the /X option.

Escape:	Expands to:	Example:
`\b`	backspace
`\e`	ASCII escape (27 decimal)
`\k`	grave accent
`\n`	newline
`\p`	percent sign
`\q`	double quote
`\r`	carriage return
`\t`	ASCII horizontal tab
`\u`xxxx	Unicode character, up to U+FFFF	`\u03a3` → Σ
`\U`xxxxxxxx	Unicode character, up to U+10FFFF	`\U1f63a` → 😺
`\`nnn	octal value, up to 777	`\101` → A
`\x`nnnn	hexadecimal value, up to FFFF	`\x41` → A
`\#`nnnnn	decimal value, up to 65535	`\#65` → A
`\\`	backslash
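The table can be read as a simple substitution scheme. The Python sketch below is one possible reading, not the plugin's implementation: it assumes that numeric escapes greedily take as many digits as the table allows, that `\n` expands to a single line feed, and that unrecognized escapes are left untouched. The names are hypothetical.

```python
import re

SIMPLE = {"b": "\b", "e": "\x1b", "k": "`", "n": "\n", "p": "%",
          "q": '"', "r": "\r", "t": "\t", "\\": "\\"}

# Numeric forms are tried before the single-letter ones.
PATTERN = re.compile(
    r"\\(?:U([0-9A-Fa-f]{1,8})"   # \Uxxxxxxxx
    r"|u([0-9A-Fa-f]{1,4})"       # \uxxxx
    r"|x([0-9A-Fa-f]{1,4})"       # \xnnnn
    r"|#([0-9]{1,5})"             # \#nnnnn (decimal)
    r"|([0-7]{1,3})"              # \nnn (octal)
    r"|([beknpqrt\\]))"           # single-character escapes
)

def expand_escapes(text):
    """Expand the escapes listed in the table; chr() raises ValueError
    for values above U+10FFFF."""
    def replace(match):
        big_u, small_u, hexa, dec, octal, simple = match.groups()
        if simple is not None:
            return SIMPLE[simple]
        if dec is not None:
            return chr(int(dec))
        if octal is not None:
            return chr(int(octal, 8))
        return chr(int(big_u or small_u or hexa, 16))
    return PATTERN.sub(replace, text)
```

Under these assumptions, `\101`, `\x41`, and `\#65` all expand to A, matching the examples in the table.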

UQuotes Control Variables:

The following environment variables specify a Unicode character used to replace an ASCII character in the @UQUOTES function, or in several commands when /Q is used. The value of the variable may be a single character; a decimal value 32 through 65533; or a hexadecimal value 0x20 through 0xFFFD.

`OPENQUOTE`:	replaces the ASCII double-quote ( " ) at the start of a quotation; the default value is 0x201C ( “ ).
`CLOSEQUOTE`:	replaces the ASCII double-quote ( " ) at the end of a quotation; the default is 0x201D ( ” ).
`OPENSQUOTE`:	replaces the ASCII apostrophe ( ' ) at the start of a quotation; the default is 0x2018 ( ‘ ).
`CLOSESQUOTE`:	replaces the ASCII apostrophe ( ' ) at the end of a quotation; the default is 0x2019 ( ’ ).
`APOSTROPHE`:	replaces the ASCII apostrophe ( ' ) within a word; the default is 0x2019 ( ’ ).
`'OKINA`:	replaces the ASCII apostrophe ( ' ) between two vowels; the default is 0x2018 ( ‘ ).
`PRIME`:	replaces the ASCII apostrophe ( ' ) after a number; the default is 0x27 ( ' ).
`DOUBLEPRIME`:	replaces the ASCII double-quote ( " ) after a number; the default is 0x22 ( " ).
`EMDASH`:	replaces pairs of ASCII hyphens ( - ); the default is 0x2014.

Note that the variable name 'OKINA begins, ironically enough, with an apostrophe. To disable ‘okinas, SET 'OKINA=0X2019 (or the same value as the apostrophe).
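The three accepted value forms (single character, decimal, hexadecimal) can be illustrated in Python. This is a sketch with a hypothetical function name; it assumes that a one-character value is always taken literally, even when that character is a digit.

```python
def quote_char(value):
    """Interpret a UQuotes variable value as one Unicode character."""
    if len(value) == 1:
        return value                     # a literal character (assumed precedence)
    # int(..., 0) accepts both decimal and 0x-prefixed hexadecimal text.
    code = int(value, 0)
    if not 32 <= code <= 0xFFFD:
        raise ValueError(f"out of range: {value}")
    return chr(code)
```

Here `quote_char("0xab")` and `quote_char("171")` both return the left guillemet «.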

These environment variables control the interpretation of some old-fashioned ASCII text conventions:

`UQUOTES_DOUBLES`:	set to 0 to prevent replacing doubled apostrophes with quotes
`UQUOTES_GRAVES`:	set to 0 to prevent replacing grave accents with open quotes

For example:

rem Use guillemets for quotations:
set openquote=0xab
set closequote=0xbb
echo %@uquotes["Sacré bleu!" he exclaimed.]

Highlight Variable:

Several of the commands in the plugin feature highlighted output. You can customize this feature by setting an environment variable Highlight:

rem Disable highlight:
set highlight=none
rem Set the highlight foreground:
set highlight=bright cyan
rem Set both foreground and background:
set highlight=bri whi on blu
rem Numbers are also supported:
set highlight=46

If the Highlight environment variable is not defined, the plugin will check the registry for a value named Highlight of type REG_SZ. The plugin will search, in this order:

• `HKEY_CURRENT_USER\Software\JPPlugins\TextUtils`	(affects this plugin only)
• `HKEY_CURRENT_USER\Software\JPPlugins`	(affects several of my plugins)

Many commands also have a /D or /NC option to disable highlighting.

Startup Message:

This plugin displays an informational line when it initializes. The message will be suppressed in transient or pipe shells. You can disable it for all shells by defining an environment variable named NOLOADMSG, for example:

set /e /u noloadmsg=1

Acknowledgments:

The original Metaphone algorithm is by Lawrence Philips. The variant implemented in this plugin is my own adaptation (improvement? perversion?). Blame me, not him, for its peculiarities.

Changes:

Version:	Date:	Changes:
0.85.0.3	2024-04-02	FileHandler.cpp v1.0.15.0, NewHelp.cpp v1.0.8.14.
0.85.0.2	2024-03-26	Minor tweak to support nested directory aliases.
0.85.0	2024-01-05	Updated to conlist.cpp v1.1 to better support Ctrl-C and Ctrl-Break. Tweaked UTF-16 detection for very small files.
0.84.0	2023-10-17	`DEHTML` no longer smashes whitespace inside `<PRE>` blocks.
0.83.0.2	2023-10-16	Tweaked ShowCmdHelp() to report `VER_PATCH`.
0.83.0.1	2023-10-12	Updated the plugin’s web address.
0.83.0	2023-07-28	Changed `DEHTML`, `@MKENTITIES`, and `COPYCHARS` to use HtmlEntities.cpp. Now they should support all HTML 4 entities. Updated `CHARENCODING` to the version in UChars, and documented it — `CHARENCODING` was somehow missing from the doc files. Lots of additional bug fixes, code tweaks, and doc improvements.
0.82.6	2023-07-24	Updated to the current versions of ParseArgs.cpp, NewHelp.cpp, conlist.cpp, FileHandler.cpp, MMFiles.cpp, and codepages.cpp.
0.82.5	2022-06-09	Minor tweak to `@STRIPACCENTS`: Now Æ æ Œ œ are replaced with AE ae OE oe.
0.82.4	2021-10-20

Status and Licensing:

Consider this beta software. It may well have issues. Try it at your own risk. If you do find a problem, you can report it in the JP Software support forum.

TextUtils is currently licensed only for testing purposes. I may make binaries and source code available under some free license once I consider it ready for use.

Download:

You can download the current version of the plugin from http://charlesdye.net/dl/textutils.zip.

`/A:`array	name of an array to receive the arguments; the default is `ARG`
`/F:`flags	parse flags; bitmapped, see below; the default is 1
`/Q`	quiet; don’t display arguments to stdout
`/V:`var	name of an environment variable containing the string to parse
`!`string	the string to parse

Parse flags:
1	divide the string at unquoted spaces
2	divide the string at unquoted commas
4	slashes kludge: treat `/A/B` like `/A /B`
8	quotes kludge: treat `/A"foo"` like `/A:"foo"`
16	equals kludge: break at the first unquoted equals sign
32	one-arg kludge: allow unquoted spaces in arg not beginning with `/`
64	don’t swallow double quotes
128	force all arguments to uppercase
256	don’t trim spaces from the end of args
512	disable special handling of double quotes
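Because the parse flags are bitmapped, a `/F:` value is the sum of the flags wanted. The Python sketch below shows how such a value decomposes; the member names are hypothetical labels for the table rows, not identifiers from the plugin.

```python
from enum import IntFlag

class ParseFlag(IntFlag):
    # Names are illustrative only; the values come from the table above.
    SPACES = 1
    COMMAS = 2
    SLASHES = 4
    QUOTES = 8
    EQUALS = 16
    ONE_ARG = 32
    KEEP_QUOTES = 64
    UPPERCASE = 128
    NO_TRIM = 256
    RAW_QUOTES = 512

def describe(value):
    """List the names of the flags set in a /F: value."""
    return [flag.name for flag in ParseFlag if flag & value]
```

For instance, to divide at both spaces and commas and force uppercase, the value would be 1 + 2 + 128 = 131, and `describe(131)` lists SPACES, COMMAS, and UPPERCASE.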

`/A:`min`,`max	the number of alphabetic characters to use
`/C:`n	specify the case of the alphabetic characters:
	0: random
	1: lowercase
	2: uppercase
	3: word case
	5: alternating
	6: leet (vowels lower, consonants upper)
	7: unleet (reverse of the above)
`/D:`min`,`max	the number of digits to use
`/E:`min`,`max	the number of extended characters to use
`/F`	make the first character a letter if possible
`/L:`min`,`max	the total length of the password, in characters
`/N:`n	the number of strings to generate
`/P:`min`,`max	the number of punctuation characters to use
`/S:`min`,`max	the number of syllables to use
`/Y`	also copy the password to the clipboard

`/O`	the command may overwrite an existing file
`/P`	save a partial array as if it were the whole thing; only useful with `/X: /Y: /Z: /W:`
`/Q`	quietly
`/X:`m`,`n	save only X index m through n
`/Y:`m`,`n	save only Y index m through n
`/Z:`m`,`n	save only Z index m through n
`/W:`m`,`n	save only W index m through n
arrayname	an array variable name
filename	the file to create

`/A`	append to any text already on the clipboard
`/CP:`n	interpret non-Unicode input text using code page n
`/Q`	replace ASCII quotes and apostrophes with Unicode open and close quotes
`/T`	quietly

delims	exactly two characters, one start and one end delimiter
string	the string to parse

filename	the file to examine
section	the name of the section to search for the entry
entry	the name associated with the desired value
index	which entry to return; defaults to 0 (the first); -1 returns the number of matching entries
errorstr	the string to return on any error; defaults to nothing (the empty string)
flags	a bitmapped integer controlling advanced features:
	1 — bomb out on file errors
	2 — treat section as a wildcard to match
	4 — treat entry as a wildcard to match

filename	the file to scan
n	what to report:
	`1`: the number of lines ending in CR/LF pairs
	`2`: the number of lines ending in LF/CR pairs
	`3`: the number of lines ending in CR not followed by LF
	`4`: the number of lines ending in LF not followed by CR
	`5`: the number of lines ending in NEL
	`10`: the total number of line-end sequences in the file

`Empty`	The file contains no data.
`None`	No line-end characters were found.
`CR/LF`	The file uses CR/LF line ends.
`LF/CR`	The file uses LF/CR line ends. (Who does this?)
`CR`	The file uses CR line ends.
`LF`	The file uses LF line ends.
`NEL`	The file uses NEL line ends.
`Mixed`	The file uses more than one line-end sequence.
`ERROR`	There was an error reading from the file.
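The counts and result strings in the two tables above can be approximated with a short Python sketch working on already-decoded text. It is not the plugin's code: the function names are hypothetical, and how the plugin resolves ambiguous runs such as CR LF CR is not documented, so the left-to-right pairing here is an assumption.

```python
import re

# Two-character sequences come first so a pair is never counted as
# two separate line ends. NEL is U+0085.
LINE_END = re.compile("\r\n|\n\r|\r|\n|\x85")
NAMES = {"\r\n": "CR/LF", "\n\r": "LF/CR", "\r": "CR", "\n": "LF", "\x85": "NEL"}

def count_line_ends(text):
    """Count each kind of line-end sequence in the text."""
    counts = dict.fromkeys(NAMES.values(), 0)
    for match in LINE_END.finditer(text):
        counts[NAMES[match.group()]] += 1
    return counts

def classify(text):
    """Summarize the line ends using the result strings from the table."""
    if not text:
        return "Empty"
    used = [name for name, n in count_line_ends(text).items() if n]
    if not used:
        return "None"
    return used[0] if len(used) == 1 else "Mixed"
```

With this sketch a file containing only CR/LF pairs classifies as CR/LF, while one mixing lone LF and lone CR line ends classifies as Mixed.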

word	the word or words to process
length	the maximum length of the codes to return (8)
flags	set to 1 for better compatibility

Character:	Replaced with:
" (double quote)	`&quot;`
% (percent sign)	%
& (ampersand)	`&amp;`
< (less-than sign)	`&lt;`
> (greater-than sign)	`&gt;`

string1	the first string to compare
string2	the second string to compare

`Empty`	There is no data in the file.
`OEM`	The file is probably not Unicode.
`UTF-16LE`	The file is probably 16-bit Unicode.
`UTF-16BE`	The file is probably 16-bit Unicode (big-endian).
`UTF-8`	The file is probably UTF-8 encoded Unicode.
`UTF-32LE`	The file looks like UTF-32 (little-endian).
`UTF-32BE`	The file looks like UTF-32 (big-endian).
`EBCDIC`	The file is probably in some version of EBCDIC.

`Empty`	There is no text in the file.
`Unformatted`	Line breaks are used to end paragraphs.
`Prewrapped`	Line breaks are used to limit line width.
