TextUtils plugin for Take Command / TCC / TCC/LE

beta version 0.85.2     2024-11-05

Charles Dye

Purpose:

This plugin implements a variety of text-related features. There are new commands to count words, sentences, and paragraphs in English text; find words in text and display them in context; replace words in text; generate random passwords; display the lines of a text file in reverse order; wrap text to a desired width; and save an entire array to disk and reload it later. New functions allow you to obscure text to make it unreadable, and restore it later; determine the character encoding and text format of text files; generate Metaphone codes; remove accents from text strings; and count vowels in a string.

Installation:

To use this plugin, copy TextUtils.dll and TextUtils.chm to some known location on your hard drive. (If you are still using the 32-bit version of Take Command, use TextUtils-x86.dll instead of TextUtils.dll.) Load the plugin with a PLUGIN /L command, for example:

plugin /l c:\bin\tcmd\test\textutils.dll

If you copy these files to a subdirectory named PlugIns within your Take Command program directory, the plugin will be loaded automatically when TCC starts.

Plugin Features:

New commands:
CHARENCODING CLIP2TEXT CONTEXT COPYCHARS COUNTCHARS
DEDUP DEGAS DEHTML FFIELDS FILTERFILES
LOADARRAY OINK PARSEARGS PASSWORD RECASE
REPLACETEXT ROT13 SAVEARRAY SHUFFLE TEXT2CLIP
TEXTUTILSHELP UNICODIFY UPEND UTYPE WORDS
WRAP XFILTER
New functions:
@B85TOBIN @BETWEEN @BINTOB85 @CLARIFY @INIVALUE
@LINEENDS @METAPHONE @MKENTITIES @OBSCURE @OINK
@ROT13 @ROUGHLYSIMILAR @STRIPACCENTS @TEXTENCODING @TEXTFORMAT
@UCHAR @UCODE @UCODEX @ULEN @UQUOTES
@VOWELS
New variables:
_CHARACTERS _CHARACTERSALL _GETACP _INIVALUERC _LINES
_LINESALL _LONGESTLINE _LONGESTLINEALL _NONBLANKLINES _NONBLANKLINESALL
_PARAGRAPHS _PARAGRAPHSALL _PASSWORD _PROPERNOUNS _PROPERNOUNSALL
_SENTENCES _SENTENCESALL _SENTENCESD _SENTENCESDALL _SENTENCESE
_SENTENCESEALL _SENTENCESQ _SENTENCESQALL _SENTENCEWORDS _SENTENCEWORDSALL
_TITLES _TITLESALL _UNIQUEWORDS _UNIQUEWORDSALL _WC
_WCALL _WORDFILES _WORDS _WORDSALL  

Syntax Note:

The syntax definitions in the following text use these conventions for clarity:

BOLD CODE         indicates text which must be typed exactly as shown.
CODE              indicates optional text, which may be typed as shown or omitted.
Bold italic       names a required argument; a value must be supplied.
Regular italic    names an optional argument.
ellipsis…         after an argument means that more than one may be given.

New Commands:

CHARENCODING — Show UTF-16 and UTF-8 encodings for characters.

Syntax:
CHARENCODING /16 /8 /C /D /K /N /X value "string"

/16         show UTF-16 encoding
/8          show UTF-8 encoding
/C          show characters
/D          show decimal values
/K          show character class
/N          show character names if available
/X          expand C-style character escapes in quoted strings
value       hex character value; leading 0x or U+ is optional
"string"    string literal between quotes

You may enter characters as quoted string literals, character values, HTML 4 character entities, or any combination. You may prefix hex values with 0x or U+ but neither is required. With or without either prefix, hexadecimal is assumed. Separate values with spaces. If you specify neither /16 nor /8, the default is to show both.
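As a rough illustration of the value syntax and the two encodings described above, here is a minimal Python sketch. It is not the plugin's code; the function names are made up for this example, and it only handles hex values (not quoted strings or entities).

```python
def parse_value(token: str) -> int:
    """Parse a hex character value; a leading 0x or U+ is optional."""
    t = token.strip()
    if t[:2].lower() in ("0x", "u+"):
        t = t[2:]
    return int(t, 16)  # hexadecimal is assumed with or without a prefix


def utf16_units(cp: int) -> list[str]:
    """Return the UTF-16 code units for one code point as hex strings."""
    data = chr(cp).encode("utf-16-be")
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]


def utf8_bytes(cp: int) -> list[str]:
    """Return the UTF-8 bytes for one code point as hex strings."""
    return [f"{b:02X}" for b in chr(cp).encode("utf-8")]
```

For example, utf16_units(parse_value("U+1F638")) gives the surrogate pair D83D DE38, and utf8_bytes of the same value gives F0 9F 98 B8.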

/K displays a one-letter code to indicate the type of character:

K    Class
A    alphabetic
D    digit
P    punctuation
W    whitespace
C    control character
B    Byte Order Mark
N    noncharacter
H    unpaired surrogate (high); not a character
L    unpaired surrogate (low); not a character
-    anything else

/N displays the official Unicode name of a character, if it is available. This feature requires Windows 10 version 1703 or later; it will not work in earlier versions.

/X expands any escapes in quoted strings after the /X on the command line. Strings before the /X will not be expanded.

charencoding /c "Hello, world. %@uchar[1f638]"



CLIP2TEXT — Copy text from the clipboard to a file or standard output.

Syntax:
CLIP2TEXT /A /NB /O /P /T /UTF8 /UTF16 filename

/A        append to an existing file
/NB       do not write a Byte Order Mark
/O        overwrite an existing file
/P        page output (useful only when output is to stdout)
/T        quietly
/UTF8     write file in the UTF-8 encoding
/UTF16    write file in the UTF-16 encoding (default)

Only one filename is allowed. If no filename is specified, CLIP2TEXT will dump the clipboard to standard output.

See also: the TEXT2CLIP command.



CONTEXT — Search for words in English text and display them in context.

Syntax:
CONTEXT /A:attribs /C:n /CP:n /F:n /H:n /K:n /P /N /S /V /W:base /X:word /Y:word filename…

/A:attribs    attributes mask; valid flags are -ACEHIORS
/C:n          specifies the number of sentences of context to display, before and after
/CP:n         interpret non-Unicode input text using code page n
/F:n          specifies the format of the input text; n is one of:
   0 — best guess (default)
   1 — unformatted (line breaks are used only to end paragraphs)
   2 — prewrapped (line breaks are used to wrap text)
   3 — unformatted, with blank lines between paragraphs
/H:n               set highlight colors for matching words
/K:n               output columns for word-wrap
/P                 page output
/N                 disable features
/S                 search in subdirectories for matching filenames
/V                 verbose; report counts of found items after each file and at the end
/W:base            search for forms of a word
/W:"base base…"    search for a series of word forms
/X:word            search for an exact word
/X:"word word…"    search for a series of exact words
/Y:word            search for words that sound like word
Range options are also supported.

CONTEXT can read from disk files or from a pipe. If you want to pipe to CONTEXT, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to search for words on the clipboard.

Note:  This command was created specifically to search through English text. I make many Anglocentric assumptions about what constitutes a ‘word’, a ‘sentence’, a ‘paragraph’, ‘forms’ of a word, and so on. These assumptions are probably not useful for any other language.

Word search: /W:base searches for forms of a word; this will probably be your most frequently-used option. Specify the base form of a word, and CONTEXT will attempt to match variations of it. For example, /W:DOG will match dog, dogs, dog’s, doggy, and even doggedly.

A word in the input text is considered a ‘form’ of the specified base word if (1) the beginning matches for the entire length of base, except that a final Y at the end of the base word will match an I in the word from the text; and (2) the remainder of the word does not contain more than one vowel other than Y. Case is not significant, and most common accents are ignored; /W:garcon will match garçon, /W:"deja vu" will match Déjà vu, and so on.

If a word from the input text contains a hyphen, the /W: search will also look for the specified base word to either side of the hyphen; /W:LEVEL will match level-headed, sub-level, and even poorly-levelled.
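The matching rule described above can be sketched in a few lines of Python. This is only an approximation written from the description, not the plugin's implementation: the function names are invented, accent stripping uses Unicode decomposition (an assumption about what "most common accents" covers), and the hyphen handling simply tries each hyphen-separated part.

```python
import unicodedata

VOWELS = set("aeiou")  # Y is deliberately not counted as a vowel


def fold(word: str) -> str:
    """Lowercase and strip combining accents (garçon -> garcon)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def is_form(base: str, word: str) -> bool:
    """Apply rules (1) and (2) to a single word."""
    base, word = fold(base), fold(word)
    if len(word) < len(base):
        return False
    head, rest = word[:len(base)], word[len(base):]
    if head != base:
        # A final Y in the base may match an I in the text word.
        if not (base.endswith("y") and head == base[:-1] + "i"):
            return False
    # The remainder may hold at most one vowel other than Y.
    return sum(c in VOWELS for c in rest) <= 1


def matches(base: str, word: str) -> bool:
    """Try the whole word, then each side of any hyphen."""
    return any(is_form(base, part) for part in [word] + word.split("-"))
```

Under these assumptions matches("DOG", "doggedly") and matches("LEVEL", "poorly-levelled") are both true, while matches("dog", "dogmatic") is false because the remainder contains two vowels.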

Word series: You can search for a series of words with /W:"base base…". To match, a series of words must appear within the same sentence in the input text; a word series cannot span the end of a sentence. Matching words must be consecutive, and may be separated by spaces, tabs, or other punctuation. CONTEXT will check for forms of each base word as above, but will not look for the base within hyphenated words. For instance, /W:"LITTLE OLD LADY" will match little, old ladies.

Exact-word search: /X:word searches for a word without checking for variant forms. /X: does not look for the specified word within hyphenated words. Case and accents are still ignored. You can search for a series of exact words with /X:"word word…".

Sound-alike search: /Y:word searches for words which sound similar to the specified word. The comparison uses a Metaphone-like algorithm to guess at a word’s pronunciation. (This type of search does not support word series.)

Surrounding context: By default, CONTEXT displays one sentence before, and one sentence after, each sentence containing any of the specified search words. You can adjust this value with /C:n; legal values are 0 to 15. Note that you may see more than 2n sentences between found words that are close together; CONTEXT will display a little extra text rather than introduce a very short break. You may also see fewer than n sentences near the start or the end of a file.

Highlighting: If CONTEXT’s output is to the screen (i.e. stdout is not redirected), text which matches your search words will be highlighted in a different color. By default, CONTEXT picks a highlight color which contrasts with the current console colors. You can specify your own highlight color either with the option /H:n, or by setting an environment variable named HIGHLIGHT. Either way, the value should be a decimal number from 1 to 254, or a hexadecimal value from 0x01 to 0xFE. The high four bits set the background color, and the low four bits set the foreground color; the two values must be different. The command-line option takes precedence over the environment variable. You can disable highlighting with /NC. Text is not highlighted if the command’s output is redirected.
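To make the layout of the highlight value concrete, here is a small Python sketch that splits a value into its background and foreground halves and applies the stated limits. The function name is hypothetical, and how the plugin itself reports an invalid value is not documented here.

```python
def parse_highlight(text: str) -> tuple[int, int]:
    """Split a highlight value into (background, foreground) nibbles."""
    value = int(text, 16) if text.lower().startswith("0x") else int(text)
    if not 1 <= value <= 254:
        raise ValueError("highlight must be 1..254 (0x01..0xFE)")
    background, foreground = value >> 4, value & 0x0F
    if background == foreground:
        raise ValueError("background and foreground must differ")
    return background, foreground
```

With this reading, 0x1E (decimal 30) means background 1 and foreground 14, while 0x11 is rejected because both halves are the same color.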

Reports: If /V is specified, CONTEXT will also report the number of times each search word was found within a file. If more than one file is processed it will also show a final report for all files, giving the number of times each search word was found in total, and in how many files.

Text encoding: CONTEXT automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /CP:n. Most single-byte (i.e., alphabetic) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.

Text format: Text files use line-break characters in different ways. In some files, line break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to wrap text to some desired width. You can use /F:n to tell CONTEXT how to handle line breaks. /F:1 indicates that the text is unformatted, with line breaks only at the ends of paragraphs. CONTEXT will honor all line breaks, and add an extra blank line after each paragraph. /F:2 means that the input text is prewrapped, having line breaks within paragraphs and even within sentences. CONTEXT will skip single line breaks, honoring only sequences of two or more in a row. /F:3 is also for unformatted text and acts like /F:1, but does not insert a blank line after each paragraph. If you specify /F:0 or do not specify any /F:n, CONTEXT will attempt to guess how the input text is formatted. (Guessing is not reliable when there isn’t much input text.)
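The difference between the unformatted and prewrapped formats can be shown with a short Python sketch. It is written from the description above and is only an illustration; the function name is made up, and the guessing done for /F:0 is not attempted.

```python
import re


def paragraphs(text: str, fmt: int) -> list[str]:
    """Split text into paragraphs according to an /F:n-style format code."""
    text = text.replace("\r\n", "\n")
    if fmt == 2:
        # Prewrapped: only runs of two or more line breaks end a paragraph;
        # single line breaks are treated as ordinary spaces.
        blocks = re.split(r"\n{2,}", text)
        return [" ".join(b.split()) for b in blocks if b.strip()]
    if fmt in (1, 3):
        # Unformatted: every line break ends a paragraph.
        return [line.strip() for line in text.split("\n") if line.strip()]
    raise ValueError("this sketch does not implement guessing (/F:0)")
```

The same input therefore yields different paragraphs depending on the format: "a b\nc d\n\ne f\n" is two paragraphs when read as prewrapped text, but three when read as unformatted text.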

Word wrap: Text output by CONTEXT will be word-wrapped. If output is to the screen, it will be wrapped to the screen width. If output has been redirected, the default width is 100 columns. You can set a different width using the /K:n option; the value must be between 40 and 512.

Disabling features: /N disables features:

/NB    do not write a Byte Order Mark
/NC    disable highlight
/ND    do not search into hidden directories; only useful with /S
/NF    suppress the file-not-found error
/NJ    do not search into junctions; only useful with /S
/NZ    do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.


C:\> context https://www.gutenberg.org/files/11/11-0.txt /w:paint

File "D:\download\pg11.txt" :

CHAPTER VIII. The Queen's Croquet-Ground

A large rose-tree stood near the entrance of the garden: the roses growing on it were white, but there were three gardeners at it, busily painting them red. Alice thought this a very curious thing, and she went nearer to watch them, and just as she came up to them she heard one of them say, 'Look out now, Five! Don't go splashing paint over me like that!'

'I couldn't help it,' said Five, in a sulky tone; 'Seven jogged my elbow.'

*    *    *


Seven flung down his brush, and had just begun 'Well, of all the unjust things--' when his eye chanced to fall upon Alice, as she stood watching them, and he checked himself suddenly: the others looked round also, and all of them bowed low.

'Would you tell me,' said Alice, a little timidly, 'why you are painting those roses?'

Five and Seven said nothing, but looked at Two.


C:\>



COPYCHARS — Put characters on the clipboard.

Syntax:
COPYCHARS /A /Q value entity "string" 

/A    append to current clipboard text
/Q    quietly

Character values may be specified in decimal, or in hexadecimal with a leading 0x.

Entities are as in HTML 4; the leading ampersand may be omitted. Entities are case sensitive.


rem   A non-breaking space, an em dash, and a space:
copychars nbsp; mdash; 32

rem   Text in fancy quotes:
copychars ldquo; "This is a test." rdquo;

rem   High-order characters are supported:
copychars 0x1f603



COUNTCHARS — Count characters in text files.

Syntax:
COUNTCHARS /C:x-y /CP:n /O /P /R /RO /S /U /V /W /X filespec…

/C:x-y    specify a range of characters to count
/CP:n     interpret non-Unicode input text using code page n
/O        sort by frequency
/P        page output
/R        report counts for ranges as well as individual characters
/RO       report range counts only, not counts of individual characters
/S        search in subdirectories for matching files
/U        force characters to uppercase
/V        do not automatically merge overlapping ranges
/W        do not report count of ‘other’ characters
/X        do not report total characters count
/ASCII    short for /C:0-127
/BMP      short for /C:0-0xFFFF
/HI       short for /C:0x10000-0x10FFFF
Range options are also supported.

Input filenames may be specified on the command line, or text may be redirected or piped into COUNTCHARS. If you want to pipe to COUNTCHARS, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read text from the clipboard.


Specify ranges of characters to count with /C:x-y. The start and end characters x and y may be given as decimal, hexadecimal with a leading 0x, or as literal characters:

rem   These three are all the same:

countchars /c:65-90 myfile.txt
countchars /c:0x41-0x5a myfile.txt
countchars /c:A-Z myfile.txt

To specify a literal digit, wrap it in apostrophes:

countchars /c:'0'-'9' myfile.txt

You may specify up to 32 ranges. If you do not specify any ranges, the default is /C:0-127 (ASCII characters).


All values, both in character ranges and in COUNTCHARS’s reports, refer to Unicode code points. If the text uses an 8-bit or OEM encoding, the values reported are the values of the Unicode characters that the OEM characters are translated into — not the OEM character values.


How many letters are in Engine Summer.txt?

countchars /c:A-Z /u /ro "Engine Summer.txt"

File "C:\Bin\JPSDK\TextUtils\Engine Summer.txt" :

    0041 - 005A :  343
    Other       :  161
    TOTAL       :  504

/C:A-Z defines a range of characters from A to Z. /U converts lowercase letters to uppercase so they will also be counted in the same range. /RO reports only the total number of characters in the range; we only want the total number of letters, not the number of As, Bs, Cs, and so on. There are 343 letters in this file.
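The combination of a range, uppercasing, and a range-only report amounts to the following Python sketch. It is an illustration only: the function name is invented, it handles a single range, and whether the plugin counts line-end characters in its totals is not something this sketch tries to reproduce.

```python
def count_range(text: str, lo: int, hi: int, upper: bool = False) -> dict[str, int]:
    """Count code points inside lo..hi (inclusive) versus everything else."""
    inside = 0
    for ch in text:
        # Uppercase only when the result is still a single character.
        if upper and len(ch.upper()) == 1:
            ch = ch.upper()
        if lo <= ord(ch) <= hi:
            inside += 1
    return {"range": inside, "other": len(text) - inside, "total": len(text)}
```

For the string "Hello, World!" with the range A to Z and uppercasing enabled, this gives 10 characters in the range, 3 others, and 13 in total.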


How many Cyrillic letters? Most Cyrillic letters fall in the range of U+0400 to U+04FF:

countchars /c:0x0400-0x04ff /ro "Engine Summer.txt"

File "C:\Bin\JPSDK\TextUtils\Engine Summer.txt" :

    0400 - 04FF :  0
    Other       :  504
    TOTAL       :  504

Mr. Crowley is not writing in Russian.



DEDUP — Dump text files to standard output, merging repeated lines.

Syntax:
DEDUP /A:attribs /B /C /CP:n /D /H /I /M /N /P /S /T /U filename…

/A:attribs    attributes mask; valid flags are -ACEHIORS
/B            discard blank lines
/C            show line repeat counts
/CP:n         interpret non-Unicode input text using code page n
/D            show only repeating lines
/H            display filenames
/I            ignore case when comparing lines
/M            merge repeating lines (default)
/N            disable features
/P            page output
/S            search in subdirectories for matching files
/T            trim leading and trailing whitespace
/U            show only lines which do not repeat
Range options are also supported.

Input filenames may be specified on the command line, or text may be redirected or piped into DEDUP. If you want to pipe to DEDUP, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read text from the clipboard.

Options /D, /M, and /U select the operating mode. If you don’t specify one, the default is /M. If you specify more than one, the last one wins.
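The three operating modes can be pictured with the Python sketch below. It assumes that "repeated lines" means identical lines that are adjacent, in the manner of the Unix uniq tool; the text above does not say so explicitly, so treat that as an assumption. The function name and mode letters as string arguments are also just for illustration.

```python
from itertools import groupby


def dedup(lines: list[str], mode: str = "M", ignore_case: bool = False) -> list[str]:
    """Collapse runs of identical adjacent lines.

    "M" merges each run to one line, "D" keeps only runs that repeat,
    "U" keeps only lines that do not repeat.
    """
    key = str.lower if ignore_case else None
    out = []
    for _, run in groupby(lines, key=key):
        run = list(run)
        keep = (
            mode == "M"
            or (mode == "D" and len(run) > 1)
            or (mode == "U" and len(run) == 1)
        )
        if keep:
            out.append(run[0])
    return out
```

Given the lines a, a, b, c, c, c, mode M yields a, b, c; mode D yields a, c; and mode U yields only b.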

/N disables features:

/NB    do not write a Byte Order Mark
/NC    disable highlight
/ND    do not search into hidden directories; only useful with /S
/NF    suppress the file-not-found error
/NJ    do not search into junctions; only useful with /S
/NZ    do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.



DEGAS — Remove excess spaces and blank lines from text.

Syntax:
DEGAS /A:attribs /B:n /CP:n /E:n /H /L /N /P /R /S /T /W filename…

/A:attribs    attributes mask; valid flags are -ACEHIORS
/B:n          maximum whitespace characters
/CP:n         interpret non-Unicode input text using code page n
/E:n          maximum blank lines
/H            display filenames
/L            display line numbers
/N            disable features
/P            page output
/R            remove all blank lines at the start and end of the file
/RS           remove all blank lines at the start of the file
/RE           remove all blank lines at the end of the file
/S            search in subdirectories for matching files
/T            trim all leading and trailing whitespace from each line
/W            convert all whitespace characters to ASCII spaces
Range options are also supported.

The contents of the files will be dumped to standard output, with excess spaces and blank lines removed.

Input filenames may be specified on the command line, or text may be redirected or piped into DEGAS. If you want to pipe to DEGAS, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to dump the clipboard.


/B: lets you specify the maximum number of whitespace characters in a row. For example, /B:4 allows no more than four whitespace characters in a row.

DEGAS allows for the convention of spacing twice at the end of a sentence. Specify two numbers separated by a comma: /B:n,m. The first sets the maximum number of whitespace characters after a period, question mark, or exclamation point; the second is the maximum after any other character. /B:2,1 allows up to two spaces at the end of a sentence, but only one elsewhere.
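A minimal Python sketch of the /B:n,m idea follows. It is written from the description only: the function name is hypothetical, and it treats just spaces and tabs as whitespace, which is narrower than whatever the plugin actually recognizes.

```python
import re


def squeeze(line: str, after_sentence: int = 2, elsewhere: int = 1) -> str:
    """Limit runs of whitespace, allowing a longer run after . ? or !"""
    def limit(match: re.Match) -> str:
        run = match.group(0)
        prev = line[match.start() - 1] if match.start() > 0 else ""
        cap = after_sentence if prev in (".", "?", "!") else elsewhere
        return run[:cap]  # keep at most cap characters of the run
    return re.sub(r"[ \t]+", limit, line)
```

With the default limits of 2 and 1, "One.   Two   three" becomes "One.  Two three": two spaces survive after the period, one everywhere else.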


/E: specifies the maximum number of blank lines in a row. (A line containing only whitespace characters is considered a ‘blank line’.) /E:3 allows no more than three blank lines together. /E:0 removes all blank lines; /E:0 can be abbreviated to /E.


You can remove all blank lines at the start of a file with /RS. Likewise, you can remove all blank lines at the end of a file with /RE. /R does both. This option is independent of the /E: compression of blank lines.
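The blank-line limit and the independent trimming of the start and end of the file can be sketched as follows. This is an illustration under the definitions given above (a line holding only whitespace counts as blank); the function and parameter names are invented.

```python
def squeeze_blank(lines: list[str], max_blank: int = 1,
                  strip_start: bool = False, strip_end: bool = False) -> list[str]:
    """Allow at most max_blank blank lines in a row; optionally trim the ends."""
    out, run = [], 0
    for line in lines:
        if line.strip() == "":
            run += 1
            if run <= max_blank:
                out.append(line)
        else:
            run = 0
            out.append(line)
    if strip_start:                      # like /RS
        while out and out[0].strip() == "":
            out.pop(0)
    if strip_end:                        # like /RE
        while out and out[-1].strip() == "":
            out.pop()
    return out
```

Setting both strip_start and strip_end corresponds to /R, and max_blank=0 removes every blank line, as /E:0 does.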


/T strips all leading and trailing whitespace from each line. This is a separate operation from the /B: compression of spaces, and happens earlier.


If none of /B: /E: /R /RS /RE or /W are specified, the default is /B:2,1 /E:1 — a maximum of two spaces at the end of a sentence, one space elsewhere; and no more than one blank line in a row.


/N disables features:

/NB    do not write a Byte Order Mark
/NC    disable highlight
/ND    do not search into hidden directories; only useful with /S
/NF    suppress the file-not-found error
/NJ    do not search into junctions; only useful with /S
/NZ    do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.



DEHTML — Strip HTML tags from a file and dump the contents to standard output.

Syntax:
DEHTML /A:attribs /B /C /CP:n /E /H /M /N /N: /O:n /P /R /S filename…

/A:attribs    attributes mask; valid flags are -ACEHIORS
/B            exclude text outside the body and title
/C            include text in <!-- comments -->
/CP:n         interpret non-Unicode input text using code page n
/E            omit empty (blank) lines
/H            display filenames
/M            look in <meta> tags for charset info
/N            by itself: include text in <noscript> or <applet> tags
/N:           with suboptions: disable features
/O:n          include text inside <option> tags:
   0 — don’t include any (the default)
   1 — include only the first <option>
   2 — include all <option> text
/P            page output
/R            remove title
/S            search in subdirectories for matching files
Range options are also supported.

Input filenames may be specified on the command line, or text may be redirected or piped into DEHTML. If you want to pipe to DEHTML, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to dump the clipboard if it contains HTML.

DEHTML will strip HTML tags from the file and replace HTML entities with the corresponding characters; most of the remaining text will be dumped to stdout. This command will also discard: any text in the header which does not appear within <title> tags; anything in <script> or <style> tags; anything within an HTML comment unless you specify /C; anything in <noscript> or <applet> tags unless you specify /N; and anything in <option> tags within a <select> block unless you specify /O:1 or /O:2.
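For readers who want a feel for this kind of filtering, here is a small Python sketch built on the standard html.parser module. It covers only part of the behavior described above: tags are dropped, entities become characters, and anything inside <script>, <style>, or a comment is discarded. The handling of the header, <noscript>, <applet>, and <option> is left out, and the class and function names are made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities become characters
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method, so they are dropped by default.
        if not self.skip_depth:
            self.parts.append(data)


def strip_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```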

If you specify /M, DEHTML will look in <meta> tags in the header for information about the document’s character encoding. This only works if the file is not in Unicode; /M has no effect with Unicode files.

/N with suboptions disables features:

/NB    do not write a Byte Order Mark
/NC    disable highlight
/ND    do not search into hidden directories; only useful with /S
/NF    suppress the file-not-found error
/NJ    do not search into junctions; only useful with /S
/NZ    do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.


•  Note: HTML files often include some unusual characters like non-breaking spaces, bullets, em dashes, ellipses, and guillemets. If you want to pipe or redirect the output from this command, it’s a good idea to enable Unicode output with OPTION //UNICODEOUTPUT=YES. If Unicode output is disabled, some characters may be mangled in translation.



FFIELDS — Read a file and print fields in a specified format.

Syntax:
FFIELDS /A:attribs /C /CP:n /E /F:"format" /H /K:n /L:string /N /P /Q /S /T /W /X filename…

/A:attribs     attributes mask; valid flags are -ACEHIORS
/C             separate fields at commas
/CP:n          interpret non-Unicode input text using code page n
/E             separate fields at first unquoted equals sign
/F:"format"    format string; see below
/H             display filenames
/K:n           output line width (columns)
/L:string      insert line numbers on the left
/N             disable features
/P             page output
/Q             remove quotes (the default is to retain them)
/S             search in subdirectories for matching files
/T             separate fields at tabs
/W             separate fields at whitespace
/X             perform variable expansion on each line
Range options are also supported.

The FFIELDS command reads a file, divides each line into fields (blank lines are skipped), and then prints the fields using a format string. FFIELDS can read from disk files or from a pipe. If you want to pipe to FFIELDS, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard instead of a file.

The format string may contain $n to print field n, or $n=wf to print field n truncated to length w; the final letter is L to left-justify the field if it contains fewer than w characters, R to right-justify it, C to center it, or T to simply truncate the field without padding it to length w. For example, a field specifier of $4=10L would print field 4, left-justified to 10 characters. Use $$ to print a literal dollar sign, or $N to insert a line break.

Fields are numbered starting from 0.
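The format-string rules above can be modelled with the following Python sketch. It is a reading of the description, not the plugin's parser: the names are invented, the justification letter is accepted in either case (the examples in this file use lowercase), accepting a lowercase $n for a line break is an assumption, and a missing field is rendered as an empty string.

```python
import re

# $$ | $N | $n | $n=wf
TOKEN = re.compile(r"\$(?:(\$)|([Nn])|(\d+)(?:=(\d+)([LRCTlrct]))?)")


def render(fmt: str, fields: list[str]) -> str:
    """Expand $n, $n=wf, $$ and $N in a format string."""
    def expand(m: re.Match) -> str:
        if m.group(1):
            return "$"
        if m.group(2):
            return "\n"
        index = int(m.group(3))
        value = fields[index] if index < len(fields) else ""
        if m.group(4) is None:
            return value
        width, how = int(m.group(4)), m.group(5).upper()
        value = value[:width]            # always truncate to w characters
        if how == "L":
            return value.ljust(width)
        if how == "R":
            return value.rjust(width)
        if how == "C":
            return value.center(width)
        return value                     # T: truncate only, no padding
    return TOKEN.sub(expand, fmt)
```

For instance, render("$0=6L|$1=3T|$$", ["PATH", "abcdef"]) pads the first field to six columns, cuts the second to three characters, and ends with a literal dollar sign.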


set |! ffields /e /f:"$0=20l $1=58t"

…displays variable names truncated to 20 characters, followed by a space and the variables’ values truncated to 58 characters.

If you include /L on the command line, FFIELDS will insert line numbers to the left of each output line. Lines are numbered starting at 0. If you include the optional string argument, FFIELDS will perform variable expansion on it before prepending it to each output line; use the variable _LINE to get the current line number. For example, /L:"%%@FORMAT[03,%%_LINE]" will prepend the line number, zero-padded to at least three digits.

If you don’t specify a format string, FFIELDS will invent one at random:


alias |! ffields /e

/N disables features:

/NB    do not write a Byte Order Mark
/NC    disable highlight
/ND    do not search into hidden directories; only useful with /S
/NF    suppress the file-not-found error
/NJ    do not search into junctions; only useful with /S
/NZ    do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.

/X does variable expansion on each line before displaying it. You could, for example, count the characters in each alias definition:


alias |! ffields /e /f:"$0 = (%%@len[$1])  $1" /x



FILTERFILES — Pass files through a text filter command.

Syntax:
FILTERFILES /B:.ext /C /J /N /P /Q /S /UTF8 /UTF16 filespec… : command args…

/B:.ext    extension for backups; the default is .original
/C         do not abort if the command exits with errorlevel 3
/N         not really
/N         disable features
/J         redirect input
/P         prompt for each file
/Q         quietly
/S         search in subdirectories for matching files
/UTF8      redirect output as UTF-8
/UTF16     redirect output as UTF-16
Range options are also supported.
filespec…    the files to process; at least one filespec is required
command      a filter command which writes to stdout

At least one filespec is required. Anything after the first unquoted colon is the command to execute; this also is required.

Matching files will be renamed with a .original extension, or as per /B. Then the specified command will be called, passing the renamed file on its command line after any args, and with its output redirected to the original filename.

This command only supports local files. CLIP:, URLs, standard input, and so on are not supported.


/N by itself prevents FILTERFILES from doing anything. Matching files will be displayed but not renamed, and the command will not be executed.

/N with suboptions disables features:

/NC    disable highlight
/ND    do not search into hidden directories; only useful with /S
/NF    suppress the file-not-found error
/NJ    do not search into junctions; only useful with /S
/NZ    do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.


/P causes FILTERFILES to prompt before processing each file. You can press:

Y           to filter the file
N or Esc    to skip the file
A           to stop prompting and filter all remaining files
Q           to exit immediately

/UTF8 and /UTF16 let you set the output encoding. They call OPTION //UnicodeOutput= and OPTION //UTF8Output= before processing files, and then restore the original settings before FILTERFILES exits. Note that //UTF8Output does not actually work in TCC/LE.


By default, FILTERFILES passes each renamed backup file to the command on its command line:

filtercmd "file.original" > "file.txt"

If you specify /J, it will use input redirection instead:

filtercmd < "file.original" > "file.txt"


FILTERFILES is mainly intended for use with the filters in this plugin: DEDUP, DEGAS, DEHTML, WRAP, and so on. But you can use it with any command that either accepts a filename on its command line or reads from standard input, and that writes text to standard output.
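The rename-then-filter sequence, in both of the invocation styles shown above, looks roughly like this in Python. This is only a sketch of the idea: the function and parameter names are hypothetical, the command is given as an argument list rather than a TCC command line, and none of the prompting, error-level, or encoding options are modelled.

```python
import os
import subprocess


def filter_file(path: str, command: list[str], backup_ext: str = ".original",
                redirect_input: bool = False) -> str:
    """Rename path to a backup, then rebuild path from the command's stdout."""
    backup = os.path.splitext(path)[0] + backup_ext
    os.rename(path, backup)
    with open(path, "wb") as out:
        if redirect_input:
            # Like: command < backup > path
            with open(backup, "rb") as src:
                subprocess.run(command, stdin=src, stdout=out, check=True)
        else:
            # Like: command backup > path
            subprocess.run(command + [backup], stdout=out, check=True)
    return backup
```

The backup keeps the untouched text, so a failed or unwanted filter run can be undone by renaming the backup over the new file.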


rem   Convert all .TXT files in the current directory to Pig Latin:

filterfiles *.txt : oink


rem   Add line numbers to MyFile.txt:

filterfiles myfile.txt : type /L



LOADARRAY — Load data from a file into an array variable.

Syntax:
LOADARRAY /Q filename arrayname

/Q           quietly
filename     a file created by SAVEARRAY
arrayname    an array variable name

The arrayname must begin with a letter. It may contain only letters, digits, underscores, and dollar signs; it should not be more than 31 characters long. If you don’t specify an arrayname, the name of the original array saved in the file will be used. The array will be created (or recreated) automatically, with the correct dimensions to hold the data from the file.
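The naming rule can be expressed as a single pattern. The Python sketch below assumes that "letters" means the ASCII letters A to Z in either case, which the text does not state; the function name is invented.

```python
import re

# One leading letter, then up to 30 more letters, digits, underscores,
# or dollar signs, for a total of at most 31 characters.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9_$]{0,30}")


def valid_array_name(name: str) -> bool:
    """Return True if name satisfies the array-name rule described above."""
    return NAME.fullmatch(name) is not None
```

Under this reading ARG and my_arr$ are acceptable, while 1abc, bad-name, and any name of 32 or more characters are not.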

All elements in the file will be loaded. There is no provision for loading a partial array.

•  Note: This command is not available in TCC/LE, in 4NT, or in older versions of TCC which don’t support array variables.

See also: the SAVEARRAY command.



OINK — Translate a text file to Pig Latin.

Syntax:
OINK /A:attribs /CP:n /D /H /N /P /Q /S filename…

/A:attribs    attributes mask; valid flags are -ACEHIORS
/CP:n         interpret non-Unicode input text using code page n
/D            disable highlight
/H            display filenames
/N            disable features
/P            page output
/Q            replace ASCII quotes and apostrophes with Unicode open and close quotes
/S            search in subdirectories for matching files
Range options are also supported.

If standard input (stdin) is redirected, OINK will read from stdin before any filenames specified on the command line. If no filenames are specified, then OINK will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard.

If you want to pipe to OINK, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

/N disables features:

/NB    do not write a Byte Order Mark
/NC    disable highlight
/ND    do not search into hidden directories; only useful with /S
/NF    suppress the file-not-found error
/NJ    do not search into junctions; only useful with /S
/NZ    do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.

(Yes, this is silly. It was a simple test driver to generate gribble for testing some of the other commands and functions in this plugin. It’s very small — most of the code is shared with other commands — so I left it in.)

See also: the @OINK function, which renders a string as Pig Latin.



PARSEARGS — Divide a string into arguments.

Syntax:
PARSEARGS /A:array /F:flags /Q /V:var !string

/A:array  name of an array to receive the arguments; the default is ARG
/F:flags  parse flags; bitmapped, see below; the default is 1
/Q        quiet; don’t display arguments to stdout
/V:var    name of an environment variable containing the string to parse
!string   the string to parse

This command exposes the plugin’s internal ParseArgs() function, which divides a string into command-line arguments. Its operation can be changed in various ways with the /F:flags option.

The string to be parsed may be passed in two different ways. You can pass the string on the command line, immediately following an exclamation point. The string must be the last item on the command line; everything following the exclamation point is considered the string to parse. Alternatively, you can store the string in an environment variable, and pass the name of the variable with the /V:var option.

The resulting arguments will be stored in an array. You can specify the name of the array with the /A:array option. The array name must begin with a letter. It may contain only letters, digits, underscores, and dollar signs; it should not be more than 31 characters long. If you don’t specify an array name, the default is ARG. The number of arguments found will be stored in an environment variable; the name of this variable is the name of the array with an _N appended, for example ARG_N.

Parse flags:
  1  divide the string at unquoted spaces
  2  divide the string at unquoted commas
  4  slashes kludge: treat /A/B like /A /B
  8  quotes kludge: treat /A"foo" like /A:"foo"
 16  equals kludge: break at the first unquoted equals sign
 32  one-arg kludge: allow unquoted spaces in arg not beginning with /
 64  don’t swallow double quotes
128  force all arguments to uppercase
256  don’t trim spaces from the end of args
512  disable special handling of double quotes

You should specify at least one of 1, 2, or 16; specifying more than one is allowed. If you don’t specify any, then 1 is assumed. Note that if you include a value of 2 (break at commas), then empty arguments are possible.

A value of 4 causes a slash to terminate an argument beginning with a slash followed by a letter. It treats an argument like /A/B as two separate arguments.

A value of 8 checks for arguments beginning with a slash followed by a single letter and then a double quote. If this kind of construction is found, the missing colon is supplied, changing /A"foo" into /A:foo.

If you only expect one argument which does not begin with a slash, and if that argument will always be the last one in the string, you can add 32 to flags. This allows the (only) argument to contain spaces without the necessity of double quotes.

A value of 16 is useful for commands that, like SET or ASSOC, expect a name=value pair. This mode has a number of peculiar quirks. It splits arguments at the first unquoted equals sign in an argument which does not begin with a slash. Spaces around the equals sign are dropped. Spaces in the argument after the equals sign, the value part, are retained even if they are not quoted; the name=value pair is expected to be the last item on the command line. The equals sign is retained as the first character in the value argument; this allows you to distinguish a name= construction (to clear or reset the value for name, perhaps) from a name alone (to report the value for name without changing it).

Normal behavior is to remove double quotes from the string. Typically the double quotes are not part of the filename, value, etc. per se, but a syntactic mechanism for escaping spaces; once the string has been parsed there is no further need for them. If you want to retain double quotes, add 64 to the value of flags.
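
The splitting rules for flags 1, 2, and 64 can be illustrated with a short Python sketch. This is only a rough model written for this help text, not the plugin's ParseArgs() code; the function name parse_args is hypothetical, the other flags are not modeled, and the treatment of runs of spaces is an assumption.

```python
def parse_args(text, flags=1):
    """Rough model of PARSEARGS flags 1, 2 and 64 (other flags are ignored)."""
    if not flags & (1 | 2 | 16):
        flags |= 1                      # no split mode given: assume 1
    separators = ""
    if flags & 1:
        separators += " "
    if flags & 2:
        separators += ","
    keep_quotes = bool(flags & 64)

    args, current, in_quotes = [], "", False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes   # quotes only toggle the quoted state
            if keep_quotes:
                current += ch           # flag 64: do not swallow the quote
        elif ch in separators and not in_quotes:
            # A comma always ends an argument, so empty arguments can occur;
            # runs of spaces are treated as a single break (an assumption).
            if ch == "," or current:
                args.append(current)
                current = ""
        else:
            current += ch
    if current:
        args.append(current)
    return args
```

With the default flags, parse_args('copy "my file.txt" d:\\backup') yields three arguments with the quotes removed; with flags=2, the string a,,b yields an empty middle argument.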

•  Note: This command is not available in TCC/LE, in 4NT, or in older versions of TCC which don’t support array variables.



PASSWORD — Generate random strings suitable for use as passwords.

Syntax:
PASSWORD /A:min,max /C:n /D:min,max /E:min,max /F /L:min,max /N:n /P:min,max /S:min,max /Y

/A:min,max  the number of alphabetic characters to use
/C:n        specify the case of the alphabetic characters:
              0: random
              1: lowercase
              2: uppercase
              3: word case
              5: alternating
              6: leet (vowels lower, consonants upper)
              7: unleet (reverse of the above)
/D:min,max  the number of digits to use
/E:min,max  the number of extended characters to use
/F          make the first character a letter if possible
/L:min,max  the total length of the password, in characters
/N:n        the number of strings to generate
/P:min,max  the number of punctuation characters to use
/S:min,max  the number of syllables to use
/Y          also copy the password to the clipboard

This command displays proposed passwords to standard output. Output can be redirected.

The default behavior is to generate a password from 7 to 10 characters long. You can specify the desired length with /L:min,max. The allowed range is 4 to 1024 characters. If you specify only one value after the /L: it will be used as both the minimum and the maximum. (All the other options which accept a min,max range behave the same way.)

/A:min,max sets the number of alphabetic characters to include. ‘Alphabetic characters’ are the unaccented Latin letters, A to Z. The legal range is from 0 to 512 alphabetic characters.

/D:min,max specifies the number of digits to include; digits are of course 0 to 9. The legal range is from 0 to 128 digits.

Punctuation is by default limited to standard ASCII punctuation marks with no special meaning to TCC: !@#$*()-_=+;:,./?{}~ You can specify a custom set of punctuation characters by setting an environment variable named PUNCTUATION_CHARACTERS. You may include from 0 to 64 punctuation characters.

‘Extended characters’ are the Unicode code points from U+00C0 through U+00FF: accented Latin letters, thorn, eth, aesc, eszett, and a few other hard-to-type glyphs. These characters are not included unless you specify a nonzero value using /E:. You can include up to 64 extended characters.

‘Syllables’ are series of four letters, alternating consonant and vowel sounds. They are intended to be somewhat pronounceable, and perhaps more memorable than an entirely random letter salad. Syllables are not guaranteed to be real words; nor are they guaranteed not to be real words. You may include up to 64 syllables.

The /C:n case option, if specified, is only applied to the regular Latin letters A to Z. It does not affect extended characters. If you specify /C:3 (word case), then the first letter in a run of consecutive letters will be capitalized and the remainder will be in lowercase. These runs are not likely to correspond to actual words. The /C:5 option will give roughly equal numbers of uppercase and lowercase letters.


rem  Generate a 10-character random password, and
rem  stash it on the clipboard:

password /l:10 /y
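
The way the min,max ranges combine can be sketched in Python. This is an illustration only, not the plugin's generator: the function name make_password is hypothetical, only the /L:, /D:, and /P: ranges are modeled, and filling the remainder with letters is an assumption about how the counts are reconciled.

```python
import secrets
import string

PUNCTUATION = "!@#$*()-_=+;:,./?{}~"   # default set quoted in the text above

def make_password(length=(7, 10), digits=(0, 0), punctuation=(0, 0)):
    """Build a password from min,max ranges, loosely modeled on /L: /D: /P:."""
    rng = secrets.SystemRandom()
    total = rng.randint(*length)            # both ends of each range are inclusive
    n_digits = rng.randint(*digits)
    n_punct = rng.randint(*punctuation)
    # Letters fill whatever length is left over (an assumption).
    n_alpha = max(total - n_digits - n_punct, 0)

    chars = [secrets.choice(string.ascii_letters) for _ in range(n_alpha)]
    chars += [secrets.choice(string.digits) for _ in range(n_digits)]
    chars += [secrets.choice(PUNCTUATION) for _ in range(n_punct)]
    rng.shuffle(chars)                      # mix the character classes together
    return "".join(chars)
```

Passing a single value for both ends of a range, e.g. length=(10, 10), corresponds to giving only one number after the option.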


This command also saves its parameters for future calls to the _PASSWORD variable.



RECASE — Change the case of text.

Syntax:
RECASE /A:attribs /C /CP:n /H /L /P /S /U filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/C          capitalize the first letter of each word
/CP:n       interpret non-Unicode input text using code page n
/H          display filenames
/L          make text lowercase
/P          page output
/S          search in subdirectories for matching files
/U          make text uppercase

If standard input (stdin) is redirected, RECASE will read from stdin before any filenames specified on the command line. If no filenames are specified, then RECASE will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard.

If you want to pipe to RECASE, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.



REPLACETEXT — Replace strings in text from a file.

Syntax:
REPLACETEXT /A:attribs /C /CP:n /H /N /P /R:from:to /S /W /X:from:to filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/C          replace character escapes (affects following /R: and /X:)
/CP:n       interpret non-Unicode input text using code page n
/H          display filenames
/N          disable features
/P          page output
/R:from:to  specify old and replacement text
/S          search in subdirectories for matching files
/W          whole words only (affects following /R: and /X:)
/X:from:to  specify old and replacement text (do not auto-capitalize)
Range options are also supported.

If standard input (stdin) is redirected, REPLACETEXT will read from stdin before any filenames specified on the command line. If no filenames are specified, then REPLACETEXT will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard.

If you want to pipe to REPLACETEXT, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

Use /R: or /X: to specify the strings to search for (from) and to substitute (to). You must have at least one of these; you may add as many as you like. The text from each matching file will be dumped to stdout, with every occurrence of from replaced with the corresponding to string. If you give a from string without a matching to, then matching strings will simply be omitted from the output. The difference between the two options is that /R: automatically capitalizes the to string to match the from text which it replaces, but /X: does not. The rules for /R: are simple:

/W only affects those /R: and /X: options which follow it on the command line. /W prevents matching text which immediately follows or immediately precedes a letter or digit.

/C only affects those /R: and /X: options which follow it on the command line. /C expands character escapes of the form \nnn (decimal) or \Xxx (hexadecimal) in both the from and to text. Use this option to embed troublesome characters. For example, you could use /C /R:\x22: to strip double-quote marks from a file.
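
The effect of /C and /W can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the plugin's code: the names expand_escapes and replace_text are hypothetical, decimal escapes are assumed to be one to three digits and hexadecimal escapes exactly two, matching is assumed to be case-insensitive, and the automatic capitalization done by /R: is not modeled.

```python
import re

def expand_escapes(text):
    """Expand \\nnn (decimal) and \\xhh (hexadecimal) character escapes."""
    def decode(match):
        hex_digits, dec_digits = match.groups()
        return chr(int(hex_digits, 16)) if hex_digits else chr(int(dec_digits))
    return re.sub(r"\\[xX]([0-9A-Fa-f]{2})|\\(\d{1,3})", decode, text)

def replace_text(text, old, new, whole_words=False):
    """Replace old with new; whole_words rejects matches touching a letter or digit."""
    pattern = re.escape(old)
    if whole_words:
        # [^\W_] matches a letter or digit, so these lookarounds refuse a
        # match that immediately follows or precedes one.
        pattern = r"(?<![^\W_])" + pattern + r"(?![^\W_])"
    # A function replacement keeps backslashes in `new` literal.
    return re.sub(pattern, lambda m: new, text, flags=re.IGNORECASE)
```

For example, replace_text('say "hi"', expand_escapes(r"\x22"), "") strips the double quotes, much like the /C /R:\x22: example above.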

/N disables features:

/NB  do not write a Byte Order Mark
/NC  disable highlight
/ND  do not search into hidden directories; only useful with /S
/NF  suppress the file-not-found error
/NJ  do not search into junctions; only useful with /S
/NZ  do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.


replacetext "Engine Summer.txt" /w /r:winter:autumn /r:but:yet



ROT13 — Encode or decode text with ROT13.

Syntax:
ROT13 /A:attribs /CP:n /H /N /P /S filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/CP:n       interpret non-Unicode input text using code page n
/H          display filenames
/N          disable features
/P          page output
/S          search in subdirectories for matching files
Range options are also supported.

If standard input (stdin) is redirected, ROT13 will read from stdin before any filenames specified on the command line. If no filenames are specified, then ROT13 will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard.

If you want to pipe to ROT13, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

/N disables features:

/NB  do not write a Byte Order Mark
/NC  disable highlight
/ND  do not search into hidden directories; only useful with /S
/NF  suppress the file-not-found error
/NJ  do not search into junctions; only useful with /S
/NZ  do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.

See also: the @ROT13 function, which transforms a string using ROT13.
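
For readers unfamiliar with ROT13, here is a minimal Python sketch of the transformation itself, using Python's standard rot_13 codec. It is not the plugin's code, and how the plugin treats accented or other non-ASCII letters is not stated here; this sketch leaves them unchanged.

```python
import codecs

def rot13(text):
    """Rotate the letters A-Z and a-z by 13 places; everything else is unchanged."""
    return codecs.encode(text, "rot_13")
```

Because the alphabet has 26 letters, applying rot13 twice returns the original text, which is why a single operation serves for both encoding and decoding.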



SAVEARRAY — Save data from an array variable to a file.

Syntax:
SAVEARRAY /O /P /Q /X:m,n /Y:m,n /Z:m,n /W:m,n arrayname filename

/O         the command may overwrite an existing file
/P         save a partial array as if it were the whole thing; only useful with /X: /Y: /Z: /W:
/Q         quietly
/X:m,n     save only X index m through n
/Y:m,n     save only Y index m through n
/Z:m,n     save only Z index m through n
/W:m,n     save only W index m through n
arrayname  an array variable name
filename   the file to create

The arrayname should begin with a letter. It should contain only letters, digits, underscores, and dollar signs; it should not be more than 31 characters long.

All non-empty elements in the array will be saved. You can restore the data later with LOADARRAY.

The default behavior is to save the entire array. You can restrict the elements saved using the /X:, /Y:, /Z:, and /W: options. /X: restricts the first dimension of the array, /Y: affects the second, /Z: the third, and /W: the fourth.

•  Note: The maximum size for any element in the array is 8,191 characters. Longer elements may cause issues!

•  Note: This command is not available in TCC/LE, in 4NT, or in older versions of TCC which don’t support array variables.

See also: the LOADARRAY command.



SHUFFLE — Dump randomized lines from a text file.

Syntax:
SHUFFLE /A:attribs /B /CP:n /H /J /L /M:n /N /P /S filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/B          discard blank lines
/CP:n       interpret non-Unicode input text using code page n
/H          display the filename before each file
/J          show line numbers (original)
/L          show line numbers (new)
/M:n        maximum number of lines to show
/N          disable features
/P          page output
/S          search in subdirectories for matching files
Range options are also supported.

SHUFFLE randomly reorders lines from the specified file. It can read from disk files or from a pipe. If you want to pipe to SHUFFLE, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

If standard input (stdin) is redirected, SHUFFLE will read from stdin before any filenames specified on the command line. If no filenames are specified, then SHUFFLE will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read lines from the clipboard.

/N disables features:

/NB  do not write a Byte Order Mark
/NC  disable highlight
/ND  do not search into hidden directories; only useful with /S
/NF  suppress the file-not-found error
/NJ  do not search into junctions; only useful with /S
/NZ  do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.


shuffle /b "engine summer.txt"



TEXT2CLIP — Copy text from a file onto the clipboard.

Syntax:
TEXT2CLIP /A /CP:n /Q /T filename

/A     append to any text already on the clipboard
/CP:n  interpret non-Unicode input text using code page n
/Q     replace ASCII quotes and apostrophes with Unicode open and close quotes
/T     quietly

Only one filename is allowed. Text may be piped or redirected into TEXT2CLIP.

See also: the CLIP2TEXT command.



TEXTUTILSHELP — Open the TextUtils plugin help file.

Syntax:
TEXTUTILSHELP /C /F /S /S:text /V topic

/C       select the ‘Contents’ tab
/F       select the ‘Favorites’ tab
/S       select the ‘Search’ tab
/S:text  select the ‘Search’ tab and search for text
/V       show detailed plugin version info
topic    the page to display

The TEXTUTILSHELP command will locate and open this plugin’s help file. In most cases, the internal HELP command, and the F1 and Ctrl-F1 keys, will be more convenient. The main advantage to this command is that it can be used to open the help file to any desired topic, not only to the names of commands, functions, and variables.


Note that any /C /F or /S must precede any topic on the command line. (This command has a very simple-minded parser.)



UNICODIFY — Convert text files to Unicode.

Syntax:
UNICODIFY /A:attribs /CP:n /L /N /O /P /Q /S /T /UTF8 /UTF16 filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/CP:n       interpret non-Unicode input text using code page n
/L          normalize line endings to CR/LF
/N          disable features
/O          overwrite read-only files
/Q          replace ASCII quotes and apostrophes with Unicode open and close quotes
/S          search in subdirectories for matching files
/T          quietly
/UTF8       rewrite files using UTF-8 encoding
/UTF16      rewrite files using UTF-16 encoding (default)
Range options are also supported.

UNICODIFY rewrites the contents of text files, changing them to UTF-16 or UTF-8 encoding. By default, it will skip:

The original contents of the file will be saved in a new file with the extension .original.

•  Note: This command only converts files. Standard input, internet URLs, and the clipboard are not supported. (You can use wildcards, directory aliases, @file lists, and so on.)

OEM characters will be interpreted according to the current Windows code page by default; use the /CP:n option to specify a different code page. To check the translation before you actually convert the file, try UTYPE with the /CP:n option first.

/N disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NF  suppress the file-not-found error
/NJ  do not search into junctions; only useful with /S
/NZ  do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.



UPEND — Display lines from a file in reverse order.

Syntax:
UPEND /A:attribs /B /C /CP:n /E /H /L:string /N /P /R:string /S /T /V /W:n filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/B          discard blank lines
/C          replace control characters with ^ sequences
/CP:n       interpret non-Unicode input text using code page n
/E          expand variables in the /L: and /R: strings
/H          display the filename before each file
/L:string   insert string to the left of each line
/N          disable features
/P          page output
/R:string   insert string to the right of each line
/S          search in subdirectories for matching files
/T          trim leading and trailing whitespace
/V          also reverse each line in the file
/W:n        truncate lines to n characters
Range options are also supported.

UPEND is a low-budget substitute for the Unix tac command. It can read from disk files or from a pipe. If you want to pipe to UPEND, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

If standard input (stdin) is redirected, UPEND will read from stdin before any filenames specified on the command line. If no filenames are specified, then UPEND will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read lines from the clipboard.

If /L: is specified, the given string will be inserted to the left of each line; /R: inserts a string to the right. If /E is also specified, variable expansion will be performed on each string. Along with TCC’s usual complement of internal variables, functions, and so on, UPEND will set an environment variable _LINE. _LINE will contain the value 0 for the first line listed (i.e. the last line in the file), 1 for the second line listed, and so on. You can massage this value with functions like @INC, @EVAL, @FORMAT, and so on. To prevent the variables from being expanded before UPEND executes, you must either enclose the string in backquotes or double the percent signs.

/N disables features:

/NB  do not write a Byte Order Mark
/NC  disable highlight
/ND  do not search into hidden directories; only useful with /S
/NF  suppress the file-not-found error
/NJ  do not search into junctions; only useful with /S
/NZ  do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.


upend D:\download\pg11.txt /l:"%%@format[4,%%_line] " /e
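
To make the numbering concrete, the following Python sketch models what the command produces for /L:, /V, and /B. It is an illustration rather than the plugin's code: the function name upend and the {line} placeholder are hypothetical stand-ins for the command and its _LINE variable, and counting only the lines actually listed is an assumption.

```python
def upend(lines, left="", reverse_each=False, discard_blank=False):
    """Return lines last-to-first; `left` may use {line} for the zero-based output index."""
    out = []
    for text in reversed(lines):
        if discard_blank and not text.strip():
            continue                    # like /B: drop blank lines
        if reverse_each:
            text = text[::-1]           # like /V: reverse the line itself
        # len(out) is 0 for the first line listed, 1 for the second, and so on.
        out.append(left.format(line=len(out)) + text)
    return out
```

Calling upend(["one", "two", "three"], left="{line:>4} ") lists three first, with each line preceded by its index right-justified in four columns.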



UTYPE — Dump text files to standard output.

Syntax:
UTYPE /A:attribs /B /C /CP:n /D /E /F:string /H /HW:n /K:n /L:format /N /P /Q /S /T /U:string /X /Z:n filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/B          discard BEL characters (control-G, ASCII 7)
/C          replace control characters with ^ sequences
/CP:n       interpret non-Unicode input text using code page n
/D          discard blank lines at the start of the file
/E          discard all empty lines
/F:string   show only lines following this string; /FF: inclusive
/H          display the filename before each file
/HH         display the filename, file size, and encoding before each file
/HW:n       hex dump width, in bytes; only useful with /X
/K:n        expand tabs to n columns
/L:format   insert line numbers on the left
/N          disable features
/P          page output
/Q          replace ASCII quotes and apostrophes with Unicode open and close quotes
/S          search in subdirectories for matching files
/T          trim leading and trailing whitespace
/U:string   show only lines until (before) this string; /UU: inclusive
/X          dump file in hexadecimal
/Z:n        handling of NUL characters in text:
              /Z:N  treat like end-of-line (default)
              /Z:I  treat as invalid character
              /Z:S  skip over (ignore) any NUL characters
  
Range options are also supported.

UTYPE displays files to standard output, much like the internal TYPE command. The primary advantage of UTYPE is that it recognizes and handles UTF-8 text files; you can think of it as a ‘UTF-8 TYPE’.

If you want to pipe to UTYPE, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

If standard input (stdin) is redirected, UTYPE will read from stdin before any filenames specified on the command line. If no filenames are specified, then UTYPE will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to display the contents of the clipboard.

If you include /L on the command line, UTYPE will insert line numbers on the left, starting at 1, as TYPE does. If you include the optional format string, UTYPE will perform variable expansion on the string before displaying it; use the variable _LINE to get the current (zero-based) line number. For example, /L:"%%@FORMAT[03,%%_LINE] " will show the line number zero-padded to at least three digits.

/F: and /U: can be used to chop off a simple header or footer. /F: discards all lines up to and including the first line which contains the specified string (case-insensitive); /U: discards all lines including and after a line which contains the specified string (again, case-insensitive). For example, most Project Gutenberg ebooks include a header which ends in a line beginning with “*** START” and a footer beginning with “*** END”. You can strip them off like this:

utype "https://www.gutenberg.org/cache/epub/11/pg11.txt" /f:"*** start" /u:"*** end" /d | list

If you double the option letter — /FF: or /UU: — the matching line will be included in UTYPE’s output, not discarded.
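
The header and footer logic can be sketched in Python as follows. This is a rough model for illustration, not the plugin's code; the function name chop and its parameter names are hypothetical, with include_start and include_until standing in for the doubled /FF: and /UU: forms.

```python
def chop(lines, start=None, until=None, include_start=False, include_until=False):
    """Keep the lines between a header marker and a footer marker (case-insensitive)."""
    out = []
    started = start is None             # with no start marker, output begins at once
    for text in lines:
        folded = text.lower()
        if not started:
            if start.lower() in folded:
                started = True
                if include_start:       # like /FF: keep the matching line
                    out.append(text)
            continue
        if until is not None and until.lower() in folded:
            if include_until:           # like /UU: keep the matching line
                out.append(text)
            break                       # everything after the footer marker is dropped
        out.append(text)
    return out
```

Given a list of lines containing the two Project Gutenberg marker lines, chop(lines, "*** start", "*** end") returns only the lines strictly between them.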

/E discards all blank lines; /D discards only those at the start of a file. If you specify both, /D wins. If you combine /D with /F:string, UTYPE will discard any blank lines following the header. A line containing only spaces or tabs is considered blank.

/N disables features:

/NB  do not write a Byte Order Mark
/NC  disable highlight
/ND  do not search into hidden directories; only useful with /S
/NF  suppress the file-not-found error
/NH  disable the handbrake
/NJ  do not search into junctions; only useful with /S
/NZ  do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.


The handbrake: When scrolling a long file to the console and /P was not specified, UTYPE watches for the Ctrl and Esc keys. Hold down the Ctrl key to slow the scrolling; press Esc to pause the file as if /P had been specified. This feature will be disabled automatically if you specify /P or if output is redirected; you can also disable it with /NH.


Quotes replacement: /Q causes UTYPE to replace generic ASCII apostrophes and quote marks ( ' and " ) with Unicode open and close quote marks ( ‘ ’ and “ ” ). The new quote marks may or may not look different from the originals, depending on how they are displayed and the font used. If the output is displayed in a non-Unicode font, the curly quotes will be lost or mangled. You can set some environment variables to control this feature.


utype "Engine Summer.txt"



WORDS — Count words, sentences, and paragraphs in English text.

Syntax:
WORDS /A:attribs /C /CP:n /D /F:fmt /K /M:n /N /S /U:mode /X filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/C          code mode; words may contain underscores and dollar signs
/CP:n       interpret non-Unicode input text using code page n
/D          dumps lists of unique words, sorted by frequency
/F:fmt      specifies the format for input text; fmt is one of:
              0: best guess (default)
              1: unformatted (line breaks are used only to end paragraphs)
              2: prewrapped (line breaks are used to wrap text)
/K          keeps hyphens when reassembling split words
/M:n        minimum number of letters in a word
/N          by itself: no words containing digits
/N          with suboptions: disable features
/S          search in subdirectories for matching files
/U:mode     controls the counting of unique words; mode is one of:
              0: do not count unique words (faster for large files)
              1: count unique words for each file individually (the default)
              2: count unique words for all files together (slower)
              3: separate counts for each file and for all files together (double oink!)
/X          no words beginning with a digit
Range options are also supported.

WORDS counts words, sentences, and paragraphs in English text. It can read text from standard input, or from one or more files specified on the command line. A report is written to standard output; this report can be piped or redirected. The results of the last file processed are also saved internally, and can be accessed through internal variables.

Note:  This command was designed specifically for use with English text. I make many Anglocentric assumptions about what constitutes a ‘word’, a ‘sentence’, a ‘paragraph’, ‘forms’ of a word, and so on. These assumptions are probably not useful for any other language. WORDS may give strange or undesired results when used on source code, program output, HTML, or whatnot.

If standard input (stdin) is redirected, WORDS will read from stdin before any filenames specified on the command line. If no filenames are specified, then WORDS will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to count words on the clipboard.

This command’s definition of a ‘word’ is complex and subject to ongoing tweaking. In general, though, a word may contain only letters, digits (unless /N is specified), periods, apostrophes, and hyphens; at least one character must be a letter. For instance, 20th, 1920s, 1969's, and post-1941 are all considered words, but 1984 is not. The first character must be alphanumeric or (very rarely) an apostrophe.

If /C is specified, words may also contain underscores and dollar signs, but must not begin with a digit or dollar sign. /C also suppresses the count of sentences and paragraphs in the final report.

Words that differ only in case are counted as the same word. In the phrase polish Polish furniture using Polish furniture polish, this command will find only three ‘unique’ words.

A word is counted as ‘proper’ only if it never occurs in an all-lowercase form; no proper nouns will be found in Polish polish. Acronyms like NATO will be counted as ‘proper nouns’; so will ordinary words capitalized at the start of a sentence. The latter are often common words like articles and prepositions, which tend to be weeded out in longer files as they recur midsentence.
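
The unique-word and proper-noun rules above can be approximated in Python. This is a simplified sketch, not the command's actual word definition, which is more complex; the names WORD and vocabulary are hypothetical, and the regular expression only covers unaccented letters, digits, periods, apostrophes, and hyphens.

```python
import re
from collections import Counter

# A word starts with a letter or digit and may continue with periods,
# apostrophes, and hyphens (a simplification of the rules above).
WORD = re.compile(r"[A-Za-z0-9][A-Za-z0-9.'-]*")

def vocabulary(text):
    """Count unique words case-insensitively and flag words never seen in lowercase."""
    counts = Counter()
    seen_lower = set()
    for match in WORD.finditer(text):
        word = match.group().rstrip(".'-")   # drop trailing punctuation
        if not any(ch.isalpha() for ch in word):
            continue                         # at least one letter is required
        key = word.lower()
        counts[key] += 1
        if word == key:
            seen_lower.add(key)              # this word occurred in all-lowercase form
    proper = {w for w in counts if w not in seen_lower}
    return counts, proper
```

For the phrase polish Polish furniture using Polish furniture polish, this sketch finds three unique words and no proper nouns, in line with the description above.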

Note that a hyphenate is always counted as a single word. Without a dictionary, the command has no way of knowing whether it is composed of actual words (red-eye, half-baked) or not (pre-K, Wi-Fi).

WORDS also gives counts of sentences, paragraphs, lines, characters, and bytes. All counts should be viewed as estimates rather than gospel truth. The sentences count in particular must be taken with a healthy dose of salt; the command has no good way to determine whether a period ends an abbreviation, a sentence, or both.

A line, or a series of lines, which contains one or more sentences is counted as a ‘paragraph’. A line or series of lines which contains one or more words, but no recognized sentences, is instead counted as a ‘title’. It might actually be a title, subtitle, or chapter heading; or it might be a byline, date line, attribution, salutation, signature, line of poetry….

The number of lines reported may differ from the number of carriage returns or line feeds in the text, e.g. if the last line in the file is not terminated. A line containing only whitespace characters (spaces and tabs) is considered blank. The character and byte counts do not include any Unicode byte-order mark at the beginning of the file.

Split words: If a hyphenated word is split across a line break, WORDS will reassemble it and treat it as a single word. By default, the hyphen is dropped — the command has no way of knowing whether a hyphenated compound word was broken at a hyphen, or whether a normal word was divided between syllables and a hyphen added. The latter seems more common, and I wanted to avoid cluttering the vocabulary list with differently-hyphenated versions of the same word. If /K is specified, the command will instead retain hyphens when reassembling words broken at the end of a line. This option may cause a larger number of ‘unique’ words to be reported.
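
The reassembly of split words can be sketched in Python like this. It is only a rough model, not the command's code: the function name join_split_words is hypothetical, keep_hyphens stands in for /K, and a line is treated as ending in a split word whenever its last non-space character is a hyphen preceded by a letter.

```python
def join_split_words(lines, keep_hyphens=False):
    """Rejoin words broken at a line-ending hyphen; keep_hyphens mimics /K."""
    out, carry = [], ""
    for line in lines:
        # Glue any carried-over word fragment onto the start of this line.
        line = carry + line.lstrip() if carry else line
        carry = ""
        stripped = line.rstrip()
        if len(stripped) > 1 and stripped.endswith("-") and stripped[-2].isalpha():
            head, _, tail = stripped.rpartition(" ")
            carry = tail if keep_hyphens else tail[:-1]   # drop the hyphen by default
            if head:
                out.append(head)
        else:
            out.append(line)
    if carry:
        out.append(carry)
    return out
```

With the default setting, the lines "a well-" and "known fact" come back as "a" and "wellknown fact"; with keep_hyphens=True the second line becomes "well-known fact", which shows why /K can increase the number of unique words reported.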

Vocabularies: In order to count unique words and ‘proper nouns’, WORDS must build a list of all words found. Building this list can slow down the process and use a good deal of memory if the text file involved is large. /U:mode controls the vocabulary lists. /U:0 disables vocabularies; the command executes faster, but there will be no counts of unique and proper words. /U:1 causes WORDS to build a vocabulary list for each file it processes; this is the default behavior. /U:2 builds a combined vocabulary for all files that WORDS processes; this is slower than the default. Finally, /U:3 builds a vocabulary for each file that WORDS reads, and at the same time builds a master vocabulary for all files together; this is much slower than the default behavior, and devours memory shamelessly.

If you are processing extremely large text files, or files which are not English prose — e.g. output from a program or command — I strongly recommend using /U:0 to disable vocabulary lists.

Dump: If /D is specified, the vocabulary for each file will be dumped to stdout. If /D is combined with /U:2, you’ll instead get a combined vocabulary for all files. The list is sorted by frequency, with more common words appearing first. Note that words may be shown in a different case than they appear in the input text. This is because the command stores all words in lowercase internally for speed (lowercase letters are more streamlined).

Text format: Text files use line-break characters in different ways. In some files, line break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to wrap text to some desired width. You can use /F:n to tell CONTEXT how to handle line breaks. /F:1 indicates that the text is unformatted, with line breaks only at the ends of paragraphs. CONTEXT will honor all line breaks, and add an extra blank line after each paragraph. /F:2 means that the input text is prewrapped, having line breaks within paragraphs and even within sentences. CONTEXT will skip single line breaks, honoring only sequences of two or more in a row. /F:3 is also for unformatted text and acts like /F:1, but does not insert a blank line after each paragraph. If you specify /F:0 or do not specify any /F:n, CONTEXT will attempt to guess how the input text is formatted. (Guessing is not reliable when there isn’t much input text.)

Text encoding: WORDS automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /CP:n. Most single-byte (i.e., alphabetic) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.

Disabling features: /N with suboptions disables features:

/NB   do not write a Byte Order Mark
/NC   disable highlight
/ND   do not search into hidden directories; only useful with /S
/NF   suppress the file-not-found error
/NJ   do not search into junctions; only useful with /S
/NZ   do not search into system directories; only useful with /S

You can combine these, e.g. /NDJ.


C:\> type EBS.txt
This is a test.  For the next sixty seconds, this station will conduct a test
of the Emergency Broadcast System.  This is only a test.

C:\> words /d EBS.txt

File "C:\EBS.txt" :
  25 words total, 17 unique, 4 proper.  25 runs of non-blanks.
  3 sentences total:  3.  0!  0?   Average sentence 8.3 words.
  1 paragraph, 0 titles.  Average paragraph 3.0 sentences.
  2 lines total, 2 not blank; the longest had 77 characters.
  137 characters in 137 bytes (OEM, prewrapped).

3:  a test this
2:  is the
1:  Broadcast conduct Emergency For next of only seconds sixty station System will

C:\>


The results from the last file processed are saved, and can be accessed using these internal variables:

_WORDS           _UNIQUEWORDS     _PROPERNOUNS   _WC
_SENTENCES       _SENTENCESD      _SENTENCESE    _SENTENCESQ
_SENTENCEWORDS   _PARAGRAPHS      _TITLES
_LINES           _NONBLANKLINES   _LONGESTLINE   _CHARACTERS

The cumulative results from all files processed by the last invocation of WORDS can be accessed through these variables:

_WORDSALL           _UNIQUEWORDSALL     _PROPERNOUNSALL   _WCALL
_SENTENCESALL       _SENTENCESDALL      _SENTENCESEALL    _SENTENCESQALL
_SENTENCEWORDSALL   _PARAGRAPHSALL      _TITLESALL        _WORDFILES
_LINESALL           _NONBLANKLINESALL   _LONGESTLINEALL   _CHARACTERSALL


WRAP — Word-wrap English text to fit a specified number of columns.

Syntax:
WRAP /A:attribs /C:n /CP:n /D /F:fmt /G:n,m /H /J /N:n /N /P /Q /R /S /T:n /W:width /Z:char filename…

/A:attribs   attributes mask; valid flags are -ACEHIORS
/C:n         condense repeated spaces in input text
/CP:n        interpret non-Unicode input text using code page n
/D           disable special handling of soft hyphens (character 173 / 0xAD)
/F:fmt       specifies the format for input text; fmt is one of:
                0 — best guess (default)
                1 — unformatted (line breaks are used only to end paragraphs)
                2 — prewrapped (line breaks are used to wrap text)
                3 — unformatted, with blank lines between paragraphs
/G:n,m       indent all paragraphs n spaces; if m is specified, it’s the indent for the second and later lines
/H           display filenames
/J           justify right margins
/N:n         minimum characters left on each line to split at a hyphen; 0 disables breaking at hyphens
/N           disable features
/P           page output
/Q           replace ASCII quotes and apostrophes with Unicode open and close quotes
/R           remove hyphens from line ends
/S           search in subdirectories for matching filenames
/T:n         tab stops every n spaces
/W:width     desired width of output text
/Z:char      define a forced line-break character
Range options are also supported.

The WRAP command word-wraps English text to fit a specified width. It can be used as a filter reading from standard input, or it can read from one or more files specified on the command line. The resulting text is written to standard output; it can be piped or redirected.

If you want to pipe to WRAP, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to wrap text from the clipboard.

‘Width’ here refers to a specified number of character positions, or columns. All characters are assumed to have the same width. The word-wrapped output should have neat, reasonably uniform line lengths when viewed or printed in a fixed-pitch font such as Courier, or displayed in a console window. Note that the specified width includes the final newline character; if you specify a width of 80, then up to 79 printable characters may appear on a line.

Note:  This command is designed specifically for use with English prose. It may give weird or undesired results when used on source code, program output, HTML, or whatnot. It makes Anglocentric assumptions that may not be appropriate to other languages.

If standard input (stdin) is redirected, WRAP will read from stdin before any filenames specified on the command line. If no filenames are specified, then WRAP will read from stdin whether it is redirected or not. If /H is used, each file’s name will be printed before it is processed. (For standard input, <stdin> will be shown.)

Output width: /W:width sets the desired width in characters for the output text. Width may be from 40 to 512. If no /W:width is specified, the default is the console width if output goes to the console, or 100 columns if output is redirected. (You can set an environment variable COLUMNS to change this default.) If you type just /W without a colon or width, the current console width is assumed; this is useful if you are redirecting WRAP’s output but want it wrapped to the console width anyway, e.g. for piping to LIST.

Text format: Text files use line-break characters in different ways. In some files, line break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to wrap text to some desired width. You can use /F:n to tell WRAP how to handle line breaks. /F:1 indicates that the text is unformatted, with line breaks only at the ends of paragraphs. WRAP will honor all line breaks, and add an extra blank line after each paragraph. /F:2 means that the input text is prewrapped, having line breaks within paragraphs and even within sentences. WRAP will skip single line breaks, honoring only sequences of two or more in a row. /F:3 is also for unformatted text and acts like /F:1, but does not insert a blank line after each paragraph; use this option to wrap the output from DEHTML. If you specify /F:0 or do not specify any /F:n, WRAP will attempt to guess how the input text is formatted. (Guessing is not reliable when there isn’t much input text.)

Tab size: The /T:n option controls the expansion of tab characters. By default, tab stops are every four columns (set an environment variable TABSIZE to change this default). /T:8 would make tabs eight columns wide. /T:0 disables special handling of tab characters, treating them like any other character; this will probably bollix word-wrapping and is not recommended. n may be 0 to 20.

Breaking at hyphens: WRAP will usually break lines at spaces. It may also break a line after a hyphen, if all of the following are true: (1) the character before the hyphen is a letter, and the following character is either a letter or a digit; (2) at least three characters, not counting the hyphen, will remain at the end of the line; and (3) at least three characters will move to the start of the following line. So, for example, if the phrase true-blue fell near the end of a line, WRAP might break the line after the hyphen, since true and blue have four letters each. The phrases do-nothing and derring-do would not be divided, however, since splitting either one would leave a two-letter do on a line by itself. You can adjust this behavior with /N:n, which sets the minimum number of characters for both lines. If you specify /N:4 then at least four characters, not counting the hyphen, must remain on each line. /N:0 prevents WRAP from breaking lines after hyphens.
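
The three conditions above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not WRAP's actual code; the function name and the choice to measure each side from the hyphen to the word's edge are assumptions.

```python
def may_break_after_hyphen(word: str, hyphen_pos: int, min_chars: int = 3) -> bool:
    """Return True if a line break would be allowed after word[hyphen_pos].

    min_chars plays the role of /N:n; 0 disables breaking at hyphens.
    (Hypothetical helper, written only to illustrate the rule.)
    """
    if min_chars == 0:
        return False
    # The hyphen must have a character on both sides.
    if not (0 < hyphen_pos < len(word) - 1) or word[hyphen_pos] != "-":
        return False
    before, after = word[hyphen_pos - 1], word[hyphen_pos + 1]
    # Rule 1: a letter before the hyphen, a letter or digit after it.
    if not (before.isalpha() and (after.isalpha() or after.isdigit())):
        return False
    # Rules 2 and 3: enough characters on each side, not counting the hyphen.
    left = word[:hyphen_pos]
    right = word[hyphen_pos + 1:]
    return len(left) >= min_chars and len(right) >= min_chars


print(may_break_after_hyphen("true-blue", 4))     # True
print(may_break_after_hyphen("do-nothing", 2))    # False
print(may_break_after_hyphen("derring-do", 7))    # False
```

Passing min_chars=5 rejects true-blue as well, which mirrors what a larger /N:n value is described as doing.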

Removing hyphens: If /R is used, WRAP may discard a hyphen at the end of a line if the preceding character was a letter, and if the first character on the following line is also a letter. Without /R, WRAP retains all hyphens from line ends.

Forced indentation: The /G:n option forcibly indents each new paragraph n spaces (not tabs). Any indentation in the input text will be lost. n must be 0 to 20. /G:0 will strip all leading whitespace, leaving text flush with the left margin. The optional second value, if present, indents the second and later lines m spaces; m is also 0 to 20. You might use /G:0,4 to produce a hanging indent. If /G: is not specified, any indentation in the input text is preserved.

Condensing spaces: The /C:n option allows you to condense runs of consecutive spaces in the input text. Any sequence of more than n spaces will be truncated. Only spaces (character 32) are counted, not other whitespace characters. Spaces generated by the program itself (e.g. by expanding tabs or indenting paragraphs) will not be condensed. n must be 0 to 10; if n is 0, spaces are not condensed (the default). This option might be useful for packing output text just a little more tightly; if the original text file had extra spaces inserted to justify margins; or if you are one of those unfortunates who suffer a violent reaction to the sight of two spaces after a period.
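
As a rough Python sketch of the idea (not the plugin's code), condensing can be written as a single regular-expression substitution. It assumes that "truncated" means a longer run is cut down to exactly n spaces; the function name is made up for the example.

```python
import re


def condense_spaces(text: str, n: int) -> str:
    """Cut every run of more than n spaces (character 32 only) down to n."""
    if not 0 <= n <= 10:
        raise ValueError("n must be 0 to 10")
    if n == 0:
        return text  # 0 means: do not condense
    # Match n+1 or more consecutive spaces; tabs and other whitespace are untouched.
    return re.sub(" {%d,}" % (n + 1), " " * n, text)


print(repr(condense_spaces("One.   Two.", 1)))   # 'One. Two.'
print(repr(condense_spaces("One.   Two.", 2)))   # 'One.  Two.'
```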

Quotes replacement: /Q causes WRAP to replace generic ASCII apostrophes and quote marks ( ' and " ) with Unicode open and close quote marks ( ‘ ’ and “ ” ). The new quote marks may or may not look different from the originals, depending on how they are displayed and the font used. If the output is displayed in a non-Unicode font, the curly quotes will be lost or mangled. You can set some environment variables to control this feature.

Text encoding: WRAP automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /CP:n. Most single-byte (i.e., Western) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.

Forced line break: /Z:char defines a forced line-break character. char may be entered as either a single character, or as a decimal or hexadecimal (prefixed with 0x) character code. If a matching character is found in the input file or stream, WRAP will end the current line and begin a new one.

Disabling features: /N with suboptions disables features:

/NB   do not write a Byte Order Mark
/ND   do not search into hidden directories; only useful with /S
/NH   do not add a hyphen when breaking a word
/NJ   do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.


These variables may be set to a numeric value to modify the command’s default behavior:

COLUMNS:   sets the default width when output is redirected and /W is not specified. Legal values are 40 to 512.
TABSIZE:   sets the default number of columns between tab stops when /T is not specified. Legal values are 1 to 20.

wrap /w:100 "Fishy Story.txt"



XFILTER — Process lines of a file using variable expansion.

Syntax:
XFILTER /A:attribs /B /CP:n /F:"format" /H /N /P /S /T filename…

/A:attribs    attributes mask; valid flags are -ACEHIORS
/B            discard blank lines
/CP:n         interpret non-Unicode input text using code page n
/F:"format"   format string: required; see below
/H            display filenames
/N            disable features
/P            page output
/S            search in subdirectories for matching files
/T            trim leading and trailing whitespace
Range options are also supported.

The required format string contains TCC variables and functions, which will be expanded for each line in the file. Double all percent signs to prevent variables from being expanded before the command is executed. An asterisk in the format string will be replaced with each line from the file. The current (zero-based) line number is also available in the variable _LINE.

XFILTER can be used as a filter reading from standard input, or it can read from one or more files specified on the command line. The resulting text is written to standard output; it can be piped or redirected. If you want to pipe to XFILTER, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to process text from the clipboard instead of from a file.

To prevent problems caused by troublesome characters in the input text, certain ‘dangerous’ characters from the file will be temporarily replaced with safe alternatives from Unicode’s Halfwidth and Fullwidth Forms block. They will be restored to ASCII after variable expansion. This shuffle prevents issues when characters with special meanings to TCC are inadvertently present in the input text, but it might be confusing if you want to find or replace any of the remapped characters. The characters which are temporarily replaced are:

Character   ASCII   Hex   Remapped to
"           34      22    U+FF02
%           37      25    U+FF05
(           40      28    U+FF08
)           41      29    U+FF09
,           44      2C    U+FF0C
[           91      5B    U+FF3B
]           93      5D    U+FF3D
^           94      5E    U+FF3E
`           96      60    U+FF40
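
Every row in the table follows the same pattern: the replacement is the ASCII code plus 0xFEE0. The Python sketch below illustrates that round trip. It is not XFILTER's code; the names protect and restore are invented for the example.

```python
REMAPPED = '"%(),[]^`'       # the nine characters listed in the table
FULLWIDTH_OFFSET = 0xFEE0    # e.g. 0x22 + 0xFEE0 == 0xFF02

# Translation tables for str.translate(): ASCII -> fullwidth, and back again.
TO_SAFE = {ord(c): ord(c) + FULLWIDTH_OFFSET for c in REMAPPED}
TO_ASCII = {safe: plain for plain, safe in TO_SAFE.items()}


def protect(line: str) -> str:
    """Swap the troublesome characters for their fullwidth look-alikes."""
    return line.translate(TO_SAFE)


def restore(line: str) -> str:
    """Swap the fullwidth look-alikes back to ASCII."""
    return line.translate(TO_ASCII)


sample = "50% of [a,b]"
print(protect(sample))
print(restore(protect(sample)) == sample)   # True
```

One caveat of this naive version: restore() would also convert fullwidth characters that were already present in the input. Whether XFILTER does the same is not stated here.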

rem  Dump a file in uppercase:
xfilter /f:"%%@upper[*]" "Engine Summer.txt"

rem  Display the length of each line:
xfilter /f:"Line %%_line has %%@len[*] characters." "Engine Summer.txt"



New Functions:

@B85TOBIN — Decodes a base-85 string into a binary buffer.

Syntax:
%@B85TOBIN[handle,start,string]

handle   the handle to a binary buffer, as returned by @BALLOC
start    the offset in bytes at which to begin writing the decoded data; defaults to 0
string   a base-85 encoded string as returned by @BINTOB85

This function decodes a base-85 string returned by @BINTOB85 and stores the resulting data in a binary buffer. Note that there is no option to control the number of bytes written; the entire string is decoded and written to the buffer. If there is any error in decoding the string, no change will be made to the binary buffer.

Note that the two commas between parameters are both required. You must supply both commas even if you omit the optional start value.

The return value is the number of bytes written to the buffer.

See also: the @BINTOB85 function.



@BETWEEN — Returns the portion of a string between two delimiters.

Syntax:
%@BETWEEN[delims,string]

delims   exactly two characters, one start and one end delimiter
string   the string to parse

You generally do not need to quote or escape the delims string; the first two characters found are assumed to be the start and end delimiter characters, and the third must be a comma. (Exception: If you want to use a close bracket as a delimiter, escape it.) To use the same character as both start and end delimiter, type it twice.

The function returns the portion of string between the start and end delimiters. If the start delimiter is not found in the string, an empty string is returned. If the start delimiter occurs more than once, the first one found is used. If the start delimiter is found but the end delimiter is not, everything after the start delimiter is returned.

echo %@between[<>,This is <only> a test.]
only

echo %@between["",Let's parse out a "quoted chunk" of text.]
quoted chunk
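
For readers who think in code, the lookup rules can be sketched in Python. This is an approximation, not the plugin's implementation. In particular, searching for the end delimiter only after the start delimiter is an assumption.

```python
def between(delims: str, text: str) -> str:
    """Return the part of text between the start and end delimiter."""
    if len(delims) != 2:
        raise ValueError("delims must be exactly two characters")
    start_delim, end_delim = delims
    start = text.find(start_delim)      # the first occurrence wins
    if start == -1:
        return ""                       # no start delimiter: empty string
    rest = text[start + 1:]
    end = rest.find(end_delim)          # assumed: searched only after the start
    return rest if end == -1 else rest[:end]


print(between("<>", "This is <only> a test."))                       # only
print(between('""', 'Let\'s parse out a "quoted chunk" of text.'))   # quoted chunk
```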



@BINTOB85 — Encodes the contents of a binary buffer as a base-85 string.

Syntax:
%@BINTOB85[handle,start,length]

handle   the handle to a binary buffer, as returned by @BALLOC
start    the offset in bytes at which to begin encoding; defaults to 0
length   the number of bytes to encode; defaults to 128 or the remainder of the buffer

This function encodes binary data (from a binary buffer) as a string which can be easily handled by TCC. You can store this string in an environment variable, write it to an .INI file, and so on. To restore the original binary data, use the @B85TOBIN function.

Four bytes of data are encoded into five characters; encoding a 1024-byte buffer will result in a 1,281-character-long string (counting the terminal null). Keep in mind that encoding long series of bytes will produce even longer strings! If you don’t specify a length, the default is 128 bytes or to the end of the buffer.
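
The size arithmetic is easy to check with a small Python helper. The helper is hypothetical and deliberately handles only byte counts that are a multiple of four, because how a partial final group is encoded is not described here.

```python
def b85_string_length(byte_count: int, include_null: bool = False) -> int:
    """Length of the base-85 string for byte_count bytes (multiples of 4 only)."""
    if byte_count < 0 or byte_count % 4 != 0:
        raise ValueError("this sketch only handles multiples of four bytes")
    chars = byte_count // 4 * 5          # four bytes become five characters
    return chars + 1 if include_null else chars


print(b85_string_length(1024, include_null=True))   # 1281
print(b85_string_length(128))                        # 160
```

So the default 128-byte chunk yields a 160-character string, which is comfortable to store in an environment variable or an .INI entry.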

This implementation of base-85 differs from others. The set of characters used to encode binary data has been chosen to avoid syntactically troublesome signs like quotes, percent signs, ampersands, carets, and so on. All characters are ASCII, so the string should not be mangled by code page translations.

See also: the @B85TOBIN function.



@CLARIFY — Returns the original text mangled by @OBSCURE.

Syntax:
%@CLARIFY[obscured-text]

obscured-text   obfuscated text

The input obscured-text should be a string returned by the @OBSCURE function; anything else is very unlikely to return meaningful text.

You probably should not write the restored value into an environment variable, an .INI file, or a registry value, or display it to the screen. Just use it immediately, plugging the @CLARIFY function directly into the command which requires the original text. (The ditzy little example below displays a password to the screen because it’s just a ditzy little example.)

set inifile="%userprofile\Passwords.ini"
set password=%@iniread[%inifile,Personal,Password]

echo Password: %@clarify[%password]
unset inifile password

See also: the @OBSCURE function.



@INIVALUE — Returns a value from an .INI file.

Syntax:
%@INIVALUE[filename,section,entry,index,errorstr,flags]

filename   the file to examine
section    the name of the section to search for the entry
entry      the name associated with the desired value
index      which entry to return; defaults to 0 (the first); -1 returns the number of matching entries
errorstr   the string to return on any error; defaults to nothing (the empty string)
flags      a bitmapped integer controlling advanced features:
              1 — bomb out on file errors
              2 — treat section as a wildcard to match
              4 — treat entry as a wildcard to match

This function is essentially @INIREAD without GetPrivateProfileString(). It can handle some things that @INIREAD can’t, such as UTF-8 .INI files, sectionless values, multiple values with the same name, and multiple headers for the same section.

You must specify the full name and extension of the filename. If you do not include a path, the file is assumed to be in the Windows directory, not in the current directory! To force this function to look in the current directory, begin the filename with .\.

If you do not specify a section, the function will look for a matching entry before the first section header. If section is an asterisk, the function will look for a matching entry throughout the file, ignoring all section headers.

Sometimes an .INI file will contain multiple lines with the same entry name. For example, TCMD.INI may have more than one NormalKey directive. You can loop through multiple entries with the index argument. An index of 0 returns the first matching entry, 1 returns the second, and so on. Set index to -1 to return the number of matching entries.
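
The way section and index interact can be sketched in Python. This is a simplified model, not the plugin's parser: it takes a list of lines rather than a file, ignores flags, and assumes that names are compared case-insensitively. The sample section and entry names are invented.

```python
def ini_value(lines, section, entry, index=0, errorstr=""):
    """Return the index-th value of entry in section, or a count if index is -1."""
    matches = []
    current = ""                          # "" means: before the first section header
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]          # a section may have several headers
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue                      # not a name=value line
        in_scope = section == "*" or current.lower() == section.lower()
        if in_scope and name.strip().lower() == entry.lower():
            matches.append(value.strip())
    if index == -1:
        return str(len(matches))
    if 0 <= index < len(matches):
        return matches[index]
    return errorstr


sample = ["Loose=1", "[Sample]", "Key=a", "Key=b", "[Other]", "x=y", "[Sample]", "Key=c"]
print(ini_value(sample, "Sample", "Key", -1))   # 3
print(ini_value(sample, "Sample", "Key", 2))    # c
print(ini_value(sample, "", "Loose"))           # 1
```

A loop over the entries would first ask for index -1 to get the count, then request indexes 0 through count minus 1.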

The default behavior is to return an empty string on any error: file not found, access denied, or no matching section or entry. If you specify an errorstr, then that value will be returned instead. (This is useful if the .INI file can contain empty values.) Additionally, you can set flags to 1, and any error opening the file will result in an error message instead of returning a string value. You can also check the _INIVALUERC internal variable to get information about the last call to @INIVALUE.

See also: the _INIVALUERC variable, which returns an exit code for this function.



@LINEENDS — Reports the line-end characters used in a text file.

Syntax:
%@LINEENDS[filename,n]

filename   the file to scan
n          what to report:
              1: the number of lines ending in CR/LF pairs
              2: the number of lines ending in LF/CR pairs
              3: the number of lines ending in CR not followed by LF
              4: the number of lines ending in LF not followed by CR
              5: the number of lines ending in NEL
              10: the total number of line-end sequences in the file

If n is zero or not present, @LINEENDS returns a string describing the file’s format:

Empty   The file contains no data.
None    No line-end characters were found.
CR/LF   The file uses CR/LF line ends.
LF/CR   The file uses LF/CR line ends. (Who does this?)
CR      The file uses CR line ends.
LF      The file uses LF line ends.
NEL     The file uses NEL line ends.
Mixed   The file uses more than one line-end sequence.
ERROR   The file could not be read.
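
The classification can be approximated with a simple left-to-right scan. The Python sketch below works on text that has already been decoded, and resolves ambiguous sequences greedily (the first two-character pair found wins). Both choices are assumptions, and the function names are made up for the example.

```python
def count_line_ends(data: str) -> dict:
    """Count each kind of line-end sequence in data, scanning left to right."""
    counts = {"CR/LF": 0, "LF/CR": 0, "CR": 0, "LF": 0, "NEL": 0}
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair == "\r\n":
            counts["CR/LF"] += 1
            i += 2
        elif pair == "\n\r":
            counts["LF/CR"] += 1
            i += 2
        elif data[i] == "\r":
            counts["CR"] += 1
            i += 1
        elif data[i] == "\n":
            counts["LF"] += 1
            i += 1
        elif data[i] == "\x85":           # NEL, U+0085
            counts["NEL"] += 1
            i += 1
        else:
            i += 1
    return counts


def describe_line_ends(data: str) -> str:
    """Summarize the counts using the same labels as the table above."""
    if not data:
        return "Empty"
    used = [name for name, n in count_line_ends(data).items() if n]
    if not used:
        return "None"
    return used[0] if len(used) == 1 else "Mixed"


print(describe_line_ends("one\r\ntwo\r\n"))   # CR/LF
print(describe_line_ends("one\r\ntwo\n"))     # Mixed
```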

See also: the @TEXTENCODING and @TEXTFORMAT functions.



@METAPHONE — Returns a roughly phonetic code for an English word.

Syntax:
%@METAPHONE[word,length,flags]

word     the word or words to process
length   the maximum length of the codes to return; defaults to 8
flags    set to 1 for better compatibility

Metaphone codes are meant to roughly approximate the pronunciation of a word. Words that sound similar should have similar Metaphone codes. You can use this function to compare the sounds of words, to suggest similar words, or to group words by pronunciation.

If you pass more than one word, separate them with spaces. The resulting codes will also be separated by spaces.

rem  Compare two words:

set word1=cougher
set word2=coffer
if %@metaphone[%word1] == %@metaphone[%word2] echo "%word1" may sound like "%word2".


By default, this function returns Metaphone codes of up to eight characters long. You can specify a different length with the length parameter, e.g. %@metaphone[word,10] to return ten-letter Metaphone codes. Legal values are 4 to 20.


•  Note: Values returned by this function are not guaranteed to match those generated by any other implementation. Documentation of the Metaphone algorithm is invariably unclear and self-contradictory, and never seems to agree with the corresponding code. This is my attempt to implement Lawrence Philips’s original algorithm to the best of my limited understanding, with a few additional tweaks thrown in.

More specifically, comparing against assertFull_v1.1.txt, dated 2011-11-25, by the Metaphone-standards project, @METAPHONE produces different codes for 40 out of 2753 words: about 98.5% agreement. If flags is set to 1, there are no mismatches — but I still cannot guarantee perfect agreement with any other implementation.



@MKENTITIES — Replaces characters in a string with HTML entities.

Syntax:
%@MKENTITIES[string]

@MKENTITIES will replace these characters with HTML entities:

Character:                 Replaced with:
"  (double quote)          &quot;
%  (percent sign)          &#37;
&  (ampersand)             &amp;
<  (less-than sign)        &lt;
>  (greater-than sign)     &gt;

• Note: This function can return ampersands in its output. You will need to quote it, or use SETDOS /C to temporarily change the command separator character.



@OBSCURE — Mangles a text string, making it difficult to read.

Syntax:
%@OBSCURE[text]

text   text to be obfuscated

The input text should be reasonably short, preferably not more than a kilobyte or two. The resulting, mangled string will be longer than the original string, usually by about one-third. The same input text can return different obfuscated text; you cannot meaningfully compare the output from two calls to @OBSCURE. Do not edit or alter the returned text in any way.

If the input text comes from an environment variable, it’s probably a good idea to remove or overwrite that variable as soon as possible after calling @OBSCURE. One way to do this would be to simply store the returned string back in the original variable.

set inifile="%userprofile\Passwords.ini"
input /p Enter password:  %%password
set password=%@obscure[%password]

set rv=%@iniwrite[%inifile,Personal,Password,%password]
unset inifile password


•  Note: This function does not provide secure cryptography! It was designed for ease of use, not for real security. Using @OBSCURE to muddle text will discourage casual snooping, but a sophisticated user can recover the original data easily by passing the obscured text to @CLARIFY. (A determined attacker could also reverse-engineer the algorithm, although that would be a pointless waste of time when the plugin itself is readily available.)

See also: the @CLARIFY function.



@OINK — Translates text to Pig Latin.

Syntax:
%@OINK[text]

echo %@oink[This is only a test.]

See also: the OINK command, which Pig Latinizes text files.



@ROT13 — Transforms a string using ROT13.

Syntax:
%@ROT13[text]

echo %@rot13[This is only a test.]
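
ROT13 shifts each of the 26 basic Latin letters by 13 places, so applying it twice restores the original text. Python's standard library offers the same transform, which is a handy way to check results. How the plugin treats accented letters is not covered by this sketch.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate A-Z and a-z by 13 places; every other character is left alone."""
    return codecs.encode(text, "rot13")


print(rot13("This is only a test."))          # Guvf vf bayl n grfg.
print(rot13(rot13("This is only a test.")))   # This is only a test.
```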

See also: the ROT13 command, which encodes or decodes text files.



@ROUGHLYSIMILAR — Compares words in two text strings.

Syntax:
%@ROUGHLYSIMILAR[string1,string2]

string1   the first string to compare
string2   the second string to compare

Both strings are simplified before they are compared. %@ROUGHLYSIMILAR returns 1 if the two simplified strings match, 0 if they differ.

echo %@roughlysimilar[THIS IS A TEST!,This-is-a-test.]



@STRIPACCENTS — Removes accents from letters.

Syntax:
%@STRIPACCENTS[text]

Only characters in the range U+00C0 through U+00FF, plus U+0152 and U+0153, will be replaced. (This function only recognizes a few accented characters, so it’s fast.)

echo %@stripaccents[Déjà vu]



@TEXTENCODING — Returns a guess at the character encoding of a text file.

Syntax:
%@TEXTENCODING[filename,flags]

filename   the file to examine
flags      set to 1 to also report presence of a BOM

If the file begins with a Unicode Byte Order Mark, then it is assumed to be Unicode; the encoding is inferred from the BOM. If the file does not begin with a BOM, the function can only guess at the encoding; the longer the file, the more likely the guess is to be accurate.

Possible return values include:

Empty      There is no data in the file.
OEM        The file is probably not Unicode.
UTF-16LE   The file is probably 16-bit Unicode.
UTF-16BE   The file is probably 16-bit Unicode (big-endian).
UTF-8      The file is probably UTF-8 encoded Unicode.
UTF-32LE   The file looks like UTF-32 (little-endian).
UTF-32BE   The file looks like UTF-32 (big-endian).
EBCDIC     The file is probably in some version of EBCDIC.

If flags is 1, and if the file is Unicode and begins with a Byte Order Mark, the phrase with BOM will be appended.

set filename=myfile.txt
echo File %filename is %@textencoding[%filename].

See also: the @LINEENDS and @TEXTFORMAT functions.



@TEXTFORMAT — Returns a guess at the formatting of a text file.

Syntax:
%@TEXTFORMAT[filename]

filename   the file to examine

Text files use line-break characters in different ways. In some files, line break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to limit text to a desired width. This function attempts to determine how the specified text file is formatted.

Possible return values include:

Empty         There is no text in the file.
Unformatted   Line breaks are used to end paragraphs.
Prewrapped    Line breaks are used to limit line width.

set filename=myfile.txt
set format=%@textformat[%filename]

if %format == Unformatted echo File %filename is not word-wrapped.

See also: the @LINEENDS and @TEXTENCODING functions.



@UCHAR — Returns Unicode characters with the specified values.

Syntax:
%@UCHAR[value value…]

This function behaves like @CHAR, except that the input values are assumed to be hexadecimal. You may prefix values with 0x or U+ but neither is required. With or without either prefix, each value will be parsed as hexadecimal.


echo %@uchar[16a6 16d6 16eb 16bb 16a9 16d2 16d2 16c1 16cf]

See also: the @UCODE and @UCODEX functions.



@UCODE — Returns the hexadecimal values of characters in a string.

Syntax:
%@UCODE[string]

This function behaves like @UNICODE, except that it returns values as hexadecimal (without any prefix). A few characters, including the backquote and the close square bracket, will need to be escaped.


echo %@ucode[This is a test.]

See also: the @UCHAR and @UCODEX functions.



@UCODEX — Returns the hexadecimal values of characters in a string.

Syntax:
%@UCODEX[string]

This function behaves like @UNICODE, except that it returns values as hexadecimal with leading 0x. A few characters, including the backquote and the close square bracket, will need to be escaped.


echo %@ucodex[This is a test.]

See also: the @UCHAR and @UCODE functions.



@ULEN — Returns the number of Unicode characters in a string.

Syntax:
%@ULEN[string]

This function is almost the same as @LEN, except that it counts properly-paired surrogates as a single character.

echo %@ulen[😺]

echo %@ulen[%@char[0xd83d 0xde00]]


Surrogates which are not properly paired will be counted as separate ‘characters’.
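
The pairing rule can be illustrated in Python by walking over UTF-16 code units. A high surrogate (0xD800 to 0xDBFF) followed immediately by a low surrogate (0xDC00 to 0xDFFF) counts once, and everything else counts individually. This is a sketch of the rule as described, not the plugin's code; both function names are invented.

```python
def utf16_units(text: str) -> list:
    """Split a Python string into its UTF-16 code units."""
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def ulen(units: list) -> int:
    """Count characters, treating a properly paired surrogate pair as one."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


print(ulen([0xD83D, 0xDE00]))   # 1  (a proper pair)
print(ulen([0xDE00, 0xD83D]))   # 2  (wrong order, so not a pair)
```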



@UQUOTES — Replaces ASCII apostrophes and quote marks with Unicode open and close quotes.

Syntax:
%@UQUOTES[text]

text   English text containing apostrophes or quotation marks

Generic ASCII apostrophes ( ' ) and quote marks ( " ) in text will be replaced with Unicode open and close quote marks ( ‘ ’ and “ ” ). Also, any doubled hyphens will be replaced with em dashes.

The modified string may or may not look different from the original, depending on how you use it and the font used to display it. If it is redirected to a file and //UnicodeOutput=No, then the fancy Unicode quotes will be smashed right back into ASCII. (Worse yet, under some versions of Windows the Unicode single open-quote character may be mangled to a grave accent….) If the modified string is ECHOed to the console and the console font doesn’t support the relevant Unicode characters, then again the Unicode quotes may be lost. In Take Command, curly quotes must be supported by both the tab-window font (Options / Configure Take Command / Tabs / Font) and also the underlying console window (detach a tab to check this).

echo %@uquotes["Never use a GUI to do a shell's work!" said Tom commandingly.]


You can set some environment variables to control this feature.



@VOWELS — Returns the number of vowels in a string.

Syntax:
%@VOWELS[string]

string   the text to examine

Only vowels in the Latin alphabet are counted: A, E, I, O, U, and Y. Accented variants in the range U+00C0 through U+00FF (Unicode’s Latin-1 Supplement) are also recognized.

echo %@vowels[Déjà vu]
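As a rough illustration of the rule above, the Python sketch below counts A, E, I, O, U, and Y plus their accented forms in U+00C0 through U+00FF. It is not the plugin's code. It assumes that an accented character counts when removing its accent leaves one of the six vowels; how the plugin treats letters such as Æ or Ø is not stated here, and this sketch does not count them. The name count_vowels is hypothetical.

```python
import unicodedata

def count_vowels(text):
    """Count Latin vowels, including accented Latin-1 Supplement forms."""
    total = 0
    for ch in text:
        if 0xC0 <= ord(ch) <= 0xFF:
            # Decompose and keep only the base letter (assumption).
            ch = unicodedata.normalize("NFD", ch)[0]
        if ch in "AEIOUYaeiouy":
            total += 1
    return total

# "Deja vu" with its accents, written with escapes to avoid encoding issues.
print(count_vowels("D\u00e9j\u00e0 vu"))   # this sketch prints 3
```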



New Variables:

_CHARACTERS — Returns the number of characters in the last file processed by WORDS.

Syntax:
%_CHARACTERS

This count does not include any Unicode byte-order mark at the beginning of the file. If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_CHARACTERSALL — Returns the number of characters in all files processed by the last call to WORDS.

Syntax:
%_CHARACTERSALL

This count does not include any Unicode byte-order marks at the beginnings of files. If the WORDS command has not been called, this variable returns the value N/A.



_GETACP — Returns the current Windows code page.

Syntax:
%_GETACP

This variable returns the current Windows code page. (This value is also traditionally miscalled the ‘ANSI code page’, although it has nothing to do with ANSI.) Note that this value can and usually does differ from the OEM code page returned by %_CODEPAGE.

echo The current Windows code page is %_getacp.



_INIVALUERC — Returns an exit code for the last call to @INIVALUE.

Syntax:
%_INIVALUERC

This variable returns a code indicating the success or failure of the last call to the @INIVALUE function, and the nature of the error if it failed. Possible return values include:


(empty string)    returned if @INIVALUE has not been called
Syntax error      any error in arguments
File error n      any error opening the file; n is a Windows error number
File empty        the file contains no data
Found n           a matching entry was found at line n
Count n           successfully counted matching entries
No section        no matching section header was found
No entry n        no matching entry, or fewer than n entries found

If the correct entry was found, the return value is Found n. The n is the line number, starting from zero and not counting any blank lines.
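A script that consumes this value has to separate the status text from the optional number. The Python sketch below shows one way to do that for the strings listed above. It is an illustration only: the function name parse_inivalue_rc and the label "Not called" for the empty string are hypothetical, and it assumes n is always a decimal integer.

```python
def parse_inivalue_rc(rc):
    """Split an _INIVALUERC string into (status, number).
    number is None when the status carries no numeric part."""
    if rc == "":
        return ("Not called", None)      # hypothetical label for the empty string
    for prefix in ("File error", "Found", "Count", "No entry"):
        if rc.startswith(prefix + " "):
            return (prefix, int(rc[len(prefix) + 1:]))
    return (rc, None)                    # Syntax error, File empty, No section

print(parse_inivalue_rc("Found 12"))     # ('Found', 12)
print(parse_inivalue_rc("No section"))   # ('No section', None)
```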

See also: the @INIVALUE function.



_LINES — Returns the number of lines in the last file processed by WORDS.

Syntax:
%_LINES

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_LINESALL — Returns the number of lines in all files processed by the last call to WORDS.

Syntax:
%_LINESALL

If the WORDS command has not been called, this variable returns the value N/A.



_LONGESTLINE — Returns the number of characters in the longest line of the last file processed by WORDS.

Syntax:
%_LONGESTLINE

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_LONGESTLINEALL — Returns the number of characters in the longest line in all files processed by the last call to WORDS.

Syntax:
%_LONGESTLINEALL

If the WORDS command has not been called, this variable returns the value N/A.



_NONBLANKLINES — Returns the number of non-blank lines in the last file processed by WORDS.

Syntax:
%_NONBLANKLINES

A line which contains only whitespace characters such as spaces or tabs is considered blank. Subtract %_NONBLANKLINES from %_LINES to get the number of blank lines.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_NONBLANKLINESALL — Returns the number of non-blank lines in all files processed by the last call to WORDS.

Syntax:
%_NONBLANKLINESALL

A line which contains only whitespace characters such as spaces or tabs is considered blank. Subtract %_NONBLANKLINESALL from %_LINESALL to get the number of blank lines.

If the WORDS command has not been called, this variable returns the value N/A.



_PARAGRAPHS — Returns the number of paragraphs in the last file processed by WORDS.

Syntax:
%_PARAGRAPHS

A ‘paragraph’ is a line or series of lines which contains at least one sentence. Divide %_SENTENCES by %_PARAGRAPHS to get the average paragraph length in sentences. Divide %_SENTENCEWORDS by %_PARAGRAPHS to get the average paragraph length in words.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_PARAGRAPHSALL — Returns the number of paragraphs in all files processed by the last call to WORDS.

Syntax:
%_PARAGRAPHSALL

A ‘paragraph’ is a line or series of lines which contains at least one sentence. Divide %_SENTENCESALL by %_PARAGRAPHSALL to get the average paragraph length in sentences. Divide %_SENTENCEWORDSALL by %_PARAGRAPHSALL to get the average paragraph length in words.

If the WORDS command has not been called, this variable returns the value N/A.



_PASSWORD — Returns a random string suitable for use as a password.

Syntax:
%_PASSWORD

You can use the PASSWORD command to adjust the parameters used to generate the string.



_PROPERNOUNS — Returns the number of proper nouns in the last file processed by WORDS.

Syntax:
%_PROPERNOUNS

Counting proper nouns requires WORDS to build a vocabulary list for each file. If you disable this step with /U:0 or /U:2, the list will not be available and this variable will return the value N/A.

For the purposes of this plugin, a ‘proper noun’ is any word which never appears in an all-lowercase form. If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.
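The definition above can be expressed in a few lines. This Python sketch is an illustration, not the plugin's code: it assumes the text has already been split into words, and it compares words case-insensitively with lower(). The name count_proper_nouns is hypothetical.

```python
def count_proper_nouns(words):
    """Count distinct words that never appear in an all-lowercase form."""
    seen_lowercase = {w for w in words if w == w.lower()}
    all_words = {w.lower() for w in words}
    return len(all_words - seen_lowercase)

words = ["Alice", "saw", "the", "Polish", "polish", "Alice"]
print(count_proper_nouns(words))   # only "Alice" qualifies: 1
```

"Polish" is not counted here because "polish" also appears in lowercase, which is the consequence of defining proper nouns by capitalization alone.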



_PROPERNOUNSALL — Returns the number of proper nouns in all files processed by the last call to WORDS.

Syntax:
%_PROPERNOUNSALL

Counting proper nouns in all files requires WORDS to build a vocabulary list for all files processed; this list is not built by default. Unless you enable the omnibus vocabulary list with /U:2 or /U:3, this variable will return the value N/A.

For the purposes of this plugin, a ‘proper noun’ is any word which never appears in an all-lowercase form. If the WORDS command has not been called, this variable returns the value N/A.



_SENTENCES — Returns the total number of sentences in the last file processed by WORDS.

Syntax:
%_SENTENCES

A ‘sentence’ is a word or series of words ending with a period, exclamation mark, or question mark. Divide %_SENTENCEWORDS by %_SENTENCES to get the average sentence length.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.
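Because either value may be N/A, the division needs a guard. The Python sketch below shows the arithmetic with that guard; it is an illustration only, and it assumes the two values arrive as the strings the variables would return. The function name average_length is hypothetical.

```python
def average_length(sentence_words, sentences):
    """Return words per sentence, or None if a value is N/A or zero."""
    if "N/A" in (sentence_words, sentences):
        return None
    if int(sentences) == 0:
        return None
    return int(sentence_words) / int(sentences)

print(average_length("120", "8"))   # 15.0
print(average_length("N/A", "8"))   # None
```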



_SENTENCESALL — Returns the total number of sentences in all files processed by the last call to WORDS.

Syntax:
%_SENTENCESALL

A ‘sentence’ is a word or series of words ending with a period, exclamation mark, or question mark. Divide %_SENTENCEWORDSALL by %_SENTENCESALL to get the average sentence length.

If the WORDS command has not been called, this variable returns the value N/A.



_SENTENCESD — Returns the number of declarative sentences in the last file processed by WORDS.

Syntax:
%_SENTENCESD

A ‘declarative sentence’ is a word or series of words ending with a period.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_SENTENCESDALL — Returns the number of declarative sentences in all files processed by the last call to WORDS.

Syntax:
%_SENTENCESDALL

A ‘declarative sentence’ is a word or series of words ending with a period.

If the WORDS command has not been called, this variable returns the value N/A.



_SENTENCESE — Returns the number of exclamatory sentences in the last file processed by WORDS.

Syntax:
%_SENTENCESE

An ‘exclamatory sentence’ is a word or series of words ending with an exclamation mark.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_SENTENCESEALL — Returns the number of exclamatory sentences in all files processed by the last call to WORDS.

Syntax:
%_SENTENCESEALL

An ‘exclamatory sentence’ is a word or series of words ending with an exclamation mark.

If the WORDS command has not been called, this variable returns the value N/A.



_SENTENCESQ — Returns the number of interrogative sentences in the last file processed by WORDS.

Syntax:
%_SENTENCESQ

An ‘interrogative sentence’ is a word or series of words ending with a question mark.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_SENTENCESQALL — Returns the number of interrogative sentences in all files processed by the last call to WORDS.

Syntax:
%_SENTENCESQALL

An ‘interrogative sentence’ is a word or series of words ending with a question mark.

If the WORDS command has not been called, this variable returns the value N/A.



_SENTENCEWORDS — Returns the total number of words in the last file processed by WORDS which are part of a recognized sentence.

Syntax:
%_SENTENCEWORDS

A ‘sentence’ is a word or series of words ending with a period, exclamation mark, or question mark. Divide %_SENTENCEWORDS by %_SENTENCES to get the average sentence length.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_SENTENCEWORDSALL — Returns the total number of words in all files processed by the last call to WORDS which are part of a recognized sentence.

Syntax:
%_SENTENCEWORDSALL

A ‘sentence’ is a word or series of words ending with a period, exclamation mark, or question mark. Divide %_SENTENCEWORDSALL by %_SENTENCESALL to get the average sentence length.

If the WORDS command has not been called, this variable returns the value N/A.



_TITLES — Returns the number of titles in the last file processed by WORDS.

Syntax:
%_TITLES

A ‘title’ is a line or series of lines which contains one or more words, but no recognized sentences. It might actually be a title, subtitle, or chapter heading; or it might be a byline, date line, attribution, salutation, signature, line of poetry….

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_TITLESALL — Returns the number of titles in all files processed by the last call to WORDS.

Syntax:
%_TITLESALL

A ‘title’ is a line or series of lines which contains one or more words, but no recognized sentences. It might actually be a title, subtitle, or chapter heading; or it might be a byline, date line, attribution, salutation, signature, line of poetry….

If the WORDS command has not been called, this variable returns the value N/A.



_UNIQUEWORDS — Returns the number of unique words in the last file processed by WORDS.

Syntax:
%_UNIQUEWORDS

Counting unique words requires WORDS to build a vocabulary list for each file. If you disable this step with /U:0 or /U:2, the list will not be available and this variable will return the value N/A.

Words that differ only in case are counted as the same word. In the phrase “polish Polish furniture using Polish furniture polish”, this plugin will find only three ‘unique’ words.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.
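The case-insensitive count can be checked with a short Python sketch. It is an illustration rather than the plugin's code, and it assumes words are separated by whitespace only; the name count_unique_words is hypothetical.

```python
def count_unique_words(text):
    """Count distinct words, ignoring differences in case."""
    return len({word.lower() for word in text.split()})

phrase = "polish Polish furniture using Polish furniture polish"
print(count_unique_words(phrase))   # polish, furniture, using: 3
```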



_UNIQUEWORDSALL — Returns the number of unique words in all files processed by the last call to WORDS.

Syntax:
%_UNIQUEWORDSALL

Counting unique words for all files requires WORDS to build a vocabulary list for all files processed; this list is not built by default. Unless you enable the omnibus vocabulary list with /U:2 or /U:3, this variable will return the value N/A.

Words that differ only in case are counted as the same word. In the phrase “polish Polish furniture using Polish furniture polish”, this plugin will find only three ‘unique’ words.

If the WORDS command has not been called, this variable returns the value N/A.



_WC — Returns the number of contiguous series of non-blank characters in the last file processed by WORDS.

Syntax:
%_WC

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.

•  Note: Unlike the other variables set by WORDS, _WC does include any Byte Order Mark at the start of a file. A BOM will be treated as a non-blank character, and therefore count as a ‘word’ unto itself if the following character is whitespace. This, to my mind, is stupid behavior; a leading BOM should either be ignored altogether, or else treated as whitespace. I count it this way only for compatibility with certain ports of the Unix wc.
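The effect of the BOM on a wc-style count can be reproduced in Python, whose str.split() does not treat U+FEFF as whitespace. This sketch only illustrates the counting behavior described in the note; it is not the plugin's code, and the name wc_like is hypothetical.

```python
def wc_like(text):
    """Count contiguous runs of non-whitespace characters."""
    return len(text.split())

BOM = "\ufeff"
print(wc_like(BOM + " hello world"))   # BOM stands alone as a 'word': 3
print(wc_like(BOM + "hello world"))    # BOM is glued to 'hello': 2
print(wc_like("hello world"))          # no BOM: 2
```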



_WCALL — Returns the number of contiguous series of non-blank characters in all files processed by the last call to WORDS.

Syntax:
%_WCALL

If the WORDS command has not been called, this variable returns the value N/A.



_WORDFILES — Returns the number of files processed by the last call to WORDS.

Syntax:
%_WORDFILES



_WORDS — Returns the total number of words in the last file processed by WORDS.

Syntax:
%_WORDS

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_WORDSALL — Returns the total number of words in all files processed by the last call to WORDS.

Syntax:
%_WORDSALL

If the WORDS command has not been called, this variable returns the value N/A.



Reference Info:


Ranges                       supported in many commands.
Code Pages Supported         to interpret non-Unicode text.
Character Escapes
UQuotes Control Variables    modify the translation of ASCII quotes to Unicode.
Highlight Variable           to choose your colors.
Startup Message              and how to disable it.
Acknowledgments
Changes                      slow march of progress, or just another bug hunt?
Status and Licensing

Ranges:

This plugin supports the following range syntax:


Size range:  /[Ssmallest,largest]

You may omit either smallest or largest. You may qualify either with a trailing letter: lowercase k, m, g, etc. to multiply by one thousand, one million, one billion, and so on; or uppercase K, M, G, etc. to multiply by 2^10, 2^20, 2^30, and so on. If largest begins with a + sign, it is an increment over smallest. Use /![Ssmallest,largest] to invert the test and return only files not in the given size range.
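The suffix and increment rules can be worked through in a short Python sketch. It is an illustration of the arithmetic only, not the plugin's parser: it handles just the k, m, g and K, M, G suffixes, does not handle omitted values, and the names parse_size and size_range are hypothetical.

```python
DECIMAL = {"k": 10**3, "m": 10**6, "g": 10**9}
BINARY = {"K": 2**10, "M": 2**20, "G": 2**30}

def parse_size(text):
    """Convert a size such as '10k' or '2K' to a number of bytes."""
    suffix = text[-1:]
    if suffix in DECIMAL:
        return int(text[:-1]) * DECIMAL[suffix]
    if suffix in BINARY:
        return int(text[:-1]) * BINARY[suffix]
    return int(text)

def size_range(smallest, largest):
    """Resolve a size range; a leading + on largest is an increment."""
    low = parse_size(smallest)
    if largest.startswith("+"):
        return low, low + parse_size(largest[1:])
    return low, parse_size(largest)

print(size_range("10k", "+2K"))   # (10000, 12048)
print(size_range("1K", "1M"))     # (1024, 1048576)
```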

Date range:  /[D[acw]:earliest,latest]

You may omit either earliest or latest; either defaults to the current date. The optional [acw] argument selects the date stamp to check. (If you want to check more than one date stamp, you must supply more than one date range option.) The colon after the [acw] is optional.

Dates may be given in the local date format, or in yyyy-mm-dd format (with a four-digit year). You may also specify a date as an offset preceded with a + or - sign; the offset is in days relative to today’s date (for earliest) or relative to earliest (in the case of latest). If earliest turns out to be later than latest then the two are exchanged.
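The offset and exchange rules above can be sketched in Python. This is an illustration only, not the plugin's parser: it accepts just yyyy-mm-dd dates and signed day offsets, ignores local date formats, omitted values, and @time suffixes, and the name resolve_dates is hypothetical.

```python
from datetime import date, timedelta

def resolve_dates(earliest, latest, today):
    """Resolve a date range given as ISO dates or signed day offsets."""
    def one(text, base):
        if text[0] in "+-":
            return base + timedelta(days=int(text))
        return date.fromisoformat(text)

    start = one(earliest, today)   # offset is relative to today
    end = one(latest, start)       # offset is relative to earliest
    if start > end:
        start, end = end, start    # exchange if given in the wrong order
    return start, end

today = date(2024, 11, 5)
print(resolve_dates("-7", "+3", today))
print(resolve_dates("2024-03-01", "2024-01-01", today))
```

With today set to 2024-11-05, the first call resolves to 2024-10-29 through 2024-11-01; the second call returns the two dates exchanged.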

You may also give a specific time on either date, preceded by an @ sign. The time may be in either 24-hour format, or 12-hour format with a trailing A or P.

Use /![D[acw]:earliest,latest] to invert the test and return only files not in the given date range.

Time range:  /[T[acw]:earliest,latest]

You may omit either earliest or latest. The optional [acw] argument selects the time stamp to check. (If you want to check more than one time stamp, you must supply more than one time range option.) The colon after the [acw] is optional. Times may be in either 24-hour format, or 12-hour format with a trailing A or P.

Use /![T[acw]:earliest,latest] to invert the test and return only files not in the given time range.

Exclusion range:  /[!wildspec]

Filenames matching the wildspec will be excluded. You can supply more than one wildspec by separating them with (unquoted) spaces.

Owner range:  /[Owildspec]

Files whose owners (in domain\user format) do not match the wildspec will be skipped. Use /![Owildspec] to invert the test and return only files which do not match the owner wildspec.

Description range:  /Iwildspec or (alternate syntax) /[Iwildspec]

If a file’s description does not match the wildspec, it will be skipped. Use /!Iwildspec to invert the test, returning only files which do not match the description wildspec.

Day-of-the-week range:  /[W[acw]:days]

You may specify multiple days separated by commas, e.g. /[W:MON,WED,FRI]. You can also give a range, for example /[W:TUE-FRI]. WEEKENDS is accepted as a synonym for SAT,SUN; WEEKDAYS is a synonym for MON-FRI. The colon in this syntax is required.
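The day list syntax can be expanded as in the following Python sketch. It is an illustration, not the plugin's parser. It assumes three-letter day names, case-insensitive matching, and that a range such as FRI-MON wraps around the end of the week; none of these details is stated above. The name expand_days is hypothetical.

```python
DAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]

def expand_days(spec):
    """Expand a day specification such as 'TUE-FRI' into a set of days."""
    spec = spec.upper()
    spec = spec.replace("WEEKENDS", "SAT,SUN").replace("WEEKDAYS", "MON-FRI")
    result = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = (DAYS.index(name) for name in part.split("-"))
            i = first
            while True:
                result.add(DAYS[i])
                if i == last:
                    break
                i = (i + 1) % 7        # wrap-around is an assumption
        else:
            result.add(DAYS[DAYS.index(part)])   # raises ValueError if unknown
    return result

print(sorted(expand_days("TUE-FRI"), key=DAYS.index))
print(sorted(expand_days("WEEKENDS"), key=DAYS.index))
```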

You may supply multiple ranges. A file must match all given ranges or it will be skipped.

Code Pages Supported:

Many of the commands in this plugin offer a /CP:n option to specify a code page. The value determines how non-ASCII characters in non-Unicode files are interpreted. This option does not affect Unicode files or ASCII characters. The following code pages are supported:


number      name                      number   name
1252        Latin I                   775      Baltic (OEM)
1250        Central Europe            850      Multilingual Latin I (OEM)
1251        Cyrillic                  852      Latin II
1253        Greek                     855      Cyrillic (OEM)
1254        Turkish                   857      Turkish (OEM)
1255        Hebrew                    858      Latin I with Euro sign (OEM)
1256        Arabic                    862      Hebrew (OEM)
1257        Baltic                    866      Russian (OEM)
1258        Vietnam                   874      Thai
437         United States (OEM)       10000    Mac OS Roman
720         Arabic (OEM)              20866    KOI8-R
737         Greek (OEM)               21866    KOI8-U
A or ANSI   the current Windows code page
O or OEM    the current OEM code page

The default is the current Windows code page.
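To see why the code page matters, consider how the same bytes decode under two of the code pages listed above. The Python sketch below uses Python's own cp1252 and cp437 codecs purely as an illustration; it says nothing about how the plugin performs the conversion.

```python
raw = b"caf\xe9"                 # four bytes from a non-Unicode file

as_windows = raw.decode("cp1252")   # Windows Latin I
as_oem = raw.decode("cp437")        # United States (OEM)

print(as_windows)                # the last character is e with an acute accent
print(as_windows == as_oem)      # False: byte 0xE9 maps to different characters
print(as_windows[:3] == as_oem[:3])   # True: ASCII bytes are unaffected
```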

Character Escapes:

These may be used in CHARENCODING with the /X option.

Escape:      Expands to:                            Example:
\b           backspace
\e           ASCII escape (27 decimal)
\k           grave accent
\n           newline
\p           percent sign
\q           double quote
\r           carriage return
\t           ASCII horizontal tab
\uxxxx       Unicode character, up to U+FFFF        \u03a3 → Σ
\Uxxxxxxxx   Unicode character, up to U+10FFFF      \U1f63a → 😺
\nnn         octal value, up to 777                 \101 → A
\xnnnn       hexadecimal value, up to FFFF          \x41 → A
\#nnnnn      decimal value, up to 65535             \#65 → A
\\           backslash
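The escapes in this table can be expanded with a single regular expression, as in the Python sketch below. It is an illustration only, not the plugin's parser. It assumes that numeric escapes consume as many digits as the table allows, and it does not check that a \U value stays within U+10FFFF. The name expand_escapes is hypothetical.

```python
import re

SIMPLE = {"b": "\b", "e": "\x1b", "k": "`", "n": "\n", "p": "%",
          "q": '"', "r": "\r", "t": "\t", "\\": "\\"}

PATTERN = re.compile(
    r"\\(?:u([0-9A-Fa-f]{1,4})|U([0-9A-Fa-f]{1,8})|([0-7]{1,3})"
    r"|x([0-9A-Fa-f]{1,4})|#([0-9]{1,5})|([beknpqrt\\]))")

def expand_escapes(text):
    """Replace the backslash escapes from the table with their characters."""
    def replace(match):
        u, big_u, octal, hexa, dec, simple = match.groups()
        if simple:
            return SIMPLE[simple]
        if octal:
            return chr(int(octal, 8))
        if dec:
            return chr(int(dec))
        return chr(int(u or big_u or hexa, 16))
    return PATTERN.sub(replace, text)

print(expand_escapes(r"\101 \x41 \#65"))   # A A A
print(expand_escapes(r"100\p"))            # 100%
```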

UQuotes Control Variables:

The following environment variables specify a Unicode character used to replace an ASCII character in the @UQUOTES function, or in several commands when /Q is used. The value of the variable may be a single character; a decimal value 32 through 65533; or a hexadecimal value 0x20 through 0xFFFD.

OPENQUOTE:      replaces the ASCII double-quote ( " ) at the start of a quotation; the default value is 0x201C ( “ ).
CLOSEQUOTE:     replaces the ASCII double-quote ( " ) at the end of a quotation; the default is 0x201D ( ” ).
OPENSQUOTE:     replaces the ASCII apostrophe ( ' ) at the start of a quotation; the default is 0x2018 ( ‘ ).
CLOSESQUOTE:    replaces the ASCII apostrophe ( ' ) at the end of a quotation; the default is 0x2019 ( ’ ).
APOSTROPHE:     replaces the ASCII apostrophe ( ' ) within a word; the default is 0x2019 ( ’ ).
'OKINA:         replaces the ASCII apostrophe ( ' ) between two vowels; the default is 0x2018 ( ‘ ).
PRIME:          replaces the ASCII apostrophe ( ' ) after a number; the default is 0x27 ( ' ).
DOUBLEPRIME:    replaces the ASCII double-quote ( " ) after a number; the default is 0x22 ( " ).
EMDASH:         replaces pairs of ASCII hyphens ( - ); the default is 0x2014.

Note that the variable name 'OKINA begins, ironically enough, with an apostrophe. To disable ‘okinas, SET 'OKINA=0X2019  (or the same value as the apostrophe).
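The three accepted value forms (a single character, a decimal value, or a hexadecimal value) do not overlap, because the smallest permitted decimal value, 32, already has two digits. The Python sketch below shows one way to interpret such a value. It is an illustration, not the plugin's code, and the name parse_quote_value is hypothetical.

```python
def parse_quote_value(value):
    """Turn a control-variable value into the replacement character."""
    if len(value) == 1:
        return value                    # a literal character
    if value.lower().startswith("0x"):
        code = int(value, 16)           # hexadecimal, 0x20 through 0xFFFD
    else:
        code = int(value)               # decimal, 32 through 65533
    if not 32 <= code <= 65533:
        raise ValueError("value out of range: " + value)
    return chr(code)

print(parse_quote_value("0xab") == "\u00ab")   # True: left guillemet
print(parse_quote_value("8220") == "\u201c")   # True: same as 0x201C
```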

These environment variables control the interpretation of some old-fashioned ASCII text conventions:

UQUOTES_DOUBLES:    set to 0 to prevent replacing doubled apostrophes with quotes
UQUOTES_GRAVES:     set to 0 to prevent replacing grave accents with open quotes

For example:

rem  Use guillemets for quotations:
set openquote=0xab
set closequote=0xbb
echo %@uquotes["Sacré bleu!" he exclaimed.]

Highlight Variable:

Several of the commands in the plugin feature highlighted output. You can customize this feature by setting an environment variable Highlight:

rem  Disable highlight:
set highlight=none

rem  Set the highlight foreground:
set highlight=bright cyan

rem  Set both foreground and background:
set highlight=bri whi on blu

rem  Numbers are also supported:
set highlight=46

If the Highlight environment variable is not defined, the plugin will check the registry for a value named Highlight of type REG_SZ. The plugin will search, in this order:

•  HKEY_CURRENT_USER\Software\JPPlugins\TextUtils  (affects this plugin only)
•  HKEY_CURRENT_USER\Software\JPPlugins  (affects several of my plugins)

Many commands also have a /D or /NC option to disable highlighting.

Startup Message:

This plugin displays an informational line when it initializes. The message will be suppressed in transient or pipe shells. You can disable it for all shells by defining an environment variable named NOLOADMSG, for example:

set /e /u noloadmsg=1

Acknowledgments:

The original Metaphone algorithm is by Lawrence Philips. The variant implemented in this plugin is my own adaptation (improvement? perversion?). Blame me, not him, for its peculiarities.

Changes:


Version:   Date:        Changes:
0.85.2.3   2024-11-05   Bug fix: PLUGIN_BUFFER_MAX is 32K bytes, not 32K characters.
0.85.2.2   2024-10-02   ParseInt() now supports octal with a leading 0o.
0.85.2     2024-09-03   StringToUnicode() and f_uchar() use PLUGIN_BUFFER_MAX for the buffer size.
                        CHARENCODING and @UCHAR allow octal values prefixed with 0o. CHARENCODING adds /N for character names.
                        Other tweaks and code cleanup.
0.85.1     2024-08-08   UTYPE now supports high-order Unicode characters in /X hex mode.
0.85.0.3   2024-08-07   FileHandler.cpp v1.0.15.0, NewHelp.cpp v1.0.8.14.
0.85.0.2   2024-03-26   Minor tweak to support nested directory aliases.
0.85.0     2024-01-05   Updated to conlist.cpp v1.1 to better support Ctrl-C and Ctrl-Break. Tweaked UTF-16 detection for very small files.
0.84.0     2023-10-17   DEHTML no longer smashes whitespace inside <PRE> blocks.
0.83.0.2   2023-10-16   Tweaked ShowCmdHelp() to report VER_PATCH.
0.83.0.1   2023-10-12   Updated the plugin’s web address.
0.83.0     2023-07-28   Changed DEHTML, @MKENTITIES, and COPYCHARS to use HtmlEntities.cpp. Now they should support all HTML 4 entities. Updated CHARENCODING to the version in UChars, and documented it; CHARENCODING was somehow missing from the doc files. Lots of additional bug fixes, code tweaks, and doc improvements.
0.82.6     2023-07-24   Updated to the current versions of ParseArgs.cpp, NewHelp.cpp, conlist.cpp, FileHandler.cpp, MMFiles.cpp, and codepages.cpp.
0.82.5     2022-06-09   Minor tweak to @STRIPACCENTS: Now Æ æ Œ œ are replaced with AE ae OE oe.
0.82.4     2021-10-20

Status and Licensing:

Consider this beta software. It may well have issues. Try it at your own risk. If you do find a problem, you can report it in the JP Software support forum.

TextUtils is currently licensed only for testing purposes. I may make binaries and source code available under some free license once I consider it ready for use.

Download:

You can download the current version of the plugin from https://charlesdye.net/dl/textutils.zip.