VOICE Home Page: http://www.os2voice.org |
January 2004
By Thomas Klein © January 2004
Welcome back to our series on programming with REXX and DrDialog. I had to take a break to catch up on other things that had piled up. Sorry for making you wait, but here we are again at last. Since the last article dealt with loops, there is one addendum I would like to make on that subject:
Make sure to never mess with the loop (or counter) variable manually!
While this is sometimes done in other languages to force an immediate exit from the loop, such "tricks" should be avoided because they can lead to unpredictable behavior in some circumstances. Use the LEAVE statement provided by REXX instead (EXIT would end the whole program, not just the loop), or think about using a different structure for your loops.
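A loop can be ended early with LEAVE without ever assigning to its counter. The following is only a minimal sketch; the variable names and the search condition are made up for illustration:

```rexx
/* leave sample (illustrative) */
found = 0
do i = 1 to 10
  if i * i > 50 then do
    found = i      /* remember the hit ...                     */
    leave          /* ... and quit the loop without touching i */
  end
end
say "First number whose square exceeds 50:" found
```

The counter i is only ever read inside the loop; the decision to stop is expressed by LEAVE alone.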
Today we'll talk about REXX's wealth of functions for working with strings. We won't cover the subject completely, because some of the functions are so specific that you might hardly ever need them. As you go on writing your own stuff, you'll find that some of the functions are part of almost everything you write while others never show up. It largely depends on how you approach a specific problem. In order to provide a structured overview, I came up with grouping them by functions for...
At the end of the article, you might wonder what happened to the PARSE keyword/function. Well, PARSE is worth an article of its own, I guess. It is one of the most powerful parts of REXX - both in terms of functionality and complexity. We'll have one article dealing with the basic use of PARSE at a later time. There is of course much more to PARSE than what we will be dealing with in our series, but as this series is intended to address beginners in REXX as well, I don't think it would be such a great idea to confuse you by going into details. Sure, knowing all of PARSE's behavior might provide you with powerful means to solve your programming issues, but most of the time you'll only have to deal with a basic subset of it, and this is what we'll be dealing with as well.
When dealing with strings, it's very useful to know something about them before messing around.
LENGTH will tell you the number of bytes (or characters) that a string contains. Note that this also includes leading and trailing blanks:
/* length sample */
text = " I'm 97 years old. "
say length(text)
If run, this script would print 19: the 17 characters of the sentence plus one leading and one trailing blank.
VERIFY is used to check whether a string contains specific characters or not. To accomplish this, you need to specify the string to be checked, a second string which holds the "comparison characters" and additional options. For example, you could check if a string is a valid phone number - that's to say it shall only contain the digits 0 through 9, blanks, dash and plus sign (for international dial prefix). The comparison string thus would look like
"0123456789 -+"
Now VERIFY can be used to check either whether the string contains ONLY these characters or whether it contains NONE of them. Actually, VERIFY returns the position of the first character in the test string which does or doesn't match any of the characters in the comparison string. A little confusing at first, hm? Here we go:
/* VERIFY sample */
matchstring = "0123456789 -+"
PARSE PULL phone
if VERIFY(phone, matchstring, "NOMATCH") = 0 then
  say "The phone number is okay."
else
  say "Phone number contains invalid characters."
In the above example, if one entered +44-123 456 / 789, VERIFY would actually return 13, because character number 13 of the test string ("/") is not part of the comparison string. Thus, VERIFY used with the NOMATCH parameter tells you the position of the first character that does NOT MATCH any of the comparison string characters. If 0 (zero) is returned, there is no character that doesn't match, so the number is "okay", we might say.
If "MATCH" is used instead, VERIFY will tell you the position of the first character in the test string which MATCHES one of those in the comparison string. Which one to use depends on what's easier to code, but most of the time you might prefer the NOMATCH parameter because it makes the program logic easier to read and understand.
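To see both options side by side, here is a small sketch. The test string "Room 42b" and the variable names are invented for this illustration:

```rexx
/* VERIFY with MATCH and NOMATCH (illustrative) */
numerals = "0123456789"
label    = "Room 42b"
firstdigit = verify(label, numerals, "MATCH")    /* position of the "4" */
firstother = verify(label, numerals, "NOMATCH")  /* position of the "R" */
say firstdigit firstother
```

With MATCH the result is 6 (the first digit is the sixth character), with NOMATCH it is 1 (the very first character is not a digit).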
Some notes about VERIFY: The full syntax is:
result = VERIFY ( <test string>, <comparison> [, "MATCH" | "NOMATCH" ] [, START] )
Only the first letter of the "MATCH" or "NOMATCH" parameter is required and it can be either upper or lower case. And there's an additional parameter, START, which tells VERIFY the position in the test string at which to begin the comparison. By default, comparison starts with the first character of the test string, but depending on how your test string is constructed, you might want to skip a certain number of leading characters. If your program uses a specific concatenation of strings for your address book, for example, this might result in something like "Doe,John/555-6780". In order to test whether the number is valid, you should tell VERIFY to start at position 10 by coding
if VERIFY(phone, matchstring, "NOMATCH",10) = 0 then
or by using the abbreviated version
if VERIFY(phone, matchstring, "N", 10) = 0 then
As "Nomatch" is the default for the comparison type, you could even just omit it. In this case, if you want to supply the optional parameter of START, you'll still have to use the additional comma for the parser to understand that you actually omitted the comparison type parameter:
if VERIFY(phone, matchstring, ,10) = 0 then
In case you just want to check right from the start using "Nomatch", you could omit the whole rest and type
if VERIFY(phone, matchstring) = 0 then
For the example of John Doe you might wonder how to tell the position for START if the name changes. Good point. We'll use another function that'll be discussed in a few moments. But for now, let's have a look at the last informational string function.
WORDS is a very useful function. The whole concept of "words" in strings feels like heaven if you're used to programming in BASIC, for example. A WORD is a part of a string that is delimited by blanks or by the beginning or end of the entire string. Among the WORD-functions, WORDS is quite simple: It returns the number of words found in a string:
/* words example */
text = "This is a words() sample. "
say words(text)
WORDS would recognize the following substrings:
This
is
a
words()
sample.
Thus, WORDS in the above example would return a value of 5.
To explain what WORDS are about, let me put it this way: If YOU look at a string, there are words that you recognize, right? The WORDS() function works in almost exactly the same way. As long as there is at least one space between two strings, they will be recognized as two words. It doesn't matter if there are - let's say - twenty spaces between them. It's still two words. One exception is that you might not regard a single full stop as a word, but WORDS() does if the dot is separated from the rest by spaces, like in
"There were 57 channels and nothing on . "
(note that separated "." at the end). WORDS() would recognize 8 words.
Among string functions, these are the ones used most - at least for me. Let's
start with a very basic one:
POS is used to find the starting position of a string within another string. IBM uses the "needle and haystack" method to explain the syntax - that's quite a good way to memorize the syntax scheme:
result = POS( <needle>, <haystack> [, START])
POS searches the <haystack> string for the first occurrence of <needle>. It returns either the starting position (the character number, starting with 1 for the first character) or ZERO if the <needle> wasn't found in the <haystack>. Optionally, you can tell POS not to start its search from the first character of <haystack> but from a different position. This is useful for identifying special substrings - although the WORD-functions described later do a much better job here.
Let's say you have a contact data string named 'record' which contains
"firstname=Peter, lastname=Jones, phone=555-12345"
and you want to retrieve the phone number. Assuming that 'phone number' is always the last entry of the contact data string, you would go like this:
phonestart = POS("phone=", record)
If phonestart contains something other than zero, the string was found. Next, you skip 6 characters (the length of 'phone=') and you know the starting position of the actual number. Then you would determine the length of the number from the entire length of 'record' in order to retrieve the number from the string. But this requires an additional function (SUBSTR) described later.
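Put together, the steps could look like the sketch below. It borrows SUBSTR without a length operand, which returns everything from the given position to the end of the string; the variable names key and phone are my own choice for this illustration:

```rexx
/* pos sample (illustrative) */
record = "firstname=Peter, lastname=Jones, phone=555-12345"
key = "phone="
phonestart = pos(key, record)
if phonestart \= 0 then
  phone = substr(record, phonestart + length(key))  /* rest of the string */
else
  phone = ""                                        /* key not present    */
say phone
```

This only works as long as the phone number really is the last entry of the record, as assumed above.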
Personally, I use this function most of the time to check whether a string actually
is contained within another string or not - regardless of where it actually
is like in:
IF POS("/?", parameterstring) \= 0 then call DisplayHelp
...which means "if 'parameterstring' contains a '/?' then call a certain function (used to display the command syntax)"
LASTPOS does quite the same, except that it searches the <haystack>
backwards. It uses the same options (START) and the same return value. LASTPOS
is the convenient way to make sure you find the LAST occurrence of <needle>
within the haystack. Of course, you could achieve the same by using a loop of POS
calls that subsequently START by the last found position, but hey: Why worry?
Personally I use LASTPOS mostly when dealing with file names that include
drive and path information (so-called "fully qualified file names"). Once
I know the position of the last "backslash" character, I know that everything
else "behind" it must be the actual file name and - vice versa - the preceding
part is drive and path. Yes, I could use the FILESPEC() function as well,
but depending on the program needs, sometimes you might need to refer to such data...
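A minimal sketch of that file name trick might look as follows. The file name is hypothetical, and LEFT and SUBSTR are both described further down:

```rexx
/* lastpos sample (illustrative) */
fullname = "C:\OS2\APPS\readme.txt"   /* hypothetical file name           */
slash = lastpos("\", fullname)        /* position of the last backslash   */
path = left(fullname, slash)          /* drive and path, incl. backslash  */
file = substr(fullname, slash + 1)    /* everything behind it             */
say path
say file
```

If the name contains no backslash at all, LASTPOS returns 0, so path ends up empty and file holds the whole string.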
The WORD-functions (word, wordpos, wordindex, wordlength and subword) are extremely useful when dealing with parts of strings that are separated by one or more blanks. If you ever tried to identify such parts "by hand" like in vintage BASIC dialects or other programming languages that lack such functions, you might agree that REXX feels like "programmer's heaven" ;)
As an example for the following set of functions, let's assume that you have a string named "input" containing an unknown number of parts (or "words") separated by an unknown number of blanks... for example
"Mary  has    5  little lambs."
(that's two blanks after "Mary", four after "has", two after "5" and one after "little").
WORDS (as already discussed above) will tell you the number of "parts" (or "words") that are contained in the string.
SAY WORDS(input)
would display 5
WORD is used to retrieve a single word from a string, "cleaning" it by removing both leading and trailing blanks. To do this, you must tell WORD which word to retrieve by specifying a "word number" (1 for the first, 2 for the second and so on...). Thus
SAY WORD(input, 2)
would display has.
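WORDS and WORD combine nicely into a loop that walks through a string word by word. This is just a sketch; the counter name n and the way the output is formatted are arbitrary:

```rexx
/* word loop sample (illustrative) */
input = "Mary  has    5  little lambs."
do n = 1 to words(input)
  say n":" word(input, n)   /* e.g. "2: has" */
end
```

However many blanks separate the words, each pass of the loop gets one clean word.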
WORDPOS works just like POS described above - except,
that it doesn't deal with character positions but words: It searches <haystack>
for the first occurrence of <needle> and returns the number of the word that
matches <needle>. Just like POS, an optional parameter can be used
to make WORDPOS start from a "later" position than the first
word. Again, in WORDPOS this refers to a word number.
The syntax is
result = WORDPOS(<needle>, <haystack> [ ,START ] )
Note that <needle> and <haystack> must match exactly for WORDPOS to function correctly - that means, the case of characters must match as well.
SAY WORDPOS("HAS", input)
would give you 0
because "HAS" is not equal to "has", while
SAY WORDPOS("has", input)
would result in 2
Another fact worth mentioning is that you can use more than one "word" for the <needle>. In this case, WORDPOS treats the <needle> contents the same way as all WORD-functions treat the <haystack>: The contents are internally parsed into words. Thus
SAY WORDPOS("has 5", input)
will display 2
as well, although 'input' contains "Mary  has    5  little lambs." (with 4 spaces between 'has' and '5') while the <needle> uses only 1 space. Because both needle and haystack are internally parsed into separate words, the match applies...
WORDINDEX is used to get the starting position of a certain word within the entire string - that's to say including all leading characters, even if they are blanks.
SAY WORDINDEX(input, 2)
would display 7
The SUBSTR function returns a part of a string, specified by starting character number and length.
SAY SUBSTR(input, 3, 17)
for example will return the part of 'input' that starts with the 3rd character and is 17 characters long - which results in
ry  has    5  lit
being displayed. In case you're familiar with BASIC's "MID$" function, note that SUBSTR cannot be used to set/change parts of a string, but only to retrieve them. Optionally, SUBSTR can be told to fill up "non-existent parts" of the substring to retrieve with a specified character. "Non-existent" in this case refers to a substring that extends beyond the end of the actual string. Example? If your program retrieves characters number 3, 4 and 5 of a string and you accidentally pass it a string of only 3 characters, you won't get an error message. Instead, you will receive character number 3 along with two spaces - because by default (if no explicit padding character was specified) blanks are used. If you use the optional "padding" character, you'll get character number 3 and two padding characters returned:
SAY SUBSTR(input, 25, 10)
would display "ambs.     "
(without the quotes - they're only used by me to show the five trailing blanks)
SAY SUBSTR(input, 25, 10, "-")
would display "ambs.-----"
A handy feature of SUBSTR is that if you don't specify a length operand,
it'll return you the entire rest of the string starting from the specified position:
SAY SUBSTR(input, 17)
would give you little lambs.
The two string functions that I use in almost every program are LEFT and RIGHT. They're used to retrieve a substring of a given length from another string, counted either from the left or from the right boundary of the string - according to which function is called:
SAY LEFT(input, 7)
will display "Mary  h" whereas
SAY RIGHT(input, 7)
will display " lambs." - both (again) without the quotes of course.
If the requested length exceeds the length of the string, the result is padded with blanks or, if you specify one, with a padding character:
SAY LEFT("abcdefg",10,"-")
This would display abcdefg---
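Because LEFT pads on the right and RIGHT pads on the left, the pair is handy for lining up columns. A minimal sketch with made-up values and column widths:

```rexx
/* left/right formatting sample (illustrative) */
name  = "joey"
count = 7
line = left(name, 10, ".") || right(count, 3, "0")
say line     /* joey......007 */
```

LEFT fills the name up to 10 characters with dots, while RIGHT puts leading zeros in front of the number.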
SUBWORD acts in a similar way to SUBSTR. Besides the fact that it deals with words instead of characters, there are some more differences though: There is no padding for "exceeded" parts like in SUBSTR. Remember that input contains 5 "words". If we try to retrieve words 4, 5 and 6 from input by
SAY SUBWORD(input, 4, 3)
it would simply give us "little lambs."
SAY SUBWORD(input, 1, 3)
thus will display "Mary  has    5" (the blanks between the words are returned as they are; only leading and trailing blanks are dropped).
WORDLENGTH finally tells you how many characters a word in a string is made up of:
SAY WORDLENGTH(input, 3)
would display 1
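One remark before we look at the last group: Strictly speaking, none of these functions changes the string passed to it. They all create a new string. To "transform" a string, you assign the result back to the same variable, as in
mystring = left(mystring, 5)
but basically we don't transform a string. But this is not so important right now - let's conclude the article.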
COPIES creates a string by concatenating multiple copies of a specified string:
SAY COPIES("bla", 3)
for example would display blablabla
SAY COPIES("-", 18)
will give you ------------------
XRANGE is useful e.g. for preparing character translations. As you might know, each character has an "index number" within the character table. We call that the "ASCII table". XRANGE makes use of these numbers and creates a string that consists of a consecutive run of characters (according to the table sequence) from a start character through an end character:
myalphabet = XRANGE("a", "z")
SAY myalphabet
will display abcdefghijklmnopqrstuvwxyz
SAY XRANGE("00"x, "FF"x)
will display the entire ASCII table contents (as far as the entries are printable characters...).
We already know STRIP from a previous example: It removes leading and/or trailing characters from a string. Or, like I said above, it rather creates a new string with those characters removed. By default, it removes spaces but it can be used for other characters as well. Optionally, you can also specify what to remove (leading, trailing or both). The default is both.
SAY STRIP(" Mary. ")
will return (display) Mary.
SAY STRIP("0012.850", "L", "0")
will give you 12.850 while
SAY STRIP("0012.850", , "0")
will display 12.85
SAY STRIP("0012.850", "0")
would result in an error, because "0" will be interpreted as the leading/trailing parameter, for which only "L", "T" or "B" are valid.
INSERT appears to be quite complex at first sight. The full syntax diagram is
result = INSERT ( <what> , <into> [, START ] [, LENGTH ] [, PAD ] )
Basically, it inserts a string into another string after a specified character position:
SAY INSERT("123", "abcde", 3)
will display abc123de
If START is omitted, it defaults to 0 and the string is inserted in front of the first character:
SAY INSERT("123", "abcde")
will display 123abcde
If LENGTH is specified, <what> is padded (or truncated) to that length before it is inserted; PAD defaults to a blank:
SAY INSERT("123", "abcde", 3, 5)
will display abc123  de
SAY INSERT(123, "abcde", 3, 5, "#")
will display abc123##de
DELSTR removes a substring from another string. It uses much the same parameters as SUBSTR - the starting character position and length:
SAY DELSTR("abcde", 3, 2)
would display abe
If the length is omitted, the rest of the string is removed.
SAY DELSTR("abcde", 3)
would thus display ab
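POS, DELSTR and INSERT can be combined to replace one part of a string with another. The following is a sketch of one possible approach, not the only one; it replaces the first occurrence only, and all variable names are invented for the example:

```rexx
/* replace sample (illustrative) */
text    = "Mary has 5 little lambs."
oldpart = "5"
newpart = "five"
p = pos(oldpart, text)
if p \= 0 then do
  text = delstr(text, p, length(oldpart))   /* cut out the old part          */
  text = insert(newpart, text, p - 1)       /* insert after character p - 1  */
end
say text
```

Note the p - 1: INSERT places the new string after the given character position, while POS reports where the old part started.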
DELWORD is the counterpart to DELSTR when dealing with words. We'll return to Mary and her lambs to show how it works:
SAY DELWORD(input, 2, 2)
will remove two words from input, starting with word number two:
Mary  little lambs.
SAY DELWORD("abc def ghi jkl", 2, 2)
will result in abc jkl
CENTER is a kind of special "flavor" of INSERT. It'll center a string within a new string of a given length and can be told to fill up the space on both sides with a special character. This is great for doing headlines in VIO mode for example:
SAY CENTER("Mary's lambs", 20, "-")
will display ----Mary's lambs----
/* the mary sample */
lambname.0 = 5
lambname.1 = "itchy"
lambname.2 = "sparky"
lambname.3 = "joey"
lambname.4 = "samantha"
lambname.5 = "lou"
say center("Mary's lambs list", 30, "-")
do i = 1 to lambname.0
say center(lambname.i, 30)
end
say copies("-", 30)
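This displays
------Mary's lambs list-------
            itchy
            sparky
             joey
           samantha
             lou
------------------------------
Great, huh?
If the string is longer than the requested length, CENTER cuts off characters at both ends:
SAY CENTER("This is not funny!", 7)
will display is not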
REVERSE is nothing tremendously abstract: It simply returns the string passed to it in reverse order. If you ever wanted to know what your first name looks like "the other way round", give it a try with:
SAY REVERSE("thomas")
You must replace 'thomas' with your first name in order to make the program function correctly. ;) Unless, of course, your first name is Thomas too.
SPACE is great when dealing with words. It can be used to put the same number of characters between all words of a string. Did you ever try to first get the "words" out of a string, then put them together, separated by one space each? This can be done with a single command in REXX:
SAY SPACE(input)
will display Mary has 5 little lambs.
By default, SPACE puts one blank between the words, so we didn't need to write
SAY SPACE(input, 1, " ")
to make it understand that we want the words to be separated by 1 blank each. Why not separate them by two underscores each:
SAY SPACE(input, 2, "_")
would display Mary__has__5__little__lambs.
A number of 0 removes all blanks between the words.
SAY SPACE(input, 0)
will thus display Maryhas5littlelambs.
I must admit that until today, I never messed with OVERLAY. After looking into what it's used for... well, I might mess with it in the future. What OVERLAY actually does is work like INSERT (it even uses the same syntax and parameter list) except that - let me put it that way - it uses "overwrite mode" instead of "insert mode" while typing its text... know what I mean?
SAY OVERLAY("=XYZ=", "01234567890", 4)
will display 012=XYZ=890
The full syntax is
result = OVERLAY ( <what> , <into> [, START ] [, LENGTH ] [, PAD ] )
If you specify a length parameter, <what> will be padded with PAD characters to the specified length. The default for PAD is blanks. Thus,
SAY OVERLAY("=XYZ=", "01234567890", 4, 6)
will display 012=XYZ= 90
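One detail is easy to trip over when switching between the two functions: for INSERT, START is the character after which the new text goes, while for OVERLAY it is the character at which overwriting begins. A small side-by-side sketch with arbitrary sample strings:

```rexx
/* insert vs. overlay (illustrative) */
target = "abcde"
a = insert("XY", target, 2)    /* after character 2:       abXYcde */
b = overlay("XY", target, 2)   /* starting at character 2: aXYde   */
say a
say b
```

INSERT makes the string longer, whereas OVERLAY keeps its length as long as the new text fits inside the old one.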
Finally, TRANSLATE is another cool function for messing with
strings. With TRANSLATE, you set up two tables of characters which are
used to replace the characters of a string. For each character in the string, TRANSLATE
looks it up in the "input" table, then replaces it with the corresponding
character in the "output" table. Both tables are just strings with characters,
where the correspondence is derived from the character position within the string.
That's to say that character #1 in the input table corresponds to character #1 in
the output table and so on...
The syntax scheme looks like this:
output = TRANSLATE( <input> [, <output-table>] [, <input-table>] [, PAD] )
The PAD character will be used to fill up the <output-table> if its size is smaller than that of <input-table>. If no PAD is specified, blanks are used by default. This ensures that there is a match in <output-table> for each entry of <input-table>. You might wonder what happens if all optional parameters are omitted. What happens to <input> then? Quite simply: It will be translated to upper case only:
SAY TRANSLATE("hello")
thus displays HELLO
Here's another example:
outstring = TRANSLATE("hello", copies(" ", 5), "aeiou")
This will set up an <input-table> which contains all vowels and an <output-table> which contains five spaces. Thus, each vowel found will be replaced with a space. This will give us "h ll " as the output string. Next, we'll use the SPACE function with a separation amount of zero:
SAY SPACE(outstring, 0)
which would display hll
Both steps can be combined into a single statement:
SAY SPACE(TRANSLATE("hello", copies(" ", 5), "aeiou"), 0)
This might not be the perfect example. Just imagine that you have a multi-line text entry field and want to count the lines in it. You only need to replace everything with spaces except for "0D"x (which is "CR", "carriage return"; text lines on OS/2 end with CR followed by LF, "0A"x), then remove all spaces with SPACE() and get the length of the remaining string. You're done. This method can even be used for counting lines in a text file, once you have read the entire file into a variable. I didn't believe how d**n fast this is compared to what I had coded so far... until I tried it on my own. Want to give it a try? Make sure your text files are not too large (I tried with up to approx. 52K).
/* line count sample */
fname="c:\temp\testfile.txt" /* change to your needs */
tablein = xrange("00"x, "FF"x) /* entire ascii charset, 256 bytes */
tableout = copies(" ", 13) || "0D"x || copies(" ", 242) /* 13 spaces + CR + 242 spaces = 256 bytes */
fchars=charin(fname, 1, chars(fname)) /* read whole file into variable */
result = TRANSLATE(fchars, tableout, tablein) /* leave CRs only */
say length(space(result, 0)) + 1
Here's another sample. What it does is use the lower-case alphabet as the input-table and the reverse of it as the output-table.
/* text crypt sample */
/* warning: this is a stupid way of encryption and not safe! */
intab = xrange("a", "z")
outab = reverse(intab)
enc = translate("thomas", outab, intab)
say enc
dec = translate(enc, intab, outab)
say dec
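Two XRANGE tables are also all it takes to convert a string to lower case, the opposite of what TRANSLATE does when called with a single argument. This is a minimal sketch that only covers the letters A through Z (no national characters); the variable names are my own:

```rexx
/* lower-case sample (illustrative) */
uppers = xrange("A", "Z")
lowers = xrange("a", "z")
mixed  = "Hello, OS/2 World!"
lowered = translate(mixed, lowers, uppers)
say lowered     /* hello, os/2 world! */
```

Characters that don't appear in the input-table, like the comma, the slash and the digit, pass through unchanged.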
The next part will take us on a short tour about the most interesting (and useful) "helper" functions of REXX. Thanks for your patience; Stay tuned!