ORA: Text Transforms

The transforms in this category manipulate text values.

Transform	Description
append	Returns the input value with append's parameter concatenated as a suffix
extract	Extracts a substring from the value using a Regular Expression pattern to identify the substring
extractIndex	Extracts a substring from the value using a Regular Expression pattern to identify the substring. The pattern may match the value multiple times, and if so, you may specify an index number to indicate which matching value you want to extract
hrefToUrl	Converts an HREF value, which may be a relative URL, into a full URL, which will be an absolute URL
initials	Returns the first letter of each word in the value followed by a period and a space
length	Returns the length of the value
numberToWords	Converts the field value from digits ("17") to words ("seventeen")
padLeft	Pads the current string with a given string (repeated, if needed) so that the resulting string reaches a given length. The padding is applied from the left (start) of the current string
padRight	Pads the current string with a given string (repeated, if needed) so that the resulting string reaches a given length. The padding is applied to the right (end) of the current string
plural	Returns one of three text values based on the numeric value of the field value
prepend	Returns the input value with prepend's parameter concatenated as a prefix
replace	Replaces text that matches either a Regular Expression pattern or a literal value with new text
reverse	Splits the field value into parts using the separator to divide the parts and returns a text value with the parts in reverse sequence
split	Splits the field value into parts using the separator to divide the parts and returns the part indicated by the index number
splitCount	Splits the field value into parts using the separator to divide the parts and returns the number of parts
splitLines	Splits the field value into lines and returns the line indicated by the index number
splitLinesCount	Splits the field value into lines and returns the number of lines
substring	Extracts a subset of characters from a value
urlParameter	Returns the input value encoded for use in a URL

Transform

Description

Returns the input value with append's parameter concatenated as a suffix

Extracts a substring from the value using a Regular Expression pattern to identify the substring

Extracts a substring from the value using a Regular Expression pattern to identify the substring. The pattern may match the value multiple times, and if so, you may specify an index number to indicate which matching value you want to extract

hrefToUrl

Converts an HREF value, which may be a relative URL, into a full URL, which will be an absolute URL

initials

Returns the first letter of each word in the value followed by a period and a space

length

Returns the length of the value

numberToWords

Converts the field value from digits ("17") to words ("seventeen")

padLeft

Pads the current string with a given string (repeated, if needed) so that the resulting string reaches a given length. The padding is applied from the left (start) of the current string

padRight

Pads the current string with a given string (repeated, if needed) so that the resulting string reaches a given length. The padding is applied to the right (end) of the current string

plural

Returns one of three text values based on the numeric value of the field value

prepend

Returns the input value with prepend's parameter concatenated as a prefix

replace

Replaces text that matches either a Regular Expression pattern or a literal value with new text

reverse

Splits the field value into parts using the separator to divide the parts and returns a text value with the parts in reverse sequence

split

Splits the field value into parts using the separator to divide the parts and returns the part indicated by the index number

splitCount

Splits the field value into parts using the separator to divide the parts and returns the number of parts

splitLines

Splits the field value into lines and returns the line indicated by the index number

splitLinesCount

Splits the field value into lines and returns the number of lines

substring

Extracts a subset of characters from a value

urlParameter

Returns the input value encoded for use in a URL

append:text

Returns the input value with text concatenated as a suffix.

For example, if a field named Place has the value "Boston", the result of [Place:append:, Massachusetts] is "Boston, Massachusetts".

extract:pattern:flags

Extracts the first substring from the value that matches the Regular Expression. The pattern is specified as parameter one, and the flags for the Regular Expression are specified as an optional parameter two. If the flags are not provided, they default to "gi".

extract returns the first match. If you want to extract the second or subsequent match, use extractIndex.

You should include one capturing group in the Regular Expression to indicate which text to extract. A capturing group is specified using parentheses to surround some or all of the pattern. If you do not include a capturing group, ORA will extract all the text that matches the Regular Expression.

For example, if the "Name" field has the value "John 'Jack' Smith", the result of [Name:extract:'([^']+)'] is "Jack".

The initial ' (single quote) matches a single quote.
The ( starts the capturing group. The characters that match the expression within the parentheses will be extracted.
The [^']+ expression matches one or more characters that are not single-quotes.
The ) ends the capture group.
The final ' (single quote) matches a single quote.

There are many resources online for learning about Regular Expressions including RegexOne. There are also many online Regular Expression testers which will help you understand Regular Expressions. I often use regular expressions 101 to test Regular Expressions. On that site, make sure you set the "Flavor" to "ECMAScript (JavaScript)".

extractIndex:pattern:index:flags

Specifying an index number parameter is the main reason to use extractIndex rather than extract.

Extracts a substring from the value using a Regular Expression to identify the substring. The pattern may match the value multiple times, and if so, you may specify an index number to indicate which matching value you want to extract.

The pattern is specified as parameter one, the index is an optional parameter two, and the flags for the Regular Expression are specified as an optional parameter three. If the flags are not provided, they default to "gi". The "g" (global) flag is required and if you do not specify it, ORA will add it.

If index is a positive number, it indicates the index number of the desired match where the first match is 1, the second match is 2, etc.

if index is a negative number, it indicates the index number of the desired match where the last match is -1, the second-to-last is -2, etc.

Examples

If a field named Place has the value "Hessen (Hesse), Deutschland (Germany)", the result of [Place:extractIndex:$([^$]+)\):1] is "Hesse".
If a field named Place has the value "Hessen (Hesse), Deutschland (Germany)", the result of [Place:extractIndex:$([^$]+)\):2] is "Germany".
If a field named Place has the value "Hessen (Hesse), Deutschland (Germany)", the result of [Place:extractIndex:$([^$]+)\):-1] is "Germany".
If a field named Place has the value "Hessen (Hesse), Deutschland (Germany)", the result of [Place:extractIndex:$([^$]+)\):-2] is "Hesse".

The examples above all use the same Regular Expression pattern, $([^)]+)$, explained as follows:

\( matches a left-parenthesis. The left-parenthesis must be escaped with a back-slash because a left-parenthesis is a reserved character in Regular Expressions.
( starts the capture group. The characters that match the expression within the parentheses will be extracted.
[^\)]+ is a character class expression that matches one or more characters that are not a right-parenthesis. The back-slash escapes the special meaning of the right-parenthesis within the character class expression.
) ends the capture group.
\) matches a right-parenthesis. The right-parenthesis must be escaped with a back-slash because a right-parenthesis is a reserved character in Regular Expressions.

The Regular Expression above is useful for extracting text that is enclosed in parentheses.

hrefToUrl

The following information is intended for advanced users.

Converts an HREF value, which may be a relative URL, into a full URL, which will be an absolute URL.

The hrefToUrl transform is useful when manipulating HREF attributes. It is common for HREF attributes to be relative URLs, i.e., a URL that includes the path part only. To convert a relative HREF to an ansolute URL, use the hrefToUrl transform.

For example:

Value	images/banner.jpg
Current Page	https://www.example.com/index.htm
Result	https://www.example.com/images/banner.jpg

Value

images/banner.jpg

Current Page

https://www.example.com/index.htm

Result

https://www.example.com/images/banner.jpg

If an HREF value is already an absolute URL, hrefToUrl will not modify it. This means you can use hrefToUrl even if you do not know whether the HREF is absolute or relative.

initials

Returns the first letter of each word in the value followed by a period and a space.

If one of the words in the value does not begin with a letter, that word is ignored.

For example, if a field GivenName has the value "Sue Ellen", the result of [GivenName:initials] is "S. E.".

The initials transform ignores words that start with punctuation, such as a nickname enclosed in quotes. For example, if a field GivenName has the value "Margaret "Meg" Elizabeth", the result of [GivenName:initials] is "M. E.". ORA ignores the ""Meg"" part because that word does not begin with a letter.

Limitations

JavaScript does not have an "isLetter()" function. ORA uses a trick to determine if a character is a letter or not: it converts the first character of the word to uppercase and to lowercase, and compares them. If they are not the same, the character is deemed to be a letter.

For example, the result of converting "1" to uppercase is "1", and the result of converting "1" to lowercase is also "1". The uppercased and lowercased values are the same, so "1" is not a letter. The same will be true for other digits, punctuation, emojis, etc.

In contrast, the result of converting "A" to uppercase is "A", and the result of converting "A" to lowercase is "a". The uppercased and lowercased values are not the same, so "A" is a letter. The same will be true for many other letters, including A to Z in English, and A to Z and other letters with diacritics (such as "Á") in Western European alphabets. However, many alphabets, such as Hebrew and Japanese, have letters that do not have both uppercase and lowercase forms. For those alphabets, the initials transform will not produce the desired result.

length

Returns the length of the value.

For example, if a field City has the value "Boston", the result of [City:length] is "6".

Typically, you won't insert the length of a field into your genealogy database. Rather, you'll use the length of a field to make a decision about inserting something else. For example, if your genealogy application has a limit on the length of a field, and the data you are extracting may exceed the length limit, you may want to indicate that data was truncated using an ellipsis.

<[?:Notes:length>200][Notes:substring:1:197]...|[Notes]>

If the Notes field is longer than 200 characters, the first 197 characters are inserted into the output followed by "...". Otherwise, the entire field is inserted into the output.

numberToWords

For numeric values less than 999,999, transforms the value from digits to words. For numeric values greater than 999,999, leaves the value as-is. For non-numeric values, returns an empty string.

For example, if a field named Age has the value "7", the result of [Age:numberToWords] is "seven".

padLeft:length:padString

Pads the current string with a given string (repeated, if needed) so that the resulting string reaches a given length. The padding is applied from the left (start) of the current string.

The first parameter is the desired length. It is required. If the value is already equal to or longer than this length, the value is not changed.

The second parameter is the string to add to the left if needed. It is optional and defaults to a single space.

For example, if a field named Enumeration District has the value "12A", the result of [Enumeration District:padLeft:4:0] is "012A".

padRight:length:padString

Pads the current string with a given string (repeated, if needed) so that the resulting string reaches a given length. The padding is applied to the right (end) of the current string.

The first parameter is the desired length. It is required. If the value is already equal to or longer than this length, the value is not changed.

The second parameter is the string to add to the right if needed. It is optional and defaults to a single space.

For example, if a field named Sheet has the value "B", the result of [Sheet:padRight:4:*] is "B***".

plural:one:many:zero

Returns one of three text values based on the numeric value of the given variable. This transform expects up to three options to specify the text values. The first is for a value of 1 or -1, the second is for a value that is not -1, zero, or 1, and the last is for zero. The general form is:

[field-name:plural:one:many:zero]

For example, if a field named Children has the value "2", the result of [Children:plural:one:more than one:zero] is "more than one".

To add an "s" to a word, you could use [Children:plural::s:s].

Plural returns a true result if the input value is a number and there is at least one parameter. If one or more of the three text values are empty, the transform will not append any text to the result, but the result will still be true. This makes it more convenient to use the transform inside conditional expressions.

prepend:text

Returns the input value with text concatenated as a prefix.

For example, if a field named Place has the value "Boston", the result of [Place:prepend:City=] is "City=Boston".

replace:pattern:newText:flags
replace:oldText:newText:options

Replace has two modes of operation, Regular Expression mode and Literal mode. The default is Regular Expression mode. If you include the option character l (ell), replace operates in Literal mode.

In Regular Expression mode, replace uses a Regular Expression to identify a substring in the current value and replaces the substring with new text. The pattern is specified as parameter one, newText is optional parameter two, and the flags for the Regular Expression are specified as an optional parameter three. If newText is not provided, it defaults to the empty string. If the flags are not provided, they default to "gi".
In Literal mode, replace changes oldText to newText in the current value. The oldText is specified as parameter one, newText is parameter two, and the replacement options are specified as parameter three. You may remove oldText by specifying an empty string for newText. See the Literal Mode Options section below for details.

Regular Expression Mode Examples

Change underscore to comma and space:
```
[Occupation:replace:_:, ]
```
If a field named Occupation has the value "farmer_blacksmith", the result is "farmer, blacksmith"".
Remove leading zero(es) from a number:
```
[Children:replace:^0+]
```
If a field named Children has the value "01", the result is "1".

You may use the following special text sequences in the new text:

Pattern Inserts

$$ Inserts a "$".

$n Where n is a positive integer less than 100, inserts the nth captured group.

Pattern	Inserts
`$$`	Inserts a `"$"`.
`$n`	Where `n` is a positive integer less than 100, inserts the `n`th captured group.

Literal Mode Examples

Change the "quoted exactly" notation from parentheses to brackets:
```
[Occupation:replace:(sic):[sic]:l]
```
If a field named Occupation has the value "farmr(sic)", the result is "farmr[sic]".
Change ", USA" to the empty string:
```
[Occupation:replace:, USA::l]
```
If a field named Location has the value "Augusta, Maine, USA", the result is "Augusta, Maine"".

Literal Mode Options

These are the Literal mode options.

You must always specify the l option to enable Literal mode.
If you specify "l" only, ORA treats that the same as "lgi" and performs a global, case-insensitive replace operation.
If you specify "lg", ORA will replace all occurrences of oldText that match the uppercase and lowercase characters in oldText exactly. The case-insensitive option is disabled.
If you specify "li", ORA will replace only the first occurrence of the oldText using a case-insensitive replace operation. The global option is disabled.
If you specify "l " (an ell and a space), ORA performs a non-global, non-case-sensitive replace operation. Specifying "l " is how you disable both the global and case-sensitive options.

reverse:separator

Splits the field value into parts using the separator to divide the parts and returns a text value with the parts in reverse sequence.

Examples

Return the first part from a field value where the parts are separated by a comma:

[Place:reverse:, ]

If a field named Place has the value "Massachusetts, Suffolk County, Boston", the result is "Boston, Suffolk County, Massachusetts".

split:separator:index

Splits the field value into parts using the separator to divide the parts and returns the part indicated by the index number.

If index is a positive number, it indicates the index number of the desired part where the first part is 1, the second part is 2, etc.

if index is a negative number, it indicates the index number of the desired part where the last part is -1, the second-to-last is -2, etc.

Examples

Return the first part from a field value where the parts are separated by a comma:
```
[Place:split:,:1]
```
If a field named Place has the value "Boston, Suffolk Country, Massachusetts", the result is "Boston".
Return the last part from a field value where the parts are separated by a comma:
```
[Place:split:,:-1]
```
If a field named Place has the value "Boston, Suffolk Country, Massachusetts", the result is "Massachusetts".

splitCount:separator

Splits the field value into parts using the separator to divide the parts and returns the number of parts.

Example

Return the number of parts in a field value where the parts are separated by a comma:

[Place:splitCount:,]

If a field named Place has the value "Boston, Suffolk Country, Massachusetts", the result is "3".

splitLines:index

Splits the field value into lines and returns the line indicated by the index number.

If index is a positive number, it indicates the index number of the desired line where the first line is 1, the second line is 2, etc.

if index is a negative number, it indicates the index number of the desired line where the last line is -1, the second-to-last is -2, etc.

This transform is useful when a field value contains lines of text. It operates the same as split, but provides a separator that is not possible to specify with split.

Line values are trimmed, i.e., leading and trailing spaces are removed. Empty lines are not discarded, so if line 3 is empty, [Text:splitLines:3] will return an empty string.

splitLinesCount

Splits the field value into lines and returns the number of lines.

This transform is useful when a field value contains lines of text. It operates the same as splitCount, but provides a separator that is not possible to specify with splitCount.

Empty lines are included in the count.

substring:indexStart:indexEnd

Extracts a subset of characters from a value.

There are several acceptable forms of the parameters. The first parameter, indexStart, is required. The second parameter, indexEnd, is optional. If indexEnd is not specified, it defaults to the end of the value.

Parameter Variations

If indexStart is a positive number, it indicates the one-origin index of the first character of the desired subset.
For example, if a field named Test has the value "123456789", the result of [Test:substring:1:1] is "1".
If indexStart is a negative number, it indicates the one-origin index of the first character of the desired subset relative to the end of the value.
For example, if a field named Test has the value "123456789", the result of [Test:substring:-1] is "9".
If indexEnd is a positive number, it indicates the one-origin index of the last character of the desired subset.
For example, if a field named Test has the value "123456789", the result of [Test:substring:2:3] is "23".
If indexEnd is a negative number, it indicates the one-origin index of the last character of the desired subset relative to the end of the value.
For example, if a field named Test has the value "123456789", the result of [Test:substring:-3, -2] is "78".
If indexEnd starts with a plus sign (+), the number that follows indicates the length of the desired subset starting at indexStart.
For example, if a field named Test has the value "123456789", the result of [Test:substring:3:+2] is "34".

urlParameter

Returns the input value encoded for use in a URL. Use urlParameter to ensure that characters in a field value are encoded properly for use in a URL. Do not do encoding with the replace transform; that is tedious and error-prone.

For example, if a field named Place has the value "Boston, Massachusetts", the result of [Place:urlParameter] is "Boston%2C%20Massachusetts". The comma and space characters in the original value are converted to %2C and %20, respectively, because the comma and space characters are reserved in URLs and must be encoded.

Web servers may have additional rules for the values passed as parameters and so it may be necessary to use other transforms to massage the value before including it in a URL. It's best to massage the value to meet the rules of the web server and then pass the resulting value to the urlParameter transform to ensure no reserved characters are being used.

Regular Expression Flags

The transforms that accept a Regular Expression as a parameter include an optional parameter to specify flags for the Regular Expression processor.

Flag Description

g Global search; if the pattern matches the input value more than once, all matches will be processed by transforms that support multiple matches.

i Case-insensitive search; a lower-case letter in the value matches an upper-case letter in the pattern, and vice-versa.

Flag	Description
`g`	Global search; if the pattern matches the input value more than once, all matches will be processed by transforms that support multiple matches.
`i`	Case-insensitive search; a lower-case letter in the value matches an upper-case letter in the pattern, and vice-versa.

The default is "gi". If you omit the parameter, ORA will pass "gi" to the Regular Expression processor.

If you want to disable the default "gi" flags, specify one or more spaces as the parameter value.

Text Transforms

append:text

extract:pattern:flags

extractIndex:pattern:index:flags

Examples

hrefToUrl

initials

Limitations

length

numberToWords

padLeft:length:padString

padRight:length:padString

plural:one:many:zero

prepend:text

replace:pattern:newText:flags replace:oldText:newText:options

Regular Expression Mode Examples

Literal Mode Examples

Literal Mode Options

reverse:separator

Examples

split:separator:index

Examples

splitCount:separator

Example

splitLines:index

splitLinesCount

substring:indexStart:indexEnd

Parameter Variations

urlParameter

Regular Expression Flags

replace:pattern:newText:flags
replace:oldText:newText:options