Handling Strings and Text

This section describes how to search for strings, manipulate strings, and obtain the text contained in a node.

Searching for Strings

This section describes how to search for strings in a document: finding exact matches, testing whether one string contains another, and extracting substrings.

Finding Identical Strings

In a document, you can search for text that is an exact match with what you specify in your query. For example, consider the following query:

//name[. = "Lu"]

This query finds all name elements that contain only the text Lu. It would return elements like these:

<name>Lu</name>

<name>
  <firstname>Lu</firstname>
</name>

The same query does not return elements like these:

<name>Lu Chen</name>

<name>
  <firstname>Lu</firstname>
  <lastname>Chen</lastname>
</name>

The XPath processor does not return the first name element because the comparison is between "Lu" and "Lu Chen". The query does not return the second name element because the XPath processor concatenates the two strings "Lu" and "Chen" before it makes the evaluation. Consequently, the comparison is between "Lu" and "LuChen". Note that the XPath processor does not insert a space between text nodes that it concatenates.

Searches are case sensitive. A search for "Lu" does not return "lu".
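The comparison rules above can be checked outside an XPath processor. The following is a minimal Python sketch, assuming the standard library xml.etree.ElementTree module and a hypothetical sample document. The string_value helper is illustrative and is not part of any XPath API. It builds the string value of each name element by concatenating its descendant text nodes, then compares that value with "Lu".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document that mirrors the elements discussed above.
DOC = (
    "<people>"
    "<name>Lu</name>"
    "<name><firstname>Lu</firstname></name>"
    "<name>Lu Chen</name>"
    "<name><firstname>Lu</firstname><lastname>Chen</lastname></name>"
    "<name>lu</name>"
    "</people>"
)

def string_value(element):
    # Concatenate all descendant text nodes in document order, with no separator.
    return "".join(element.itertext())

root = ET.fromstring(DOC)
values = [string_value(node) for node in root.iter("name")]

# Exact, case-sensitive comparison, as in the query //name[. = "Lu"].
matches = [value for value in values if value == "Lu"]

print(values)
print(len(matches))
```

Only the first two name elements match. The fourth produces "LuChen" because no space is inserted between concatenated text nodes, and the fifth fails because the comparison is case sensitive.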

Finding Strings That Contain Strings You Specify

To obtain elements that contain a particular string, call the contains() function. The format is

boolean contains(string, string) 
               

            

The contains() function returns true if the first argument string contains the second argument string, and otherwise returns false. For example, the following query returns all books that have a title that contains the string "Trenton":

/bookstore/book[contains(title, "Trenton")]
               

            

When the first argument is a node set, the XPath processor tests only the string value of the node that is first in document order. Any subsequent nodes in the node set are ignored.
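The first-node rule is easy to overlook when an element has several children with the same name. The following Python sketch models it, assuming the standard library xml.etree.ElementTree module and a hypothetical book element with two title children. The contains_first function is an illustrative model of the rule, not an XPath implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical book with two title elements.
BOOK = ET.fromstring(
    "<book>"
    "<title>History of Newark</title>"
    "<title>Trenton Revisited</title>"
    "</book>"
)

def contains_first(nodes, needle):
    # Convert the node set to a string: only the first node in document
    # order contributes, and an empty node set yields the empty string.
    first = "".join(nodes[0].itertext()) if nodes else ""
    return needle in first

titles = BOOK.findall("title")  # child title elements in document order

print(contains_first(titles, "Newark"))
print(contains_first(titles, "Trenton"))
```

The second call reports no match even though the second title contains "Trenton", because only the first title is tested.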

Finding Substrings That Appear Before Strings You Specify

To obtain a substring that appears before a string you specify, call the substring-before() function. The format is

string substring-before(string, string) 
               

            

The substring-before() function returns the substring of the first argument string that precedes the first occurrence of the second argument string in the first argument string. This function returns the empty string if the first argument string does not contain the second argument string. For example, the following call returns "1999":

substring-before("1999/04/01","/")
               

            

Finding Substrings That Appear After Strings You Specify

To obtain a substring that appears after a string you specify, call the substring-after() function. The format is

string substring-after(string, string) 
               

            

The substring-after() function returns the substring of the first argument string that follows the first occurrence of the second argument string in the first argument string. This function returns the empty string if the first argument string does not contain the second argument string. For example, the following call returns "04/01":

substring-after("1999/04/01","/")
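Both substring-before() and substring-after() split on the first occurrence of the second argument. The following Python sketch models that behavior, including the empty-string result when there is no match. The function names substring_before and substring_after are illustrative stand-ins, not calls into an XPath processor.

```python
def substring_before(text, marker):
    # Part of text that precedes the first occurrence of marker.
    index = text.find(marker)
    return "" if index < 0 else text[:index]

def substring_after(text, marker):
    # Part of text that follows the first occurrence of marker.
    index = text.find(marker)
    return "" if index < 0 else text[index + len(marker):]

print(substring_before("1999/04/01", "/"))
print(substring_after("1999/04/01", "/"))
print(repr(substring_before("1999/04/01", "-")))
```

Because the split happens at the first "/", the second "/" stays in the result of substring_after.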
               

            

Finding Substrings by Position

To obtain a substring that is in a particular position within its string, call the substring() function. The format is

string substring(string, number, number?) 
               

            

The substring() function returns the substring of the first argument, starting at the position specified in the second argument, with length specified in the third argument. For example, the following returns "234":

substring("12345", 2, 3)
               

            

If you do not specify the third argument, the substring() function returns the substring starting at the position specified in the second argument and continuing to the end of the string. For example, the following call returns "2345":

substring("12345", 2) 
               

            

More precisely, each character in the string has a numeric position: the first character is at position 1, the second at position 2, and so on. The XPath processor rounds the second and third arguments as if by a call to the round() function. The returned substring contains the characters whose position is greater than or equal to the rounded second argument and, if the third argument is specified, less than the sum of the rounded second and third arguments. The comparisons and the addition follow the standard IEEE 754 rules, so a NaN operand makes every comparison false. For example:

substring("12345", 1.5, 2.6) returns "234"
substring("12345", 0, 3) returns "12"
substring("12345", 0 div 0, 3) returns ""
substring("12345", 1, 0 div 0) returns ""
substring("12345", -42, 1 div 0) returns "12345"
substring("12345", -1 div 0, 1 div 0) returns ""
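The edge cases above follow directly from the rounding and IEEE 754 rules. The following Python sketch models them. The names xpath_round and xpath_substring are hypothetical, and the sketch assumes that Python floats behave as IEEE 754 doubles. It is a model of the rule, not an XPath processor.

```python
import math

def xpath_round(x):
    # NaN and infinities pass through; halves round toward positive infinity.
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x + 0.5))

def xpath_substring(s, start, length=None):
    first = xpath_round(float(start))
    # With no third argument there is no upper bound on the position.
    limit = math.inf if length is None else first + xpath_round(float(length))
    # Positions start at 1. Any comparison against NaN is false.
    return "".join(ch for pos, ch in enumerate(s, start=1) if first <= pos < limit)

nan = float("nan")
inf = math.inf

print(xpath_substring("12345", 1.5, 2.6))
print(xpath_substring("12345", 0, 3))
print(repr(xpath_substring("12345", nan, 3)))
print(xpath_substring("12345", -42, inf))
print(repr(xpath_substring("12345", -inf, inf)))
```

In the last call the upper limit is negative infinity plus positive infinity, which is NaN, so no position satisfies the comparison and the result is the empty string.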
               

            

Manipulating Strings

After you obtain a string, you might want to manipulate it and use the result in the query. This section describes functions that allow you to do this.

Concatenating Strings

To concatenate two or more strings, call the concat() function. The format is

string concat(string, string, {string}...) 
               

            

The concat() function returns the concatenation of its arguments.

Determining the Number of Characters in a String

To obtain the number of characters in a string, call the string-length() function. The format is

number string-length(string?) 
               

            

The string-length() function returns the number of characters in the string. If you omit the argument, it defaults to the string value of the context node.

Normalizing Strings

To strip leading and trailing white space from a string, call the normalize-space() function. The format is

string normalize-space(string?) 
               

            

The normalize-space() function removes leading and trailing white space. White space consists of spaces, tabs, new lines, and returns.

The function also replaces each internal sequence of white space characters with a single space, and returns the resulting string. If you omit the argument, it defaults to the string value of the context node.
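The following Python sketch models this behavior. The function name normalize_space is illustrative. The sketch treats only space, tab, new line, and return as white space, which is why it uses an explicit character class rather than Python's broader str.split() default.

```python
import re

# White space as described above: space, tab, new line, and return.
_WHITE_SPACE = re.compile(r"[ \t\r\n]+")

def normalize_space(text):
    # Split on runs of white space, drop the empty pieces produced by
    # leading and trailing runs, and rejoin with single spaces.
    return " ".join(part for part in _WHITE_SPACE.split(text) if part)

print(repr(normalize_space("  Trenton \t\n Revisited  ")))
```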

Replacing Characters in Strings with Characters You Specify

To replace characters in a string with other characters, call the translate() function. The format is

string translate(string, string, string) 
               

            

The translate() function looks for characters in the first string that are also in the second string. For each such character, the translate() function replaces the character in the first string with a character from the third string. The replacement character is the character in the third string that is in the same position as the character in the second string that corresponds to the character being replaced. For example:

translate("bar", "abc", "ABC")
               

            

Execution of this function returns "BAr". Following is another example:

translate("---aaa---", "abc-", "ABC")

Execution of this function returns "AAA". The hyphen is the fourth character of the second argument string, but the third argument string has no fourth character. When the second argument string is longer than the third argument string in this way, the XPath processor removes every occurrence of the unmatched character.

If a character occurs more than once in the second argument string, the first occurrence determines the replacement character. If the third argument string is longer than the second argument string, the XPath processor ignores the excess characters.
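The replacement, removal, and duplicate rules can be modeled in a few lines. The following Python sketch is illustrative only. The function name xpath_translate is hypothetical, and the sketch does not call an XPath processor.

```python
def xpath_translate(text, from_chars, to_chars):
    mapping = {}
    for index, ch in enumerate(from_chars):
        if ch in mapping:
            continue  # the first occurrence determines the replacement
        # None marks a character with no counterpart, which is removed.
        mapping[ch] = to_chars[index] if index < len(to_chars) else None
    result = []
    for ch in text:
        if ch not in mapping:
            result.append(ch)           # not listed: copied unchanged
        elif mapping[ch] is not None:
            result.append(mapping[ch])  # listed with a counterpart: replaced
    return "".join(result)

print(xpath_translate("bar", "abc", "ABC"))
print(xpath_translate("--aaa--", "abc-", "ABC"))
print(xpath_translate("--aaa--", "abc", "ABC"))
```

In the second call the hyphen is listed in the second argument without a counterpart, so it is removed. In the third call the hyphen is not listed at all, so it is copied unchanged.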

Converting Objects to Strings

In some situations, you might want to force a string comparison. The XPath processor performs a string comparison only when the operands are neither Boolean nor numeric values. If an operand is numeric or Boolean, call the string() function on it to convert it to a string. The format of the string() function is

string string(object?) 
               

            

The string() function can convert any object to a string. If you omit the argument, it defaults to a node set with the context node as the only member. The string value of an element node is the concatenation of the string values of all text node descendants of the element node in document order.

Node Sets

When the string() function converts a node set to a string, it returns the string value of the node in the node set that is first in document order. If the node set is empty, the string() function returns an empty string.

Numbers

The string() function converts numbers to strings as follows:

  • NaN (not a number) becomes "NaN"
  • Positive zero becomes "0"
  • Negative zero becomes "0"
  • Positive infinity becomes "Infinity"
  • Negative infinity becomes "-Infinity"
  • An integer becomes a sequence of digits with no leading zeros, for example, "1234". A negative integer is preceded by a minus sign, for example, "-1234".
  • A noninteger number becomes a sequence of digits with at least one digit before a decimal point and at least one digit after a decimal point, for example, "12.34". A negative noninteger number is preceded by a minus sign, for example, "-12.34". Leading zeros are not allowed unless there is only one to satisfy the requirement of a zero before the decimal point. Beyond the one required digit after the decimal point, there must be as many, but only as many, more digits as are needed to uniquely distinguish the number from all other IEEE 754 numeric values.

Boolean Values

The string() function converts the Boolean false value to the string "false", and the Boolean true value to the string "true".
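The number and Boolean conversion rules above can be approximated in Python. The following sketch is a model, not an XPath implementation. The function name xpath_string is hypothetical, and the sketch assumes that Python's repr() of a float yields the shortest digit sequence that uniquely identifies the value, which it then prints without an exponent.

```python
import math
from decimal import Decimal

def xpath_string(value):
    # Check bool first, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    number = float(value)
    if math.isnan(number):
        return "NaN"
    if math.isinf(number):
        return "Infinity" if number > 0 else "-Infinity"
    if number == int(number):
        # Integers, including negative zero, print without a decimal point.
        return str(int(number))
    # Shortest round-trip digits, formatted in fixed-point notation.
    return format(Decimal(repr(number)), "f")

print(xpath_string(-0.0))
print(xpath_string(1234.0))
print(xpath_string(-12.34))
print(xpath_string(float("-inf")))
print(xpath_string(True))
```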

Finding Strings That Start with a Particular String

To determine whether a string starts with a particular string, call the starts-with() function. The format is

boolean starts-with(string, string)
               

            

This function returns true if the first argument string starts with the second argument string, and otherwise returns false.

Obtaining the Text Contained in a Node

You can use the string() function to obtain the text in a node. The string value of an element node is the concatenation of the string values of all text node descendants of the element node in document order. Use one of the following formats:

string string(pathExpression)
               
pathExpression
               

            

Replace pathExpression with the path of the node or nodes that contain the text you want. This can be a rooted path or a relative path. It need not be a single node. If you do not explicitly specify the string() function, you must specify pathExpression in a context where the XPath processor must treat it as a string, for example:

/bookstore/book[title = "Trenton Revisited"]
               

            

The XPath processor obtains the text contained in each title element and compares it with "Trenton Revisited". The XPath processor returns books with the title Trenton Revisited.

For additional information about the string() function, see Converting Objects to Strings.

 