XPath - Navigating and Querying XML Documents
Introduction
As the W3C standard states, XPath is a language for addressing parts of an XML document. That's simple enough: XPath is a means to traverse an XML document and search it. We can base these searches on the structure of the document (element and attribute names and their nesting) or on the data itself. XPath is used in XML transformations (XSLT) and in SOA (the BPEL language). jQuery selectors follow a conceptually similar idea for their search operations.
XML Document
XPath can query any part of an XML document (any node at any level) because it treats the document as a tree of nodes. In XPath 1.0, an expression evaluates to a node-set (possibly empty), a string, a number, or a boolean; the nodes in a node-set can themselves be queried further. XPath is used to navigate through the elements and attributes of an XML document.
We will use the following XML document for our examples:
<game-systems>
<system>
<type>Arcade</type>
<name>MAME</name>
<emulator usable="true">true</emulator>
</system>
<system>
<type>Pocket</type>
<name>Sony PSP</name>
<emulator usable="false">false</emulator>
</system>
<system>
<type>Console</type>
<name>Nintendo Wii</name>
<emulator usable="false">true</emulator>
</system>
<system>
<type>Console</type>
<name>Sony PS2</name>
<emulator usable="false">true</emulator>
</system>
</game-systems>
Basic Node Selection
To navigate through an XML document, we use path expressions. The most common expressions we will use are slashes: single (/) or double (//).
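A leading single slash starts the search at the root of the document, and each further slash steps down to the child nodes. Evaluated in a tool such as XMLSpy, the expression /game-systems/system/type returns the four type elements: Arcade, Pocket, Console, and Console.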
A double slash (//) traverses the entire XML tree and finds all nodes that match the selection, regardless of their position. In our example, the selection //type therefore produces the same result as the previous expression.
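The two selections above can be tried out with a minimal sketch in Python. The choice of Python and its standard xml.etree.ElementTree module is an assumption of this example, not something the article prescribes. ElementTree implements only a subset of XPath, and its paths are relative to the element being searched, so /game-systems/system/type is written as system/type from the root element and //type as .//type.

```python
import xml.etree.ElementTree as ET

# The sample document from this article, written compactly.
XML = """
<game-systems>
  <system><type>Arcade</type><name>MAME</name><emulator usable="true">true</emulator></system>
  <system><type>Pocket</type><name>Sony PSP</name><emulator usable="false">false</emulator></system>
  <system><type>Console</type><name>Nintendo Wii</name><emulator usable="false">true</emulator></system>
  <system><type>Console</type><name>Sony PS2</name><emulator usable="false">true</emulator></system>
</game-systems>
"""

root = ET.fromstring(XML)  # root is the <game-systems> element

# Equivalent of /game-systems/system/type (path relative to the root element)
types_by_path = [t.text for t in root.findall("system/type")]

# Equivalent of //type (".//" searches all descendants of the root element)
types_anywhere = [t.text for t in root.findall(".//type")]

print(types_by_path)   # ['Arcade', 'Pocket', 'Console', 'Console']
print(types_anywhere)  # ['Arcade', 'Pocket', 'Console', 'Console']
```

Both lists are identical because every type element in this document sits at the same depth.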
Other common path expressions to select XML nodes are @ (attribute), . (current node), and .. (parent node).
The @ expression selects an attribute. For example, /game-systems/system/emulator/@usable selects the usable attribute of each emulator node.
A single dot (.) selects the current node, and a double dot (..) selects the parent node. This is similar to navigating paths in a file system!
To select the parent of an emulator node (which is system), we write: /game-systems/system/emulator/..
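As a rough illustration of attribute and parent selection, here is a sketch using Python's standard xml.etree.ElementTree (an assumption of this example). ElementTree supports .. as a path step but cannot select attribute nodes with @usable as a step, so the attribute value is read with Element.get() instead.

```python
import xml.etree.ElementTree as ET

# The sample document from this article, written compactly.
XML = """
<game-systems>
  <system><type>Arcade</type><name>MAME</name><emulator usable="true">true</emulator></system>
  <system><type>Pocket</type><name>Sony PSP</name><emulator usable="false">false</emulator></system>
  <system><type>Console</type><name>Nintendo Wii</name><emulator usable="false">true</emulator></system>
  <system><type>Console</type><name>Sony PS2</name><emulator usable="false">true</emulator></system>
</game-systems>
"""

root = ET.fromstring(XML)

# Stand-in for /game-systems/system/emulator/@usable:
# find the emulator elements, then read the attribute from each one.
usable_values = [e.get("usable") for e in root.findall("system/emulator")]

# Equivalent of /game-systems/system/emulator/.. (parent of each emulator)
parent_tags = [p.tag for p in root.findall("system/emulator/..")]

print(usable_values)  # ['true', 'false', 'false', 'false']
print(parent_tags)    # ['system', 'system', 'system', 'system']
```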
Finding Specific Nodes
To find specific nodes, we use predicates. A predicate filters a node-set: we can keep only the nodes with specific element or attribute values, or pick a node by its position when a previous step returned more than one. Predicates are always enclosed in square brackets [].
The emulator of the first system can be selected with the following search: /game-systems/system[1]/emulator. Note that positions in XPath start at 1, not 0.
To find the names of all systems with a usable emulator, we would write: /game-systems/system/emulator[@usable='true']/../name. Here, we use .. to move up to the parent element in the XML tree.
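Both predicate forms happen to fall within the XPath subset supported by Python's standard xml.etree.ElementTree, so they can be sketched as follows. The use of Python is an assumption of this example, and the paths are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# The sample document from this article, written compactly.
XML = """
<game-systems>
  <system><type>Arcade</type><name>MAME</name><emulator usable="true">true</emulator></system>
  <system><type>Pocket</type><name>Sony PSP</name><emulator usable="false">false</emulator></system>
  <system><type>Console</type><name>Nintendo Wii</name><emulator usable="false">true</emulator></system>
  <system><type>Console</type><name>Sony PS2</name><emulator usable="false">true</emulator></system>
</game-systems>
"""

root = ET.fromstring(XML)

# Equivalent of /game-systems/system[1]/emulator (positions are 1-based)
first_emulator = root.find("system[1]/emulator")

# Equivalent of /game-systems/system/emulator[@usable='true']/../name
usable_names = [
    n.text for n in root.findall("system/emulator[@usable='true']/../name")
]

print(first_emulator.text)  # true
print(usable_names)         # ['MAME']
```

Only MAME is returned because the predicate tests the usable attribute, not the text content of the emulator element.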
Finding Node-sets Relative to the Current Node
We can also use the XML tree structure (children, parents, etc.) to find specific nodes.
For example, we can rewrite the previous example as follows: /game-systems/system/emulator[@usable='true']/ancestor::system/name
This expression is somewhat longer, but it achieves the same result. Here, we are using the ancestor axis, which returns the ancestors of the current node (emulator in this case); ancestor::system keeps only the ancestors named system. Other axes select children, attributes, and descendants, and in most cases the same searches can also be written with basic node selectors and predicates.
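Python's standard xml.etree.ElementTree does not implement XPath axes such as ancestor::, so the sketch below only emulates the idea: it builds a child-to-parent map and walks upward from each matching emulator. The helper name ancestors and the parent_of map are hypothetical names introduced for this illustration; a full XPath engine would evaluate the expression directly.

```python
import xml.etree.ElementTree as ET

# The sample document from this article, written compactly.
XML = """
<game-systems>
  <system><type>Arcade</type><name>MAME</name><emulator usable="true">true</emulator></system>
  <system><type>Pocket</type><name>Sony PSP</name><emulator usable="false">false</emulator></system>
  <system><type>Console</type><name>Nintendo Wii</name><emulator usable="false">true</emulator></system>
  <system><type>Console</type><name>Sony PS2</name><emulator usable="false">true</emulator></system>
</game-systems>
"""

root = ET.fromstring(XML)

# ElementTree elements do not know their parent, so record it explicitly.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(elem, tag):
    """Yield every ancestor of elem with the given tag (hypothetical helper)."""
    node = parent_of.get(elem)
    while node is not None:
        if node.tag == tag:
            yield node
        node = parent_of.get(node)


# Emulates /game-systems/system/emulator[@usable='true']/ancestor::system/name
ancestor_names = []
for emulator in root.findall("system/emulator[@usable='true']"):
    for system in ancestors(emulator, "system"):
        ancestor_names.append(system.find("name").text)

print(ancestor_names)  # ['MAME']
```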