OXPath is a careful extension of XPath that facilitates data extraction from the deep web, web automation and crawling. It is designed to interact with sophisticated modern web interfaces that rely on client-side scripting and asynchronous server communication.
OXPath extends XPath with:
- Actions, allowing the simulation of user actions (e.g., click, form filling) to interact with the scripted multi-page interfaces of web applications.
- Style Axis and Visible Field, to select nodes and form fields based on visual features; all CSS properties are exposed via a new axis called style.
- Intensional Axes, to relate nodes through multiple conditions, e.g., to select all nodes that are at the same vertical position and have the same color as the current node, a relation that cannot be expressed in XPath.
- Extraction Markers, a new kind of qualifier, to mark nodes as representatives of records and to form attributes from extracted data.
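To make the intensional-axis example concrete, the following Python sketch evaluates it over a toy list of nodes: a node is related to the current node only if every supplied condition holds. The `Node` class, its `top` and `color` fields, and the `intensional_axis` helper are hypothetical stand-ins for illustration; they are not OXPath syntax or part of its API.

```python
from dataclasses import dataclass


# Hypothetical, simplified node model holding only the rendered properties
# this example needs (not OXPath's actual data model).
@dataclass(frozen=True)
class Node:
    name: str
    top: int    # vertical position in pixels (assumed precomputed)
    color: str  # computed CSS color


def intensional_axis(context, nodes, conditions):
    """Return the nodes related to `context` by ALL conditions, excluding context itself."""
    return [n for n in nodes
            if n is not context and all(cond(context, n) for cond in conditions)]


# Each condition compares the context node with a candidate node.
def same_top(a, b):
    return a.top == b.top


def same_color(a, b):
    return a.color == b.color


nodes = [
    Node("price", 120, "red"),
    Node("discount", 120, "red"),
    Node("label", 120, "black"),
    Node("footer", 480, "red"),
]

current = nodes[0]
related = intensional_axis(current, nodes, [same_top, same_color])
print([n.name for n in related])
```

Dropping `same_color` from the condition list would also return `label`, which shows how each extra condition narrows the relation.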
For example, the following figures illustrate a simple OXPath expression that navigates to Google News and extracts a story element for each current story, along with its title and sources; the output is shown here formatted as XML.
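The nesting described above (each story is a record, with its title and sources as attributes) can be sketched in Python by serializing such records to XML. The element names (`results`, `story`, `title`, `source`) and the placeholder values are assumptions for illustration only; they are not real Google News data, and OXPath's actual output layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical records, as extraction markers might group them: one record
# per story, with title and sources as attributes. Placeholder values only.
records = [
    {"title": "Story A", "sources": ["Source 1", "Source 2"]},
    {"title": "Story B", "sources": ["Source 3"]},
]


def records_to_xml(records):
    """Serialize nested records into an XML tree (one possible layout)."""
    root = ET.Element("results")
    for rec in records:
        story = ET.SubElement(root, "story")
        ET.SubElement(story, "title").text = rec["title"]
        # A record may carry several values for the same attribute.
        for src in rec["sources"]:
            ET.SubElement(story, "source").text = src
    return root


xml_text = ET.tostring(records_to_xml(records), encoding="unicode")
print(xml_text)
```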


More generally, OXPath can specify complex interactions, as the following figure illustrates for Amazon.
OXPath is available on GitHub under a BSD-style license. See our publications for more details.
