XPath Injection
XPath Injection is a critical vulnerability that allows attackers to manipulate XPath queries used to traverse XML documents. By injecting malicious input, attackers can bypass authentication, extract sensitive data from XML data stores, or gain unauthorized access to information in SOAP services and XML-based configurations.
What Is XPath Injection?
XPath Injection is a code injection technique similar to SQL Injection, but it targets applications that use XPath queries to navigate and retrieve data from XML documents. XPath (XML Path Language) is a query language used to select nodes from an XML document. When applications build XPath queries by concatenating user input directly into query strings without proper validation or sanitization, attackers can inject malicious XPath expressions that alter the query's intended logic.
This vulnerability commonly appears in applications that use XML databases, SOAP web services, XML-based authentication systems, and configuration files stored in XML format. Unlike SQL Injection, which targets relational databases, XPath Injection exploits the hierarchical structure of XML data. Since XML is widely used for data interchange, configuration management, and legacy-system integration, XPath Injection remains a significant security concern.
XPath Injection is classified under OWASP's A03 (Injection) category in the 2021 Top 10. While less prevalent than SQL Injection due to the declining use of XML databases in modern applications, it still poses a serious threat to systems that rely on XML for data storage and authentication. The impact can be equally severe, potentially exposing sensitive business data, customer information, and system configurations.
How It Works
The application presents an input mechanism, such as a login form, search field, or API parameter, whose value will be used to query XML data for authentication, data retrieval, or configuration lookups.
The application concatenates the user-supplied input directly into an XPath query string. For example, an authentication query might be built as "//users/user[username='" + userInput + "' and password='" + passInput + "']".
Instead of providing legitimate credentials, the attacker enters a crafted string such as ' or '1'='1 in both the username and the password field. Each copy closes the string literal that the application opened and appends an extra condition.
The resulting XPath query becomes //users/user[username='' or '1'='1' and password='' or '1'='1']. Because and binds more tightly than or in XPath, the predicate is evaluated as username='' or ('1'='1' and password='') or '1'='1'. The final '1'='1' is always true, so the query matches every user node in the XML document and the authentication logic is bypassed.
The XPath engine evaluates the altered query with no way to tell injected syntax from intended syntax. It returns every user node, so a check such as "at least one match" succeeds; if the application also treats the first returned node as the logged-in account, the attacker becomes the first user in the document (often an administrator). The same technique can expose other sensitive data from the XML structure.
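The walkthrough above can be reproduced with a short, self-contained Java sketch. The user data is a hypothetical in-memory document standing in for users.xml, and InjectionDemo, buildQuery and countMatches are illustrative names, not part of any library. It also shows why the payload goes into both fields: XPath evaluates and before or.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class InjectionDemo {
    // Hypothetical stand-in for users.xml
    static final String USERS_XML = "<users>"
            + "<user><username>admin</username><password>Adm1n!</password></user>"
            + "<user><username>alice</username><password>wonderland</password></user>"
            + "</users>";

    static Document load() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(USERS_XML.getBytes(StandardCharsets.UTF_8)));
    }

    // Builds the query by string concatenation (the vulnerable pattern)
    static String buildQuery(String user, String pass) {
        return "//users/user[username='" + user + "' and password='" + pass + "']";
    }

    // Returns how many <user> nodes the concatenated query selects
    static int countMatches(String user, String pass) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(buildQuery(user, pass), load(),
                XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String payload = "' or '1'='1";
        System.out.println(buildQuery(payload, payload));
        System.out.println(countMatches("alice", "wonderland")); // 1: legitimate login
        System.out.println(countMatches("alice", "guess"));      // 0: wrong password
        System.out.println(countMatches(payload, payload));      // 2: every user matches
        System.out.println(countMatches(payload, "guess"));      // 0: username field alone is not enough
    }
}
```

With the payload in the username field only, the predicate becomes username='' or ('1'='1' and password='guess'), which still depends on the real password.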
Vulnerable Code Example
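import java.io.File;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// LoginRequest is a simple request DTO with getUsername() and getPassword() (not shown)
@RestController
public class AuthController {

    @PostMapping("/login")
    public ResponseEntity<?> login(@RequestBody LoginRequest req) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File("users.xml"));

        // VULNERABLE: user input is concatenated directly
        // into the XPath query expression
        String xpathQuery = "//users/user[username='"
                + req.getUsername() + "' and password='"
                + req.getPassword() + "']";

        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(xpathQuery,
                doc, XPathConstants.NODESET);

        if (nodes.getLength() > 0) {
            return ResponseEntity.ok(Map.of("status", "authenticated"));
        }
        return ResponseEntity.status(401).body("Invalid credentials");
    }
}

// An attacker can bypass authentication by submitting the same
// tautology in both fields:
//   username: ' or '1'='1
//   password: ' or '1'='1
//
// Resulting XPath query:
//   //users/user[username='' or '1'='1' and password='' or '1'='1']
//
// Because "and" binds tighter than "or", the predicate evaluates as
//   username='' or ('1'='1' and password='') or '1'='1'
// which is true for every user node, so the check passes.
// (Injecting only into the username field, with an ordinary password,
// leaves '1'='1' and password='...' and does not match all users.)
Secure Code Example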
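import java.io.File;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

@RestController
public class AuthController {

    @PostMapping("/login")
    public ResponseEntity<?> login(@RequestBody LoginRequest req) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File("users.xml"));

        // SECURE: the expression contains variable references ($username,
        // $password) instead of user input, so input is never parsed as XPath.
        // (Comparing plaintext passwords is kept only to mirror the vulnerable example.)
        String xpathQuery = "//users/user[username=$username and password=$password]";

        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();

        // Bind the variables to the user input; the resolver must be
        // set before the expression is evaluated
        xpath.setXPathVariableResolver(varName -> {
            if ("username".equals(varName.getLocalPart())) {
                return req.getUsername();
            } else if ("password".equals(varName.getLocalPart())) {
                return req.getPassword();
            }
            return null;
        });

        NodeList nodes = (NodeList) xpath.evaluate(xpathQuery,
                doc, XPathConstants.NODESET);

        if (nodes.getLength() > 0) {
            return ResponseEntity.ok(Map.of("status", "authenticated"));
        }
        return ResponseEntity.status(401).body("Invalid credentials");
    }
}

// With XPathVariableResolver, user input is treated as a literal string value.
// Even if an attacker submits ' or '1'='1, the query looks for a user whose
// username is exactly that string; none exists, so the attack fails.
Types of XPath Injection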
Authentication Bypass
The most common form of XPath Injection targets authentication systems. Attackers use tautology-based payloads such as ' or '1'='1 (entered in both fields) or ' or 1=1 or ''=' (which works from the username field alone) to create always-true conditions that bypass login checks. Applications frequently treat the first node of the result as the logged-in account, and that is the first user in document order, which in many systems is an administrator account. This makes authentication bypass particularly dangerous in XML-based user management systems.
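The first-node behavior can be seen with JAXP's XPath.evaluate(String, Object), which converts its result to a string; for a node-set, XPath 1.0 takes the string-value of the first node in document order. In this sketch the user store and the loginAs helper are hypothetical, and the administrator is assumed to be listed first.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class FirstNodeDemo {
    // Hypothetical user store; the administrator happens to be listed first
    static final String USERS_XML = "<users>"
            + "<user><username>admin</username><password>Adm1n!</password></user>"
            + "<user><username>alice</username><password>wonderland</password></user>"
            + "</users>";

    // Hypothetical login helper: returns the name of the account it logs in as,
    // or an empty string when nothing matches
    static String loginAs(String user, String pass) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(USERS_XML.getBytes(StandardCharsets.UTF_8)));
        String query = "//users/user[username='" + user + "' and password='" + pass + "']/username";
        XPath xpath = XPathFactory.newInstance().newXPath();
        // evaluate(String, Object) converts the result to a string; for a node-set,
        // XPath 1.0 uses the string-value of the first node in document order
        return xpath.evaluate(query, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loginAs("alice", "wonderland"));       // alice
        System.out.println(loginAs("' or 1=1 or ''='", "guess")); // admin
    }
}
```

The payload turns the predicate into username='' or 1=1 or (''='' and password='guess'), which is true for every user, so the first one in the file wins.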
Data Extraction
Attackers leverage XPath's navigation capabilities to traverse the entire XML tree and extract data from any node. Using expressions like //* (selects every element), //user/* (selects all child elements of user nodes), or the ancestor:: and descendant:: axes, attackers can access data outside the intended scope of the query. This technique is particularly effective against SOAP services and XML configuration files, where sensitive information such as database credentials, API keys, and business logic may be stored in the XML structure.
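Data extraction often relies on the union operator (|): the payload closes the intended predicate and appends its own path. A minimal sketch, assuming a hypothetical e-mail lookup feature (lookupEmail) over an in-memory document; the payload shown makes the feature return every password element instead of an e-mail address.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class UnionExtractionDemo {
    // Hypothetical data: each user has a public e-mail and a secret password
    static final String USERS_XML = "<users>"
            + "<user><username>admin</username><email>admin@example.com</email><password>Adm1n!</password></user>"
            + "<user><username>alice</username><email>alice@example.com</email><password>wonderland</password></user>"
            + "</users>";

    // Hypothetical "look up a user's e-mail" feature that concatenates its input
    static List<String> lookupEmail(String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(USERS_XML.getBytes(StandardCharsets.UTF_8)));
        String query = "//users/user[username='" + name + "']/email";
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(query, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookupEmail("alice"));
        // The payload closes the predicate and unions in every <password> element:
        // //users/user[username='x'] | //user/password | //user[username='x']/email
        System.out.println(lookupEmail("x'] | //user/password | //user[username='x"));
    }
}
```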
Blind XPath Injection
Blind XPath Injection is used when the application does not display query results directly or provides limited error feedback. Attackers use boolean-based inference to extract data one character at a time by observing application behavior. For example, the condition substring(//user[1]/password, 1, 1) = 'a' tests whether the first character of the first user's password is 'a'; the attacker then iterates through the possible characters and positions. By analyzing true/false responses (different page content, response times, or status codes), attackers can reconstruct entire data fields character by character, making even error-suppressed systems vulnerable.
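The boolean inference loop can be sketched as follows. The vulnerable login only reports success or failure, and the attacker asks one substring() question per request until no character in the tried alphabet matches. The data, the alphabet, and the names login and extractFirstPassword are assumptions for illustration; a real attack would also probe string-length() and handle characters outside the alphabet.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class BlindExtractionDemo {
    // Hypothetical data; the attacker never sees it directly
    static final String USERS_XML = "<users>"
            + "<user><username>admin</username><password>s3cr3t</password></user>"
            + "<user><username>alice</username><password>wonderland</password></user>"
            + "</users>";

    // Assumed alphabet to try; quote characters are left out because they
    // would break the injected string literal
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

    static Document load() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(USERS_XML.getBytes(StandardCharsets.UTF_8)));
    }

    // Vulnerable login: the only thing it reveals is success or failure
    static boolean login(Document doc, String user, String pass) throws Exception {
        String query = "//users/user[username='" + user + "' and password='" + pass + "']";
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(query, doc, XPathConstants.NODESET);
        return nodes.getLength() > 0;
    }

    // Attacker side: one true/false question per request, one character at a time
    static String extractFirstPassword(Document doc) throws Exception {
        StringBuilder recovered = new StringBuilder();
        for (int pos = 1; ; pos++) {
            boolean found = false;
            for (char c : ALPHABET.toCharArray()) {
                // Predicate becomes: username='x' or substring(...)='c' or ('a'='b' and password='...')
                String probe = "x' or substring(//user[1]/password, " + pos + ", 1)='" + c
                        + "' or 'a'='b";
                if (login(doc, probe, "anything")) {
                    recovered.append(c);
                    found = true;
                    break;
                }
            }
            if (!found) {
                return recovered.toString(); // no candidate matched: past the end of the value
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractFirstPassword(load())); // s3cr3t
    }
}
```

The trailing or 'a'='b absorbs the application's own closing quote, so the probe stays syntactically valid without any access to the password field.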
Impact
A successful XPath Injection attack can have severe consequences for XML-based systems. The impact depends on the structure of the XML data, the privileges of the application, and the sensitivity of the information stored.
Attackers can navigate the entire XML tree structure to access sensitive information stored anywhere in the document. This includes user credentials, personal data, configuration settings, API keys, database connection strings, and proprietary business information. Unlike SQL databases with table-based access controls, XML documents often store diverse data in a single file, making comprehensive data exposure a significant risk.
XPath Injection in authentication systems allows attackers to bypass login mechanisms entirely, often gaining access as the first user defined in the XML file (typically an administrator). This grants full application privileges without requiring valid credentials, potentially exposing administrative functions and sensitive operations.
Many SOAP-based web services use XPath to parse and query XML messages. Successful injection can manipulate service responses, access restricted methods, or extract data from backend XML stores that the service interfaces with. This can compromise API security, expose internal business logic, and enable unauthorized operations.
Applications that store configuration in XML (common in Java applications using web.xml, Spring configuration files, or custom XML config files) are exposed when they query those files with XPath expressions built from user input. Injection can then reveal deployment secrets, database credentials, third-party service tokens, and system architecture details, which can be leveraged for further attacks against the infrastructure.
Prevention Checklist
Parameterized XPath queries are the primary defense against XPath Injection. Modern XPath libraries support them through variable resolvers (Java's XPathVariableResolver, .NET's XsltArgumentList used with a custom XsltContext, or similar mechanisms). These ensure user input is treated strictly as data values, never as XPath code. Always use these mechanisms instead of string concatenation when building XPath expressions.
Use allowlist validation to restrict input to expected formats. For example, usernames should only contain alphanumeric characters and specific allowed symbols. Reject any input containing XPath metacharacters like single quotes ('), double quotes ("), forward slashes (/), square brackets ([]), and parentheses. While validation alone is insufficient, it provides a critical defense-in-depth layer.
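An allowlist check can be a single anchored regular expression. The policy below (3 to 32 letters, digits, dots, underscores or hyphens) and the UsernameValidator name are assumptions; adapt the pattern to the real username rules and keep it as a layer on top of parameterized queries.

```java
import java.util.regex.Pattern;

class UsernameValidator {
    // Assumed policy: 3-32 characters from A-Z, a-z, 0-9, '.', '_' and '-'.
    // Everything else, including quotes, '/', '[', ']', '(' and ')', is rejected.
    private static final Pattern USERNAME = Pattern.compile("[A-Za-z0-9._-]{3,32}");

    static boolean isValidUsername(String input) {
        // matches() requires the whole string to fit the pattern
        return input != null && USERNAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidUsername("alice_01"));    // true
        System.out.println(isValidUsername("' or '1'='1")); // false
    }
}
```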
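In rare cases where parameterized queries cannot be used, build string literals safely instead of concatenating raw input. XPath 1.0 string literals have no escape character, so backslashes or doubled quotes do not work: wrap the value in whichever quote character it does not contain, and if it contains both, split it into a concat() call (for example, concat('a', "'", 'b"c')). XML entities such as &apos; and &quot; only help when the XPath expression is itself embedded in an XML document, such as an XSLT stylesheet. This approach is error-prone and should be avoided in favor of parameterization whenever possible.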
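A sketch of the delimiter-choosing approach, assuming the query is assembled in Java: toXPathLiteral is a hypothetical helper that turns any string into a valid XPath 1.0 literal, and roundTrip evaluates that literal to confirm it yields the original value. It is a fallback for when variable binding is unavailable, not a substitute for it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathLiterals {
    // Converts any Java string into an XPath 1.0 string literal.
    // XPath 1.0 has no escape character, so the delimiter is chosen to avoid the
    // quotes in the value, falling back to concat() when both kinds are present.
    static String toXPathLiteral(String value) {
        if (!value.contains("'")) {
            return "'" + value + "'";
        }
        if (!value.contains("\"")) {
            return "\"" + value + "\"";
        }
        String[] parts = value.split("'", -1); // -1 keeps leading/trailing empty pieces
        StringBuilder sb = new StringBuilder("concat(");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", \"'\", "); // re-insert the apostrophe as a double-quoted literal
            }
            sb.append('\'').append(parts[i]).append('\'');
        }
        return sb.append(')').toString();
    }

    // Evaluates the literal as an XPath expression to check that it
    // round-trips to the original value
    static String roundTrip(String value) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<root/>")));
        return XPathFactory.newInstance().newXPath().evaluate(toXPathLiteral(value), doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXPathLiteral("alice"));    // 'alice'
        System.out.println(toXPathLiteral("O'Brien"));  // "O'Brien"
        System.out.println(toXPathLiteral("a'b\"c"));   // concat('a', "'", 'b"c')
    }
}
```

A query would then be assembled as "//users/user[username=" + toXPathLiteral(name) + "]", so no input can terminate the literal early.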
Consider migrating from XML-based authentication and data storage to more secure alternatives. Use modern relational databases with parameterized SQL queries, NoSQL databases with proper query interfaces, or OAuth/SAML for authentication. If XML must be used, isolate it from user-facing operations and implement multiple layers of validation and access control.
Structure XML documents to minimize sensitive data exposure. Separate authentication data from business data, use multiple XML files with different access levels, and never store plaintext passwords in XML. If possible, encrypt sensitive fields within the XML structure and implement access controls at the application layer before executing XPath queries.
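One way to keep plaintext passwords out of the XML, sketched under assumptions: each user element stores a random salt and a PBKDF2-HMAC-SHA256 hash (the algorithm is this example's choice, not one the text prescribes), the iteration count is an assumed value, and HashedXmlLogin, enroll and verify are hypothetical names. The password never appears in an XPath expression, and the username is bound through XPathVariableResolver.

```java
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class HashedXmlLogin {
    // Assumed work factor; check current password-storage guidance before choosing one
    static final int ITERATIONS = 600_000;

    static byte[] pbkdf2(String password, byte[] salt) throws Exception {
        PBEKeySpec spec = new PBEKeySpec(password.toCharArray(), salt, ITERATIONS, 256);
        return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
    }

    // Builds a one-user document holding a random salt and the derived hash,
    // never the password itself; DOM setters escape the text content
    static Document enroll(String username, String password) throws Exception {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element users = doc.createElement("users");
        Element user = doc.createElement("user");
        doc.appendChild(users);
        users.appendChild(user);
        addChild(doc, user, "username", username);
        addChild(doc, user, "salt", Base64.getEncoder().encodeToString(salt));
        addChild(doc, user, "hash", Base64.getEncoder().encodeToString(pbkdf2(password, salt)));
        return doc;
    }

    static void addChild(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    // Looks up salt and hash by username (bound as $username), then recomputes the hash
    static boolean verify(Document doc, String username, String password) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(name -> "username".equals(name.getLocalPart()) ? username : null);
        String salt = xpath.evaluate("//users/user[username=$username]/salt", doc);
        String hash = xpath.evaluate("//users/user[username=$username]/hash", doc);
        if (salt.isEmpty() || hash.isEmpty()) {
            return false; // unknown user
        }
        byte[] expected = Base64.getDecoder().decode(hash);
        byte[] actual = pbkdf2(password, Base64.getDecoder().decode(salt));
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }

    public static void main(String[] args) throws Exception {
        Document doc = enroll("alice", "wonderland");
        System.out.println(verify(doc, "alice", "wonderland")); // true
        System.out.println(verify(doc, "alice", "guess"));      // false
        System.out.println(verify(doc, "' or '1'='1", "x"));    // false
    }
}
```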
Audit all code that constructs XPath queries, looking for string concatenation patterns. Use static analysis tools (SAST) to detect potential injection points. Conduct penetration testing specifically targeting XPath injection in SOAP services, XML-based authentication, and configuration parsers. Tools like OWASP ZAP, Burp Suite, and specialized XML fuzzing tools can help identify vulnerabilities.
Real-World Examples
Oracle E-Business Suite
Security researchers discovered XPath Injection vulnerabilities in Oracle's E-Business Suite, a widely-used enterprise resource planning system. The vulnerabilities existed in XML-based SOAP web services that handled authentication and business logic. Attackers could manipulate XPath queries to bypass authentication, access sensitive business data, and execute unauthorized operations across the enterprise system.
SAP NetWeaver
Multiple XPath Injection vulnerabilities were identified in SAP NetWeaver's web services framework, which used XML extensively for configuration and data management. The flaws allowed remote attackers to inject malicious XPath expressions into SOAP requests, potentially accessing sensitive configuration data, user credentials, and business information stored in XML format across the enterprise platform.
Banking SOAP Services
A major financial institution's SOAP-based API for account management was found vulnerable to XPath Injection. The service used XML databases to store temporary transaction data and user preferences. Attackers could exploit the vulnerability to extract account information, transaction histories, and customer personal data by manipulating XPath queries in API requests, highlighting the risk in financial services using legacy XML technologies.
Healthcare System Portal
A healthcare management system using XML-based authentication for provider access was compromised through XPath Injection. The system stored user credentials and access control policies in XML files, queried via XPath expressions built from login inputs. Attackers exploited the vulnerability to bypass authentication, gain administrative access, and potentially access protected health information (PHI), resulting in HIPAA compliance violations and significant remediation costs.