- Analyzed language: Java
- Difficulty level: 200
Serialization is the process of converting in-memory objects to text or binary output formats, usually for the purpose of sharing or saving program state. This serialized data can then be loaded back into memory at a future point through the process of deserialization.
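As a minimal sketch of this round trip, the following Java example writes a user-defined object to bytes and restores it using the JDK's built-in ObjectOutputStream and ObjectInputStream. The SerializationRoundTrip and Session names are hypothetical and exist only for illustration; nothing here is taken from Struts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical example class; not part of Struts or the workshop materials. */
final class SerializationRoundTrip {

    /** A user-defined type whose state we want to save and restore. */
    static final class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        final String user;
        final int visits;

        Session(String user, int visits) {
            this.user = user;
            this.visits = visits;
        }
    }

    /** Serialization: in-memory object to binary output. */
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return bytes.toByteArray();
    }

    /** Deserialization: binary input back to an object. Only appropriate for trusted data. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        Session restored = (Session) deserialize(serialize(new Session("alice", 3)));
        System.out.println(restored.user + " " + restored.visits); // prints "alice 3"
    }
}
```

Note that deserialize returns whatever type the byte stream names, which is the flexibility (and the risk) the workshop goes on to discuss.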
In languages such as Java, Python and Ruby, deserialization provides the ability to restore not only primitive data, but also complex types such as library and user-defined classes. This provides great power and flexibility, but introduces a significant attack vector if the deserialization happens on untrusted user data without restriction.
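One possible form of restriction, shown here only for Java's built-in serialization, is an allowlist filter. The sketch below assumes Java 9 or later (for java.io.ObjectInputFilter); the FilteredDeserialization, Profile and Unexpected names are hypothetical. It is not the mechanism XStream or Struts uses, just an illustration of deserializing "with restriction".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Optional;
import java.util.Set;

/** Hypothetical example; requires Java 9 or later. */
final class FilteredDeserialization {

    /** The only application type we expect to receive. */
    static final class Profile implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Profile(String name) { this.name = name; }
    }

    /** Stands in for any serializable class the application never expects. */
    static final class Unexpected implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    private static final Set<Class<?>> ALLOWED = Set.of(Profile.class, String.class);

    /** Rejects every class that is not on the allowlist. */
    static ObjectInputFilter.Status check(ObjectInputFilter.FilterInfo info) {
        Class<?> type = info.serialClass();
        if (type == null) {
            return ObjectInputFilter.Status.UNDECIDED; // not a class check
        }
        return ALLOWED.contains(type)
            ? ObjectInputFilter.Status.ALLOWED
            : ObjectInputFilter.Status.REJECTED;
    }

    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return bytes.toByteArray();
    }

    /** Returns the restored object, or an empty Optional if the filter rejected the stream. */
    static Optional<Object> deserializeRestricted(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(FilteredDeserialization::check); // must be set before reading
            return Optional.ofNullable(in.readObject());
        } catch (InvalidClassException rejected) {
            return Optional.empty(); // thrown when the filter returns REJECTED
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(deserializeRestricted(serialize(new Profile("ann"))).isPresent());
        System.out.println(deserializeRestricted(serialize(new Unexpected())).isPresent());
    }
}
```

The design choice in this sketch is to deny by default: a class is restored only if it was explicitly listed, rather than trying to enumerate dangerous classes.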
Apache Struts is a popular open-source MVC framework for creating web applications in Java. In 2017, a researcher from the predecessor of the GitHub Security Lab found CVE-2017-9805, an XML deserialization vulnerability in Apache Struts that would allow remote code execution.
The problem arose because the Apache Struts framework can accept requests in multiple different formats, or content types. It provides a pluggable system for supporting these content types through the ContentTypeHandler interface, which declares the following method:
/**
 * Populates an object using data from the input stream
 * @param in The input stream, usually the body of the request
 * @param target The target, usually the action class
 * @throws IOException If unable to write to the output stream
 */
void toObject(Reader in, Object target) throws IOException;
New content type handlers are defined by implementing the interface and defining a toObject method, which takes data in the specified content type (in the form of a Reader) and uses it to populate the Java object target, often via a deserialization routine. However, the in parameter is typically populated from the body of a request without sanitization or safety checks. This means it should be treated as "untrusted" user data, and only deserialized under certain safe conditions.
In this workshop, we will write a query to find CVE-2017-9805 in a database built from the known vulnerable version of Apache Struts.
To take part in the workshop you will need to follow these steps to get the CodeQL development environment set up:
- Install the Visual Studio Code IDE.
- Download and install the CodeQL extension for Visual Studio Code. Full setup instructions are here.
- Set up the starter workspace.
  - Important: Don't forget to git clone --recursive or git submodule update --init --remote, so that you obtain the standard query libraries.
- Open the starter workspace: File > Open Workspace > Browse to vscode-codeql-starter/vscode-codeql-starter.code-workspace.
- Download and unzip the apache_struts_cve_2017_9805.zip database.
- Choose this database in CodeQL (using Ctrl + Shift + P to open the command palette, then selecting "CodeQL: Choose Database").
- Create a new file in the codeql-custom-queries-java directory called UnsafeDeserialization.ql.
If you get stuck, try searching our documentation and blog posts for help and ideas.
The workshop is split into several steps. You can write one query per step, or work with a single query that you refine at each step. Each step has a hint that describes useful classes and predicates in the CodeQL standard libraries for Java. You can explore these in your IDE using the autocomplete suggestions (Ctrl + Space) and the jump-to-definition command (F12).
XStream is a Java framework, used by Apache Struts, for serializing Java objects to XML. It provides a method XStream.fromXML for deserializing XML to a Java object. By default, the input is not validated in any way, and is vulnerable to remote code execution exploits. In this section, we will identify calls to fromXML in the codebase.
- Find all method calls in the program.

  Hint
  - A method call is represented by the MethodAccess type in the CodeQL Java library.

  Solution

      import java

      from MethodAccess call
      select call

- Update your query to report the method being called by each method call.

  Hints
  - Add a CodeQL variable called method with type Method.
  - MethodAccess has a predicate called getMethod() for returning the method.
  - Add a where clause.

  Solution

      import java

      from MethodAccess call, Method method
      where call.getMethod() = method
      select call, method

- Find all calls in the program to methods called fromXML.

  Hint
  - Method.getName() returns a string representing the name of the method.

  Solution

      import java

      from MethodAccess fromXML, Method method
      where
        fromXML.getMethod() = method and
        method.getName() = "fromXML"
      select fromXML

  However, as we now want to report only the call itself, we can inline the temporary method variable like so:

      import java

      from MethodAccess fromXML
      where fromXML.getMethod().getName() = "fromXML"
      select fromXML

- The XStream.fromXML method deserializes the first argument (i.e. the argument at index 0). Update your query to report the deserialized argument.

  Hint
  - MethodAccess.getArgument(int i) returns the argument at the i-th index.
  - The arguments are expressions in the program, represented by the CodeQL class Expr. Introduce a new variable to hold the argument expression.

  Solution

      import java

      from MethodAccess fromXML, Expr arg
      where
        fromXML.getMethod().getName() = "fromXML" and
        arg = fromXML.getArgument(0)
      select fromXML, arg

- Recall that predicates allow you to encapsulate logical conditions in a reusable format. Convert your previous query to a predicate which identifies the set of expressions in the program which are deserialized directly by fromXML. You can use the following template:

      predicate isXMLDeserialized(Expr arg) {
        exists(MethodAccess fromXML |
          // TODO fill me in
        )
      }

  exists is a mechanism for introducing temporary variables with a restricted scope. You can think of them as their own from-where-select. In this case, we use it to introduce the fromXML temporary variable, with type MethodAccess.

  Hint
  - Copy the where clause of the previous query.

  Solution

      import java

      predicate isXMLDeserialized(Expr arg) {
        exists(MethodAccess fromXML |
          fromXML.getMethod().getName() = "fromXML" and
          arg = fromXML.getArgument(0)
        )
      }

      from Expr arg
      where isXMLDeserialized(arg)
      select arg

Like predicates, classes in CodeQL can be used to encapsulate reusable portions of logic. Classes represent single sets of values, and they can also include operations (known as member predicates) specific to that set of values. You have already seen numerous instances of CodeQL classes (MethodAccess, Method, etc.) and associated member predicates (MethodAccess.getMethod(), Method.getName(), etc.).
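For readers more at home in Java than in CodeQL, a loose analogy may help: a CodeQL class behaves like a filtered subset of a larger set of values, and a member predicate behaves like a method available on each value. The sketch below is only an analogy written in Java, not CodeQL semantics; TypeSetSketch and TypeInfo are hypothetical names, and the program's types are modeled as a plain list.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical analogy only; this is Java, not CodeQL. */
final class TypeSetSketch {

    /** Stand-in for a reference type found in the analyzed program. */
    static final class TypeInfo {
        final String packageName;
        final String className;

        TypeInfo(String packageName, String className) {
            this.packageName = packageName;
            this.className = className;
        }

        /** Plays the role of a member predicate: an operation on each value in the set. */
        boolean hasQualifiedName(String pkg, String cls) {
            return packageName.equals(pkg) && className.equals(cls);
        }
    }

    /** Plays the role of a characteristic predicate: keeps only the values that belong to the set. */
    static List<TypeInfo> stringTypes(List<TypeInfo> allTypes) {
        return allTypes.stream()
            .filter(t -> t.hasQualifiedName("java.lang", "String"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TypeInfo> all = List.of(
            new TypeInfo("java.lang", "String"),
            new TypeInfo("java.util", "List"),
            new TypeInfo("java.io", "Reader"));
        System.out.println(stringTypes(all).size()); // prints 1
    }
}
```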
- Create a CodeQL class called ContentTypeHandler to find the interface org.apache.struts2.rest.handler.ContentTypeHandler. You can use this template:

      class ContentTypeHandler extends RefType {
        ContentTypeHandler() {
          // TODO Fill me in
        }
      }

  Hint
  - Use RefType.hasQualifiedName(string packageName, string className) to identify classes with the given package name and class name. For example:

        from RefType r
        where r.hasQualifiedName("java.lang", "String")
        select r

  - Within the characteristic predicate you can use the magic variable this to refer to the RefType.

  Solution

      import java

      /** The interface `org.apache.struts2.rest.handler.ContentTypeHandler`. */
      class ContentTypeHandler extends RefType {
        ContentTypeHandler() {
          this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
        }
      }

- Create a CodeQL class called ContentTypeHandlerToObject for identifying Methods called toObject on classes whose direct super-types include ContentTypeHandler.

  Hint
  - Use Method.getName() to identify the name of the method.
  - To identify whether the method is declared on a class whose direct super-type includes ContentTypeHandler, you will need to:
    - Identify the declaring type of the method using Method.getDeclaringType().
    - Identify the super-types of that type using RefType.getASupertype().
    - Use instanceof to assert that one of the super-types is a ContentTypeHandler.

  Solution

      /** A `toObject` method on a subtype of `org.apache.struts2.rest.handler.ContentTypeHandler`. */
      class ContentTypeHandlerToObject extends Method {
        ContentTypeHandlerToObject() {
          this.getDeclaringType().getASupertype() instanceof ContentTypeHandler and
          this.hasName("toObject")
        }
      }

- toObject methods should consider the first parameter as untrusted user input. Write a query to find the first (i.e. index 0) parameter for toObject methods.

  Hint
  - Use Method.getParameter(int index) to get the parameter at the given index.
  - Create a query with a single CodeQL variable of type ContentTypeHandlerToObject.

  Solution

      from ContentTypeHandlerToObject toObjectMethod
      select toObjectMethod.getParameter(0)

We have now identified (a) places in the program which receive untrusted data and (b) places in the program which potentially perform unsafe XML deserialization. We now want to tie these two together to ask: does the untrusted data ever flow to the potentially unsafe XML deserialization call?
In program analysis we call this a data flow problem. Data flow helps us answer questions like: does this expression ever hold a value that originates from a particular other place in the program?
We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in the program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two nodes.
Consider this example Java method:
int func(int tainted) {
    int x = tainted;
    if (someCondition) {
        int y = x;
        callFoo(y);
    } else {
        return x;
    }
    return -1;
}
The data flow graph for this method will look something like this:
This graph represents the flow of data from the tainted parameter. The nodes of the graph represent program elements that have a value, such as function parameters and expressions. The edges of this graph represent flow through these nodes.
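The graph view can be sketched in plain Java. The example below hand-codes a simplified version of the graph for the func method (node names such as "callFoo(y)" are illustrative labels, not CodeQL output) and answers the question "does data flow from this node to that node?" with an ordinary worklist search. It is an assumption-laden toy, far simpler than the CodeQL data flow library, and TaintGraph is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model of the data flow graph for the example method. */
final class TaintGraph {

    /** Each key's value can flow directly to every node in its list. */
    static final Map<String, List<String>> EDGES = Map.of(
        "tainted", List.of("x"),
        "x", List.of("y", "return x"),
        "y", List.of("callFoo(y)"));

    /** Returns true if a path of zero or more edges leads from source to sink. */
    static boolean hasFlow(String source, String sink) {
        Set<String> visited = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(source);
        while (!worklist.isEmpty()) {
            String node = worklist.poll();
            if (node.equals(sink)) {
                return true;
            }
            if (visited.add(node)) { // expand each node only once
                worklist.addAll(EDGES.getOrDefault(node, List.of()));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasFlow("tainted", "callFoo(y)")); // prints true
        System.out.println(hasFlow("tainted", "return -1"));  // prints false
    }
}
```

The constant -1 returned at the end of func has no incoming edge from tainted, which is why the second query reports no flow.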
CodeQL for Java provides data flow analysis as part of the standard library. You can import it using semmle.code.java.dataflow.DataFlow. The library models nodes using the DataFlow::Node CodeQL class. These nodes are separate and distinct from the AST (Abstract Syntax Tree, which represents the basic structure of the program) nodes, to allow for flexibility in how data flow is modeled.
There is a small number of data flow node types; expression nodes and parameter nodes are the most common.
In this section we will create a data flow query by populating this template:
/**
 * @name Unsafe XML deserialization
 * @kind problem
 * @id java/unsafe-deserialization
 */
import java
import semmle.code.java.dataflow.DataFlow

// TODO add previous class and predicate definitions here

class StrutsUnsafeDeserializationConfig extends DataFlow::Configuration {
  StrutsUnsafeDeserializationConfig() { this = "StrutsUnsafeDeserializationConfig" }

  override predicate isSource(DataFlow::Node source) {
    exists(/** TODO fill me in **/ |
      source.asParameter() = /** TODO fill me in **/
    )
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(/** TODO fill me in **/ |
      /** TODO fill me in **/
      sink.asExpr() = /** TODO fill me in **/
    )
  }
}

from StrutsUnsafeDeserializationConfig config, DataFlow::Node source, DataFlow::Node sink
where config.hasFlow(source, sink)
select sink, "Unsafe XML deserialization"
- Complete the isSource predicate using the query you wrote for Section 2.

  Hint
  - You can translate from a query clause to a predicate by:
    - Converting the variable declarations in the from part to the variable declarations of an exists
    - Placing the where clause conditions (if any) in the body of the exists
    - Adding a condition which equates the select to one of the parameters of the predicate.
  - Remember to include the ContentTypeHandlerToObject class you defined earlier.

  Solution

      override predicate isSource(DataFlow::Node source) {
        exists(ContentTypeHandlerToObject toObjectMethod |
          source.asParameter() = toObjectMethod.getParameter(0)
        )
      }

- Complete the isSink predicate by using the final query you wrote for Section 1. Remember to use the isXMLDeserialized predicate!

  Hint
  - Complete the same process as above.

  Solution

      override predicate isSink(DataFlow::Node sink) {
        exists(Expr arg |
          isXMLDeserialized(arg) and
          sink.asExpr() = arg
        )
      }

You can now run the completed query. You should find exactly one result, which is the CVE reported by our security researchers in 2017!
For this result, it is easy to verify that it is correct, because both the source and sink are in the same method. However, for many data flow problems this is not the case.
We can update the query so that it reports not only the sink, but also the source and the path from that source to the sink. To do this, we convert the query to a path problem query. There are five parts we will need to change:
- Convert the @kind from problem to path-problem. This tells the CodeQL toolchain to interpret the results of this query as path results.
- Add a new import DataFlow::PathGraph, which will report the path data alongside the query results.
- Change source and sink variables from DataFlow::Node to DataFlow::PathNode, to ensure that the nodes retain path information.
- Use hasFlowPath instead of hasFlow.
- Change the select to report the source and sink as the second and third columns. The toolchain combines this data with the path information from PathGraph to build the paths.
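The difference between a plain result and a path result can be illustrated with a small Java sketch. Instead of only answering whether a sink is reachable, the search below remembers which node each node was reached from and rebuilds the route afterwards. The graph is a hand-written toy (the node labels and the TaintPath name are hypothetical), and this is not how the CodeQL toolchain computes paths internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Hypothetical toy: breadth-first search that also reports the path it found. */
final class TaintPath {

    /** Each key's value can flow directly to every node in its list. */
    static final Map<String, List<String>> EDGES = Map.of(
        "tainted", List.of("x"),
        "x", List.of("y", "return x"),
        "y", List.of("callFoo(y)"));

    /** Returns the nodes on a shortest path from source to sink, or an empty list if none exists. */
    static List<String> findPath(String source, String sink) {
        Map<String, String> cameFrom = new HashMap<>(); // node -> node it was reached from
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(source, null); // the source has no predecessor
        queue.add(source);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(sink)) {
                LinkedList<String> path = new LinkedList<>();
                for (String step = sink; step != null; step = cameFrom.get(step)) {
                    path.addFirst(step); // walk predecessors back to the source
                }
                return path;
            }
            for (String next : EDGES.getOrDefault(node, List.of())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, node);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        System.out.println(findPath("tainted", "callFoo(y)")); // prints [tainted, x, y, callFoo(y)]
    }
}
```

Keeping the predecessor of each node is what makes the intermediate steps reportable, which is loosely the role that path nodes play in the query.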
- Convert your previous query to a path-problem query.

  Solution

      /**
       * @name Unsafe XML deserialization
       * @kind path-problem
       * @id java/unsafe-deserialization
       */
      import java
      import semmle.code.java.dataflow.DataFlow
      import DataFlow::PathGraph

      predicate isXMLDeserialized(Expr arg) {
        exists(MethodAccess fromXML |
          fromXML.getMethod().getName() = "fromXML" and
          arg = fromXML.getArgument(0)
        )
      }

      /** The interface `org.apache.struts2.rest.handler.ContentTypeHandler`. */
      class ContentTypeHandler extends RefType {
        ContentTypeHandler() {
          this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
        }
      }

      /** A `toObject` method on a subtype of `org.apache.struts2.rest.handler.ContentTypeHandler`. */
      class ContentTypeHandlerToObject extends Method {
        ContentTypeHandlerToObject() {
          this.getDeclaringType().getASupertype() instanceof ContentTypeHandler and
          this.hasName("toObject")
        }
      }

      class StrutsUnsafeDeserializationConfig extends DataFlow::Configuration {
        StrutsUnsafeDeserializationConfig() { this = "StrutsUnsafeDeserializationConfig" }

        override predicate isSource(DataFlow::Node source) {
          exists(ContentTypeHandlerToObject toObjectMethod |
            source.asParameter() = toObjectMethod.getParameter(0)
          )
        }

        override predicate isSink(DataFlow::Node sink) {
          exists(Expr arg |
            isXMLDeserialized(arg) and
            sink.asExpr() = arg
          )
        }
      }

      from StrutsUnsafeDeserializationConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
      where config.hasFlowPath(source, sink)
      select sink, source, sink, "Unsafe XML deserialization"

For more information on how the vulnerability was identified, you can read the blog disclosing the original problem.
Although we have created a query from scratch to find this problem, it can also be found with one of our default security queries, UnsafeDeserialization.ql. You can see this on a vulnerable copy of Apache Struts that has been analyzed on LGTM.com, our free open source analysis platform.
- Read the tutorial on analyzing data flow in Java.
- Go through more CodeQL training materials for Java.
- Try out the latest CodeQL Java Capture-the-Flag challenge on the GitHub Security Lab website for a chance to win a prize! Or try one of the older Capture-the-Flag challenges to improve your CodeQL skills.
- Try out a CodeQL course on GitHub Learning Lab.
- Read about more vulnerabilities found using CodeQL on the GitHub Security Lab research blog.
- Explore the open-source CodeQL queries and libraries, and learn how to contribute a new query.