- My GSoC work on GitHub: Commits
(The link may change. If it does, please search for “[author:vrnithinkumar author-date:2020-06-01..2020-08-31]” in the LLVM project repository.)
- My GSoC work on Phabricator: Reviews
- GSoC Project Page Link
- Original report link

Clang Static Analyzer finds bugs in programs by analyzing their source code without executing them. It uses symbolic execution to find defects. The analyzer covers a variety of checks targeted at finding security and API usage bugs, dead code, null dereferences, division by zero, and other logic errors. The Clang Static Analyzer already has a checker to find null pointer dereferences, but it is not sufficient for higher-level abstractions such as C++ smart pointers or optionals. By explicitly teaching the analyzer how C++ standard library classes behave, we can make it find more bugs in modern C++ code.

The plan was to start by modeling std::unique_ptr, and then extend the modeling to std::optional if time permitted. The checker found 8 true positive warnings in the LLVM project and 5 warnings in the WebKit project. Even though we were only able to support std::unique_ptr so far, we managed to build base modeling for the smart pointer checkers. This will act as a consolidated foundation for developing checkers for any C++ objects that are passed by value. It is one of the first conscious attempts to do so. We gained a lot of experience and managed to maintain our integrity, in the sense that the code ended up mostly free of hacks. It could most likely be generalized to model the entire C++ standard library. This does not give us the high-level architecture needed to deal with the scale of the standard library, but we got the low-level basics right. This work will also be used to add support for checking other smart pointers such as std::weak_ptr, as well as std::optional. It can also be used to build a checker for use-after-free errors.
unique_ptr is a smart pointer that owns and manages another object through a pointer. It disposes of that object when the unique_ptr goes out of scope. It should be used to own and manage any dynamically allocated object whose ownership is not shared. A unique_ptr explicitly prevents copying of its contained pointer, which would otherwise happen with normal assignment. The move assignment operator and the move constructor can be used to transfer ownership of the contained pointer to another unique_ptr.
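The following minimal sketch shows these rules in standard C++. The static_asserts check that unique_ptr is move-only. The hypothetical CountingDeleter is not part of the original text; it is only there to make the disposal at scope exit observable.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// A unique_ptr cannot be copied; it can only be moved.
static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "unique_ptr is not copyable");
static_assert(std::is_move_constructible<std::unique_ptr<int>>::value,
              "unique_ptr is movable");

// Hypothetical deleter that counts how many objects it has disposed of.
struct CountingDeleter {
  int *Count;
  void operator()(int *Ptr) const {
    ++*Count;
    delete Ptr;
  }
};

// Returns how many objects were disposed of once the unique_ptr left scope.
int disposedAtScopeExit() {
  int Count = 0;
  {
    std::unique_ptr<int, CountingDeleter> P(new int(1), CountingDeleter{&Count});
    assert(Count == 0);  // the object is still owned, nothing disposed of yet
  }                      // P goes out of scope here and its deleter runs once
  return Count;
}
```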
Example 1: Dereferencing a default-constructed unique pointer, which is null
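Here is a minimal sketch of Example 1. The function name is illustrative. The dereference is left commented out because it is exactly the undefined behavior the checker looks for.

```cpp
#include <cassert>
#include <memory>

// Example 1 sketch: a default-constructed unique_ptr owns no object.
bool defaultConstructedIsNull() {
  std::unique_ptr<int> P;  // the inner pointer is nullptr
  assert(!P);              // explicit operator bool reports "empty"
  // *P = 5;               // the null dereference SmartPtrChecker is meant to report
  return P.get() == nullptr;
}
```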
Example 2: Dereferencing a unique pointer after calling release()
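A similar sketch of Example 2 follows. The faulty dereference is again commented out, and the function name is illustrative.

```cpp
#include <cassert>
#include <memory>

// Example 2 sketch: release() hands back the raw pointer and leaves P null.
bool nullAfterRelease() {
  std::unique_ptr<int> P(new int(13));
  int *Raw = P.release();  // P gives up ownership without deleting the object
  assert(Raw != nullptr && *Raw == 13);
  // *P;                   // P is null now, so this would be a null dereference
  bool IsNull = (P == nullptr);
  delete Raw;              // after release() the caller owns Raw and must free it
  return IsNull;
}
```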
Other possible cases, similar to the ones above, are:

- dereferencing after calling reset(nullptr);
- getting the inner pointer and explicitly deleting it;
- swapping with a null pointer using std::swap.

More examples are in the Appendix. All of these cases lead to undefined behavior, typically a crash.
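For instance, the reset(nullptr) case can be sketched as follows. The function is illustrative, and the faulty dereference is commented out.

```cpp
#include <cassert>
#include <memory>

// Sketch: reset(nullptr) deletes the owned object and leaves the unique_ptr null.
bool nullAfterResetWithNull() {
  std::unique_ptr<int> P(new int(5));
  assert(P && *P == 5);
  P.reset(nullptr);  // the old object is deleted here
  // *P;             // null dereference after reset(nullptr)
  return P == nullptr;
}
```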
The basic idea of the checker is to keep a record of the raw pointers wrapped inside smart pointers, using a map from each smart pointer to its inner pointer. The map is updated in every situation where a smart pointer becomes null, and in every situation where it becomes non-null. For example, when a smart pointer is default-constructed, it is tracked as having a null inner pointer. The checker then checks whether any tracked smart pointer that is dereferenced has a null inner pointer. To make the bug report clearer, additional notes are attached along the bug path to show where the smart pointer became null.
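To make the idea concrete, here is a toy, self-contained C++ model of the tracking map. Everything in it is hypothetical, including ToySmartPtrTracker, Nullness, and the string keys that stand in for memory regions. It is much simpler than the real SmartPtrModeling, which works on MemRegions, SVals, and symbolic constraints.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, heavily simplified model (not Clang's API): what is known
// about a smart pointer's inner pointer on one analysis path.
enum class Nullness { Null, NonNull, Unknown };

class ToySmartPtrTracker {
  // Stand-in for the program-state map: smart pointer -> inner pointer info.
  std::map<std::string, Nullness> Tracked;

  Nullness get(const std::string &P) const {
    auto It = Tracked.find(P);
    return It == Tracked.end() ? Nullness::Unknown : It->second;
  }
  void set(const std::string &P, Nullness N) {
    if (N == Nullness::Unknown)
      Tracked.erase(P);  // an untracked pointer means "nothing is known"
    else
      Tracked[P] = N;
  }

public:
  // Events that make a smart pointer null or non-null.
  void defaultConstruct(const std::string &P) { set(P, Nullness::Null); }
  void construct(const std::string &P, bool WithNull) {
    set(P, WithNull ? Nullness::Null : Nullness::NonNull);
  }
  void release(const std::string &P) { set(P, Nullness::Null); }
  void reset(const std::string &P, bool ToNull) { construct(P, ToNull); }
  void swap(const std::string &A, const std::string &B) {
    Nullness OldA = get(A), OldB = get(B);
    set(A, OldB);
    set(B, OldA);
  }
  void move(const std::string &Dst, const std::string &Src) {
    set(Dst, get(Src));
    set(Src, Nullness::Null);  // a moved-from unique_ptr is null
  }
  // Passed by non-const reference to an unknown function: forget it.
  void escape(const std::string &P) { set(P, Nullness::Unknown); }

  // The check: only a dereference of a known-null pointer is reported.
  bool isNullDeref(const std::string &P) const {
    return get(P) == Nullness::Null;
  }
};

void demoTracker() {
  ToySmartPtrTracker T;
  T.defaultConstruct("P");
  assert(T.isNullDeref("P"));   // *P would be reported
  T.construct("Q", /*WithNull=*/false);
  T.swap("P", "Q");
  assert(!T.isNullDeref("P") && T.isNullDeref("Q"));
  T.escape("Q");
  assert(!T.isNullDeref("Q"));  // unknown, so no warning
}
```

Unknown pointers are kept out of the map instead of being guessed at. This mirrors the report's preference for staying silent over producing false positives.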
Another solution we considered was manipulating the symbolic values inside the smart pointer. The limitation we ran into is that our memory model (“RegionStore”) does not currently allow setting a “default” binding for a whole object when that object is part (say, a field) of a bigger object. This means we would have to understand how the smart pointer works internally (which field corresponds to what) to manipulate its symbolic value. That would tie us to a specific implementation of the C++ standard library. This might still work for unique_ptr, which probably always has exactly one field. shared_ptr, however, has multiple fields, and their layout depends on the standard library implementation. We decided not to take this approach because it is challenging and potentially much more work than the first approach.
D81315: Created a basic implementation.
- Made a separate checker class for emitting diagnostics. The new checker uses checkPreCall and contains the bug reporting logic.
- Kept all smart pointer related modeling logic in SmartPtrModeling. Common functionality is shared via a header file used by both SmartPtrModeling and SmartPtrChecker.
- Made SmartPtrModeling a dependency of SmartPtrChecker.
- Introduced a GDM (generic data map) entry with MemRegion as key and SVal as value, to track each smart pointer and its corresponding inner pointer.
- Added support for modeling the unique_ptr constructor and the release and reset methods.
- Used evalCall to handle modeling. As part of this, enabled constructor support in the evalCall event with D82256.
- Used checkDeadSymbols to remove the MemRegions of smart pointers from the program state map when they go out of scope. This keeps the data structures in the program state as small as possible. Otherwise the state could grow large while analyzing real code and slow down the analysis.
With this patch, the modeling can emit warnings for use after default construction, use after release, and use after reset with a null pointer. SmartPtrChecker was kept in the alpha.cplusplus package. Smart pointer modeling has to be enabled with the ModelSmartPtrDereference option of SmartPtrModeling. To improve accuracy, the tracked region data is removed when a smart pointer is passed by non-const reference into a function, because we cannot know what happens to the smart pointer inside that function.
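The following sketch shows the situation this rule is about. It assumes a hypothetical bar() whose effect on the smart pointer the modeling does not try to predict. Here bar() happens to reset P with a valid object, which is exactly why a warning would be wrong.

```cpp
#include <cassert>
#include <memory>

// Hypothetical callee: because P is passed by non-const reference, the
// modeling stops tracking it at the call, whatever this body does.
void bar(std::unique_ptr<int> &P) { P.reset(new int(42)); }

int derefAfterEscape() {
  std::unique_ptr<int> P;  // default-constructed: tracked as null
  bar(P);                  // passed by non-const reference: tracking is dropped
  assert(P != nullptr);    // in this sketch bar() gave P an object
  return *P;               // no warning is emitted for this dereference
}
```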
For example, in the code above we pass a default-constructed unique_ptr P to the function bar(). It is unknown whether P is reset with a valid inner pointer inside bar(). To avoid false positives, we do not produce any warning when P is dereferenced after the call to bar().
D83877: Enabled SmartPtrModeling to handle the swap method of unique_ptr. The swap() method exchanges ownership of the inner pointers between two unique_ptrs. As a result, a unique_ptr can become null by swapping with another unique_ptr that is null. With this patch, warnings are emitted when a unique_ptr is used after swapping with another unique_ptr whose inner pointer is null.
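Here is a minimal sketch of the swap pattern. The function name is illustrative, and the dereference the checker reports is commented out.

```cpp
#include <cassert>
#include <memory>

// Sketch: swapping with a null unique_ptr leaves the original pointer null.
bool nullAfterSwap() {
  std::unique_ptr<int> P(new int(7));
  std::unique_ptr<int> Empty;  // null inner pointer
  P.swap(Empty);               // P now holds null, Empty owns the int
  assert(Empty && *Empty == 7);
  // *P;                       // dereferencing P here is what D83877 reports
  return P == nullptr;
}
```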
D84600: Used NoteTags to add more detailed information about where the smart pointer becomes null along the bug path. Introduced a getNullDereferenceBugType() inter-checker API, so the notes can check whether the bug being reported is the one they are interesting for.
After adding NoteTags:
D86029: Modeled the get() method to return the tracked inner pointer. The get() method gives access to the inner pointer. The inner pointer may be used in a conditional branch or with other operations that constrain symbols. In that case, the constraints on the inner pointer tell us whether the corresponding unique_ptr is null. When the inner pointer value of a unique_ptr is available in the tracked map, we bind that value to the return value of get(). We also create a symbol with conjureSymbolVal when the inner pointer value of a tracked unique_ptr region is missing.
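The kind of code this modeling helps with might look like the following sketch. The function name is hypothetical. The branch on get() constrains the inner pointer. Through the tracked map, that constraint also tells the analyzer whether P is null.

```cpp
#include <cassert>
#include <memory>

// Sketch: a constraint on the value returned by get() says whether P is null.
int valueOrMinusOne(const std::unique_ptr<int> &P) {
  if (P.get() == nullptr) {
    // The inner pointer is constrained to null here, so with get() modeled
    // the analyzer can also treat P as null (and would report *P).
    return -1;
  }
  assert(P);  // on this path the inner pointer is constrained non-null
  return *P;  // no warning expected
}
```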
D86027: Modeled the explicit conversion of unique_ptr to bool. It is common practice to check whether a unique_ptr is null before accessing it. If the inner pointer value is already tracked and known, we can determine the corresponding boolean value, and the analyzer takes the branch based on it. If no symbol is tracked yet, we create one with SValBuilder::conjureSymbolVal and constrain it. This splits the exploded graph into one path that assumes a null value and one that assumes a non-null value.

We use checkLiveSymbols to keep the symbol alive as long as the owning unique_ptr is alive, so that the constraints on that symbol are not removed. When the unique_ptr goes out of scope, we make sure its symbols are cleaned up.
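Here is a sketch of such code. It assumes a hypothetical valueOrZero() that takes ownership of a raw pointer RP. The constraint from checking RP must still be available when P is converted to bool.

```cpp
#include <cassert>
#include <memory>

// Sketch: P takes ownership of RP, so RP's symbol is P's tracked inner pointer.
// The analyzer must keep that symbol alive while P is alive, because the
// constraint from `if (!RP)` is what decides the later branch on P.
int valueOrZero(int *RP) {
  std::unique_ptr<int> P(RP);
  if (!RP) {
    assert(!P);         // RP is null on this path, so P converts to false
    return P ? *P : 0;  // only the `0` branch is feasible here
  }
  return *P;            // RP is non-null on this path: no warning expected
}
```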
For example, in the code above we have to keep the symbol for RP alive, because it is tracked as the inner pointer value of unique_ptr P. The constraints on that symbol decide whether the true or the false branch is taken.
D86293: Modeled how a unique_ptr transfers ownership of its managed memory to another unique_ptr via the = operator. With the move assignment operator, a unique_ptr can be reset with another unique_ptr. The moved-from unique_ptr on the right-hand side loses ownership of its inner pointer and becomes null. It is also possible to assign nullptr to a unique_ptr, which resets it to null. The tracked values of both the LHS and the RHS unique_ptr of the operator are updated.
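Here is a minimal sketch of both cases in standard C++. The function name is illustrative.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Sketch of the move-assignment cases modeled in D86293.
bool moveAssignmentCases() {
  std::unique_ptr<int> A(new int(1));
  std::unique_ptr<int> B(new int(2));
  B = std::move(A);                 // B's old object is deleted; B now owns A's object
  assert(A == nullptr && *B == 1);  // RHS became null, LHS took ownership
  B = nullptr;                      // assigning nullptr deletes the object, B is null
  return A == nullptr && B == nullptr;
}
```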
D86373: Similar to the = operator, modeled how a unique_ptr transfers ownership of its managed memory to another unique_ptr via the move constructor. The moved-from unique_ptr's inner pointer is then tracked as null.
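Here is a matching sketch for the move constructor. The function name is illustrative, and the dereference of the moved-from pointer is commented out.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Sketch of the move-constructor case modeled in D86373.
bool moveConstructionNullsSource() {
  std::unique_ptr<int> Src(new int(3));
  std::unique_ptr<int> Dst(std::move(Src));  // Dst takes over Src's object
  assert(Dst && *Dst == 3);
  // *Src;                                   // Src is null now: a null dereference
  return Src == nullptr;
}
```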
The checker has been evaluated on a number of open-source projects that use smart pointers extensively (symengine, oatpp, zstd, simbody, duckdb, drogon, fmt, re2, cppcheck, faiss). Unfortunately (or fortunately), the checker did not produce any true positive warnings on these projects. However, we found 8 true positive warnings related to smart pointer null dereferences in the LLVM project and 5 in the WebKit project.
(A few example warnings are attached.)
Warnings in LLVM
Warnings in WebKit
So far we have covered the important methods of unique_ptr, but a few more methods and operators of unique_ptr remain to be covered. Remaining methods include std::swap. Remaining operators include operator->, and all the comparison operators.
Extend the modeling for std::optional. Right now SmartPtrModeling models only most of std::unique_ptr. Adding modeling for other smart pointers will make the checker complete.
To enable the checker by default, we have to use evalCall. Right now we manually implement name-matching logic that is already implemented in CallDescriptionMap. However, CallDescriptionMap does not support constructor and operator calls yet. The changes that add this support (D81059, D80503) by Charusso are in review.
The SmartPtrChecker is in the alpha.cplusplus package, and smart pointer modeling has to be enabled with the ModelSmartPtrDereference flag. Enabling the checker by default will benefit codebases that use smart pointers.
We use trackExpressionValue() to show in the bug report how the inner pointer of a unique_ptr became null. trackExpressionValue() also suppresses some warnings, to avoid false positives caused by inlined defensive checks.
For example, the warning for the code below is a true positive.
On the other hand, the warning for the code below is a false positive. We cannot infer that unique_ptr Q is null in bar() from the check if(P) in the call to foo(). So the warning should be suppressed.
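The sketch below is a hedged illustration of the general inlined-defensive-check pattern, not the exact example from the report. It assumes a hypothetical helper valueOr() that tolerates a null pointer, and a caller addValueTwice() that relies on its own non-null contract.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper with a defensive null check.
int valueOr(const std::unique_ptr<int> &P, int Default) {
  if (P)  // when inlined, this check splits the path into P null / non-null
    return *P;
  return Default;
}

// Hypothetical caller whose contract says Q is never null.
int addValueTwice(const std::unique_ptr<int> &Q) {
  int First = valueOr(Q, 0);
  // The "Q is null" path exists only because of the defensive check in
  // valueOr(); reporting *Q here would be a false positive.
  return First + *Q;
}

// Usage: the callee tolerates null, the caller is only given non-null pointers.
void demoDefensiveCheck() {
  std::unique_ptr<int> Empty;
  assert(valueOr(Empty, 7) == 7);
  std::unique_ptr<int> Four(new int(4));
  assert(addValueTwice(Four) == 8);
}
```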
Right now we trust trackExpressionValue() when it suppresses reports. It may occasionally suppress true positive warnings, but that is better than having false positives.
The code below is an example of a suppressed true positive warning.
We need to investigate more real-world projects to see whether trackExpressionValue() is sufficient for suppressing all false positive warnings related to inlined defensive checks.
Right now there is no API similar to markInteresting() for marking a region as not interesting in a bug report. With such support, our checker could remove less useful, unwanted notes from the report. For example, when a unique_ptr is dereferenced after release(), we don't need a note tag on the unique_ptr constructor unless it was constructed with null.
After: marking P not interesting
When raw pointers are obtained from release(), we have to ensure that they are tracked by MallocChecker. SmartPtrModeling should also communicate the deallocation to MallocChecker when it sees the destructor call of a unique_ptr that has the default deleter. Communicating with MallocChecker could also find double-free errors. These occur when the same pointer is passed to multiple unique_ptrs, or when it is also freed independently of the unique_ptr.
Many C++ projects have their own custom smart pointer implementations, similar to boost::shared_ptr or llvm::IntrusiveRefCntPtr. If users could specify their custom smart pointers and the methods on them, we could reuse the existing SmartPtrModeling to model and check those smart pointers.
All the changes are in master, but the checker and modeling are not enabled by default. The checker is in the alpha.cplusplus package, and smart pointer modeling has to be enabled with the ModelSmartPtrDereference flag.
The checker and modeling can be enabled explicitly:
Since SmartPtrChecker depends on SmartPtrModeling, we don't have to enable SmartPtrModeling explicitly.
I want to express my gratitude to everyone who helped me with this project, and especially to three individuals, my mentors: Artem Dergachev, Gábor Horváth, and Valeriy Savchenko. With their guidance, I learned a lot this summer about how the Clang Static Analyzer works. I had Skype calls with them every Monday and received all the help and suggestions I needed. When I got stuck, I got immediate help, even on weekends. I also received very fast feedback on my review requests. I am also thankful to Kristóf Umann for his tips and comments on the reviews.
Thank you very much for the support and mentoring.
A default constructed unique pointer has null value
Unique pointer constructed with null value
Unique pointer constructed with move constructor
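Here is a minimal sketch of the second appendix case, a unique_ptr constructed with a null value. The function name is illustrative, and the faulty dereference is commented out.

```cpp
#include <cassert>
#include <memory>

// Sketch: a unique_ptr explicitly constructed from nullptr owns nothing.
bool constructedWithNullIsNull() {
  std::unique_ptr<int> P(nullptr);
  assert(!P);
  // *P;  // null dereference, just like the default-constructed case
  return P.get() == nullptr;
}
```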