- My GSoC work on GitHub: Commits
  (If the link changes, please search for “[author:vrnithinkumar author-date:2020-06-01..2020-08-31]” in the repository of the LLVM project.)
- My GSoC work on Phabricator: Reviews
- GSoC Project Page Link
- Original report link
Abstract
The Clang Static Analyzer finds bugs in a program by analyzing its source code without compiling and executing it. It uses symbolic computations to find defects. The analyzer covers a variety of checks targeted at finding security and API usage bugs, dead code, null dereferences, divisions by zero, and other logic errors. It already has a checker for null pointer dereferences; however, that checker is not sufficient for higher-level abstractions such as C++ smart pointers or optionals. By explicitly teaching the analyzer the behavior of C++ standard classes, we can make it find more bugs in modern C++ code.
Goal
Enable the Clang Static Analyzer to find null smart pointer dereferences by teaching it the observable behavior of C++ smart pointer classes. Improve the analyzer’s ability to reason about the values of the standard smart pointers without having to dig deep into the complex implementation details. We should be able to find more null dereference bugs related to smart pointers while reducing the number of false positives. We should be able to cover at least one class fully, e.g. std::unique_ptr, and then extend the work to std::shared_ptr or std::optional if time permits.
Summary
Within the GSoC time period, we could not implement the modeling for all smart pointer methods and operators as planned, since the problem was more complicated than we all anticipated. So far the majority of the modeling for std::unique_ptr is implemented and committed, and we got some promising results with it: we found 8 true positive warnings related to smart pointer null dereferences in the LLVM project and 5 warnings in the WebKit project. Even though we were only able to support std::unique_ptr so far, we managed to build a base modeling for the smart pointer checkers. It will act as a solid foundation for developing checkers for any C++ objects that are passed by value. This is one of the first conscious attempts to do so, and we’ve gained a lot of experience and managed to maintain our integrity, in the sense that the code ended up mostly free of hacks. It could most likely be generalized to modeling the entire C++ standard library. This does not give us the high-level architecture that’ll be needed to deal with the scale of the standard library, but we got our low-level basics right. This work will also be used to add support for checking other smart pointers, std::shared_ptr and std::weak_ptr, as well as std::optional. It can also be used to build a checker for use-after-free errors.
Research
Smart Pointer
A smart pointer is an abstract data type that simulates a pointer with additional automatic memory management. It manages a dynamically allocated object and ensures the dynamically allocated object is properly cleaned up. Such features are intended to reduce bugs caused by the misuse of pointers while retaining efficiency. Smart pointers typically keep track of the memory they point to, and may also be used to manage other resources, such as network connections and file handles.
Unique Pointer
A unique_ptr is a smart pointer that owns and manages another object through a pointer and disposes of that object when the unique_ptr goes out of scope. It should be used to own and manage any dynamically allocated object when its ownership is not shared. A unique_ptr explicitly prevents copying of its contained pointer (as would happen with normal assignment), but the move assignment operator and move constructor can be used to transfer ownership of the contained pointer to another unique_ptr.
Example 1: Dereferencing a default-constructed unique pointer, which is null
Example 2: Dereferencing a unique pointer after calling release()
Similar to the above cases, other possible cases are dereferencing after calling std::move(), reset() or reset(nullptr), getting and explicitly deleting the inner pointer, or swapping with a null pointer using std::swap (more examples in the Appendix). All of the above cases will result in a crash.
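Examples 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the report's original listing: the type A and the function names are hypothetical, and the faulty dereferences are left as comments so that the snippet itself stays well-defined.

```cpp
#include <cassert>
#include <memory>

// Hypothetical pointee type used only for illustration.
struct A {
  int value = 42;
  int get() const { return value; }
};

// Example 1: a default-constructed unique_ptr holds a null inner pointer.
bool defaultConstructedIsNull() {
  std::unique_ptr<A> P;
  // P->get();  // null dereference: this is what the checker aims to report
  return P == nullptr;
}

// Example 2: release() hands the raw pointer to the caller and leaves the
// unique_ptr null.
bool nullAfterRelease() {
  std::unique_ptr<A> P(new A());
  A *Raw = P.release();
  // P->get();  // null dereference after release()
  bool IsNull = (P == nullptr);
  delete Raw;  // the caller now owns the object and must free it
  return IsNull;
}
```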
Design
The basic idea of the checker is to keep a record of the raw pointers wrapped inside smart pointers, using a map from each smart pointer to its corresponding inner pointer. The map is updated by enumerating all situations in which a smart pointer becomes null, as well as the situations in which it becomes non-null. For example, when a smart pointer is default constructed, it is tracked as having a null inner pointer. The checker then checks whether any tracked smart pointer that is dereferenced has a null value. To make the bug report clearer, additional notes are attached along the bug path to show where the smart pointer becomes null.
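The bookkeeping described above can be pictured with a small toy model. It is only a sketch under simplifying assumptions and does not use the analyzer's API: the class SmartPtrTracker, its method names, and the use of a string to stand in for a memory region are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical simplification: a string stands in for the smart pointer's
// memory region, and an enum stands in for the tracked inner pointer value.
enum class InnerPtr { Null, NonNull };

class SmartPtrTracker {
  std::map<std::string, InnerPtr> Tracked;

public:
  // A default-constructed smart pointer is tracked as null.
  void onDefaultConstruct(const std::string &Region) {
    Tracked[Region] = InnerPtr::Null;
  }
  // Construction from a raw pointer records whether that pointer is null.
  void onConstructFrom(const std::string &Region, bool IsNullPtr) {
    Tracked[Region] = IsNullPtr ? InnerPtr::Null : InnerPtr::NonNull;
  }
  // release() leaves the smart pointer null.
  void onRelease(const std::string &Region) {
    Tracked[Region] = InnerPtr::Null;
  }
  // When nothing is known any more, stop tracking instead of guessing.
  void onEscape(const std::string &Region) { Tracked.erase(Region); }

  // A dereference is reported only if the pointer is known to be null.
  bool isKnownNullOnDereference(const std::string &Region) const {
    auto It = Tracked.find(Region);
    return It != Tracked.end() && It->second == InnerPtr::Null;
  }
};
```

Dropping an entry when information is lost, rather than keeping a stale value, is what keeps such a scheme from reporting false positives.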
Alternative Solution Considered
Another possible solution considered was manipulating symbolic values inside the smart pointer. The limitation that we run into here is that our memory model (“RegionStore”) doesn’t currently allow setting a “default” binding to a whole object when it’s a part (say, a field) of a bigger object. This means that we have to understand how the smart pointer works internally (which field corresponds to what) to manipulate its symbolic value, which ties us to a specific implementation of the C++ standard library. This might still work for a unique pointer, which probably always has exactly one field, but a shared pointer has multiple fields, so the layout depends on the particular implementation of the C++ standard library. We decided not to go with this approach, since it is challenging and potentially a lot more work than the first approach.
Implementation
Initial Smart Pointer Modeling and Checker
D81315: Created a basic implementation.
- Made a separate checker class for emitting diagnostics. The new checker uses checkPreCall, and the bug reporting logic lives there.
- Kept all smart pointer related modeling logic in SmartPtrModeling. Shared common functionality via a header file shared between SmartPtrModeling and SmartPtrChecker.
- Made SmartPtrModeling a dependency of SmartPtrChecker.
- Introduced a GDM with MemRegion as key and SVal as value to track the smart pointer and the corresponding inner pointer.
- Also added support to model the unique_ptr constructor and the release and reset methods.
- Used evalCall to handle the modeling. As part of this, enabled constructor support in the evalCall event with D82256.
- Implemented checkDeadSymbols to clean up the MemRegion of smart pointers from the program state map when they go out of scope. This keeps the data structures in the program state as small as possible, so that they do not grow large while analyzing real code and eventually slow down the analysis.
With this patch, the model can emit warnings for cases like use after default construction, use after release, or use after reset with a null pointer. Kept SmartPtrChecker under the alpha.cplusplus package; smart pointer modeling has to be enabled by the ModelSmartPtrDereference flag.
checkRegionChanges for SmartPtrModeling
D83836: Implemented checkRegionChanges for SmartPtrModeling. To improve accuracy, the tracked region data is removed when a smart pointer is passed by non-const reference into a function, since it is not known what happens to the smart pointer inside the function.
For example, suppose we pass a default-constructed unique_ptr ‘P’ to a function bar(). It is unknown whether ‘P’ is reset with a valid inner pointer inside bar(). To avoid false positives, we do not produce any warning on a dereference of ‘P’ after bar().
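The situation described above might look like the following sketch. The type A, the body of bar(), and the function name caller() are assumptions made for illustration; in the case the checker cares about, the definition of bar() would not be visible to the analyzer.

```cpp
#include <cassert>
#include <memory>

// Hypothetical pointee type used only for illustration.
struct A {
  int foo() const { return 1; }
};

// Imagine this definition lives in another translation unit, so the analyzer
// cannot see whether P is reset with a valid pointer or left null.
void bar(std::unique_ptr<A> &P) { P.reset(new A()); }

int caller() {
  std::unique_ptr<A> P;  // tracked as null after default construction
  bar(P);                // passed by non-const reference: tracking is dropped
  return P->foo();       // no warning, since P may have been reset in bar()
}
```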
Modeling for unique_ptr::swap method
D83877: Enabled SmartPtrModeling to handle the swap method for unique_ptr. The swap() method can be used to exchange ownership of inner pointers between unique_ptrs, so it is possible to make a unique_ptr null by swapping it with another null unique_ptr. With this patch, warnings are emitted when a unique_ptr is used after swapping with another unique_ptr whose inner pointer is null.
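A minimal sketch of the swap case, with hypothetical names (A, nullAfterSwap) and the faulty dereference left as a comment:

```cpp
#include <cassert>
#include <memory>

// Hypothetical pointee type used only for illustration.
struct A {
  int foo() const { return 1; }
};

bool nullAfterSwap() {
  std::unique_ptr<A> P(new A());
  std::unique_ptr<A> PNull;  // default constructed, so null
  P.swap(PNull);             // P now holds null, PNull owns the object
  // P->foo();  // null dereference: P became null through the swap
  return P == nullptr && PNull != nullptr;
}
```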
NoteTag for better reporting
D84600: Using NoteTags, added more detailed information to the bug path on where the smart pointer becomes null. Introduced a getNullDereferenceBugType() inter-checker API to check if the bug type is interesting.
After adding NoteTags:
Modeling for unique_ptr::get()
D86029: Modeled the get() method to return the tracked inner pointer. The get() method is used to access the inner pointer. When the inner pointer is used in conditional branching or other symbol-constraining operations, we can use the constraints on the inner pointer to determine whether the corresponding unique_ptr is null or not. When the inner pointer value for a unique_ptr is available in the tracked map, we bind that value to the return value of the get() method. Also made changes to create a conjured symbol (conjureSymbolVal) when the inner pointer value is missing for a unique_ptr region we are tracking.
Example:
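A minimal sketch of such a case, assuming a hypothetical type A and function name; the dereference on the null path is left as a comment so the snippet stays well-defined:

```cpp
#include <cassert>
#include <memory>

// Hypothetical pointee type used only for illustration.
struct A {
  int foo() const { return 1; }
};

int useAfterGetCheck(std::unique_ptr<A> P) {
  A *Raw = P.get();  // the return value is bound to P's tracked inner pointer
  if (!Raw) {
    // On this path Raw is constrained to null, so P is known to be null too.
    // return P->foo();  // would be a null dereference
    return -1;
  }
  return P->foo();  // on this path the inner pointer is known to be non-null
}
```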
Modeling for unique_ptr bool conversion
D86027: Modeled the case where a unique_ptr is explicitly converted to bool. It is common practice to check whether a unique_ptr is null before accessing it. If the inner pointer value is already tracked and we know the value, we can figure out the corresponding boolean value, and the analyzer will take the branch based on that. When no symbol is tracked yet, we use SValBuilder::conjureSymbolVal to create one and constrain it to split the exploded graph, assuming null on one path and non-null on the other.
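The kind of code this modeling targets can be sketched as follows. The type A and the function name are hypothetical; the comments describe what the analyzer is expected to assume on each branch.

```cpp
#include <cassert>
#include <memory>

// Hypothetical pointee type used only for illustration.
struct A {
  int foo() const { return 1; }
};

int checkedUse(std::unique_ptr<A> P) {
  if (P) {
    // True branch: the inner pointer is assumed non-null, so this is safe.
    return P->foo();
  }
  // False branch: the inner pointer is assumed null.
  // return P->foo();  // would be a null dereference
  return -1;
}
```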
Adding support for checkLiveSymbols
D86027: Implemented checkLiveSymbols to make sure that we keep the symbol alive as long as the corresponding owner unique_ptr is alive, to avoid removing the constraints related to that symbol. Also, when the unique_ptr goes out of scope, we make sure the symbols are cleaned up.
For example, when a raw pointer RP is tracked as the inner pointer value of unique_ptr P, we have to keep the symbol for RP alive. We have to use the constraints on that symbol to decide whether a branch on P takes the true or false path.
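A sketch of the RP and P scenario, under the assumption that RP arrives as a function parameter; the type A and the function name useOwned are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Hypothetical pointee type used only for illustration.
struct A {
  int foo() const { return 1; }
};

int useOwned(A *RP) {
  std::unique_ptr<A> P(RP);  // the symbol for RP becomes P's inner pointer
  // RP is not mentioned again, but its symbol must stay alive while P is
  // alive, because the branch below is decided by constraints on that symbol.
  if (P)
    return P->foo();
  return -1;
}
```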
Modeling of move assignment operator (unique_ptr::operator=)
D86293: Modeled how a unique_ptr moves the ownership of its managed memory to another unique_ptr via the = operator. With the move assignment operator, a unique_ptr can be reset with another unique_ptr, whereas the unique_ptr being assigned from loses ownership of its inner pointer and becomes null. It is also possible to assign nullptr to a unique_ptr and reset it to null. Made changes to update the tracked values of both the LHS and RHS unique_ptr of the operator.
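Both forms of assignment can be sketched as below, with hypothetical names (A, nullAfterMoveAssignment). Reading a moved-from unique_ptr is well-defined: the standard guarantees it is null.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical pointee type used only for illustration.
struct A {
  int foo() const { return 1; }
};

bool nullAfterMoveAssignment() {
  std::unique_ptr<A> P(new A());
  std::unique_ptr<A> Q;
  Q = std::move(P);  // Q takes ownership; P (the RHS) becomes null
  // P->foo();  // null dereference: P lost its inner pointer
  bool PIsNull = (P == nullptr);
  bool QOwns = (Q != nullptr);
  Q = nullptr;  // assigning nullptr destroys the object and resets Q to null
  return PIsNull && QOwns && Q == nullptr;
}
```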
Modeling for unique_ptr move constructor
D86373: Similar to the = operator, modeled how a unique_ptr moves the ownership of its managed memory to another via the move constructor. The moved-from unique_ptr’s inner pointer value is then tracked as null.
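The move constructor case, as a minimal sketch with hypothetical names:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical pointee type used only for illustration.
struct A {
  int foo() const { return 1; }
};

bool nullAfterMoveConstruction() {
  std::unique_ptr<A> P(new A());
  std::unique_ptr<A> Q(std::move(P));  // Q takes ownership; P becomes null
  // P->foo();  // null dereference: P was moved from
  return P == nullptr && Q != nullptr;
}
```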
Evaluation
The checker has been evaluated on a number of open-source software projects that use smart pointers extensively (symengine, oatpp, zstd, simbody, duckdb, drogon, fmt, re2, cppcheck, faiss). Unfortunately (or fortunately), the checker did not produce any true positive warnings on these projects. But we found 8 true positive warnings related to smart pointer null dereferences in the LLVM project and 5 warnings in the WebKit project.
(A few example warnings are attached.)
Warnings in LLVM
Warning-1: clang/lib/Analysis/Consumed.cpp
Warning-2: clang/lib/Lex/Preprocessor.cpp
Warning-3: llvm/utils/TableGen/OptParserEmitter.cpp
Warnings in WebKit
Warning-1:
Warning-2:
Warning-3:
Warning-4:
Future Work
Model remaining methods of unique_ptr
So far we covered the important methods of unique_ptr, but a few more functions and operators remain. The remaining functions are std::make_unique, std::make_unique_for_overwrite, and std::swap. The remaining operators include operator*, operator->, and all the comparison operators.
Model other smart pointers
Extend the modeling to std::shared_ptr, std::weak_ptr and std::optional. Right now SmartPtrModeling only models most of std::unique_ptr; adding modeling for other smart pointers will make the checker complete.
CallDescriptionMap support for CXX Constructor and Operator
To enable the checker by default we have to use CallDescriptionMap for evalCall. Right now we are manually implementing name matching logic that has already been implemented in CallDescriptionMap. However, constructor and operator calls are not supported yet; the changes are in review (D81059, D80503) by Charusso.
Enabling the checker by default
The SmartPtrChecker is under the alpha.cplusplus package and smart pointer modeling has to be enabled by the ModelSmartPtrDereference flag. Enabling the checker by default will benefit codebases that use smart pointers.
Inlined defensive checks
We are using trackExpressionValue() to track how the inner pointer expression for a unique_ptr becomes null in the bug report. trackExpressionValue() suppresses some warnings to avoid false positives from inlined defensive checks.
For example the code below is a true positive warning.
On the other hand, the code below is a false positive warning. We cannot infer that unique_ptr Q is null in bar() based on the check if(P) in the function call foo(), so the warning should be suppressed.
Right now we are trusting trackExpressionValue() when it suppresses reports. It may occasionally suppress true positive warnings, but that is better than having false positives.
The code below is an example of a suppressed true positive warning.
We have to investigate more on real-world projects and see whether trackExpressionValue() is sufficient for suppressing all false positive warnings related to inlined defensive checks.
Marking regions as not interesting
Right now there is no API similar to markInteresting() for marking a region as not interesting in a bug report. With such support, our checker could remove less useful and unwanted notes from the report. For example, when a unique_ptr is dereferenced after release(), we don’t have to show a note tag on the unique_ptr constructor unless it is constructed with null.
Before:
After: marking P not interesting
Communication with MallocChecker
When raw pointers are accessed from a unique_ptr via get() or release(), we have to ensure that the raw pointers are tracked by MallocChecker. Also, SmartPtrModeling should communicate the deallocation to MallocChecker when we see the destructor call of a unique_ptr that has the default deleter. Communicating with MallocChecker could also potentially find double-free errors when the same pointer is passed to multiple unique_ptrs or is also freed independently of the unique_ptr (example).
Add modeling for user-defined custom smart pointers
Many C++ projects have their own custom implementations of smart pointers, similar to boost::shared_ptr or llvm::IntrusiveRefCntPtr. If the user can specify the custom smart pointers and the methods on them, we could reuse the existing SmartPtrModeling for modeling and checking the custom smart pointers.
How to use
All the changes are in master, but the checker and modeling are not enabled by default. The checker is under the alpha.cplusplus package and smart pointer modeling has to be enabled by the ModelSmartPtrDereference flag.
Checker and modeling can be enabled explicitly:
(Since SmartPtrChecker depends on SmartPtrModeling, we don’t have to explicitly enable SmartPtrModeling.)
Acknowledgment
I want to express my gratitude towards everyone that helped me with this project, but especially to 3 individuals: My mentors, Artem Dergachev, Gábor Horváth, and Valeriy Savchenko. With their guidance, I’ve learned a lot about how Clang Static Analyzer works during the summer. I got to skype with them every Monday and received all the help and suggestions. When I got stuck with issues I got immediate help even on the weekends. Also, I received very fast feedback for my review requests. I am also thankful to Kristóf Umann for tips and comments on the reviews.
Thank you very much for the support and mentoring.
Appendix
Potential bugs with unique_ptr
A default constructed unique pointer has null value
Unique pointer constructed with null value
Unique pointer constructed with move constructor
release
reset
swap
get
operator bool
Double-free error example