|
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
Reviewed-by: Malfurious <m@lfurio.us>
|
|
Originally I was deciding whether to get a reloc based on the type. I'm
not sure what SET_64 vs ADD_64 means, but the SET* types seemed to be
the only symbols we care about. After running into a binary where a
SET* symbol didn't have a name (and crashed sploit), I have decided to
filter on whether the reloc has a name instead.
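
A name-based filter could look roughly like the sketch below. The record
layout (dicts with an optional "name" key plus "type" and "vaddr") and the
helper name named_relocs are assumptions made for illustration, not
sploit's actual code or r2's exact output.

```python
import json

# Hypothetical reloc records; the field names are assumed for this sketch.
RELOCS_JSON = '''[
  {"name": "puts", "type": "SET_64", "vaddr": 16408},
  {"type": "SET_64", "vaddr": 16416},
  {"name": "stdout", "type": "ADD_64", "vaddr": 16424}
]'''

def named_relocs(raw):
    """Return {name: vaddr} for every reloc that carries a non-empty name."""
    return {r["name"]: r["vaddr"] for r in json.loads(raw) if r.get("name")}

relocs = named_relocs(RELOCS_JSON)
```

The nameless SET_64 entry is skipped instead of raising a KeyError, and
the named ADD_64 entry is kept, since type no longer matters.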
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Grabbing the json and returning that dict directly avoids all of the
processing we were doing before. I also added in a small, temporary
band-aid for PE files until we add actual support for them. The 'relro'
key doesn't exist on PE files, so just default it to '' in ELF.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
This addresses a couple issues with get_elf_symbols().
First of all, we can greatly simplify our processing of the r2 output by
getting back json instead of trying to do string processing on their
pretty-printed tables. This resolves a number of issues we were running
into and also makes the code way more maintainable.
Second, we have reevaluated what we actually want to get out of r2. We
now grab section offsets, all FUNC, OBJ, and NOTYPE symbols, and all
strings. The strings and section offsets no longer try to escape
special characters and sometimes aren't accessible through normal object
attributes, but now that we have dictionary subscripting, this isn't an
issue.
Lastly, a few subsets of the symbols are separated into their own tables
and added to the main table as subtables. Sections are located at
sym.sect and offset at 0. Imported symbols are located at sym.imp and are
offset at sect['.plt']. Relocations are located at sym.rel and are offset at
sect['.got']. Strings are located at sym.str and are offset at
sect['.rodata'].
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
When iterating over a symtbl, the returned tuples should be sorted by
offset.
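
As a minimal sketch of that ordering guarantee, assuming a toy table
class (MiniTable is a made-up stand-in, not the real Symtbl):

```python
class MiniTable:
    """Toy stand-in for a symbol table: maps names to integer offsets."""

    def __init__(self, **symbols):
        self._symbols = dict(symbols)

    def __iter__(self):
        # Yield (name, offset) tuples ordered by offset, not insertion order.
        return iter(sorted(self._symbols.items(), key=lambda item: item[1]))

tbl = MiniTable(main=0x1139, _start=0x1040, puts=0x1030)
pairs = list(tbl)
```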
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Adds a ROP-enabled payload builder under the builder namespace. Much of
the behavior is parameterized by the active arch, so several new columns
are added to the Arch class.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
This dataclass is intended to be used directly with the new ROP builder
class. GadHints allow users to teach the library about gadgets it can
not find on its own and how to use them correctly.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
To determine the address of the end of a payload, based on its Symtbl
data. I believe it makes the most sense to make this a part of the
Payload API, since Symtbl lacks a concept of element size.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
This is a package to contain the related Payload and ROP modules, as
well as utility classes. Payload is moved into the new package.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
ROP gadgets returned through search from the r2 API will now always
contain a file-relative offset, even if they come from a non-pic binary
using a fixed baddr.
However, gadgets returned through the ELF API will be mapped according
to the ELF's Symtbl. This ensures the correct offset is returned
following a library leak, and allows the user to always safely insert an
ELF-returned gadget into that ELF's Symtbl without issue.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
This fixes a bug with Symtbl's __getitem__. An object that is
convertible to int should also cause __getitem__ to behave as though an
int was given, and translate the object as a foreign offset.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
On ELF construction, call r2.get_bin_info() and keep the results under
the pseudo-namespaces .info and .security. Also add a tabulated
pretty-print for these, and rewrite the ELF pretty-print to summarize
rather than print out the entirety of .sym. Lastly, fix a small bug
where ELF could crash on construction if ldd fails (loading a
non-native ELF, for instance).
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Code reuse since we were using r2 iI in get_elf_symbols to get the
baddr. This can cause get_bin_info to be called (and log that it's
being called) multiple times, so I'm also adding the @cache annotation.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Call r2's iI command and return a subset of the fields that we care
about.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
You can now look up a predefined Arch based on a tuple of arch_string
(returned by r2 iI), wordsize, and endianness.
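
One way such a lookup might be structured is a dict keyed by the tuple.
The field names, the sample records, and the function name lookup_arch
are assumptions for this sketch; the real Arch class has more columns.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arch:
    arch_string: str
    wordsize: int
    endianness: str

# Hypothetical predefined records for illustration only.
x86 = Arch("x86", 4, "little")
x86_64 = Arch("x86", 8, "little")
ARM = Arch("arm", 4, "little")

_ARCHES = {(a.arch_string, a.wordsize, a.endianness): a
           for a in (x86, x86_64, ARM)}

def lookup_arch(arch_string, wordsize, endianness):
    """Return the predefined Arch matching the tuple, or None if unknown."""
    return _ARCHES.get((arch_string, wordsize, endianness))
```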
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Also added a DEFAULT_ARCH constant.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Also check type when setting arch.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Sets the value of rop.len = 10 in r2, to give the search function more
data to sift through. This is a doubling from the default value (5).
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Development on the rop chain builder has produced this upgrade to our
gadget search facility. The primary advantages in this version are
increased flexibility and runtime performance.
It is now easier to find specific 'stray' instructions (not immediately
followed by a ret) since we search from every position in the data
returned by r2. If you _do_ want a ret, just specify it in your input
regexes. For this reason, a dedicated function for locating a simple
'ret' gadget is no longer present - elf.gadget("ret") is the equivalent.
A major change in this version is that we now obtain and operate on r2's
JSON representation of the gadget data. We now only reach out to r2
once to get all information for a binary (which is cached) and the
actual 'search' is implemented in Python. This provides a significant
performance speedup in cases where we need many gadgets from one binary,
as r2 doesn't need to inspect the entire file each time. Additional
caching is done on specific search results, so that 100% redundant
searches are returned immediately. Access to the raw JSON data is made
available through a new function rop_json(), but is not exposed in the
ELF interface, since it seems like a niche need.
Search results are returned via Gadget objects (or a list thereof),
which contain regular expression Match objects for each assembly
instruction found in the gadget. This allows the caller to retrieve the
values contained in regular expression capture groups if present.
Also, anecdotally, the search functionality in r2 has seemed to return
false negatives for some queries in the past, whereas I haven't noticed
similar cases with this implementation yet.
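
The search-from-every-position idea can be sketched as follows. The data
layout here is deliberately simplified and hypothetical (tuples of
(offset, asm) pairs rather than r2's real JSON), and search() returns
plain tuples instead of Gadget objects.

```python
import re
from functools import lru_cache

# Hypothetical, simplified gadget data: each entry is a run of
# (offset, asm) pairs as a disassembler might report them.
ROP_DATA = (
    ((0x1000, "pop rdi"), (0x1001, "ret")),
    ((0x2000, "mov rax, rbx"), (0x2003, "pop rsi"),
     (0x2004, "pop r15"), (0x2006, "ret")),
)

@lru_cache(maxsize=None)  # fully redundant searches return immediately
def search(*patterns):
    """Return ((offset, [Match, ...]), ...) for each run matching patterns."""
    if not patterns:
        return ()
    regexes = [re.compile(p) for p in patterns]
    found = {}
    for instructions in ROP_DATA:
        # Try every start position, so a gadget need not end in a ret.
        for start in range(len(instructions) - len(regexes) + 1):
            window = instructions[start:start + len(regexes)]
            matches = [r.fullmatch(asm)
                       for r, (_, asm) in zip(regexes, window)]
            if all(matches):
                found.setdefault(window[0][0], matches)
    return tuple(sorted(found.items(), key=lambda kv: kv[0]))

gadgets = search(r"pop (?P<reg>\w+)", r"ret")
stray = search(r"pop rsi")
```

Because each instruction keeps its Match object, a caller can read
capture groups such as gadgets[0][1][0].group("reg") to learn which
register a pop gadget loads.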
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
This new class is intended to be used to return data from gadget
searches, and is able to be nested within object Symtbls.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Can now use Symtbl subscript syntax to obtain the mapped address of a
foreign offset (not a defined symbol) without having to modify the
object or add a new symbol entry.
Assuming a base value of 10, tbl[15] will return 25, for example.
We now assert that the defined table keys are strings, to prevent the
creation of entries that are now un-readable by this patch. However,
this always should have been the case.
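
A minimal sketch of the subscript behavior described above, using a toy
class (MiniTable is hypothetical, and the use of int() for non-string
keys is an assumption of this sketch):

```python
class MiniTable:
    """Toy table: string keys are symbols, anything else is an offset."""

    def __init__(self, base, symbols):
        # Keys must be strings so they never collide with int subscripts.
        assert all(isinstance(k, str) for k in symbols)
        self.base = base
        self._symbols = dict(symbols)

    def __getitem__(self, key):
        if isinstance(key, str):
            return self.base + self._symbols[key]
        # Foreign offset: map it through the base without storing anything.
        return self.base + int(key)

tbl = MiniTable(10, {"foo": 5})
```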
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Sometimes we might be working on an object that can be treated as an
int, but Python won't coerce the type automatically. For example, a
nested symtbl passed in here is expected to resolve to its base offset
via int conversion.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Previously, due to precedence rules, the text produced for any library
whose corresponding ELF object has already been initialized would simply
be `str(lib.path)`, instead of the intended formatted string.
Also fixes a typo.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
QoL change - Don't print the headings if the table is empty. Just
report "0 symbols" and the base address.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
When printing a human readable Symtbl, show all nested objects within
[brackets], not just Symtbl itself. Primarily useful since more types
are being developed with the intent of being stored in a Symtbl.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Define human-readable string formatting for objects in repr, rather than
str, as this will enable an interactive interpreter to more conveniently
show this data to the user. I believe this especially makes sense in
cases where __str__ doesn't perform a semantic type conversion for its
class (currently, all affected cases).
Scripts can still easily yield this information by using
`print(object)`, as print will fall back to repr(object) when there is
not an explicitly defined __str__.
Furthermore, this patch still maintains backwards compatibility (for the
time being) of using str(object) to retrieve the information. This is
because the default __str__ implementation will defer to __repr__ if
provided. This made the Symtbl case of providing both of them
especially redundant.
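
The fallback can be seen with any class that defines only __repr__. The
Record class below is a made-up example, not one of the affected types:

```python
class Record:
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        # Only __repr__ is defined; the inherited object.__str__ defers to it.
        return f"{self.name} = {self.value:#x}"

rec = Record("base", 0x400000)
as_repr = repr(rec)
as_str = str(rec)
```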
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
Reviewed-by: Malfurious <m@lfurio.us>
|
|
The built in int's to_bytes and from_bytes functions have some weird
behavior with the signed parameter. Rather than expecting the user to
properly give btoi/itob the right signed value to pass through to
to_bytes/from_bytes, it makes more sense to just always convert an
unsigned number. Using the new int conversions, this can always be
unambiguous with respect to the width of the int.
There may also be situations where a user would like to truncate/sign
extend an int to a certain length other than the configured architecture
wordsize or convert to a different endianness. These are now
parameterized. There is no need to parameterize the width for btoi
because you will now always get an unsigned int back (and because of
python, the width is ambiguous). The user can convert it to whatever
width/sign they want after the fact with the new int conversion methods.
This also means that payload's int() does not need to take a signed
argument either. Whatever sign of int you give it, when it calls itob,
it will get the correct bytearray at the width of the configured
architecture's wordsize.
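
A rough sketch of the always-unsigned approach follows. The parameter
names and the fixed defaults (8 bytes, little endian) are assumptions
here; in sploit the defaults come from the configured architecture.

```python
def itob(i, width=8, byteorder="little"):
    """Encode i as exactly `width` bytes; negatives wrap to two's complement."""
    unsigned = i & ((1 << (8 * width)) - 1)  # truncate, reinterpret unsigned
    return unsigned.to_bytes(width, byteorder, signed=False)

def btoi(b, byteorder="little"):
    """Decode bytes as an unsigned int; callers reinterpret the sign later."""
    return int.from_bytes(b, byteorder, signed=False)

minus_one = itob(-1, 4)
truncated = itob(0x1122334455, 4, "big")
round_trip = btoi(itob(-2, 8))
```

Masking first means to_bytes never sees a negative or oversized value, so
the signed parameter stops mattering to the caller.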
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
Reviewed-by: Malfurious <m@lfurio.us>
|
|
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
Reviewed-by: Malfurious <m@lfurio.us>
|
|
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
Reviewed-by: Malfurious <m@lfurio.us>
|
|
Python's dataclass annotation gives us a nice way to cleanly and
concisely define our list of supported architectures similar to namedtuple.
Unlike namedtuple, though, dataclass gives us an actual class that is
significantly more feature rich and even allows us to add functionality.
In general, these are meant to be like const records of info about an
architecture, so we use frozen=True to enforce some const correctness.
There were some issues when involving other classes for the ActiveArch
feature (subclassing and composition both had their respective issues),
so I'm removing __ActiveArch__ and putting a set() method directly on
Arch. This method will copy a given Arch into the self object. This
technically breaks const correctness as this does modify the object, but
it is intended to only be used on a single sentinel Arch that represents
the active arch. This arch is initialized with x86_64 by default.
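
The set() approach on a frozen dataclass might look like the sketch
below. The two fields shown are a reduced, assumed subset, and using
object.__setattr__ to bypass the frozen guard is one possible technique,
not necessarily the one sploit uses.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Arch:
    wordsize: int
    endianness: str

    def set(self, other):
        """Copy every field of `other` into self, bypassing the frozen guard."""
        if not isinstance(other, Arch):
            raise TypeError("expected an Arch")
        for f in fields(self):
            object.__setattr__(self, f.name, getattr(other, f.name))

x86 = Arch(4, "little")
x86_64 = Arch(8, "little")

# Single sentinel representing the active arch, x86_64 by default.
arch = Arch(x86_64.wordsize, x86_64.endianness)
arch.set(x86)
```

Ordinary assignment such as arch.wordsize = 8 still raises, so the
predefined records stay effectively const.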
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
Reviewed-by: Malfurious <m@lfurio.us>
|
|
A read of 0 isn't particularly useful, but it is weird that it will
cause a BrokenPipeError. Instead, it makes more sense to just return an
empty string.
A read of <0 would normally read until EOF, but we already have that
feature in readall() and it wouldn't be particularly useful here. A
similar functionality of reading the entire current contents of the
buffer is useful, though. This is already implemented in
readall_nonblock() and this would be a nice user-facing way of calling
that.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
This effort was triggered by three immediate wants of the module:
- An improved data container interface to support things like key
  iteration and better key management. This is primarily wanted by
  the ROP module (which is still in development).
- The introduction of package documentation across the project. This
  module is now fully documented.
- To fix a bug in the Symtbl constructor, which would not allow a
  caller to supply "self" as an initial symbol name, even though it is
  legal in every other context. This problem was caused by the
  constructor's bound instance parameter sharing this name.
This patch addresses all of these concerns, and also introduces some
fringe / QoL improvements that were discovered during the API refactor.
Element access may now be done via subscripting, as well as the previous
(and still generally preferred) .attribute notation. The syntax for
storing subtables within a parent Symtbl is now greatly streamlined due
to some implementation-level changes to the class. You may now directly
assign just a Symtbl object or a normal int, and you don't have to fuss
with tuples anymore. The subtable's base is taken as its offset in the
parent, and the new operator replacement for the .map() method may be
used to define a desired value for the parent.
This detail is actually a breaking change compared to the previous
version. While not technically a bug, it is unintuitive that the
previous version would not remove subtables when their offset was
changed by a simple assignment - the table would just move. This patch
makes it such that any symbol assignment to a regular int will replace an
old mounted subtable if one exists.
There are now no normal instance methods on the Symtbl type (only dunder
method overrides). This is to free up the available symbol namespace as
much as possible. The previous methods map(), adjust(), and rebase()
are now implemented as operators which, in every case, yield a new
derivative object, rather than mutating the original. All operators are
listed here:
@ remap to absolute address
+ remap to relative address
- remap to negated relative address
>> adjust all symbol offsets upward
<< adjust all symbol offsets downward
% rebase all symbol offsets around an absolute zero point
Additionally, Symtbl objects will convert to an integer via int(),
hex(), oct(), or bin(), yielding the base value.
The addition of these operators presents another breaking change to the
previous version. Previously, symbol adjustments or rebases affected
the tracked offsets and caused symbols to shift around in linked tables
as well. Since these operators now preserve the state of their source
object, this is no longer the case. The amount of shift due to
adjustment or rebasing is localized in a specific Symtbl instance (and
is affected by the use of the related operators); however, this value is
inherited by derivatives of that object.
There is a third breaking change caused by the use of operators as well.
Previously, the map() function allowed the caller to specify that the
given absolute address is not that of the table base, but of some offset
in the table, from which the new base is calculated. However, the
remapping operators take only a single numeric value as their right hand
side operand, which is the absolute or relative address. The new
intended way of accomplishing this (which is _nearly_ equivalent) is
through the combined use of the rebase and remap operations:
# The address of the puts() function in a libc tbl is leaked
sym = sym % sym.puts @ leak
aka: adjust offsets such that the known point is at the base, then move
that base to the known location. The way in which this is different to
what you would end up with before is that previously, following a
map(abs, off) the base of the table would be accurately valued
according to the known information. Now, the 'base' is considered to be
the leaked value, but internal offsets are shifted such that they still
resolve correctly.
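
The rebase-then-remap idiom can be modeled with a toy class in which an
address is base + offset + adjust. MiniTbl, its fields, and the libc
offsets below are all hypothetical; only a subset of the operators is
sketched, and the real Symtbl internals may differ.

```python
class MiniTbl:
    """Toy model: address = base + offset + adjust. Not the real Symtbl."""

    def __init__(self, base, symbols, adjust=0):
        self.base, self.symbols, self.adjust = base, dict(symbols), adjust

    def __getitem__(self, name):
        return self.base + self.symbols[name] + self.adjust

    def __int__(self):
        return self.base

    def __matmul__(self, absolute):  # tbl @ addr: remap base to an address
        return MiniTbl(int(absolute), self.symbols, self.adjust)

    def __rshift__(self, amount):    # tbl >> n: shift every offset upward
        return MiniTbl(self.base, self.symbols, self.adjust + int(amount))

    def __lshift__(self, amount):    # tbl << n: shift every offset downward
        return self >> -int(amount)

    def __mod__(self, absolute):     # tbl % addr: make addr the zero point
        return self << (int(absolute) - self.base)

libc = MiniTbl(0, {"puts": 0x80e50, "system": 0x50d70})
leak = 0x7f0000080e50                # pretend this is the leaked puts()
mapped = libc % libc["puts"] @ leak  # % and @ share precedence, left to right
```

Every operator returns a new object, so libc itself still resolves puts
to 0x80e50, while mapped treats the leak as its base and shifts offsets
so that other symbols still resolve correctly.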
Finally, a few new pieces of functionality are added to build out the
container API:
- symbol key deletion
- iteration over symbol:offset pairs
- can now check for symbol existence with the "in" keyword
- len(symtbl) returns the number of symbols defined
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
I assume that the preferred style is to leave one major class each to a
file. In this case, synchronize the names of the Symtbl class and its
containing module. Per PEP8, the module is lowercase, and the class
remains Pascal case.
If other memory-oriented utilities are introduced in the future, we may
wish to move them, as well as Symtbl, back into a subpackage named
'mem'.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Print the current version (sourced from git describe) when sploit
starts up.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
Reviewed-by: Malfurious <m@lfurio.us>
|
|
Instead of hard-coding the version into the pyproject.toml, we can
dynamically source it at build time. Ideally, we want to use git
describe as a single authoritative source for the version. The version is
stored in sploit.__version__ and can be consumed during sploit runtime
or during a build/package to populate the project's core metadata
version in the toml file.
hatchling provides a tool.hatch.version plugin that can read out the
variable during a build/package. Because this variable is populated
from a git command, if the source tree isn't in a git repo, it will
fail. In this case, sploit will report a PEP 440 compliant fake version
"0+unknown.version" to let the user know.
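
The fallback logic could be sketched like this. The function name
describe_version and the particular git describe flags are assumptions,
and the sketch does not normalize git's output into a PEP 440 string.

```python
import subprocess

FALLBACK_VERSION = "0+unknown.version"

def describe_version(repo_dir):
    """Return `git describe` output for repo_dir, or the fallback on failure."""
    try:
        result = subprocess.run(
            ["git", "describe", "--always", "--tags"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, the directory is missing, or it is not a repo.
        return FALLBACK_VERSION
    return result.stdout.strip() or FALLBACK_VERSION

version = describe_version("/nonexistent/sploit-demo-dir")
```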
Because a packaged distribution doesn't exist in a git repo, we want to
bake in the version at build time into the package. hatchling provides
a plugin to help with this, but it had some technical limitations that
didn't quite work for our use case. Instead, I added a custom build
hook which will take the version sourced from the package (and by proxy
the git command), and overwrite the __init__.py with a hard-coded
version in the __version__ variable. This means that built/packaged
distributions of this project will have a fixed version hard-coded in
rather than dynamically sourcing from git.
The build hook operates just before the build executes. It seems that
most build/packager front-ends (e.g. build, pip) will just run it in the
current source tree rather than making a temp copy. This means that
when we modify the __init__.py, it is modifying our git tree. Ideally,
we want this to be restored at the end of the build. The build hook
interface allows us to write a hook that happens after the build, but it
won't run in the case of a crash or failed build. Instead, I added a
custom solution to this using a member variable's destructor. If the
build ends in any way, the original contents of __init__.py are written
back out.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
Reviewed-by: Malfurious <m@lfurio.us>
|
|
Currently, the standard way to build and package a Python project is
through a pyproject.toml file rather than the old setup.py. This is
also build back-end agnostic and we can choose to use something other
than setuptools. After looking through a few options, I've decided to
use hatchling.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
Reviewed-by: Malfurious <m@lfurio.us>
|
|
In interact(), we set stdin to be nonblocking for the duration of the
function. As an unexpected side-effect, this was setting stdout to be
nonblocking as well. This has caused at least one crash in the past.
Localizing the nonblock to just when we're reading from stdin should
solve this.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
We had originally decided to use the os.read() function instead of the
actual buffered file object's read function. This was due to the
blocking behavior of os.read() being closer to POSIX read than the other
function.
As it turns out, os.read() is an unbuffered read. Every other read call
in this interface is buffered. This causes some undefined behavior in
certain cases and leads to some really confusing bugs.
After some discussion, we've decided that, in this application's domain,
the blocking behavior of the buffered file object's read is actually
often more useful anyways. Changing this call will deal with both
issues.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
This behavior was accidentally removed in dcba5f2.
interact mode works by polling for IO events, but it will miss any
unread data already in the buffer when it is first entered. We can
ensure this gets caught by just doing a read once at the beginning.
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
Line-oriented reads now strip the newline from the end of their returned
string. Additionally, readall() strips the newline, but only from the
string that gets logged to the user's terminal (goodbye to all the "\n"
printed at the end of each line).
Of course, these functions are called by other parts of the read API and
have downstream effects. Consideration was given to the entire API with
these rules in mind:
- Raw reads (or non-line-oriented reads) will not filter ANY of
  their read content. They are logged to the screen as one "line"
  of log text with \n characters shown in-place (not actually
  resetting the terminal cursor). If reading binary, these bytes
  don't actually mean line termination anyway.
  functions: read, readall(_nonblock) *, readuntil
- Line-oriented reads will strip the terminating \n, log the single
  line to the screen, and return it.
  functions: readline, readlineuntil **
* readall(_nonblock) functions turn out to be a special case. They will
operate as raw reads, returning a blob of content. However, we
generally want to run them on line-oriented input, so they log according
to the line-oriented rules.
** Although content returned from readlineuntil will have \n's stripped,
the lines are returned in an array, so we can still distinguish them.
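
The return-value side of these rules can be sketched against an
in-memory stream. The signatures below (a stream argument and a
predicate) are assumptions for illustration and do not mirror sploit's
actual API; logging is left out.

```python
import io

def _strip_newline(raw):
    """Drop a single trailing newline byte, if present."""
    return raw[:-1] if raw.endswith(b"\n") else raw

def readline(stream):
    """Line-oriented read: return one line without its terminating newline."""
    return _strip_newline(stream.readline())

def readlineuntil(stream, pred):
    """Collect stripped lines until pred(line) is true or EOF is reached."""
    lines = []
    while True:
        raw = stream.readline()
        if not raw:  # EOF
            return lines
        line = _strip_newline(raw)
        lines.append(line)  # the list keeps the lines distinguishable
        if pred(line):
            return lines

stream = io.BytesIO(b"hello\nline one\nmenu:\n> rest\n")
first = readline(stream)
lines = readlineuntil(stream, lambda l: l.endswith(b":"))
raw_rest = stream.read()  # raw read: content is returned unfiltered
```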
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|
|
The writeline function will now default to send an empty line when
called without an argument. I don't believe any such default makes
sense for the plain write function, as writing nothing should have no
effect.
Signed-off-by: Malfurious <m@lfurio.us>
Signed-off-by: dusoleil <howcansocksbereal@gmail.com>
|