Rebase pyelftools and reapply S2E-specific commits as needed #3

Draft
wants to merge 155 commits into
base: master

michaelbrownuc

No description provided.

frederiksdun and others added 30 commits July 16, 2018 06:22
* relocation: handle ARM binaries

* relocation: handle R_ARM_ABS32 for ARM machines

* testfiles: add reloc_arm_gcc.o.elf

Generated on Ubuntu 14.04 using: arm-linux-gnueabi-gcc-4.7 -c -g -o reloc_arm_gcc.o.elf hello.c

* testfiles: add reloc_armhf_gcc.o.elf

Generated on Ubuntu 14.04 using: arm-linux-gnueabihf-gcc-4.7 -c -g -o reloc_armhf_gcc.o.elf hello.c

* readelf: print soft-float abi for ARM if EF_ARM_ABI_FLOAT_SOFT in flags

* readelf: print hard-float abi for ARM if EF_ARM_ABI_FLOAT_HARD in flags

* readelf: print BE8 info for armeb binaries

* testfiles: add simple_armhf_gcc.o.elf

    Generated on Ubuntu 14.04 using: arm-linux-gnueabihf-gcc-4.7  -g -o simple_armhf_gcc.o.elf hello.c

* elf: remove unwind from dicts and set ARM_EXIDX description

* testfiles: add  reloc_armsf_gcc.o.elf as soft float testcase taken from binutils 2.30

* testfiles: add reloc_armeb_gcc.o.elf as arm big endian testcase taken from binutils 2.30 testcase arm-be8

* readelf: print endian info LE8 if flag was set in header flags
* Add support for 'R_ARM_CALL' relocation type

* Add test script and test files to verify support for 'R_ARM_CALL'

Signed-off-by: Koltunov Dmitry <[email protected]>
* Provide enums for DT_FLAGS and DT_FLAGS_1

This change adds two enums with the name to value mappings
for the two flags fields in the dynamic section. The values
and corresponding names are taken from the elf/elf.h file
in the most recent glibc version.

The enums are also used to print the names instead of the
raw hex values for DT_FLAGS and DT_FLAGS_1 in
scripts/readelf.py.

Fixes: eliben#189

* Add test file for DT_FLAGS/DT_FLAGS_1 parsing

The test file has the DF_BIND_NOW and DF_ORIGIN flags set
in DT_FLAGS as well as DF_1_NOW, DF_1_GLOBAL, DF_1_NOOPEN
and DF_1_ORIGIN flags in DT_FLAGS_1.

This is the source code for the dt_flags.elf file:

  #include <stdio.h>

  int function(const char *arg){
      printf("Hello, %s!", arg);
      return 0;
  }

and was compiled using the following command line:

$ gcc -shared -m32 \
  -Wl,-rpath,'$ORIGIN/lib',-z,global,-z,origin,-z,nodlopen,-z,now \
  -o testfiles_for_readelf/dt_flags.elf dt_flags.c
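
As a rough illustration of what a name-to-value mapping for these flag fields makes possible, here is a minimal sketch that turns a raw bitmask into flag names. The dictionaries list only a subset of the constants from glibc's elf/elf.h, and the helper name `flag_names` is hypothetical; it is not the enum or function added by the commit.

```python
# Subset of the DT_FLAGS / DT_FLAGS_1 bit values from glibc's elf/elf.h.
# The dictionary and function names here are illustrative assumptions.
DT_FLAGS = {
    "DF_ORIGIN": 0x1,
    "DF_SYMBOLIC": 0x2,
    "DF_TEXTREL": 0x4,
    "DF_BIND_NOW": 0x8,
    "DF_STATIC_TLS": 0x10,
}

DT_FLAGS_1 = {
    "DF_1_NOW": 0x1,
    "DF_1_GLOBAL": 0x2,
    "DF_1_GROUP": 0x4,
    "DF_1_NODELETE": 0x8,
    "DF_1_LOADFLTR": 0x10,
    "DF_1_INITFIRST": 0x20,
    "DF_1_NOOPEN": 0x40,
    "DF_1_ORIGIN": 0x80,
}


def flag_names(value, enum):
    """Return the names of all bits from `enum` that are set in `value`."""
    return [name for name, bit in enum.items() if value & bit]


# The flag combination described for dt_flags.elf above.
print(flag_names(0x9, DT_FLAGS))     # DF_ORIGIN and DF_BIND_NOW
print(flag_names(0xC3, DT_FLAGS_1))  # DF_1_NOW, DF_1_GLOBAL, DF_1_NOOPEN, DF_1_ORIGIN
```
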
The __init__ function of ARMAttribute has two parameters
structs and stream through which the caller can pass in the
relevant objects (ARMAttributesSubsubsection does that after
seeking to the right position in stream).

The accesses for TAG_SECTION and TAG_SYMBOL, however, were
referring to non-existing members instead of the parameters.

Additionally, one assertion tries to access an undefined
'null_byte' variable which should be 'nul' instead.
The stream position in the .debug_info stream can't change when
reading from the .debug_abbrev stream.
…eliben#206)

* Implemented ELFFile.get_machine_arch for the remaining architectures.

Added all architectures according to the ENUM_E_MACHINE.

* Refactored if statement into dict.get.
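
The dict.get refactoring mentioned above follows a common pattern, sketched below. The table `ARCH_NAMES`, the function `machine_arch`, and the fallback string are assumptions for illustration; the actual mapping in ELFFile.get_machine_arch covers every entry of ENUM_E_MACHINE and may use different strings.

```python
# Hypothetical, heavily shortened lookup table; the real one covers
# all machines listed in ENUM_E_MACHINE.
ARCH_NAMES = {
    "EM_386": "x86",
    "EM_X86_64": "x64",
    "EM_ARM": "ARM",
    "EM_AARCH64": "AArch64",
    "EM_MIPS": "MIPS",
}


def machine_arch(e_machine):
    """Map an e_machine enum name to a readable architecture name.

    dict.get replaces a long if/elif chain and supplies the fallback
    value for unknown machines in one place.
    """
    return ARCH_NAMES.get(e_machine, "<unknown>")


print(machine_arch("EM_ARM"))
print(machine_arch("EM_DOES_NOT_EXIST"))
```
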
The code that is intended to coalesce null DIEs into the DIE that
precedes them does not do that and is actually not needed as the
'unflattening' procedure takes care of any unexpected null DIEs.

Also added a unit test for verifying the DIE size calculation.
…ns (eliben#208)

* Added support for decoding .debug_pubtypes and .debug_pubnames sections

* Added reference output to dwarf_pubnames_types.py example.

* Added readelf support, fixed review comments and documentation updates

* Avoid printing the entire die in pubnames example to work around Python2 vs 3 incompatibilities
Create all the AbbrevDecl objects during parsing and later return
references to them - this gives a small performance gain.
…#214)

In DWARFv4 the location lists are referenced with the 'sec_offset'
attribute form instead of 'data4' or 'data8'.
* tox: explicitly set locale

The locale affects the translation of GNU binutils output, which causes
run_readelf_tests.py to fail if the system language is not English.

Signed-off-by: Efimov Vasily <[email protected]>

* test: unittest reproducing error with empty ".debug_pubtypes" section

Signed-off-by: Efimov Vasily <[email protected]>

* NameLUT: use `construct.If` to declare "name" field

This patch also fixes problem with empty first entry.

Signed-off-by: Efimov Vasily <[email protected]>

* NameLUT._get_entries: remove unused `bytes_read`

Signed-off-by: Efimov Vasily <[email protected]>
StringTableSection.get_string() returns a UTF-8 decoded
string (or '' if fetching the string failed) since eliben#182
but the code in _DynamicStringTable was never updated to
decode anything at all so it just returns a bytes sequence
in Python 3.

Let's convert the string there as well to be able to use
both string tables the same way without having to worry
about decoding. Adapt the test cases accordingly.
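
The behaviour described above (return a decoded string, or '' when the lookup fails) can be sketched on a plain bytes buffer. This is only an illustration under stated assumptions: `read_cstring` is a hypothetical helper, not the _DynamicStringTable code, and replacing undecodable bytes is a choice made for this sketch.

```python
def read_cstring(table, offset):
    """Return the NUL-terminated string at `offset`, decoded as UTF-8.

    Returns '' when the offset lies outside the table, mirroring the
    "empty string on failure" behaviour described in the commit message.
    """
    if not 0 <= offset < len(table):
        return ""
    end = table.find(b"\x00", offset)
    if end == -1:
        # Unterminated final string: read up to the end of the table.
        end = len(table)
    return table[offset:end].decode("utf-8", errors="replace")


strtab = b"\x00libc.so.6\x00caf\xc3\xa9\x00"
print(read_cstring(strtab, 1))
print(read_cstring(strtab, 11))
```
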
On macOS I'm getting the following error when testing with tox on py27:

```
ERROR: invocation failed (exit code 1), logfile: /devel/pyelftools/.tox/py27/log/py27-33.log
ERROR: actionid: py27
msg: installpkg
cmdargs: ['/devel/pyelftools/.tox/py27/bin/pip', 'install', '-U', '--no-deps', '/devel/pyelftools/.tox/dist/pyelftools-0.25.zip']

DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Processing ./.tox/dist/pyelftools-0.25.zip
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/qz/XXX/T/pip-req-build-890d2p/setup.py", line 47, in <module>
        scripts=['scripts/readelf.py']
      File "/devel/pyelftools/.tox/py27/lib/python2.7/site-packages/setuptools/__init__.py", line 144, in setup
        _install_setup_requires(attrs)
      File "/devel/pyelftools/.tox/py27/lib/python2.7/site-packages/setuptools/__init__.py", line 137, in _install_setup_requires
        dist.parse_config_files(ignore_option_errors=True)
      File "/devel/pyelftools/.tox/py27/lib/python2.7/site-packages/setuptools/dist.py", line 704, in parse_config_files
        self._parse_config_files(filenames=filenames)
      File "/devel/pyelftools/.tox/py27/lib/python2.7/site-packages/setuptools/dist.py", line 600, in _parse_config_files
        reader = io.TextIOWrapper(fp, encoding=encoding)
    LookupError: unknown encoding:
```

This is due to the specification of LC_ALL as simply `en_US` without an encoding. Python 3.x seems to be fine with this, but Python 2.7 barfs. As a fix, setting `LC_ALL` to `en_US.utf-8` (including an explicit encoding spec) works.
dynamic: parse DT_{GNU_}HASH for number of symbols

In ultra-stripped binaries we can find the symbol table by
parsing the dynamic segment and using the pointer in the
DT_SYMTAB tag as the base address. However, we don't know
anything about the number of symbols in the symbol table.

Earlier, this code relied on finding the closest pointer
value bigger than the base address of the symbol table. In
PIE executables and shared libraries however this method
could break as the pointer value for DT_SYMTAB is in the same
range as things like DT_RELASZ or DT_STRSZ, leading to a too
small number of symbols returned by iter_symbols().

The crashpad project has implemented a different strategy to
find the number of symbols: parsing the symbol lookup hash
tables (see [0]) as every symbol must have a corresponding
entry in the hash table. This commit implements this
behaviour for DynamicSegment, leaving the old code as a
backup if neither DT_HASH nor DT_GNU_HASH tags have been
found.

For DT_HASH type tables, it is quite easy as the header
already contains the number of entries. For DT_GNU_HASH
things are a bit more complicated as we need to work forward
from the highest symbol referenced in the header (a good
explanation of the format can be found at [1]).

[0]: chromium/crashpad@1f1657d
[1]: https://flapenguin.me/2017/05/10/elf-lookup-dt-gnu-hash/
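
The two counting strategies described above can be sketched on raw table bytes. This is a simplified illustration, not the DynamicSegment code: the function names are hypothetical, the layout follows the description in [1], and the sketch assumes well-formed tables (a malformed chain without a terminating entry would raise struct.error).

```python
import struct


def num_symbols_sysv(table, little_endian=True):
    """DT_HASH: the header is (nbucket, nchain); nchain equals the symbol count."""
    endian = "<" if little_endian else ">"
    _nbucket, nchain = struct.unpack_from(endian + "II", table, 0)
    return nchain


def num_symbols_gnu(table, word_size=8, little_endian=True):
    """DT_GNU_HASH: walk the chain of the highest bucket to its last entry."""
    endian = "<" if little_endian else ">"
    nbuckets, symoffset, bloom_size, _bloom_shift = struct.unpack_from(
        endian + "4I", table, 0)
    # Bloom filter words have the native ELF word size (4 or 8 bytes).
    buckets_off = 16 + bloom_size * word_size
    buckets = struct.unpack_from(endian + "%dI" % nbuckets, table, buckets_off)
    max_idx = max(buckets) if buckets else 0
    if max_idx < symoffset:
        # No hashed symbols at all: only the unhashed prefix exists.
        return symoffset
    chain_off = buckets_off + 4 * nbuckets
    idx = max_idx
    while True:
        (value,) = struct.unpack_from(
            endian + "I", table, chain_off + 4 * (idx - symoffset))
        if value & 1:  # lowest bit set marks the end of a chain
            return idx + 1
        idx += 1


# Synthetic 64-bit table: 2 buckets, first hashed symbol at index 1.
gnu = (struct.pack("<4I", 2, 1, 1, 6) + struct.pack("<Q", 0)
       + struct.pack("<2I", 1, 3) + struct.pack("<4I", 2, 5, 6, 9))
print(num_symbols_gnu(gnu))
```
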

* dynamic: provide more functions for symbol access

So far, the DynamicSegment only provided a method to iterate
over all symbols but for some use cases it might be useful to
use the recovered symbol table more like a normal
SymbolTableSection.

To this end, provide get_symbol(index) to fetch a symbol by
its index, num_symbols() to get the total number of symbols
and get_symbol_by_name(name) to look for a list of symbols
with a given name.
$SITE_PYTHON/lib/python3.7/site-packages/elftools/construct/lib/container.py:5
 Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working

This change is compatible with Python 3.3 and up, when the ABCs were
moved to collections.abc. Backward compatibility is retained through
the try/except block.
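
The try/except pattern referred to above typically looks like the following minimal sketch. The imported name `MutableMapping` is chosen for illustration; which ABCs container.py actually imports is not shown in this message.

```python
try:
    # Python 3.3 and later: the ABCs live in collections.abc.
    from collections.abc import MutableMapping
except ImportError:
    # Older interpreters (e.g. Python 2.7) only provide them in collections.
    from collections import MutableMapping

# Either import yields the same abstract base class.
print(issubclass(dict, MutableMapping))
```
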
woodruffw and others added 22 commits May 27, 2021 06:38
* dwarf: initial DWARFv5 support

* dwarf/structs: use Embed to select header layout

* dwarf/structs: DW_FORM_strx family

Not sure how best to handle 24-bit values yet.

* dwarf/structs: use IfThenElse

`If` alone wraps the else in a `Value`.

* dwarf/structs: DW_FORM_addrx family handling

* dwarf_expr: support DW_OP_addrx

Not complete, but gets readelf.py to the end of a single
binary.

* dwarf/constants: DW_UT_* constants

* dwarf/structs: fix some DW_FORMs

* elftools, test: plumbing for DWARFv5 sections

* dwarf/constants: fix typo

* dwarf/structs: re-add a comment that got squashed

* dwarf/structs: DWARFv5 table header scaffolding

* dwarf/constants: typo

* test: add a basic DWARFv5 test
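
One of the items above leaves open how to handle 24-bit values, which DWARFv5 uses for three-byte forms such as DW_FORM_strx3. One possible approach, sketched here purely as an assumption (it is not what the commit does, and `read_uint24` is a hypothetical helper), is to read three bytes and convert them with int.from_bytes:

```python
def read_uint24(data, offset=0, little_endian=True):
    """Decode an unsigned 24-bit integer from three bytes of `data`."""
    chunk = data[offset:offset + 3]
    if len(chunk) != 3:
        raise ValueError("need 3 bytes for a 24-bit value")
    return int.from_bytes(chunk, "little" if little_endian else "big")


print(hex(read_uint24(b"\x01\x02\x03")))
print(hex(read_uint24(b"\x01\x02\x03", little_endian=False)))
```
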
…r most architectures (eliben#354)

* fixed parsing for structures containing uids or gids in core dumps for most architectures

* added testcase for mips corefile uid/gid parsing

* better description

* better email
* [example] Handle lpe with end_sequence correctly

* [example] exclude highpc in address comparison in decode_funcname

Co-authored-by: Jangseop Shin <[email protected]>
* ELF notes: keep raw note descriptors as bytes

* py3compat: add bytes2hex function

* elf/descriptions: use bytes2hex where needed

* ELF notes: convert to string only for known types
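
A helper in the spirit of bytes2hex could look like the sketch below. The name `bytes_to_hex` and its body are assumptions for illustration, not the py3compat implementation; the optional separator mirrors the one a later commit in this PR mentions.

```python
import binascii


def bytes_to_hex(data, sep=""):
    """Render raw bytes as lowercase hex, optionally separated per byte."""
    hexed = binascii.hexlify(data).decode("ascii")
    if not sep:
        return hexed
    # Split the hex string into two-character byte groups before joining.
    return sep.join(hexed[i:i + 2] for i in range(0, len(hexed), 2))


desc = b"\xde\xad\xbe\xef"
print(bytes_to_hex(desc))
print(bytes_to_hex(desc, sep=" "))
```

Keeping the note descriptor as bytes and converting only at display time avoids guessing an encoding for note types whose payload is not text.
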
This is very similar to the filtering implemented for
sections in commit d71faeb.
* DWARF 5 tags and attributes

* DW_AT_virtual

Co-authored-by: Seva Alekseyev <[email protected]>
* DWARF 5 tags and attributes

* DW_AT_private

Co-authored-by: Seva Alekseyev <[email protected]>
* Add support for .note.gnu.properties notes section

References:

- Doc: https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf
- Linux: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=00e19ceec80b03a43f626f891fcc53e57919f1b3
- Glibc: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/dl-prop.h;h=385548fad3e4ad71dbdcdbfada58585c2f24ea5e;hb=HEAD
- Binutils: https://sourceware.org/git/?p=binutils-gdb.git&a=search&h=HEAD&st=commit&s=NT_GNU_PROPERTY_TYPE_0

* Add descriptions for .note.gnu.properties notes

* descriptions: add missing PT_GNU_PROPERTY description

* py3compat: add optional separator for bytes2hex

* readelf: align notes column headers

* elf/descriptions: conform to real readelf's output format

* test: special case some known readelf output quirks

* test: add test ELFs for .note.gnu.property notes
Changes to conform the output of readelf.py to binutils readelf v2.37:

- Use singular "entry" when needed instead of "entries".

- Output the last entry for the .debug_line output table when
  DW_LNE_end_sequence is encountered, as the DWARF standard dictates. It
  looks like this was a readelf bug which was fixed in commit
  ba8826a82a29a19b78c18ce4f44fe313de279af7 of the GNU binutils-gdb repo.

- Add additional "Stmt" field in the .debug_line output table, and
  ignore the new "View" field. The "Stmt" field has been implemented in
  readelf.py. The "View" field is not something that the DWARF standard
  defines, it's an internal register added to the line number
  information state machine by binutils to perform assembler checks (see
  commit ba8826a82a29a19b78c18ce4f44fe313de279af7 of GNU binutils-gdb
  repo for more info, in particular gas/doc/as.texinfo). "View" is
  unimplemented in pyelftools for now and a special case has been added
  in the readelf test suite to ignore it.

- Add support for printing section names when dumping .symtab entries of
  st_type STT_SECTION as readelf v2.37 does (see commit
  23356397449a8aa65afead0a895a20be53b3c6b0 of GNU binutils-gdb repo).

- Add support for recognizing SOs specifically tagged as PIE (DT_FLAGS_1
  dynamic tag with DF_1_PIE set). In such case, describe the file as
  "Position-Independent Executable file" instead of "Shared object
  file", as readelf v2.37 does.

- Add leading "0x" for version section addresses when dumping version
   information (-V) as readelf does.

- Ignore "D (mbind)" in section headers flags legend (pyelftools does
  not output this flag).

Special cases ADDED for run_readelf_tests.py:

- Ignore "View" column for --debug-dump=decodedline in readelf's output.
- Ignore ellipsis ("[...]") for long names/symbols/paths in readelf's
  output.

Special cases REMOVED for run_readelf_tests.py:

- Detection of additional '@' after symbol names (flag_after_symtable)
  seems to no longer be needed as all tests pass without this exception.
- Special case for DW_AT_apple_xxx seems to no longer be needed, readelf
  now recognizes those.
- Special case for PT_GNU_PROPERTY no longer needed, readelf now
  recognizes it.

Other changes:

- Add missing import in elftools/dwarf/lineprogram.py.

References:

- GNU binutils-gdb repo: https://sourceware.org/git/?p=binutils-gdb.git
- Implement support for GNU property note type
  GNU_PROPERTY_X86_FEATURE_1_AND (which is a feature bitmask) and its
  relative flags.
- Fix off-by-one in "Data size" column alignment for readelf.py note
  sections dump.

References:

- https://gitlab.com/x86-psABIs/x86-64-ABI
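
To make the feature-bitmask idea concrete, here is a minimal sketch that walks the descriptor of an NT_GNU_PROPERTY_TYPE_0 note and decodes GNU_PROPERTY_X86_FEATURE_1_AND. The constants come from the x86-64 psABI; the function names, the 8-byte alignment default (ELF64), and the two-flag table are assumptions for illustration, not the pyelftools code.

```python
import struct

GNU_PROPERTY_X86_FEATURE_1_AND = 0xC0000002
# Only the two most common feature bits are listed in this sketch.
X86_FEATURE_1_FLAGS = {0x1: "IBT", 0x2: "SHSTK"}


def parse_gnu_properties(desc, align=8, little_endian=True):
    """Split a note descriptor into (pr_type, pr_data) pairs."""
    endian = "<" if little_endian else ">"
    props = []
    off = 0
    while off + 8 <= len(desc):
        pr_type, pr_datasz = struct.unpack_from(endian + "II", desc, off)
        off += 8
        props.append((pr_type, desc[off:off + pr_datasz]))
        # Each property is padded so the next one starts aligned.
        off = (off + pr_datasz + align - 1) & ~(align - 1)
    return props


def x86_feature_names(pr_data, little_endian=True):
    """Decode the 4-byte feature bitmask into flag names."""
    (mask,) = struct.unpack(("<" if little_endian else ">") + "I", pr_data[:4])
    return [name for bit, name in X86_FEATURE_1_FLAGS.items() if mask & bit]


desc = struct.pack("<III", GNU_PROPERTY_X86_FEATURE_1_AND, 4, 0x3) + b"\x00" * 4
for pr_type, pr_data in parse_gnu_properties(desc):
    if pr_type == GNU_PROPERTY_X86_FEATURE_1_AND:
        print(x86_feature_names(pr_data))
```
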
* Add PS3/CellOS OSABI identifier.

* Remove "OS" from CELL OS ABI

* Remove "OS" from CELL OS ABI

* Add Missing comma for ELFOSABI_CELL_LV2.
Remove unused imports
…en#395)

As more and more tools now support DT_RELR compressed relocations
(most notably, the just released GNU binutils 2.38 [0]), let's add
support for reading these relocations as well.

The original discussion about advantages of packed RELATIVE
relocations can be found at [1]. In a nutshell, the format
exploits the fact that RELATIVE relocations are often placed
next to each other and (for x86_64) stores up to 64 relocations
in two 8-byte words. In a regular .rela.dyn table, these would
take up 24 * 64 = 1536 bytes.

The compressed relocations work as follows:

The first word in the section describes a base address and
contains an offset for a relocation. This offset must always
lie at an even address. Following this entry can be one or
more bitmap(s) which have their least significant bit set to 1.
All other bits describe (in increasing order of significance) if
the following continuous offsets also contain a relocation. The
addends for existing relocations are stored at the corresponding
offsets in the file (that is, they work like REL relocations).
A good description of the history of this feature and its current
adoption is the following blog post [2].

[0]: https://lists.gnu.org/archive/html/info-gnu/2022-02/msg00009.html
[1]: https://groups.google.com/g/generic-abi/c/bX460iggiKg?pli=1
[2]: https://maskray.me/blog/2021-10-31-relative-relocations-and-relr
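
The decoding scheme described above can be sketched as follows. This is an illustration of the format as explained in [2], operating on already-unpacked words; `decode_relr` is a hypothetical name, and reading the addends from the file is left out.

```python
def decode_relr(entries, word_size=8):
    """Expand DT_RELR words into the list of relocation offsets.

    An even word is an address: it holds one relocation and sets the base
    for the bitmaps that follow. An odd word is a bitmap whose remaining
    bits each stand for one consecutive word after the base.
    """
    offsets = []
    base = 0
    for entry in entries:
        if entry & 1 == 0:
            offsets.append(entry)
            base = entry + word_size
        else:
            bits = entry >> 1  # drop the marker bit
            offset = base
            while bits:
                if bits & 1:
                    offsets.append(offset)
                bits >>= 1
                offset += word_size
            # A bitmap always covers (word bits - 1) consecutive words.
            base += (8 * word_size - 1) * word_size
    return offsets


print([hex(o) for o in decode_relr([0x1000, 0b1011])])
```

With a fully set bitmap, one address word plus one bitmap word describe 1 + 63 = 64 relocations on x86_64, which matches the figure quoted above.
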
* Add support for DW_FORM_implicit_const

* Add support for DW_FORM_line_strp

* Add new tests for DW_FORM_implicit_const and DW_FORM_line_strp.
elftools/* Reapply S2E-specific commits.
@michaelbrownuc
Author

Still need to test before marking ready for review.

@vitalych
Member

vitalych commented May 14, 2022

I propose we do the following:

  • Keep the existing master branch as is. Perhaps even rename it to v0.24-s2e.
  • Create a new branch based on the updated pyelftools from upstream, then cherry-pick S2E-specific commits on top. Some of them could be squashed/cleaned up (e.g., version modifications).
  • Update s2e-env to install this new branch.

That would help keep the history linear, avoid messy merge commits inside the PR, and will make it clear what are the S2E changes.

@michaelbrownuc
Author

> I propose we do the following:
>
>   • Keep the existing master branch as is. Perhaps even rename it to v0.24-s2e.
>   • Create a new branch based on the updated pyelftools from upstream, then cherry-pick S2E-specific commits on top. Some of them could be squashed/cleaned up (e.g., version modifications).
>   • Update s2e-env to install this new branch.
>
> That would help keep the history linear, avoid messy merge commits inside the PR, and will make it clear what are the S2E changes.

I agree that this commit history is a bit messy. I went down this path as it was the most conducive to testing and understanding the S2E commits in my fork. I'm fine with not merging this; I mostly created this draft PR so you can see what I'm testing.

If testing goes well, the most straightforward approach IMO would just be to fetch upstream and resolve the conflicts as I have done on this PR. I think it might be confusing for newcomers to see that S2E-env relies on a branch of a fork of pyelftools.

@michaelbrownuc
Author

Circling back to this - having resolved the testing issue related to the ubuntu image, I was able to test tracing functions and did not see any issues related to pyelftools.
