Add function prologue/epilogue to conditionally add BTI landing pads
and/or PAC code generation & authentication instructions depending on
compilation flags.
This patch enables PACBTI for all relevant variants of strlen:
* Newlib for armv8.1-m.main+pacbti
* Newlib for armv8.1-m.main+pacbti+mve
* Newlib-nano
Add function prologue/epilogue to conditionally add BTI landing pads
and/or PAC code generation & authentication instructions depending on
compilation flags.
This patch enables PACBTI for all relevant variants of strcmp:
* Newlib for armv8.1-m.main+pacbti
* Newlib for armv8.1-m.main+pacbti+mve
* Newlib-nano
Augment the arm_asm.h header file to simplify function prologues and
epilogues whilst adding support for PACBTI enablement via macros for
hand-written assembly functions. For PACBTI, both prologues/epilogues
as well as cfi-related directives are automatically amended
accordingly, depending on the compile-time mbranch-protection argument
values.
It defines the following preprocessor macros:
* HAVE_PAC_LEAF: Indicates whether pac-signing has been requested for
leaf functions.
* PAC_LEAF_PUSH_IP: Whether leaf functions should push the pac code
to the stack irrespective of whether the ip register is clobbered in
the function or not.
* STACK_ALIGN_ENFORCE: Whether a dummy register should be added to
the push list as necessary in the prologue to ensure stack
alignment preservation at the start of assembly function. The
epilogue behavior is likewise affected by this flag, ensuring any
pushed dummy registers also get popped on function return.
It also defines the following assembler macros:
* prologue: In addition to pushing any callee-saved registers onto
the stack, it generates any requested pacbti instructions.
Pushed registers are specified via the optional `first', `last',
`push_ip' and `push_lr' macro argument parameters.
when a single register number is provided, it pushes that
register. When two register numbers are provided, they specify a
rage to save. If push_ip and/or push_lr are non-zero, the
respective registers are also saved. Stack alignment is requested
via the `align` argument, which defaults to the value of
STACK_ALIGN_ENFORCE, unless manually overridden.
For example:
prologue push_ip=1 -> push {ip}
prologue push_ip=1, align8=1 -> push {r2, ip}
prologue push_ip=1, push_lr=1 -> push {ip, lr}
prologue 1 -> push {r1}
prologue 1, align8=1 -> push {r0, r1}
prologue 1 push_ip=1 -> push {r1, ip}
prologue 1 4 -> push {r1-r4}
prologue 1 4 push_ip=1 -> push {r1-r4, ip}
* epilogue: pops registers off the stack and emits pac key signing
instruction, if requested. The `first', `last', `push_ip',
`push_lr' and `align' function as per the prologue macro,
generating pop instead of push instructions.
Stack alignment is enforced via the following helper macro
call-chain:
{prologue|epilogue} ->_align8 -> _preprocess_reglist ->
_preprocess_reglist1 -> {_prologue|_epilogue}
Finally, the necessary cfi directives for adding debug information
to prologue and epilogue are generated via the following macros:
* cfisavelist - prologue macro helper function, generating
necessary .cfi_offset directives associated with push instruction.
Therefore, the net effect of calling `prologue 1 2 push_ip=1' is
to generate the following:
push {r1-r2, ip}
.cfi_adjust_cfa_offset 12
.cfi_offset 143, -4
.cfi_offset 2, -8
.cfi_offset 1, -12
* cfirestorelist - epilogue macro helper function, emitting
.cfi_restore instructions prior to resetting the cfa offset. As
such, calling `epilogue 1 2 push_ip=1' will produce:
pop {r1-r2, ip}
.cfi_register 143, 12
.cfi_restore 2
.cfi_restore 1
.cfi_def_cfa_offset 0
... so that all of 'exit', '_exit', '_Exit' work. 'exit' thus becomes the
standard 'newlib/libc/stdlib/exit.c' -- and functions registered via 'atexit'
are now called at return from 'main' or manual 'exit' invocation.
It seems there is a swapped logic in one of the subcases of
setjmp.S for MIPS: when the FPU registers are 64-bit within
a 32-bit aligned jmp_buf, the code realigns the pointers
before doing 64-bit writes, but the branch logic is swapped:
we must avoid the address adjustement when bit 2 is zero
(that is, the address is already 8-byte aligned).
This always triggers an address error when run, as tested
on a MIPS VR4300 with O64 ABI.
As per the arm Procedure Call Standard for the Arm Architecture
section 6.1.2 [1], VFP registers s16-s31 (d8-d15, q4-q7) must be
preserved across subroutine calls.
The current setjmp/longjmp implementations preserve only the core
registers, with the jump buffer size too small to store the required
co-processor registers.
In accordance with the C Library ABI for the Arm Architecture
section 6.11 [2], this patch sets _JBTYPE to long long adjusting
_JBLEN to 20.
It also emits vfp load/store instructions depending on architectural
support, predicated at compile time on ACLE feature-test macros.
[1] https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst
[2] https://github.com/ARM-software/abi-aa/blob/main/clibabi32/clibabi32.rst
Call __builtin_gcn_get_stack_limit and __builtin_gcn_first_call_this_thread_p
to reduce dependency on some register/layout assumptions by using the new
GCC mainline (GCC 13) builtins, if they are available. If not, the existing
code is used.
The first attempt to support the 64-bit mode had two bugs:
1. The saved general-purpose register 31 value was overwritten with the saved
link register value.
2. The link register was saved and restored using 32-bit instructions.
Use 64-bit store/load instructions to save/restore the link register. Make
sure that the general-purpose register 31 and the link register storage areas
do not overlap.
In a follow up patch, struct _reent is optionally replaced by dedicated
thread-local objects. In this case,_REENT is optionally defined to NULL. Add
the _REENT_IS_NULL() macro to disable this check on demand.
Add a _REENT_CLEANUP() macro to encapsulate access to the
__cleanup member of struct reent. This will help to replace the
struct member with a thread-local storage object in a follow up
patch.
Add a _REENT_STDERR() macro to encapsulate access to the _stderr
member of struct reent. This will help to replace the struct
member with a thread-local storage object in a follow up patch.
Add a _REENT_STDOUT() macro to encapsulate access to the _stdout
member of struct reent. This will help to replace the struct
member with a thread-local storage object in a follow up patch.
Add a _REENT_STDIN() macro to encapsulate access to the _stdin
member of struct reent. This will help to replace the struct
member with a thread-local storage object in a follow up patch.
Add a _REENT_ERRNO() macro to encapsulate the access to the
_errno member of struct reent. This will help to replace the
structure member with a thread-local storage object in a follow
up patch.
Replace uses of __errno_r() with _REENT_ERRNO(). Keep __errno_r() macro for
potential users outside of Newlib.
Remove the pointer indirection through the read-only _global_impure_ptr and
directly use a externally visible _impure_data object of type struct _reent.
This enables the static initialization of global data structures in a follow up
patch. In addition, we get rid of a machine-specific file.
The recent makefile reorganization broke the amdgcn port by creating
duplicate __malloc_lock symbols. This patch fixes the problem by renaming
the malloc_support.c file to mlock.c, thus overriding the default symbol
properly. Actually, I'm not sure how this ever worked?
Convert all the libc/ subdir makes into the top-level Makefile. This
allows us to build all of libc from the top Makefile without using any
recursive make calls. This is faster and avoids the funky lib.a logic
where we unpack subdir archives to repack into a single libc.a. The
machine override logic is maintained though by way of Makefile include
ordering, and source file accumulation in libc_a_SOURCES.
There's a few dummy.c files that are no longer necessary since we aren't
doing the lib.a accumulating, so punt them.
The winsup code has been pulling the internal newlib ssp library out,
but that doesn't exist anymore, so change that to pull the objects.
Rather than define per-object rules in the Makefile, have small files
that define & include the right content. This simplifies the build
rules, and makes understanding the source a little easier (imo) as it
makes all the subdirs behave the same: you have 1 source file and it
produces 1 object. It's also about the same amount of boiler plate,
without having to define custom build rules that can fall out of sync.
We also realign the free & pvalloc definitions: common code puts these
in malloc.o & valloc.o respectively, not in free.o & pvalloc.o objects.
This will also be important as we merge the libc.a build into the top
dir since it relies on a single flat list of objects for overrides.
The mallopt symbol is defined in tiny-malloc.c, not mallocr.c, but
the Makefile in here tries to compile it out of the latter. This
leads to mallopt never being defined.
The build also creates mallinfo.o & mallopt.o & mallstats.o objects
to override common ones, but the common dir doesn't use these names.
Instead, it places these all in mstats.o.
So move the build define logic to a dedicated file and compile it
directly to make things a bit simpler while fixing the missing func
and aligning objects with the cmomon code.
This kills off the last configure script under libc/ and folds it
into the top newlib configure script. The a lot of the logic was
already in the top configure script, so move what's left into a
libc/acinclude.m4 file.
Remove dependency on __sdidinit member of struct _reent to check
object initialization. Like __sdidinit, the __cleanup member of
struct _reent is initialized in the __sinit() function. Checking
initialization against __cleanup serves the same purpose and will
reduce overhead in the __sfp() function in a follow up patch.
The crt0.o was handled in a subdir-by-subdir basis: it would be compiled
in one (e.g. libc/sys/$arch/), then copied up one level (libc/sys/), then
copied up another (libc/) before finally being copied & installed in the
top newlib dir. The libc/sys/ copy was cleaned up, and then the top dir
was changed to copy it directly out of the libc/sys/$arch/ dir. But the
libc/sys/ copy to libc/ was left behind. Clean that up now too.
When migrating the manual to the top-level, the include order was
sorted by name of the subdir. But this changed the chapter order
of the manual in the process. Change the sorting back to match
existing chapters and update the comments to explain.
Using xxx_LIBADD, xxx_DEPENDENCIES, and EXTRA_xxx_SOURCES is one way of
conditionally including files into a target. But it's meant more for the
cases where the variables added to LIBADD & DEPENDENCIES are constructed
via substitution (e.g. AC_SUBST) or other dynamic methods. With Automake
conditionals, then the much simpler form is to conditionally append to
the xxx_SOURCES variable and let Automake sort everything out.
Replace the custom build rules (which require copying & pasting from the
current Makefile) with small stub files. This allows us to drop the rules
entirely and let Automake provide everything.
These subdirs don't actually use anything from libm. The common dir
in particular only has 4 header files, and none are included here.
The xstormy16 code has a comment mentioning why this hack is here, but
it refers to code that was removed when its configure script was merged
up a level.
This is used in a bunch of places, but nowhere is it ever set, and
nowhere can I find any documentation, nor can I find any other project
using it. So delete the flags to simplify.
This was only ever used for i?86-pc-linux-gnu targets, but that's been
broken for years, and has since been dropped. So clean this up too.
This also deletes the funky objectlist logic since it only existed for
the libtool libraries. Since it was the only thing left in the small
Makefile.shared file, we can punt that too.
Now that we use AC_NO_EXECUTABLES, and we require a recent version of
autoconf, we don't need to define our own copies of these macros. So
switch to the standard AC_PROG_CC.
THe stdio subdir is actually required by the documentation. The
stdio/def is handled dynamically, but libc.texi always expects it
to be included, and fails if it isn't. So making it required when
building docs is safe.
The xdr subdir is handled dynamically, but it doesn't include any
docs, so the dynamic logic isn't (currently) adding any value. So
making it required when building docs is safe.
That leaves: iconv, stdio64, posix, and signal subdirs. The chapters
have a little disclaimer saying they are system-dependent, but even
then, imo having stable manuals regardless of the target is preferable,
and we can add more disclaimer language to these chapters if we want.
This doesn't touch the man page codepaths, just the info/pdf.
This isn't strictly necessary, but it makes for much clearer logs as
to what the target is doing, and provides cache vars for anyone who
wants to force the test a different way, and it lets the build cache
its own results when rerunning config.status.
When we had configure scripts in subdirs, the newlib_basedir value
was computed relative to that, and it'd be the same when used in the
Makefile in the same dir. With many subdir configure scripts removed,
the top-level configure & Makefile can't use the same relative path.
So switch the subdir Makefiles over to abs_newlib_basedir when they
use -I to find source headers.
Do this for all subdirs, even ones with configure scripts and where
newlib_basedir works. This makes the code consistent, and avoids
surprises if the configure script is ever removed in the future as
part of merging to the higher level.
Some of the subdirs were using -I$(newlib_basedir)/../newlib/ for
some reason. Collapse those too since newlib_basedir points to the
newlib source tree already.
When using the top-level configure script but subdir Makefiles, the
newlib_basedir value gets a bit out of sync: it's relative to where
configure lives, not where the Makefile lives. Move the abs setting
from the top-level configure script into acinclude.m4 so we can rely
on it being available everywhere. Although this commit doesn't use
it anywhere, just lays the groundwork.
The machine configure scripts are all effectively stub scripts that
pass the higher level options to its own makefile. There were only
three doing custom tests. The rest were all effectively the same as
the libc/ configure script.
So instead of recursively running configure in all of these subdirs,
generate their makefiles from the top-level configure. For the few
unique ones, deploy a pattern of including subdir logic via m4:
m4_include([machine/nds32/acinclude.m4])
Some of the generated machine makefiles have a bunch of extra stuff
added to them, but that's because they were inconsistent in their
configure libtool calls. The top-level has it, so it exports some
new vars to the ones that weren't already.
The machine/{configure,Makefile} files exist only to fan out to the
specific machine/$arch/ subdir. We already have all that same info
in the libc/ dir itself, so by moving the recursive configure and
make calls into it, we can cut off this logic entirely and save the
overhead.
For arches that don't have a machine subdir, it means they can skip
the logic entirely. Although there's prob not too many of those.