newlib-cygwin

Commit Graph

Author	SHA1	Message	Date
Victor L. Do Nascimento	5582536896	newlib: libc: strlen M-profile PACBTI-enablement Add function prologue/epilogue to conditionally add BTI landing pads and/or PAC code generation & authentication instructions depending on compilation flags. This patch enables PACBTI for all relevant variants of strlen: * Newlib for armv8.1-m.main+pacbti * Newlib for armv8.1-m.main+pacbti+mve * Newlib-nano	2023-01-10 14:16:05 +00:00
Victor L. Do Nascimento	ebd922e77a	newlib: libc: strcmp M-profile PACBTI-enablement Add function prologue/epilogue to conditionally add BTI landing pads and/or PAC code generation & authentication instructions depending on compilation flags. This patch enables PACBTI for all relevant variants of strcmp: * Newlib for armv8.1-m.main+pacbti * Newlib for armv8.1-m.main+pacbti+mve * Newlib-nano	2023-01-10 14:16:05 +00:00
Victor L. Do Nascimento	9d6b00511e	newlib: libc: define M-profile PACBTI-enablement macros Augment the arm_asm.h header file to simplify function prologues and epilogues whilst adding support for PACBTI enablement via macros for hand-written assembly functions. For PACBTI, both prologues/epilogues as well as cfi-related directives are automatically amended accordingly, depending on the compile-time mbranch-protection argument values. It defines the following preprocessor macros: * HAVE_PAC_LEAF: Indicates whether pac-signing has been requested for leaf functions. * PAC_LEAF_PUSH_IP: Whether leaf functions should push the pac code to the stack irrespective of whether the ip register is clobbered in the function or not. * STACK_ALIGN_ENFORCE: Whether a dummy register should be added to the push list as necessary in the prologue to ensure stack alignment preservation at the start of assembly function. The epilogue behavior is likewise affected by this flag, ensuring any pushed dummy registers also get popped on function return. It also defines the following assembler macros: * prologue: In addition to pushing any callee-saved registers onto the stack, it generates any requested pacbti instructions. Pushed registers are specified via the optional `first', `last', `push_ip' and `push_lr' macro argument parameters. When a single register number is provided, it pushes that register. When two register numbers are provided, they specify a range to save. If push_ip and/or push_lr are non-zero, the respective registers are also saved. Stack alignment is requested via the `align` argument, which defaults to the value of STACK_ALIGN_ENFORCE, unless manually overridden. 
For example: prologue push_ip=1 -> push {ip} prologue push_ip=1, align8=1 -> push {r2, ip} prologue push_ip=1, push_lr=1 -> push {ip, lr} prologue 1 -> push {r1} prologue 1, align8=1 -> push {r0, r1} prologue 1 push_ip=1 -> push {r1, ip} prologue 1 4 -> push {r1-r4} prologue 1 4 push_ip=1 -> push {r1-r4, ip} * epilogue: pops registers off the stack and emits pac key signing instruction, if requested. The `first', `last', `push_ip', `push_lr' and `align' function as per the prologue macro, generating pop instead of push instructions. Stack alignment is enforced via the following helper macro call-chain: {prologue\|epilogue} ->_align8 -> _preprocess_reglist -> _preprocess_reglist1 -> {_prologue\|_epilogue} Finally, the necessary cfi directives for adding debug information to prologue and epilogue are generated via the following macros: * cfisavelist - prologue macro helper function, generating necessary .cfi_offset directives associated with push instruction. Therefore, the net effect of calling `prologue 1 2 push_ip=1' is to generate the following: push {r1-r2, ip} .cfi_adjust_cfa_offset 12 .cfi_offset 143, -4 .cfi_offset 2, -8 .cfi_offset 1, -12 * cfirestorelist - epilogue macro helper function, emitting .cfi_restore instructions prior to resetting the cfa offset. As such, calling `epilogue 1 2 push_ip=1' will produce: pop {r1-r2, ip} .cfi_register 143, 12 .cfi_restore 2 .cfi_restore 1 .cfi_def_cfa_offset 0	2023-01-10 14:16:05 +00:00
CompilerAI Research Group	ad3f9820b1	Fix memccpy to handle end char >= x80 - use unsigned char variables for optimized version of memccpy	2023-01-03 14:52:47 -05:00
Thomas Schwinge	5841b2f6a4	nvptx: Implement '_exit' instead of 'exit' ... so that all of 'exit', '_exit', '_Exit' work. 'exit' thus becomes the standard 'newlib/libc/stdlib/exit.c' -- and functions registered via 'atexit' are now called at return from 'main' or manual 'exit' invocation.	2022-12-22 12:52:15 +01:00
Mike Frysinger	0a7bf8fc4c	remove +x bit on source files These should never be marked executable as they have no shebang and are pure source files.	2022-12-21 22:38:57 -05:00
Victor L. Do Nascimento	57a08d6b9a	libc: arm: setjmp.S code cleanup The code for setjmp and longjmp contains unconditionally-disabled legacy FPA code. Given the code is not used by any targets, remove the code.	2022-12-19 11:22:11 +00:00
Giovanni Bajo	9bba9c2bdd	Fix a bug in setjmp for MIPS o32/o64 FPXX/FP64 It seems there is a swapped logic in one of the subcases of setjmp.S for MIPS: when the FPU registers are 64-bit within a 32-bit aligned jmp_buf, the code realigns the pointers before doing 64-bit writes, but the branch logic is swapped: we must avoid the address adjustment when bit 2 is zero (that is, the address is already 8-byte aligned). This always triggers an address error when run, as tested on a MIPS VR4300 with O64 ABI.	2022-12-19 10:38:05 +01:00
Victor L. Do Nascimento	15ad816ddd	libc: arm: fix setjmp abi non-conformance As per the arm Procedure Call Standard for the Arm Architecture section 6.1.2 [1], VFP registers s16-s31 (d8-d15, q4-q7) must be preserved across subroutine calls. The current setjmp/longjmp implementations preserve only the core registers, with the jump buffer size too small to store the required co-processor registers. In accordance with the C Library ABI for the Arm Architecture section 6.11 [2], this patch sets _JBTYPE to long long adjusting _JBLEN to 20. It also emits vfp load/store instructions depending on architectural support, predicated at compile time on ACLE feature-test macros. [1] https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst [2] https://github.com/ARM-software/abi-aa/blob/main/clibabi32/clibabi32.rst	2022-12-13 15:50:35 +00:00
Mike Frysinger	c8d5210337	newlib: info: tweak iconv node to avoid collisions We have "Iconv" and "iconv" nodes which generates Iconv.html and iconv.html files. On a case-insensitive filesystem, these collide. Rename the "Iconv" node to match the chapter name that it's already using to avoid the issue.	2022-12-13 05:22:09 -05:00
Corinna Vinschen	55de3fdd0e	Cygwin: define FILE as struct __sFILE64, not as __sFILE Until Cygwin 3.3.6, we defined __LARGE64_FILES unconditionally, so we were using the type __sFILE64 even for 64 bit. That was lazy and wrong, so commit `2902b3a09e` ("Cygwin: drop requirement to build newlib's stdio64") tried to fix that. Unfortunately this patch forgot to take the exposure of the typename __sFILE64 in userspace into account. This leads to trouble in C++ due to name mangling. Commit `0f376ae220` tried to fix this by just renaming __sFILE to __sFILE64 by using a macro. While __sFILE and __sFILE64 are the same size, they are not exactly congruent. To avoid backward compatibility problems, make sure to define FILE as the real __sFILE64, and make sure that __sFILE is not defined at all on Cygwin. Fixes: `0f376ae220` ("Cygwin: rename __sFILE to __sFILE64 for backward compatibility") Fixes: `2902b3a09e` ("Cygwin: drop requirement to build newlib's stdio64") Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2022-12-08 17:16:20 +01:00
Alexey Lapshin	0b09753a3f	libc: fix fropen/fwopen compile warnings This patch fixes warnings that appear when compiling: #define fwopen(__cookie,__fn) funopen(__cookie, (int (*)())0, __fn, (fpos_t (*)())0, (int (*)())0) Expands to: funopen(__null, (int (*)())0, &app_printf, (fpos_t (*)())0, (int (*)())0) argument of type "int (*)()" is incompatible with parameter of type "int (*)(void *__cookie, char *__buf, int __n)" C/C++(167) invalid conversion from 'fpos_t (*)()' {aka 'long int (*)()'} to 'fpos_t (*)(void*, fpos_t, int)' {aka 'long int (*)(void*, Discussion is here: https://github.com/espressif/arduino-esp32/issues/7407	2022-12-07 13:10:24 +01:00
Corinna Vinschen	8d138c3f66	Cygwin: fix LC_CTYPE in global locale to be a real C.UTF-8 locale https://cygwin.com/pipermail/cygwin/2022-December/252571.html Cygwin's default locale is "C.UTF-8" as far as LC_CTYPE settings are concerned. However, while __global_locale contains fixed mbtowc and wctomb pointers, the lc_ctype_T pointer is still pointing to _C_ctype_locale, representing the standard "C" locale. The problem with this is that the codeset name as well as MB_CUR_MAX is wrong. Fix this by introducing a new lc_ctype_T structure called _C_utf8_ctype_locale, setting the default codeset to "UTF-8" and MB_CUR_MAX to 6. Use this as lc_ctype_T pointer in __global_locale by default on Cygwin. Fixes: `a6a477fa81` ("POSIX-1.2008 per-thread locales, groundwork part 1") Co-Authored-By: Takashi Yano <takashi.yano@nifty.ne.jp> Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2022-12-03 16:16:30 +01:00
Tobias Burnus	b7aca332ce	amdgcn: Use __builtin_gcn_ in libc/machine/amdgcn/getreent.c Call __builtin_gcn_get_stack_limit and __builtin_gcn_first_call_this_thread_p to reduce dependency on some register/layout assumptions by using the new GCC mainline (GCC 13) builtins, if they are available. If not, the existing code is used.	2022-11-22 18:05:34 -05:00
Tobias Burnus	b9898fc993	amdgcn: Replace asm("s8") by __builtin_gcn_kernarg_ptr if existing Check whether __builtin_gcn_kernarg_ptr is available and, if it is, call it instead using the hard-coded 'asm("s8")' in: * newlib/libc/machine/amdgcn/exit-value.h (exit_with_int) * newlib/libc/machine/amdgcn/mlock.c (sbrk) * newlib/libc/sys/amdgcn/write.c (write) newlib/libc/machine/amdgcn/exit-value.h \| 6 ++++++ newlib/libc/machine/amdgcn/mlock.c \| 10 +++++++--- newlib/libc/sys/amdgcn/write.c \| 4 ++++ 3 files changed, 17 insertions(+), 3 deletions(-)	2022-11-21 13:10:29 +01:00
Sebastian Huber	5c79aa4b22	powerpc/setjmp: Fix 64-bit buffer alignment The rlwinm is a word-size instruction which clears the remaining 32 bits of a 64-bit register. Use clrrdi in 64-bit configurations.	2022-11-10 16:05:17 +01:00
Thomas Schwinge	67459ce679	Generally make all 'long double complex' methods available in <complex.h> ..., not just '#if defined(__CYGWIN__)'. (Exception: 'clog10l' which currently indeed is for Cygwin only.) This completes 2017-07-05 commit `be3ca39474` "Fixed warnings for some long double complex methods" after Aditya Upadhyay's work on importing "Long double complex methods" from NetBSD. For example, this changes GCC/nvptx libgfortran 'configure' output as follows: [...] checking for ccosf... yes checking for ccos... yes checking for ccosl... [-no-]{+yes+} [...] ..., and correspondingly GCC/nvptx 'nvptx-none/libgfortran/config.h' as follows: [...] /* Define to 1 if you have the `ccosl' function. */ -/* #undef HAVE_CCOSL */ +#define HAVE_CCOSL 1 [...] Similarly for 'ccoshl', 'cexpl', 'cpowl', 'csinl', 'csinhl', 'ctanl', 'ctanhl', 'cacoshl', 'cacosl', 'casinhl', 'catanhl'. ('conjl', 'cprojl' are not currently being used in libgfortran.) This in turn simplifies GCC/nvptx 'libgfortran/intrinsics/c99_functions.c' compilation such that this file doesn't have to provide its own "Implementation of various C99 functions" for those, when in fact they're available in newlib libm.	2022-11-08 21:38:08 +01:00
Corinna Vinschen	a8526cb52b	strftime/wcsftime: use STRLEN, not strlen Commit `737e2004a3` accidentally introduced a call to strlen in code used with wide character strings in case of wcsftime. Use STRLEN instead. Fixes: `737e2004a3` ("strftime.c(__strftime): add %q, %v, tests; tweak %Z doc") Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2022-10-29 20:15:58 +02:00
Sebastian Huber	a89d3a89c3	powerpc/setjmp: Fix 64-bit support The first attempt to support the 64-bit mode had two bugs: 1. The saved general-purpose register 31 value was overwritten with the saved link register value. 2. The link register was saved and restored using 32-bit instructions. Use 64-bit store/load instructions to save/restore the link register. Make sure that the general-purpose register 31 and the link register storage areas do not overlap.	2022-10-28 12:53:42 +02:00
Brian Inglis	737e2004a3	strftime.c(__strftime): add %q, %v, tests; tweak %Z doc %q GNU quarter year 1-4 %v BSD/OSX/Ruby VMS/Oracle %e-%b-%Y %Z change time zone name to abbreviation	2022-10-25 12:15:40 +02:00
Brian Inglis	d6a26e542d	strptime.c(strptime_l): add %q GNU quarter	2022-10-24 14:07:23 +02:00
Markus B. Moessner	01f6251c09	Fix missing extern C statement	2022-09-26 13:44:21 -04:00
Sebastian Huber	d9dc88048a	powerpc/setjmp: Add 64-bit support Use 64-bit store/load instructions to save/restore the general-purpose registers.	2022-09-24 08:39:29 +02:00
Jeff Johnston	5230eb7f8c	Implement sysconf for Arm - add support for using sysconf to get page size in _mallocr.c via HAVE_SYSCONF_PAGESIZE flag set in configure.host - set flag in configure.host for arm and add a default sysconf implementation in libc/sys/arm that returns the page size - the default implementation can be overridden outside newlib to allow a different page size to improve malloc on devices with a small footprint without needing to rebuild newlib - this patch is based on a contribution from Torbjorn Svensson and Niklas Dahlquist (https://ecos.sourceware.org/ml/newlib/current/017616.html)	2022-09-19 15:35:55 -04:00
tb	eb5c631ead	upstream OpenBSD: arc4random: fix indent	2022-09-10 21:00:38 +02:00
djm	52a410f9bd	upstream OpenBSD: arc4random: Randomise the rekey interval a little. Previously, the chacha20 instance would be rekeyed every 1.6MB. This makes it happen at a random point somewhere in the 1-2MB range. Feedback deraadt@ visa@, ok tb@ visa@ newlib port: Make REKEY_BASE depend on SIZE_MAX Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2022-09-10 20:59:01 +02:00
dtucker	f5fece2838	upstream OpenBSD: arc4random: Remove unused ivbits argument from chacha_keysetup to match other instances in the tree. ok deraadt@	2022-09-10 20:58:03 +02:00
deraadt	db5e07368c	upstream OpenBSD: arc4random: replace abort() with _exit() In the incredibly unbelievable circumstance where _rs_init() fails to allocate pages, don't call abort() because of corefile data leakage concerns, but simply _exit(). The reasoning is _rs_init() will only fail if someone finds a way to apply specific pressure against this failure point, for the purpose of leaking information into a core which they can read. We don't need a corefile in this instance to debug that. So take this "lever" away from whoever in the future wants to do that.	2022-09-10 20:58:03 +02:00
Corinna Vinschen	dd22053fee	upstream OpenBSD: arc4random: bump file versions This hides a patch not required in newlib	2022-09-10 20:58:01 +02:00
bcook	ef76759d7f	upstream OpenBSD: arc4random: Add support for building arc4random with MSVC. By default, MSVC's stdlib.h defines min(), so we need to spell out something less common to avoid picking it up. ok deraadt@ beck@ miod@	2022-09-10 20:56:25 +02:00
Torbjörn SVENSSON	a68e99f883	Don't allocate another header when merging chunks In the nano version of malloc, when the last chunk is to be extended, there is no need to account for the header again as it's already taken into account in the overall "alloc_size" at the beginning of the function. Contributed by STMicroelectronics Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>	2022-09-01 15:39:10 -04:00
Torbjörn SVENSSON	0455ea28ce	Used chunk needs to be removed from free_list When using nano malloc and the remaining heap space is not big enough to fulfill the allocation, malloc will attempt to merge the last chunk in the free list with a new allocation in order to create a bigger chunk. This is successful, but the chunk still remains in the free_list, so any later call to malloc can give out the same region without it first being freed. Possible sequence to verify: void *p1 = malloc(3000); void *p2 = malloc(4000); void *p3 = malloc(5000); void *p4 = malloc(6000); void *p5 = malloc(7000); free(p2); free(p4); void *p6 = malloc(35000); free(p6); void *p7 = malloc(42000); void *p8 = malloc(32000); Without the change, p7 and p8 point to the same address. Requirement: after malloc(35000), there is less than 42000 bytes available on the heap. Contributed by STMicroelectronics Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>	2022-09-01 14:40:27 -04:00
Jeff Johnston	d92d3a3c4a	Fix some Coverity Scan errors.	2022-08-31 15:18:08 -04:00
Torbjörn SVENSSON	dd1122e21c	Restore _lock initialization in non-single threaded mode When __SINGLE_THREAD__ is not defined, stdin, stdout and stderr need to have their _lock instance initialized. The __sfp() method is not invoked for the 3 mentioned fds; thus, the std() method needs to handle the initialization of the lock. This is more or less a revert of `382550072b` Contributed by STMicroelectronics Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>	2022-08-31 10:02:53 +02:00
Yilin Sun via Newlib	b7109cf82e	SH: Do not build syscalls if option provided This patch makes syscalls for the SH architecture respect the global option "--disable-newlib-supplied-syscalls". This is useful when a bare-metal toolchain is needed. Signed-off-by: Yilin Sun <imi415@imi.moe>	2022-08-15 15:12:19 -04:00
Corinna Vinschen	85be74f295	newlocale: fix crash when trying to write to __C_locale This simple testcase: locale_t st = newlocale(LC_ALL_MASK, "C", (locale_t)0); locale_t st2 = newlocale(LC_CTYPE_MASK, "en_US.UTF-8", st); is sufficient to reproduce a crash in _newlocale_r. After the first call to newlocale, `st' points to __C_locale, which is const. When using `st' as locale base in the second call, _newlocale_r tries to set pointers inside base to NULL. This is bad if base is __C_locale, obviously. Add a test to avoid trying to overwrite pointer values inside base if base is __C_locale. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2022-08-12 12:29:26 +02:00
Corinna Vinschen	2902b3a09e	Cygwin: drop requirement to build newlib's stdio64 Given that 64 bit Cygwin defines all file access types (off_t, fpos_t, and derived types) as 64 bit anyway, there's no reason left to rely on the stdio64 part of newlib. Use base functions and base types. Signed-off-by: Corinna Vinschen <corinna@vinschen.de>	2022-08-03 13:41:35 +02:00
Matt Joyce	ea99f21ce6	Add --enable-newlib-reent-thread-local option By default, Newlib uses a huge object of type struct _reent to store thread-specific data. This object is returned by __getreent() if the __DYNAMIC_REENT__ Newlib configuration option is defined. The reentrancy structure contains for example errno and the standard input, output, and error file streams. This means that if an application only uses errno it has a dependency on the file stream support even if it does not use it. This is an issue for lower end targets and applications which need to qualify the software according to safety standards (for example ECSS-E-ST-40C, ECSS-Q-ST-80C, IEC 61508, ISO 26262, DO-178, DO-330, DO-333). If the new _REENT_THREAD_LOCAL configuration option is enabled, then struct _reent is replaced by dedicated thread-local objects for each struct _reent member. The thread-local objects are defined in translation units which use the corresponding object.	2022-07-13 06:55:46 +02:00
Matt Joyce	1a09082036	Add _REENT_IS_NULL() In a follow up patch, struct _reent is optionally replaced by dedicated thread-local objects. In this case, _REENT is optionally defined to NULL. Add the _REENT_IS_NULL() macro to disable this check on demand.	2022-07-13 06:55:46 +02:00
Matt Joyce	db2123caf8	Add _REENT_SIG_FUNC(ptr) Add a _REENT_SIG_FUNC() macro to encapsulate access to the _sig_func member of struct reent. This will help to replace the struct member with a thread-local storage object in a follow up patch.	2022-07-13 06:55:46 +02:00
Matt Joyce	81352a9df9	Add _REENT_CVTBUF(ptr) Add a _REENT_CVTBUF() macro to encapsulate access to the _cvtbuf member of struct reent. This will help to replace the struct member with a thread-local storage object in a follow up patch.	2022-07-13 06:55:46 +02:00
Matt Joyce	315c420e1b	Add _REENT_CVTLEN(ptr) Add a _REENT_CVTLEN() macro to encapsulate access to the _cvtlen member of struct reent. This will help to replace the struct member with a thread-local storage object in a follow-up patch.	2022-07-13 06:55:46 +02:00
Matt Joyce	f89ce35d83	Add _REENT_CLEANUP(ptr) Add a _REENT_CLEANUP() macro to encapsulate access to the __cleanup member of struct reent. This will help to replace the struct member with a thread-local storage object in a follow up patch.	2022-07-13 06:55:46 +02:00
Matt Joyce	50f078b48c	Add _REENT_LOCALE(ptr) Add a _REENT_LOCALE() macro to encapsulate access to the _locale member of struct reent. This will help to replace the struct member with a thread-local storage object in a follow up patch.	2022-07-13 06:55:46 +02:00
Matt Joyce	0985d418cb	Add _REENT_INC(ptr) Add a _REENT_INC() macro to encapsulate access to the _inc member of struct reent. This will help to replace the struct member with a thread-local storage object in a follow up patch.	2022-07-13 06:55:46 +02:00
Matt Joyce	e56801f34d	Add _REENT_STDERR(ptr) Add a _REENT_STDERR() macro to encapsulate access to the _stderr member of struct reent. This will help to replace the struct member with a thread-local storage object in a follow up patch.	2022-07-13 06:55:46 +02:00
Matt Joyce	3266a46327	Add _REENT_STDOUT(ptr) Add a _REENT_STDOUT() macro to encapsulate access to the _stdout member of struct reent. This will help to replace the struct member with a thread-local storage object in a follow up patch.	2022-07-13 06:55:46 +02:00
Matt Joyce	627a5cb413	Add _REENT_STDIN(ptr) Add a _REENT_STDIN() macro to encapsulate access to the _stdin member of struct reent. This will help to replace the struct member with a thread-local storage object in a follow up patch.	2022-07-13 06:55:46 +02:00
Matt Joyce	f3b8138239	Add _REENT_ERRNO(ptr) Add a _REENT_ERRNO() macro to encapsulate the access to the _errno member of struct reent. This will help to replace the structure member with a thread-local storage object in a follow up patch. Replace uses of __errno_r() with _REENT_ERRNO(). Keep __errno_r() macro for potential users outside of Newlib.	2022-07-13 06:55:41 +02:00
Matt Joyce	d0d78e96eb	Define _REENT_EMERGENCY(ptr) only once Use this macro to access the _emergency member of struct _reent. This macro will help to replace the _emergency member of struct _reent with a thread-local storage object in a follow up patch.	2022-07-13 06:50:25 +02:00
Sebastian Huber	a3fe1ed573	Move content in <sys/reent.h> Move definitions not directly related to struct _reent to the bottom of the file. This allows a contiguous #ifndef _REENT_THREAD_LOCAL_STORAGE block.	2022-07-13 06:50:25 +02:00
Sebastian Huber	1db7cf5ce6	RTEMS: Add README	2022-07-11 13:19:29 +02:00
Gleb Smirnoff	c1abc93988	libc/syslog: fully deprecate and don't try to open "/dev/log" The "/dev/log" socket existed in pre-FreeBSD times. Later it was substituted to a compatibility symlink. The symlink creation was deprecated in FreeBSD 10.2 and 9-STABLE. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35304	2022-07-11 13:19:29 +02:00
Konrad Sewiłło-Jopek	cf2ba7d7f8	arp: Implement sticky ARP mode for interfaces. Provide sticky ARP flag for network interface which marks it as the "sticky" one similarly to what we have for bridges. Once interface is marked sticky, any address resolved using the ARP will be saved as a static one in the ARP table. Such functionality may be used to prevent ARP spoofing or to decrease latencies in Ethernet networks. The drawbacks include potential limitations in usage of ARP-based load-balancers and high-availability solutions such as carp(4). The implemented option is disabled by default, therefore should not impact the default behaviour of the networking stack. Sponsored by: Conclusive Engineering sp. z o.o. Reviewed By: melifaro, pauamma_gundo.com Differential Revision: https://reviews.freebsd.org/D35314 MFC after: 2 weeks	2022-07-11 13:19:29 +02:00
Alan Somers	27dfb5f33f	Correctly measure system load averages > 1024 The old fixed-point arithmetic used for calculating load averages had an overflow at 1024. So on systems with extremely high load, the observed load average would actually fall back to 0 and shoot up again, creating a kind of sawtooth graph. Fix this by using 64-bit math internally, while still reporting the load average to userspace as a 32-bit number. Sponsored by: Axcient Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D35134	2022-07-11 13:19:29 +02:00
Konstantin Belousov	0ed668df2c	Add ifcap2 names for RXTLS4 and RXTLS6 interface capabilities and corresponding nvlist capabilities name strings. Reviewed by: hselasky, jhb, kp (previous version) Sponsored by: NVIDIA Networking MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D32551	2022-07-11 13:19:29 +02:00
Konstantin Belousov	361bd82a1f	Kernel-side infrastructure to implement nvlist-based set/get ifcaps Reviewed by: hselasky, jhb, kp (previous version) Sponsored by: NVIDIA Networking MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D32551	2022-07-11 13:19:29 +02:00
Richard Scheffenegger	aeced2f48a	tcp: LRO code to deal with all 12 TCP header flags TCP per RFC793 has 4 reserved flag bits for future use. One of those bits may be used for Accurate ECN. This patch is to include these bits in the LRO code to ease the extensibility if/when these bits are used. Reviewed By: hselasky, rrs, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34127	2022-07-11 13:19:29 +02:00
Mike Karels	a9a87c1921	kernel: deprecate Internet Class A/B/C Hide historical Class A/B/C macros unless IN_HISTORICAL_NETS is defined; define it for user level. Define IN_MULTICAST separately from IN_CLASSD, and use it in pf instead of IN_CLASSD. Stop using class for setting default masks when not specified; instead, define new default mask (24 bits). Warn when an Internet address is set without a mask. MFC after: 1 month Reviewed by: cy Differential Revision: https://reviews.freebsd.org/D32708	2022-07-11 13:19:29 +02:00
Peter Lei	73784208e3	tcp: socket option to get stack alias name TCP stack sysctl nodes are currently inserted using the stack name alias. Allow the user to get the current stack's alias to allow for programmatic sysctl access. Obtained from: Netflix	2022-07-11 13:19:29 +02:00
Randall Stewart	0464f26db0	tcp: Add hystart-plus to cc_newreno and rack. TCP Hystart draft version -03: https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-hystartplusplus Is a new version of hystart that allows one to carefully exit slow start if the RTT spikes too much. The newer version has a slower-slow-start so to speak that then kicks in for five round trips. To see if you exited too early, if not into congestion avoidance. This commit will add that feature to our newreno CC and add the needed bits in rack to be able to enable it. Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32373	2022-07-11 13:19:29 +02:00
Randall Stewart	57703f72c8	tcp: Add support for DSACK based reordering window to rack. The rack stack, with respect to the rack bits in it, was originally built based on an early I-D of rack. In fact at that time the TLP bits were in a separate I-D. The dynamic reordering window based on DSACK events was not present in rack at that time. It is now part of the RFC and we need to update our stack to include these features. However we want to have a way to control the feature so that we can, if the admin decides, make it stay the same way system wide as well as via socket option. The new sysctl and socket option has the following meaning for setting: 00 (0) - Keep the old way, i.e. reordering window is 1 and do not use DSACK bytes to add to reorder window 01 (1) - Change the Reordering window to 1/4 of an RTT but do not use DSACK bytes to add to reorder window 10 (2) - Keep the reordering window as 1, but do use SACK bytes to add additional 1/4 RTT delay to the reorder window 11 (3) - reordering window is 1/4 of an RTT and add additional DSACK bytes to increase the reordering window (RFC behavior) The default currently in the sysctl is 3 so we get standards based behavior. Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D31506	2022-07-11 13:19:29 +02:00
Andrew Gallatin	4bf5c259d3	tsleep: Add a PNOLOCK flag Add a PNOLOCK flag so that, in the race circumstance where wakeup races are externally mitigated, tsleep() can be called with a sleep time of 0 without triggering an assertion. Reviewed by: jhb Sponsored by: Netflix	2022-07-11 13:19:29 +02:00
Roy Marples	356891f5e0	socket: Implement SO_RERROR SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we lose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports. Reviewed by: philip (network), kbowling (transport), gbe (manpages) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26652	2022-07-11 13:19:29 +02:00
Kristof Provost	5260d10c98	pf: syncookie support Import OpenBSD's syncookie support for pf. This feature help pf resist TCP SYN floods by only creating states once the remote host completes the TCP handshake rather than when the initial SYN packet is received. This is accomplished by using the initial sequence numbers to encode a cookie (hence the name) in the SYN+ACK response and verifying this on receipt of the client ACK. Reviewed by: kbowling Obtained from: OpenBSD MFC after: 1 week Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D31138	2022-07-11 13:19:29 +02:00
Randall Stewart	b89c5a3e88	tcp: Add a socket option to rack so we can test various changes to the slop value in timers. Timer_slop, in TCP, has been 200ms for a long time. This value dates back a long time when delayed ack timers were longer and links were slower. A 200ms timer slop allows 1 MSS to be sent over a 60kbps link. It's possible that lowering this value to something more in line with today's delayed ack values (40ms) might improve TCP. This bit of code makes it so rack can, via a socket option, adjust the timer slop. Reviewed by: mtuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30249	2022-07-11 13:19:29 +02:00
Richard Scheffenegger	d4971b6464	tcp: SACK Lost Retransmission Detection (LRD) Recover from excessive losses without reverting to a retransmission timeout (RTO). Disabled by default, enable with sysctl net.inet.tcp.do_lrd=1 Reviewed By: #transport, rrs, tuexen, #manpages Sponsored by: Netapp, Inc. Differential Revision: https://reviews.freebsd.org/D28931	2022-07-11 13:19:29 +02:00
Randall Stewart	a00ca7bd54	This brings into sync FreeBSD with the netflix versions of rack and bbr. This fixes several breakages (panics) since the tcp_lro code was committed that have been reported. Quite a few new features are now in rack (perfecting of DGP -- Dynamic Goodput Pacing among the largest). There is also support for ack-war prevention. Documents coming soon on rack. Sponsored by: Netflix Reviewed by: rscheff, mtuexen Differential Revision: https://reviews.freebsd.org/D30036	2022-07-11 13:19:29 +02:00
John Baldwin	8424d5c949	Use thunks for compat ioctls using struct ifgroupreq. Reviewed by: brooks, kib Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D29893	2022-07-11 13:19:29 +02:00
Konstantin Belousov	19a627f3a4	ioccom: define ioctl cmd value that can never be valid Its use is for cases where some filler is needed for cmd, or we need an indication that there were no cmd supplied, and so on. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29935	2022-07-11 13:19:29 +02:00
Thomas Munro	363527bb03	poll(2): Add POLLRDHUP. Teach poll(2) to support Linux-style POLLRDHUP events for sockets, if requested. Triggered when the remote peer shuts down writing or closes its end. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D29757	2022-07-11 13:19:29 +02:00
Michael Tuexen	85140fb378	tcp: add support for TCP over UDP Adding support for TCP over UDP allows communication with TCP stacks which can be implemented in userspace without requiring special privileges or specific support by the OS. This is joint work with rrs. Reviewed by: rrs Sponsored by: Netflix, Inc. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29469	2022-07-11 13:19:29 +02:00
Bjoern A. Zeeb	defb5ffed4	termios: add more speeds A lot of small arm64 gadgets are using 1500000 as console speed. While cu can perfectly deal with this some 3rd party software, e.g., comms/conserver-con add speeds based on B<n> being defined. Having it defined here simplifies enhancing other software. Obtained-from: NetBSD sys/sys/termios.h 1.36 MFC-after: 2 weeks Reviewed-by: philip (,okayed by imp) Differential Revision: https://reviews.freebsd.org/D29209	2022-07-11 13:19:29 +02:00
Alexander V. Chernikov	3be97ff62c	Revert "SO_RERROR indicates that receive buffer overflows" Wrong version of the change was pushed inadvertently. This reverts commit 4a01b854ca5c2e5124958363b3326708b913af71.	2022-07-11 13:19:29 +02:00
Alexander V. Chernikov	2ba2e1e052	SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we lose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports.	2022-07-11 13:19:29 +02:00
Alex Richardson	8054ce555f	Expose clang's alignment builtins and use them for roundup2/rounddown2 This makes roundup2/rounddown2 type- and const-preserving and allows using it on pointer types without casting to uintptr_t first. Not performing pointer-to-integer conversions also helps the compiler's optimization passes and can therefore result in better code generation. When using it with integer values there should be no change other than the compiler checking that the alignment value is a valid power-of-two. I originally implemented these builtins for CHERI a few years ago and they have been very useful for CheriBSD. However, they are also useful for non-CHERI code so I was able to upstream them for Clang 10.0. Rationale from the clang documentation: Clang provides builtins to support checking and adjusting alignment of pointers and integers. These builtins can be used to avoid relying on implementation-defined behavior of arithmetic on integers derived from pointers. Additionally, these builtins retain type information and, unlike bitwise arithmetic, they can perform semantic checking on the alignment value. There is also a feature request for GCC, so GCC may also support it in the future: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98641 Reviewed By: brooks, jhb, imp Differential Revision: https://reviews.freebsd.org/D28332	2022-07-11 13:19:29 +02:00
Gleb Smirnoff	5bc5689a6a	Catch up with 6edfd179c86: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG. Originally IFCAP_NOMAP meant that the mbuf has external storage pointer that points to unmapped address. Then, this was extended to array of such pointers. Then, such mbufs were augmented with header/trailer. Basically, extended mbufs are extended, and set of features is subject to change. The new name should be generic enough to avoid further renaming.	2022-07-11 11:52:46 +02:00
Konstantin Belousov	581bde91a5	Add tcgetwinsize(3) and tcsetwinsize(3) to termios These functions get/set tty winsize respectively, and are trivial wrappers around corresponding termio ioctls. The functions are expected to be a part of POSIX.1 issue 8: https://www.austingroupbugs.net/view.php?id=1151#c3856. They are currently available in NetBSD and in musl libc. PR: 251868 Submitted by: Soumendra Ganguly <soumendraganguly@gmail.com> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27650	2022-07-11 11:52:46 +02:00
Andrew Gallatin	c76896074b	Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain In order to efficiently serve web traffic on a NUMA machine, one must avoid as many NUMA domain crossings as possible. With SO_REUSEPORT_LB, a number of workers can share a listen socket. However, even if a worker sets affinity to a core or set of cores on a NUMA domain, it will receive connections associated with all NUMA domains in the system. This will lead to cross-domain traffic when the server writes to the socket or calls sendfile(), and memory is allocated on the server's local NUMA node, but transmitted on the NUMA node associated with the TCP connection. Similarly, when the server reads from the socket, it will likely be reading memory allocated on the NUMA domain associated with the TCP connection. This change provides a new socket ioctl, TCP_REUSPORT_LB_NUMA. A server can now tell the kernel to filter traffic so that only incoming connections associated with the desired NUMA domain are given to the server. (Of course, in the case where there are no servers sharing the listen socket on some domain, then as a fallback, traffic will be hashed as normal to all servers sharing the listen socket regardless of domain). This allows a server to deal only with traffic that is local to its NUMA domain, and avoids cross-domain traffic in most cases. This patch, and a corresponding small patch to nginx to use TCP_REUSPORT_LB_NUMA allows us to serve 190Gb/s of kTLS encrypted https media content from dual-socket Xeons with only 13% (as measured by pcm.x) cross domain traffic on the memory controller. Reviewed by: jhb, bz (earlier version), bcr (man page) Tested by: gonzo Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21636	2022-07-11 11:52:46 +02:00
Brooks Davis	70b6efc47d	style(9): Correct whitespace in struct definitions struct ifconf and struct ifreq use the odd style "struct<tab>foo". struct ifdrv seems to have tried to follow this but was committed with spaces in place of most tabs resulting in "struct<space><space>ifdrv". MFC after: 3 days	2022-07-11 11:52:46 +02:00
Conrad Meyer	3f7425e8bb	unix(4): Enhance LOCAL_CREDS_PERSISTENT ABI As this ABI is still fresh (r367287), let's correct some mistakes now: - Version the structure to allow for future changes - Include sender's pid in control message structure - Use a distinct control message type from the cmsgcred / sockcred mess Discussed with: kib, markj, trasz Differential Revision: https://reviews.freebsd.org/D27084	2022-07-11 11:52:46 +02:00
Conrad Meyer	55dec604f8	unix(4): Add SOL_LOCAL:LOCAL_CREDS_PERSISTENT This option is intended to be semantically identical to Linux's SOL_SOCKET:SO_PASSCRED. For now, it is mutually exclusive with the pre-existing sockopt SOL_LOCAL:LOCAL_CREDS. Reviewed by: markj (penultimate version) Differential Revision: https://reviews.freebsd.org/D27011	2022-07-11 11:52:46 +02:00
Warner Losh	1cb590ab48	Integrate 4.4BSD-Lite2 changes to IOC_* definitions Bring in the long-overdue 4.4BSD-Lite2 rev 8.3 by cgd of sys/ioccom.h. This uses UL suffix for the IOC_* constants so they don't sign extend. Also bring in the handy diagram from NetBSD's version of this file. This alters the 4.4BSD-Lite2 code slightly in a way that's semantically the same but more compact. This should stop the warnings from Chrome for bogus sign extension. Reviewed by: kib@, jhb@ Differential Revision: https://reviews.freebsd.org/D26423	2022-07-11 11:52:46 +02:00
John Baldwin	5ea36d92e6	Support hardware rate limiting (pacing) with TLS offload. - Add a new send tag type for a send tag that supports both rate limiting (packet pacing) and TLS offload (mostly similar to D22669 but adds a separate structure when allocating the new tag type). - When allocating a send tag for TLS offload, check to see if the connection already has a pacing rate. If so, allocate a tag that supports both rate limiting and TLS offload rather than a plain TLS offload tag. - When setting an initial rate on an existing ifnet KTLS connection, set the rate in the TCP control block inp and then reset the TLS send tag (via ktls_output_eagain) to reallocate a TLS + ratelimit send tag. This allocates the TLS send tag asynchronously from a task queue, so the TLS rate limit tag alloc is always sleepable. - When modifying a rate on a connection using KTLS, look for a TLS send tag. If the send tag is only a plain TLS send tag, assume we failed to allocate a TLS ratelimit tag (either during the TCP_TXTLS_ENABLE socket option, or during the send tag reset triggered by ktls_output_eagain) and ignore the new rate. If the send tag is a ratelimit TLS send tag, change the rate on the TLS tag and leave the inp tag alone. - Lock the inp lock when setting sb_tls_info for a socket send buffer so that the routines in tcp_ratelimit can safely dereference the pointer without needing to grab the socket buffer lock. - Add an IFCAP_TXTLS_RTLMT capability flag and associated administrative controls in ifconfig(8). TLS rate limit tags are only allocated if this capability is enabled. Note that TLS offload (whether unlimited or rate limited) always requires IFCAP_TXTLS[46]. Reviewed by: gallatin, hselasky Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26691	2022-07-11 11:52:46 +02:00
Andrey V. Elsukov	b8e36b9251	Implement SIOCGIFALIAS. It is lightweight way to check if an IPv4 address exists. Submitted by: Roy Marples Reviewed by: gnn, melifaro MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26636	2022-07-11 11:52:46 +02:00
Richard Scheffenegger	3f0cc70c13	Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow. This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt) option IP(V6)_VLAN_PCP, which can be set to -1 (interface default), or explicitly to any priority between 0 and 7. Note that for untagged traffic, explicitly adding a priority will insert a special 802.1Q vlan header with vlan ID = 0 to carry the priority setting. Reviewed by: gallatin, rrs MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26409	2022-07-11 11:52:46 +02:00
Konstantin Belousov	ec997fae0e	Fix typo. Sponsored by: Mellanox Technologies/NVIDIA Networking MFC after: 3 days	2022-07-11 11:52:46 +02:00
Alexander V. Chernikov	48ba673ce9	Introduce scalable route multipath. This change is based on the nexthop objects landed in D24232. The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection. Similar to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights. With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presence of NHF_MULTIPATH flag. All dataplane lookup functions return pointer to the nexthop object, leaving nexthop groups details inside routing subsystem. User-visible changes: The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1. All routes now come with weight, default weight is 1, maximum is 2^24-1. Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes. Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1 route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20 netstat -6On Nexthop groups data Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2 Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449	2022-07-11 11:52:46 +02:00
Ed Maste	9dd91a8330	add SIOCGIFDATA ioctl For interfaces that do not support SIOCGIFMEDIA (for which there are quite a few) the only fallback is to query the interface for if_data->ifi_link_state. While it's possible to get at if_data for an interface via getifaddrs(3) or sysctl, both are heavy weight mechanisms. SIOCGIFDATA is a simple ioctl to retrieve this fast with very little resource use in comparison. This implementation mirrors that of other similar ioctls in FreeBSD. Submitted by: Roy Marples <roy@marples.name> Reviewed by: markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D26538	2022-07-11 11:52:46 +02:00
Richard Scheffenegger	7b30b9f648	TCP: send full initial window when timestamps are in use The fastpath in tcp_output tries to send out full segments, and avoid sending partial segments by comparing against the static t_maxseg variable. That value does not consider tcp options like timestamps, while the initial window calculation is using the correct dynamic tcp_maxseg() function. Due to this interaction, the last, full size segment is considered too short and not sent out immediately. Reviewed by: tuexen MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26478	2022-07-11 11:52:46 +02:00
Navdeep Parhar	43e76bafcd	Add two new ifnet capabilities for hw checksumming and TSO for VXLAN traffic. These are similar to the existing VLAN capabilities. Reviewed by: kib@ Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873	2022-07-11 11:52:46 +02:00
Konstantin Belousov	1306ff4c92	Support for userspace non-transparent superpages (largepages). Created with shm_open2(SHM_LARGEPAGE) and then configured with FIOSSHMLPGCNF ioctl, largepages posix shared memory objects guarantee that all userspace mappings of it are served by superpage non-managed mappings. Only amd64 for now, both 2M and 1G superpages can be requested, the latter requires CPU feature. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24652	2022-07-11 11:52:46 +02:00
Mark Johnston	1a5f14a0c5	Include the psind in data returned by mincore(2). Currently we use a single bit to indicate whether the virtual page is part of a superpage. To support a forthcoming implementation of non-transparent 1GB superpages, it is useful to provide more detailed information about large page sizes. The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind) values, indicating a mapping of size psind, where psind is an index into the pagesizes array returned by getpagesizes(3), which in turn comes from the hw.pagesizes sysctl. MINCORE_PSIND(1) is equal to the old value of MINCORE_SUPER. For now, two bits are used to record the page size, permitting values of MAXPAGESIZES up to 4. Reviewed by: alc, kib Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26238	2022-07-11 11:52:46 +02:00
Mateusz Guzik	d066d123f1	sys: clean up empty lines in .c and .h files	2022-07-11 11:52:46 +02:00
Mateusz Guzik	27fc846731	net: clean up empty lines in .c and .h files	2022-07-11 11:52:46 +02:00
Konstantin Belousov	941cda2c16	Add SOL_LOCAL symbolic constant for unix socket option level. The constant seems to exist on MacOS X >= 10.8. Requested by: swills Reviewed by: allanjude, kevans Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D25933	2022-07-11 11:52:46 +02:00
Kyle Evans	c95c267a46	shm_open2: Implement SHM_GROW_ON_WRITE Lack of SHM_GROW_ON_WRITE is actively breaking Python's memfd_create tests, so go ahead and implement it. A future change will make memfd_create always set SHM_GROW_ON_WRITE, to match Linux behavior and unbreak Python's tests on -CURRENT. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D25502	2022-07-11 11:52:46 +02:00
Wei Hu	1a840361e8	HyperV socket implementation for FreeBSD This change adds Hyper-V socket feature in FreeBSD. New socket address family AF_HYPERV and its kernel support are added. Submitted by: Wei Hu <weh@microsoft.com> Reviewed by: Dexuan Cui <decui@microsoft.com> Relnotes: yes Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D24061	2022-07-11 11:52:46 +02:00
John Baldwin	7293d1e7b6	Initial support for kernel offload of TLS receive. - Add a new TCP_RXTLS_ENABLE socket option to set the encryption and authentication algorithms and keys as well as the initial sequence number. - When reading from a socket using KTLS receive, applications must use recvmsg(). Each successful call to recvmsg() will return a single TLS record. A new TCP control message, TLS_GET_RECORD, will contain the TLS record header of the decrypted record. The regular message buffer passed to recvmsg() will receive the decrypted payload. This is similar to the interface used by Linux's KTLS RX except that Linux does not return the full TLS header in the control message. - Add plumbing to the TOE KTLS interface to request either transmit or receive KTLS sessions. - When a socket is using receive KTLS, redirect reads from soreceive_stream() into soreceive_generic(). - Note that this interface is currently only defined for TLS 1.1 and 1.2, though I believe we will be able to reuse the same interface and structures for 1.3.	2022-07-11 11:52:46 +02:00
Randall Stewart	1da65b8919	This change does a small preparatory step in getting the latest rack and bbr in from the NF repo. When those come in the OOB data handling will be fixed where Syzkaller crashes. Differential Revision: https://reviews.freebsd.org/D24575	2022-07-11 11:52:46 +02:00
Alexander V. Chernikov	b948693357	Convert route caching to nexthop caching. This change is built on top of nexthop objects introduced in r359823. Nexthops are separate datastructures, containing all necessary information to perform packet forwarding such as gateway interface and mtu. Nexthops are shared among the routes, providing more pre-computed cache-efficient data while requiring less memory. Splitting the LPM code and the attached data solves multiple long-standing problems in the routing layer, drastically reduces the coupling with other parts of the stack and allows to transparently introduce faster lookup algorithms. Route caching was (re)introduced to minimise (slow) routing lookups, allowing for notably better performance for large TCP senders. Caching works by acquiring rtentry reference, which is protected by per-rtentry mutex. If the routing table is changed (checked by comparing the rtable generation id) or link goes down, cache record gets withdrawn. Nexthops have the same reference counting interface, backed by refcount(9). This change merely replaces rtentry with the actual forwarding nexthop as a cached object, which is mostly mechanical. Other moving parts like cache cleanup on rtable change remain the same. Differential Revision: https://reviews.freebsd.org/D24340	2022-07-11 11:52:46 +02:00
Jonathan T. Looney	09e5cb57a0	Make the path length of UNIX domain sockets specified by a #define. Also, add a comment describing the historical context for this length. Reviewed by: bz, jhb, kbowling (previous version) MFC after: 2 weeks Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24272	2022-07-11 11:52:46 +02:00
Alexander V. Chernikov	86484e84d7	Introduce nexthop objects and new routing KPI. This is the foundational change for the routing subsystem rearchitecture. More details and goals are available in https://reviews.freebsd.org/D24141 . This patch introduces concept of nexthop objects and new nexthop-based routing KPI. Nexthops are objects, containing all necessary information for performing the packet output decision. Output interface, mtu, flags, gw address goes there. For most of the cases, these objects will serve the same role as the struct rtentry is currently serving. Typically there will be low tens of such objects for the router even with multiple BGP full-views, as these objects will be shared between routing entries. This allows to store more information in the nexthop. New KPI: struct nhop_object fib4_lookup(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, uint32_t flowid); struct nhop_object fib6_lookup(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, uint32_t flowid); These 2 functions are intended to replace all flavours of <in_\|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous fib[46]-generation functions. Upon successful lookup, they return nexthop object which is guaranteed to exist within current NET_EPOCH. If longer lifetime is desired, one can specify NHR_REF as a flag and get a referenced version of the nexthop. Reference semantic closely resembles rtentry one, allowing sed-style conversion. Additionally, another 2 functions are introduced to support uRPF functionality inside variety of our firewalls. 
Their primary goal is to hide the multipath implementation details inside the routing subsystem, greatly simplifying firewalls implementation: int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); All functions have a separate scopeid argument, paving way to eliminating IPv6 scope embedding and allowing to support IPv4 link-locals in the future. Structure changes: * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size. * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz. Old KPI: During the transition state old and new KPI will coexist. As there are another 4-5 decent-sized conversion patches, it will probably take a couple of weeks. To support both KPIs, fields not required by the new KPI (most of rtentry) have to be kept, resulting in the temporary size increase. Once conversion is finished, rtentry will notably shrink. More details: * architectural overview: https://reviews.freebsd.org/D24141 * list of the next changes: https://reviews.freebsd.org/D24232 Reviewed by: ae,glebius(initial version) Differential Revision: https://reviews.freebsd.org/D24232	2022-07-11 11:52:46 +02:00
Gleb Smirnoff	f3303cf1d5	Although most of the NIC drivers are epoch ready, due to peer pressure switch over to opt-in instead of opt-out for epoch. Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch when processing its packets. Now this will create recursive entrance in epoch in >90% network drivers, but will guarantee safeness of the transition. Mark several tested drivers as IFF_KNOWSEPOCH. Reviewed by: hselasky, jeff, bz, gallatin Differential Revision: https://reviews.freebsd.org/D23674	2022-07-11 11:52:46 +02:00
Randall Stewart	0dfcaa0211	White space cleanup -- remove trailing tab's or spaces from any line. Sponsored by: Netflix Inc.	2022-07-11 11:52:46 +02:00
Gleb Smirnoff	301991542a	Introduce flag IFF_NEEDSEPOCH that marks Ethernet interfaces that supposedly may call into ether_input() without network epoch. They all need to be reviewed before 13.0-RELEASE. Some may need to be fixed. The flag is not planned to be used in the kernel for a long time.	2022-07-11 11:52:46 +02:00
Michael Tuexen	ebbb6536b7	Add flags for upcoming patches related to improved ECN handling. No functional change. Submitted by: Richard Scheffenegger Reviewed by: rgrimes@, tuexen@ Differential Revision: https://reviews.freebsd.org/D22429	2022-07-11 11:52:46 +02:00
Edward Tomasz Napierala	0c4d87ca5f	Make use of the stats(3) framework in the TCP stack. This makes it possible to retrieve per-connection statistical information such as the receive window size, RTT, or goodput, using a newly added TCP_STATS getsockopt(3) option, and extract them using the stats_voistat_fetch(3) API. See the net/tcprtt port for an example consumer of this API. Compared to the existing TCP_INFO system, the main differences are that this mechanism is easy to extend without breaking ABI, and provides statistical information instead of raw "snapshots" of values at a given point in time. stats(3) is more generic and can be used in both userland and the kernel. Reviewed by: thj Tested by: thj Obtained from: Netflix Relnotes: yes Sponsored by: Klara Inc, Netflix Differential Revision: https://reviews.freebsd.org/D20655	2022-07-11 11:52:46 +02:00
David Bright	0c854dd6d1	Jail and capability mode for shm_rename; add audit support for shm_rename Co-mingling two things here: * Addressing some feedback from Konstantin and Kyle re: jail, capability mode, and a few other things * Adding audit support as promised. The audit support change includes a partial refresh of OpenBSM from upstream, where the change to add shm_rename has already been accepted. Matthew doesn't plan to work on refreshing anything else to support audit for those new event types. Submitted by: Matthew Bryan <matthew.bryan@isilon.com> Reviewed by: kib Relnotes: Yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22083	2022-07-11 11:52:46 +02:00
John Baldwin	12fb531a70	Add a TOE KTLS mode and a TOE hook for allocating TLS sessions. This adds the glue to allocate TLS sessions and invokes it from the TLS enable socket option handler. This also adds some counters for active TOE sessions. The TOE KTLS mode is returned by getsockopt(TLSTX_TLS_MODE) when TOE KTLS is in use on a socket, but cannot be set via setsockopt(). To simplify various checks, a TLS session now includes an explicit 'mode' member set to the value returned by TLSTX_TLS_MODE. Various places that used to check 'sw_encrypt' against NULL to determine software vs ifnet (NIC) TLS now check 'mode' instead. Reviewed by: np, gallatin Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21891	2022-07-11 11:52:46 +02:00
Kyle Evans	1ef7e3904d	MFD_*: swap ordering This API is still young enough that I would expect no one to be dependent on this yet... Swap the ordering while it's young to match Linux values to potentially ease implementation of linuxolator syscall, being able to reuse existing constants.	2022-07-11 11:52:46 +02:00
David Bright	53648039c4	Add an shm_rename syscall Add an atomic shm rename operation, similar in spirit to a file rename. Atomically unlink an shm from a source path and link it to a destination path. If an existing shm is linked at the destination path, unlink it as part of the same atomic operation. The caller needs the same permissions as shm_unlink to the shm being renamed, and the same permissions for the shm at the destination which is being unlinked, if it exists. If those fail, EACCES is returned, as with the other shm_* syscalls. truss support is included; audit support will come later. This commit includes only the implementation; the sysent-generated bits will come in a follow-on commit. Submitted by: Matthew Bryan <matthew.bryan@isilon.com> Reviewed by: jilles (earlier revision) Reviewed by: brueffer (manpages, earlier revision) Relnotes: yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21423	2022-07-11 11:52:46 +02:00
Kyle Evans	9243caa8d3	Add linux-compatible memfd_create memfd_create is effectively a SHM_ANON shm_open(2) mapping with optional CLOEXEC and file sealing support. This is used by some mesa parts, some linux libs, and qemu can also take advantage of it and uses the sealing to prevent resizing the region. This reimplements shm_open in terms of shm_open2(2) at the same time. shm_open(2) will be moved to COMPAT12 shortly. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D21393	2022-07-11 11:52:46 +02:00
Kyle Evans	99b66f5315	Add a shm_open2 syscall to support upcoming memfd_create shm_open2 allows a little more flexibility than the original shm_open. shm_open2 doesn't enforce CLOEXEC on its callers, and it has a separate shmflag argument that can be expanded later. Currently the only shmflag is to allow file sealing on the returned fd. shm_open and memfd_create will both be implemented in libc to use this new syscall. __FreeBSD_version is bumped to indicate the presence. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D21393	2022-07-11 11:52:46 +02:00
Randall Stewart	878b65b3b6	This commit adds BBR (Bottleneck Bandwidth and RTT) congestion control. This is a completely separate TCP stack (tcp_bbr.ko) that will be built only if you add the make options WITH_EXTRA_TCP_STACKS=1 and also include the option TCPHPTS. You can also include the RATELIMIT option if you have a NIC interface that supports hardware pacing, BBR understands how to use such a feature. Note that this commit also adds in a general purpose time-filter which allows you to have a min-filter or max-filter. A filter allows you to have a low (or high) value for some period of time and degrade slowly to another value as time passes. You can find out the details of BBR by looking at the original paper at: https://queue.acm.org/detail.cfm?id=3022184 or consult many other web resources you can find on the web referenced by "BBR congestion control". It should be noted that BBRv1 (which this is) does tend to unfairness in cases of small buffered paths, and it will usually get less bandwidth in the case of large BDP paths (when competing with new-reno or cubic flows). BBR is still an active research area and we do plan on implementing V2 of BBR to see if it is an improvement over V1. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D21582	2022-07-11 11:52:46 +02:00
Alan Somers	ce921ffca8	Reduce namespace pollution from r349233 Define __daddr_t in _types.h and use it in filio.h Reported by: ian, bde Reviewed by: ian, imp, cem MFC after: 2 weeks MFC-With: 349233 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20715	2022-07-11 11:52:46 +02:00
Alan Somers	5a6ad7c5bc	#include <sys/types.h> from sys/filio.h This fixes world build after r349231 Reported by: Jenkins MFC after: 2 weeks MFC-With: 349231 Sponsored by: The FreeBSD Foundation	2022-07-11 11:52:46 +02:00
Alan Somers	8fe49db783	Add FIOBMAP2 ioctl This ioctl exposes VOP_BMAP information to userland. It can be used by programs like fragmentation analyzers and optimized cp implementations. But I'm using it to test fusefs's VOP_BMAP implementation. The "2" in the name distinguishes it from the similar but incompatible FIBMAP ioctls in NetBSD and Linux. FIOBMAP2 differs from FIBMAP in that it uses a 64-bit block number instead of 32-bit, and it also returns runp and runb. Reviewed by: mckusick MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20705	2022-07-11 11:52:46 +02:00
Brooks Davis	c42aaaea4f	Move 32-bit compat support for FIODGNAME to the right place. ioctl(2) commands only have meaning in the context of a file descriptor so translating them in the syscall layer is incorrect. The new handler uses an accessor to retrieve/construct a pointer from the last member of the passed structure and relies on type punning to access the other member which requires no translation. Unlike r339174 this change supports both places FIODGNAME is handled. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17475	2022-07-11 11:52:46 +02:00
Pedro F. Giffuni	eb4cbf4fd3	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2022-07-11 11:52:46 +02:00
Sebastian Huber	5c0c0e5c77	RTEMS: Remove FreeBSD version tags	2022-07-11 11:52:46 +02:00
Warner Losh	9331071f02	cdefs.h: Remove redundant #ifdefs Remove redundant #ifdef __GNUC__ inside an #if defined(__GNUC__) block. They are nops. Sponsored by: Netflix	2022-07-11 11:52:46 +02:00
Mark Johnston	f537ff8ee5	cdefs: Add a default definition for __nosanitizememory MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-07-11 11:52:46 +02:00
Mark Johnston	8801440e4f	cdefs: Make __nosanitizeaddress work for KASAN as well Add __nosanitizememory while I'm here. Reviewed by: andrew, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30126	2022-07-11 11:52:46 +02:00
Warner Losh	65338d7299	cdefs.h: Remove __GNUCLIKE___OFFSETOF, it's unused __GNUCLIKE___OFFSETOF is unreferenced in the tree, remove it as long obsolete. Sponsored by: Netflix	2022-07-11 11:52:46 +02:00
Alex Richardson	68109f904b	Expose clang's alignment builtins and use them for roundup2/rounddown2 This makes roundup2/rounddown2 type- and const-preserving and allows using it on pointer types without casting to uintptr_t first. Not performing pointer-to-integer conversions also helps the compiler's optimization passes and can therefore result in better code generation. When using it with integer values there should be no change other than the compiler checking that the alignment value is a valid power-of-two. I originally implemented these builtins for CHERI a few years ago and they have been very useful for CheriBSD. However, they are also useful for non-CHERI code so I was able to upstream them for Clang 10.0. Rationale from the clang documentation: Clang provides builtins to support checking and adjusting alignment of pointers and integers. These builtins can be used to avoid relying on implementation-defined behavior of arithmetic on integers derived from pointers. Additionally, these builtins retain type information and, unlike bitwise arithmetic, they can perform semantic checking on the alignment value. There is also a feature request for GCC, so GCC may also support it in the future: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98641 Reviewed By: brooks, jhb, imp Differential Revision: https://reviews.freebsd.org/D28332	2022-07-11 11:52:46 +02:00
Warner Losh	c382ecde55	cdefs.h: remove intel_compiler support The age of the intel compiler support is so old as to be uninteresting. No recent reports of intel compiler support have been received. Remove all the special case workarounds for the Intel compiler. Should there be interest in supporting the compiler, contact me and I'll work with people to make it happen, though I suspect these instances are more likely to be in the way than to be helpful. Reviewed by: cem, emaste, vangyzen, dim Differential Revision: https://reviews.freebsd.org/D26817	2022-07-11 11:52:46 +02:00
Sebastian Huber	282d57d2a8	Reduce namespace pollution from <sys/_types.h> Provide only __daddr_t through <sys/_types.h>.	2022-07-08 06:57:52 +02:00
Sebastian Huber	4ab39e0a85	RTEMS: Declare ioctl() also if _KERNEL is defined This fixes the following warning in libbsd: rtems/blkdev.h:200:10: warning: implicit declaration of function 'ioctl'; did you mean 'ifioctl'? [-Wimplicit-function-declaration] Remove unnecessary includes.	2022-07-08 06:57:52 +02:00
Ken Brown	5d4f405d3b	Cygwin: redefine some macros for Linux compatibility Define FD_SETSIZE (<sys/select.h>) to be 1024 by default, and define NOFILE (<sys/param.h>) to be OPEN_MAX (== 3200) by default. Remove the comment in <sys/select.h> that FD_SETSIZE should be >= NOFILE. Bump API minor. Addresses: https://cygwin.com/pipermail/cygwin/2022-July/251839.html	2022-07-07 08:22:40 -04:00
Ken Brown	1503d14af1	Cygwin: stdio: don't try again to read after EOF This reverts commit `1f8f7e2d54`, "* libc/stdio/refill.c (__srefill): Try again after EOF on Cygwin." If EOF is set on a file, the stdio input functions will now immediately return EOF rather than trying again to read. This aligns Cygwin's behavior to that of Linux. Addresses: https://cygwin.com/pipermail/cygwin/2022-June/251672.html	2022-07-04 18:55:08 -04:00
Sebastian Huber	27fd806cd7	RTEMS: _KERNEL tweak for <sys/cpuset.h> If _KERNEL is defined, then do not declare CPU_ALLOC() and CPU_FREE() since __cpuset_alloc() and __cpuset_free() are not declared as well.	2022-07-01 07:25:32 +02:00
Stefan Eßer	e7ffbdb018	newlib/libc/sys/rtems/include/sys/cpuset.h Fix typo in source file. Reported by: pluknet at gmail.com (Sergey Kandaurov)	2022-06-22 10:28:42 +02:00
Stefan Eßer	e927f541f7	Make CPU_SET macros compliant with other implementations The introduction of <sched.h> improved compatibility with some 3rd party software, but caused the configure scripts of some ports to assume that they were run in a GLIBC compatible environment. Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being added to ports, but there still were compatibility issues due to invalid assumptions made in autoconfigure scripts. The difference between the FreeBSD version of macros like CPU_AND, CPU_OR, etc. and the GLIBC versions was in the number of arguments: FreeBSD used a 2-address scheme (one source argument is also used as the destination of the operation), while GLIBC uses a 3-address scheme (2 source operands and a separately passed destination). The GLIBC scheme provides a super-set of the functionality of the FreeBSD macros, since it does not prevent passing the same variable as source and destination arguments. In code that wanted to preserve both source arguments, the FreeBSD macros required a temporary copy of one of the source arguments. This patch set allows to unconditionally provide functions and macros expected by 3rd party software written for GLIBC based systems, but breaks builds of externally maintained sources that use any of the following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR. One contributed driver (contrib/ofed/libmlx5) has been patched to support both the old and the new CPU_OR signatures. If this commit is merged to -STABLE, the version test will have to be extended to cover more ranges. Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT no longer require that option. The FreeBSD version has been bumped to 1400046 to reflect this incompatible change. Reviewed by: kib MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D33451	2022-06-22 10:28:42 +02:00
Stefan Eßer	6af6e29552	sys/_bitset.h: Fix fall-out from commit 5e04571cf3c The changes to the bitset macros allowed sched.h to be included into userland programs without name space pollution due to BIT_* and BITSET_* macros. The definition of a "struct bitset" had been overlooked. This name space pollution caused the build of port print/miktex to fail. This commit makes the definition of struct bitset depend on the same condition as the visibility of the BIT_* and BITSET_* macros, i.e. needs _KERNEL or _WANT_FREEBSD_BITSET to be defined before including sys/_bitset.h. It has been tested with "make universe" since a prior attempt to fix the issue broke the PowerPC64 kernel build. This commit shall be MFCed together with commit 5e04571cf3c. Reported by: arrowd MFC after: 1 month	2022-06-22 10:15:26 +02:00
Stefan Eßer	3af17aef2b	sys/_bitset.h: revert commit 74e014dbfab It caused kernel build for PowerPC64 to fail. A different patch is being tested with make universe to make sure it works on all architectures. MFC after: 1 month	2022-06-22 10:15:26 +02:00
Stefan Eßer	c78c56c06d	sys/_bitset.h: Fix fall-out from commit 5e04571cf3c There is a reference to malloc() in #define __BITSET_ALLOC. Even though this macro is only defined but not used, it causes the lang/gcc ports to fail. The gcc ports "poison" a number of functions including malloc() and prevent their use (including in macro definitions). This commit moved the declaration of __BITSET_ALLOC into the conditional block that depends on _KERNEL or _WANT_FREEBSD_BITSET being defined. There is no use of __BITSET_ALLOC in the FreeBSD sources, and userland programs that want to use BITSET_ALLOC will define _WANT_FREEBSD_BITSET anyway. This patch has been tested by building lang/gcc11 and a successful make buildworld. This commit shall be MFCed together with commit 5e04571cf3c. MFC after: 1 month	2022-06-22 10:15:26 +02:00
Konstantin Belousov	2f6651097e	sys/_bitset.h: Fix fall-out from commit 5e04571cf3c The changes to the bitset macros allowed sched.h to be included into userland programs without name space pollution due to BIT_* and BITSET_* macros. The definition of a global variable "bitset" had been overlooked. This name space pollution caused a compile failure in print/miktex. This commit renames the bitset variable to __bitset with the same mapping back to the bitset if _KERNEL or _WANT_FREEBSD_BITSET is defined. This fix has been suggested by kib. It has been tested to let the build of the print/miktex port succeed and to not break buildworld. This commit shall be MFCed together with commit 5e04571cf3c. Reported by: arrowd MFC after: 1 month	2022-06-22 10:15:26 +02:00
Stefan Eßer	a1071cb178	sys/bitset.h: reduce visibility of BIT_* macros Add two underscore characters "__" to names of BIT_* and BITSET_* macros to move them to the implementation name space and to prevent a name space pollution due to BIT_* macros in 3rd party programs with conflicting parameter signatures. These prefixed macro names are used in kernel header files to define macros in e.g. sched.h, sys/cpuset.h and sys/domainset.h. If C programs are built with either -D_KERNEL (automatically passed when building a kernel or kernel modules) or -D_WANT_FREEBSD_BITSET (or this macro is defined in the source code before including the bitset macros), then all macros are made visible with their previous names, too. E.g., both __BIT_SET() and BIT_SET() are visible with either of _KERNEL or _WANT_FREEBSD_BITSET defined. The main reason for this change is that some 3rd party sources including sched.h have been found to contain conflicting BIT_* macros. As a work-around, parts of sched.h have been made conditional and depend on _WITH_CPU_SET_T being set when sched.h is included. Ports that expect the full functionality provided by sched.h need to be built with -D_WITH_CPU_SET_T. But this leads to conflicts if BIT_* macros are defined in that program, too. This patch set makes all of sched.h visible again without this parameter being passed and without any name space pollution due to BIT_* macros becoming visible when sched.h is included. This patch set will be backported to the STABLE branches, but ports will need to use -D_WITH_CPU_SET_T as long as there are supported releases that do not contain these patches. Reviewed by: kib, markj MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D33235	2022-06-22 10:15:26 +02:00
Konstantin Belousov	4ac3ee88c7	sched.h: add CPU_EQUAL() for better compatibility with Linux Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32901	2022-06-22 10:15:26 +02:00
Mark Johnston	6070714e0f	cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it This implementation is faster and doesn't modify the cpuset, so it lets us avoid some unnecessary copying as well. No functional change intended. This is a re-application of commit 9068f6ea697b1b28ad1326a4c7a9ba86f08b985e. Reviewed by: cem, kib, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32029	2022-06-22 10:15:26 +02:00
Mark Johnston	27e8401c46	bitset: Reimplement BIT_FOREACH_IS(SET|CLR) Eliminate the nested loops and re-implement following a suggestion from rlibby. Add some simple regression tests. Reviewed by: rlibby, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32472	2022-06-22 10:15:26 +02:00
Mark Johnston	96ddb4055e	Revert "cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it" This reverts commit 9068f6ea697b1b28ad1326a4c7a9ba86f08b985e. The underlying macro needs to be reworked to avoid problems with control flow statements. Reported by: rlibby	2022-06-22 10:15:26 +02:00
Mark Johnston	112245b78f	cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it This implementation is faster and doesn't modify the cpuset, so it lets us avoid some unnecessary copying as well. No functional change intended. Reviewed by: cem, kib, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32029	2022-06-22 10:15:26 +02:00
Mark Johnston	37a3e59636	bitset(9): Introduce BIT_FOREACH_ISSET and BIT_FOREACH_ISCLR These allow one to non-destructively iterate over the set or clear bits in a bitset. The motivation is that we have several code fragments which iterate over a CPU set like this: while ((cpu = CPU_FFS(&cpus)) != 0) { cpu--; CPU_CLR(cpu, &cpus); <do something>; } This is slow since CPU_FFS begins the search at the beginning of the bitset each time. On amd64 and arm64, CPU sets have size 256, so there are four limbs in the bitset and we do a lot of unnecessary scanning. A second problem is that this is destructive, so code which needs to preserve the original set has to make a copy. In particular, we have quite a few functions which take a cpuset_t parameter by value, meaning that each call has to copy the 32 byte cpuset_t. The new macros address both problems. Reviewed by: cem, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32028	2022-06-22 10:15:26 +02:00
Patrick Kelsey	7c03cdf47e	iflib: Improve mapping of TX/RX queues to CPUs iflib now supports mapping each (TX,RX) queue pair to the same CPU (default), to separate CPUs, or to a pair of physical and logical CPUs that share the same L2 cache. The mapping mechanism supports unequal numbers of TX and RX queues, with the excess queues always being mapped to consecutive physical CPUs. When the platform cannot distinguish between physical and logical CPUs, all are treated as physical CPUs. See the comment on get_cpuid_for_queue() for the entire matrix. The following device-specific tunables influence the mapping process: dev.<device>.<unit>.iflib.core_offset (existing) dev.<device>.<unit>.iflib.separate_txrx (existing) dev.<device>.<unit>.iflib.use_logical_cores (new) The following new, read-only sysctls provide visibility of the mapping results: dev.<device>.<unit>.iflib.{t,r}xq<n>.cpu When an iflib driver allocates TX softirqs without providing reference RX IRQs, iflib now binds those TX softirqs to CPUs using the above mapping mechanism (that is, treats them as if they were TX IRQs). Previously, such bindings were left up to the grouptaskqueue code and thus fell outside of the iflib CPU mapping strategy. Reviewed by: kbowling Tested by: olivier, pkelsey MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D24094	2022-06-22 10:15:26 +02:00
Ryan Libby	fb0a5865e4	bitset: implement BIT_TEST_CLR_ATOMIC & BIT_TEST_SET_ATOMIC That is, provide wrappers around the atomic_testandclear and atomic_testandset primitives. Submitted by: jeff Reviewed by: cem, kib, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22702	2022-06-22 10:15:26 +02:00
D Scott Phillips	9d50b44689	bitset: expand bit index type to `long` An upcoming patch to use the bitset macros for tracking vm page dump information could conceivably need more than INT_MAX bits. Expand the bit type to long so that the extra range is available on 64-bit platforms where it would most likely be needed. CPUSET_COUNT and DOMAINSET_COUNT are also modified to remain of type `int`. Reviewed by: kib, markj Approved by: scottl (implicit) MFC after: 1 week Sponsored by: Ampere Computing, Inc. Differential Revision: https://reviews.freebsd.org/D26190	2022-06-22 10:15:26 +02:00
D Scott Phillips	4c5b7bec97	bitset: add BIT_FFS_AT() for finding the first bit set greater than a start bit Reviewed by: kib Approved by: scottl (implicit) MFC after: 1 week Sponsored by: Ampere Computing, Inc. Differential Revision: https://reviews.freebsd.org/D26128	2022-06-22 10:15:26 +02:00
Konstantin Belousov	5e7a2b174a	Fix undefined behavior: left-shifting into the sign bit. Reviewed by: dim, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22898	2022-06-22 10:15:26 +02:00

3224 Commits