FreeBSD uses a 64-bit ino_t since 2017-05-23. We need this for the
pipe() support in libbsd.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Various structures exported by sysctl_rtsock() contain padding fields
which were not being zeroed.
Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: ae
MFC after: 3 days
Security: kernel memory disclosure
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18333
The threshold value at which powf overflows depends on the rounding mode
and the current check did not take this into account. So when the result
was rounded away from zero it could become infinity without setting
errno to ERANGE.
Example: pow(0x1.7ac7cp+5, 23) is 0x1.fffffep+127 + 0.1633ulp
If the result goes above 0x1.fffffep+127 + 0.5ulp then errno is set,
which is fine in nearest rounding mode, but
powf(0x1.7ac7cp+5, 23) is inf in upward rounding mode
powf(-0x1.7ac7cp+5, 23) is -inf in downward rounding mode
and the previous implementation did not set errno in these cases.
The fix tries to avoid affecting the common code path or calling a
function that may introduce a stack frame, so float arithmetics is used
to check the rounding mode and the threshold is selected accordingly.
for {n,u,m}stosbt
Integer overflows and wrong constants limited the accuracy of these
functions and created situatiosn where sbttoXs(Xstosbt(Y)) != Y. This
was especailly true in the ns case where we had millions of values
that were wrong.
Instead, used fixed constants because there's no way to say ceil(X)
for integer math. Document what these crazy constants are.
Also, use a shift one fewer left to avoid integer overflow causing
incorrect results, and adjust the equasion accordingly. Document this.
Allow times >= 1s to be well defined for these conversion functions
(at least the Xstosbt). There's too many users in the tree that they
work for >= 1s.
This fixes a failure on boot to program firmware on the mlx4
NIC. There was a msleep(1000) in the code. Prior to my recent rounding
changes, msleep(1000) worked, but msleep(1001) did not because the old
code rounded to just below 2^64 and the new code rounds to just above
it (overflowing, causing the msleep(1000) to really sleep 1ms).
A test program to test all cases will be committed shortly. The test
exaustively tries every value (thanks to bde for the test).
Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D18051
of the result rather than the floor(). Returning the floor means that
sbttoX(Xtosbt(y)) != y for almost all values of y. In practice, this
results in a difference of at most 1 in the lsb of the sbintime_t. This
difference is meaningless for all current users of these functions, but
is important for the newly introduced sysctl conversion routines which
implicitly rely on the transformation being idempotent.
Sponsored by: Netflix, Inc
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
and decimal time units. Use them in some existing code that is
vulnerable to roundoff errors.
The existing constant SBT_1NS is a honeypot, luring unsuspecting folks into
writing code such as long_timeout_ns*SBT_1NS to generate the argument for a
sleep call. The actual value of 1ns in sbt units is ~4.3, leading to a
large roundoff error giving a shorter sleep than expected when multiplying
by the trucated value of 4 in SBT_1NS. (The evil honeypot aspect becomes
clear after you waste a whole day figuring out why your sleeps return early.)
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
- Add CLOCK_REALTIME_COARSE, CLOCK_MONOTONIC_RAW,
CLOCK_MONOTONIC_COARSE and CLOCK_BOOTTIME
- Guard new values with __GNU_VISIBLE
- Add CLOCK_REALTIME_COARSE as (clockid_t) 0 for simplicity
(It allows to have all values < 8 and so be used as array
index into an array of clocks)
- Fix macro bracketing
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
A previous commit introduced the ability to use the semi-hosting
SYS_EXIT_EXTENDED operation to libgloss, this commit adds the same
ability to the sys/arm/ backend so that building newlib only will
provide the same capabilities.
This patch fixes an issue in the previous memset loop change. If the
zva size is >= 256 and there are more than 64 bytes left in the
tail, we could enter the loop and thus need to rebias dst by 32 as
well.
Since no known CPUs use this size this can't be tested natively, so I've
tested it on a simulator initialized with a large zva size.
--
Do not define __ATTRIBUTE_IMPURE_PTR__ for RTMES on the v850 target.
The previous definition lead to the following linker error in
combination with -fdata-sections:
relocation truncated to fit: R_V810_GPWLO_1 against symbol
`_global_impure_ptr' defined in .rodata._global_impure_ptr section in
libc.a(lib_a-impure.o)
relocation truncated to fit: R_V810_GPWLO_1 against symbol `_impure_ptr'
defined in .data._impure_ptr section in libc.a(lib_a-impure.o)
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
The <machine/param.h> header file exposes some unrelated stuff not
covered by C or POSIX. Avoid its use in <sys/_cpuset.h> since it is
included in <rtems.h>.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
This fixes an ineffiency in the non-zero memset. Delaying the writeback
until the end of the loop is slightly faster on some cores - this shows
~5% performance gain on Cortex-A53 when doing large non-zero memsets.
Tested against the GLIBC testsuite.
The new GCC port for OpenRISC will use the init_fini_array only and not
provide the init() and fini() functions. Disable the function usage by
default as its no longer needed.
Signed-off-by: Stafford Horne <shorne@gmail.com>
The malloc, alloc_size and alloc_aligned attributes must be only used in
case the function returns the pointer to the allocated memory.
See also:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87683
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
The following FreeBSD kernel methods are not in any standard and
prototypes/definitions were leaking into application space:
+ round_page()
+ trunc_page()
+ atop()
+ ptoa()
+ pgtok()
v3: Add support for read ahead using strnlen, giving an additional 25% speedup
on large inputs (both short and long needles).
This patch significantly improves performance of strstr by using Sunday's
Quick-Search algorithm. Due to its simplicity it has the best average
performance of string matching algorithms on almost all inputs. It uses a
bad-character shift table to skip past mismatches.
The needle length is limited to 254 - this reduces the shift table memory
4 to 8 times, lowering preprocessing overhead and minimizing cache effects.
The limit also implies its worst-case performance is linear.
Larger needles are processed by the Two-Way algorithm. The macro AVAILABLE
has been improved to use strnlen to read the input in chunks. This results
in a 2.5 times speedup for large needles, reducing the performance drop when
the Quick-Search algorithm can't be used.
The code for 1-4 byte needles has been simplified and now uses unsigned
char. Since the optimized code relies on 8-bit chars, we defer to the
size-optimized implementation if CHAR_BIT > 8.
The performance gain of finding a set of randomly chosen words of size 8 in
256 bytes of English text is 14 times on AArch64. For longer haystacks the
gain is well over 20 times.
The size-optimized strstr has also been rewritten from scratch to improve
performance. On the same test the performance gain is 69%.
Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
(https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).
--
Use existing HAVE_OPENDIR define to determine if a generic
implementation should be provided. Cygwin for example has its own
implementation of opendir() and dirfd().
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
This is used by the file system support of libstdc++ for example. Use
content from latest FreeBSD <sys/dirent.h>
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Move common content of the various <sys/dirent.h> and the latest FreeBSD
<dirent.h> to <dirent.h>.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Use O_RDONLY since you are not supposed to write to a directory.
Use O_DIRECTORY as mandated by POSIX (The Open Group Base Specifications
Issue 7, 2018 edition IEEE Std 1003.1-2017):
"If the type DIR is implemented using a file descriptor, the descriptor
shall be obtained as if the O_DIRECTORY flag was passed to open()."
Use O_CLOEXEC as mandated by POSIX:
"When a file descriptor is used to implement the directory stream, it
behaves as if the FD_CLOEXEC had been set for the file descriptor."
Drop the fcntl() call in favour of O_CLOEXEC.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Make the POSIX O_CLOEXEC, O_NOFOLLOW, O_DIRECTORY, O_EXEC, and O_SEARCH
open() flags available also to non-Cygwin systems.
Make the BSD/glibc O_DIRECT open() flag available also to non-Cygwin
systems.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Commit fbace81684
("Import correctly working strtold from David M. Gay.")
introduced two new files, strtorx.c and strtodg.c. The functions
are only called from strtold.c. However, while strtold.c is only
built if HAVE_LONG_DOUBLE is defined, the patch erroneously added
the two new files to GENERAL_SOURCES unconditionally.
Fix this by building both files only if HAVE_LONG_DOUBLE has been
defined.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Commit 6c212a8b78
("Fix strtod ("nan") and strtold ("nan") returns wrong negative NaN")
introduced an unconditional dependency to nanl and, in turn, to libm.
Rather than including nanl in libc as well, just call __builtin_nanl
from here. Requires GCC 3.3 or later.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
Drop Cygwin-specific nanl in favor of a generic implementation
in newlib. Requires GCC 3.3 or later.
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
These attributes help static analysis tools to produce less false
positives, e.g. double free warnings.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
AngelSWI_Reason_ReportException does not return accoring to the ARM
documentation, so it is valid to mark _kill() as noreturn. This way,
the compiler does not warn about _exit() returning a value despite
being noreturn.
2018-10-01 Christophe Lyon <christophe.lyon@linaro.org>
* libgloss/arm/_exit.c (_exit): Declare _kill() as noreturn.
* libgloss/arm/_exit.c (_kill): Likewise. Remove the return
statements.
* newlib/libc/sys/arm/syscalls.c (_kill): Likewise..
While working on the strstr patch I noticed several copyright headers
of the new math functions are missing closing quotes after ``AS IS.
I've added these. Also update spellings of Arm Ltd in a few places
(but still use ARM LTD in upper case portion). Finally add SPDX
identifiers to make everything consistent.
It's been a while... I see the CRIS port broke with the
time_t-default-to-64-bit change, observable by a few test-cases in the gcc
fortran(!) tests failing, regressing when trying a recent newlib.
This is a two-part belt-and-suspenders change: adjust the CRIS port
gettimeofday syscall (the only one in newlib/CRIS passing a time_t or
struct timeval) to handle a userspace 64-bit time_t and secondly default
time_t to 32-bit long anyway. I considered making the local
"kernel_timeval" copy in _gettimeofday conditional on (userspace) time_t
being 64 bits, but thought it not worth bothering with the few move insns.
The effect of a 64-bit time_t is however observable as longer simulation
time when running the gcc testsuite and as bigger binaries without any
actual upside from the larger time_t size, so I thought better make the
default for this port go back to being a "long" again.
Tested by running the gcc testsuite over the three combinations of two
parts of the patch and observing the expected changes. Committed.
newlib:
* configure.host (cris, crisv32): Default to "long" time_t.
Signed-off-by: Hans-Peter Nilsson <hp@axis.com>
hash.h: Use 32-bit type for data stored on disk, so code
works for 16 and 64-bit targets. Reduce maximum bucket size on 16-bit
targets, so it fits in available memory.
hash.c: Check bucket size isn't too big for target.
hash_buf.c: Fix overflow warning on 16-bit targets.
When __HAVE_LOCALE_INFO__ is not selected, directly access the
existing _ctype_ variable from __locale_ctype_ptr() and
__locale_ctype_ptr_l(), eliminating the need for any locale or reent
structure
Signed-off-by: Keith Packard <keithp@keithp.com>
v2:
locale: fix conflict with __locale_ctype_ptr macro
If we are building without __HAVE_LOCALE_INFO__, there is a
macro providing __locale_ctype_ptr which in turn fouls up this
declaration.
Signed-off-by: Michael Lyle <mlyle@lyle.org>
The string/float conversion functions need to get the locale decimal
point. Instead of calling __localeconv_l (which copies locale data
into lconv form from __get_numeric_locale), use __get_numeric_locale
directly.
Signed-off-by: Keith Packard <keithp@keithp.com>
This makes sure any system-defined limits are specified before the
defaults are checked. Without this, ARG_MAX and PATH_MAX end up
getting the default definitions from limits.h rather than the defines
from syslimits.h. This could potentially cause problems when
different files used different values for the same name.
Signed-off-by: Keith Packard <keithp@keithp.com>
Improve strstr performance for the common case of short needles. For a single
character strchr is best, for 2-4 characters a small loop is fastest. For these
the speedup over the Two-Way algorithm is ~10 times on large strings.
Newlib builds, the new code passes GLIBC testsuite. OK for commit?
Issuing an ARM semi-hosting Seek command when just querying file
position with SEEK_CUR and offset zero is unnecessary, because unlike
the lseek() Unix system call the Seek command does not actually return
the file position. For that reason, syscalls.c for ARM keeps track of
file position in the 'poslog', so we can just return that.
Moreover, since the Seek command only accepts an absolute file position,
SEEK_CUR operations are implemented by adding the relative offset to the
position in the poslog. If the host implements non-binary files with
implicit carriage return characters but doesn't discount those implicit
CRs when implementing Seek (by just mapping straight to Windows file
operations), this actually ended up wrongly changing file position when
using SEEK_CUR with offset zero or functions like ftell() or fgetpos()
that are based on that.
Also, use off_t rather than int for the poslog.
Standard headers shouldn't use non-reserved identifiers as parameter
names in function declarations, because programs could in theory
define macros with such names before including a header.
This macro selects a compiler option that disables recognition of
common memset/memcpy patterns and converting those to direct
memset/memcpy calls.
Signed-off-by: Keith Packard <keithp@keithp.com>
These types were introduced by FreeBSD commit:
"Make struct xinpcb and friends word-size independent.
Replace size_t members with ksize_t (uint64_t) and pointer members
(never used as pointers in userspace, but instead as unique
idenitifiers) with kvaddr_t (uint64_t). This makes the structs
identical between 32-bit and 64-bit ABIs.
On 64-bit bit systems, the ABI is maintained. On 32-bit systems,
this is an ABI breaking change. The ABI of most of these structs
was previously broken in r315662. This also imposes a small API
change on userspace consumers who must handle kernel pointers
becoming virtual addresses.
PR: 228301 (exp-run by antoine)
Reviewed by: jtl, kib, rwatson (various versions)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15386"
In RTEMS, there is no user/kernel space separation. So, use the types
size_t and uintptr_t.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
The __XSI_VISIBLE is not enabled by default in Newlib. This is an
incompatiblity between FreeBSD and glibc.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
with name SO_DOMAIN to get the domain of a socket.
This is helpful when testing and Solaris and Linux have the same
socket option using the same name.
Reviewed by: bcr@, rrs@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16791
queues per bucket.
There is a hashing algorithm which should distribute IPv6 reassembly
queues across the available buckets in a relatively even way. However,
if there is a flaw in the hashing algorithm which allows a large number
of IPv6 fragment reassembly queues to end up in a single bucket, a per-
bucket limit could help mitigate the performance impact of this flaw.
Implement such a limit, with a default of twice the maximum number of
reassembly queues divided by the number of buckets. Recalculate the
limit any time the maximum number of reassembly queues changes.
However, allow the user to override the value using a sysctl
(net.inet6.ip6.maxfragbucketsize).
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
The IPv4 fragment reassembly code supports a limit on the number of
fragments per packet. The default limit is currently 17 fragments.
Among other things, this limit serves to limit the number of fragments
the code must parse when trying to reassembly a packet.
Add a limit to the IPv6 reassembly code. By default, limit a packet
to 65 fragments (64 on the queue, plus one final fragment to complete
the packet). This allows an average fragment size of 1,008 bytes, which
should be sufficient to hold a fragment. (Recall that the IPv6 minimum
MTU is 1280 bytes. Therefore, this configuration allows a full-size
IPv6 packet to be fragmented on a link with the minimum MTU and still
carry approximately 272 bytes of headers before the fragmented portion
of the packet.)
Users can adjust this limit using the net.inet6.ip6.maxfragsperpacket
sysctl.
Reviewed by: jhb
Security: FreeBSD-SA-18:10.ip
Security: CVE-2018-6923
Rack includes the following features: - A different SACK processing
scheme (the old sack structures are not used). - RACK (Recent
acknowledgment) where counting dup-acks is no longer done instead time
is used to knwo when to retransmit. (see the I-D) - TLP (Tail Loss
Probe) where we will probe for tail-losses to attempt to try not to take
a retransmit time-out. (see the I-D) - Burst mitigation using TCPHTPS -
PRR (partial rate reduction) see the RFC.
Once built into your kernel, you can select this stack by either
socket option with the name of the stack is "rack" or by setting
the global sysctl so the default is rack.
Note that any connection that does not support SACK will be kicked
back to the "default" base FreeBSD stack (currently known as "default").
To build this into your kernel you will need to enable in your
kernel:
makeoptions WITH_EXTRA_TCP_STACKS=1
options TCPHPTS
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D15525
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.
Most of the code was copied from a similar patch for DragonflyBSD.
However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.
Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.
Limitations:
As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).
This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967. Thanks to rwatson@
for the substantive feedback that is included in this commit.
Submitted by: Johannes Lundberg <johalun0@gmail.com>
Obtained from: DragonflyBSD
Relnotes: Yes
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11003
Part 3 of many ...
The VPC framework relies heavily on cloning pseudo interfaces
(vmnics, vpc switch, vcpswitch port, hostif, vxlan if, etc).
This pulls in that piece. Some ancillary changes get pulled
in as a side effect.
Reviewed by: shurd@
Approved by: sbruno@
Sponsored by: Joyent, Inc.
Differential Revision: https://reviews.freebsd.org/D15347
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.
Most of the code was copied from a similar patch for DragonflyBSD.
However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.
Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.
Limitations
As DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).
Submitted by: Johannes Lundberg <johanlun0@gmail.com>
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11003
Use an accessor to access ifgr_group and ifgr_groups.
Use an macro CASE_IOC_IFGROUPREQ(cmd) in place of case statements such
as "case SIOCAIFGROUP:". This avoids poluting the switch statements
with large numbers of #ifdefs.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14960
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size). This is believed to be sufficent to
fully support ifconfig on 32-bit systems.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14900
Add a new "interleave" allocation policy which stripes pages across
domains with a stride or width keeping contiguity within a multi-page
region.
Move the kernel to the dedicated numbered cpuset #2 making it possible
to assign kernel threads and memory policy separately from user. This
also eliminates the need for the complicated interrupt binding code.
Add a sysctl API for viewing and manipulating domainsets. Refactor some
of the cpuset_t manipulation code using the generic bitset type so that
it can be used for both. This probably belongs in a dedicated subr file.
Attempt to improve the include situation.
Reviewed by: kib
Discussed with: jhb (cpuset parts)
Tested by: pho (before review feedback)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14839
Make all kernel accesses to ifru_buffer go via access functions
which take the process ABI into account and use an appropriate union
to access members in the correct place in struct ifreq.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14846
According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be
considered as untagged, and only PCP and DEI values from the VLAN tag
are meaningful. See for instance
https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html.
Make it possible to specify PCP value for outgoing packets on an
ethernet interface. When PCP is supplied, the tag is appended, VLAN
id set to 0, and PCP is filled by the supplied value. The code to do
VLAN tag encapsulation is refactored from the if_vlan.c and moved into
if_ethersubr.c.
Drivers might have issues with filtering VID 0 packets on
receive. This bug should be fixed for each driver.
Reviewed by: ae (previous version), hselasky, melifaro
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D14702
Include _uio.h instead of uio.h in several headers to reduce header
polution.
Fix a few places that relied on header polution to get the uio.h header.
I have not moved struct uio as many more things that use it rely on
header polution to get other definitions from uio.h.
Reviewed by: cem, kib, markj
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14811
which we discussed at the developer summits at BSDCan and BSDCam in 2017.
The TCP Blackbox Recorder allows you to capture events on a TCP connection
in a ring buffer. It stores metadata with the event. It optionally stores
the TCP header associated with an event (if the event is associated with a
packet) and also optionally stores information on the sockets.
It supports setting a log ID on a TCP connection and using this to correlate
multiple connections that share a common log ID.
You can log connections in different modes. If you are doing a coordinated
test with a particular connection, you may tell the system to put it in
mode 4 (continuous dump). Or, if you just want to monitor for errors, you
can put it in mode 1 (ring buffer) and dump all the ring buffers associated
with the connection ID when we receive an error signal for that connection
ID. You can set a default mode that will be applied to a particular ratio
of incoming connections. You can also manually set a mode using a socket
option.
This commit includes only basic probes. rrs@ has added quite an abundance
of probes in his TCP development work. He plans to commit those soon.
There are user-space programs which we plan to commit as ports. These read
the data from the log device and output pcapng files, and then let you
analyze the data (and metadata) in the pcapng files.
Reviewed by: gnn (previous version)
Obtained from: Netflix, Inc.
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D11085
These macros take an existing ioctl(2) command and replace the length
with the specified length or length of the specified type respectively.
These can be used to define commands for 32-bit compatibility with fewer
opportunities for cut-and-paste errors then a whole new definition.
Reviewed by: cem, kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14706
[RFC7413]. It also includes a pre-shared key mode of operation in which
the server requires the client to be in possession of a shared secret in
order to successfully open TFO connections with that server.
The names of some existing fastopen sysctls have changed (e.g.,
net.inet.tcp.fastopen.enabled -> net.inet.tcp.fastopen.server_enable).
Reviewed by: tuexen
MFC after: 1 month
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14047
that had the IPv6 fragmentation header:
o Neighbor Solicitation
o Neighbor Advertisement
o Router Solicitation
o Router Advertisement
o Redirect
Introduce M_FRAGMENTED mbuf flag, and set it after IPv6 fragment reassembly
is completed. Then check the presence of this flag in correspondig ND6
handling routines.
PR: 224247
MFC after: 2 weeks
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
for SO_TIMESTAMP and other similar socket options.
Provide new control message SCM_TIME_INFO to supply information about
timestamp. Currently it indicates that the timestamp was
hardware-assisted and high-precision, for software timestamps the
message is not returned. Reserved fields are added to ABI to report
additional info about it, it is expected that raw hardware clock value
might be useful for some applications.
Reviewed by: gallatin (previous version), hselasky
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D12638
in nanoseconds from boot for the received packets.
The rcv_tstmp field overlaps the place of Ln header length indicators,
not used by received packets. The basic pkthdr rearrangement change
in sys/mbuf.h was provided by gallatin.
There are two accompanying M_ flags: M_TSTMP means that there is the
timestamp (and it was generated by hardware).
Another flag M_TSTMP_HPREC indicates that the timestamp is
high-precision. Practically M_TSTMP_HPREC means that hardware
provided additional precision comparing with the stamps when the flag
is not set. E.g., for ConnectX all packets are stamped by hardware
when PCIe transaction to write out the completion descriptor is
performed, but PTP packet are stamped on port. For Intel cards, when
PTP assist is enabled, only PTP packets are stamped in the limited
number of registers, so if Intel cards ever start support this
mechanism, they would always set M_TSTMP | M_TSTMP_HPREC if hardware
timestamp is present for the given packet.
Add IFCAP_HWRXTSTMP interface capability to indicate the support for
hardware rx timestamping, and ifconfig(8) command to toggle it.
Based on the patch by: gallatin
Reviewed by: gallatin (previous version), hselasky
Sponsored by: Mellanox Technologies
MFC after: 2 weeks (? mbuf KBI issue)
X-Differential revision: https://reviews.freebsd.org/D12638
It will be needed by hn(4) to configure its RSS key and hash
type/function in the transparent VF mode in order to match VF's
RSS settings. The description of the transparent VF mode and
the RSS hash value issue are here:
https://svnweb.freebsd.org/base?view=revision&revision=322299https://svnweb.freebsd.org/base?view=revision&revision=322485
These are generic enough to promise two independent IOCs instead
of abusing SIOCGDRVSPEC.
Setting RSS key and hash type/function is a different story,
which probably requires more discussion.
Comment about UDP_{IPV4,IPV6,IPV6_EX} were only in the patch
in the review request; these hash types are standardized now.
Reviewed by: gallatin
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12174
They are defined by XSI or newer SUS.
This is a follow-up to r318780.
Reported by: jbeich
Obtained from: DragonflyBSD commit e08b3836c962
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Guard, requested by the MAP_GUARD mmap(2) flag, prevents the reuse of
the allocated address space, but does not allow instantiation of the
pages in the range. It is useful for more explicit support for usual
two-stage reserve then commit allocators, since it prevents accidental
instantiation of the mapping, e.g. by mprotect(2).
Use guards to reimplement stack grow code. Explicitely track stack
grow area with the guard, including the stack guard page. On stack
grow, trivial shift of the guard map entry and stack map entry limits
makes the stack expansion. Move the code to detect stack grow and
call vm_map_growstack(), from vm_fault() into vm_map_lookup().
As result, it is impossible to get random mapping to occur in the
stack grow area, or to overlap the stack guard page.
Enable stack guard page by default.
Reviewed by: alc, markj
Man page update reviewed by: alc, bjk, emaste, markj, pho
Tested by: pho, Qualys
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D11306 (man pages)
o Separate fields of struct socket that belong to listening from
fields that belong to normal dataflow, and unionize them. This
shrinks the structure a bit.
- Take out selinfo's from the socket buffers into the socket. The
first reason is to support braindamaged scenario when a socket is
added to kevent(2) and then listen(2) is cast on it. The second
reason is that there is future plan to make socket buffers pluggable,
so that for a dataflow socket a socket buffer can be changed, and
in this case we also want to keep same selinfos through the lifetime
of a socket.
- Remove struct struct so_accf. Since now listening stuff no longer
affects struct socket size, just move its fields into listening part
of the union.
- Provide sol_upcall field and enforce that so_upcall_set() may be called
only on a dataflow socket, which has buffers, and for listening sockets
provide solisten_upcall_set().
o Remove ACCEPT_LOCK() global.
- Add a mutex to socket, to be used instead of socket buffer lock to lock
fields of struct socket that don't belong to a socket buffer.
- Allow to acquire two socket locks, but the first one must belong to a
listening socket.
- Make soref()/sorele() to use atomic(9). This allows in some situations
to do soref() without owning socket lock. There is place for improvement
here, it is possible to make sorele() also to lock optionally.
- Most protocols aren't touched by this change, except UNIX local sockets.
See below for more information.
o Reduce copy-and-paste in kernel modules that accept connections from
listening sockets: provide function solisten_dequeue(), and use it in
the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
infiniband, rpc.
o UNIX local sockets.
- Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
local sockets. Most races exist around spawning a new socket, when we
are connecting to a local listening socket. To cover them, we need to
hold locks on both PCBs when spawning a third one. This means holding
them across sonewconn(). This creates a LOR between pcb locks and
unp_list_lock.
- To fix the new LOR, abandon the global unp_list_lock in favor of global
unp_link_lock. Indeed, separating these two locks didn't provide us any
extra parralelism in the UNIX sockets.
- Now call into uipc_attach() may happen with unp_link_lock hold if, we
are accepting, or without unp_link_lock in case if we are just creating
a socket.
- Another problem in UNIX sockets is that uipc_close() basicly did nothing
for a listening socket. The vnode remained opened for connections. This
is fixed by removing vnode in uipc_close(). Maybe the right way would be
to do it for all sockets (not only listening), simply move the vnode
teardown from uipc_detach() to uipc_close()?
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D9770
INHERIT_ZERO is an OpenBSD feature.
When a page is marked as such, it would be zeroed
upon fork().
This would be used in new arc4random(3) functions.
PR: 182610
Reviewed by: kib (earlier version)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D427
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
Our mprotect() function seems to take a "const void *" address to the
pages whose permissions need to be adjusted. POSIX uses "void *". Simply
stick to the POSIX one to prevent us from writing unportable code.
PR: 211423 (exp-run)
Tested by: antoine@ (Thanks!)
for libthr.so.3, without breaking the ABI. Special value is stored in
the lock pointer to indicate shared lock, and offline page in the shared
memory is allocated to store the actual lock.
Reviewed by: vangyzen (previous version)
Discussed with: deischen, emaste, jhb, rwatson,
Martin Simmons <martin@lispworks.com>
Tested by: pho
Sponsored by: The FreeBSD Foundation