commit f6303b8b0dd6353c6a5bb4de2e855a13b86f22cf
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 09:54:12 2026 -0400

    ci: publish a release (source tarball + MSVC binaries) on version tags
    
    Add release.yml: on a v* tag it verifies the tag matches configure.ac, builds
    the source distribution via 'make dist', extracts the matching section of
    README.md as the release notes, and publishes a GitHub release with the tarball.
    
    Also add a 'tags: v[0-9]*' trigger to build.yml so the full matrix runs at the
    tagged commit and the existing MSVC 'Upload to GitHub Release' step (gated on
    github.ref_type == 'tag') attaches the 32-bit and 64-bit Windows MSVC binaries
    to the same release.
    
    No signing yet; mingw/cygwin binaries are not published.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 4e537ca4a37d1401598834b40aa1406dcf4997b4
Merge: ce9904d6 78417bf2
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 09:45:29 2026 -0400

    Merge pull request #981 from libffi/dlmalloc-builtin-clz-all-arches
    
    dlmalloc: use __builtin_clz/ctz bit-index on all GNU compilers (#754)

commit ce9904d6aec8dc6bb536a901a71c1e445eae7ec6
Merge: b32ddaf7 2f61bf22
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 09:44:59 2026 -0400

    Merge pull request #982 from emiltayl/msvc-atomics
    
    Add atomic intrinsics for MSVC to dlmalloc.c

commit b32ddaf73a16f0ddb8055908b829682eb5c73eeb
Merge: 0bff58c1 ad506b70
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 09:43:53 2026 -0400

    Merge pull request #983 from libffi/ci-drop-mingw-ansi-stdio
    
    ci: drop ineffective __USE_MINGW_ANSI_STDIO from Win64 mingw test run

commit ad506b7055ce76f7254c821a68149e4e38807da0
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 08:36:41 2026 -0400

    ci: drop ineffective __USE_MINGW_ANSI_STDIO from the Win64 mingw test run
    
    The RUNTESTFLAGS="TOOL_OPTIONS=-D__USE_MINGW_ANSI_STDIO=1" added to silence the
    long double %Lf/%Lg -Wformat warnings on x86_64-w64-mingw32 does not actually
    work: the flag reaches the compile line but gcc's -Wformat still uses the MS
    printf archetype with the mingw headers on the runner, so the ~10 long double
    '(test for excess errors)' failures persist regardless.  The rlgl policy already
    XFAILs those as the environmental printf-format issue they are, so the
    ineffective flag is just noise -- remove it and let the baseline handle them.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 78417bf2e6ca007fe4c9ff9cb74b31a55a2f3cae
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 08:16:29 2026 -0400

    dlmalloc: use __builtin_clz/ctz bit-index on all GNU compilers (#754)
    
    Upstream 2.8.6 already replaced the old x86 'bsrl' inline asm in
    compute_tree_index/compute_bit2idx with __builtin_clz/__builtin_ctz, but gated
    the intrinsic path to __i386__/__x86_64__; every other GNU-compatible target
    fell back to the generic C bit-twiddling (or ffs()).  __builtin_clz/__builtin_ctz
    are target-independent, so enable the intrinsic path for any __GNUC__ compiler.
    This lowers to a hardware bit-count instruction on aarch64 (clz / rbit+clz),
    riscv, power, s390x, etc. instead of the open-coded idiom.
    
    Also drop the now-unused USE_BUILTIN_FFS define in closures.c (and thus the
    <strings.h> include for ffs), since the intrinsic path no longer needs it.
    
    Implements the optimization from #754 (whose diff predated the 2.8.6 update).
    Verified: x86_64 closures suite 588/0 (unchanged); the macros compile and emit
    clz/ctz instructions for aarch64/riscv64/ppc64le/s390x/arm/mips64/sparc64.
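    
    For readers unfamiliar with the idiom, here is a minimal sketch of the
    bit-index pattern described above.  It is not the dlmalloc macro text; the
    hyp_* function names are hypothetical, and the non-GNU branch is only a
    stand-in for the generic C fallback.
    
    ```c
    #include <assert.h>
    #include <limits.h>
    
    /* Index of the lowest set bit.  x must be non-zero, because
       __builtin_ctz(0) is undefined. */
    static unsigned hyp_lowest_bit_index(unsigned x)
    {
      assert(x != 0);
    #if defined(__GNUC__)
      return (unsigned)__builtin_ctz(x);      /* target-independent builtin */
    #else
      unsigned i = 0;
      while ((x & 1u) == 0) { x >>= 1; ++i; } /* open-coded fallback */
      return i;
    #endif
    }
    
    /* Index of the highest set bit.  x must be non-zero, because
       __builtin_clz(0) is undefined. */
    static unsigned hyp_highest_bit_index(unsigned x)
    {
      assert(x != 0);
    #if defined(__GNUC__)
      return (unsigned)(sizeof(x) * CHAR_BIT - 1) - (unsigned)__builtin_clz(x);
    #else
      unsigned i = 0;
      while (x >>= 1) ++i;                    /* open-coded fallback */
      return i;
    #endif
    }
    ```
    
    With x = 0x50 the lowest set bit is bit 4 and the highest is bit 6.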
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 2f61bf2213bbda7b348f13c1d58947c5fa33f858
Author: Emil Taylor Bye <phptph@gmail.com>
Date:   Sat Jun 20 14:05:16 2026 +0200

    Add atomic intrinsics for MSVC to dlmalloc.c

commit 0bff58c1043064d7cb0e825107ba79946c2bae51
Merge: 2388f6dc ad447103
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 08:06:46 2026 -0400

    Merge pull request #978 from libffi/prep-3.6.0-release
    
    Prepare libffi 3.6.0 release

commit 2388f6dcfac3c434472668d32c805ec34a08ce2d
Merge: 68be2150 72ce9313
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 08:02:26 2026 -0400

    Merge pull request #980 from libffi/doc-closure-free-safety
    
    doc: note that freeing an in-use closure is unsafe (#835)

commit 68be21507f1008e069e572b585dee75d48249528
Merge: bc0ebb2e f554e4ce
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 08:01:59 2026 -0400

    Merge pull request #979 from libffi/portability-c99-clang-asm
    
    Portability: strict-C99 and clang-assembler fixes (#795, #851, #947)

commit 72ce931309653640ddd4e687a30a8293b95f909c
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 04:35:17 2026 -0400

    doc: note that freeing a closure in use is unsafe (#835)
    
    Clarify in the ffi_closure_free documentation that a closure must not be freed
    while it may still be invoked -- including from within its own callback -- since
    that frees the executable trampoline still in use.  (It may appear to work by
    luck on some targets, but is not safe in general.)
    
    Fixes #835
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit f554e4ce0cb10ace600819e08beefd6593cdedc0
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 04:33:36 2026 -0400

    x86: use pc-relative label difference for clang's assembler (#947)
    
    When HAVE_AS_X86_PCREL is not defined (e.g. builds that don't run libffi's
    autoconf checks, such as the meson port), the unwind tables fall back to the
    'X@rel' relocation variant, which clang's integrated assembler rejects with
    "invalid variant 'rel'".  The pc-relative label difference 'X - .' assembles
    on clang and GNU as alike, so prefer it for clang.  The gcc/other path (which
    may target Sun as on Solaris) is left on @rel.  Verified end-to-end with clang.
    
    Fixes #947
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 2760a1e2c828a9d7ab2169ff31c0c495a0c5f2d4
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 04:33:36 2026 -0400

    use __asm__ instead of the asm keyword for strict C99 (#851)
    
    Clang (e.g. the Android NDK) only provides the 'asm' keyword in GNU modes; under
    a strict -std=c99 it errors with "use of undeclared identifier 'asm'".  Replace
    the bare 'asm' keyword with the always-available __asm__ spelling across the
    affected backends (aarch64, alpha, frv, ia64, moxie, or1k, pa, riscv, sh64,
    sparc).  This is a pure spelling change with identical semantics on GNU
    compilers; verified the __asm__/__asm__ volatile/register-__asm__ forms compile
    under gcc and clang -std=c99 -pedantic.
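    
    A small illustration of the spelling change, assuming a GNU-compatible
    compiler.  The hyp_* helpers are hypothetical and are not taken from any
    libffi backend; the empty asm is just a compiler barrier.
    
    ```c
    #include <assert.h>
    
    /* Compiler-only barrier.  Spelled with __asm__ so that it also compiles
       under strict -std=c99, where the bare 'asm' keyword is unavailable. */
    static void hyp_compiler_barrier(void)
    {
    #if defined(__GNUC__)
      __asm__ volatile ("" : : : "memory");
    #endif
    }
    
    /* Store a value, then read it back across the barrier. */
    static int hyp_store_then_load(int *slot, int value)
    {
      assert(slot != 0);
      *slot = value;
      hyp_compiler_barrier();
      return *slot;
    }
    ```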
    
    Fixes #851
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 08487a6214d4419430b0e8e7dcb3ec967caf8f6f
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 04:33:36 2026 -0400

    ffi.h: avoid -Wpedantic warning on the anonymous union under C99 (#795)
    
    The ffi_closure struct uses an anonymous union, which is C11 (a GNU extension
    in C99).  Including <ffi.h> with -std=c99 -pedantic warned (and broke -Werror
    users).  Prefix the union with __extension__ on GNU compilers so the field
    names are kept without the pedantic diagnostic; MSVC accepts anonymous unions
    natively.  Verified with gcc and clang under -std=c99 -pedantic -Werror.
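    
    The shape of the fix can be sketched as follows.  The struct is a
    simplified stand-in, not the real ffi_closure layout; HYP_EXTENSION, the
    hyp_* names, and the 32-byte trampoline size are assumptions made for
    illustration.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    
    /* On GNU compilers, __extension__ suppresses the pedantic diagnostic for
       the anonymous union; elsewhere the macro expands to nothing. */
    #if defined(__GNUC__)
    # define HYP_EXTENSION __extension__
    #else
    # define HYP_EXTENSION
    #endif
    
    struct hyp_closure {
      HYP_EXTENSION union {
        char tramp[32];   /* hypothetical trampoline size */
        void *ftramp;
      };
      void *user_data;
    };
    
    /* The union members stay reachable by their unqualified field names. */
    static void hyp_closure_set_ftramp(struct hyp_closure *c, void *target)
    {
      assert(c != NULL);
      c->ftramp = target;
    }
    ```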
    
    Fixes #795
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit ad4471039fa55c7e001aef031da297df30c952ae
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 03:47:51 2026 -0400

    Prepare libffi 3.6.0 release
    
    Bump version to 3.6.0, libtool version-info to 11:1:3 (revision++; no public
    interface change since 3.5.2), set the release date, and record the dlmalloc
    2.8.6 update and the #873 ThreadSanitizer fix in the history.
    
    NOTE: the release date (June 20, 2026) is a placeholder -- adjust to the actual
    tag date before merging.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit bc0ebb2e04c39fa750c3c6ebd93755f9b4762d77
Merge: d3f96bee fc67b31a
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 03:36:07 2026 -0400

    Merge pull request #977 from libffi/dlmalloc-tsan-clean
    
    dlmalloc: make mparams init and spin-lock peek TSAN-clean (fix #873)

commit fc67b31ab8c0871dd94385f018fc55413bc034b5
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 03:30:59 2026 -0400

    dlmalloc: make mparams init and spin-lock peek TSAN-clean
    
    dlmalloc reads two lock-free locations on every allocation that race with
    writes elsewhere; ThreadSanitizer flags both (libffi issue #873).  The 2.8.6
    update did not fix #873 -- confirmed with a -fsanitize=thread build of the
    libffi.threads/tsan.c 20-thread test, which reported two data races on global
    'mparams', plus (newly, since 2.8.6 uses spin locks instead of 2.8.3's pthread
    mutexes) a race on the spin-lock word.
    
    1. mparams init (#873): ensure_initialization() reads mparams.magic without a
       lock to decide if one-time init already ran, racing the write in
       init_mparams().  Upstream writes magic through a `volatile` cast, which is
       not a synchronizing operation.  Use an acquire load for the read and a
       release store for the write, so a thread that observes magic != 0 is
       guaranteed to see the other init writes (page_size, mflags, ...) too.
    
    2. spin-lock peek: spin_acquire_lock()'s `*(volatile int *)sl != 0` lock-free
       peek races CLEAR_LOCK's atomic release.  Read it with a relaxed atomic load
       instead -- still just a hint (CAS_LOCK provides the acquire on success).
    
    Both use __atomic_* builtins (work in any -std, no _Atomic field churn) with the
    original volatile access as a fallback when they are unavailable (e.g. MSVC),
    and are marked as libffi-local patches so they survive the next dlmalloc sync.
    
    After the fix the tsan.c test reports zero data races across 20 runs; the normal
    testsuite is unaffected (the atomic accesses read/write the same values).
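    
    A minimal sketch of the two access patterns, under stated assumptions: the
    HYP_* macros, the hyp_* names, and the constants are hypothetical, and the
    lock that a real one-time init needs around the slow path is omitted.  It
    shows only the acquire/release pairing and the relaxed peek, not the actual
    dlmalloc patch.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    
    #if defined(__GNUC__) && defined(__ATOMIC_ACQUIRE)
    # define HYP_LOAD_ACQUIRE(p)     __atomic_load_n((p), __ATOMIC_ACQUIRE)
    # define HYP_STORE_RELEASE(p, v) __atomic_store_n((p), (v), __ATOMIC_RELEASE)
    # define HYP_PEEK_LOCK(p)        __atomic_load_n((p), __ATOMIC_RELAXED)
    #else  /* fallback: plain volatile accesses, no synchronization */
    # define HYP_LOAD_ACQUIRE(p)     (*(volatile size_t *)(p))
    # define HYP_STORE_RELEASE(p, v) (*(volatile size_t *)(p) = (v))
    # define HYP_PEEK_LOCK(p)        (*(volatile int *)(p))
    #endif
    
    struct hyp_params { size_t page_size; size_t magic; };
    static struct hyp_params hyp_mparams;
    static int hyp_lock_word;
    
    static void hyp_init_params(void)
    {
      hyp_mparams.page_size = 4096;   /* plain write, published below */
      /* Release store: a reader that sees magic != 0 also sees page_size. */
      HYP_STORE_RELEASE(&hyp_mparams.magic, (size_t)0x58585858u);
    }
    
    static size_t hyp_page_size(void)
    {
      /* Acquire load pairs with the release store in hyp_init_params(). */
      if (HYP_LOAD_ACQUIRE(&hyp_mparams.magic) == 0)
        hyp_init_params();            /* sketch only: no lock taken here */
      assert(hyp_mparams.page_size != 0);
      return hyp_mparams.page_size;
    }
    
    /* Relaxed peek: only a hint, so no ordering is requested. */
    static int hyp_lock_looks_held(void)
    {
      return HYP_PEEK_LOCK(&hyp_lock_word) != 0;
    }
    ```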
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit d3f96bee061158b0934a08ca2355e4e92fcc4b98
Merge: 5e3eeaff 97006e1a
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 20 03:03:43 2026 -0400

    Merge pull request #975 from libffi/update-dlmalloc-2.8.6
    
    dlmalloc: update bundled allocator 2.8.3 → 2.8.6

commit 97006e1a2d7f846b02e19411672c09a0f4aae56b
Merge: 26a1c8cd 5e3eeaff
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 21:55:11 2026 -0400

    Merge remote-tracking branch 'origin/master' into update-dlmalloc-2.8.6

commit 5e3eeaffbf7cf98e6991fe3995ab0826a461545e
Merge: 17bce72a 2aabe7eb
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 20:08:41 2026 -0400

    Merge pull request #976 from libffi/x86_64-android-binary128-longdouble
    
    x86: support IEEE binary128 long double on x86_64 (Android)

commit 2aabe7eb739a16197ed0fcaacb45eeba76149c19
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 20:04:09 2026 -0400

    x86: support IEEE binary128 long double on x86_64 (e.g. Android)
    
    On most x86_64 targets `long double` is the 80-bit x87 type, which the SysV
    psABI classifies X87/X87UP, passes in memory, and returns in st(0).  But some
    x86_64 targets -- notably Android (bionic), and anything built with
    -mlong-double-128 -- make `long double` the IEEE binary128 quad type, which the
    psABI passes and returns in SSE registers exactly like __float128 (class
    SSE/SSEUP, i.e. one %xmm register; _Complex binary128 is 32 bytes -> memory).
    
    libffi's x86_64 backend hardcoded the x87 path, so every long double call/return
    on x86_64 Android produced garbage (sizeof(long double)==16 for both formats, so
    the configure check could not tell them apart).  This was misdiagnosed as a QEMU
    or bionic-on-host issue; it is a real ABI bug -- the binaries run natively, the
    ABI is simply different.  Confirmed via the compiler: x86_64-linux-android sets
    __LDBL_MANT_DIG__=113 (binary128) and emits __addtf3/%xmm, vs 64/fldt on glibc.
    
    Detect binary128 long double at compile time (__LDBL_MANT_DIG__ == 113) and:
      - classify scalar long double as SSE/SSEUP, returned in %xmm0 (new
        UNIX64_RET_XMM128 store/load path: movups %xmm0);
      - classify _Complex long double as memory (passed/returned via hidden pointer);
      - copy the SSEUP eightbyte (the high half of the same %xmm register) in both
        the call and closure argument marshalling, which previously dropped it.
    
    Everything is gated on the compile-time check, so x87 targets (glibc, *BSD,
    macOS) are byte-for-byte unchanged.
    
    Validated by building with `-mlong-double-128` on x86_64 glibc (which reproduces
    the Android ABI: __addtf3/%xmm) and round-tripping a full-precision binary128
    value through both ffi_call and a closure -- both preserve all 113 mantissa bits.
    x87 long double regression is clean (full testsuite, 0 unexpected failures).
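    
    The compile-time detection can be sketched as below.  This uses the
    standard LDBL_MANT_DIG from <float.h>, which carries the same value as the
    compiler's __LDBL_MANT_DIG__; the enum and function names are hypothetical
    and do not appear in the backend.
    
    ```c
    #include <assert.h>
    #include <float.h>
    
    enum hyp_ld_format {
      HYP_LD_DOUBLE,     /* long double is plain binary64 */
      HYP_LD_X87,        /* 80-bit x87 extended, 64 mantissa bits */
      HYP_LD_BINARY128,  /* IEEE quad, 113 mantissa bits */
      HYP_LD_OTHER       /* anything else, e.g. IBM double-double */
    };
    
    /* Classify long double from its mantissa width.  sizeof() alone cannot
       separate x87 from binary128 on x86_64, since both occupy 16 bytes. */
    static enum hyp_ld_format hyp_long_double_format(void)
    {
    #if LDBL_MANT_DIG == 113
      return HYP_LD_BINARY128;
    #elif LDBL_MANT_DIG == 64
      return HYP_LD_X87;
    #elif LDBL_MANT_DIG == 53
      return HYP_LD_DOUBLE;
    #else
      return HYP_LD_OTHER;
    #endif
    }
    ```
    
    Built with -mlong-double-128 on x86_64 glibc this should select the
    binary128 branch; a default glibc build selects the x87 branch.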
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 26a1c8cddcbdbbd1b5bae43e89c9deb20477d99e
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 19:16:38 2026 -0400

    dlmalloc: drop FORCEINLINE on win32 mmap helpers (fix mingw build)
    
    The Win64 mingw build failed:
    
      dlmalloc.c: error: multiple storage classes in declaration specifiers
      (at static FORCEINLINE win32mmap / win32direct_mmap / win32munmap)
    
    mingw-w64's <windows.h> (included in dlmalloc's WIN32 block) defines
    FORCEINLINE with an 'extern' storage class, so the upstream-2.8.6
    'static FORCEINLINE' on these helpers expands to 'static extern' -- rejected by
    gcc. MSVC's FORCEINLINE is __forceinline (no storage class), so MSVC was fine.
    
    libffi's previous 2.8.3 dlmalloc used plain 'static' for these win32 helpers;
    restore that (keeping the PAGE_EXECUTE_READWRITE change). They are trivial
    wrappers, so dropping the forced-inline hint has no practical effect.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 17bce72aa5e2c396983b3c4ffa83a15e1cf453fa
Merge: a00d2dd3 7e93373f
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 18:36:28 2026 -0400

    Merge pull request #973 from libffi/ci-win64-mingw-real-tests
    
    ci: make Win64 mingw job actually run the testsuite

commit a00d2dd3b650e4d7d1b19008e1879179fa7dd315
Merge: 22c1a3e6 9db3ee0b
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 18:35:56 2026 -0400

    Merge pull request #974 from libffi/ci-fix-dist-hook-readme
    
    build: fix make dist failure from missing README "was released on" line

commit 326fff39c41a3c076c63f18b216fb0a44104edf4
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 18:33:50 2026 -0400

    dlmalloc: always define malloc_getpagesize (fix MSVC Win64 build)
    
    The MSVC Win64 build failed compiling the ported dlmalloc:
    
      dlmalloc.c: error C2065: 'malloc_getpagesize': undeclared identifier
    
    Upstream 2.8.6 defines malloc_getpagesize only under #ifndef WIN32 (WIN32
    builds use GetSystemInfo in init_mparams), and the init_mparams reference is
    likewise guarded by #if !defined(WIN32) -- so on a normal cl.exe build with
    _WIN32 defined the reference is preprocessed out. MSVC nonetheless reports the
    symbol undeclared, so add an unconditional fallback definition (the same
    ((size_t)4096U) the upstream block already uses as its deepest fallback).
    
    This is a no-op on every platform that already defines the symbol; on WIN32 the
    value is never used (GetSystemInfo supplies the page size), so behavior is
    unchanged. It just guarantees the translation unit compiles.
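    
    A sketch of what such a fallback can look like.  The #ifndef guard form and
    the hyp_default_page_size() helper are assumptions for illustration; only
    the ((size_t)4096U) value comes from the description above.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    
    /* Last-resort definition: takes effect only if nothing earlier in the
       translation unit defined malloc_getpagesize. */
    #ifndef malloc_getpagesize
    # define malloc_getpagesize ((size_t)4096U)
    #endif
    
    /* Hypothetical consumer, so the symbol is always resolvable. */
    static size_t hyp_default_page_size(void)
    {
      size_t psize = malloc_getpagesize;
      assert((psize & (psize - 1)) == 0);   /* page sizes are powers of two */
      return psize;
    }
    ```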
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 7e93373fb08960df21ded627b5380a27b7ccc091
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 17:36:22 2026 -0400

    ci: fix Win64 mingw long double tests via __USE_MINGW_ANSI_STDIO
    
    With the testsuite now actually running on x86_64-w64-mingw32, 5 long double
    tests (float2, cls_longdouble, cls_align_longdouble_split{,2}, huge_struct)
    failed "(test for excess errors)" -- not real bugs, just -Wformat warnings:
    
      warning: format '%Lf' expects argument of type 'double',
               but argument has type 'long double'
    
    mingw-w64 gcc/clang default to the MS C runtime printf, where the %L length
    modifier means double rather than long double. Pass __USE_MINGW_ANSI_STDIO=1 so
    the tests use mingw's ISO-C99 printf, where %Lf/%Lg handle long double
    correctly -- clearing the warnings and producing correct output (e.g.
    huge_struct's output-pattern check). No-op on the Cygwin targets.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 2a0b4b52cad6715cb1fe6ab5a8189ebdeaea7c74
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 17:31:36 2026 -0400

    dlmalloc: update bundled allocator from 2.8.3 to 2.8.6
    
    Re-bases libffi's bundled dlmalloc on Doug Lea's pristine 2.8.6 (the last
    upstream release) instead of the heavily-diverged 2.8.3 fork we carried.
    
    Method: rather than line-merging 2.8.3->2.8.6 onto our fork (which silently
    duplicated/stranded code where upstream's restructuring didn't align), this
    re-applies libffi's small, well-understood patch set on top of clean 2.8.6.
    The bundled file is now pristine 2.8.6 + ~100 lines of clearly-isolated libffi
    changes, so future syncs are a tractable rebase rather than archaeology.
    
    libffi-local changes carried forward (adapted to 2.8.6):
      * FFI_MMAP_EXEC_WRIT machinery: per-segment exec_offset, get/set_segment_flags
        and check_segment_merge abstractions, add/sub_segment_exec_offset.  The
        2.8.3 IS_MMAPPED_BIT is spelled USE_MMAP_BIT in 2.8.6.
      * Windows: win32mmap/win32direct_mmap allocate PAGE_EXECUTE_READWRITE so
        closures are executable (closures.c calls these directly, not via dlmmap).
      * OS/2 support: DosAllocMem mmap shims wired into 2.8.6's *_DEFAULT scheme,
        page size/granularity in init_mparams.  OS/2 locking now rides 2.8.6's
        spin locks (the old HMTX mutex block is unnecessary in the new framework).
      * _GNU_SOURCE for Linux mremap, conflicting-malloc.h undefs, dlmalloc_stats
        prototype, an extra corruption check, and assorted warning/typo cleanups.
    
    What 2.8.6 buys us: CAS spin locks instead of pthread mutexes for closure
    alloc/free (lower overhead, no implicit pthread dep), stronger FOOTERS/RTCHECK
    heap-corruption detection, __builtin_ctz bit indexing instead of fragile x86
    asm/ffs, cleaner _MSC_VER/x86_64/WIN32_LEAN_AND_MEAN handling, and ~7 years of
    upstream bug fixes.
    
    Validated natively on x86_64-pc-linux-gnu: closures 588/0, full suite 2468/0.
    Draft pending the full CI matrix to exercise the W^X paths cross-platform.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 9db3ee0b8041127ec1c97e424b6845ed2d41096e
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 16:52:51 2026 -0400

    build: don't let dist-hook fail when README has no "was released on" line
    
    The dist-hook trims README.md by locating the "<version> was released on ..."
    line and tail'ing from just above it. Commit 35f3a8f replaced that line with a
    "This is WIP repo ..." banner in preparation for 3.6.0, so the awk now returns
    empty, $((s-1)) becomes -1, and `tail -n +-1` aborts `make dist`:
    
      tail: invalid number of lines: '+-1'
      make[4]: *** [Makefile:2107: dist-hook] Error 1
    
    This broke the snapshot-dist-tarball workflow on every master push. Fall back to
    copying README.md verbatim when the marker is absent; tagged releases that
    restore a "was released on" line still get the original trimming behavior.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 9aeda2a51a55a2a64ec4b74f23c9ab8fc0519ffe
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 16:46:53 2026 -0400

    ci: give Win64 mingw job a Cygwin-hosted toolchain so tests actually run
    
    The x86_64-w64-mingw32 job has been green since 3.5.0-pre0 but exercised
    nothing: every one of its ~310 testsuite compiles failed with
    
      ffitest.h:8:10: fatal error: ffi.h: No such file or directory
    
    even though configure generates ffi.h at x86_64-w64-mingw32/include/ffi.h and
    the test compile's -I points right at it. The cause is path translation:
    DejaGnu runs under Cygwin and emits Cygwin POSIX include paths
    (-I/cygdrive/d/.../include), but the x86_64-w64-mingw32-gcc being resolved is a
    native (non-Cygwin) mingw driver that can't read /cygdrive/... paths, so no -I
    directory resolves and the first #include <ffi.h> dies. Result: 0 real passes,
    310 compile failures -- all masked GREEN by the rlgl baseline. The 32-bit jobs
    work because they use genuine *-pc-cygwin compilers that translate the paths.
    
    Install Cygwin's own mingw64-x86_64 cross toolchain. Its driver lives in
    /usr/bin (ahead of the native Windows toolchain in the login-shell PATH) and is
    a Cygwin process, so it translates the POSIX paths and the test programs compile
    and run for the first time.
    
    This is expected to surface real Win64 mingw test results that were previously
    invisible; the rlgl-policy baseline will need refreshing for whatever shows up.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 22c1a3e67a06c498c94dd5fca85692f9ae80dd8f
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 16:29:12 2026 -0400

    ci: don't route native x86 Android tests through qemu-x86_64
    
    The Android job's "Set up QEMU (binfmt)" step used docker/setup-qemu-action's
    default, which runs `tonistiigi/binfmt --install all` and registers binfmt
    handlers for every arch -- including x86_64/i686, with the F (fix-binary) flag.
    That forced the statically-linked x86_64 Android test binaries (native to the
    host) through qemu-x86_64 instead of running them directly as intended.
    
    qemu-user's x87 80-bit extended-precision long double emulation is inaccurate,
    so all 26 long double tests on x86_64-linux-android failed (float/float2/float3,
    return_ldl, every *longdouble* complex/closure test, and huge_struct's output
    pattern from its long double fields). i686 was unaffected because its long
    double is 64-bit (== double, HAVE_LONG_DOUBLE=0), and aarch64/armv7 pass under
    their own qemu.
    
    Pin platforms: arm64,arm so only the genuinely foreign binaries run under
    emulation; the x86 Android binaries run natively on real x87 hardware.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit e815b3683e53f4af8b04b635617c72ec3112ff55
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 16:24:56 2026 -0400

    ci: cave workflow note — privileged is opt-in (privileged: true)

commit 69378c94a7d4bfc08f2f158544148e7268ac4999
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 16:17:20 2026 -0400

    ci: cave workflow note — job containers are privileged (nested containers ok)

commit 1af78f88eb3b923d89817a33fd30ed4f8fa030d9
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 16:15:04 2026 -0400

    ci: add cave workflow for the cave.moxielogic.com mirror
    
    Clean native x86_64 build + dejagnu test gated by rlgl, in cave's workflow
    format (.cave/workflows), so the libffi mirror's CI runs on cave's
    self-hosted runners. GitHub Actions ignores .cave/; cave's pull-mirror sync
    picks it up and runs it on the cave runner.

commit d6cea487cd19841eb083e2d2a8e0b7d7e6aee14d
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 15:59:14 2026 -0400

    ci: actually execute Android tests (static + qemu-user)
    
    Android tests were cross-compiled but never run (dejagnu exec'd them on the
    ubuntu host), so all ~722 execution tests failed and were baselined. Link the
    test binaries statically (--target_board=unix/-static) so they need no bionic
    loader/sysroot, register qemu-user binfmt, and let them execute: x86_64/i686
    run directly on the host (Android runs on the Linux kernel), aarch64/armv7
    under qemu-user.

commit c8cf0cda7d66c4bc8a72078be914eb2cf0a2152a
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 15:36:36 2026 -0400

    testsuite: skip complex_i128 on clang (no _Complex __int128)
    
    clang defines __SIZEOF_INT128__ but rejects _Complex __int128, so the test
    failed to compile (excess-errors FAIL + UNRESOLVED execution) on clang
    targets. Exclude __clang__ from the guard so clang falls through to the
    trivial main(), the same no-op path already used on platforms without
    int128/complex support; gcc still runs the full test.

commit 88243588b3e6f5318ad3fcf49a626b96c1ea6820
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 15:26:48 2026 -0400

    ci: publish rlgl's original-report log so report links resolve
    
    The rlgl HTML report links to a sibling rlgl-report-original.<ext> file that
    rlgl writes next to it, but we only uploaded/published rlgl-report.html, so
    the link 404'd on Pages. Upload both via the rlgl-report* glob and copy the
    original log next to index.html when publishing.

commit 7b159e655916249e827fbd30a17748309d278d2d
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 15:20:39 2026 -0400

    ci: add x86_64 macOS coverage via Rosetta on Apple Silicon
    
    GitHub's free Intel macOS runners are gone, so test the x86_64 ABI by building
    x86_64 on a macos-14 (arm64) runner with clang -arch x86_64 and running the
    dejagnu testsuite under Rosetta 2. New MACOS_X86 env flag routes build.sh to
    build_linux (dejagnu + rlgl) with --host=x86_64-apple-darwin, bypassing the
    legacy xcodebuild build_macosx path.

commit 9d942dae34df4135958044ff67e051a7c43829f1
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 15:16:28 2026 -0400

    ci: drop macOS-13 (Intel) job
    
    GitHub's free Intel macOS runners are effectively unavailable (jobs sit queued
    for hours, then time out). Test only the Apple-Silicon runners (macos-14/15).

commit 35a3dd28132fc93508ddb8a4a6e5f8d6181732b1
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 08:59:27 2026 -0400

    ci: use correct Debian cross triple for mips64el (gnuabi64)
    
    The N64 mips64el cross package is gcc-mips64el-linux-gnuabi64, not
    gcc-mips64el-linux-gnu (which doesn't exist -> 'Unable to locate package').
    Use HOST=mips64el-linux-gnuabi64.

commit 57f7790aec8a08bea0f12077d9f336f1ddf28320
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 08:30:49 2026 -0400

    ci: expand QEMU coverage (riscv64, armv7, sparc64, mips64el) + fixes
    
    - Env-driven QEMU dispatch in build.sh (FOREIGN_IMAGE -> foreign container;
      CROSS_QEMU -> cross+qemu-user) instead of fragile HOST-triple globs, which
      failed to match the cross triples (powerpc-linux-gnu fell through to
      build_linux, so QEMU_LD_PREFIX was never set and every execution test
      failed with "Could not open /lib/ld.so.1").
    - build_cross_qemu now exports QEMU_LD_PREFIX so qemu-user finds the target
      loader; install.sh installs gcc-$HOST + qemu-user-static when CROSS_QEMU set.
    - Add foreign-container targets riscv64 and armv7 (official multi-arch debian
      image, since fedora lacks them); add cross+qemu targets sparc64 and mips64el.

commit 7ed781614757ed3f6b7f15c8bbd25f4d6019f2a0
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 08:17:14 2026 -0400

    ci: add more QEMU non-native targets (s390x, big-endian PowerPC)
    
    Extend QEMU coverage beyond ppc64le:
    
    - s390x: another build-qemu matrix entry using the official multi-arch fedora
      image (Containerfile.s390x, --platform linux/s390x), same foreign-container
      path as ppc64le.
    
    - powerpc64 (big-endian) and 32-bit powerpc: no ready foreign-arch container
      image exists, so add a build-cross-qemu job that cross-compiles with the
      Debian gcc-<HOST> toolchain and runs the testsuite under qemu-user (binfmt).
      build.sh gains build_cross_qemu + dispatch; install.sh installs the cross
      toolchain + qemu-user-static for these hosts.
    
    Both new jobs feed the rlgl report publisher.

commit 5b1c6fa30b55f153d135ddd77b3ac784908a61eb
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 07:41:58 2026 -0400

    ci: fix dejagnu site.exp for modern Tcl; tolerate make dist failure
    
    The QEMU ppc64le container ships a newer Tcl that removed the `case` command,
    so `make check` died with 'invalid command name "case"' and produced no
    libffi.log. Convert site.exp's case to `switch -glob` (equivalent, works on
    all Tcl versions). Also make `make dist` non-fatal in the emulated container
    build, since it builds the PDF manual, which needs TeX (not installed in the
    container) and is irrelevant to testing.

commit 223a1a01256f6c1a9d8d8e5c9a964d51d9d32315
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 07:25:59 2026 -0400

    ci: build the ppc64le QEMU image with --platform linux/ppc64le
    
    docker build defaulted to linux/amd64 and ppc64le/fedora has no amd64
    variant (no match for platform in manifest). Use the official multi-arch
    fedora base and pass --platform linux/ppc64le so buildx builds the ppc64le
    image under QEMU.

commit 4bf3406cfd146e90a7a3ab3482057736c63919b4
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 07:24:22 2026 -0400

    ci: reintroduce QEMU-emulated non-native testing (PowerPC)
    
    GitHub offers no ppc64le/s390x/etc. runners, so test these under QEMU: a new
    build-qemu job registers binfmt (docker/setup-qemu-action), builds the
    foreign-arch container from .ci/Containerfile.<arch>, and runs the libffi
    testsuite natively inside it via build_foreign_linux; rlgl evaluates the
    results on the host. Starts with powerpc64le; add a matrix entry + a
    Containerfile to cover more architectures. build.sh gains a ppc64le dispatch
    case driven by FOREIGN_IMAGE.

commit 260e301c811fe0bc912aa1c219c0c79bdfc1a26b
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 19 07:16:45 2026 -0400

    ci: fetch rlgl via direct release URL, not the GitHub API
    
    download_rlgl resolved the asset through api.github.com/releases/latest, which
    is unauthenticated and rate-limited (60 req/hr/IP). Under the CI matrix (~20+
    jobs each calling it) it returns 403, leaving rlgl undownloaded and the build
    step dying with "./rlgl: No such file" (exit 127). Release-asset downloads are
    not rate-limited, so fetch a direct, version-pinned URL
    (v2.0.4, which includes the Intel macOS binary) instead. Also drop the
    best-effort "|| true" on macOS, which now evaluates results with rlgl.

commit 56bff074d23555fc856c7fc24e3aa9c635491210
Author: Anthony Green <green@moxielogic.com>
Date:   Thu Jun 18 19:37:13 2026 -0400

    ci: make rlgl.exe executable after unzip on Windows
    
    unzip does not set the execute bit, so cygwin refused to run ./rlgl.exe
    ("Permission denied", exit 126). chmod +x it after extraction, matching the
    tar-based Linux/macOS path.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 5532b1ee260ddc3ad8249c448bbf6fb609abed80
Author: Anthony Green <green@moxielogic.com>
Date:   Thu Jun 18 13:17:18 2026 -0400

    ci: deploy rlgl reports via GitHub Actions Pages flow
    
    The gh-pages branch is configured as a "workflow" Pages source, and pushes
    made with GITHUB_TOKEN don't trigger the legacy branch build, so committing
    reports to gh-pages alone never updated the live site. Have the publish job
    deploy the assembled site with actions/upload-pages-artifact + deploy-pages
    (keeping the gh-pages branch as the persistent store). Adds pages/id-token
    permissions and the github-pages environment, and guards deploy on whether
    any reports were published.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 062af85f0d8cf908dae022e12a7d4d5812733e07
Author: Anthony Green <green@moxielogic.com>
Date:   Thu Jun 18 12:55:34 2026 -0400

    ci: publish rlgl HTML reports to GitHub Pages
    
    Each report-producing job now uploads its rlgl-report.html as a uniquely
    named artifact and emits a ::notice:: with the report's Pages URL. A
    publish-reports job in each workflow (serialized across workflows via the
    gh-pages-publish concurrency group) collects the artifacts and commits them
    to the gh-pages branch under reports/<run-id>/<job>/, regenerating an index
    page that lists every run. Reports are thus viewable at
    https://libffi.github.io/libffi/ and persist across runs.
    
    Publishing is gated to pushes on the canonical repo (fork PRs lack the write
    token); the reports are still produced and uploaded as artifacts otherwise.
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 07e977a4aadd8d327b035cbfa601d4effe088306
Author: Anthony Green <green@moxielogic.com>
Date:   Thu Jun 18 08:55:03 2026 -0400

    ci: use the new client-side rlgl from GitHub releases
    
    rlgl is now a self-contained client-side tool rather than a thin client for
    the hosted rl.gl server, so update CI accordingly:
    
    - Download the rlgl binary from its GitHub releases, selecting the asset by
      the runner's own OS/arch (rlgl runs on the runner, not the build target).
    - Drop the "rlgl l --key=... https://rl.gl" login step and the policy-bound
      API key; evaluation now happens entirely locally against the git-hosted
      policy.
    - Windows jobs fetch the release zip and run ./rlgl.exe (was ./rlgl/rlgl.exe).
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

commit 9cb5c4c139be925672cd2e035708b896b416c67a
Merge: e5cefe94 4355ecc4
Author: Anthony Green <green@moxielogic.com>
Date:   Thu Jun 18 06:52:19 2026 -0400

    Merge pull request #972 from kxxt/rv64d
    
    riscv64: fix float marshal for ABI_FLEN >= 64

commit 4355ecc40541f07b2cd8bafa3102f0947b7da1fd
Author: Levi Zim <rsworktech@outlook.com>
Date:   Tue Jun 16 22:04:31 2026 +0800

    riscv64: fix float marshal for ABI_FLEN >= 64
    
    The original code tries to use asm to reinterpret floats as doubles.
    However, that does not work since the float is converted to double
    instead.
    
    This patch fixes it by manually copying the 32-bit float bits and
    performing the required NaN-boxing when storing a narrower float type
    inside a wider float type.
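    
    A minimal sketch of the idea, not the code from this patch: the function
    names below are hypothetical, and the sketch assumes IEEE 754 binary32
    and binary64 representations. It contrasts a value conversion, which
    changes the bit pattern, with a bit copy followed by NaN-boxing, which
    sets the upper 32 bits to all ones.
    
    ```c
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: value-converts the float to double and returns
       the double's bit pattern.  This is NOT a reinterpretation.  */
    uint64_t
    float_converted_bits (float f)
    {
      double d = (double) f;
      uint64_t bits;
      memcpy (&bits, &d, sizeof bits);
      return bits;
    }

    /* Hypothetical helper: copies the 32-bit float bits unchanged into the
       low half and NaN-boxes them by setting the upper 32 bits to ones.  */
    uint64_t
    float_nan_boxed_bits (float f)
    {
      uint32_t lo;
      memcpy (&lo, &f, sizeof lo);
      return UINT64_C (0xFFFFFFFF00000000) | lo;
    }
    ```
    
    For 1.0f the boxed pattern is 0xFFFFFFFF3F800000, whereas the value
    conversion yields 0x3FF0000000000000.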

commit e5cefe9499b801647b3b198923b5d86795e3826e
Author: Anthony Green <green@moxielogic.com>
Date:   Sat Jun 6 05:46:08 2026 -0400

    Add security policy

commit 30c03c63aca3d1f5d7e9ce43de5b2edce930b0e5
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 5 08:03:23 2026 -0400

    README: note recent 3.6.0 changes
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

commit f0ca1577b0145854c486931b11aded99f610d5b3
Author: Anthony Green <green@moxielogic.com>
Date:   Fri Jun 5 07:54:04 2026 -0400

    powerpc64: support FFI_TYPE_COMPLEX under ELFv2 (#970)
    
    The reporter of #962 hits a 1-slot drift on ppc64le when passing a
    float _Complex argument followed by an int: the C callee, compiled by
    GCC, treats the complex per ELFv2 §2.2.3.2.2 (split_complex_arg) and
    reserves *two* GPR shadow slots for the two halves, while libffi has
    no complex support for PowerPC and sees only the user's surrogate
    struct{float;float;} (one GPR shadow slot under HFA packing). All
    subsequent integer args land in the wrong register.
    
    Add FFI_TYPE_COMPLEX support for the LINUX64 / ELFv2 path:
    
      * ffitarget.h: define FFI_TARGET_HAS_COMPLEX_TYPE for POWERPC64 +
        _CALL_ELF == 2 so prep_cif accepts the type and types.c emits the
        ffi_type_complex_{float,double,longdouble} symbols.
    
      * ffi_linux64.c, arg path (prep_cif + prep_args64 + closure helper):
        each half of a _Complex argument is now passed independently —
        float/double halves go to consecutive FPRs and consume one GPR
        shadow slot each; integer halves go to consecutive GPR slots
        sign-/zero-extended to doublewords. This matches what GCC's
        split_complex_arg emits and fixes the 1-slot drift.
    
      * ffi_linux64.c, return path: float/double complex are returned via
        the existing 2-element FP-HFA assembly path (PPC64_LD_FLOAT_HOMOG /
        PPC64_LD_DOUBLE_HOMOG, FLAG_RETURNS_SMST + FLAG_RETURNS_FP).
        Integer complex are returned via .Lsmall_struct with FLAG_RETURNS_SMST
        only; ffi.c repacks the bounce buffer (real in slot 0, imag in
        slot 8) into the caller's natural packed layout, and the closure
        helper repacks the user-written packed value into two doublewords
        before returning PPC_LD_R3R4.
    
      * discover_homogeneous_aggregate now recurses through FFI_TYPE_COMPLEX
        of float/double, so a struct containing _Complex float/double members
        is recognised as an FP HFA (matches GCC's HFA rule for aggregates
        containing complex). The top-level complex arg/return paths bypass
        this and use their own switch cases.
    
    64-bit long double complex is treated as _Complex double. IBM-128 and
    IEEE-binary128 _Complex are explicitly rejected (FFI_BAD_TYPEDEF) and
    left as follow-up work; the same applies to PPC32 SysV and ELFv1.
    
    Fixes #962.
    
    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

commit 547449d705258ea490aa8787dc084c5b38dbe722
Author: Muhammad Kamran <muhammad.kamran@arm.com>
Date:   Fri Jun 5 12:48:36 2026 +0100

    aarch64: add feature build attributes (#969)
    
    Emit AArch64 feature build attributes when assembler support is
    available, and keep using GNU property notes as the fallback.
    
    Rename the feature bit macros so they can be shared by both metadata
    formats.
    
            * src/aarch64/internal.h (AARCH64_POINTER_AUTH): Rename from
            GNU_PROPERTY_AARCH64_POINTER_AUTH.
            * src/aarch64/sysv.S (AARCH64_BTI): Rename from
            GNU_PROPERTY_AARCH64_BTI.
            (AARCH64_GCS): Rename from GNU_PROPERTY_AARCH64_GCS.
            (GNU_PROPERTY): New macro.
            (FEATURE_1_AND_MARK): New macro.
            Emit feature build attributes when __ARM_BUILDATTR64_FV is defined.

commit 11ef568167fb84bfcfefbb79bfb3ae4429fcb0e2
Author: Stewart X Addison <6487691+sxa@users.noreply.github.com>
Date:   Fri Jun 5 11:11:23 2026 +0100

    Fix erroneous semicolon in FFI_EXTRA_CIF_FIELDS on some platforms (#968)
    
    Signed-off-by: Stewart X Addison <sxa@ibm.com>

commit 4639c933db028a883cc8e7e36f8b97fb6df804e7
Author: hudsonzuo <zuohsh@163.com>
Date:   Fri Jun 5 18:10:53 2026 +0800

    add NULL check in sys_trim (#966)
    
    * add NULL check in sys_trim
    
    * space
    
    * Update dlmalloc.c

commit c937c979f70596ba502b142c420286dd3da00e40
Author: pietro <pietro@users.noreply.github.com>
Date:   Fri Jun 5 06:09:17 2026 -0400

    Use absolute paths for tests include paths (#965)
    
    Co-authored-by: Pietro Monteiro <pietro@sociotechnical.xyz>

commit c5abbdad2f930f806791942776ccd45beeff1613
Author: Jakub Jelínek <jakub@redhat.com>
Date:   Fri May 22 17:58:01 2026 +0200

    Fix two comment typos. (#961)

commit 9760868682cc9a33008761f158d86481d56738aa
Author: Anthony Green <green@moxielogic.com>
Date:   Tue Apr 21 05:33:31 2026 -0400

    Remove nios ii credit (port was removed in 3.4.7)
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

commit 0b1690778700480bbf94ce7288defb3d831cacf1
Author: Anthony Green <green@moxielogic.com>
Date:   Tue Apr 21 05:32:39 2026 -0400

    Add Meng Qinggang to LoongArch credits for LoongArch32 port
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

commit 35f3a8f4a0f38f788f47bbc34e72abb9f2894a0a
Author: Anthony Green <green@moxielogic.com>
Date:   Tue Apr 21 05:29:59 2026 -0400

    Update README.md in preparation for libffi 3.6.0
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

commit 70e6c2615d7deb0529568b013b1c274fc5e24375
Author: cloud <mengqinggang@loongson.cn>
Date:   Tue Apr 21 17:24:49 2026 +0800

    Add support for LoongArch32 (#957)
    
    * Add support for LoongArch32
    
    Change src/loongarch64 to src/loongarch.
    
    Change __loongarch64 to __loongarch_grlen.
    __loongarch64 is used for compatibility with legacy code;
    new programs should not assume the existence of this macro.[1]
    Also, there is no __loongarch32 macro for LoongArch32.
    
    Add the ilp32s, ilp32f, and ilp32d ABIs.
    Add ADDI macro.
    
    [1] https://github.com/loongson/la-toolchain-conventions?tab=readme-ov-file#cc-preprocessor-built-in-macro-definitions
    
    * LoongArch: Fix libffi.closures/cls_longdouble.c
    
    CHECK(a8 == 8) fails on LoongArch32.
    
    A long double is passed by reference under the LoongArch32 ilp32 ABI.
    If an argument register is available, the address is passed in that
    register; otherwise, it is passed on the stack[1].
    
    The address of the return value and the addresses of the a1-a7
    arguments of cls_ldouble_fn are passed in the a0-a7 argument registers.
    The address of the a8 argument needs to be passed on the stack.
    
    But ffi_call_int only allocates cif->bytes(sizeof(long double)*8)
    bytes of stack space. Both the address of the a8 argument and the a8
    argument itself are saved at sp+0, so the lowest 4 bytes of the a8
    argument are overwritten by its address.
    
    Allocate extra space using a conservative estimate, as RISC-V does.
    
    [1] https://github.com/loongson/la-abi-specs/blob/release/lapcs.adoc#passing-arguments

commit 10056e7e6a0d40d2a21af63484b99f08898dde9e
Author: Ryan VanderMeulen <rvandermeulen@mozilla.com>
Date:   Fri Apr 10 17:59:07 2026 -0400

    x86: add FFI_ASAN_NO_SANITIZE to ffi_call_int in ffiw64.c (#959)
    
    ffi_call_int in ffiw64.c uses the same alloca-as-stack trick as
    ffi_call_int in ffi64.c and ffi.c: it alloca's a block and passes it
    to ffi_call_win64() which uses it as its own stack frame. This
    confuses ASan's shadow memory tracking.
    
    ffi64.c and ffi.c already have FFI_ASAN_NO_SANITIZE on their
    ffi_call_int (added alongside the macro definition in ffi_common.h),
    but ffiw64.c was missed. Add it for consistency.

commit 70cb813f2bb2a8eb30c619666d289f59aacbecba
Author: Anthony Green <green@moxielogic.com>
Date:   Thu Mar 26 12:27:46 2026 -0400

    README: merge autogen.sh note into configure paragraph (#955)
    
    Folds the git-build note directly into the configure paragraph so
    readers who skip ahead see the autogen.sh requirement without having
    to notice a separate paragraph. Suggested by @gitonthescene in #954.
    
    Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

commit 11141fba1dcc69bdb98a8d63e7eb2c8109507c34
Author: bobo215 <bobo@atgreen.org>
Date:   Sun Mar 8 08:58:40 2026 -0400

    sh: Fix linker errors by applying __USER_LABEL_PREFIX__ in CNAME (#953)
    
    The CNAME macro in src/sh/sysv.S was defined as a no-op (#define CNAME(x) x),
    which ignores __USER_LABEL_PREFIX__. On targets where this prefix is set (e.g.
    an underscore on some platforms), the assembler emits symbols without the
    expected prefix, causing linker errors when C code references them.
    
    Fix this by using the C1/C2 two-level macro concatenation pattern (as used in
    x86/sysv.S) to properly prepend __USER_LABEL_PREFIX__ to symbol names.
    
    This replaces the approach proposed in #809.
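    
    A minimal sketch of the two-level concatenation pattern, not the actual
    contents of src/sh/sysv.S. DEMO_PREFIX is a hypothetical stand-in for
    __USER_LABEL_PREFIX__ (whose real value depends on the target), and the
    NAIVE, STR, STR2 and demo function names exist only for illustration.
    The extra macro level lets the prefix expand before ## pastes tokens.
    
    ```c
    #include <string.h>

    /* Hypothetical stand-in for __USER_LABEL_PREFIX__ on a target that
       prefixes C symbols with an underscore.  */
    #define DEMO_PREFIX _

    /* One level only: ## pastes the unexpanded token DEMO_PREFIX.  */
    #define NAIVE(x) DEMO_PREFIX ## x

    /* Two levels: C1's arguments are macro-expanded before C2 pastes.  */
    #define C2(X, Y) X ## Y
    #define C1(X, Y) C2 (X, Y)
    #define CNAME(x) C1 (DEMO_PREFIX, x)

    /* Stringify after expansion so the results can be inspected.  */
    #define STR2(x) #x
    #define STR(x) STR2 (x)

    const char *
    naive_name (void)
    {
      return STR (NAIVE (ffi_call_SYSV));
    }

    const char *
    prefixed_name (void)
    {
      return STR (CNAME (ffi_call_SYSV));
    }
    ```
    
    Here prefixed_name() returns "_ffi_call_SYSV", while naive_name()
    returns "DEMO_PREFIXffi_call_SYSV" because the prefix macro was pasted
    before it could expand.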
    
    Co-authored-by: bobo215 <266481280+bobo215@users.noreply.github.com>
    Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

commit 21519b700380f9f4206c57ae44ca3c7e3e4ac18c
Author: pietro <pietro@users.noreply.github.com>
Date:   Sun Mar 8 07:57:22 2026 -0400

    Define `WIN32_LEAN_AND_MEAN` before including windows.h (#937)
    
[--snip--]
