Exetools - Windows on Arm64, x86/x64 emulation

- - Windows on Arm64, x86/x64 emulation (https://forum.exetools.com/showthread.php?t=20066)

DavidXanatos

01-25-2022 04:13

Windows on Arm64, x86/x64 emulation

I'm wondering how the x86/x64 emulation on arm64 works: is the image loaded into memory the original x86/x64 one, with just a transpiled arm64 shadow copy alongside it, or is the code transpiled beforehand so that the loaded image is arm64 only?

I don't have any arm hardware yet, but am already thinking about how function hooks in x86/x64 code emulated on arm64 would, or maybe wouldn't, work.

Anyone here who already has some experience and could shed some light on it?

DavidXanatos

01-25-2022 17:52

I have found some information: https://blogs.blackberry.com/en/2019/09/teardown-windows-10-on-arm-x86-emulation

So it seems that what lives in memory is the x86/x64 image, and it's only transpiled piece by piece on demand. I hope a call to FlushInstructionCache will invalidate the cached transpiled result ...

EDIT: now I just need a good device to experiment on, any suggestions?

sh3dow

01-26-2022 03:48

Was about to post that link, but apparently I'm late and you found it. This one also has good information: https://wbenny.github.io/2018/11/04/wow64-internals.html

Also this fantastic CODE BLUE talk which has juicy details:
- https://www.slideshare.net/ffri/appearances-are-deceiving-novel-offensive-techniques-in-windows-1011-on-arm-250472833
- https://www.youtube.com/watch?v=amHAot3X8cE

kino0924

01-26-2022 05:50

You can try to run ARM64 Windows on QEMU.
I'm not sure whether running x86/x64 apps on ARM64 Windows works there, though.

DavidXanatos

01-26-2022 17:42

@sh3dow thanks

@kino0924 emulating arm64 on x64, which in turn emulates x86/x64, sounds painfully slow.
I'm probably better off trying to run it natively on an old Raspberry Pi :p

DavidXanatos

02-12-2022 03:17

Also interesting: https://oofhours.com/2021/02/19/running-x64-on-windows-10-arm64-how-the-heck-does-that-work/

DavidXanatos

02-15-2022 01:11

So, some things I learned about x64 on arm64: it seems MSFT went all in to provide good interoperability between x64 and arm64 code.

They have introduced a new type of PE, the so-called CHPEv2 (Compiled Hybrid Portable Executable), which can contain both x64 and arm64 code; as far as I understand, in practice this is mostly arm64 code with x64 entry points.
These are called ARM64EC in the VS 2019/2022 toolchain.

An executable compiled as x64 can load ARM64EC DLLs and call them normally;
the x64 wrappers have some (as far as I can tell) dummy prologs, large enough to install x64 hooks.
The intended use case for this is loading system libraries from system32, which on arm64 are all provided as such CHPEs, so there is no C:\Windows\SysX8664 or the like.

So to say, 64 bit is 64 bit, no matter if it's ARM or AMD; there are also no separate registry paths. It's all thoroughly mixed together, unlike SysWOW64 or SysArm32, which each get their own system directory and their own registry redirection.

An executable compiled as ARM64EC can load x64 DLLs just fine; I haven't looked into hookability in this scenario yet (TODO).
The intended use case is to allow developers to port a part of their application to Arm64 and keep the rest x64 for the time being, as well as to provide compatibility with x64 plugins and extensions (according to MSFT's docs).

So technically the x64 on arm thing is its own feature and should not be confused with another version or an extension of WOW; it's a separate interoperability feature, one which works quite differently. Unfortunately I don't know if MSFT named it somehow. Based on some DLL and service names it's probably called XTA, which would probably stand for "x86_x64 to ARM" or something like that.

XTA is a just-in-time compiler that converts x64 or x86 code to arm64 when needed, and the result is then executed. The loaded binary image in memory stays untouched x64/x86, and hooking any portion of it seems to work just fine. Of course, FlushInstructionCache is probably particularly important to ensure the XTA cache gets updated.
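To make the hooking part concrete, here is a minimal sketch (not from the thread) of the kind of patch an x64 hook typically writes over such a prolog: a 14-byte absolute jump. The helper name write_x64_abs_jmp is hypothetical, and the surrounding work (making the page writable, saving the original bytes, and calling FlushInstructionCache afterwards) is only hinted at in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of "jmp qword ptr [rip+0]" followed by the 8-byte absolute target. */
#define X64_ABS_JMP_SIZE 14

/*
 * Writes an absolute x64 jump to `target` into `dst`.
 * Returns the number of bytes written, or 0 if `dst_len` is too small.
 * Hypothetical helper for illustration only.
 */
size_t write_x64_abs_jmp(uint8_t *dst, size_t dst_len, uint64_t target)
{
    /* FF 25 00 00 00 00 = jmp qword ptr [rip+0] */
    static const uint8_t opcode[6] = { 0xFF, 0x25, 0x00, 0x00, 0x00, 0x00 };

    assert(dst != NULL);
    if (dst_len < X64_ABS_JMP_SIZE)
        return 0;

    memcpy(dst, opcode, sizeof opcode);

    /* x64 is little-endian: store the target lowest byte first. */
    for (int i = 0; i < 8; i++)
        dst[6 + i] = (uint8_t)(target >> (8 * i));

    /*
     * In a live process the patch would be followed by
     * FlushInstructionCache(hProcess, dst, X64_ABS_JMP_SIZE), which is
     * the call the post expects to invalidate the cached translation.
     */
    return X64_ABS_JMP_SIZE;
}
```

A prolog therefore has to offer at least 14 patchable bytes for this particular stub; shorter relative jumps need less room but only reach targets within 2 GB.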

When running an x86 application on an arm64 machine, both WOW and XTA in fact seem to work together. While the SysWOW64 directory contains only x86 DLLs, there is another one called SyChpe32 that contains something like ARM64EC, just in 32 bit, so ARMEC (?). Unfortunately MSFT did not provide a toolchain to create such binaries ourselves. There we have the most commonly used system DLLs in such a hybrid format.

So WOW takes care of the syscall translation from 32 bit to 64 bit close to the kernel, be that on arm64 or on x64, plus filesystem and registry redirection. XTA, in turn, takes care of the transition between emulated and native code, as close as possible to the loaded user code.

In between lives native arm/arm64 code.

When running x64 on arm64, only XTA is active and no WOW is in place; a call to IsWow64Process2 confirms that an x64 application running on arm64 does so without WOW.
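As a rough illustration of what that check tells you, here is a small sketch that interprets the two machine values IsWow64Process2 hands back. The function and enum names are made up for this example; the numeric constants are the IMAGE_FILE_MACHINE_* values from winnt.h, redefined here so the snippet does not need the Windows headers. The API reports IMAGE_FILE_MACHINE_UNKNOWN as the process machine when the target is not a WOW64 process, which per the observation above is also what an emulated x64 process on arm64 yields.

```c
#include <assert.h>
#include <stdint.h>

/* IMAGE_FILE_MACHINE_* values as defined in winnt.h. */
#define MACHINE_UNKNOWN 0x0000
#define MACHINE_I386    0x014c
#define MACHINE_ARMNT   0x01c4
#define MACHINE_AMD64   0x8664
#define MACHINE_ARM64   0xAA64

enum wow64_kind { WOW64_NONE, WOW64_X86, WOW64_ARM32, WOW64_OTHER };

/* Classifies the pProcessMachine value returned by IsWow64Process2. */
enum wow64_kind classify_wow64(uint16_t process_machine)
{
    switch (process_machine) {
    case MACHINE_UNKNOWN: return WOW64_NONE;   /* not a WOW64 process */
    case MACHINE_I386:    return WOW64_X86;    /* x86 guest under WOW64 */
    case MACHINE_ARMNT:   return WOW64_ARM32;  /* ARM32 guest under WOW64 */
    default:              return WOW64_OTHER;
    }
}

/*
 * Returns 1 when the pair cannot rule out an emulated x64 process:
 * no WOW64 is involved and the host is ARM64. Native ARM64 and
 * ARM64EC processes produce the same pair, so this is only a hint.
 */
int may_be_emulated_x64(uint16_t process_machine, uint16_t native_machine)
{
    return classify_wow64(process_machine) == WOW64_NONE &&
           native_machine == MACHINE_ARM64;
}
```

Telling an emulated x64 process apart from a native arm64 one therefore needs another source, for example the machine type in the PE header of the main image.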

Interestingly, when querying NtQuerySystemInformationEx(SystemSupportedProcessorArchitectures, ...),
ARM64EC binaries give the same result as x64 binaries on arm64.

Also, when debugging ARM64EC binaries you need to use the x64 debugger; there is no dedicated ARM64EC one.

DavidXanatos

02-15-2022 05:30

Interesting, I was under the impression both were the same: ARM64 code with provisions to
a) call x64 code
and
b) be callable from x64 code
So the ARM64X version contains full code for both platforms and hence would be runnable on x64 Windows as well?

DavidXanatos

02-15-2022 05:55

So how is ARM64X different from a DLL compiled as ARM64EC?

deepzero

02-15-2022 06:22

So they have a new type of relocation that "relocates" the RVAs of imports etc. depending on arm vs x64? That's actually pretty cool; I am now only half-mad they introduced yet another relocation type...

DavidXanatos

02-17-2022 00:19

I have been experimenting with my HackLib with code injection into/from Arm64 processes and noticed something unexpected but, in hindsight, logical: ARM64EC processes are more x64 than arm, even if most of their compiled code is arm.
By that I mean: when I hijack the main thread before it resumes and point it to a piece of shell code of mine, the shell code must be x64 in order to work.
The funny thing is that my shell code loads a DLL that then hooks various functions in x64 (for demo purposes, for example, MessageBoxW, altering its title), and all that works as expected.
The ARM64EC app, whose main and other user-written functions are ARM64 code, invoked the hooked x64 version just fine when calling MessageBoxW.
To be honest, I would have expected a shortcut here where the arm code calls system APIs directly, but no: the control flow goes from Arm64 to the x64 stub, which can be hooked, and then back to Arm64 code in the system\user32.dll. That is pretty nice, and in a way logical: a process that is supposed to be able to load native x64 DLLs would, for ideal compatibility, have provisions to allow such a DLL to consistently hook functions, even if that means a few more detours are taken.

Now about cross-architecture code injection (assuming shell code and injected DLL have the right architecture):
Code injection from an arm64 process into an x64 or arm64ec process works just fine using x64 code.

What currently fails, however, is code injection from an x64 or arm64ec process into an arm64 process, and it seems for a quite mundane reason: NtSetContextThread, when called from an x64 process to act upon an arm64 process, returns STATUS_SET_CONTEXT_DENIED.
I expect that, had the operation been performed, the rest would have succeeded.

Since arm64 to arm64 injection using the same method works just fine, I expect this to be some sort of intentional limitation, perhaps not even a real security measure but rather a safeguard against screwing things up.

So the question: has anyone here already experienced this and knows whether there is an easy way around it, short of spawning a native arm64 helper process just to do that one function call?

sh3dow

02-23-2022 03:56

Found this on my feed and wanted to share :)

├─Jack-in-the-Cache: A New Code injection Technique through Modifying X86-to-ARM Translation Cache (this one is from the same author I mentioned before in #3).
│ It was presented at Black Hat; the video this time is in English, while the CODE BLUE one was in Japanese.
│
│─────https://i.blackhat.com/eu-20/Wednesday/eu-20-Nakagawa-Jack-In-The-Cache-A-New-Code-Injection-Technique-Through-Modifying-X86-To-Arm-Translation-Cache.pdf
│
└─────https://www.youtube.com/watch?v=8wg7X5IaEto

Quote:

Recently, the adoption of ARM processors for laptop computers is becoming popular due to its high energy efficiency. Windows 10 on ARM is a new OS for such ARM-based computers. Several laptop computers with this OS have already been shipped; notably, the recent launch of Microsoft Surface Pro X will be a driving force to facilitate the widespread use of Windows 10 on ARM.

You might think that there are new threats to such a new OS. Yes! We found such a threat.

In this talk, we present a new code injection technique to abuse a novel feature of Windows 10 on ARM: X86 emulation.
Remarkably, Windows 10 on ARM can run X86 apps via the X86 emulation feature that translates binary from X86-to-ARM just in time. To reduce the performance overhead of JIT binary translation, the OS has the mechanism to cache already-translated results as X86-to-ARM (XTA) cache files.

Our new code injection technique is performed by modifying this XTA cache file. Since this technique is difficult to detect and trace, appropriate countermeasures are necessary. Moreover, this technique can be used as an API hooking invisible to an X86 process. Therefore, this technique has already been a threat to Windows 10 on ARM.

We believe that future OSs have a JIT translation mechanism at the processor transition. For example, Apple has recently announced Rosetta 2, which is a similar mechanism for introducing their own ARM-based chip. For these OSs, the caching of already-translated results as files is a reasonable way to decrease performance overhead.

Our new code injection technique might also apply to such OSs. This presentation becomes a beneficial advisory for the developers of such future OSs, not limited to Windows 10 on ARM. PoC code of our new code injection technique and analysis results of the X86 emulation will be public on GitHub after this talk.

Excellent blog from the same author as well
├─Discovering a new relocation entry of ARM64X in recent Windows 10 on Arm
│
│─────https://ffri.github.io/ProjectChameleon/new_reloc_chpev2/
│
└─────https://github.com/FFRI/ProjectChameleon/

DavidXanatos

02-24-2022 05:04

hmpf... sooo I have nice function hijacking code that works perfectly from arm64 to arm64.

Now we also know that all processes on arm64 start execution as arm64 (or at least I think so), so at the very start of every program we should enter ntdll's LdrInitializeThunk, and for arm64 on arm64 that is of course the case.
For other architectures on arm64 we should at some point divert into emulated code.

When trying to inject a detour into LdrInitializeThunk of an x64 process created suspended on arm64, however, that code does not seem to ever be executed. Meaning I can inject garbage and it will still start up just fine.

Now my assumption of how x64 on arm64 works is that as soon as execution goes into a system DLL, i.e. anything compiled as ARM64X, we exit emulated x64 mode and execute the native arm code in the DLL. So it stands to reason that when bootstrapping a process it behaves analogously: everything is executed in native arm until the time comes to call the x64 process's entry point.

Well, it seems something isn't quite right here. One possibility is that the ARM64X DLLs truly have all the code doubled, including large portions of the arm code, so when I manipulate the LdrInitializeThunk I get, I do it to a copy that will never be used.

Now I find that strange. I would have assumed that the code wouldn't be doubled and that MSFT would have some smart redirection in place allowing the ARM64X DLLs to reuse most of the arm code for both the native and the emulated mode.

@RamMerLabs since you apparently already have a lot of experience with the layout of the new PE files, would you maybe have a few tips? Is it really so that the code is fully separated?

DavidXanatos

02-24-2022 16:10

After some more research I can confirm that this is what happens: when I get the address of #LdrInitializeThunk from the symbol file for ntdll and use that, my injected code works.
Sooo... the next question is how to get the "export" addresses without needing a pdb file.
It was already written earlier that these ARM64X files have a 2nd export directory, so I guess parsing that "by hand" would be the straightforward approach.
Unless there is a flag that can be passed to LdrLoadDll that would do this for me?
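For reference, parsing an export directory "by hand" could look roughly like the sketch below. It assumes the image is available as a byte buffer in its mapped layout (so an RVA is simply an offset into the buffer) and that the caller already knows the RVA of the export directory to use, primary or alternative. The name find_export_rva is hypothetical; the field offsets are those of the standard IMAGE_EXPORT_DIRECTORY. Forwarded exports and ordinal-only exports are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers that return 0 if the read would leave the buffer. */
static uint32_t rd32(const uint8_t *img, size_t len, size_t off)
{
    if (off > len || len - off < 4)
        return 0;
    return (uint32_t)img[off] | ((uint32_t)img[off + 1] << 8) |
           ((uint32_t)img[off + 2] << 16) | ((uint32_t)img[off + 3] << 24);
}

static uint16_t rd16(const uint8_t *img, size_t len, size_t off)
{
    if (off > len || len - off < 2)
        return 0;
    return (uint16_t)(img[off] | (img[off + 1] << 8));
}

/*
 * Looks up `name` in the export directory at `export_dir_rva` and
 * returns the RVA of the exported function, or 0 if it is not found.
 */
uint32_t find_export_rva(const uint8_t *img, size_t len,
                         uint32_t export_dir_rva, const char *name)
{
    assert(img != NULL && name != NULL);

    size_t dir = export_dir_rva;
    uint32_t n_funcs = rd32(img, len, dir + 20); /* NumberOfFunctions */
    uint32_t n_names = rd32(img, len, dir + 24); /* NumberOfNames */
    uint32_t funcs   = rd32(img, len, dir + 28); /* AddressOfFunctions */
    uint32_t names   = rd32(img, len, dir + 32); /* AddressOfNames */
    uint32_t ords    = rd32(img, len, dir + 36); /* AddressOfNameOrdinals */
    size_t name_len  = strlen(name) + 1;         /* compare the NUL too */

    if (n_names > len / 4)                       /* cannot fit: bad header */
        return 0;

    for (uint32_t i = 0; i < n_names; i++) {
        uint32_t name_rva = rd32(img, len, (size_t)names + (size_t)4 * i);
        if (name_rva >= len || len - name_rva < name_len)
            continue;
        if (memcmp(img + name_rva, name, name_len) != 0)
            continue;
        /* The ordinal table maps the name index to a function index. */
        uint16_t ord = rd16(img, len, (size_t)ords + (size_t)2 * i);
        if (ord >= n_funcs)
            return 0;
        return rd32(img, len, (size_t)funcs + (size_t)4 * ord);
    }
    return 0;
}
```

The name table is sorted, so a binary search would be the faster choice for large modules; the linear scan just keeps the sketch short.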

DavidXanatos

02-25-2022 05:21

Splendid! Where can I find some documentation/reverse engineering notes on the new PE features? For my use case it's not enough to see them in a tool; I need to programmatically get to the relevant addresses for code injection.
I have already found this: https://ffri.github.io/ProjectChameleon/new_reloc_chpev2/ but it's from 2019; I imagine by now there should be more refined information available?

DavidXanatos

02-25-2022 21:27

I was somehow under the impression the article was older, but you seem to be right; it was maybe a bit too late yesterday, LOL
Downloading the newest WDK right now....

sh3dow

02-26-2022 06:32

Quote:

Originally Posted by DavidXanatos (Post 124883)

I have already found this: https://ffri.github.io/ProjectChameleon/new_reloc_chpev2/ but it's from 2019; I imagine by now there should be more refined information available?

Actually it's not that old; it's very recent. It was published 2021/07/13, so I don't think a lot has changed, if anything at all, in this short time window, knowing Microsoft.

Also, the author stated that although the title says "Windows 10 on Arm", the results are also valid on Windows 11 on Arm. That is probably evidence of how slowly MS adds new features.

There is no new refined public information so far, and he is the only researcher publicly known to be working on this subject. Actually I recommend you contact him. I'm sure he will be delighted that another reverse engineer is interested in his research, and that will spark an interesting chat on the subject. I'm sure you will get a lot of info and questions answered.

DavidXanatos

02-26-2022 06:53

1 Attachment(s)

Indeed I should contact him, good idea :D

But now let's talk about what we can get from the relevant header files: ...\10.0.22000.0\um\winnt.h and ...\10.0.22000.0\km\ntimage.h

With the attached FindDllExport we can extract exports from a loaded image. In the example below we run an arm64 process and target an x64 process. For ourselves we get the normal ntdll.dll!LdrLoadDll, while for the x64 process we get ntdll.dll!EXP+#LdrLoadDll.

Code:

        //ntdllBase 0x00007ffa5a7d0000
        //0x00007ffa5a811050 {ntdll.dll!LdrLoadDll(void)}
        //0x00007ffa5a7d1890 {ntdll.dll!EXP+#LdrLoadDll}
        //0x00007ffa5a969920 {ntdll.dll!#LdrLoadDll}

        HMODULE hNtdll = GetModuleHandle(L"ntdll.dll");
        //DWORD64 LLW1 = GetProcAddress(hNtdll, "LdrLoadDll");
        DWORD64 LLW1 = FindDllExport(GetCurrentProcess(), (DWORD64)hNtdll, "LdrLoadDll");

        DWORD64 ntdllBase = FindDllBase(hProcess, L"\\system32\\ntdll.dll");
        DWORD64 LLW2 = FindDllExport(hProcess, ntdllBase, "LdrLoadDll");

The export directory on disk (or in the arm64 case) is at RVA 0x31E1A0;
for the x64 process the value at 0x178 is overwritten with 0x308810, which is the alternative export directory.
This operation is indicated by PE Anatomist in "Loader Config -> Dyn. Value Relocs", 2nd entry. So if we don't have a loaded process, we could extract the value from there and read the second export directory from disk directly.
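Reading a directory from disk means translating its RVA into a file offset first, since sections are laid out differently in the file than in memory. A minimal sketch of that translation, assuming the section headers have already been read into a small array (the section_info struct and the function name are made up for this example; the fields correspond to VirtualAddress, SizeOfRawData and PointerToRawData of IMAGE_SECTION_HEADER):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The subset of IMAGE_SECTION_HEADER needed for the translation. */
typedef struct {
    uint32_t virtual_address; /* RVA at which the section is mapped */
    uint32_t raw_size;        /* SizeOfRawData: bytes present in the file */
    uint32_t raw_offset;      /* PointerToRawData: file offset of the data */
} section_info;

/*
 * Translates `rva` into a file offset. Returns 1 and stores the result
 * in `*out` on success, 0 if the RVA is not backed by bytes in the file.
 */
int rva_to_file_offset(const section_info *secs, size_t count,
                       uint32_t size_of_headers, uint32_t rva,
                       uint32_t *out)
{
    assert(secs != NULL && out != NULL);

    /* The headers are mapped 1:1, so RVA and file offset coincide. */
    if (rva < size_of_headers) {
        *out = rva;
        return 1;
    }

    for (size_t i = 0; i < count; i++) {
        const section_info *s = &secs[i];
        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->raw_size) {
            *out = s->raw_offset + (rva - s->virtual_address);
            return 1;
        }
    }
    return 0; /* e.g. zero-filled tail of a section or an unmapped gap */
}
```

The first branch is also why an address such as 0x178, which lies inside the PE headers, can be read at the same offset in the file and in the mapped image.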

In "Debug->POGO" we see that the export directory starts at 0x308810 and is 0x2b224 in size.
First comes the alternative directory, 0x308810 to 0x31E1A0,
and given the entry size it seems the primary one, starting at 0x31E1A0, goes to 0x333A34. Both are similar in size, so there does not seem to be a 3rd one for the ntdll.dll!#.... exports.

I assume that's because there is no typical use case where a process would want those directly: an arm64 process will load the primary table, an x64 process the alternative table, and 32-bit ones have their own ntdlls in the WOW folders.

So where do we go from here? We notice that "Loader Config -> hybrid PE -> WoW Thunks Metadata" seems to hold all the RVAs we get from the export directory, and the destinations are the addresses of the #... functions.

So we can get the !EXP+#... function addresses from the export dir and look up the #... addresses in this RedirectionMetadata table. FindDllExport now checks if the first char is a '#' and, if so, triggers the additional lookup.

Code:

DWORD64 LLW3 = FindDllExport(hProcess, ntdllBase, "#LdrLoadDll");

This now gives us the right #... function address, which we can use as the target for arm64 code injection as well as for function calls from the injected shell code.
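The extra lookup step can be sketched as follows. This is not the attached FindDllExport, just an illustration of the idea under the assumption that the redirection table is an array of (source RVA, destination RVA) pairs of 32-bit values; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the redirection table: x64 thunk RVA -> native code RVA. */
typedef struct {
    uint32_t source;      /* RVA of the EXP+#... thunk (from the export dir) */
    uint32_t destination; /* RVA of the matching #... function */
} redirection_entry;

/* Returns the destination RVA for `thunk_rva`, or 0 if there is none. */
uint32_t lookup_redirection(const redirection_entry *table, size_t count,
                            uint32_t thunk_rva)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].source == thunk_rva)
            return table[i].destination;
    }
    return 0;
}

/*
 * Mirrors the '#' convention described above: `export_rva` is what the
 * export directory returned for the name without the leading '#'. A
 * plain name yields that RVA unchanged, a '#'-prefixed name is resolved
 * through the redirection table.
 */
uint32_t resolve_export(const char *name, uint32_t export_rva,
                        const redirection_entry *table, size_t count)
{
    assert(name != NULL);
    if (name[0] != '#')
        return export_rva;
    return lookup_redirection(table, count, export_rva);
}
```

With the addresses from the listing above (base 0x00007ffa5a7d0000), the thunk RVA 0x1890 would map to 0x199920, the RVA of #LdrLoadDll.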

One thing I haven't figured out yet is how to get the alternative export directory if we don't have a process with emulation at our disposal.
I'm not sure where PE Anatomist gets the "Dyn. Value Relocs" from; for me, in a live process DynamicValueRelocTable is NULL, and base + DynamicValueRelocTableOffset or loaderConfig + DynamicValueRelocTableOffset do not seem to result in valid data.
While it's not required for the use case at hand, I would like to know how to get to this list as well; any tips would be greatly appreciated.

DavidXanatos

02-28-2022 16:57

So a small progress report on my arm64/x64 code injection experiments:
Injecting an arm64ec library into an x64 or arm64ec process on arm64 works just fine, including calling an exported function from the injected shell code. One just needs to take care to call the # function address and not the !EXP+# one as given by LdrGetProcedureAddress.

Injecting an x64 library into an x64 or arm64ec process on arm64 also works just fine.
What does not yet work is calling an exported x64 function. I assume some additional code for the arm64 to x64 transition needs to be added to the call.

Also, the next BIG problem: there is no arm32ec toolchain available.
I presume arm shell code that just loads an x86 DLL will work, but if one wants to call some exported function (instead of just relying on the DllMain entry point), the arm to x86 transition will need to be researched.

What's also probably a bit of an issue: PE Anatomist does not show any "WoW Thunks Metadata" for the hybrid 32 bit ntdll in SyChpe32, while a stack trace to a statically loaded DLL's DllMain shows the presence of # functions.
So some investigation will be needed into whether this data can still be obtained from the image. Alternatively, parsing the !EXP+# thunk should allow one to find the right # address.

DavidXanatos

04-09-2022 15:16

I have figured out how to get the Dyn. Relocs Table, with which we can get the alternate export directory from an image on disk:

Code:

                        IMAGE_LOAD_CONFIG_DIRECTORY64 LoadConfig = { 0 };

                        IMAGE_DATA_DIRECTORY* dir10 = &opt_hdr_64->DataDirectory[IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG];
                        if (resolve_ec && dir10->VirtualAddress && dir10->Size >= FIELD_OFFSET(IMAGE_LOAD_CONFIG_DIRECTORY64, CHPEMetadataPointer) + sizeof(ULONGLONG)) {
                                status = ReadDll(hProcess, FindImagePosition(dir10->VirtualAddress, nt_hdrs_64, DllBase), &LoadConfig, min(sizeof(LoadConfig), dir10->Size), NULL);
                        }

                        // header in front of the entries, the first four fields are not yet understood
                        typedef struct _DYN_RELOC_TABLE {
                                ULONG Unknown1;
                                ULONG Unknown2;
                                ULONG Unknown3;
                                ULONG Unknown4;
                                ULONG TableSize; // size of Entries in bytes
                                UCHAR Entries[];
                        } DYN_RELOC_TABLE;

                        DYN_RELOC_TABLE* DynamicValueRelocTable = NULL;

                        // only for images on disk, on live images we take the actually used export directory
                        if (DllBase == 0 && (resolve_ec || resolve_exp) && LoadConfig.DynamicValueRelocTableSection != 0) {

                                // DynamicValueRelocTableSection is a 1-based section index
                                PIMAGE_SECTION_HEADER section = IMAGE_FIRST_SECTION(nt_hdrs);
                                section += (LoadConfig.DynamicValueRelocTableSection - 1);

                                ULONG pos = FindImagePosition(section->VirtualAddress, nt_hdrs_64, DllBase);
                                status = ReadDll(hProcess, pos, Buffer2, min(sizeof(Buffer2), section->Misc.VirtualSize), NULL);

                                DynamicValueRelocTable = (DYN_RELOC_TABLE*)(Buffer2 + LoadConfig.DynamicValueRelocTableOffset);

                                //dir0->VirtualAddress = 0x308810;
                        }

                        if (DynamicValueRelocTable != NULL) {
                                for (UCHAR* TablePtr = DynamicValueRelocTable->Entries; TablePtr < DynamicValueRelocTable->Entries + DynamicValueRelocTable->TableSize; ) {

                                        // block header: base offset plus block size (the 8 header bytes included)
                                        struct {
                                                ULONG Offset;
                                                ULONG Size;
                                        } *Section = (void*)TablePtr;
                                        TablePtr += 8;
                                        ULONG EntriesSize = Section->Size - 8; // do not modify the buffer itself

                                        for (UCHAR* EntryPtr = TablePtr; TablePtr < EntryPtr + EntriesSize; ) {

                                                // 16 bit entry header followed by Size bytes of value
                                                struct {
                                                        USHORT
                                                                RVA : 12,
                                                                Unknown : 1,
                                                                Size : 3;
                                                } *Entry = (void*)TablePtr;
                                                TablePtr += 2;

                                                ULONGLONG Value = 0;
                                                memcpy(&Value, TablePtr, Entry->Size);
                                                TablePtr += Entry->Size;

                                                DbgPrintf("%08x -> %08x\n", Section->Offset + Entry->RVA, (ULONG)Value);
                                        }
                                }
                        }

There are a couple of unknown values, so if anyone has an idea what they are, please share.
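For experimenting with this outside of the loader code, here is a standalone sketch of the same walk that does not depend on compiler bit-field layout. It assumes exactly the entry layout reverse engineered above (12-bit offset, one unknown bit, 3-bit size, followed by that many value bytes), which is a working guess rather than a documented format. The function name find_fixup is hypothetical; `tbl` is expected to point at the Entries bytes and `len` to be TableSize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Searches the entry blocks for a fixup that targets `target_rva`.
 * Returns 1 and stores the replacement value in `*value` if found,
 * 0 if not found, -1 if the table looks malformed.
 */
int find_fixup(const uint8_t *tbl, size_t len, uint32_t target_rva,
               uint64_t *value)
{
    assert(tbl != NULL && value != NULL);
    size_t pos = 0;

    while (len - pos >= 8) {
        /* Block header: base offset and block size (header included). */
        uint32_t block_rva  = rd32(tbl + pos);
        uint32_t block_size = rd32(tbl + pos + 4);
        if (block_size < 8 || block_size > len - pos)
            return -1;
        size_t end = pos + block_size;
        pos += 8;

        while (end - pos >= 2) {
            /* 16-bit entry header, layout as assumed in the post. */
            uint16_t e = (uint16_t)(tbl[pos] | (tbl[pos + 1] << 8));
            uint32_t offset = e & 0x0FFF;   /* bits 0..11 */
            unsigned size   = e >> 13;      /* bits 13..15 */
            pos += 2;
            if (size > end - pos)
                return -1;

            /* Assemble the little-endian value that follows the header. */
            uint64_t v = 0;
            for (unsigned i = 0; i < size; i++)
                v |= (uint64_t)tbl[pos + i] << (8 * i);
            pos += size;

            if (block_rva + offset == target_rva) {
                *value = v;
                return 1;
            }
        }
        pos = end; /* skip any padding at the end of the block */
    }
    return 0;
}
```

Looking up 0x178 this way on ntdll should, if the assumptions hold, return the 0x308810 mentioned earlier. Note that an all-zero padding entry decodes as a zero-size fixup at the block base, so a lookup for a block's base RVA can hit padding.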

DavidXanatos

04-09-2022 20:02

And here is another useful nugget of information: https://docs.microsoft.com/en-us/windows/uwp/porting/arm64ec-abi

All times are GMT +8.