Exetools

Exetools (https://forum.exetools.com/index.php)
-   General Discussion (https://forum.exetools.com/forumdisplay.php?f=2)
-   -   How to inline x64 asm in vs2017 ? (https://forum.exetools.com/showthread.php?t=18814)

Mahmoudnia 06-17-2018 08:22

How to inline x64 asm in vs2017 ?
 
Hi
Can I use inline x64 asm in VS 2017?
When I use inline asm in x64, this error shows up:

nonstandard extension used: '__asm' keyword not supported on this architecture

deepzero 06-17-2018 09:53

There is no x64 inline assembly with the Microsoft compiler.

Quote:

One of the constraints for the x64 compiler is to have no inline assembler support. This means that functions that cannot be written in C or C++ will either have to be written as subroutines or as intrinsic functions supported by the compiler. Certain functions are performance sensitive while others are not. Performance-sensitive functions should be implemented as intrinsic functions.

user1 06-17-2018 14:31

http://masm32.com/board/index.php?topic=4211.0

Maybe useful for checking the options.

Archer 06-18-2018 02:16

You have several options.
1. Switch to another compiler such as Intel or GCC. You can still use Visual Studio, just with a different compiler; those don't have the inline asm restriction.
2. Compile a separate .asm file and link it with the other compiled .cpp files. This can be configured so that it's done automatically when the solution is built.
3. Sometimes it's enough to use intrinsics. But of course they don't cover all asm instructions.

Mahmoudnia 06-18-2018 02:22

@Archer
I am trying to use GCC x64

chants 06-18-2018 03:08

Or, as a fourth option since it hasn't been mentioned yet: write a tool that, at compile time, extracts all inline code from C modules intended for x64 compilation, puts it in an .asm file under some kind of label or function definition, assembles it, replaces the C code with an appropriate control-flow transfer, and so forth.

Unfortunately, none of the options mentioned so far is exactly equivalent in MSVC, since the control-flow transfer is pretty hard to avoid.

Better yet might be to keep asking MS, through developer feedback or feature requests, to make the long-overdue change.

Evilcry 06-18-2018 14:54

Two ways:

- Intrinsics https://msdn.microsoft.com/en-us/library/26td21ds.aspx
- As suggested above, linking a separate .asm file. Here is a tutorial on how to set up VS + MASM:

http://lallouslab.net/2016/01/11/introduction-to-writing-x64-assembly-in-visual-studio/

Best Regards,
Evilcry

chants 06-18-2018 19:35

But since the .asm code is not inlined, extra call or jump instructions are emitted. The best would be if MS were to add it.

gigaman 06-20-2018 01:34

Does one call or jump really matter?
I mean, if you said you couldn't easily access local variables or structures, I'd agree... but "excessive call" sounds like you are trying to optimize something. In that case inline assembler is hardly any good: it's a black box for the compiler (at least for Microsoft's), so the compiler has to dump the values from registers into local variables and load them back after the inline assembly. In other words, a piece of inline assembly heavily breaks the optimization of the surrounding C code, so it's usually not worth it; it does more damage than one call would (so it's better to write the whole CPU-intensive piece of code in assembler as a separate function).

chants 06-20-2018 05:38

It matters because it's very convenient to program like this. Having to call/return or jump/jump or what have you (also don't forget all the stack setup and cleanup) forces calling conventions and requires the parameters to be dealt with and such. Yes, the MS implementation is not as clever as GCC/GAS, where you can really customize details of the behavior. I agree that for optimization it's a weak point, as you would be better off with pure asm or optimized C rather than a mix and match without sophisticated inlining support.

Further, it's easier to write portable 64/32-bit code without calling conventions, through clever use of macros, since the calling conventions are so different that you have to use different assembly instructions (registers vs. stack).

A C function can also modify itself in memory using clever tricks with inline assembler, which has its obfuscation and other uses.

But I suppose this discussion is already well documented:
Quote:

https://msdn.microsoft.com/en-us/library/80ccffx3.aspx
Quote:

Advantages of Inline Assembly
Because the inline assembler doesn't require separate assembly and link steps, it is more convenient than a separate assembler. Inline assembly code can use any C variable or function name that is in scope, so it is easy to integrate it with your program's C code. Because the assembly code can be mixed inline with C or C++ statements, it can do tasks that are cumbersome or impossible in C or C++.
The uses of inline assembly include:
- Writing functions in assembly language.
- Spot-optimizing speed-critical sections of code.
- Making direct hardware access for device drivers.
- Writing prolog and epilog code for "naked" calls.
So optimization is on the list after all :D

atom0s 06-21-2018 01:44

While you can't use inline asm, you can link ASM files into your program and use a separate assembler such as MASM to build the .asm files with your project. Visual Studio has built-in support for this.

If you absolutely need inline asm you can use a different compiler/linker.

chants 06-21-2018 21:11

Well, it looks like it won't happen anytime soon either, unless we all get together and vote it to the top. Difficult to reason about proving correctness in the compiler, I suppose :D

Quote:

https://visualstudio.uservoice.com/forums/121579-visual-studio-ide/suggestions/2609085-support-inline-assembler-on-c-64-bit
Quote:

DECLINED

Admin
Visual Studio Team (Product Team, Microsoft Visual Studio) responded · May 11, 2016
Because of our experience based implementing x86 inline assembler and the many correctness issues we’ve faced with it, we don’t recommend that developers use this approach and won’t be implementing this for new architectures.
As a workaround, you can use the Microsoft Assembler for x64 (https://msdn.microsoft.com/en-us/library/hb5z4sxd.aspx) to create an .OBJ file that you can link against.
-C++ Team

Avalon 07-17-2018 04:00

Just create a .ASM file, change the build rule to MASM, define the subroutine and call it from the C file.


masm.asm
Quote:

.CODE

PUBLIC MyAsmRoutine
PUBLIC ChangeRaxRoutine

MyAsmRoutine PROC
push rbp
mov rbp, rsp
call qword ptr [rcx]
mov rsp, rbp
pop rbp
ret
MyAsmRoutine ENDP

ChangeRaxRoutine PROC
mov rax, 0x4141
ChangeRcxRoutine ENDP

END
file.c
Quote:

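#include <windows.h> // PVOID

void MyAsmRoutine(PVOID pFunc);            // defined in masm.asm
unsigned long long ChangeRaxRoutine(void); // defined in masm.asm; whatever it leaves in RAX is its return value

int main()
{
PVOID pNtDirectCall = ....
MyAsmRoutine(pNtDirectCall);
//RAX holds the return value, so returning ChangeRaxRoutine()'s result makes the program return 0x4141
return (int)ChangeRaxRoutine();
}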

Insid3Code 07-17-2018 06:56

@avalon
Typo...

ChangeRaxRoutine PROC
mov rax, 0x4141
ChangeRcxRoutine ENDP

....
.code

public ChangeRaxRoutine

ChangeRaxRoutine proc
mov rax, 04141h
ret
ChangeRaxRoutine endp
end

vic4key 07-18-2018 19:23

Just an example; I hope it can help you: https://github.com/vic4key/MS-Mix-Cpp-n-Asm-in-64-bit

Insid3Code 07-19-2018 01:45

@vic4key
To avoid the application crash you need to allocate/align the stack...
Compiled and tested (MSVC 2017 15.7.3)

Code:

F1 PROC
  SUB RSP, 40 ; Allocate space on the stack (for alignment and 32 for shadow space)...
  PUSH RBP
  MOV RBP, RSP
  LEA RCX, TXT_F1
  CALL puts
  LEAVE
  ADD RSP, 40 ; Cleanup the stack...
  RET
F1 ENDP

F2 PROC
  SUB RSP, 40 ; Allocate space on the stack (for alignment and 32 for shadow space)...
  PUSH RBP
  MOV RBP, RSP
  LEA RCX, TXT_F2
  CALL puts
  LEAVE
  ADD RSP, 40 ; Cleanup the stack...
  RET
F2 ENDP


vic4key 07-20-2018 12:56

Hi Insid3Code. No local variables are used inside, so I think the allocation is unnecessary. It can even be shorter, e.g.:

F1 PROC
PUSHAD ; note: PUSHAD/POPAD are not valid instructions in 64-bit mode
LEA RCX, TXT_F1
CALL puts
POPAD
F1 ENDP ; note: there is no RET before ENDP

Also, your edited code should be:

F1 PROC
PUSH RBP
MOV RBP, RSP
SUB RSP, 40 ; Allocate space on the stack (8 for alignment and 32 for shadow space); placed below MOV RBP, RSP, since that instruction has already saved RSP to RBP.
LEA RCX, TXT_F1
CALL puts
LEAVE
ADD RSP, 40 ; Cleanup the stack... ; Not needed. The LEAVE instruction did it.
RET
F1 ENDP

Insid3Code 07-21-2018 00:33

1 Attachment(s)
Hi Vic,
Have you actually tested your snippets?
Attached are both snippets (allocate/align) and binaries (one crashes, the other works fine).

I don't know if you can download the attachment from this topic, so here is an external link:

http://www.mediafire.com/file/s9dd88iel47s7h8/poc.rar 

Compiled and tested (MSVC 2017 15.7.3)

ionioni 07-21-2018 05:05

Quote:

Originally Posted by vic4key (Post 114066)
ADD RSP, 40 ; Cleanup the stack... ; Not needed. The LEAVE instruction did it.

Quote:

Originally Posted by Insid3Code (Post 114077)
Hi Vic,
Have you actually tested your snippets?
Attached are both snippets (allocate/align) and binaries (one crashes, the other works fine).

I don't know if you can download the attachment from this topic, so here is an external link:

http://www.mediafire.com/file/s9dd88iel47s7h8/poc.rar 

Compiled and tested (MSVC 2017 15.7.3)

leave is equivalent to:
mov rsp, rbp
pop rbp

so drop the "add rsp, ..."

chants 07-21-2018 06:51

This discussion is missing a hugely important point:
The x64 calling convention on Windows always uses the RCX, RDX, R8 and R9 registers to pass the first 4 arguments (anything up to 64-bit values or pointers), and in addition to those 4 registers, RAX, R10 and R11 are considered volatile. The return value is in RAX (XMM0 for floating-point and vector values); larger return values are written through a hidden pointer supplied by the caller.

This contrasts with x86, where the closest scheme is fastcall, which uses ECX and EDX for argument passing before resorting to the stack, with EAX also volatile. In the cdecl (caller cleans up the stack) calling convention, however, all arguments are passed on the stack, EAX, ECX and EDX are considered volatile, and the return value is in EAX or EAX:EDX. syscall is the same except that those 3 registers are not considered volatile. stdcall is also almost the same, except that the callee cleans up the stack.

If you mix C with external asm, it would be extremely wise to be familiar with all these details.

For more details, which are too lengthy to include, refer to:
Quote:

https://en.wikipedia.org/wiki/X86_calling_conventions

chants 07-21-2018 07:01

Microsoft x64 calling convention

Quote:

Stack aligned on 16 bytes. 32 bytes shadow space on stack.
Therefore the code given here is not standards-compliant for an arbitrary caller (a compiler-generated call leaves an 8-byte return address, so an extra 8 bytes is indeed needed, but if the routine is called directly from assembler, that assumption may not hold). If the asm code does not call back into C code declared extern for its use (as it certainly does with puts), neither the alignment nor the shadow space should be necessary.

Code:

  SUB RSP, 32 ; Allocate space on the stack: 32 for shadow space
  AND RSP, -16 ; Align on 16 bytes

  LEAVE

That pattern is needed for both F1 and F2, and it's straightforward.

vic4key 07-21-2018 12:57

Yes, right. On x64 we always need to allocate the space called "shadow space". So the above code should be:

Code:

F1 PROC
  PUSH RBP
  MOV RBP, RSP
  SUB RSP, 30h ; 48 bytes. Just need to add this instruction.
  LEA RCX, TXT_F1
  CALL puts
  LEAVE
  RET
F1 ENDP

Thank you, guys.

chants 07-21-2018 14:39

It should be:

Code:

F1 PROC
  PUSH RBP
  MOV RBP, RSP
  SUB RSP, 32 ; Allocate space on the stack 32 for shadow space
  AND RSP, -16 ; Align on 16 bytes

  LEA RCX, TXT_F1
  CALL puts
  LEAVE
  RET
F1 ENDP


gigaman 07-21-2018 16:25

You normally don't align stack like that.
You know that the caller has (according to the calling convention) taken care of its stack alignment, and therefore RSP on entry ends in 8 (RSP mod 16 == 8): the stack was 16-byte aligned before, and then the CALL pushed the return address onto it.
So the initial PUSH RBP has aligned the stack to 16 bytes again, SUB RSP, 32 didn't break the alignment, and the AND instruction is useless: RSP is already aligned there.

Mahmoudnia 07-21-2018 18:09

I have to say thank you to all of you guys for your solutions; I learned a lot of things. :)

chants 07-22-2018 01:04

Quote:

Originally Posted by gigaman (Post 114087)
You normally don't align stack like that.
You know that the caller has (according to the calling convention) taken care of its stack alignment.

If your asm function is designed to be called from C only, then yes, I suppose that is a fair assumption. But if it is called from asm, including your own code or unknown callers, the assumption may be false. And this code is known.

The extra 8 bytes come from the return address, whether your function was called from your own convention-following asm or through the Windows ABI. CALL F1 from an improperly aligned routine of course adds 8 bytes to the stack, and then adding 8 again would misalign it (hence the code examples floating around out there, and above, which allocate 40 bytes assuming a misalignment already caused by an internal call from ASM, even though the examples don't show such a call). The safest approach is to assume any caller and realign the stack with AND RSP, -16.

Linux has the same issue, even with 32-bit code under recent GCC versions, as seen in this discussion: Calling printf in extended inline ASM
Quote:

https://stackoverflow.com/questions/37502841/calling-printf-in-extended-inline-asm/37503773
Even better, here is a write-up on the issue with more in-depth detail, though unfortunately it doesn't give the generic solution and instead leaves you to make assumptions or track things:
Quote:

https://github.com/simon-whitehead/assembly-fun/tree/master/windows-x64
Quote:

As with the AMD64/SystemV ABI, the Windows ABI dictates that the stack should be aligned on a 16-byte boundary. What this means is that, at the conclusion of the prologue of a function, the memory address that rsp points to should be aligned on a memory address that is a multiple of 16.
The simple act of calling a function misaligns the stack by placing an 8 byte return address on the stack when entering a function.
...
The 64-bit Windows ABI specifies that every single non-leaf function must allocate 32 bytes of stack space for "register spill". This is commonly referred to as "Shadow Space" and must be adjacent to the return address to the previous function. The ABI states that it is the caller's job to allocate this stack space, and not the callee's. The stack must also always be 16 byte aligned, which can be confusing because on entry to a function the last entry in the stack is the return address of the previous function - which is already 8 bytes. Therefore, for a function to allocate 32 bytes of "Shadow Space" and keep the stack aligned, it must allocate 40 bytes (40 + 8 = 48, which is a multiple of 16).


All times are GMT +8.
