-fno-short-wchar: Inconsistent wchar_t Definition with L-Prefix Strings #461

Open
open-leocat opened this issue Oct 14, 2024 · 2 comments

open-leocat commented Oct 14, 2024

I encountered an issue with the -fno-short-wchar flag while using the latest release of Clang MinGW on Windows. This flag should make wchar_t 4 bytes wide instead of 2 bytes (the 2-byte width is what forces UTF-16 encoding). On Linux, using Clang 19, this flag behaves as expected.

However, in the newest release of Clang MinGW on Windows, the flag does not fully change the definition of wchar_t. While the flag causes the compiler to interpret L-prefixed strings as arrays of int (i.e., 4 bytes per character), the wchar_t type remains 2 bytes, as defined by unsigned short in the standard library. This inconsistency leads to a compiler warning and causes this flag to break existing code when working with L-prefixed strings.

Test Code

#include <wchar.h>

int main() {
	wchar_t* string = L"a";
	
	return 0;
}

Compiler Output

main.c:5:11: warning: incompatible pointer types initializing 'wchar_t *' (aka 'unsigned short *') with an expression of type 'int[2]' [-Wincompatible-pointer-types]
    4 |         wchar_t* string = L"f";
      |                  ^        ~~~~
1 warning generated.

Problematic Lines in the Standard Library (corecrt.h, lines 95-100)

#ifndef _WCHAR_T_DEFINED
#define _WCHAR_T_DEFINED
#if !defined(__cplusplus) && !defined(__WIDL__)
typedef unsigned short wchar_t;
#endif /* C++ */
#endif /* _WCHAR_T_DEFINED */

Summary

  • The wchar_t typedef in the Windows headers is always 2 bytes, regardless of the flag.
  • The -fno-short-wchar flag causes a mismatch between the expected wchar_t size (4 bytes) and the actual size defined by the standard library (2 bytes).
  • This leads to warnings and potential runtime issues when handling L-prefixed wide strings.

Expected behaviour

The -fno-short-wchar flag should properly redefine wchar_t as 4 bytes across the entire system, including the standard library, to prevent mismatches.

Environment

  • Clang MinGW (llvm-mingw 20241001 with LLVM 19.1.1, UCRT-Version, x86-64 Windows)
  • Windows 11
  • Target: 64-bit

open-leocat commented Oct 14, 2024

I just realized that this is not actually the MinGW repository, but simply a repository for building the LLVM-MinGW-Compiler combination thingy.

However, this is still a Clang and UCRT incompatibility, so I am not sure whether this should actually be fixed, or whether it is acceptable that the feature is broken.

All of this ultimately comes from Microsoft opting for UTF-16 instead of something more sensible, like UTF-8 or UTF-32, and hardcoding it into the UCRT. Maybe this could be patched, but I am sure that all the wchar functions would then be nonfunctional as well and would need to be patched too. So one can only hope that one day the C23 UTF-8 functions will be properly implemented.

I will reopen this issue, so maybe someone else will decide what to do with this finding.

open-leocat reopened this Oct 14, 2024

open-leocat commented Oct 15, 2024

After playing around with this flag for a bit, I have also found a conflict between stddef.h and corecrt.h, which occurred when including stdint.h:

In file included from C:/Program Files/LLVM/lib/clang/19/include/stdint.h:56:
In file included from C:/Program Files/LLVM/include/stdint.h:32:
In file included from C:/Program Files/LLVM/lib/clang/19/include/stddef.h:103:
C:/Program Files/LLVM/lib/clang/19/include/__stddef_wchar_t.h:24:24: error: typedef redefinition with different types ('int' vs 'unsigned short')
   24 | typedef __WCHAR_TYPE__ wchar_t;
      |                        ^
C:/Program Files/LLVM/include/corecrt.h:98:24: note: previous definition is here
   98 | typedef unsigned short wchar_t;
      |                        ^
1 error generated.
