This issue has two distinct aspects:
- discussion of an existing documentation bug
- discussion of the problematic fixed default file encoding currently chosen for PowerShell Core (as of alpha 16).
Steps to reproduce
'ö' | Set-Content -NoNewline -Encoding ASCII tmp.txt
'ö' | Add-Content -Encoding ASCII -NoNewline tmp.txt
Get-Content -Encoding ASCII tmp.txt
(Get-Content -Encoding Byte -TotalCount 2 tmp.txt) | % { '0x{0:x}' -f $_ }
'--'
'ö' | Set-Content -NoNewline tmp.txt # use default encoding
'ö' | Add-Content -NoNewline tmp.txt # use default encoding
Get-Content tmp.txt # use default encoding
(Get-Content -Encoding Byte -TotalCount 2 tmp.txt) | % { '0x{0:x}' -f $_ }
Expected behavior
??
0x3f
0x3f
--
??
0x3f
0x3f
Actual behavior
??
0x3f
0x3f
--
öö
0xf6
0xf6
That is, ASCII encoding turns a non-ASCII character into a literal ? (0x3f).
The fact that Set-Content without an -Encoding argument resulted in ö on reading implies that ASCII encoding wasn't used, and the specific byte value of 0xf6 further implies that a single-byte, extended-ASCII encoding was used:
- For Windows PowerShell, it is the respective system's legacy codepage ("ANSI"), such as Windows-1252 on US-English systems, or Windows-1251 on Russian systems. In other words: the specific encoding is, to put it in Unix terms, locale-dependent.
- For PowerShell Core, as of alpha 16, it is ISO-8859-1, as @iSazonov helpfully points out (see his comment below for the source-code links).
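The byte values seen in the repro can be checked independently of the file cmdlets. The following is a minimal sketch that calls the .NET System.Text.Encoding classes directly; it only illustrates how 'ö' (U+00F6) maps to bytes under ASCII, ISO-8859-1, and UTF-8, and does not claim to show how Set-Content selects its encoding internally. The variable names are illustrative only.

```powershell
# Build 'ö' from its code point so the result does not depend on
# how this script file itself is encoded.
$ch = [string][char]0xF6

# ASCII cannot represent U+00F6; .NET substitutes '?' (0x3f).
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($ch)

# ISO-8859-1 maps U+00F6 to the single byte 0xf6.
$latin1Bytes = [System.Text.Encoding]::GetEncoding('iso-8859-1').GetBytes($ch)

# UTF-8 needs two bytes (0xc3 0xb6), so it cannot explain a lone 0xf6.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($ch)

'ASCII:      ' + (($asciiBytes  | ForEach-Object { '0x{0:x}' -f $_ }) -join ' ')
'ISO-8859-1: ' + (($latin1Bytes | ForEach-Object { '0x{0:x}' -f $_ }) -join ' ')
'UTF-8:      ' + (($utf8Bytes   | ForEach-Object { '0x{0:x}' -f $_ }) -join ' ')
```

A single 0xf6 byte per character, as observed with the default encoding, is therefore consistent with a single-byte extended-ASCII encoding and rules out both ASCII and UTF-8.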
In contrast, Get-Help Set-Content, Get-Help Add-Content, and Get-Help Get-Content state for parameter -Encoding:
Specifies the file encoding. The default is ASCII.
The help-topic sources (branch live) for the relevant cmdlets can be found here.
Additionally:
- While these cmdlets accept an encoding identifier Default, as used in other cmdlets, the help only mentions String.
- Given that the two appear to result in the same encoding - what is their relationship?
- The description for encoding String in the online help is inadequate:
Uses the encoding type for a string.
Environment data
PowerShell Core v6.0.0-alpha (v6.0.0-alpha.16) on Microsoft Windows 10 Pro (64-bit; v10.0.14393)