Attempting TarGz Powershell Cmdlet in .NET 7.0 in Visual Basic, getting curiously bad tar files. Finally achieving the srht.site comeback on Windows again.

2022-12-04

Previously:

2022-11-14-powershell.gmi

A continuation of the attempts from the end of the above linked post

Imports System.IO
Imports System.Management.Automation
Imports System.Formats.Tar
Imports System.IO.Compression

<Cmdlet("New", "TarGz")>
Public Class NewTarGzCommand
    Inherits Cmdlet
    <Parameter(ValueFromPipeline:=True, Mandatory:=True)>
    Public FileNames As String()
    <Parameter(Mandatory:=True, Position:=0)>
    Public RootPath As String

    Protected Overrides Async Sub ProcessRecord()
        Dim filePaths = From fileName In FileNames
                        Select New With {
                .PathFileName = Path.Join(RootPath, fileName),
                .EntryName = fileName}

        Dim totalDirectorySize = (From filePath In filePaths
                                  Select FileLen(filePath.PathFileName)
                ).Sum()
        Dim tarStream = New MemoryStream(totalDirectorySize * 2)
        Dim tarWriter = New TarWriter(tarStream)
        Dim gzOut = New MemoryStream(totalDirectorySize)
        Dim gZipper = New GZipStream(gzOut, CompressionLevel.SmallestSize)
        Dim tarToGzPromise = tarStream.CopyToAsync(gZipper)
        For Each f In filePaths
            tarWriter.WriteEntry(fileName:=f.PathFileName, entryName:=f.EntryName)
        Next
        tarWriter.Dispose()
        Await tarToGzPromise
        WriteObject(gzOut.ToArray(), True)
    End Sub
End Class

And the Powershell script to use it:

$GemCap = "~/gemcap"
Get-ChildItem $GemCap -File -Recurse -Name |
Where-Object { $_ -NotMatch "^(.*[\\\/])?\..*" } |
New-TarGz (Resolve-Path $GemCap)

And it doesn't work!

The script returns no output, redirecting to file results in an empty file.
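
One possible explanation, offered as a guess rather than a verified diagnosis: `CopyToAsync` is started while `tarStream` is still empty, so it has nothing to copy, and the entries written afterwards never reach the gzip stream. On top of that, the `GZipStream` is never disposed, so anything it buffered would not be flushed into `gzOut`. The ordering part can be reproduced with two plain `MemoryStream`s; the sketch uses the synchronous `CopyTo` and made-up variable names.

```powershell
# Sketch: a stream copy only sees what is in the source at the time of the call.
$source = [System.IO.MemoryStream]::new()
$sink   = [System.IO.MemoryStream]::new()

$source.CopyTo($sink)                  # source is still empty, nothing is copied
$payload = [byte[]](1, 2, 3)
$source.Write($payload, 0, $payload.Length)
$lengthAfterEarlyCopy = $sink.Length   # the sink never saw the later write

# Writing first, rewinding, and copying afterwards does move the data.
$source.Position = 0
$source.CopyTo($sink)
$lengthAfterLateCopy = $sink.Length
```

Here `$lengthAfterEarlyCopy` is 0 and `$lengthAfterLateCopy` is 3.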

Time to split things up

I shouldn't really make a cmdlet that does both tar and gz, because it won't be modular enough. I dislike how Powershell piping here apparently can only send whole collections once they are finished, but our archives aren't going to be too large (we're storing them in byte arrays anyway, after all), so it seems reasonable to write two separate cmdlets.

Imports System.IO
Imports System.Management.Automation
Imports System.Formats.Tar

<Cmdlet("New", "Tar")>
Public Class NewTarCommand
    Inherits Cmdlet
    <Parameter(ValueFromPipeline:=True, Mandatory:=True)>
    Public FileNames As String()
    <Parameter(Mandatory:=True, Position:=0)>
    Public RootPath As String

    Protected Overrides Sub ProcessRecord()
        Dim filePaths = From fileName In FileNames
                        Select New With {
                .PathFileName = Path.Join(RootPath, fileName),
                .EntryName = fileName}

        Dim totalDirectorySize = (From filePath In filePaths
                                  Select FileLen(filePath.PathFileName)
                ).Sum()
        Dim tarStream = New MemoryStream(totalDirectorySize * 2)
        Dim tarWriter = New TarWriter(tarStream)
        For Each f In filePaths
            tarWriter.WriteEntry(fileName:=f.PathFileName, entryName:=f.EntryName)
        Next
        tarWriter.Dispose()
        WriteObject(tarStream.ToArray(), True)
    End Sub
End Class

And running it with:

Get-ChildItem $GemCap -File -Recurse -Name |
 Where-Object { $_ -NotMatch "^(.*[\\\/])?\..*" } |
 New-Tar (Resolve-Path $GemCap) > examplefilename.tar

Now, we do get a file. A text file. A list of decimal byte values.

Our return type was an array of bytes.

How do I redirect binary in Powershell?!

People say the ways are

[io.file]::WriteAllBytes(filename, $_)
Set-Content -Path path -AsByteStream

This is laughable. Modern Powershell really has no better ways?!

But ok, I used the latter one for now.
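
For reference, a minimal sketch of both approaches in PowerShell 7, writing a byte array to a file and reading it back. The file names and sample bytes are placeholders. `[System.IO.File]` resolves relative paths against the process working directory, which is not necessarily the current PowerShell location, so the sketch builds full paths.

```powershell
$bytes = [byte[]](0x1f, 0x8b, 0x00, 0xff)    # arbitrary sample bytes
$dir   = [System.IO.Path]::GetTempPath()
$viaDotNet = Join-Path $dir 'example-dotnet.bin'   # placeholder file name
$viaCmdlet = Join-Path $dir 'example-cmdlet.bin'   # placeholder file name

# Way 1: straight .NET call.
[System.IO.File]::WriteAllBytes($viaDotNet, $bytes)

# Way 2: Set-Content in byte-stream mode (PowerShell 6 and later).
Set-Content -Path $viaCmdlet -Value $bytes -AsByteStream

# Reading back; -Raw makes Get-Content return a single byte array.
$readDotNet = [System.IO.File]::ReadAllBytes($viaDotNet)
$readCmdlet = Get-Content -Path $viaCmdlet -AsByteStream -Raw
```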

First result - tar format issues

The result turns out to be a tar file with hidden contents - bsdtar 3.5.2 as well as 7-Zip only see the first file added, but I can see all of the files' contents when inspecting it raw with notepad. A 660KB tar file in which only the first added 4KB file is found.


"PaxHeaders" can be found repeatedly in files' headers, so that indicates TarWriter/WriteEntry defaulted to the "Pax" POSIX tar format mentioned as one of the options in its documentation.

So I decided to initialize TarWriter to another format:

New TarWriter(tarStream, TarEntryFormat.Ustar)

> Ustar 2 POSIX IEEE 1003.1-1988 Unix Standard tar entry format.

The result: notepad inspection shows that the format is much different, but the result in bsdtar -t and in 7-Zip is the same. I even checked bsdtar -xvp just to be sure.

And the filenames of all the "hidden" files are also still there in the file!


I would also like to mention, just in case, that I don't really have a way at hand to debug these DLLs as I Import-Module them in Powershell.


Should I do another attempt with `TarEntryFormat.Gnu`? Ughhh should I..

I lazily did. It seemed indistinguishable from the old file, which I never checksummed, so there is a chance I just looked at the old file or at a result of the old DLL.


No idea where to go from there. But ok, I got that bsdtar at C:\Windows\system32\tar.exe

I should probably go on to use it and make a gzipping cmdlet


The bsdtar 3.5.2 system32\tar.exe seems to be a bit broken, too

at least in Powershell

Funnily enough, redirecting the String produced by running that bsdtar 3.5.2 tar.exe like

tar cvpf "-" 'C:\Users\Mika Feiler\gemcap' > qwerty.tar

makes the String redirection produce an archive that is broken even for that bsdtar itself. And actually it seemed to me earlier that there were some memory leaks with `-` as the `f` argument, because random strings, some looking like rubbish from the PATH variable but also other 'deeper' stuff, were occurring in the error messages.


How the processing in a Powershell cmdlet really works

> The Process statement list runs one time for each object in the pipeline. While the Process block is running, each pipeline object is assigned to the $_ automatic variable, one pipeline object at a time.

The above is an excerpt from about_Functions, as I wanted to start writing my stuff in Powershell again, so I read up both about creating Cmdlets with about_Functions_Advanced as well as about regular functions, and decided to use the regular functions this time. The whole reason to stray from pure Powershell into .NET was not being able to do some too dotNETy things revolving around streams, but now it appears I no longer need to safe-haven in it for things to feel sane enough.
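
A small sketch of that behavior with a regular function. `Measure-PipelineCall` is a made-up name for the illustration. The `begin` and `end` blocks each run once, and `process` runs once per pipeline object, which corresponds to `BeginProcessing`, `ProcessRecord` and `EndProcessing` in a compiled cmdlet.

```powershell
function Measure-PipelineCall {
    param(
        [Parameter(ValueFromPipeline)]
        [string] $Name
    )
    # Runs once, before the first pipeline object arrives.
    begin   { $calls = 0; $seen = [System.Collections.Generic.List[string]]::new() }
    # Runs once per pipeline object; $Name holds just that one object.
    process { $calls++; $seen.Add($Name) }
    # Runs once, after the last pipeline object.
    end     { [pscustomobject]@{ Calls = $calls; Names = $seen.ToArray() } }
}

$summary = 'a.gmi', 'b.gmi', 'c.gmi' | Measure-PipelineCall
```

With three names piped in, `$summary.Calls` is 3: the `process` block never sees the whole list at once.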


I was actually very dumb, seeing that I'm overriding a `ProcessRecord` method and not thinking anything of it. I had the feeling that something might ultimately be off, given that I don't have debugging for the DLLs that I Import-Module, but decided it couldn't be. Now I finally know roughly what those mysterious methods to override for Begin and End do.


How the heck did all the filenames go into the tar, though? My understandings of things clash so badly now. I gotta make another take at the Tar cmdlet.


Ooh, maybe because it was actually many tars appended to each other. I guess it's not too easy to distinguish multiple tars appended to each other from a single tar.
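
That hypothesis fits the tar format: an archive ends with two 512-byte blocks of zeros, and readers normally stop there (GNU tar has an `--ignore-zeros` option for reading concatenated archives). If the earlier cmdlet created and disposed a `TarWriter` on every `ProcessRecord` call, each piped file would plausibly have become its own complete archive. Below is a rough diagnostic sketch, assuming the file is already loaded as a byte array; `Find-TarEndMarker` is a made-up helper, and it is only a heuristic, since file contents that happen to contain aligned zero blocks would fool it.

```powershell
function Find-TarEndMarker {
    param([Parameter(Mandatory)] [byte[]] $Bytes)

    $blockSize = 512
    $zeroRun = 0
    for ($offset = 0; ($offset + $blockSize) -le $Bytes.Length; $offset += $blockSize) {
        # Is this 512-byte block made of zeros only?
        $isZero = $true
        for ($i = $offset; $i -lt ($offset + $blockSize); $i++) {
            if ($Bytes[$i] -ne 0) { $isZero = $false; break }
        }
        if ($isZero) { $zeroRun++ } else { $zeroRun = 0 }

        if ($zeroRun -eq 2) {
            # Two zero blocks in a row: look for non-zero data behind them.
            $dataAfter = $false
            for ($j = $offset + $blockSize; $j -lt $Bytes.Length; $j++) {
                if ($Bytes[$j] -ne 0) { $dataAfter = $true; break }
            }
            return [pscustomobject]@{
                MarkerOffset    = $offset - $blockSize
                DataAfterMarker = $dataAfter
            }
        }
    }
    return $null   # no end-of-archive marker found
}

# Synthetic stand-in for a one-entry archive: one non-zero block plus the marker.
$single = [byte[]]::new(1536)
for ($k = 0; $k -lt 512; $k++) { $single[$k] = 1 }
$appended = [byte[]]($single + $single)

$singleReport   = Find-TarEndMarker -Bytes $single
$appendedReport = Find-TarEndMarker -Bytes $appended
```

For the appended sample, `DataAfterMarker` comes back as true, which is the situation where a reader would list only the first entry.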

Imports System.IO
Imports System.Management.Automation
Imports System.Formats.Tar

<Cmdlet("New", "Tar")>
Public Class NewTarCommand
    Inherits Cmdlet
    <Parameter(ValueFromPipeline:=True, Mandatory:=True)>
    Public FileName As String
    <Parameter(Mandatory:=True, Position:=0)>
    Public RootPath As String
    Public Shared MEMORYSTREAM_KB = 1000
    Private Shared ReadOnly Property MemoryStreamCapacity As Int32
        Get
            Return MEMORYSTREAM_KB * 1024
        End Get
    End Property
    Private tarStream = New MemoryStream(MemoryStreamCapacity)
    Private tarWriter = New TarWriter(tarStream, TarEntryFormat.Ustar)

    Protected Overrides Sub ProcessRecord()
        tarWriter.WriteEntry(
            fileName:=Path.Join(RootPath, FileName),
            entryName:=FileName)
    End Sub
    Protected Overrides Sub EndProcessing()
        tarWriter.Dispose
        Dim tarArrayResult As Byte() = tarStream.ToArray
        WriteObject(tarArrayResult, True)
    End Sub
End Class

It now produces valid tars!

With just one issue, the backslashes in paths became middle-square-dot (``) characters.


Managed to get that fixed by adding a dumb character replace to the pipe:

 | foreach-Object { $_ -replace '\\', '/' } |

These tars seem to be fine.
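
To illustrate what the two text steps of the pipe do, here is a small sketch on a few made-up relative paths: the `-NotMatch` filter drops anything whose file name or any directory name starts with a dot, and the `-replace` turns backslashes into forward slashes.

```powershell
# Made-up sample of what Get-ChildItem -Name could return on Windows.
$names = 'index.gmi', '.git\config', 'posts\2022-12-04.gmi', 'posts\.draft.gmi'

$entryNames = $names |
    Where-Object { $_ -NotMatch "^(.*[\\\/])?\..*" } |
    ForEach-Object { $_ -replace '\\', '/' }
```

`$entryNames` ends up holding `index.gmi` and `posts/2022-12-04.gmi`.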

Gzip cmdlet now

I discovered that 7z's gzip format does plain gzip without tar, so I could use it for compression. But now that I got the thing to work, maybe I should do gzip in .NET while I'm at it.

Imports System.IO
Imports System.IO.Compression
Imports System.Management.Automation
<Cmdlet("Compress", "Gzip")>
Public Class CompressGzipCommand
    Inherits Cmdlet

    <Parameter(ValueFromPipeline:=True, Mandatory:=True)>
    Public Input As Byte

    Public Shared MEMORYSTREAM_KB = 1000
    Private Shared ReadOnly Property MemoryStreamCapacity As Int32
        Get
            Return MEMORYSTREAM_KB * 1024
        End Get
    End Property

    Private ReadOnly result = New MemoryStream(MemoryStreamCapacity)
    Private ReadOnly zipper = New GZipStream(result, CompressionLevel.SmallestSize)

    Protected Overrides Sub ProcessRecord()
        zipper.WriteByte(Input)
    End Sub

    Protected Overrides Sub EndProcessing()
        zipper.Dispose()
        Dim resultBytes As Byte() = result.ToArray
        WriteObject(resultBytes)
    End Sub

End Class

Invoking it as

Get-Content -AsByteStream -Path .\file.tar | Compress-Gzip | Set-Content -AsByteStream -path .\file.tar.gz

does produce a valid tar.gz accepted by `tar xz`.

Yeah, it works byte by byte, because I wasn't sure what the behavior would be with multiple byte arrays being processed. But it doesn't seem too bad, and at least it is streaming.
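
On the question of byte arrays, here is a sketch of the same idea as a regular PowerShell function with a `[byte[]]` parameter. `Compress-GzipChunk` is a hypothetical name, and it uses `CompressionLevel.Optimal` rather than `SmallestSize`. When single bytes are piped in, each one should be bound as a one-element array, and a whole array arrives as one chunk only when it is wrapped with the unary comma.

```powershell
function Compress-GzipChunk {
    param(
        [Parameter(ValueFromPipeline, Mandatory)]
        [byte[]] $Chunk
    )
    begin {
        $result = [System.IO.MemoryStream]::new()
        $level  = [System.IO.Compression.CompressionLevel]::Optimal
        $zipper = [System.IO.Compression.GZipStream]::new($result, $level)
    }
    # Called once per pipeline object, whatever its size.
    process { $zipper.Write($Chunk, 0, $Chunk.Length) }
    end {
        $zipper.Dispose()      # flushes the remaining gzip data into $result
        $result.ToArray()      # ToArray still works after the stream is closed
    }
}

$data = [byte[]](1..200)
$fromBytes = [byte[]]($data | Compress-GzipChunk)      # 200 one-byte chunks
$fromArray = [byte[]](, $data | Compress-GzipChunk)    # one 200-byte chunk
```

Both results should decompress to the same 200 bytes; the second form just makes far fewer `process` calls.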

Time for a little curl but powershell

PS C:\Users\Mika Feiler> Get-ChildItem $GemCap -File -Recurse -Name |
>> Where-Object { $_ -NotMatch "^(.*[\\\/])?\..*" } |
>> foreach-Object { $_ -replace '\\', '/' } |
>> New-Tar (Resolve-Path $GemCap) | Compress-Gzip | set-content -path realshit.tar.gz -asbytestream

PS C:\Users\Mika Feiler> Invoke-RestMethod -Method Post -Uri "https://pages.sr.ht/publish/g.mikf.pl" -Authentication OAuth -Token (Read-Host -AsSecureString) -Form @{ protocol="GEMINI"; content = Get-ChildItem .\realshit.tar.gz }
********************************************************************************************
d86f55594b3d2d59f443f26dc3efb9cfaaebc136e804bab6620de1af0c72876f

PS C:\Users\Mika Feiler>

The equivalent, formatted for reading:

Get-ChildItem $GemCap -File -Recurse -Name |
 Where-Object { $_ -NotMatch "^(.*[\\\/])?\..*" } |
 foreach-Object { $_ -replace '\\', '/' } |
 New-Tar (Resolve-Path $GemCap) | Compress-Gzip |
 set-content -path realshit.tar.gz -asbytestream

Invoke-RestMethod -Method Post `
  -Uri "https://pages.sr.ht/publish/g.mikf.pl" `
  -Authentication OAuth -Token (Read-Host -AsSecureString) `
  -Form @{
    protocol="GEMINI";
    content = Get-ChildItem .\realshit.tar.gz }

Ok but now I just did that and forgot about my tinylog gemfeed generator

My tinylog gemfeed generator

Now that one is quirky, because its very spec declares that the dates should be suitable for `date -d`, tying it to an implementation.

PS C:\Users\Mika Feiler> get-date -date "Nov18 2022 9:30 CET"

> Get-Date: Cannot bind parameter 'Date'. Cannot convert value "Nov18 2022 9:30 CET" to type "System.DateTime". Error: "The string 'Nov18 2022 9:30 CET' was not recognized as a valid DateTime. There is an unknown word starting at index '16'."

PS C:\Users\Mika Feiler> get-date -date "Nov 18 2022 9:30 CET"

> Get-Date: Cannot bind parameter 'Date'. Cannot convert value "Nov 18 2022 9:30 CET" to type "System.DateTime". Error: "The string 'Nov 18 2022 9:30 CET' was not recognized as a valid DateTime. There is an unknown word starting at index '17'."


For now I had to resort to bash from an unlikely-not-unlikely place

 & 'C:\Program Files\Git\bin\bash.exe' -c 'cd g.mikf.pl; date -d "$(grep "^## " tinylog.gmi | cut -c 4- | head -1)" +"# Mika Feiler%n=> tinylog.gmi %F Tinylog" > gemfeed-tinylog.gmi'
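
The errors above point at the trailing `CET` as the part the parser does not recognize. A possible pure-PowerShell route, sketched under the assumption that the dates always look like `Nov 18 2022 9:30 CET`, is to swap the abbreviation for a numeric offset and parse the rest with an exact format. `ConvertFrom-TinylogDate` and its lookup table are made up for the illustration, and the table would need an entry for every abbreviation the log actually uses.

```powershell
function ConvertFrom-TinylogDate {
    param([Parameter(Mandatory)] [string] $Text)

    # Hypothetical lookup table: abbreviation -> UTC offset.
    $offsets = @{ 'UTC' = '+00:00'; 'CET' = '+01:00'; 'CEST' = '+02:00' }

    # Split "Nov 18 2022 9:30 CET" into the timestamp and the trailing abbreviation.
    if ($Text -cnotmatch '^(?<stamp>.+?)\s+(?<zone>[A-Z]+)$') {
        throw "No trailing time zone abbreviation in '$Text'"
    }
    $stamp = $Matches['stamp']
    $zone  = $Matches['zone']
    if (-not $offsets.ContainsKey($zone)) {
        throw "Unknown time zone abbreviation '$zone'"
    }

    $withOffset = "$stamp $($offsets[$zone])"
    $culture    = [cultureinfo]::InvariantCulture
    [System.DateTimeOffset]::ParseExact($withOffset, 'MMM d yyyy H:mm zzz', $culture)
}

$parsed   = ConvertFrom-TinylogDate 'Nov 18 2022 9:30 CET'
$isoDate  = $parsed.ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture)
$feedLine = "=> tinylog.gmi $isoDate Tinylog"
```

For the sample input, `$feedLine` is `=> tinylog.gmi 2022-11-18 Tinylog`, matching the `%F` part of the `date` call. Unlike `date -d`, this accepts only the one format spelled out in the `ParseExact` call.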
