
Parallelise upload to get better upload speeds #6

Open
wants to merge 5 commits into base: master

Conversation

pecigonzalo

I'm trying to get a working parallelised upload mechanism for better upload speeds.
You get about 5 Mbps per worker.

This will be exposed as a parameter, like max chunks, so it can be set according to the destination. 4 to 10 workers is a safe config, and maybe 4 should be the default.

This is still a work in progress. I'm putting the PR here to get visibility over the work and maybe some help.

Todo:

  • The workers currently each write to a file based on their worker "number", but because they are async the chunks land in no specific order. We need to add functionality that returns the chunk number and use it to reassemble the file.
    This should be easy by appending the chunk number to the end of the file name of each chunk we write on the destination.
    It would be even easier if we changed this code to be more OO instead of plain functions.

@dylanmei
Contributor

I'm glad to see that you're taking this on. Getting large files through WinRM quickly is a difficult problem. Sending files using XML SOAP is never going to be pretty; breaking files up into individual tasks should help save some time through parallelization and also make it much easier to insert fail/retry logic.

As for a parallelization flag, in the long run you might consider supporting only 1. not-parallel and 2. fast-as-we-can-parallel. Between the different flavors of Windows and the wildly different WinRM configurations out there (max ops per shell, max concurrent operations, etc.), it's going to be hard to find a sweet spot without building a feedback mechanism into this code.

@pecigonzalo
Author

Just FYI, I'll rebase/merge/squash many of these commits and add proper comments, but I'm still working through it and I needed to push so I can work from different locations; I forgot to just do it on another branch.

@pecigonzalo pecigonzalo force-pushed the f-Parallel branch 2 times, most recently from 2608e12 to d0faeb8 Compare September 1, 2015 02:48
@pecigonzalo
Author

Did some rework and cleanup. When you have time, please give this a look.
I'm also working on a much bigger change to move this into objects, which makes working with it much easier.

@pecigonzalo
Author

As a note, more than 6 shells seem to be throttled by the CPU, as the number of connections/logins ramps it up.
As a future feature, we could add functionality to increase the max chunk size / max connections per session values and then restore the originals.

@pecigonzalo
Author

I think there is also something missing here: Invoke-Command from PowerShell is able to send much larger files.
E.g.:
Create a .ps1 file containing a massive string that makes the file itself bigger than 100 KB.
Use Invoke-Command -File saidfile.ps1 -ComputerName test -Port 1233 and you can send it, no problem. Some performance metrics show 90 Mbps rates.
I'm trying to inspect C:\Windows\Microsoft.NET\assembly\GAC_MSIL\System.Management.Automation\v4.0_3.0.0.0__31bf3856ad364e35\System.Management.Automation.dll by reflecting it, to try to identify how it does this.
Initially I see that it also fragments the data, but I think that's just to comply with the SOAP envelope size limit; that limit is much larger than the ~8000 characters we can manage because of using echo <content> >> file.
But aside from that, I can't find what transport mechanism it is using. If anyone can share more WSMan documentation I would be really happy.
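As a rough illustration of that ~8000-character ceiling: cmd.exe caps a command line at 8191 characters, and base64 expands every 3 raw bytes into 4 characters, so each echo-append round trip carries well under 8 KB of real data. The Go sketch below estimates the budget; the exact overhead accounting is an assumption for illustration, not winrmcp's actual math:

```go
package main

import "fmt"

const cmdMaxLen = 8191 // documented cmd.exe maximum command-line length

// maxRawChunk estimates how many raw bytes fit in one
//   echo <base64> >> "<filePath>"
// command: subtract the fixed command text and the path from the budget,
// round down to a whole number of 4-char base64 groups, then convert
// base64 characters back to raw bytes (4 chars encode 3 bytes).
func maxRawChunk(filePath string) int {
	overhead := len(`echo  >> ""`) + len(filePath) // fixed text around the payload
	b64Budget := cmdMaxLen - overhead
	b64Budget -= b64Budget % 4 // base64 comes in 4-character groups
	return b64Budget / 4 * 3
}

func main() {
	fmt.Println(maxRawChunk(`C:\Windows\Temp\upload.tmp`))
}
```

So each round trip moves only about 6 KB of payload, which is why per-chunk overhead dominates the transfer time.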

E.g., I found that we could be using PowerShell directly through WSMan instead of calling it in a cmd shell, by calling this other URI in the winrm library:

https://msdn.microsoft.com/en-us/library/ff469933.aspx

There is also this
https://github.com/runner-mei/wsman
and
https://github.com/Openwsman/wsmancli

Maybe we can reuse some of that as well. Honestly, I'm surprised at how hard it is to find good info on this compared to other MS products.

@masterzen I could use your input here as well.

@dylanmei
Contributor

dylanmei commented Sep 2, 2015

re: "Invoke-Command": I assume you're talking about running icm from a Windows host where PowerShell is available. You are certainly invited to make this happen without impacting those who do their automation via Mac and Linux hosts.

@pecigonzalo
Author

I'm not saying we should call Invoke-Command directly; I'm saying we should reflect the .NET library and copy the general idea of how it works.

@pecigonzalo pecigonzalo changed the title WIP - Parallelise upload to get better upload speeds Parallelise upload to get better upload speeds Jan 12, 2016
@pecigonzalo
Author

@dylanmei any thoughts about merging this as is? If I ever make progress on reflecting the .NET part, I'll do a new PR.

@pecigonzalo
Author

This is what I wanted from reflecting System.Management.Automation.dll, thanks to @StefanScherer, although this will need to be implemented in the winrm project (@masterzen):

http://download.microsoft.com/download/9/5/E/95EF66AF-9026-4BB0-A41D-A4F81802D92C/%5BMS-PSRP%5D.pdf

@StefanScherer

@pecigonzalo I'm just connecting the dots 😀

@masterzen

@pecigonzalo looks very interesting. I didn't read the full paper, but it looks like a project in itself. I don't think I'll have the time and energy to implement this new protocol :( but as usual I'm willing to accept PRs :)

@mwrock

mwrock commented Jan 13, 2016

When we were first looking at transferring files for Test-Kitchen Windows compatibility, we took a similar approach of parallelizing the uploads via multiple shells. This does provide some perf gains, but we found that compressing the entire payload into a single compressed file and transferring it over a single shell was significantly faster across a variety of payload compositions. This is the strategy currently used in winrm-fs.
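A minimal Go sketch of that compress-first strategy, assuming the remote side expands the archive afterwards (e.g. with PowerShell's Expand-Archive); the helper name zipPayload is illustrative, not winrm-fs's actual API:

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// zipPayload bundles many small files into one archive so a single larger
// transfer replaces many per-file chunked round trips.
func zipPayload(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for name, content := range files {
		f, err := w.Create(name) // one entry per file, path preserved
		if err != nil {
			return nil, err
		}
		if _, err := io.Copy(f, strings.NewReader(content)); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil { // flushes the central directory
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := zipPayload(map[string]string{
		"cookbooks/default.rb": "package 'git'",
		"cookbooks/web.rb":     "package 'nginx'",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data))
}
```

The win comes from amortizing the per-command overhead over one big transfer instead of paying it once per file.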

You could get very clever and certainly optimize by adding parallelization on top of that, possibly implementing a torrent-like pattern, but in the end you are always going to be crippled by the 8k command-line limit and the limit on how many shells you can open. In line with the findings of @pecigonzalo, I too discovered that the PSRP protocol eliminates this 8k limit, by looking at Wireshark traces of remote Invoke-Command calls. So rather than spending a ton of energy optimizing further against winrm, it seemed best to just get it to a reasonably working state and later look at implementing PSRP, which certainly requires a good bit of work but is likely energy better spent.

I finally got around to investigating that further last week and came up with a very rough but working prototype here. There is a considerable amount of work left to "productionize" it. Honestly, it's an ugly protocol. If you hated winrm (I did), you might like it after working with PSRP. Rather than plain text, payloads are in a binary format: the PowerShell itself has to be serialized into CliXML, embedded into a binary blob, and then base64-encoded into the WSMan SOAP envelope. Good times. However, as I explain here, I think a "partial" implementation can provide a lot of benefit.

I have played with sending large payloads over that protocol. 100k worked great in one round trip, but I think at some point you do have to fragment the packets; 200k hung, though I have not spent time investigating further. The first step is going to be working it into basic PowerShell commands in the winrm gem and then optimizing it specifically for file uploading in winrm-fs.
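For anyone curious what that fragmentation involves, here is a hedged Go sketch based on my reading of the [MS-PSRP] fragment layout (ObjectId, FragmentId, a start/end flags byte, blob length, blob). Treat the field sizes and flag values as an approximation to check against the spec, not a verified implementation:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fragment splits one serialized PSRP message into wire fragments:
// ObjectId (uint64), FragmentId (uint64), flags (0x1 = start of message,
// 0x2 = end of message), BlobLength (uint32), then the blob bytes.
func fragment(objectID uint64, payload []byte, maxBlob int) [][]byte {
	var frags [][]byte
	for i := 0; len(payload) > 0 || i == 0; i++ {
		n := len(payload)
		if n > maxBlob {
			n = maxBlob
		}
		blob := payload[:n]
		payload = payload[n:]
		var flags byte
		if i == 0 {
			flags |= 0x1 // first fragment of this message
		}
		if len(payload) == 0 {
			flags |= 0x2 // last fragment of this message
		}
		hdr := make([]byte, 21) // 8 + 8 + 1 + 4 header bytes
		binary.BigEndian.PutUint64(hdr[0:8], objectID)
		binary.BigEndian.PutUint64(hdr[8:16], uint64(i))
		hdr[16] = flags
		binary.BigEndian.PutUint32(hdr[17:21], uint32(n))
		frags = append(frags, append(hdr, blob...))
	}
	return frags
}

func main() {
	frags := fragment(1, make([]byte, 250), 100)
	fmt.Println(len(frags)) // 250 bytes at 100 per blob -> 3 fragments
}
```

The receiver reassembles by ObjectId/FragmentId and only dispatches the message once it sees the end flag, which would explain why an oversized unfragmented payload simply hangs.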

Just thought I'd share that experience; perhaps we can all learn from each other's efforts in this process. Maybe we'll all meet one day in a winrm/psrp support group for those who can't stop adding XML namespaces to their SOAP messages.

@frezbo

frezbo commented May 18, 2018

Are there any plans for further work on this? We have a lot of cookbooks, and the WinRM uploads are so slow that it hinders development. I would be happy to contribute.

@arizvisa

arizvisa commented Jun 14, 2018

I took a quick look at how winrmcp writes files to the filesystem, and it looks like for every chunk that gets transferred, it executes this command:

cmd, err := shell.Execute(fmt.Sprintf("echo %s >> \"%s\"", content, filePath)), so it's effectively running Write-Output for each chunk of the transfer. If the content is unprocessed, can you inject a " to break out of this and (maybe) get arbitrary code execution when sending files? I'm pretty certain that if this is benchmarked, this is where the performance hit is, especially because Write-Output is not intended for writing to files.

Maybe this should use Out-File with -Append instead? Or, for even better performance, bypass all of this and write to the file directly via System.IO.File?

It will probably be more effective to improve the chunk transfer before making the code more complex by parallelizing it.

It looks like System.IO.StreamWriter can be used to append to the file; this way you avoid using I/O redirection to append, which also involves an implicit seek to the end of the file. So the logic would be to instantiate this class in uploadChunks, and then in appendContent use System.IO.StreamWriter.Write to append to the file. This keeps the file handle open and removes the performance hit of allocating/releasing a file handle every time you append a chunk.
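That open-once/append-many/close-once flow could be driven from Go by emitting three kinds of remote PowerShell commands. A minimal sketch, assuming the chunks are base64 text as winrmcp sends today; the $w variable and the helper names are hypothetical, not the real uploadChunks/appendContent code:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// openWriter is run once: it opens the remote file in append mode ($true)
// via System.IO.StreamWriter, so the handle stays open across chunks.
func openWriter(filePath string) string {
	return fmt.Sprintf("$w = New-Object System.IO.StreamWriter('%s', $true)", filePath)
}

// appendChunk writes one base64-encoded chunk through the open handle,
// replacing the per-chunk open/seek/write/close of echo-redirection.
func appendChunk(b64 string) string {
	return fmt.Sprintf("$w.Write('%s')", b64)
}

// closeWriter is run once at the end to flush and release the handle.
func closeWriter() string { return "$w.Close()" }

func main() {
	b64 := base64.StdEncoding.EncodeToString([]byte("hello"))
	fmt.Println(openWriter(`C:\Temp\out.bin`))
	fmt.Println(appendChunk(b64))
	fmt.Println(closeWriter())
}
```

This only works if all three commands run in the same long-lived PowerShell session, since $w must survive between chunks.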

(edited to add the last paragraph mentioning System.IO.StreamWriter)

@w0rldart

Is this ever going to be merged?

@BongoEADGC6

Another year gone by

@johnypony3

one more year

@chalbersma left a comment

lgtm

@arizvisa

#38

@arizvisa

Have any of you tried my PR? When I developed it, it was significantly faster for me. Try it out, even... heh.

@johnypony3

@arizvisa I did not try it; I solved it by using the cd feature of the cloud I'm using. Until it's merged, I can't use it.

@arizvisa

Welp, it's worth a compile, because it optimizes the innermost loop of the copy, which uses I/O redirection. That involves repeatedly doing: open file, seek to end of file, write data, close file. This happens on top of PowerShell's existing cost of evaluation and base64 decoding. By parallelizing it, you multiply all of those costs by the number of processes you're parallelizing with, which works... sorta.

But every one of these calls can block your I/O. My PR fixes that by opening the file once, keeping the handle open, repeatedly appending to the file, and closing it when complete, so that PowerShell isn't repeatedly evaluating script and converting it for every single line of input. This reduces the cost to one decode and one I/O call.

@pecigonzalo
Author

👋🏼 Is there intent to merge this? If there is, I'll try to resolve the conflicts so we can merge.

@arizvisa

@pecigonzalo, the future of this project is in question as per #38, so it's not really related to conflicts or anything development-related. Rather, the project needs someone who's willing to be a steward and maintain it. That is why these PRs (#6, #34, #35) have not yet been merged.

@johnypony3

johnypony3 commented Nov 25, 2022 via email

@johnypony3 johnypony3 mentioned this pull request Nov 25, 2022