This issue was moved to a discussion.

RFC: Transcoding #321

Closed
onedr0p opened this issue Mar 24, 2023 · 40 comments
Labels
request for comments Opportunity to share ideas

Comments

@onedr0p
Contributor

onedr0p commented Mar 24, 2023

This is mainly on the topic of transcoding. I know you have some opinions on Midarr and transcoding, but this is to start a discussion of transcoding and how it could be implemented if you were ever to think about adding such abilities.

The way Midarr works currently, transcoding is not really supported; anything outside x264 most likely will not play. I wonder if it is worth the effort to use jellyfin's ffmpeg as a library to integrate transcoding in general.

On the topic of distributed transcoding

Given you agree that transcoding is on the slate for possibly being implemented...

I really love that I can scale midarr to n number of pods in kubernetes, however there is no way to "load balance" transcoding requests so that an overworked container will not pick up new requests for transcoding.

These are just questions to chew over on future ideas of Midarr.

@trueChazza
Member

Thanks for opening this discussion. I think this is definitely a topic worth talking about and exploring more, and as you’ve mentioned, Midarr currently has a half-baked solution for this.

You’ve brought up a few interesting points to discuss:

  • ffmpeg jellyfin
  • x264 support only
  • Load balancing across Midarr instances (particularly for transcoding)

@trueChazza
Member

trueChazza commented Mar 25, 2023

ffmpeg jellyfin

What is the benefit of this over, say, just ffmpeg itself? From a quick look through the repo, it seems jellyfin has collated only the tools they need for their transcoding pipeline.

Unless I’m missing something?

@trueChazza
Member

I’m all for not reinventing the wheel and have looked into other external / decoupled solutions like Go Transcode before doing a basic Midarr implementation.

Go Transcode not only provides the ffmpeg / transcoder implementation, but also the “glue” or interface Midarr can consume for transcoding.

@trueChazza
Member

trueChazza commented Mar 25, 2023

If Go Transcode is a viable option to get us up and running quickly while also achieving load balancing across Midarr instances, all we’d really need is the Midarr implementation for it.

@onedr0p
Contributor Author

onedr0p commented Mar 25, 2023

I just picked jellyfin's ffmpeg because I figured it would be a fork of ffmpeg geared towards transcoding various different types of media, so maybe nothing too special there.

go-transcode seems interesting as well!

@trueChazza
Member

trueChazza commented Mar 25, 2023

Would you be willing to test out Go Transcode and see if it’s a viable solution?

It’s come a long way since I last looked at it. It didn’t have VOD support back then, which I see it does now.

@trueChazza
Member

What exactly is Midarr lacking in its admittedly basic transcoder at the moment?

How does it play your media that previously you said didn’t have any audio?

If you had a wishlist for Midarr transcoding, what would that be?

@onedr0p
Contributor Author

onedr0p commented Mar 25, 2023

If you had a wishlist for Midarr transcoding, what would that be?

That's a pretty hard question to answer because I am very much in love with Kubernetes for managing containers. It would be awesome if Midarr could have a "transcoding" service which would be a dedicated number of "warm" containers just for transcoding.

To get deeper into the weeds, it would be amazing if Midarr could speak to the kubernetes API to create/destroy pods on demand for transcoding while also having "warm" transcoding pods already available. As far as I know, in standard docker-compose land, or hell, even using a thing like Portainer, this would not be possible. Kubernetes is amazing at automation like this, which is why I am so in love with managing containers with it.

A Midarr Kubernetes operator would be amazing if you wanted to learn more about Kubernetes 😄 but I understand this is a huge change and would require major refactoring. You did ask for a wishlist though 🤣

@trueChazza
Member

trueChazza commented Mar 25, 2023

Yeah Kubernetes is amazing I agree!

That’s quite a wishlist 🤣 A Midarr K8s operator sounds interesting, but it is out of scope for the Midarr app itself.

I created exstream as the upstream transcoding service Midarr currently relies on. Packaging this service up in its own Docker container would be a start - and we’d be able to scale this n number of containers in K8s and Docker etc. How does that sound?

Plus by all means - if you’re keen to create a Midarr K8s operator, you have my full support!

@onedr0p
Contributor Author

onedr0p commented Mar 26, 2023

Maybe an operator could be done in the long term. Breaking out the transcoding from the main app would be very neat in any way it could be done. That would really set Midarr apart from the other players in the game like Jellyfin and Plex which are behemoth monoliths.

@trueChazza
Member

trueChazza commented Mar 26, 2023

Yeah wouldn’t take much to package up exstream, I’ll just need to figure out the public API for it 😁

I’ll look into it after v3 release. Let’s keep this conversation going though - a lot of great ideas already!

@bo0tzz

bo0tzz commented Mar 29, 2023

I just had this idea and came here to find the discussion about transcoding already going, but I figured I'll share it anyways: Midarr's approach of leveraging existing services that people are already running is very nice. If that could somehow be applied to transcoding too, for example by hooking into something like Tdarr, that would be pretty cool. I don't know whether Tdarr (or other software like it) would support something like that at all though.

If that's not possible, I agree with onedr0p that breaking out the transcoding to exstream or another app running externally would be very nice to have. Over at Immich we don't need to do any live media handling, but we still have all the processing done in a separate container which keeps the main UI/API nice and responsive and makes scaling easier to reason about too :D

@onedr0p
Contributor Author

onedr0p commented Mar 29, 2023

Tdarr not being open source is a full stop for me.

@bo0tzz

bo0tzz commented Mar 29, 2023

I wasn't aware of that, but I agree! I just used Tdarr to illustrate my idea, I believe there are other similar applications out there that are open source.

@trueChazza added the "enhancement" and "request for comments" labels and removed the "enhancement" label Apr 4, 2023
@trueChazza
Member

services:
  
  midarr:
    container_name: midarr
    image: ghcr.io/midarrlabs/midarr-server:latest
    environment:
      - EXSTREAM_URL=http://exstream

  exstream:
    container_name: exstream
    image: exstream:latest
    volumes:
      - /path/to/media:/media

Just noting this down for more discussion. If we packaged up exstream into its own image - and have Midarr reference it for all streams, this would be a breaking change.

I'm not too keen on having Midarr default to itself, then reference exstream optionally (opt in to exstream).

What do you think?

@trueChazza
Member

services:

  exstream:
    container_name: exstream
    image: exstream:latest
    volumes:
      - /path/to/media:/media

There would also be no reason to mount libraries into Midarr, only Exstream. Midarr would just pass the media locations received from Radarr / Sonarr on to Exstream.

So Midarr would send something like this to Exstream to resolve:
http://exstream/movies/Some Movie/some-movie.mp4
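
The example URL above contains spaces, which would need percent-encoding before being sent over HTTP. A minimal sketch, assuming Exstream serves files under the same relative path that Radarr / Sonarr report below the library root; `build_exstream_url` and the `/media` root are hypothetical and not part of either project:

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def build_exstream_url(base_url: str, library_root: str, media_path: str) -> str:
    """Map an absolute media path to a URL on the Exstream host.

    Hypothetical helper: assumes Exstream exposes files under the same
    relative path they have below the library root.
    """
    relative = PurePosixPath(media_path).relative_to(library_root)
    # Percent-encode each path segment so spaces and other reserved
    # characters survive the trip over HTTP.
    encoded = "/".join(quote(part, safe="") for part in relative.parts)
    return f"{base_url.rstrip('/')}/{encoded}"


url = build_exstream_url(
    "http://exstream", "/media", "/media/movies/Some Movie/some-movie.mp4"
)
print(url)
```

Encoding segment by segment keeps the `/` separators intact while still escaping anything unusual inside a folder or file name.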

@trueChazza
Member

I'm not too keen on having Midarr default to itself, then reference exstream optionally (opt in to exstream).

I would prefer Exstream be the default / required option.

@bo0tzz

bo0tzz commented Apr 9, 2023

I find the idea of the main midarr container just "coordinating" things with all the hard work left to exstream pretty appealing.

@onedr0p
Contributor Author

onedr0p commented Apr 9, 2023

I like this idea too!

@bo0tzz

bo0tzz commented Apr 9, 2023

I'm thinking about how to handle scaling to multiple exstream instances, and I'm not managing to come up with an obvious answer. With each exstream instance being stateful, I think the midarr server would have to be able to reference them individually in some way. What are your thoughts on that?

@onedr0p
Contributor Author

onedr0p commented Apr 9, 2023

I'm not sure there is an obvious answer; it really depends on whether Kubernetes is meant to be a first-class citizen. If not, there would need to be a set number of exstream containers that handle the load and run all the time. IIRC this is pretty much how tdarr does it with their agents, since there's no kubernetes operator for it.

@bo0tzz

bo0tzz commented Apr 9, 2023

What about an inverted setup where instead of midarr having the address of the exstream container, exstream registers itself with midarr? Then dns etc aren't really a concern.

@trueChazza
Member

What about an inverted setup where instead of midarr having the address of the exstream container, exstream registers itself with midarr? Then dns etc aren't really a concern.

Interesting! Could you provide an example or expand on how this could work? I’m not sure how Exstream would register itself.

@trueChazza
Member

If Exstream is an agnostic http service, how would it know to register itself with another service?

@bo0tzz

bo0tzz commented Apr 11, 2023

If it registers itself, it wouldn't be agnostic. Exstream would be configured with Midarr's address, and hit an API endpoint to register itself.

I guess a preceding question to this is: How do we want to handle the stream bytes, after Exstream has been instructed to start transcoding a file? I think the cleanest option (that I can come up with) is for Exstream to just expose the stream on HTTP, and then either Midarr (or a fronting nginx) proxies that endpoint or the client hits Exstream directly. Another option is that Exstream writes the transcoded stream to a shared volume, and the Midarr server takes care of serving that on HTTP, but that might be coupling things too tight.

If we go the proxy route, when there are multiple Exstream backends, Midarr needs to be able to map each stream to the correct backend. I believe the docker (and kubernetes) native way is doing this through separate DNS names: exstream-1, exstream-2 etc, but requiring a particular DNS layout feels brittle to me which is why I'm looking for another approach.
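
The self-registration idea could look roughly like this on the Midarr side. This is a sketch only: the `Registry` class, its method names, and the 30-second expiry are assumptions made for illustration, and the HTTP endpoint that an Exstream instance would call is left out.

```python
import time
from typing import Callable, Dict, List


class Registry:
    """In-memory set of Exstream backends that have announced themselves."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def register(self, address: str) -> None:
        # Called on first registration and on every later heartbeat.
        self._last_seen[address] = self._clock()

    def deregister(self, address: str) -> None:
        # Called when a backend announces a clean shutdown.
        self._last_seen.pop(address, None)

    def backends(self) -> List[str]:
        # Drop backends whose last heartbeat is older than the TTL.
        now = self._clock()
        self._last_seen = {
            addr: seen for addr, seen in self._last_seen.items()
            if now - seen <= self._ttl
        }
        return sorted(self._last_seen)
```

Because backends announce their own addresses, Midarr would not need to know a DNS layout in advance; a backend that stops sending heartbeats simply ages out of the list.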

@bo0tzz

bo0tzz commented Apr 11, 2023

This is maybe more for implementation details of Exstream, but https://membrane.stream/ could be worth taking a solid look at.

@trueChazza
Member

Awesome thanks for explaining that. I was initially thinking of the proxy option you mentioned. I would prefer Midarr be as loosely coupled as possible to Exstream.

I’m leaning more towards Exstream just being a standalone HTTP service that anyone could use. Midarr would just implement the API.

@trueChazza
Member

If we go the proxy route, when there are multiple Exstream backends, Midarr needs to be able to map each stream to the correct backend. I believe the docker (and kubernetes) native way is doing this through separate DNS names: exstream-1, exstream-2 etc, but requiring a particular DNS layout feels brittle to me which is why I'm looking for another approach.

For this part - does Midarr really need to know those specific details? I’m just asking because I’m unsure.

If you scaled out Exstream to say 10 instances / containers and had Traefik load balance across them - could Midarr not just reference that single Traefik URL? Midarr wouldn’t need to know there are potentially more than 1 instance. Am I on the right path or way off? 😂

@onedr0p
Contributor Author

onedr0p commented Apr 11, 2023

could Midarr not just reference that single Traefik URL?

It could, but what happens when there are 10 transcoding requests and they all happen to land on the same exstream container? This is why I'm thinking midarr has to be aware of what work is happening on which exstream container and coordinate accordingly.
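
One way to picture the coordination described here: Midarr keeps a count of active streams per backend, sends each new stream to the least busy one, and remembers the mapping so later requests for that stream reach the same backend. A minimal sketch with hypothetical names (`StreamRouter`, `assign`, `release`), not something Midarr implements today:

```python
from typing import Dict, Iterable


class StreamRouter:
    """Tracks which backend serves which stream."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._active: Dict[str, int] = {b: 0 for b in backends}
        self._owner: Dict[str, str] = {}

    def assign(self, stream_id: str) -> str:
        # Reuse the existing backend so an in-progress stream stays put.
        if stream_id in self._owner:
            return self._owner[stream_id]
        if not self._active:
            raise RuntimeError("no backends available")
        # Fewest active streams wins; ties go to the first backend listed.
        backend = min(self._active, key=self._active.__getitem__)
        self._active[backend] += 1
        self._owner[stream_id] = backend
        return backend

    def release(self, stream_id: str) -> None:
        # Free the slot once playback ends or the client disconnects.
        backend = self._owner.pop(stream_id, None)
        if backend is not None:
            self._active[backend] -= 1
```

A plain round-robin load balancer in front of the backends has no view of how many streams each one is already handling, which is the gap this kind of bookkeeping is meant to close.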

@trueChazza
Member

Ah true!! If that could potentially happen, then yes Midarr would need to be aware.

Might be worth me doing some discovery work around that too maybe. Just to validate our assumptions 😁

@trueChazza
Member

This is maybe more for implementation details of Exstream, but https://membrane.stream/ could be worth taking a solid look at.

https://membrane.stream/guide/v0.9/packages.html

They don't seem to support h265 (yet?) 😢

@bo0tzz

bo0tzz commented Apr 12, 2023

They don't seem to support h265

Damn, I missed that :/ After looking at Exstream a bit more, I think Membrane might be overkill here anyways.

I laid out the architecture question to a few friends and together we came up with these options:

  1. Instead of Exstream being aware of Midarr and registering itself, we could include a small component (could maybe even be just a bash script) that runs in the same pod as Exstream and does the registration, with Exstream itself still staying agnostic.
  2. To make the DNS based approach a bit friendlier to other platforms, we could make it so that:
    • At start, Midarr is configured with a list of Exstream addresses. This can be DNS names or IP addresses. eg EXSTREAM_ADDRESS=exstream;192.168.1.8.
    • Any DNS names in this config are resolved to their IP addresses (with possibly multiple A records on one DNS name).
    • After this, all the IPs are added to the pool.

With approach 2, Midarr needs to regularly re-fetch the DNS records to make sure it's aware of instances being created/removed/etc. The Exstream API should probably also have some endpoints for its status (eg system load, health, whether it's shutting down) that Midarr keeps track of.
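
The DNS-based option can be sketched as follows. It assumes the semicolon-separated `EXSTREAM_ADDRESS` format from the example above; `resolve_pool` and its injectable `resolver` argument are hypothetical, added so the logic can be exercised without real DNS. Calling it again on a timer would cover the periodic re-fetch.

```python
import socket
from typing import Callable, List


def lookup(host: str) -> List[str]:
    """Return every IPv4 address the system resolver has for host."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def resolve_pool(config: str,
                 resolver: Callable[[str], List[str]] = lookup) -> List[str]:
    """Expand 'name;ip;...' into a sorted, de-duplicated list of IPs."""
    pool = set()
    for entry in config.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        try:
            # A name with several A records contributes several IPs;
            # a literal IP resolves to itself.
            pool.update(resolver(entry))
        except OSError:
            # Unresolvable right now; skip it until the next refresh.
            continue
    return sorted(pool)
```

Comparing the result of one refresh with the previous one would tell Midarr which instances were added or removed in between.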

@trueChazza
Member

This is awesome thanks for this!

@trueChazza
Member

trueChazza commented Apr 12, 2023

  1. Instead of Exstream being aware of Midarr and registering itself, we could include a small component (could maybe even be just a bash script) that runs in the same pod as Exstream and does the registration, with Exstream itself still staying agnostic.

For this option does this register a single instance?

@trueChazza
Member

  2. To make the DNS based approach a bit friendlier to other platforms, we could make it so that:

    • At start, Midarr is configured with a list of Exstream addresses. This can be DNS names or IP addresses. eg EXSTREAM_ADDRESS=exstream;192.168.1.8.
    • Any DNS names in this config are resolved to their IP addresses (with possibly multiple A records on one DNS name).
    • After this, all the IPs are added to the pool.

With approach 2, Midarr needs to regularly re-fetch the DNS records to make sure it's aware of instances being created/removed/etc. The Exstream API should probably also have some endpoints for its status (eg system load, health, whether it's shutting down) that Midarr keeps track of.

This we could definitely work towards. A few moving parts to break down into smaller deliverables, but I think the payoff would be huge! Being able to scale Exstream out with Midarr aware!

@trueChazza
Member

trueChazza commented Apr 12, 2023

The Exstream API should probably also have some endpoints for its status (eg system load, health, whether it's shutting down) that Midarr keeps track of.

Something like a Prometheus exporter / stats endpoint? 🔥

@trueChazza
Member

midarrlabs/exstream#10

I’ve started working on an MVP release for Exstream already. For the MVP I’ll just get a minimal functional version working, then we can build these features on top over the next iterations.

@trueChazza
Member

I’ll add a roadmap for Exstream too so we can prioritise these features for releases.

@bo0tzz

bo0tzz commented Apr 13, 2023

For this option does this register a single instance?

This would be one instance of Exstream with a sidecar script that registers that instance, but that approach can be repeated to register multiple instances (each with their own sidecar).

Something like a Prometheus exporter / stats endpoint?

That'd certainly be good to have, and if it has the right metrics then Midarr can use it to monitor the instance too. The goal there is that Midarr knows how the Exstream instances are doing load-wise etc, so it can make smarter decisions about where to assign new streams.
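
To illustrate how such status data could feed the assignment decision, here is a small sketch. The `BackendStatus` fields mirror the examples mentioned earlier in the thread (load, health, shutting down), but the exact shape and the 0.9 load cutoff are assumptions; no Exstream API for this is defined here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BackendStatus:
    address: str
    healthy: bool
    shutting_down: bool
    load: float  # e.g. fraction of CPU in use, 0.0 to 1.0


def pick_backend(statuses: Iterable[BackendStatus],
                 max_load: float = 0.9) -> Optional[str]:
    """Return the least loaded backend that can accept a new stream."""
    candidates = [
        s for s in statuses
        if s.healthy and not s.shutting_down and s.load < max_load
    ]
    if not candidates:
        return None  # caller decides whether to queue or reject
    return min(candidates, key=lambda s: s.load).address
```

Excluding instances that report they are shutting down lets a backend drain its current streams without being handed new ones.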

@trueChazza
Member

I'm going to move this into a discussion, as it's quite a broad topic. We can create issues tied to the discussion as we progress.

@midarrlabs midarrlabs locked and limited conversation to collaborators Aug 14, 2023
@trueChazza trueChazza converted this issue into discussion #392 Aug 14, 2023
