Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doc/processor #141

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Doc/processor #141

wants to merge 8 commits into from

Conversation

Neirda24
Copy link
Contributor

Closes #140

@Neirda24 Neirda24 added the documentation Improvements or additions to documentation label Jan 11, 2025
docs/processing.md Outdated Show resolved Hide resolved

## Using FileProcessor

Useful if you want to store the file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the file in the filesystem"

docs/processing.md Outdated Show resolved Hide resolved
docs/processing.md Outdated Show resolved Hide resolved
docs/processing.md Outdated Show resolved Hide resolved
docs/processing.md Outdated Show resolved Hide resolved
$chunk = yield;
// do something with it
} while (!$chunk->isLast());
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also explain that the returned value of the processor will be returned by the process method

@DanielBadura
Copy link

What would be the solution when I want to use it via cli & use e.g. flysystem to save the pdf to another source like a s3 bucket? If I understood the docs correctly, this is right now not possible unless I implement a own Processor, like e.g. StringProcessor which simply returns the file as a string (I know, this could be consuming alot of memory, but this could be used universally). Other solution would be to implement a Processor like the FileProcessor but coupled to Flysystem.

@Neirda24
Copy link
Contributor Author

What would be the solution when I want to use it via cli & use e.g. flysystem to save the pdf to another source like a s3 bucket? If I understood the docs correctly, this is right now not possible unless I implement a own Processor, like e.g. StringProcessor which simply returns the file as a string (I know, this could be consuming alot of memory, but this could be used universally). Other solution would be to implement a Processor like the FileProcessor but coupled to Flysystem.

A StringProcessor would eventually cancel all the memory efficiency that this bundle tries to provide. But yes a custom Processor to leverage https://github.com/async-aws/aws/blob/master/src/Integration/Aws/SimpleS3/src/SimpleS3Client.php#L69 for example to upload chunk by chunk. We are planning for a flysystem native integration with this bundle.

@Jean-Beru
Copy link
Contributor

What would be the solution when I want to use it via cli & use e.g. flysystem to save the pdf to another source like a s3 bucket? If I understood the docs correctly, this is right now not possible unless I implement a own Processor, like e.g. StringProcessor which simply returns the file as a string (I know, this could be consuming alot of memory, but this could be used universally). Other solution would be to implement a Processor like the FileProcessor but coupled to Flysystem.

You're right, you will have to create your own Processor to stream to your bucket. You can use AsyncAws to make it easier.

There is no such class in the GotenbergBundle yet but it's planned as a Bridge.

@Neirda24
Copy link
Contributor Author

What would be the solution when I want to use it via cli & use e.g. flysystem to save the pdf to another source like a s3 bucket? If I understood the docs correctly, this is right now not possible unless I implement a own Processor, like e.g. StringProcessor which simply returns the file as a string (I know, this could be consuming alot of memory, but this could be used universally). Other solution would be to implement a Processor like the FileProcessor but coupled to Flysystem.

Here is a naive approach for async aws : https://gist.github.com/Neirda24/05e6a65a862ab34854be3b5f9e562420 As @Jean-Beru said we will try to integrate natively with Flysystem and so on.

@Neirda24
Copy link
Contributor Author

and here is a working example for flysystem. https://gist.github.com/Neirda24/f950f1aaa5343c67bcbaeb12ed0cb887 I'll have to investigate more to see if its possible without a tmpfile but I don't see how given the abstraction.

@Neirda24 Neirda24 requested a review from Jean-Beru January 13, 2025 00:29
@@ -33,12 +33,16 @@
"symfony/mime": "^6.4 || ^7.0"
},
"require-dev": {
"ext-mbstring": "*",
"async-aws/s3": "^2.6",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth to lower the requirement

Comment on lines +39 to +40
"league/flysystem": "^3.29",
"league/flysystem-bundle": "^3.3",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth to lower the requirement

"phpstan/extension-installer": "^1.3",
"phpstan/phpstan": "^1.10",
"phpstan/phpstan-symfony": "^1.3",
"phpunit/phpunit": "^10.4",
"shipmonk/composer-dependency-analyser": "^1.7",
"shipmonk/composer-dependency-analyser": "^1.8",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needed to ignore ext-mbstring issues

@smnandre
Copy link
Contributor

and here is a working example for flysystem. https://gist.github.com/Neirda24/f950f1aaa5343c67bcbaeb12ed0cb887 I'll have to investigate more to see if its possible without a tmpfile but I don't see how given the abstraction.

What about using kbond filesystem that does all the abstractions for you?

@Neirda24
Copy link
Contributor Author

and here is a working example for flysystem. https://gist.github.com/Neirda24/f950f1aaa5343c67bcbaeb12ed0cb887 I'll have to investigate more to see if its possible without a tmpfile but I don't see how given the abstraction.

What about using kbond filesystem that does all the abstractions for you?

Not sure hw that would help in this case. Flysystem does not expose a way to write by chunk. the version from kbonc suffers the same issues.

public function __invoke(string|null $fileName): \Generator
{
if (null === $fileName) {
$fileName = uniqid('gotenberg_', true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use Filesystem here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be droped soon I think. I don't see a reason filename should be null. I went along with it in this PR but it will change. Otherwise yes filesystem could be a lead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah because of the getFileName() who's returned $this->response->getFileName(). But I think we can't receive a null or it willcrash with an exception before this part.

$this->logger?->debug('{processor}: no filename given. Content will be dumped to "{file}".', ['processor' => self::class, 'file' => $fileName]);
}

$this->logger?->debug('{processor}: starting multi part upload of "{file}".', ['processor' => self::class, 'file' => $fileName]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private readonly LoggerInterface $logger = new NullLogger()

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ? This works well already. What kind of gain would we have ?

Comment on lines +36 to +42
$this->filesystemOperator->writeStream($fileName, $tmpfile);

$this->logger?->debug('{processor}: content dumped to "{file}".', ['processor' => self::class, 'file' => $fileName]);

return function () use ($fileName) {
return $this->filesystemOperator->read($fileName); // use readStream instead ?
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not feel optimised

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does nothing unless you can that callable. I was thinking of returning an anonymous class that implements Stringable and returns this call instead to make it easier to use. What do you suggest ?

*
* @implements ProcessorInterface<CompleteMultipartUploadOutput>
*/
final class AsyncAwsProcessor implements ProcessorInterface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, specific processors like AWS or Flysystem must be in a separated bridge to avoid adding dev-dependencies. It could block the update of GotenbergBundle to Symfony x.y if, for example, FlysystemBundle does not support it yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then they would have to require it just for one class potentially ? Seems a bit too much don't you think ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO such classes must be fixed on package version instead of framework version. So maybe rename it to include more clearly package name and version of said package ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for a Bridge folder (like the Notifier for example) for now.

To avoid developer to install both Flysystem and AsyncAWS on their local project, we can install these dev dependencies only in the CI (if tests need them).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are dev dependencies already so they are not installed on developper projects. Could be move to suggest only if needed and manually required in test env for CI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not. We should also not rely on the bundle but just the FlySystem lib

Comment on lines +85 to +90
```php
do {
$chunk = yield;
// do something with it
} while (!$chunk->isLast());
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```php
do {
$chunk = yield;
// do something with it
} while (!$chunk->isLast());
```
```php
public function __invoke(string|null $fileName): \Generator
{
do {
$chunk = yield;
// do something with it
} while (!$chunk->isLast());
// rest of your code
}


## Other processors

* `Sensiolabs\GotenbergBundle\Processor\AsyncAwsProcessor` : Upload using the `async-aws/s3` package. Uploads using the (multipart upload)[https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html] feature of S3. Returns a `AsyncAws\S3\Result\CompleteMultipartUploadOutput` object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add an example of it

public function __invoke(string|null $fileName): \Generator
{
if (null === $fileName) {
$fileName = uniqid('gotenberg_', true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah because of the getFileName() who's returned $this->response->getFileName(). But I think we can't receive a null or it willcrash with an exception before this part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create documentation about Processor's
5 participants