Some images from Sputniknews.cn (Russian satellite news agency) cannot be cached #6

Open
Whichbfj28 opened this issue May 16, 2024 · 7 comments
Labels
question Further information is requested

Comments

@Victrid
Owner

Victrid commented May 16, 2024

I can't reproduce this; it works with my setup:


~ % curl --request GET \
  --url 'https://[piccache].workers.dev/piccache?url=https%3A%2F%2Fcdn.sputniknews.cn%2Fimg%2F07e7%2F03%2F09%2F1048554996_0%3A240%3A1280%3A960_1920x0_80_0_0_1ef90b9d0835157789ba71fd099d385c.jpg.webp' -vvv --output 1.webp
....
< HTTP/2 200
...
< content-length: 236510
< cf-ray: ....-HKG
< cf-cache-status: HIT
...
{ [5 bytes data]
100  230k  100  230k    0     0   843k      0 --:--:-- --:--:-- --:--:--  842k
* Connection #0 to host [piccache].workers.dev left intact

Please provide more information, such as the response headers or the verbose curl output.

@Victrid added the "question" (Further information is requested) label on May 16, 2024
@Whichbfj28
Author

Whichbfj28 commented May 16, 2024

curl --request GET \
>   --url 'https://freshrss.freshrss.com/i/hc.php?url=https%3A%2F%2Fcdn.sputniknews.cn%2Fimg%2F102817%2F69%2F1028176914_0%3A14%3A1100%3A633_1920x0_80_0_0_3758150fc8c0ae7e75642b3cbdedbb7b.jpg.webp' -vvv --output 1.webp
Note: Unnecessary use of -X or --request, GET is already inferred.
* Expire in 0 ms for 6 (transfer 0x557638d2e010)
* Expire in 1 ms for 1 (transfer 0x557638d2e010)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Expire in 1 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 1 ms for 1 (transfer 0x557638d2e010)
* Expire in 1 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 4 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 4 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 2 ms for 1 (transfer 0x557638d2e010)
* Expire in 4 ms for 1 (transfer 0x557638d2e010)
* Expire in 3 ms for 1 (transfer 0x557638d2e010)
* Expire in 3 ms for 1 (transfer 0x557638d2e010)
* Expire in 4 ms for 1 (transfer 0x557638d2e010)
* Expire in 3 ms for 1 (transfer 0x557638d2e010)
* Expire in 3 ms for 1 (transfer 0x557638d2e010)
* Expire in 4 ms for 1 (transfer 0x557638d2e010)
*   Trying IP...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x557638d2e010)
* Connected to freshrss.freshrss.com (1.1.1.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2393 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [79 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=freshrss.freshrss.com
*  start date: Apr 19 15:52:20 2024 GMT
*  expire date: Jul 18 15:52:19 2024 GMT
*  subjectAltName: host "freshrss.freshrss.com" matched cert's "freshrss.freshrss.com"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x557638d2e010)
} [5 bytes data]
> GET /i/hc.php?url=https%3A%2F%2Fcdn.sputniknews.cn%2Fimg%2F102817%2F69%2F1028176914_0%3A14%3A1100%3A633_1920x0_80_0_0_3758150fc8c0ae7e75642b3cbdedbb7b.jpg.webp HTTP/2
> Host: freshrss.freshrss.com
> User-Agent: curl/7.64.0
> Accept: */*
> 
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [265 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [265 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
} [5 bytes data]
< HTTP/2 200 
< server: nginx
< date: Thu, 16 May 2024 05:43:36 GMT
< content-type: application/x-empty; charset=binary
< content-length: 0
< x-piccache-status: HIT
< 
{ [0 bytes data]
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connection #0 to host freshrss.freshrss.com left intact



hc.php is piccache.php under a different name.

The output file size is 0.
We are using a self-hosted piccache, not the Cloudflare workers.dev deployment.

@Victrid
Owner

Victrid commented May 17, 2024

< x-piccache-status: HIT

The piccache serving itself seems to be working fine. It looks like the connection broke while the server was downloading this particular image, so an empty file was created with no content written to it.

Then, when you access it, the caching server sees that the cache entry exists and passes you the wrongly downloaded empty file.

I think this error should be very rare. If you want to fix this specific image, deleting this file should work:

[CACHE_PLACE_PATH]/piccache/4e72203c1e7acc0245219d3d2a2b9d9615495ed5cb2f84ac619449e52fdcbdd4

or simply remove the entire folder [CACHE_PLACE_PATH]/piccache and run curl again. You should see the correct output, but with the header x-piccache-status: MISS.

If you are seeing lots of empty images, or the server still outputs empty files, please let me know.
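The cleanup described above can be sketched in PHP, for anyone who would rather remove only the broken entries than flush the whole folder. This is a minimal sketch, not part of the plugin: the function names `cache_file_for_url` and `purge_empty_cache_files` are hypothetical, and the naming scheme (SHA-256 of the URL inside the cache folder) is assumed to match the `get_name()` helper in the piccache.php posted later in this thread.

```php
<?php
// Hypothetical helper: path of the cache entry for a URL, assuming entries
// are named after the SHA-256 hash of the URL as get_name() does.
function cache_file_for_url(string $cache_dir, string $url): string
{
    return $cache_dir . DIRECTORY_SEPARATOR . hash('sha256', $url);
}

// Hypothetical helper: delete zero-byte entries from a cache directory.
// Returns the list of paths that were removed.
function purge_empty_cache_files(string $cache_dir): array
{
    $removed = [];
    if (!is_dir($cache_dir)) {
        return $removed;
    }
    foreach (scandir($cache_dir) as $entry) {
        $path = $cache_dir . DIRECTORY_SEPARATOR . $entry;
        // Skip ".", ".." and sub-directories; only regular files are entries.
        if (!is_file($path)) {
            continue;
        }
        // Make sure filesize() does not return a stale cached value.
        clearstatcache(true, $path);
        if (filesize($path) === 0 && unlink($path)) {
            $removed[] = $path;
        }
    }
    return $removed;
}
```

Calling `purge_empty_cache_files()` on the piccache folder and counting the returned paths also gives a quick measure of how many downloads ended up empty.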

@Whichbfj28
Author

Whichbfj28 commented May 18, 2024


FreshRSS: Docker image freshrss/freshrss:1.23.1
Extension: freshrss-image-cache-plugin 0.4 (Cloudflare is not used; we use a self-hosted piccache placed in a sub-path of FreshRSS)

  1. There are still many files with a size of 0

  | 2024-05-18 08:30:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/71
  | 2024-05-18 08:30:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/2
  | 2024-05-18 07:30:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/2
  | 2024-05-18 07:01:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/71
  | 2024-05-18 07:01:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/2
  | 2024-05-18 06:01:01 | Feed already being actualized: https://rsshub.*.org/jiemian/list/2

freshrss log

2. I ran into another, more serious problem: enabling active caching with version 0.4 seems to cause feed updates to freeze. After replacing 0.4 with the older version 0.3, the issue no longer occurs.

@Victrid
Owner

Victrid commented May 19, 2024

This seems strange. Can you attach your piccache.php file?

If you flush the cache folder and change get($url) to:

function get($url)
{
   if ( file_exists(get_name($url)) ) {
      $file = get_name($url);
      return filesize($file) != 0 ? $file : null;
   } else {
      return null;
   }
}

can you see the picture?
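The change to get() hides empty entries after the fact. A complementary idea is to keep a failed download from ever reaching the cache. The sketch below is one possible approach, not the plugin's actual code: `set_checked`, its `$fetch` parameter, and the 15-second default timeout are all hypothetical. It writes to a temporary file and only renames it into place when the body is non-empty.

```php
<?php
// Hypothetical variant of set(): returns the cache path on success, or null
// when nothing usable was downloaded (the cache is then left untouched).
// $fetch is injectable so the logic can be exercised without network access;
// the default uses file_get_contents() with an HTTP timeout in seconds.
function set_checked(string $url, string $file_name, ?callable $fetch = null, int $timeout = 15): ?string
{
    $fetch = $fetch ?? function (string $u) use ($timeout) {
        $ctx = stream_context_create(['http' => ['timeout' => $timeout]]);
        return @file_get_contents($u, false, $ctx);
    };

    $content = $fetch($url);
    if ($content === false || $content === '') {
        return null; // failed or empty download: do not create a cache entry
    }

    // Write next to the target first, so a partial write never becomes an entry.
    $tmp = $file_name . '.' . uniqid('tmp', true);
    if (file_put_contents($tmp, $content) !== strlen($content)) {
        @unlink($tmp);
        return null;
    }

    // Move the finished file into place under its final name.
    if (!rename($tmp, $file_name)) {
        @unlink($tmp);
        return null;
    }
    return $file_name;
}
```

With a function like this, the GET branch would need to handle a null return, for example by answering with an error status instead of sending an empty body. The explicit timeout also bounds how long a single slow image can hold up a request.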

@Whichbfj28
Author

<?php
define("CACHE_PLACE_PATH", "../../data/");
# Also possible:
# define("CACHE_PLACE_PATH", "C:\\your\\Directory");
# define("CACHE_PLACE_PATH", "/var/www/html/directory");
# Remember to set correct privileges allowing PHP access.
function join_paths(...$paths) {
    return preg_replace('~[/\\\\]+~', DIRECTORY_SEPARATOR, implode(DIRECTORY_SEPARATOR, $paths));
};

function get_name($url) {
    $tmp_path = join_paths(CACHE_PLACE_PATH, "piccache");
    if (!file_exists($tmp_path)) mkdir(join_paths($tmp_path), 0777);
    return join_paths($tmp_path, hash('sha256', $url));
}

function get($url) { return file_exists(get_name($url)) ? get_name($url) : null; }

function set($url) {
    $file_name = get_name($url);
    $content = file_get_contents($url);
    file_put_contents($file_name, $content);
    return $file_name;
}

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $post = json_decode(file_get_contents('php://input'), true);
    if (! $post || ! array_key_exists("url", $post)) {
        http_response_code(400); exit();
    }
    set($post['url']);
    header('Content-Type: application/json; charset=utf-8');
    echo '{"status": "OK"}' . PHP_EOL;
    exit();
} elseif ($_SERVER['REQUEST_METHOD'] === 'GET') {
    $url = $_GET['url'];
    if (!$url){ http_response_code(400); exit(); }
    $file = get($url);
    header("X-Piccache-Status: ". ($file ? "HIT" : "MISS"));
    if (! $file) $file = set($url);
    $finfo = finfo_open(FILEINFO_MIME);
    header('Content-Type: ' . finfo_file($finfo, $file));
    finfo_close($finfo);
    header('Content-Length: ' . filesize($file));
    $fp = fopen($file, 'rb');
    fpassthru($fp);
    exit();
} else {
    http_response_code(405);
    exit();
}
?>

@Whichbfj28
Author

Whichbfj28 commented May 20, 2024

1. Only the line define("CACHE_PLACE_PATH", "../../data/") has been modified; the rest remains unchanged.
2. Maybe it is because the image mentioned in the title cannot be converted, resulting in a timeout?


2 participants