PSC-STM-A7: Deployment of PSC Stream Ingester App #246

Open
tiredpixel opened this issue Feb 28, 2024 · 7 comments

@tiredpixel

Instead of a monthly job run on an EC2 instance, this will need to run continuously, so that new records through the PSC Stream API are detected and ingested immediately.

Subtasks include:

  • Deciding what infrastructure this should run on (e.g. EC2, Heroku)
  • Documenting the deployment process
  • Automating with CI/CD if relevant

Estimate: 12 hours

@tiredpixel

Static IP Experiment

Heroku doesn't support static IPs, at least not in its standard cloud offering. This prevents it from being used to host Transformer PSC, which needs a static IP to be whitelisted for OpenCorporates reconciliation. An experiment in routing requests through a proxy or similar will be conducted.

It's worth noting that some time was spent considering a similar approach in Jul 2023 during Register 1. That wasn't successful, mostly because SOCKS proxies configured via environment variables (or even specified explicitly in code) seemingly aren't supported by Net::HTTP::Persistent, which Sources OC ReconciliationClient uses. No Heroku plugins were tried at that time, however. This time, it would be helpful to check not only SOCKS proxies, but also whether an HTTP proxy (ideally using TLS) would be sufficient for our purposes. Alternatively, perhaps some other method is available in Heroku via a network wrapper.

@tiredpixel

tiredpixel commented Apr 17, 2024

IP Test Case using Net::HTTP::Persistent

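require 'net/http/persistent'

# Read the proxy settings from the environment (HTTP_PROXY)
http = Net::HTTP::Persistent.new
http.proxy = :ENV
p http

# ipinfo.io reports the IP address the request arrived from
uri = URI('https://ipinfo.io')
res = http.request(uri)
puts res.body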

This may be run via a console in Heroku.

For this to work, ensure that HTTP_PROXY is set. Its value can be copied from whichever env var the Heroku plugin sets (e.g. FIXIE_URL).
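The copy can also be done at process start rather than by hand. A minimal sketch, assuming the env var names reported for the plugins in this issue; the helper name `first_proxy_url` and the lookup order are hypothetical, not part of any plugin:

```ruby
# Env vars that the HTTP-proxy plugins in this issue report setting.
# The order here is an arbitrary assumption.
PROXY_ENV_VARS = %w[FIXIE_URL QUOTAGUARDSTATIC_URL PROXIMO_URL IPB_HTTP].freeze

# Hypothetical helper: return the first non-empty proxy URL found, or nil.
def first_proxy_url(env = ENV)
  PROXY_ENV_VARS.each do |name|
    value = env[name]
    return value unless value.nil? || value.empty?
  end
  nil
end

# Copy the plugin's URL into HTTP_PROXY unless it is already set,
# so that `http.proxy = :ENV` picks it up.
proxy_url = first_proxy_url
ENV['HTTP_PROXY'] ||= proxy_url if proxy_url
```

With this in place, switching plugin should only change which env var is present, not the code.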

@tiredpixel

tiredpixel commented Apr 17, 2024

Fixie

https://elements.heroku.com/addons/fixie

  • approx 0.76 USD/1K reqs/month (@ 25K reqs)
  • HTTP proxy
  • HTTPS via CONNECT proxy
  • sets FIXIE_URL env var
  • 2 static IPs allocated; reported via plugin UI
  • basic metrics available via plugin UI
  • tested Rails console on Heroku: PASS
  • tested modified Sources OC/Transformer PSC locally: PASS

Fixie Socks

https://elements.heroku.com/addons/fixie-socks

  • approx 1.16 USD/1K reqs/month (@ 25K reqs)
  • SOCKS 5 proxy
  • sets FIXIE_SOCKS_HOST env var
  • 2 static IPs allocated; reported via plugin UI
  • basic metrics available via plugin UI
  • tested Rails console on Heroku: FAIL
  • as expected, Net::HTTP::Persistent can't use SOCKS proxies directly

QuotaGuard Static IP's

https://elements.heroku.com/addons/quotaguardstatic

  • approx 0.95 USD/1K reqs/month (@ 20K reqs)
  • HTTP proxy
  • HTTPS via CONNECT proxy
  • SOCKS 5 proxy
  • sets QUOTAGUARDSTATIC_URL env var
  • 2 static IPs allocated; reported via CLI
  • detailed metrics available via plugin UI
  • tested Rails console on Heroku: PASS

QuotaGuard Shield Static IP's

https://elements.heroku.com/addons/quotaguardshield

  • approx 1.45 USD/1K reqs/month (@ 20K reqs)
  • HTTP proxy
  • HTTPS via CONNECT proxy
  • SOCKS 5 proxy
  • sets QUOTAGUARDSHIELD_URL env var
  • 2 static IPs allocated; reported via CLI
  • detailed metrics available via plugin UI
  • tested Rails console on Heroku: FAIL
  • it seems Net::HTTP::Persistent doesn't support HTTPS proxies

IPBurger Static IPs

https://elements.heroku.com/addons/ipburger

  • unlimited reqs/month for fixed $99.00/month
  • HTTP proxy
  • HTTPS via CONNECT proxy
  • SOCKS 5 proxy
  • sets IPB_HTTP, IPB_HTTPS, IPB_MYIP, IPB_SOCKS5 env vars
  • 1 static IP allocated; reported via CLI
  • tested Rails console on Heroku using IPB_HTTP: PASS
  • tested Rails console on Heroku using IPB_HTTPS and HTTP_PROXY: FAIL
  • tested Rails console on Heroku using IPB_HTTPS and HTTPS_PROXY: FAIL
  • it seems Net::HTTP::Persistent doesn't support HTTPS proxies

Proximo

https://elements.heroku.com/addons/proximo

  • approx 0.50 USD/1K reqs/month (@ 50K reqs)
  • HTTP proxy
  • HTTPS (via CONNECT ?) proxy
  • wrapper program
  • proximo-stacklet wrapper for Dante SOCKS server
  • sets PROXIMO_URL env var
  • 1 static IP allocated; reported via CLI
  • tested Rails console on Heroku: PASS
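To compare the options at their quoted volumes, the per-1K rates above can be turned into approximate monthly totals. This is only arithmetic on the figures in these notes; it assumes the rate applies uniformly at the quoted volume, and the names `PLANS` and `monthly_usd` are illustrative:

```ruby
# Approximate rates (USD per 1K requests) and the request volumes they
# were quoted at, copied from the notes above. IPBurger is omitted
# because it is a fixed 99.00 USD/month.
PLANS = {
  'Fixie'                        => { usd_per_1k: 0.76, reqs: 25_000 },
  'Fixie Socks'                  => { usd_per_1k: 1.16, reqs: 25_000 },
  'QuotaGuard Static IP\'s'      => { usd_per_1k: 0.95, reqs: 20_000 },
  'QuotaGuard Shield Static IP\'s' => { usd_per_1k: 1.45, reqs: 20_000 },
  'Proximo'                      => { usd_per_1k: 0.50, reqs: 50_000 }
}.freeze

# Monthly total at the quoted volume, rounded to cents.
def monthly_usd(plan)
  (plan[:usd_per_1k] * plan[:reqs] / 1000.0).round(2)
end

PLANS.each do |name, plan|
  puts format('%-32s %6.2f USD/month', name, monthly_usd(plan))
end
```

Since the real request volume isn't known yet, these totals are only a rough guide to which tier each plugin was priced at.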

@tiredpixel

Static IP Experiment Conclusion

SOCKS proxies are not supported by Net::HTTP::Persistent. Whilst these would usually be my preference, they are not necessary since we require only HTTP/HTTPS traffic. So, we can ignore those options which support only SOCKS, as well as the SOCKS configurations of those options which also provide HTTP or HTTPS proxies.

HTTPS proxies are also not supported by Net::HTTP::Persistent. This is unfortunate. However, note that it's still possible to access HTTPS sites using CONNECT, and that most of the connection is encrypted:

The most common form of HTTP tunneling is the standardized HTTP CONNECT method.[1][2] In this mechanism, the client asks an HTTP proxy server to forward the TCP connection to the desired destination. The server then proceeds to make the connection on behalf of the client. Once the connection has been established by the server, the proxy server continues to proxy the TCP stream to and from the client. Only the initial connection request is HTTP - after that, the server simply proxies the established TCP connection.

https://en.wikipedia.org/wiki/HTTP_tunnel
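To make concrete which part is not encrypted when the proxy itself is plain HTTP, here is a sketch of the initial CONNECT request a client sends. It only builds the text and opens no connection; the function name `connect_request` and the example credentials are hypothetical:

```ruby
# Build the CONNECT request sent to an HTTP proxy before tunnelling.
# With a plain-HTTP proxy these lines travel unencrypted, including the
# target host and any Basic proxy credentials. The TLS session with the
# target is negotiated inside the tunnel afterwards.
def connect_request(host, port, user: nil, password: nil)
  target = "#{host}:#{port}"
  lines = ["CONNECT #{target} HTTP/1.1", "Host: #{target}"]
  if user
    # pack('m0') is Base64 encoding without line breaks
    token = ["#{user}:#{password}"].pack('m0')
    lines << "Proxy-Authorization: Basic #{token}"
  end
  lines.join("\r\n") + "\r\n\r\n"
end

print connect_request('ipinfo.io', 443, user: 'user', password: 'secret')
```

So what is exposed on the hop to the proxy is the destination host and port and the proxy credentials, not the request or response bodies.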

Most of the connection is not all of the connection, however. This leaves us with a couple of options:

  1. Rewrite Sources OC code to change from Net::HTTP::Persistent to a library supporting HTTPS proxies (or even SOCKS proxies). This would likely not be a large amount of work, since most of the code is just a few lines long. However, it would be necessary to check carefully for any differences in methods available in an alternative library.
  2. Leave existing Sources OC code as-is (almost), but use an HTTP proxy. Since all the data being accessed is available publicly anyway, and this is just for the OpenCorporates reconciliation connection (not for any database or similar), then this is likely acceptable.

Given (2), which particular Heroku plugin to use (that is, which third-party service) isn't so important. There are a number of options, and some prices are similar depending on the number of requests per month (which we don't currently know). So, we could start with one, and change it easily within a few minutes, with no code changes required. In the case we proceed with HTTP proxies, I'd suggest Fixie or Proximo for this use case, in the first instance.
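For option (2), the proxy details carried in a plugin's URL map onto connection settings roughly as follows. This sketch uses stdlib Net::HTTP rather than Net::HTTP::Persistent so that it runs without extra gems; it is not the Sources OC code, and the function name `http_via_proxy` and the example proxy URL are hypothetical:

```ruby
require 'net/http'
require 'uri'

# Build a Net::HTTP client for `target` that goes through the HTTP proxy
# described by `proxy_url` (e.g. the value of FIXIE_URL).
# Percent-encoded credentials in the URL would need decoding first.
def http_via_proxy(target, proxy_url)
  proxy = URI(proxy_url)
  http = Net::HTTP.new(target.host, target.port,
                       proxy.host, proxy.port, proxy.user, proxy.password)
  # HTTPS targets are reached via CONNECT through the HTTP proxy
  http.use_ssl = (target.scheme == 'https')
  http
end

target = URI('https://ipinfo.io')
http = http_via_proxy(target, 'http://user:secret@proxy.example.com:80')

# Nothing is sent until a request is made, e.g. http.get('/')
puts "proxy: #{http.proxy_address}:#{http.proxy_port}"
```

Because everything provider-specific lives in the URL, swapping one proxy provider for another only changes the env var value.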

Also given (2) (so, no rewrite), there is one small change needed, which I'll submit in a PR momentarily.

With one of these options in place, we could plan to host the new Ingester PSC and Transformer PSC apps in Heroku as new apps, configured with proxy plugins, and provide those IPs to be whitelisted. This could be 1 IP per app (Proximo and Fixie alternative config), or 2 IPs per app (Fixie default), resulting in 2-4 IPs to be whitelisted (since we would need stg and prd apps).

If everything worked satisfactorily, there would be the option to reconsider the existing EC2 and dev whitelisted IPs, since using a proxy should theoretically be possible there, too (although a proxy provider would still have to be found).

However, it's useful to note that these Heroku plugins are not doing anything special; they are just proxy providers. So, it would in fact be possible to use other proxy providers instead, without installing the plugins, or alternatively to sign up directly for one of those services and share the usage across multiple applications. For example, Fixie could be used directly (https://usefixie.com/), but there are many others, including ones not listed as Heroku plugins.

@tiredpixel

To clarify something I muddled: Ingester PSC doesn't need to run OpenCorporates reconciliation; rather, Transformer PSC does. This means that Ingester PSC doesn't need access to OpenCorporates or any IPs whitelisted, but Transformer PSC will. All the IP-related experiments and notes in this ticket still stand, but apply to Transformer PSC, not Ingester PSC. Thus, they should have been done under #252 rather than this ticket.

My apologies for the confusion.

@tiredpixel

Ingester PSC has been deployed to Heroku.

There is only a production app, since it's not possible for us to run staging apps on the Register data pipelines within our current setup.

Ingester PSC on Heroku is now intentionally in a crash loop; this will be lifted when openownership/register-ingester-psc#33 is merged, once the streaming code is ready to go live.

@tiredpixel

Ingester PSC on Heroku is now live and streaming updates from the PSC datasource.

@tiredpixel tiredpixel moved this from In Progress to In Testing in Open Ownership Register and BODS pipelines Apr 23, 2024