
Commit

Merge pull request #20 from NikolayS/new_apis
New_apis
dmius authored Mar 7, 2018
2 parents 440f7b7 + 6692379 commit f7ebb99
Showing 18 changed files with 771 additions and 58 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1 +1,3 @@
*.swp
*~
/setup.yml
36 changes: 23 additions & 13 deletions README.md
@@ -1,16 +1,16 @@
# postgrest-google-translate
PostgreSQL/PostgREST proxy to Google Translate API, with caching and ability to combine multiple text segments in one single request. It allows to work with Google Translate API right from Postgres or via REST API requests.
# postgrest-translation-proxy
PostgreSQL/PostgREST proxy to the Google, Bing, and Promt translation APIs, with caching and the ability to combine multiple text segments in a single request. It allows working with these translation APIs right from Postgres or via REST API requests.

[![Build Status](https://circleci.com/gh/NikolayS/postgrest-google-translate.png?style=shield&circle-token=fb58aee6e9f98cf85d08c4d382d5ba3f0f548e08)](https://circleci.com/gh/NikolayS/postgrest-google-translate/tree/master)

This tiny project consists of 2 parts:

1. SQL objects to enable calling Google API right from SQL environment (uses [plsh](https://github.com/petere/plsh) extension)
1. SQL objects to enable calling the translation APIs right from the SQL environment (uses the [plsh](https://github.com/petere/plsh) extension)
2. API method (uses [PostgREST](http://postgrest.com))

Part (1) can be used without part (2).

Table `google_translate.cache` is used to cache Google API responses to speedup work and reduce costs.
Table `translation_proxy.cache` is used to cache API responses to speed up work and reduce costs.
Also, it is possible to combine multiple phrases in one API request, which provides a great advantage (e.g., for 10 uncached phrases, a single aggregated request takes ~150-200 ms versus 1.5-2 seconds for 10 consecutive requests). Currently, Google Translate API accepts up to 128 text segments in a single request.
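
For illustration, cached responses can be inspected directly with plain SQL; a minimal sketch, assuming the `translation_proxy.cache` table from `install_global_core.sql` is installed:
```sql
-- Sketch: look at the most recent cached English-to-Russian Google translations
-- (column names as defined in install_global_core.sql).
SELECT q, result, created
FROM translation_proxy.cache
WHERE api_engine = 'google' AND source = 'en' AND target = 'ru'
ORDER BY created DESC
LIMIT 10;
```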

:warning: Limitations
@@ -29,22 +29,32 @@ Dependencies
---
1. cURL
2. [PostgREST](http://postgrest.com) – download the latest version. See `circle.yml` for an example of starting/using it.
2. `plsh` – PostgreSQL contrib module, it is NOT included to standard contribs package. To install it on Ubuntu/Debian run: `apt-get install postgresql-X.X-plsh` (where X.X could be 9.5, depending on your Postgres version)
3. `plsh` – PostgreSQL contrib module; it is NOT included in the standard contrib package. To install it on Ubuntu/Debian, run `apt-get install postgresql-X.X-plsh` (where X.X could be 9.5, depending on your Postgres version). For Arch Linux, use the AUR package 'postgresql-plsh'.
4. Ruby for easy installer (optional)

Installation and Configuration
---
For your database (here we assume that it's called `DBNAME`), install [plsh](https://github.com/petere/plsh) extension and then execute two SQL scripts, after what configure your database setting `google_translate.api_key` (take it from Google Could Platform Console):
Simple method
----
Edit `setup.yml`, then execute `setup.rb`. You need to have the Ruby interpreter installed.

Step-by-step method
----
For your database (here we assume that it's called `DBNAME`), install the [plsh](https://github.com/petere/plsh) extension and then execute the `_core` SQL scripts, after which configure your database settings:
`translation_proxy.promt_api_key`, `translation_proxy.bing_api_key` and
`translation_proxy.google_api_key` (take the latter from the Google Cloud Platform Console):
```sh
psql DBNAME -c 'create extension if not exists plsh;'
psql DBNAME -f install_core.sql
psql -c "alter database DBNAME set google_translate.api_key = 'YOU_GOOGLE_API_KEY';"
psql -c "alter database DBNAME set google_translate.begin_at = '2000-01-01';"
psql -c "alter database DBNAME set google_translate.end_at = '2100-01-01';"
psql -c "alter database DBNAME set translation_proxy.google_api_key = 'YOUR_GOOGLE_API_KEY';"
psql -c "alter database DBNAME set translation_proxy.google_begin_at = '2000-01-01';"
psql -c "alter database DBNAME set translation_proxy.google_end_at = '2100-01-01';"
```
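
The same mechanism presumably applies to the other engines; the setting names below are taken from `install_bing_vars.sql` and `install_api_vars.sql` in this change set (they differ slightly from the spellings mentioned above, so treat this as a sketch rather than canonical syntax):
```sql
-- Sketch: configure the Bing key and pick the active engine
-- (setting names assumed from install_bing_vars.sql / install_api_vars.sql).
ALTER DATABASE DBNAME SET translation_proxy.bing.api_key = 'YOUR_BING_API_KEY';
ALTER DATABASE DBNAME SET translation_proxy.api.current_engine = 'google';
```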

Alternatively, you can use `ALTER ROLE ... SET google_translate.api_key = 'YOU_GOOGLE_API_KEY';` or put this setting to `postgresql.conf` or do `ALTER SYSTEM SET google_translate.api_key = 'YOU_GOOGLE_API_KEY';` (in these cases, it will be available cluster-wide).

Parameters `google_translate.begin_at` and `google_translate.end_at` are responsible for the period of time, when Google Translate API is allowed to be called. If current time is beyond this timeframe, onlic cache will be used.
Alternatively, you can use `ALTER ROLE ... SET translation_proxy.google_api_key = 'YOUR_GOOGLE_API_KEY';`, put this setting in `postgresql.conf`, or run `ALTER SYSTEM SET translation_proxy.google_api_key = 'YOUR_GOOGLE_API_KEY';` (in these cases, it will be available cluster-wide).

Parameters `translation_proxy.google_begin_at` and `translation_proxy.google_end_at` define the period of time when the Google Translate API is allowed to be called. If the current time is outside this timeframe, only the cache table will be used.
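
For example, a window that has already closed effectively pins the proxy to cache-only mode (a sketch using the same settings as above):
```sql
-- Sketch: with an end date in the past, no new Google API calls are made;
-- only previously cached translations are returned.
ALTER DATABASE DBNAME SET translation_proxy.google_begin_at = '2000-01-01';
ALTER DATABASE DBNAME SET translation_proxy.google_end_at = '2001-01-01';
```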

To enable the REST API proxy, install [PostgREST](http://postgrest.com), launch it (see `circle.yml` as an example), and initialize the API methods with the additional SQL script:
```sh
@@ -64,10 +74,10 @@ Usage
In SQL environment:
```sql
-- Translate from English to Russian
select google_translate.translate('en', 'ru', 'Hello world');
select translation_proxy.google_translate('en', 'ru', 'Hello world');

-- Combine multiple text segments in single query
select * from google_translate.translate('en', 'ru', array['ok computer', 'show me more','hello world!']);
select * from translation_proxy.google_translate('en', 'ru', array['ok computer', 'show me more','hello world!']);
```

REST API:
25 changes: 13 additions & 12 deletions circle.yml
@@ -9,24 +9,25 @@ dependencies:
- cd ~ && wget https://github.com/begriffs/postgrest/releases/download/v0.3.2.0/postgrest-0.3.2.0-ubuntu.tar.xz
- cd ~ && tar xf postgrest-0.3.2.0-ubuntu.tar.xz
- echo "~/postgrest postgres://apiuser:SOMEPASSWORD@localhost:5432/test --pool=200 --anonymous=apiuser --port=3000 --jwt-secret notverysecret --max-rows=500 --schema=v1" > ~/postgrest-run.sh && chmod a+x ~/postgrest-run.sh
- npm install -g npm newman
- npm install -g npm newman ava
database:
override:
- sudo -u postgres psql -c "create role apiuser password 'SOMEPASSWORD' login;"
- sudo -u postgres psql -c "create database test;"
- sudo -u postgres psql -c "alter database test set google_translate.api_key = 'AIzaSyCauv2HRjprFX3DcGhorJFYGyeVmzvunuc';"
- sudo -u postgres psql -c "alter database test set google_translate.begin_at = '2000-01-01';"
- sudo -u postgres psql -c "alter database test set google_translate.end_at = '2100-01-01'"
- sudo -u postgres psql -c "alter database test set translation_proxy.google_api_key = 'AIzaSyCauv2HRjprFX3DcGhorJFYGyeVmzvunuc';"
- sudo -u postgres psql -c "alter database test set translation_proxy.google_begin_at = '2000-01-01';"
- sudo -u postgres psql -c "alter database test set translation_proxy.google_end_at = '2100-01-01'"
- sudo -u postgres psql test -c "create extension if not exists plsh;"
- sudo -u postgres psql test -f ~/postgrest-google-translate/install_core.sql
- sudo -u postgres psql test -f ~/postgrest-google-translate/install_api.sql
- sudo -u postgres psql test -f ~/postgrest-translation-proxy/install_google_core.sql
- sudo -u postgres psql test -f ~/postgrest-translation-proxy/install_promt_core.sql
- sudo -u postgres psql test -f ~/postgrest-translation-proxy/install_bing_core.sql
- sudo -u postgres psql test -f ~/postgrest-translation-proxy/install_api.sql
- ~/postgrest-run.sh:
background: true
background: true
test:
override:
- ~/"$CIRCLE_PROJECT_REPONAME"/test/run.sh -f junit >$CIRCLE_TEST_REPORTS/junit.xml
- ~/postgrest-translation-proxy/test/run.sh -f junit >$CIRCLE_TEST_REPORTS/junit.xml
- nc -z -v -w5 localhost 3000
- newman run ~/"$CIRCLE_PROJECT_REPONAME"/test/postman/postgrest-google-translate.postman_collection --bail -e ~/postgrest-google-translate/test/postman/local.postman_environment --reporter-junit-export $CIRCLE_TEST_REPORTS/newman.xml
- sudo -u postgres psql test -v ON_ERROR_STOP=1 -f ~/"$CIRCLE_PROJECT_REPONAME"/uninstall_api.sql
- sudo -u postgres psql test -v ON_ERROR_STOP=1 -f ~/"$CIRCLE_PROJECT_REPONAME"/uninstall_core.sql

- newman run ~/postgrest-translation-proxy/test/postman/postgrest-translation-proxy.postman_collection --bail -e ~/postgrest-translation-proxy/test/postman/local.postman_environment --reporter-junit-export $CIRCLE_TEST_REPORTS/newman.xml
- sudo -u postgres psql test -v ON_ERROR_STOP=1 -f ~/postgrest-translation-proxy/uninstall_api.sql
- sudo -u postgres psql test -v ON_ERROR_STOP=1 -f ~/postgrest-translation-proxy/uninstall_core.sql
41 changes: 16 additions & 25 deletions install_api.sql
@@ -1,26 +1,17 @@
create schema if not exists v1;
do
$$
begin
if not exists (
select *
from pg_catalog.pg_user
where usename = 'apiuser'
) then
create role my_user password 'SOMEPASSWORD' login;
end if;
end
$$;
CREATE OR REPLACE FUNCTION v2.translate_array(source CHAR(2), target CHAR(2), q JSON)
RETURNS TEXT[] AS $BODY$
DECLARE
rez TEXT[];
BEGIN
SELECT
CASE current_setting('translation_proxy.api.current_engine')
WHEN 'google' THEN
translation_proxy.google_translate_array( source, target, q )
WHEN 'promt' THEN
translation_proxy.promt_translate_array( source, target, array_agg( json_array_elements_text(q) ) )
END INTO rez;
RETURN rez;
END;
$BODY$ LANGUAGE PLPGSQL SECURITY DEFINER;

grant usage on schema v1 to apiuser;

create or replace function v1.google_translate_array(source char(2), target char(2), q json) returns text[] as $$
select * from google_translate.translate_array(source, target, q);
$$ language sql security definer;

create or replace function v1.google_translate(source char(2), target char(2), q text) returns text as $$
select * from google_translate.translate(source, target, q);
$$ language sql security definer;

grant execute on function v1.google_translate_array(char, char, json) to apiuser;
grant execute on function v1.google_translate(char, char, text) to apiuser;
GRANT EXECUTE ON FUNCTION v2.translate_array(CHAR(2), CHAR(2), JSON) TO apiuser;
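
For reference, a call to this new v2 dispatcher might look like the following sketch (it assumes `translation_proxy.api.current_engine` is set and the corresponding `*_core` script is installed, as configured in `install_api_vars.sql` below):
```sql
-- Sketch: with current_engine = 'google', this delegates to
-- translation_proxy.google_translate_array() and returns TEXT[].
SELECT v2.translate_array('en', 'ru', '["ok computer", "hello world!"]'::json);
```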
17 changes: 17 additions & 0 deletions install_api_vars.sql
@@ -0,0 +1,17 @@
ALTER DATABASE DBNAME SET translation_proxy.api.current_engine = 'CURRENT_API_ENGINE';

CREATE SCHEMA IF NOT EXISTS v2;
DO
$$
BEGIN
IF NOT EXISTS (
SELECT *
FROM pg_catalog.pg_user
WHERE usename = 'apiuser'
) THEN
CREATE ROLE apiuser PASSWORD 'APIUSER-PASSWORD' LOGIN;
END IF;
END
$$;

GRANT USAGE ON SCHEMA v2 TO apiuser;
48 changes: 48 additions & 0 deletions install_bing_core.sql
@@ -0,0 +1,48 @@
-- main function is the bing_translate( source char(2), target char(2), qs text[] )

-- api_key
CREATE OR REPLACE FUNCTION translation_proxy._bing_get_token_curl(text) RETURNS TEXT AS $$
#!/bin/sh
KEY=$1
ACCESS_TOKEN=`curl -X POST -H "content-type: application/x-www-form-urlencoded" \
-d "grant_type=client_credentials&client_id=$CLIENTID&client_secret=$CLIENTSECRET&scope=http://api.microsofttranslator.com" \
https://datamarket.accesscontrol.windows.net/v2/OAuth2-13 | grep -Po '"access_token":.*?[^\\]",'`

curl -X POST --header "Ocp-Apim-Subscription-Key: ${KEY}" --data "" 'https://api.cognitive.microsoft.com/sts/v1.0/issueToken' 2>/dev/null
$$ LANGUAGE plsh;

CREATE OR REPLACE FUNCTION translation_proxy._bing_login() RETURNS BOOLEAN AS $$
DECLARE
token TEXT;
BEGIN
token := translation_proxy._bing_get_token_curl(current_setting('translation_proxy.bing.api_key'));
IF token IS NOT NULL AND token <> '' THEN
UPDATE translation_proxy.authcache SET ( creds, updated ) = ( token, now() ) WHERE api_engine = 'bing';
RETURN 't';
ELSE
RETURN 'f';
END IF;
END;
$$ LANGUAGE plpgsql;

-- token, source lang, target lang, text, category
CREATE OR REPLACE FUNCTION translation_proxy._bing_translate_curl(TEXT, CHAR(2), CHAR(2), TEXT) RETURNS TEXT AS $$
#!/bin/sh
TOKEN=$1
SRC=$2
DST=$3
QUERY=$4
CTG=$5
curl -G -H "Authorization: Bearer $TOKEN" \
--data-urlencode "text=$QUERY" \
--data-urlencode "from=$SRC" \
--data-urlencode "to=$DST" \
--data-urlencode "category=$CTG" \
'https://api.cognitive.microsoft.com/sts/v1.0/Translate' 2>/dev/null
$$ LANGUAGE plsh;

CREATE OR REPLACE FUNCTION translation_proxy.bing_translate(source CHAR(2), target CHAR(2), qs TEXT[], profile TEXT DEFAULT '')
RETURNS text[] as $$
BEGIN
END;
$$ LANGUAGE plpgsql;
3 changes: 3 additions & 0 deletions install_bing_vars.sql
@@ -0,0 +1,3 @@
-- Microsoft Bing
ALTER DATABASE DBNAME SET translation_proxy.bing.api_key = 'YOUR_BING_API_KEY';
ALTER DATABASE DBNAME SET translation_proxy.bing.key_expiration = 'BING_TOKEN_EXPIRATION';
100 changes: 100 additions & 0 deletions install_global_core.sql
@@ -0,0 +1,100 @@
CREATE SCHEMA IF NOT EXISTS translation_proxy;
CREATE EXTENSION IF NOT EXISTS plsh;
CREATE EXTENSION IF NOT EXISTS plpython2u;

CREATE TYPE translation_proxy.api_engine_type AS ENUM ('google', 'promt', 'bing');

CREATE TABLE translation_proxy.cache(
id BIGSERIAL PRIMARY KEY,
source char(2), -- if this is NULL, it needs to be detected
target char(2) NOT NULL,
q TEXT NOT NULL,
result TEXT, -- if this is NULL, it needs to be translated
profile TEXT NOT NULL DEFAULT '',
created TIMESTAMP NOT NULL DEFAULT now(),
api_engine translation_proxy.api_engine_type NOT NULL,
encoded TEXT -- urlencoded string for the GET request; it is NULL after a successful translation
);

CREATE UNIQUE INDEX u_cache_q_source_target ON translation_proxy.cache
USING btree(md5(q), source, target, api_engine, profile);
CREATE INDEX cache_created ON translation_proxy.cache ( created );
COMMENT ON TABLE translation_proxy.cache IS 'The cache for API calls of the Translation proxy';

-- trigger that URL-encodes the query in the cache when no translation is given yet
CREATE OR REPLACE FUNCTION translation_proxy._urlencode_fields()
RETURNS TRIGGER AS $BODY$
from urllib import quote_plus
TD['new']['encoded'] = quote_plus( TD['new']['q'] )
return 'MODIFY'
$BODY$ LANGUAGE plpython2u;

CREATE TRIGGER _prepare_for_fetch BEFORE INSERT ON translation_proxy.cache
FOR EACH ROW
WHEN (NEW.result IS NULL)
EXECUTE PROCEDURE translation_proxy._urlencode_fields();

-- cookies, oauth keys and so on
CREATE TABLE translation_proxy.authcache(
api_engine translation_proxy.api_engine_type NOT NULL,
creds TEXT,
updated TIMESTAMP NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX u_authcache_engine ON translation_proxy.authcache ( api_engine );

COMMENT ON TABLE translation_proxy.authcache IS 'Translation API cache for remote authorization keys';

INSERT INTO translation_proxy.authcache (api_engine) VALUES ('google'), ('promt'), ('bing')
ON CONFLICT DO NOTHING;

CREATE OR REPLACE FUNCTION translation_proxy._save_cookie(engine translation_proxy.api_engine_type, cookie TEXT)
RETURNS VOID AS $$
BEGIN
UPDATE translation_proxy.authcache
SET ( creds, updated ) = ( cookie, now() )
WHERE api_engine = engine;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION translation_proxy._load_cookie(engine translation_proxy.api_engine_type)
RETURNS TEXT AS $$
DECLARE
cookie TEXT;
BEGIN
SELECT creds INTO cookie FROM translation_proxy.authcache
WHERE api_engine = engine AND
updated > ( now() - current_setting('translation_proxy.promt.login_timeout')::INTERVAL )
AND creds IS NOT NULL AND creds <> ''
LIMIT 1;
RETURN cookie;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION translation_proxy._find_detected_language(qs TEXT, engine translation_proxy.api_engine_type)
RETURNS TEXT AS $$
DECLARE
lng CHAR(2);
BEGIN
SELECT source INTO lng FROM translation_proxy.cache
WHERE api_engine = engine AND q = qs AND source IS NOT NULL
LIMIT 1;
RETURN lng;
END;
$$ LANGUAGE plpgsql;

-- append a new parameter to the URL until it exceeds the limit of 2000 bytes
CREATE OR REPLACE FUNCTION translation_proxy._urladd( url TEXT, a TEXT ) RETURNS TEXT AS $$
from urllib import quote_plus
r = url + quote_plus( a )
if len(r) > 1999 :
plpy.error('URL length is over, time to fetch.', sqlstate = 'EOURL')
return r
$$ LANGUAGE plpython2u;

-- urlencoding utility
CREATE OR REPLACE FUNCTION translation_proxy._urlencode(q TEXT)
RETURNS TEXT AS $BODY$
from urllib import quote_plus
return quote_plus( q )
$BODY$ LANGUAGE plpython2u;
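
To illustrate the `_prepare_for_fetch` trigger defined above, a minimal sketch (assuming the objects from this script are installed): inserting a row without a translation populates `encoded` with the URL-encoded query.
```sql
-- Sketch: the BEFORE INSERT trigger fires because result IS NULL
-- and fills `encoded` via urllib.quote_plus.
INSERT INTO translation_proxy.cache (source, target, q, api_engine)
VALUES ('en', 'ru', 'hello world!', 'google');

SELECT q, encoded
FROM translation_proxy.cache
WHERE q = 'hello world!';  -- encoded is expected to be 'hello+world%21'
```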
Empty file added install_global_vars.sql
Empty file.
