first version of feature elastic search ready
* First approach to use the elasticsearch service

* elasticsearch query example

* elasticsearch query complete example

* Elasticsearch setup configured

* Remove .DS_Store and add it to dockerignore

* Created an elastic search endpoint. It isn't working yet

* Just to test the new database

* To rebase develop

* Working but not up-to-date with develop branch

* Service working. Need to get up-to-date with develop

* Some bugfixes for the connectors

* Deleting docker images in clean script

* Fixed issues with authentication

* Publication search seems to work. TODO: test cases and other resources

* platform and platform_identifier changed from aiod_entry to each instance

* Created testcase for publication search

* Made ElasticSearch router generic, implemented it for dataset

* Logstash configured for dataset, experiment, ml_model, publication, and service

* Logstash configured for dataset, experiment, ml_model, publication, and service

* Logstash waits until fill-db-with-examples ends

* take src from develop

* Copied entire develop branch

* Logstash configuration readapted to new names

* Logstash configuration readapted to new names

* added ai4experiments to platform names

* Copied initial search routers to start creating them

* Examples of ml_model, dataset and experiment used to insert ai4experiment data

* Descriptions of the ai4experiment data improved

* platform added to mappings

* elasticsearch query example completed

* First version of search service working

* Search router tests implemented

* Search fields selection added

* Added search for event, news, organisation and project

* Added routers for event, news, organisation and project

* Logstash names changed

* added logstash_config.py, just for having it there

* Pagination changed to actual pages

* Pagination changed to actual pages

* Application areas added to elasticsearch results

* First version with deletion

* Prepared to be merged with develop

* pull request modifications

* pull request modifications

* pull request modifications

* Combined search with sql queries in process

* Search functionality combined with optional SQL statement to retrieve everything

* Elasticsearch and logstash configuration integrated in src

* Search router tests updated

* Search router tests updated

* Search router tests updated

* pre-commit passed

* All tests passed and working. Not merged with develop

* huggingface connector test to its original state

* back to commented huggingface connector

* Fixing unittests by making sure Elasticsearch instance can also be created when ES_USER and ES_PASSWORD env vars are empty; used the style of PR #199

* clean logstash configuration

* clean logstash configuration

* clean logstash configuration

* clean logstash configuration

* clean logstash configuration

* clean logstash configuration

* logstash config files generated with jinja2

* logstash config files generated with jinja2

* Logstash config files generated with jinja2. All tests passed, but not merged with develop.

* Second round of pull request comments

* Second round of pull request comments

* Second round of pull request comments

* Second round of pull request comments

* Second round of pull request comments

* Created data/elasticsearch/.gitkeep to make sure it exists with the right permissions

* Deleted autogenerated file logstash/config/logstash.yml

* cleanup

* Making sure docker compose up works even if generated files do not exist; added logging; simplified file names

* Make sure data folders are always created with correct permissions (this was accidentally removed in commit cc8c22f)

* Added default logstash configuration

* Fixed docker compose

* Using FastAPI input validation

* Made status nullable, so that we can return an empty status in the search_router

---------

Co-authored-by: Adrián <[email protected]>
Co-authored-by: Jos van der Velde <[email protected]>
3 people authored Nov 28, 2023
1 parent 4e053bc commit 7155396
Showing 61 changed files with 1,777 additions and 17 deletions.
1 change: 1 addition & 0 deletions .dockerignore
@@ -1,3 +1,4 @@
scripts
venv
data
**.DS_Store
10 changes: 10 additions & 0 deletions .env
@@ -10,3 +10,13 @@ KEYCLOAK_ADMIN_PASSWORD=password
KEYCLOAK_CLIENT_SECRET="QJiOGn09eCEfnqAmcPP2l4vMU8grlmVQ"
REDIRECT_URIS=http://${HOSTNAME}/docs/oauth2-redirect
POST_LOGOUT_REDIRECT_URIS=http://${HOSTNAME}/aiod-auth/realms/aiod/protocol/openid-connect/logout

#ELASTICSEARCH
ES_USER=elastic
ES_PASSWORD=changeme
ES_DISCOVERY_TYPE=single-node
ES_ROLE="edit_aiod_resources"
ES_JAVA_OPTS="-Xmx256m -Xms256m"

#LOGSTASH
LS_JAVA_OPTS="-Xmx256m -Xms256m"
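The commit message notes that the Elasticsearch client must also be creatable when `ES_USER` and `ES_PASSWORD` are empty. A minimal sketch of that idea, assuming the variables above are read from the environment. The helper name `es_connection_kwargs` and the default host URL are hypothetical and are not taken from this commit:

```python
import os
from typing import Mapping, Optional


def es_connection_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for an Elasticsearch client from env vars.

    Hypothetical helper: credentials are only included when both
    ES_USER and ES_PASSWORD are non-empty.
    """
    env = os.environ if env is None else env
    user = env.get("ES_USER", "")
    password = env.get("ES_PASSWORD", "")

    # Assumed host; inside the compose network the service is named "elasticsearch".
    kwargs: dict = {"hosts": "http://elasticsearch:9200"}
    if user and password:
        kwargs["basic_auth"] = (user, password)
    return kwargs


if __name__ == "__main__":
    print(es_connection_kwargs({"ES_USER": "elastic", "ES_PASSWORD": "changeme"}))
    print(es_connection_kwargs({"ES_USER": "", "ES_PASSWORD": ""}))
```

The resulting dictionary could then be passed to the client, for example `Elasticsearch(**es_connection_kwargs())` with version 8 of the `elasticsearch` Python package, which accepts `hosts` and `basic_auth` keyword arguments.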
14 changes: 13 additions & 1 deletion .gitignore
@@ -1,7 +1,18 @@
# Project Specific

# data/ is intended for database data from the mysql container
data/

# Generated Logstash configuration
logstash/config/config/logstash.yml
logstash/config/config/pipelines.yml
logstash/config/pipeline
logstash/config/sql





# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -113,6 +124,7 @@ venv/
ENV/
env.bak/
venv.bak/
**.DS_Store

# Spyder project settings
.spyderproject
@@ -135,4 +147,4 @@ dmypy.json
# Pyre type checker
.pyre/

.vscode
.vscode
5 changes: 5 additions & 0 deletions README.md
@@ -58,6 +58,11 @@ For development:
- Additional 'mysqlclient' dependencies. Please have a look at [their installation instructions](https://github.com/PyMySQL/mysqlclient#install).

## Production environment

For production environments, the Elastic documentation recommends `-Xms4G` and `-Xmx8G` for the JVM settings.\
These parameters can be defined in the `.env` file.
See the [Logstash JVM settings guide](https://www.elastic.co/guide/en/logstash/current/jvm-settings.html).
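As a rough illustration of what the JVM option strings in `.env` (`ES_JAVA_OPTS`, `LS_JAVA_OPTS`) encode, the sketch below extracts the initial (`-Xms`) and maximum (`-Xmx`) heap sizes from such a string. The function name `heap_settings` is hypothetical and not part of this repository; it only covers the plain `k`/`m`/`g` suffixes.

```python
import re

# Multipliers for the size suffixes accepted by -Xms / -Xmx.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def heap_settings(java_opts: str) -> dict:
    """Return the heap sizes in bytes for each of Xms / Xmx found in java_opts."""
    found = {}
    for flag, number, unit in re.findall(r"-(Xms|Xmx)(\d+)([kKmMgG]?)", java_opts):
        # A missing suffix means the value is already in bytes.
        found[flag] = int(number) * _UNITS.get(unit.lower(), 1)
    return found


if __name__ == "__main__":
    # The development default from .env in this commit.
    settings = heap_settings("-Xmx256m -Xms256m")
    print(settings)
    print("initial and maximum heap match:", settings.get("Xms") == settings.get("Xmx"))
```

A check like this could be run before deployment to confirm that the values set in `.env` are the ones intended for production.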

## Installation

10 changes: 5 additions & 5 deletions connectors/fill-examples.sh
@@ -16,10 +16,6 @@ python3 connectors/synchronization.py \
-c connectors.example.example.ExampleEducationalResourceConnector \
-w /opt/connectors/data/example/educational_resource

python3 connectors/synchronization.py \
-c connectors.example.example.ExampleEventConnector \
-w /opt/connectors/data/example/event

python3 connectors/synchronization.py \
-c connectors.example.example.ExampleExperimentConnector \
-w /opt/connectors/data/example/experiment
@@ -40,6 +36,10 @@ python3 connectors/synchronization.py \
-c connectors.example.example.ExamplePersonConnector \
-w /opt/connectors/data/example/person

python3 connectors/synchronization.py \
-c connectors.example.example.ExampleEventConnector \
-w /opt/connectors/data/example/event

python3 connectors/synchronization.py \
-c connectors.example.example.ExampleProjectConnector \
-w /opt/connectors/data/example/project
@@ -92,4 +92,4 @@ python3 connectors/synchronization.py \

python3 connectors/synchronization.py \
-c connectors.example.enum.EnumConnectorStatus \
-w /opt/connectors/data/enum/status
-w /opt/connectors/data/enum/status
Empty file added data/elasticsearch/.gitkeep
Empty file.
67 changes: 65 additions & 2 deletions docker-compose.yaml
@@ -1,3 +1,4 @@

version: '3.9'

services:
@@ -45,8 +46,7 @@ services:
    depends_on:
      app:
        condition: service_healthy



  deletion:
    build:
      context: deletion
@@ -167,3 +167,66 @@ services:
    depends_on:
      app:
        condition: service_healthy

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.8.2
    container_name: elasticsearch
    env_file: .env
    environment:
      - ES_JAVA_OPTS=$ES_JAVA_OPTS
      - ELASTIC_USER=$ES_USER
      - ELASTIC_PASSWORD=$ES_PASSWORD
      - discovery.type=$ES_DISCOVERY_TYPE
    ports:
      - 9200:9200
      - 9300:9300
    volumes:
      - type: bind
        source: ./es/elasticsearch.yml
        target: /usr/share/elasticsearch/config/elasticsearch.yml
        read_only: true
      - ./data/elasticsearch:/usr/share/elasticsearch/data
    healthcheck:
      test: ["CMD-SHELL", "curl -u $ES_USER:$ES_PASSWORD --silent --fail localhost:9200/_cluster/health || exit 1"]
      interval: 5s
      timeout: 30s
      retries: 30

  es_logstash_setup:
    image: ai4eu_server
    container_name: es_logstash_setup
    env_file: .env
    environment:
      - MYSQL_ROOT_PASSWORD=$MYSQL_ROOT_PASSWORD
      - ES_USER=$ES_USER
      - ES_PASSWORD=$ES_PASSWORD
    volumes:
      - ./src:/app
      - ./logstash:/logstash
    command: >
      /bin/bash -c "python setup/logstash_setup/generate_logstash_config_files.py &&
                    python setup/es_setup/generate_elasticsearch_indices.py"
    restart: "no"
    depends_on:
      elasticsearch:
        condition: service_healthy

  logstash:
    build:
      context: logstash/
      dockerfile: Dockerfile
    container_name: logstash
    env_file: .env
    environment:
      - LS_JAVA_OPTS=$LS_JAVA_OPTS
    ports:
      - 5044:5044
      - 5000:5000/tcp
      - 5000:5000/udp
      - 9600:9600
    volumes:
      - ./logstash/config/config:/usr/share/logstash/config:ro
      - ./logstash/config/pipeline:/usr/share/logstash/pipeline:ro
      - ./logstash/config/sql:/usr/share/logstash/sql:ro
    depends_on:
      es_logstash_setup:
        condition: service_completed_successfully
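The `elasticsearch` healthcheck above calls `_cluster/health` with basic authentication and treats any HTTP error as unhealthy. The same probe can be sketched in Python with only the standard library, for instance to wait for the service from a test script. The function names `build_health_request` and `is_healthy` are hypothetical, and the defaults simply mirror the compose file:

```python
import base64
import urllib.request


def build_health_request(user: str, password: str, host: str = "localhost:9200") -> urllib.request.Request:
    """Build an authenticated request for the cluster health endpoint."""
    # HTTP basic auth: base64 of "user:password", as curl -u would send it.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{host}/_cluster/health",
        headers={"Authorization": f"Basic {token}"},
    )


def is_healthy(request: urllib.request.Request, timeout: float = 30.0) -> bool:
    """Return True if the endpoint answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection errors, timeouts and HTTP 4xx/5xx (urllib.error.HTTPError).
        return False


if __name__ == "__main__":
    # Default credentials from .env in this commit.
    print(is_healthy(build_health_request("elastic", "changeme"), timeout=5.0))
```

Like the `curl --fail` command, this only checks that the endpoint responds successfully; it does not inspect the reported cluster status.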
13 changes: 13 additions & 0 deletions es/elasticsearch.yml
@@ -0,0 +1,13 @@
---
## Default Elasticsearch configuration from Elasticsearch base image.
## https://github.com/elastic/elasticsearch/blob/master/distribution/docker/src/docker/config/elasticsearch.yml
#
cluster.name: "docker-cluster"
network.host: 0.0.0.0

## X-Pack settings
## see https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-xpack.html
#
xpack.license.self_generated.type: basic
xpack.security.enabled: true
xpack.monitoring.collection.enabled: true
13 changes: 13 additions & 0 deletions logstash/Dockerfile
@@ -0,0 +1,13 @@
# https://www.docker.elastic.co/
FROM docker.elastic.co/logstash/logstash:8.11.0

# Download MySQL JDBC driver to connect Logstash to MySQL
RUN curl -Lo "mysql-connector-j-8.2.0.tar.gz" "https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-j-8.2.0.tar.gz" \
&& tar -xf "mysql-connector-j-8.2.0.tar.gz" "mysql-connector-j-8.2.0/mysql-connector-j-8.2.0.jar" \
&& mv "mysql-connector-j-8.2.0/mysql-connector-j-8.2.0.jar" "mysql-connector-j.jar" \
&& rm -r "mysql-connector-j-8.2.0" "mysql-connector-j-8.2.0.tar.gz"

ENTRYPOINT ["/usr/local/bin/docker-entrypoint"]

# Add your logstash plugins setup here
# Example: RUN logstash-plugin install logstash-filter-json
72 changes: 72 additions & 0 deletions logstash/config/config/jvm.options
@@ -0,0 +1,72 @@
## JVM configuration

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms1g
-Xmx1g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
11-13:-XX:+UseConcMarkSweepGC
11-13:-XX:CMSInitiatingOccupancyFraction=75
11-13:-XX:+UseCMSInitiatingOccupancyOnly

## Locale
# Set the locale language
#-Duser.language=en

# Set the locale country
#-Duser.country=US

# Set the locale variant, if any
#-Duser.variant=

## basic

# set the I/O temp directory
#-Djava.io.tmpdir=$HOME

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
#-Djna.nosys=true

# Turn on JRuby invokedynamic
-Djruby.compile.invokedynamic=true

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${LOGSTASH_HOME}/heapdump.hprof

## GC logging
#-Xlog:gc*,gc+age=trace,safepoint:file=@loggc@:utctime,pid,tags:filecount=32,filesize=64m

# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${LS_GC_LOG_FILE}

# Entropy source for randomness
-Djava.security.egd=file:/dev/urandom

# Copy the logging context from parent threads to children
-Dlog4j2.isThreadContextMapInheritable=true