Feature/cicd #268

Merged
merged 396 commits into from
Mar 4, 2020

Conversation

dylanbannon
Contributor

@dylanbannon dylanbannon commented Feb 20, 2020

We now have automated cluster creation and destruction tests that are triggered by pushes!

This branch isn't ready to merge yet. This PR is just being filed to get feedback from others.

Here are the remaining points of work/discussion:

TODOs

  • .github/workflows/main.yml defines more environment variables than we need. We should identify a minimal list of the necessary ones.
  • The service account used for deployment currently has the following roles assigned in Google Cloud: Editor, Compute Admin, Kubernetes Engine Admin, Service Account Admin, and Service Account User. A minimal set of these should be identified and documented, perhaps as comments to the gke/test/create/service-account endpoint.
  • The cluster name (currently, the CLOUDSDK_CONTAINER_CLUSTER and CLUSTER_NAME variables) should be randomized, somehow. This might take a little work. Ideally, it'd be randomized similar to the way that cluster names are randomized by default in the main menu.
  • A necessary prerequisite for automation is installing the kubens script. It's committed to the git project and just sitting in the conf folder right now, but that looks awkward. I need to revise the test Makefile target to install kubens on the fly before cluster configuration.
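As a rough illustration of the cluster-name TODO above, here is a minimal Python sketch of one way to randomize the name. The prefix `test-cluster`, the function name, and the four-digit default are assumptions made for this example, not the scheme the main menu actually uses. The validity check reflects my understanding of GKE naming rules (lowercase letters, digits, and hyphens, starting with a letter, at most 40 characters) and should be confirmed against the GKE documentation.

```python
import re
import secrets

# Hypothetical prefix: the real naming scheme used by the project's
# main menu is not shown in this thread.
DEFAULT_PREFIX = "test-cluster"


def random_cluster_name(prefix: str = DEFAULT_PREFIX, digits: int = 4) -> str:
    """Return `prefix`, a hyphen, and `digits` random decimal digits."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    suffix = "".join(secrets.choice("0123456789") for _ in range(digits))
    name = f"{prefix}-{suffix}"
    # Assumed constraint: lowercase letters, digits, and hyphens only,
    # starting with a letter, ending alphanumeric, at most 40 characters.
    if not re.fullmatch(r"[a-z](?:[a-z0-9-]{0,38}[a-z0-9])?", name):
        raise ValueError(f"invalid cluster name: {name!r}")
    return name


if __name__ == "__main__":
    print(random_cluster_name())
```

Whatever generates the name, the same value would presumably need to be assigned to both CLOUDSDK_CONTAINER_CLUSTER and CLUSTER_NAME so that creation and destruction target the same cluster.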

Discussion points

  • The tests currently run on GitHub Actions, but our Docker build appears to be controlled by Travis. We should consolidate the tests and the Docker build onto either GitHub Actions or Travis. I have no preference between them.
  • I made copies of all the Makefiles I needed to modify (saved as [Makefile_name].test) and left the copies in the same directories as the originals. This might not be ideal for organization. Other options include adding the test/ targets to the existing Makefiles instead of breaking them out into their own files or, if we keep breaking them out, organizing all the *.test Makefiles into their own folder structure. We should talk about the best way to organize this.
  • There's currently no way to test ELK deployment, since the shopt command is incompatible with the GitHub Actions environment. Switching everything over to Travis might resolve this. Otherwise, we should consider whether replacing the shopt calls is worth it to enable this test.
  • FINALLY: Right now, we're just creating and deleting a cluster, but not testing any cluster functionality. This is a great start, but a lot of the core functionality that we've unintentionally broken in the past (and then spent hours or days repairing) is still outside the tests' scope. We need to add tests for cluster functionality once the cluster is up.

Fixes #273.
Fixes #269.
Fixes #176!

@dylanbannon
Contributor Author

The ELK stack test is now disabled unless a commit message contains [test-elk].

To re-enable the ELK stack test, we need to resolve #274 first.

@dylanbannon
Contributor Author

dylanbannon commented Mar 3, 2020

@willgraf

Alright, I'm going to say that I've addressed all the points in the last review:

I've punted a few things to other issues, mostly just to get this PR to a merge-able state, but I think it has accomplished its main goal and I'm confident we're tracking all of the dangling issues we identified in the process.

Fixes #273.
Fixes #269.
Fixes #176!

@dylanbannon dylanbannon changed the base branch from master to stable March 3, 2020 22:48
.github/PULL_REQUEST.md (outdated review thread, resolved)
@willgraf
Contributor

willgraf commented Mar 4, 2020

I was about to approve the changes to this PR, but there's no green checkmark coming up!

Maybe we can build (but not run) on all branches, but only run the integration tests on stable/master? We could mock up the unit test part here, but just have it echo "no unit tests yet" or something as a placeholder. That way it will be easier to add unit tests in the next issue.

Additionally, maybe we can also update the name of the stage to integration or integration-tests or something for more clarity.

@dylanbannon
Contributor Author

dylanbannon commented Mar 4, 2020

I've updated the name of the test stage to integration_tests and I've added a new stage, unit_tests, with a stub placeholder for future unit tests. Currently, unit_tests runs and succeeds on every push.

Contributor

@willgraf willgraf left a comment

💯
Great work! Thanks for the patience with all the changes.

@dylanbannon
Contributor Author

WHOO!

@dylanbannon dylanbannon merged commit a1b5b4b into stable Mar 4, 2020
@dylanbannon dylanbannon deleted the feature/cicd branch March 4, 2020 19:37
willgraf added a commit that referenced this pull request May 22, 2020
* Feature/cicd (#268)

* Set all custom charts image.pullPolicy to IfNotPresent (#258)

* setting TRANSLATE_COLON_NOTATION=false by default (#289)

* Update Getting Started  (#287)

* Update PULL_REQUEST.md for grammar (#292)

* Use gomplate to template patches/hpa.yaml. (#293)

* default account has 100 firewalls, not 200. (#297)

* Update all documentation and links to reference kiosk-console instead of kiosk (#295)

* Use yq and helmfile build to dynamically deploy helm charts based on release name. (#300)

* Upgrade the openvpn chart to latest 4.2.1. (#301)

* Change CLUSTER in Makefile to kiosk-console to fix binary name issue. (#302)

* update raw.gif and tracked.gif with new nearly perfect gif (#303)

* Update default values for tf-serving (#306)

* Update Redis to the latest helm chart before they migrate to bitnami (#307)

* Update autoscaler to 0.4.1 (#308)

* Update redis-janitor to 0.3.1 (#309)

* Update frontend to 0.4.1. (#310)

* Update OpenVPN command for version 4.2.1 (#313)

* Upgrade consumers to 0.5.1 and update models to DeepWatershed. (#311)

* Set no-appendfsync-on-rewrite=yes to prevent Redis latency issues during AOF fsync (#316)

* Install yq in install_script.sh (#319)

* Use 4 random digits for cluster names. (#318)

* update to latest version of the frontend (#322)

* Change default consumer machine type to n1-standard-2 (#323)

* Upgrade benchmarking to 0.2.4 and fix for Deep Watershed models (#324)

* Use GRAFANA_PASSWORD env var to override the default grafana password. (#325)

* Update Getting Started docs with new user feedback (#321)

* Add basic unit tests (#326)

* Use the docker container to run integration tests. (#327)

* Warn users if bucket's region and cluster's region do not match (#329)

* Bump benchmarking to latest 0.2.5 release (#331)

* Add Logo Banner and Update README (#332)

* Add new menu option for default settings with 4 GPUs (#333)

* Update HPA target to 2 keys per zip consumer pod. (#334)

* Bump consumers to version 0.5.2 (#336)

* Update consumer and benchmarking versions (#337)

* Bump redis-janitor to 0.3.2 to fix empty key bug. (#339)

* bump benchmarking to 0.3.1 to fix No route to host bug. (#341)

* Allow users to select which zone(s) to deploy the cluster (#340)

* Pin KUBERNETES_VERSION to 1.14. (#346)

* Fix bug indexing into last array element of valid_zones. (#348)

* Fix logs to indicate finality and be less redundant. (#351)

* If KUBERNETES_VERSION is 1.14, warn user of potential future version removal (#352)

Co-authored-by: dylanbannon
Co-authored-by: MekWarrior
willgraf pushed a commit that referenced this pull request May 23, 2020
Lots of stuff in this squashed commit:

* Integration tests for cluster deployment on GKE were added. Part of this process was paring down a minimal build environment with 1) a minimal set of environmental variables, 2) a permanent service account with a minimal set of permissions, and 3) a VM environment with a minimal set of dependencies (see `.travis/install_script.sh`).

* Integration testing utilizes as much production code as possible (almost total overlap).

* All testing is currently done in TravisCI and has been optimized to minimize build time.

* Integration tests only run on merges into master, but can be triggered at any time by placing [build-integration-tests] in a commit message.

* Testing of ELK-enabled clusters is currently turned off, due to the potential for dangling resources upon cluster failure. ELK-enabled integration tests can be triggered by including both [build-integration-tests] and [test-elk] in the same commit message.

* The project's pull request template now "requires" integration tests to be run, although it's up to the reviewer to enforce this requirement.

Minor:

* Added CONF_PATH_PREFIX variable for compatibility between production environment and testing VMs.

* Replaced shopt calls with a new system for toggling ELK deployment. (ELK_DEPLOYMENT_TOGGLE = "" by default and any other value turns ELK deployment on.)
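
The trigger rules described in this commit message can be sketched in Python as below. This is only an illustration of the decision logic, not the project's actual Travis configuration: the function names are hypothetical, and treating "merges into master" as "the build branch is master" is a simplifying assumption.

```python
from typing import Mapping

# Tags quoted from the commit message above.
BUILD_TAG = "[build-integration-tests]"
ELK_TAG = "[test-elk]"


def should_run_integration_tests(branch: str, commit_message: str) -> bool:
    """Run on master builds (assumed stand-in for merges) or when tagged."""
    return branch == "master" or BUILD_TAG in commit_message


def should_test_elk(commit_message: str) -> bool:
    """ELK-enabled tests need both tags in the same commit message."""
    return BUILD_TAG in commit_message and ELK_TAG in commit_message


def elk_deployment_enabled(env: Mapping[str, str]) -> bool:
    """ELK_DEPLOYMENT_TOGGLE: empty (the default) is off, anything else is on."""
    return env.get("ELK_DEPLOYMENT_TOGGLE", "") != ""


if __name__ == "__main__":
    message = "Retry deployment [build-integration-tests] [test-elk]"
    print(should_run_integration_tests("feature/cicd", message))
    print(should_test_elk(message))
    print(elk_deployment_enabled({"ELK_DEPLOYMENT_TOGGLE": "ON"}))
```

Keeping the toggle as "empty string versus anything else" mirrors the description above and avoids depending on shell options such as shopt.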
Labels: enhancement, wip
2 participants