Fixes for making PDO work in SGX HW-mode #477

Merged: 3 commits, Apr 9, 2024

Member

@bvavala bvavala commented Mar 14, 2024

This PR continues the work in #463 to enable tests in SGX HW-mode, and makes the following contributions.

  1. this PR adds targets to the docker makefile to enable sgx builds and tests.
  2. the enclave code signing key is passed from the host to the docker container (if one exists)
  3. it adds the necessary volumes and devices for sgx tests in docker compose
  4. it assigns a location for sgx keys in a container (different from the xfer folder)
  5. at runtime, it copies the sgx keys to the xfer folder on the host, and, in the container, from the xfer folder to the assigned location
  6. it adds (or updates) documentation regarding the location of the sgx keys, the build and deployment of the services in hw mode, and the registration of the attestation policy on the ledger
  7. it types the services docker image built for sgx by tagging it as pdo_services_sgx
  8. it removes the PDO_SPID, PDO_SPID_API_KEY and PDO_ENCLAVE_CODE_SIGN_PEM environment variables; now the material is meant to be found under PDO_SGX_KEY_ROOT
  9. it provides two scripts for preparing the enclave signing key and the sgx keys for the docker container build
  10. it removes the HttpsProxy configuration parameter from the toml files
  11. it adds the sgx_key_root configuration parameter to the toml files, so at configuration time this is immediately added to the toml files, and no input parameter is necessary for eservice, enclave-info, and other scripts (because it's included in the toml file they'll use)
  12. it adds the sgx_key_root option to multiple scripts to override the value in the toml file
  13. it removes the expand-config script as it is unused
  14. for docker builds, it adds a fetch-ias-certificate step on the host side (which builds the docker images), so as to avoid connections to IAS from docker images at build time and issues arising from network/proxy misconfigurations -- this does not change the normal (non-docker-based) build process
  15. it adds fixes for docker compose when testing PDO in hw mode behind a proxy.

Contributor

@cmickeyb cmickeyb left a comment

generally good. biggest issue is that you are making the makefile a lot more complicated than it needs to be. just add a "test_hw" (or something) target that explicitly sets the target images. note that you probably need to also change the image name to reflect that hw status (for the images that change)

docker/tools/environment.sh (outdated, resolved)
docker/Makefile (resolved)
docker/Makefile (resolved)
Contributor

@g2flyer g2flyer left a comment

Changes look good to me (even after Mic's counter-argument to my counter-argument to his ;-)). Also tests seem to run fine.

PS: On non-FLC machines, e.g., SKL, the upstreamed in-kernel driver does not work, and the out-of-tree driver (whether from the binary distribution or the main branch of https://github.com/intel/linux-sgx-driver) does not compile anymore. However, applying the patch from https://github.com/intel/linux-sgx-driver/pull/151/files made the driver compile and everything work with SGX 2.23 on the host.

docker/tools/environment.sh (outdated, resolved)
@bvavala bvavala marked this pull request as draft March 15, 2024 18:26
build/common-config.sh (outdated, resolved)
@cmickeyb
Contributor

I still have a number of conceptual issues with this. It feels like we are designing around the code rather than figuring out the right thing to do.

First, we still need all the environment variables for host build. For docker builds we are trying to remove (effectively) ALL of the pre-established environment variables. So... removing SGX_MODE from common_config is not a solution. And it's completely orthogonal to setting the variable in the makefile for docker.

Second, we have an outstanding question about where to put the sgx keys and when to remove them. Who needs the enclave signing keys? When are they necessary? If they are truly ephemeral, then, as we discussed, I don't care where they are put so long as they are removed afterwards. If we need those keys, then you need to think more about where they are put and when they are put there. And we are NOT constrained by how we currently build the enclave (i.e., it's OK to change the build flow to make this work "right"). To this end, I would really like to see a "document" that covers the process more thoroughly.

@bvavala
Member Author

bvavala commented Mar 19, 2024

I revised the PR.

  • the SGX_MODE variable is still there
  • now the enclave signing key is generated externally, passed to the container through the dockerfile and removed after the build

@bvavala bvavala marked this pull request as ready for review March 19, 2024 15:51
@bvavala bvavala requested a review from cmickeyb March 19, 2024 15:51
build/common-config.sh (outdated, resolved)
docker/Makefile (outdated, resolved)
docker/Makefile (outdated, resolved)
@bvavala bvavala marked this pull request as draft March 20, 2024 21:30
@bvavala bvavala marked this pull request as ready for review March 21, 2024 16:48
Contributor

@g2flyer g2flyer left a comment

After figuring out that i must define PDO_ENCLAVE_CODE_SIGN_PEM and PDO_SGX_KEY_ROOT for both hw and sim tests (which previously worked properly config-less), "unforgetting" that docker (stupidly) does not take proxy environment variables but has its own setting in ~/.docker/settings.json, and also realizing that EPID works only on icelake client (i.e., ICL) but not icelake server (i.e., ICX), contrary to what i was told, i could successfully run SIM & HW mode on a KBL NUC outside of FW and SIM mode on an ICX machine behind FW. Find my main concerns as file annotations.

docker/Makefile (outdated, resolved)
docker/Makefile (outdated, resolved)
docker/Makefile Outdated
@@ -112,6 +121,12 @@ stop_client :
# performance requirements are relatively low.
# -----------------------------------------------------------------
repository :
# if an enclave signing key is available on the host, copy that under build/keys in the repo
# Note: the docker build (see PDO_ENCLAVE_CODE_SIGN_PEM in environment.sh) expects the key there
[ ! -e ${PDO_ENCLAVE_CODE_SIGN_PEM} ] ||\
Contributor

Test also whether PDO_ENCLAVE_CODE_SIGN_PEM is actually defined at all! Note in previous version you did not have to define any env variable to build docker ....
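
A minimal sketch of such a guard, written as a standalone POSIX shell function rather than as the Makefile recipe from the diff. The function name `signing_key_action` is hypothetical and not part of the PR; it only illustrates checking that the variable is set before testing for the file. With the variable unset, the unquoted test in the diff would collapse to `[ ! -e ]`, which evaluates to false, so the command after `||` would still run.

```shell
# Hypothetical helper (not from the PR): decide whether a host-side
# enclave signing key should be copied into the repository before the build.
# Prints "copy" when PDO_ENCLAVE_CODE_SIGN_PEM is set and names an existing
# regular file, and "skip" in every other case.
signing_key_action() {
    # ${VAR:-} expands to the empty string when the variable is unset,
    # so this also works under `set -u`.
    _key="${PDO_ENCLAVE_CODE_SIGN_PEM:-}"
    if [ -z "${_key}" ]; then
        echo "skip"    # variable undefined or empty
    elif [ ! -f "${_key}" ]; then
        echo "skip"    # variable set, but no file at that path
    else
        echo "copy"    # a key is available on the host
    fi
}
```

A caller could branch on the printed value, for example copying the key only when the function prints `copy`.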

Member Author

@bvavala bvavala Mar 26, 2024

Two comments here:

  1. I have added some checks to fail more gracefully and see what's going on
  2. however, according to our documentation, build/common-config.sh should be sourced first when developing in PDO. Hence,
    a. PDO_ENCLAVE_CODE_SIGN_PEM is always defined (either by the user, or by the script)
    b. it would be more effective to check whether the script was sourced -- I'm open to suggestions

Contributor

@g2flyer g2flyer Mar 26, 2024

2. however, according to our documentation, build/common-config.sh should be sourced first when developing in PDO. Hence,

This is not really required for docker (and from the docker docu also not obvious). That said, couldn't we just source that file in the docker makefile after having set SGX_MODE to get exactly what we want: defaults for somebody who runs docker and hasn't defined anything and the desired specific values otherwise?

2. a. PDO_ENCLAVE_CODE_SIGN_PEM is always defined (either by the user, or by the script)
b. it would be more effective to check whether the script was sourced -- I'm open to suggestions

Given that we document that if not defined we always create one, your current else from if [ ! -z "${PDO_ENCLAVE_CODE_SIGN_PEM}" ]; then ... seems the right way. However, iff it is defined, why do we insist that it has to be in ${PDO_SGX_KEY_ROOT} (outside of docker) and don't just copy it to xfer directly? If we really want to have separate definition for PDO_SGX_KEY_ROOT and PDO_ENCLAVE_CODE_SIGN_PEM i don't think we should "merge" them ...

Member Author

@bvavala bvavala Mar 26, 2024

why do we insist that it has to be in ${PDO_SGX_KEY_ROOT} (outside of docker) and don't just copy it to xfer directly?

(Good question!)
From previous discussions with Mic, we noticed two things:

  1. the enclave code signing key right now is only needed at build time, and if one is available we should pass it into the build
  2. the build happens while the docker image is created, and we cannot mount the xfer folder at that time.

For these reasons, we decided to use ${PDO_SGX_KEY_ROOT} (set to its default value, and thus within the repo!) as the destination folder for the key. In this way, the key is copied/cloned together with the repo into the docker container. Then, at build time within the container, the common-config is sourced (setting the default values), and the build will find the key in the right place.

Since the built image is meant to be distributed to users, in the future we will have to remove the key material. We can accomplish that by removing the cloned repo (which includes the key).

Contributor

Ah, i see. But might be worth mentioning above as a comment in that code fragment as not so obvious?

docker/README.md Outdated
Comment on lines 89 to 90
Inside the `pdo_services` images, the `SGX_MODE` environment variable
can help distinguish the build type.
Contributor

As i think Mic has also suggested, we should "type" the image names; not doing so is asking for trouble. In particular, contrary to the previous approach of relying on an external definition of the SGX_MODE variable (which i preferred), we now have multiple targets which "conflict" as far as sgx-mode is concerned, which increases opportunities for mix-ups ...

Member Author

There was previous discussion here https://github.com/hyperledger-labs/private-data-objects/pull/477/files#r1527009436.
My earlier attempt unfortunately was not promising because we have SGX_MODE in multiple images and the images are interdependent. Hence, it's difficult to do the typing without modifying makefile and yaml and docker files.

However, it is possible that SGX_MODE is only necessary in the pdo-services image. So it could be removed from the other images, and then we simply add a new type for pdo-services in hw mode.

Conclusion:

  • the BKM here is to always start with a fresh clean development environment
  • mix-n-match can be a concern
  • solving/mitigating this requires further work -- IMO this is orthogonal to this PR, and I'm ok to do it separately.

Contributor

@g2flyer g2flyer Mar 26, 2024

However, it is possible that SGX_MODE is only necessary in the pdo-services image.

Previously that was the case. However, somebody mentioned that the sdk (or, more likely, sgxssl) would be different based on mode (which on further reflection seems though rather surprising and strange).

In any case, base service is mode-agnostic, then we could (should) just remove the ENV in the corresponding dockerfile so the status is not persisted

So it could be removed from the other images, and then we simply add a new type for pdo-services in hw mode.

That was my thinking: we just type pdo_services with the mode. Initially i thought that might also apply to ccf if we would run that in HW mode but according to Mic CCF always builds both versions and so only the runtime SGX_MODE variable matters.

  • solving/mitigating this requires further work -- IMO this is orthogonal to this PR, and I'm ok to do it separately.

Why is that orthogonal? Given that you have already sgx_build and sgx_run targets, seems also easy to type that image?

Member Author

@bvavala bvavala Mar 26, 2024

Why is that orthogonal? Given that you have already sgx_build and sgx_run targets, seems also easy to type that image?

Because I was working on it here and it was too complicated to solve it in this PR.
Also, I have not explored the simplifications that we have discussed above -- if those work, the modifications are way easier.

Member Author

I gave it a try and the modifications were fairly easy -- I have added a commit to type the sgx image.
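
As an illustration of what typing the image amounts to, here is a small sketch in POSIX shell. It is not the code from the commit: the function name `services_image_name` is hypothetical, and the accepted spellings of the mode (`SIM`/`sim`, `HW`/`hw`) are an assumption. The image names `pdo_services` and `pdo_services_sgx` are the ones mentioned in this PR.

```shell
# Hypothetical helper (not from the PR): map the SGX mode to the name of
# the services image, so SIM and HW builds cannot be confused.
# The mode is taken from the first argument, then from SGX_MODE,
# and defaults to SIM when neither is given.
services_image_name() {
    _mode="${1:-${SGX_MODE:-SIM}}"
    case "${_mode}" in
        HW|hw)   echo "pdo_services_sgx" ;;   # image built for SGX hardware
        SIM|sim) echo "pdo_services" ;;       # simulation-mode image
        *)
            echo "unknown SGX mode: ${_mode}" >&2
            return 1
            ;;
    esac
}
```

Keeping the mapping in one place means the build and run targets cannot disagree about which image belongs to which mode.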

docker/README.md Outdated
Comment on lines 273 to 282
This updated command allows to trigger the registration step right before
starting the services. The policy registration must happen before enclaves are
registered (or any enclave registration will fail).

Finally, the _same_ SGX collateral must be made available to all service containers.
At enclave registration time, this will allow the eservice to generate the right
quote (and attestation verification report) that meets the attestation policy
originally registered with the PDO Transaction Processor.
Contributor

We should say here also where this information is found and also should mention (and use!) the defaults of build/keys/sgx_mode_{sim|hw}. Besides not supporting any default and requiring PDO_SGX_KEY_ROOT to be defined when running make docker, having to define it explicitly is also conceptually inconsistent as this variable is (at least as far as our existing default directories go) dependent on SGX_MODE. So we are using the worst of the combination: mode is now managed by targets yet we still need mode-dependent external env variables, not allowing for a no-config version of make -C docker (as previously was possible; admittedly previously only for SGX_MODE=sim, but now that we have different targets we could retain that also for SGX_MODE=hw)

Member Author

@bvavala bvavala Mar 26, 2024

We should say here also where this information is found and also should mention (and use!) the defaults of build/keys/sgx_mode_{sim|hw}.

Correct, and that is described under docs/install.md.

Besides not supporting any default and requiring PDO_SGX_KEY_ROOT to be defined when running make docker, having to define it explicitly is also conceptually inconsistent as this variable is (at least as far as our existing default directories go) dependent on SGX_MODE.

Let me refer again to the documentation in docs/host_install.md, which says to source build/common-config.sh first. If PDO_SGX_KEY_ROOT is not defined, this step creates a default value for it, as you mentioned.

So we are using the worst of the combination: mode is now managed by targets yet we still need mode-dependent external env variables, not allowing for a no-config version of make -C docker

This is what was discussed -- namely to have specific SGX targets (which clearly manage the mode).
Again, the "external" variables are not necessary if build/common-config.sh is sourced first -- as it creates the default values.

I agree that the combination can be a problem when switching on the fly between SIM and HW mode development.
For this reason the BKM is to always start with fresh clean environment.
I'm open to suggestions to simplify this for developers.

Contributor

I'm open to suggestions to simplify this for developers.

Well, isn't typing in the image names exactly a simple way to simplify it & make it more robust?

Member Author

It probably is -- but image typing brings us back to the other thread.

Member Author

I have added a commit to type the service image built for sgx.

@@ -71,9 +71,11 @@ function Store {
--spid ${SPID} \
--save ${eservice_enclave_info_file} \
--loglevel warn \
Contributor

@g2flyer g2flyer Mar 25, 2024

We should make this dependent on PDO_DEBUG_LEVEL to make trouble-shooting easier (that said, the recent addition below of log-file output to screen certainly also helps in debugging). That said, i currently have that change also in the (updated) PR #480, so could be done there ...

Member Author

Good point -- ok with your update, I'll leave it as is here.

@bvavala
Member Author

bvavala commented Mar 26, 2024

After figuring out that i must define PDO_ENCLAVE_CODE_SIGN_PEM and PDO_SGX_KEY_ROOT for both hw and sim tests (which previously worked properly config-less), "unforgetting" that docker (stupidly) does not take proxy environment variables but has its own setting in ~/.docker/settings.json, and also realizing that EPID works only on icelake client (i.e., ICL) but not icelake server (i.e., ICX), contrary to what i was told, i could successfully run SIM & HW mode on a KBL NUC outside of FW and SIM mode on an ICX machine behind FW. Find my main concerns as file annotations.

Thanks @g2flyer for reviewing and testing this.
I have added a commit to better check the environment variables and the SGX collateral.
Now it is not necessary to define those variables.

Regarding the enclave signing key, if one is defined, it will be used and PDO_SGX_KEY_ROOT must be defined for transferring it in docker.

Regarding the SGX collateral, the makefile will look into PDO_SGX_KEY_ROOT (if defined), or transfer anything that's available in build/keys/sgx_mode_hw to docker. It will throw an error if the collateral is not available.
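
The lookup order described here can be sketched as follows. This is an illustration in POSIX shell, not the Makefile logic from the commit: the function name `find_sgx_collateral_dir` and the "directory exists and is not empty" criterion are assumptions; only the two locations (PDO_SGX_KEY_ROOT, then build/keys/sgx_mode_hw) come from the comment above.

```shell
# Hypothetical helper (not from the PR): locate the SGX collateral.
# $1 is the root of the repository checkout.
# Uses PDO_SGX_KEY_ROOT when it is defined, otherwise falls back to
# <repo>/build/keys/sgx_mode_hw. Prints the directory on success and
# returns non-zero with a message on stderr when nothing is found.
find_sgx_collateral_dir() {
    _dir="${PDO_SGX_KEY_ROOT:-$1/build/keys/sgx_mode_hw}"
    # Assumption: "available" means the directory exists and holds
    # at least one entry.
    if [ -d "${_dir}" ] && [ -n "$(ls -A "${_dir}" 2>/dev/null)" ]; then
        echo "${_dir}"
    else
        echo "SGX collateral not found in ${_dir}" >&2
        return 1
    fi
}
```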

cp ${XFER_DIR}/services/keys/sgx/* ${PDO_SGX_KEY_ROOT}
# refresh the environment variables (necessary for SGX-related ones)
source /project/pdo/tools/environment.sh

Contributor

again, why are we duplicating this? we can do better with the logic.

Member Author

No duplication anymore.

docker/README.md (outdated, resolved)
@bvavala bvavala marked this pull request as draft March 26, 2024 15:58
Contributor

@g2flyer g2flyer left a comment

With your changes from yesterday night i'm quite happy from my side: typed images seemed to work nicely and in either hw or sim case i don't have to specify any environment variables anymore with docker picking up the defaults and docker test works fine.

@bvavala bvavala marked this pull request as ready for review March 29, 2024 00:06
@bvavala bvavala requested a review from cmickeyb March 29, 2024 00:07
@bvavala bvavala marked this pull request as draft April 3, 2024 19:53
A new docker build target is added to build the services to run in SGX.
The resulting image is clearly typed/tagged as pdo_service_sgx to
distinguish it from non-SGX builds.
Also, base images are now SGX-MODE agnostic.

3 new scripts are added for handling SGX keys in docker builds.
First, IAS certificates are retrieved by the host (not by the docker
container) and transferred to the container through the "repository"
before the build starts.
Second, the enclave signing key is similarly checked for on (or
directly passed by) the host, and transferred to the container
through the "repository" before the build starts.
Third, at run time, the necessary SGX keys are transferred from the
host to the container through the xfer folder.

The testing support is updated to pass the required SGX volumes and
devices. Also, current small hacks to work behind a proxy are extended
to support also the SGX-based image behind a proxy.

The environment variables related to SPID, SPID_API_KEY and proxy are
removed. All the material is passed to the build or at runtime through
files, and proxy configuration happens through docker config settings
(as usual) -- without forcing specific configurations, except for the
small hack above. As a result the eservice configuration also gets
simplified by reducing and merging the enclave.toml with the eservice.toml.
Additional clean up will follow in later commits.

The documentation is updated to provide details on using and testing
with SGX-based builds.

Signed-off-by: Bruno Vavala <[email protected]>

Remove the expand-config script as it's unused.
Remove verify-pre-conf as it is now obsolete.
Update common config to remove unused/unnecessary variables.
Remove the enclave.toml files and other obsolete toml.
Update the documentation to remove unused variables.

Signed-off-by: Bruno Vavala <[email protected]>

The enclave only performs a length check on the SPID, but it does not
check the hex format. Previously this was performed on shell variables.
This makes the enclave check more robust.
Finally, additional naming updates are pushed to align with current
conventions.

Signed-off-by: Bruno Vavala <[email protected]>
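
For readers unfamiliar with the check mentioned in the last commit message, the sketch below shows a combined length and hex-format test. It is written in POSIX shell for consistency with the other snippets in this thread and is not the enclave code from the commit. The function name `is_valid_spid` is hypothetical, and the expected length of 32 hexadecimal characters is stated here as an assumption.

```shell
# Hypothetical helper (not from the PR): validate an SPID string.
# Assumption: a valid SPID is exactly 32 hexadecimal characters.
# Returns 0 when the string is valid and 1 otherwise.
is_valid_spid() {
    _spid="$1"
    # Length check, the part the commit message says was already done.
    [ "${#_spid}" -eq 32 ] || return 1
    # Format check: reject the string if any character is not a hex digit.
    case "${_spid}" in
        *[!0-9A-Fa-f]*) return 1 ;;
    esac
    return 0
}
```

For example, `is_valid_spid "${SPID}" || echo "malformed SPID" >&2` would report a string that has the right length but contains a non-hex character.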
@bvavala bvavala marked this pull request as ready for review April 9, 2024 02:22
@cmickeyb cmickeyb merged commit f34f9a0 into hyperledger-labs:main Apr 9, 2024
4 checks passed
@cmickeyb
Contributor

cmickeyb commented Apr 9, 2024

Great work, @bvavala
