Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebind CLI #1185

Merged
merged 8 commits into from
Feb 4, 2014
Merged

Rebind CLI #1185

merged 8 commits into from
Feb 4, 2014

Conversation

aledsage
Copy link
Member

@aledsage aledsage commented Feb 3, 2014

Adds brooklyn command line args for persistence, e.g.

` brooklyn launch --app brooklyn.demo.WebClusterDatabaseExampleApp --noShutdownOnExit --persist auto`

and to rebind to that:

`brooklyn launch --noShutdownOnExit --persist auto`

One can also specify --persistenceDir /path/to/mydir.

The possible options to --persist are:

  • disabled - default, where there is no persistence
  • auto - rebinds if data exists, otherwise starts up cleanly
  • rebind - rebinds if data exists, otherwise fails
  • clean - starts up cleanly, not using previous data

Note that old persisted data is moved to a timestamped directory, rather than being deleted.

@buildhive
Copy link

Brooklyn Central » brooklyn #1651 FAILURE
Looks like there's a problem with this pull request
(what's this?)

@nakomis
Copy link

nakomis commented Feb 4, 2014

Build fails at brooklyn-launcher: "PersistMode cannot be resolved to a type". I've had a look, and the PersistMode class is missing (either as a full-blown class or inner class)

@ahgittin
Copy link
Member

ahgittin commented Feb 4, 2014

looking at this now (finally) and more generally at all recent change things rebind !

@@ -39,7 +39,12 @@
+ "apps/${entity.applicationId}/"
+ "entities/${entity.entityType.simpleName}_"
+ "${entity.id}");


public static final BasicAttributeSensorAndConfigKey<String> EXPANDED_INSTALL_DIR = new TemplatedStringAttributeSensorAndConfigKey(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add javadoc with the reason for this, or comments on the confusing lines -- in particular its interplay with INSTALL_DIR, eg things like entity.setAttribute(SoftwareProcess.INSTALL_DIR, expandedInstallDir); in AbstractSoftwareProcessSshDriver and calls to setExpandedInstallDir to the installDir in some of the entities

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added:

/**
 * Indicates the directory in which the unpacked install artifacts can be found. For example, for Tomcat if INSTALL_DIR is 
 * {@code /path/to/installdir}, then the tgz file will be downloaded to {@code /path/to/installdir/apache-tomcat-7.0.47.tar.gz}.
 * When that is unpacked, the actual files will (normally) be found in {@code /path/to/installdir/apache-tomcat-7.0.47/}.
 * The EXPANDED_INSTALL_DIR attribute will be set to this during installation, allowing the appropriate files (e.g. /bin/startup.sh)
 * to be found. 
 * <p>
 * However, an enterprise user may have a bespoke build where the unpacked dir has a different name. If so, the config key
 * could be set to something like:
 *     <pre>
 *     "${(config['install.dir']}/acme-tomcat-1.0.0"
 *     </pre>
 */

Thinking about this more, I'm not sure how well this will cope with someone setting the EXPANDED_INSTALL_DIR configuration. We might just overwrite it in the driver's install method, logging a warning! I'll need to look at that more.

@buildhive
Copy link

Brooklyn Central » brooklyn #1666 SUCCESS
This pull request looks good
(what's this?)


} else {
if (persistenceDir == null) throw new IllegalStateException("Persistence dir must be set with persistence mode "+persistMode);
String persistencePath = persistenceDir.getAbsolutePath();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should persistenceDir default to being current working directory or something under ~/.brooklyn/ ? i was thinking the latter -- though maybe that should just be the default.

i think in practice for us it's important to be able to run mulitple brooklyns on one box and not have them step on each other's toes, even if we didn't change the settings, and we ran them from the same dir. that may be a little irritating but i guess they'll have to collaborate somehow for that to work, e.g. by storing their management timestamp in the filesystem and coordinating who manages using that. solveable, with a little bit of care. can talk F2F on this, yeah?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's discuss this. Fiddly to have desired behaviour of a brooklyn instance rebinding to existing data, while also having two concurrent instances using different dirs. Gut feel is it's solvable - probably requires some kind of lock file that includes a timestamp so another instance can claim the lock on rebind.

For ~/.brooklyn/ or working dir, not sure. My thought is that things like ~/.m2, ~/.ssh etc are for configuration and for caches like the ~/.m2/repository. I wouldn't expect to see the actual application-data in there.

We should also read from ~/.brooklyn/brooklyn.properties for things like the persistenceDir (with it being overridable via the CLI). Maybe we should support a base-dir config option where we can then create a dir-per-brooklyn-instance, and an explicit dir config option. Not sure yet how the base-dir rebind would work though (e.g. if multiple instances were being run concurrently, then which dir to we rebind to?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file-based negotiation for who is manager would be cool. fwiw my gut is that under ~/.brooklyn/ would be a slightly less surprising default than a subdir under whatever the CWD happens to be.

@ahgittin
Copy link
Member

ahgittin commented Feb 4, 2014

code looks good, but for a few comments. CLI is nice and simple. testing it now.

@ahgittin
Copy link
Member

ahgittin commented Feb 4, 2014

with brooklyn launch --persist rebind i see this in the logs:

2014-02-04 16:43:27,627 ERROR Error starting brooklyn app(s)
java.lang.IllegalStateException: Cannot rebind to persistence directory brooklyn-persisted-state/data because not a directory
    at brooklyn.launcher.BrooklynLauncher.initPersistence(BrooklynLauncher.java:411) ~[brooklyn-launcher-0.7.0-SNAPSHOT.jar:na]

however brooklyn proceeds to start. i think it should refuse to start in this sitation. (also, it isn't an error starting an app).

@ahgittin
Copy link
Member

ahgittin commented Feb 4, 2014

running the WebClusterDatabaseExampleApp from the catalog (not on the classpath), i get the following error:

2014-02-04 16:45:46,125 WARN  Problem persisting change-delta
brooklyn.util.exceptions.PropagatedRuntimeException: 
    at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:35) ~[brooklyn-utils-common-0.7.0-SNAPSHOT.jar:na]
Caused by: java.lang.ClassNotFoundException: brooklyn.demo.WebClusterDatabaseExampleApp
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[na:1.7.0_40]

i don't want this PR blocked on this issue (unless perhaps it is a quick fix, at least to persist, we use the entity's classloader -- though rebinding in such a situation could be harder without some neat way of identifying catalog jars (eg osgi)). FWIW i'm open to catalog.xml being removed altogether, or at least adding to the classpath via it, in favour of a better mechanism osgi.

@ahgittin
Copy link
Member

ahgittin commented Feb 4, 2014

the errors i get in the previous case are interesting. first a bunch of:

2014-02-04 16:49:57,679 WARN  brooklyn.management.internal.LocalEntityManager@33bb2557 redundant call to pre-pre-manage entityJBoss7ServerImpl{id=R1dAV2g1}; skipping
java.lang.Exception: source of duplicate pre-pre-manage of JBoss7ServerImpl{id=R1dAV2g1}
    at brooklyn.management.internal.LocalEntityManager.prePreManage(LocalEntityManager.java:181) ~[brooklyn-core-0.7.0-SNAPSHOT.jar:na]

this happens for all the entities in the standard webapp example except for the root (which should be class not found but i don't see such an error) and for the Controlled cluster (curious!?)

also a space is missing in the error :)

later on i see the cryptic message where it couldn't find the class

2014-02-04 16:49:57,682 WARN  No entity found with id FEuHay1s; returning null
2014-02-04 16:49:57,682 WARN  Cannot unmarshall from persisted xml Entity FEuHay1s; not found in management context!

followed by cryptic not found messages for some of the other entities

2014-02-04 16:49:57,754 WARN  Entity not found; discarding parent hlp4LWdM of entity brooklyn.entity.webapp.DynamicWebAppClusterImpl(Gfvp5rn1), so entity will be orphaned and unmanaged
2014-02-04 16:49:57,758 WARN  Entity not found; discarding child FEuHay1s of entity brooklyn.entity.webapp.DynamicWebAppClusterImpl(Gfvp5rn1)
2014-02-04 16:49:57,772 WARN  Entity not found; discarding parent gmkJrcoN of entity brooklyn.entity.database.mysql.MySqlNodeImpl(LSaCkWaq), so entity will be orphaned and unmanaged
2014-02-04 16:49:57,779 WARN  Entity not found; discarding parent hlp4LWdM of entity brooklyn.entity.proxy.nginx.NginxControllerImpl(OoiXTkVF), so entity will be orphaned and unmanaged

(i guess Controlled worked for some reason, and we are seeing the children which failed, though why we see DWACI twice and the others once is odd)

we should have better handling of problems, including opportunity to fix / ignore / abort -- though that can be a separate issue if it's big

@ahgittin
Copy link
Member

ahgittin commented Feb 4, 2014

right, i'm guessing a lot of that weirdness is because it failed to write at least one of the entity records.

there is a ControlledDynamicWebAppCluster as part of the blueprint so curious that it vanished in the above, but i guess all bets are off if it can't even write! we probably do need to fix that.

we should add a message on failed rebind that the files for the entity will be ignored and left in place, that the problem can be fixed by adding missing jars to the classpath or editing the xml (and that gui/api support for selectively repairing may be available in future -- perhaps point to issue for that?). that just gives a user comfort...

happy to say when i added the jar to the classpath and started over all worked fine. (it did not rebind from the previous state, however, so the Problem persisting change-delta was significant, meaning we can probably ignore some of the subsequent errors on rebind ... though if there are quick wins to make it easier to see, that would be great.)

really nice!

will test some more, but on the strength of this happy for you to merge @aledsage when you feel you've addressed the above (either in this PR or as new issues).

@ahgittin
Copy link
Member

ahgittin commented Feb 4, 2014

first time in a long time i've used the CLI in anger. it's quite good in terms of power.

i think we need to do a bit to make the syntax more intuitive and/or the help more useful. just having help info and launch as options is a bit weird as only the last one does anything. :)

worth mulling over. one idea which was mooted in a meeting is having an ephemeral-brooklyn mode, where you start it up to run some action, but then let it stop. also having a cli against brooklyn server (to wrap the REST API) -- although i'm not convinced we need it, if we have ability to run effectors in ephemeral mode then it could be neat to make it work in a client mode. (though none of this feels urgent to me, as we are mostly focussed on policies which won't work with ephemerality!)

@ahgittin
Copy link
Member

ahgittin commented Feb 4, 2014

this is working fantastic. going to merge so we can use it; the handful of issues can be addressed subsequently.

@ahgittin ahgittin merged commit bf422d7 into brooklyncentral:master Feb 4, 2014
@aledsage aledsage deleted the rebind/cli-strike3 branch February 5, 2014 09:46
@@ -203,6 +205,16 @@ public Void call() throws Exception {
description = "After the application gets started, brooklyn will wait for a key press to stop it.")
public boolean stopOnKeyPress = false;

// TODO currently defaults to disabled; want it to default to on, when we're ready
@Option(name = { "--persist" }, allowedValues = { "disabled", "auto", "rebind", "clean" },
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahgittin note that it confused git the way you added:

CliBuilder<BrooklynCommand> builder = Cli.<BrooklynCommand>builder("brooklyn").withCommand(BrooklynCommand.class)

Somehow the addition of that line was done in a merge-commit, so if you do git log -p or git log -p -- usage/cli/src/main/java/brooklyn/cli/Main.java then it doesn't show where that line of code came from!

Note that github does understand it though, showing the commit-history of the file includes "Merge remote-tracking branch 'sjcorbett/cli-yaml' into review": https://github.com/brooklyncentral/brooklyn/commits/master/usage/cli/src/main/java/brooklyn/cli/Main.java

I personally prefer rebase when working on a branch that is not master (i.e. where the commits are not going to be shared with others). Not sure if that's pertinent here, as not sure exactly what happened!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my bad, there were merge conflicts in that file which I resolved without rebase so that github would pick up both issues (and auto merge them). I also (foolishly and mistakenly) tried to fix the deprecation I noticed; I backed that out in master.

@aledsage
Copy link
Member Author

aledsage commented Feb 5, 2014

Additional comments from @nakomis are:

  • Adding brooklyn-persisted-state to .gitignore would be useful
  • In the brooklyn.demo.WebClusterDatabaseExampleApp, the policy on the web cluster is not re-created, however the policy on the nginx controller is re-created
  • Error when rebinding to jClouds SshMachineLocation (as discussed on Skype). I experienced this issue with both the AgriWise project and the global web fabric example
  • In the simple-messaging-pubsub example, I had no issues rebinding, however when I killed the app via the web gui (help -> stop all apps) then rebound, the app and entities were still there in the 'stopping' state. Should rebind data be deleted for apps / entities that have been stopped or expunged?
  • Couldn't run the whirr-hadoop example due to jclouds version conflict

@aledsage aledsage mentioned this pull request Feb 6, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants