Consider improvements for monitoring #44

Open
hohwille opened this issue Nov 30, 2018 · 9 comments
Labels: documentation (Guides, tutorials, readmes, etc.), enhancement (New feature or request), operation

Comments

@hohwille
Member

hohwille commented Nov 30, 2018

The devonfw documentation should offer more guidance and features for monitoring.

@hohwille hohwille added enhancement New feature or request operation labels Nov 30, 2018
@hohwille hohwille added this to the release:3.1.0 milestone Nov 30, 2018
@hohwille
Member Author

hohwille commented Jan 18, 2019

See also here:
https://github.com/devonfw/devon4j/wiki/guide-apm
(considering JavaMelody)

@sjimenez77
Member

sjimenez77 commented Jan 18, 2019

Some time ago we created a demo and a cookbook entry in the devonfw guide for the integration of Spring Boot Admin: https://github.com/devonfw/devon/wiki/Spring-boot-admin-Integration-with-devon4j. The document is probably outdated, but it could be a starting point.

My point here is that we should save the still-valid cookbook entries into the wikis of the different stacks before removing the devonfw guide as it is today.

@nricheton

Hello,

Following a discussion w/ Jörg & Santos, here is my input on monitoring.

Overview :

  • Monitoring should be free / effortless and should be set up on day 1.
  • A core set of monitoring should be available from the dev environment to production.
  • You should not need to ask a developer/an expert to retrieve or interpret the metrics/monitoring data. This info should be clear, include a first layer of analysis, and either tell you that the situation is OK or directly point out the issues.

Some examples of questions that should have an immediate answer from a monitoring solution.
(Immediate = display a web page, now)

  • Is the right version deployed ?
  • Was it able to start successfully ?
  • From the app/module/service point of view, is connectivity OK ?
  • From the app/module/service point of view, is configuration OK ?
  • From the app/module/service point of view, is performance OK ?
  • From the app/module/service point of view, is memory OK ?
  • From the app/module/service point of view, is data OK ?
  • From the app/module/service point of view, is the number of runtime business or technical errors acceptable ?
  • What is the average response time of X (service, operation, data request, business rule) ?
  • If not, what should I do ?
  • What is the percentage/number of technical errors of X (service, operation, data request, business rule) ?
  • What is the percentage/number of business errors of X (service, operation, data request, business rule) ?
  • Is caching efficient in X (service, operation, data request, business rule) ?
  • Is there a background/async process running, and when will it finish ?
  • Is there a background/async process scheduled, and does scheduling work ?
  • Is there any issue in background processing, and which business data is causing it ?
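
The "is X OK ? If not, what should I do ?" questions above could be modelled as small checks that return an interpreted result rather than a raw number. The following is only a minimal sketch in plain Java: the class `StatusChecks`, its nested types and the two hard-coded checks are hypothetical names made up for illustration. They are not the API of appstatus or of any other tool mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of "interpreted" status checks: each check says OK or not, and what to do. */
final class StatusChecks {

    enum Status { OK, WARN, ERROR }

    /** Result of one check, including a hint that answers "what should I do?". */
    static final class Result {
        final String name;
        final Status status;
        final String description;
        final String resolution; // empty when nothing needs to be done

        Result(String name, Status status, String description, String resolution) {
            this.name = name;
            this.status = status;
            this.description = description;
            this.resolution = resolution;
        }

        static Result ok(String name, String description) {
            return new Result(name, Status.OK, description, "");
        }

        static Result error(String name, String description, String resolution) {
            return new Result(name, Status.ERROR, description, resolution);
        }

        @Override
        public String toString() {
            String line = "[" + status + "] " + name + ": " + description;
            return resolution.isEmpty() ? line : line + " -> " + resolution;
        }
    }

    /** One question such as "is connectivity OK?". */
    interface Check {
        Result run();
    }

    /** Runs every check; a check that throws is reported as ERROR instead of aborting the report. */
    static List<Result> runAll(List<Check> checks) {
        List<Result> results = new ArrayList<>();
        for (Check check : checks) {
            try {
                results.add(check.run());
            } catch (RuntimeException e) {
                results.add(Result.error("check", "check threw " + e,
                        "fix the check or the dependency it relies on"));
            }
        }
        return results;
    }

    /** Overall answer for a status page: the most severe status among all results. */
    static Status overall(List<Result> results) {
        Status worst = Status.OK;
        for (Result r : results) {
            if (r.status.compareTo(worst) > 0) {
                worst = r.status;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        List<Check> checks = new ArrayList<>();
        // Hypothetical checks with hard-coded outcomes, for illustration only.
        checks.add(() -> Result.ok("version", "expected version is deployed"));
        checks.add(() -> Result.error("database", "connection refused",
                "verify the JDBC URL and that the database is reachable"));

        List<Result> results = runAll(checks);
        for (Result r : results) {
            System.out.println(r);
        }
        System.out.println("overall: " + overall(results));
    }
}
```

Keeping the resolution hint next to the status is what lets a non-expert act on the page without asking a developer. A real setup would replace the hard-coded lambdas with checks against the actual database, configuration and so on.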

These answers are priceless in production, and just as valuable in development/testing environments, where they are a clear indicator of the issues that will surface in the next stage.

In several projects, we have made huge improvements in quality and efficiency by having these metrics and looking at them every day. Even non-technical people can point out the code that is causing issues and the impacted features.

Several tools exist to set up this kind of monitoring.

I really think that devon should provide tooling out of the box and ready-to-use accelerators to provide additional analysis value for common problems.

One effort to provide this kind of monitoring has been appstatus :
https://github.com/appstatus/appstatus
http://appstatus.sourceforge.net

  • Provides answers to all questions above
  • Integrates with standard monitoring tools
  • Provides performance logging at no cost
  • Implements "explain first->log after" instead of "log first->interpret after" for reporting status in background jobs
  • Integrates with Spring and AOP (does not depend on an external tool)
  • Integrates with Spring cache

Used by many projects in different IT companies.

Other alternatives :
https://github.com/javamelody/javamelody
https://www.appdynamics.fr/java/
https://www.zabbix.com/features

Again, low-level metrics have little value. We need interpreted metrics at business level (operations, rules, data retrieval, user-perceived response time, ...), available from the developer environment to the production environment. (And this probably should NOT be an option when creating a new devon application :-) )
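
To illustrate what "interpreted metrics" per operation could look like, here is a minimal sketch in plain Java. It collects call count, error count and total duration per named operation and turns them into a one-line verdict. The class `OperationStats`, the operation name `findOrder` and the thresholds are assumptions made up for this example and are not taken from appstatus or any other tool. In a real application `record` would typically be called from an interceptor or aspect rather than by hand.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Sketch: per-operation call statistics with a first layer of interpretation. */
final class OperationStats {

    /** Thread-safe counters for one operation. */
    private static final class Counters {
        final LongAdder calls = new LongAdder();
        final LongAdder errors = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
    }

    private final Map<String, Counters> byOperation = new ConcurrentHashMap<>();

    /** Records one call of a named operation (service, data request, business rule, ...). */
    void record(String operation, long millis, boolean failed) {
        Counters c = byOperation.computeIfAbsent(operation, key -> new Counters());
        c.calls.increment();
        c.totalMillis.add(millis);
        if (failed) {
            c.errors.increment();
        }
    }

    /** Average response time in milliseconds, or 0.0 if the operation was never called. */
    double averageMillis(String operation) {
        Counters c = byOperation.get(operation);
        if (c == null || c.calls.sum() == 0) {
            return 0.0;
        }
        return (double) c.totalMillis.sum() / c.calls.sum();
    }

    /** Percentage of failed calls, or 0.0 if the operation was never called. */
    double errorPercent(String operation) {
        Counters c = byOperation.get(operation);
        if (c == null || c.calls.sum() == 0) {
            return 0.0;
        }
        return 100.0 * c.errors.sum() / c.calls.sum();
    }

    /** Turns raw numbers into a statement a non-developer can read; thresholds are caller-supplied. */
    String interpret(String operation, double maxAverageMillis, double maxErrorPercent) {
        double avg = averageMillis(operation);
        double err = errorPercent(operation);
        String verdict = (avg <= maxAverageMillis && err <= maxErrorPercent) ? "OK" : "NOT OK";
        return String.format(Locale.ROOT,
                "%s %s: avg %.1f ms (limit %.1f), errors %.1f%% (limit %.1f%%)",
                verdict, operation, avg, maxAverageMillis, err, maxErrorPercent);
    }

    public static void main(String[] args) {
        OperationStats stats = new OperationStats();
        // Hypothetical operation name and timings.
        stats.record("findOrder", 100, false);
        stats.record("findOrder", 300, true);
        System.out.println(stats.interpret("findOrder", 250.0, 5.0));
    }
}
```

The same counters could be kept separately for technical and business errors if both percentages are needed.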

Feedback is welcome !

Nicolas

@hohwille
Member Author

hohwille commented Mar 15, 2019

@nricheton thanks for your wonderful input.
I added AppDynamics and Zabbix to our guide:
https://github.com/devonfw-wiki/devon4j/wiki/guide-apm

Also we will have a look at appstatus.

However, we have to be careful with what we integrate by default. In one of my customer projects we used to integrate JavaMelody into all apps, and then some CVE vulnerabilities came up in it and we were forced to remove it. Maybe these issues have been resolved in the meantime. However, we should investigate your requirements and decide what we want to integrate as first choice and ship out of the box, and what should be just an option for projects that need more.

Being able to report the release version is of course very simple and does not come with any risk. Also health status (e.g. with spring actuator) should come OOTB.
For monitoring OS-level stuff there are tons of solutions already out there, and they should IMHO not be built into the app itself (we do not need a Java solution to observe CPU, memory or disk). Also there is already SNMP as an established protocol. In this sense we should IMHO also think of complex IT landscapes and microservices. Hence, an app does not really need to ship a UI for monitoring. Assume you have multiple redundant nodes of an app in a cluster with load balancing. What use would it be to view a UI in the browser showing the CPU usage of the app if a load balancer assigns me to some node at random and I have no direct access to the node itself? So instead we need to provide services that offer the monitoring data, and look for state-of-the-art monitoring systems that integrate with all apps and all their nodes across the entire IT landscape, presenting a complete dashboard and triggering alarms if something goes wrong.

Another aspect is OWASP Sensitive Data Exposure. Therefore detailed monitoring data should not be available to the outside world (end-users, internet) but stay secret within the admin-plane. In this manner we should also define strict standards for e.g. URL path scheme for monitoring services to simplify and avoid complex individual configurations.
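
As one possible illustration of a fixed path scheme that stays inside the admin plane, the following configuration fragment uses Spring Boot 2.x Actuator properties. The port `8081` and the base path `/admin` are assumptions chosen for this example and are not an agreed devonfw standard. The idea is that the separate management port is only reachable from the internal network.

```properties
# Serve monitoring endpoints on a separate port that is not exposed to end-users (assumed value).
management.server.port=8081
# One fixed base path for all monitoring services (assumed value).
management.endpoints.web.base-path=/admin
# Expose only the endpoints that are actually needed.
management.endpoints.web.exposure.include=health,info
# Show health details only to authorized users.
management.endpoint.health.show-details=when-authorized
```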

@nricheton

nricheton commented Mar 16, 2019

Hi @hohwille

Thanks for your feedback !

On CVE risk, I would say that all Devon components (and all projects in general) have CVEs in their history. Apart from projects which do not fix important CVEs for a long time, we should not consider a reported CVE as a reason for not integrating valuable components.

On OS-level monitoring, I fully agree with you that dedicated, existing solutions should be used.
However, a first level of checks can be integrated in solutions, here are some reasons :

  • Development or testing environments are often not properly monitored for different reasons (cost, complexity - lots of real-world examples), so checks for free disk space or mounted network shares are basic feedback that saves days of work.
  • JVM memory can be monitored at no cost in Java apps, especially in development phases
  • Some checks, like checking that your app is linked to the right data can prevent a disaster. For instance checking that test configuration module is connected to test data (and not production data)
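
The first two points can be sketched with nothing but the JDK. The example below is a minimal illustration, not a recommendation to rebuild OS-level monitoring in Java. The class name `ResourceChecks`, the thresholds and the message texts are hypothetical. `Runtime` and `File.getUsableSpace()` are standard JDK calls.

```java
import java.io.File;
import java.util.Locale;

/** Sketch: cheap first-level checks for JVM heap and free disk space. */
final class ResourceChecks {

    /** Interprets heap usage against a warning threshold given in percent of the maximum heap. */
    static String interpretHeap(long usedBytes, long maxBytes, double warnPercent) {
        double usedPercent = 100.0 * usedBytes / maxBytes;
        String verdict = usedPercent < warnPercent ? "OK" : "WARN";
        return String.format(Locale.ROOT, "%s heap: %.1f%% of max used (warn at %.1f%%)",
                verdict, usedPercent, warnPercent);
    }

    /** Interprets the usable space of a directory the application depends on. */
    static String interpretDisk(String path, long usableBytes, long minFreeBytes) {
        if (usableBytes >= minFreeBytes) {
            return "OK disk " + path + ": " + usableBytes + " bytes usable";
        }
        return "WARN disk " + path + ": only " + usableBytes + " bytes usable, need "
                + minFreeBytes + " -> free some space or check that the volume is mounted";
    }

    /** Reads the current heap usage of this JVM and interprets it. */
    static String checkHeap(double warnPercent) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return interpretHeap(used, rt.maxMemory(), warnPercent);
    }

    /** Reads the usable space for the given directory and interprets it. */
    static String checkDisk(File dir, long minFreeBytes) {
        // getUsableSpace() returns 0 if the path does not name a partition, e.g. a missing mount.
        return interpretDisk(dir.getPath(), dir.getUsableSpace(), minFreeBytes);
    }

    public static void main(String[] args) {
        System.out.println(checkHeap(80.0));
        // Hypothetical requirement: at least 100 MiB usable in the working directory.
        System.out.println(checkDisk(new File("."), 100L * 1024 * 1024));
    }
}
```

Separating the interpretation (`interpretHeap`, `interpretDisk`) from the measurement keeps the thresholds easy to test and lets the same messages be reused when the data is exported to a central monitoring system.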

On data availability : I agree the data should not be available to the public or to internet users. It should be reserved for the people responsible for operations, as with any monitoring tool.

Web pages in a module are mostly for the early stages of feature development; after that, the data should be aggregated into a common monitoring interface (any solution).

I would be happy to show you next week how appstatus handles these ideas and how it allows exporting the data for proper aggregation, and to discuss real-world examples !

Nicolas

@hohwille hohwille removed this from the release:3.1.0 milestone Apr 12, 2019
@hohwille
Member Author

I fully support making progress in this area. Also I assume we will spend a slot at the next DA meeting discussing this. However, as we broadened the scope of this issue and some aspects are not yet completely clear, I removed the milestone. Otherwise we would block the release planned for next month. If people come up with PRs to solve this issue, I am more than happy to replan it for 3.1.0, but at the moment I cannot see how I could solve it by then...

@hohwille
Member Author

@nricheton thanks for your feedback.
I do agree that additional features like memory or disk-space checks are great to have if they come without big effort or complex dependencies. My only concern was that we should not waste our time scanning all mounted devices, observing their disk space, sending alerts, etc. inside Java if there are already tons of OS-level tools doing all this.
To be more pragmatic, I would like to start with spring-boot-actuator and maybe also spring-boot-admin. Then we collect the list of features we get with them, see what gaps remain, choose additional tools, and move on until we have covered what we think is crucial.

@hohwille
Member Author

hohwille commented May 9, 2019

Do we have a key person who could drive the development of this issue? IMHO this is not just a 1-2 hour task but will need some attention and continuity. I do not have the time at the moment, but I would love to see some action so that we are not just talking. I am still happy to assist and support this with some code snippets or reviews...

@hohwille hohwille added the documentation Guides, tutorials, readmes, etc. label Feb 11, 2020
This was referenced Feb 18, 2020
@hohwille
Member Author

JavaMelody even has a spring-boot-starter, so you may only need to add a dependency and you are done.
Glowroot can be added in a similarly easy way.
Then there are solutions like spring actuator to provide app specific sensors to be integrated with existing monitoring tools such as CheckMk/Icinga/Nagios/etc.

So is there anybody left who initially raised demands for this topic - maybe @nricheton ?
What is left to do, and what is the way forward?

  • Just add some more documentation?
  • Or create a demo app based on devon4j with the monitoring configured and in place (in one isolated commit after the initial app from the template)?
  • Something else?

As a lesson learned, we should move away from such generic issues: either the issue should be crystal clear about what is to be done, or we need a real driver who actively works on it.
