Releases: bosun-monitor/bosun
0.8.0-Preview
0.8.0-Preview Release Notes
Bosun
Upgrading
Since it has been over a year since the previous release, it is possible that not all breaking changes are accounted for, so you should test this release first and back up your redis/ledis data. Some known breaking changes:
- The Elastic configuration has changed; see System configuration :: ElasticConf for an example of what it should look like.
- To support linking related incidents, a migration runs on the redis/ledis data at first startup. It must complete before Bosun starts running.
Major Changes
- Notification Overhaul [PR #2135]. HTTP notifications are now much more flexible and can be used to interact with REST APIs. See the customizing notifications documentation. Please note that escalations on posts via notification chains are broken in this version; see issue 2304.
- New Azure Monitor datasource [PR #2283] which supports querying these metrics from Azure. There are also template functions for Azure.
- Support for connecting at runtime to multiple Elastic clusters, including clusters running different versions of Elasticsearch [300ebd8] [PR #2316].
- Incidents with the same alert keys are now linked to provide more context when handling alerts [PR #2323]. These links are displayed in the dashboard and incident view, and can also be used to include information (such as previous close messages and links to previous incidents) in templates using the new template property .PreviousIds and the .GetIncidentState template function.
- Rendered templates are no longer stored in redis/ledis indefinitely, in order to control memory growth over time [PR #2167]. The maximum age can be set using MaxRenderedTemplateAge in the system configuration.
- Add Delayed Close so active alerts can be closed without waiting for the alert to return to normal (90cc3a9). Note: needs documentation.
Expression Language Changes
- Add the generic SetVariant type (95be5d9) so certain functions, such as addtags, filter, limit, and abs, can take either a numberSet or a seriesSet.
- Add the aggr function to aggregate time series by group within a seriesSet [PR #2294].
- Add OpenTSDB query functions with an end duration parameter: shiftBand, overQuery, and bandQuery #2310.
- Add percentiles to the OpenTSDB window func #2254
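To illustrate what aggregating time series by group within a seriesSet means, here is a minimal Go sketch that merges series sharing the same values for a chosen set of tag keys and sums them timestamp by timestamp. The Series type, the aggrSum function, and the sum-only behavior are assumptions for illustration; this is not Bosun's aggr implementation and does not reproduce its signature or its other aggregators.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Series is a hypothetical tagged time series: unix timestamp -> value.
type Series struct {
	Tags   map[string]string
	Points map[int64]float64
}

// groupKey builds a stable key such as "dc=ny" from the given tag keys.
func groupKey(tags map[string]string, keys []string) string {
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+tags[k])
	}
	return strings.Join(parts, ",")
}

// aggrSum merges every series that shares the same values for groupBy into
// one series whose value at each timestamp is the sum of the members' values.
func aggrSum(set []Series, groupBy []string) map[string]map[int64]float64 {
	keys := append([]string(nil), groupBy...)
	sort.Strings(keys) // stable key order regardless of argument order
	out := make(map[string]map[int64]float64)
	for _, s := range set {
		gk := groupKey(s.Tags, keys)
		if out[gk] == nil {
			out[gk] = make(map[int64]float64)
		}
		for ts, v := range s.Points {
			out[gk][ts] += v
		}
	}
	return out
}

func main() {
	set := []Series{
		{Tags: map[string]string{"host": "a", "dc": "ny"}, Points: map[int64]float64{0: 1, 60: 2}},
		{Tags: map[string]string{"host": "b", "dc": "ny"}, Points: map[int64]float64{0: 3, 60: 4}},
		{Tags: map[string]string{"host": "c", "dc": "la"}, Points: map[int64]float64{0: 5}},
	}
	fmt.Println(aggrSum(set, []string{"dc"}))
}
```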
Misc Enhancements
- Configuration: support RuleVars in system configuration for obscuring secrets. c699b45
- Scheduler: distributed alert checks can be enabled to prevent high load spikes #2249
- UI: native short links in bosun instead of goo.gl #2210
- UI: added setting for example expression in *.toml #2261
- UI: disable the Items menu entry for environments without OpenTSDB configuration #2267
- UI: replace the textarea with the Ace editor on the expression page #2266
- UI: fix incident view for closed incidents #2315
- Incident Filters: added "since" incident filter #2215
- Notifications: added extra info to unknown templates #2269
- Logging: improve logs #2299 #2218
- Elastic: added ES SimpleClient support for bosun backend and annotation. #1947
- Templates: add Scheme option to system configuration so generated links can be https #2209
- Auth: support for LDAP user authentication. #2181
- Expr: print db when profiling InfluxDB queries #2235
- API: add notification stats to health endpoint #2222
Misc Fixes
- UI: silence view: fix durations #2311
- UI: incident view: fix durations #2312
- UI: fix "items" flashing with ng-cloak #2313 / #2286
- UI: remove not used errors column in rule editor #2270
- UI: fixed tags parameter for silencing button on dashboard #2265
- Expr: fix math operations on results of aggr #2306
- Expr: fix parsing of OpenTSDB 2.2+ filters so that only group terms get added to the result tag set #2212
- Expr: Fix influxdb query caching #2234
- API: routes should never serve home page as html. #2284
- Silence: fix scheduled auto-forget silences #2120
- Incident Filters: Fix user field #2246
- Elastic: fix conflicts when using multiple ES clusters with client options enabled. #2239
- Notifications: fix issues with actionBodyForceClose, actionBodyDelayedClo and actionBodyCancelClose #2198
- Notifications: fix macros for Notifications #2199
- Notifications: fixed wrong nested ul in unknown notify #2208
- Notifications: follow lookups for notification validation. #2168
- Notifications: stop sending mail if 'To' slice empty #2217
- Notifications: fix post notification logging #2196
Misc Code Changes
- Expr: Refactor: move miniprofiler Timer to state #2287
- Expr: Add origin to expr state #2317
- Expr: remove elastic v1 "logstash" code #2178
scollector
- avoid panic in google analytics collector #2291
- add sync stats to redis collector #2232
- (wip) windows remote access services (ras) metrics #2245
- include slab as free mem for linux os.mem #2250
- more accurate linux mem free if available #2252
- Added Elastic config options #2257
- support new time format for puppet 5 #2191
- add windows support to puppet collector #2227
- add NETDataProviderForSqlServer metrics #2225
- Update linux interface collector names. #1985
- Added SSL collectors into extrahop collector #1824
- bug fixes for Azure EA billing #1984
- Updated Azure EA collector to add new tags. #2011
- Fix bug in AWS billing collector. #2156
- Fixed bug in ElasticSearch collect...
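For the two Linux memory items above, here is a minimal Go sketch of a free-memory calculation from /proc/meminfo text. It prefers the kernel's MemAvailable estimate when present and otherwise falls back to MemFree + Buffers + Cached + Slab. The exact fields scollector sums, and the parseMeminfo and freeKB names, are assumptions for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMeminfo reads "Key:   value kB" lines from /proc/meminfo-style text.
func parseMeminfo(text string) map[string]uint64 {
	m := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue // skip malformed lines
		}
		m[strings.TrimSuffix(fields[0], ":")] = v
	}
	return m
}

// freeKB prefers MemAvailable (newer kernels) and otherwise falls back to
// MemFree + Buffers + Cached + Slab. Missing keys count as zero.
func freeKB(m map[string]uint64) uint64 {
	if v, ok := m["MemAvailable"]; ok {
		return v
	}
	return m["MemFree"] + m["Buffers"] + m["Cached"] + m["Slab"]
}

func main() {
	old := "MemTotal: 1000 kB\nMemFree: 100 kB\nBuffers: 50 kB\nCached: 200 kB\nSlab: 30 kB\n"
	newer := old + "MemAvailable: 320 kB\n"
	fmt.Println(freeKB(parseMeminfo(old)), freeKB(parseMeminfo(newer))) // 380 320
}
```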
0.6.0-beta1
0.6.0-beta Release Notes
Bosun
The main goal of Bosun 0.6.0 was to smooth out the alert creation and editing workflow. Previously, users had to copy the configuration from Bosun's UI, get it onto the server, and then restart the process. Now users can edit the config directly in Bosun's UI, and the changes are loaded without restarting the bosun process. Since rules can now be edited via Bosun's UI, authorization via LDAP and SSL support have been added as well.
Major Changes:
- Bosun's configuration file has been split into two files:
- RuleConf which contains all definitions.
- System configuration.
- Bosun's RuleConf can now be edited via Bosun's UI. The changes can then be reloaded without restarting the Bosun process.
- Bosun now has support for authorization via LDAP and Tokens that can be configured via the auth section of the system configuration.
- Bosun's dashboard loads much faster than in 0.5.0 due to a schema change.
- Many errors can now be handled in templates, making them more robust against runtime errors, and the template documentation has been improved.
Upgrading
- Since the configuration file has been split, you will need to move some of what used to be in the config into the system configuration.
- On the first execution, a database schema migration will take place that can take several minutes. During this time bosun will not be available. It is recommended to back up Bosun's database before the upgrade.
Other Changes
- New Expression Functions: timedelta; the esnot, esexists, and esmonthly elastic functions; crop; tail; map #1813; addtags; leftjoin; the antable, ancounts, and andurations annotation functions; and remove.
- New Template Functions / Variables: notNil, .LastError, and .Errors
- Comments are now allowed in the Expression Page in Bosun's UI.
- Bosun's navbar now shows if quiet mode is enabled.
- Notes can now be added to incidents.
- Improved Bosun's incident view.
- Improved Bosun's graph page to be more keyboard friendly: better order of fields with the TAB key and shift-enter within text fields will execute.
- Updated Influx library to be compatible with current versions.
- Optimized elastic query performance by not fetching indices, and fixed a bug caused by indices not being part of the cache key #1931.
- Fixed an issue in the scheduler where adjacent runs would be given the same time to represent "now" #2029.
- Allow the empty tagset to be a dependency for alerts that have tags #2027.
- Adjustments to Bosun's UI to make it more vertically compact.
- Fixed HTTP template funcs to use a shorter timeout so that a request that times out doesn't block template rendering for too long.
- Allow empty tagsets in silences #1860.
Expected Changes Between Beta and 0.6.0 final:
- Tooling outside of the Bosun process doesn't have auth support (except for the Grafana Plugin). The following will have auth support before final: scollector, (cli tools: annotate, silence), tsdbrelay, BosunReporter.NET, and Opserver.
- Support for multiple Elastic backends in a single instance may be added. If this happens the syntax for the system configuration (toml file) of elastic might change.
tsdbrelay:
- fix http: proxy error: EOF due to mixing Content-Encoding headers #1889
- enable version numbers, fix metadata, add error metrics #1859
- enable expvar to help with troubleshooting
- Drain body responses in tsdbrelay/scollector/collect so that the connection is eligible for reuse #2036
scollector:
- Google webmaster: Skip new "sc-set" sites that we can't do API calls on. #2032
- TOML flag to enable SWbemServices worker for better WMI performance in Windows #2028
- Bug fixes for Azure EA billing #1984
- Added Azure Enterprise Agreement billing collector #1903
- Process monitoring: track process PIDs as a metric #1964
- When WatchedProc processes die, clean them up #1962
- Bug fix for DNS collector #1920
- Update SNMP timeout from 5s to 30s and allow override #1900
- Set custom UserAgent and use facebookgo/httpcontrol for 60s RequestTimeout #1878
- add a kill switch for total private memory used by scollector #1866
- fix empty host= tags not working after PR #1856 #1871
- fix DisableSelf flag not working for version/post.batchsize #1868
- bugfix - datapoint validation #1856
- Check for NaN in Valid() #1864
- mssql.agent.* for monitoring SQL Agent jobs #1855
- skip md volumes if mdadm binary is missing #1850
- new MSSQL buffer/memory metrics #1849
- update wmi package #1843 and #1837
- Fix WMI names for HP EVA #1852
- Fix dsc StartDate parsing #1994
- elasticsearch: Include cluster tag in cluster health metrics.
- fix httpunit hu.time_total by casting to int64
- add IsRemote flag for cadvisor collector to disable block device lookups
other:
- Add ackTime to incident filters #2014
- Default to limited duration on tagv search. #1977
- Have opentsdb.Duration implement encoding.TextUnmarshaler. #1975
- Add google_webmaster collector. #1959
- fix typo for container.fs.available #1948
- Making redis monitoring a little prettier. #1944
misc:
0.5.0
0.5.0 Release Notes
Bosun
This is our first non-preview release in 9 months and includes significant changes since 0.4.0. Future releases should be at a quicker pace since this release involved a complete refactor of Bosun's internal storage.
- We have moved Bosun's internal storage from purely in-memory (serialized to bolt) to redis/ledis and refactored the code to be more incident-based. In 0.4.0 the dashboard could take 10-30 seconds to load at times; it should now take no more than a second under normal conditions, and should be even faster in a future release. This also results in faster startup times for Bosun and other performance improvements.
- Deprecated the logstash queries and replaced them with more generic elastic functions. This supports different time formats, index naming schemes, and adds more search possibilities
- Added support for basic series operations. Previously, operators could only be applied to seriesSets by combining them with scalars or numberSets. Now you can do operations like q(..) * q(...).
- Added various functions to the expression language:
- merge, shift, and over: Combine series sets, shift the time, and show time-over-time graphs when querying OpenTSDB
- series: Manually construct a series -- useful for testing and drawing lines on graphs
- month: Get the start of the calendar month, useful for alerts that follow the calendar such as bandwidth billing
- tod: Turns a number into a duration string, so query durations can be computed with arithmetic
- linelr: Draws a line to visualize the result of forecastlr
- Support for OpenTSDB 2.2 filters which allow you to aggregate a subset of tagvalues for a tagkey.
- Added annotations to Bosun which are stored in Elastic (designed to possibly support other backends in the future). Bosun annotations have a start and end time, so you can use them to capture outages and maintenance windows
- Added Grafana integration via the new Bosun Grafana Plugin
- Added a route for OpServer integration (Preview)
- Added bar graphs to the expression page when the result is a numberSet
- Added forceClose and Purge actions
- Improved incident filters (now supports AND, OR, !, and () grouping)
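Series operations such as q(..) * q(...) need a rule for pairing points from two series. The Go sketch below pairs points by timestamp and skips timestamps present in only one series; that matching rule, the Series type, and the binop name are assumptions for illustration rather than a description of Bosun's implementation.

```go
package main

import "fmt"

// Series maps a unix timestamp to a value (hypothetical simplified type).
type Series map[int64]float64

// binop applies op at every timestamp present in both a and b. Timestamps
// found in only one series are skipped in this sketch.
func binop(a, b Series, op func(x, y float64) float64) Series {
	out := make(Series)
	for ts, x := range a {
		if y, ok := b[ts]; ok {
			out[ts] = op(x, y)
		}
	}
	return out
}

func main() {
	hits := Series{0: 50, 60: 30}
	total := Series{0: 100, 60: 60, 120: 10} // no matching point at 120 in hits
	ratio := binop(hits, total, func(x, y float64) float64 { return x / y })
	fmt.Println(ratio) // map[0:0.5 60:0.5]
}
```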
Upgrading
Before upgrading to this version, you should decide if you want to use a dedicated redis instance (recommended for production use), or the embedded ledisdb instance (default behavior). Instructions for configuring redis/ledis can be found on our website. The first time bosun starts up with this version, it will migrate all data from the old boltdb file into the new redis store. After that the bolt file should not be needed any more. You should back up your bolt state file before doing this operation, and note that it may take several minutes for Bosun to start while it does the migration.
Breaking Changes you need to be aware of
- When using OpenTSDB 2.2, wildcard expansion in queries is now done by OpenTSDB instead of Bosun. This is cleaner and results in better performance, but mixing wildcards and alternation (for example, *foo*|*baz*) is no longer supported until OpenTSDB supports it. Bosun will not warn you about this, so if you use this pattern your alerts may silently fail. Be sure to look for it in your config before upgrading.
- The graphite backend now rejects tags that are not valid Bosun tags (which have the same restrictions as OpenTSDB tags), since some of them would cause panics. The restrictions are: "Only the following characters are allowed: a to z, A to Z, 0 to 9, -, _, ., / or Unicode letters". Some graphite-based alerts may require updating.
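Before upgrading, it may help to scan tag values for the two breaking changes above. The Go sketch below checks a value against the character restrictions quoted in the notes and flags values that mix wildcards and alternation, such as *foo*|*baz*. The helper names are hypothetical, rejecting an empty value is an added assumption, and extracting tag values from your own config is left out.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// validTag reports whether s uses only a-z, A-Z, 0-9, '-', '_', '.', '/',
// or Unicode letters. Empty values are rejected (an assumption).
func validTag(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		switch {
		case unicode.IsLetter(r), r >= '0' && r <= '9':
			// letters (ASCII or Unicode) and digits are allowed
		case r == '-', r == '_', r == '.', r == '/':
			// allowed punctuation
		default:
			return false
		}
	}
	return true
}

// mixesWildcardAndAlternation flags tag values such as "*foo*|*baz*", which
// are no longer supported when OpenTSDB 2.2 does the wildcard expansion.
func mixesWildcardAndAlternation(v string) bool {
	return strings.Contains(v, "*") && strings.Contains(v, "|")
}

func main() {
	fmt.Println(validTag("web-01.ny"), validTag("web 01")) // true false
	fmt.Println(mixesWildcardAndAlternation("*foo*|*baz*")) // true
}
```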
In Progress for Future Release
- Config reloading without requiring the bosun process to be restarted
- (Experiment) Last data available through the expression language so some alerts could still work if opentsdb is down
- (Experiment) Working with distributions in addition to series (i.e. histograms)
- Better post notifications
scollector
- New collectors:
- systemd service collector
- varnish
- Oracle
- status.io pages
- Fastly api stats
- Extrahop
- Elastic v2
- Google Analytics collector
- Nexpose collector
- Cisco IOS BGP information (via snmp)
- Fortinet SNMP collector
- cadvisor
- Support for MSSQL Named Instances
- Add a local listener so datapoints can be passed to scollector via http
- Bad datapoints in a batch no longer invalidate the entire batch
- Better filtering options for excluding collectors and/or specific metrics
TSDBRelay
- Added "external counters" for infrequent or sporadic metrics. These are counters that can receive increments from multiple sources.
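An external counter can be pictured as a shared total that several senders increment. The Go sketch below keeps cumulative totals in memory behind a mutex; tsdbrelay's real storage, key format, and flushing behavior are not described in these notes, so the externalCounters type, its methods, and the example key are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// externalCounters accumulates increments for a counter key from any number
// of senders. Totals are cumulative, as a counter metric expects.
type externalCounters struct {
	mu     sync.Mutex
	totals map[string]int64
}

func newExternalCounters() *externalCounters {
	return &externalCounters{totals: make(map[string]int64)}
}

// Add records an increment for key; safe for concurrent use.
func (c *externalCounters) Add(key string, delta int64) {
	c.mu.Lock()
	c.totals[key] += delta
	c.mu.Unlock()
}

// Snapshot returns a copy of the current totals without resetting them.
func (c *externalCounters) Snapshot() map[string]int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]int64, len(c.totals))
	for k, v := range c.totals {
		out[k] = v
	}
	return out
}

func main() {
	c := newExternalCounters()
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ { // three independent sources
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Add("jobs.completed{host=web01}", 1)
		}()
	}
	wg.Wait()
	fmt.Println(c.Snapshot()["jobs.completed{host=web01}"]) // 3
}
```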
Following is the autogenerated release notes:
other:
- Not allowing invalid datapoints to ruin entire batches #1779
- Add trafficSource tracking for detailed GA metrics. #1780
- document lookupSeries. fix #1035 #1760
- List open incidents #1764
- revendor annotate after repo move #1708
- Add html function to templates #1721
- Adds resource reference to bosun_emitter #1727
- Remove unnecessary go get in travis #1706
- Fix merge #1673
- Small utility to clean up search data for a metric. #1632
- tod(scalar) was returning minutes as hours #1677
- Annotate edit view #1636
- moving version to _version #1637
- working party of elastic.v3 #1562
- Elastic v2 support and elastic expr refactor #1561
- Escape soapLogin credentials #1612
- Fixed link and Go version #1609
scollector:
- fixing redis counters collector to work with ledis or redis #1732
- fastly collector #1728
- fastly status.io monitoring and status.io lib #1735
- Additional functionality into ExtraHop collector #1698
- Make MaxMem kill switch configurable #1652
- fix error: interval.go:64: c_google_analytics: #1641
- negative -f filters and total_time metric for httpunit collector #1630
- add metadata to redis collector #1622
- Fix for unset MaxQueueLen #1596
docs:
- Documenting force close and purge actions. #1697
- Fixing of typos and avoiding a potential ambiguity #1691
bosun/expr:
- fix crash on invalid graphite tags #1663
collect:
- Flush purges internal collections as well. #1552
bosun:
- squelched keys don't go unknown. #1790
- add missing redis connection close #1755
- expr.execute refactor #1775
- fill in unknown subject in incident view #1772
- Fix ungroup to actually return a scalar #1744
- Allow series func to create empty group #1745
- move graphite and tsdb funcs to their own files #1742
- series operations #1672
- support png on egraph api route #1712
- fix braced variable expansion when used in macros #1722
- skiplast cmdline switch for development #1725
- add month func to get end of or start of month #1740
- stopping notifications that should no longer fire [#1716](https://github.com/bosun-monitor/bosu...
0.5.0-rc4
This is an intermediate release that fixes a redis connection that failed to be closed and adds an api route that is used with the Grafana bosun app plugin.
bosun:
- add missing redis connection close #1755
- Fix ungroup to actually return a scalar #1744
- Allow series func to create empty group #1745
- move graphite and tsdb funcs to their own files #1742
- series operations #1672
- support png on egraph api route #1712
- fix braced variable expansion when used in macros #1722
- skiplast cmdline switch for development #1725
- add month func to get end of or start of month #1740
- stopping notifications that should no longer fire #1716
- fix query links on expr page #1675
- link multiple queries from graph UI #1676
- clear filters when changing metrics in graph view #1685
- support actions by incident id #1696
- fix issue removing last annotation from graph view #1639
- don't autocomplete old metrics on graph page #1660
- func to turn seconds (scalars) to duration string #1669
- series func to create series from scalars #1667
- add Ledis bind address config option #1651
- convert datastore to Ledisdb/redis implementation. #1332
- don't panic on opentsdb version #1629
- adding "unknownIsNormal" flag to alerts to convert unknown events into normal ones. #1620
- push annotations to top on graph page #1619
- add remote mac addresses to the host API #1551
- add UnmarshalJSON() to Status type #1555
- 2x API routes: Metadata for all metrics, metrics per tagk #1560
- denormalized metadata should resolve to parent metric #1519
- Show bar graph on the expr page if type == number #1011
- annotate support #1610
- fixing ledis error saving temp config. #1614
- Auto-Closing open alerts if alert doesn't exist anymore. #1604
- angular from 1.2.x to 1.5 #1603
- Implementing purge and forceClose actions #1599
- Add linelr func #1602
- optimizing redis access to get tag sets #1575
- serving temporary configs from redis. #1593
- Notifications moved to redis #1592
- fixing bug with chained notifications. #1584
- fixing empty email body, and handling absence better #1588
- fix migration from new install #1581
- performing a preliminary save on new incidents to keep templates consistent #1605
- Don't create filter in Opentsdb v2.1 #1569
- Add over, shift, and merge funcs #1598
- Shorten-only http proxy #1590
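The tod function is described as turning seconds into a duration string, and a later fix (#1677) addresses minutes being reported as hours. The Go sketch below shows one way to format seconds as an OpenTSDB-style duration by picking the largest unit that divides the value evenly; the unit-selection rule is an assumption, and Bosun's tod may emit a different but equivalent string.

```go
package main

import "fmt"

// tod converts a number of seconds into an OpenTSDB-style duration string,
// using the largest unit (d, h, m) that divides the value evenly and
// falling back to seconds.
func tod(seconds int64) string {
	units := []struct {
		suffix string
		size   int64
	}{
		{"d", 86400},
		{"h", 3600},
		{"m", 60}, // 60 seconds per minute, not 3600
	}
	for _, u := range units {
		if seconds != 0 && seconds%u.size == 0 {
			return fmt.Sprintf("%d%s", seconds/u.size, u.suffix)
		}
	}
	return fmt.Sprintf("%ds", seconds)
}

func main() {
	fmt.Println(tod(300), tod(7200), tod(90)) // 5m 2h 90s
}
```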
scollector:
- fastly collector #1728
- fastly status.io monitoring and status.io lib #1735
- Additional functionality into ExtraHop collector #1698
- Make MaxMem kill switch configurable #1652
- fix error: interval.go:64: c_google_analytics: #1641
- negative -f filters and total_time metric for httpunit collector #1630
- add metadata to redis collector #1622
- Fix for unset MaxQueueLen #1596
docs:
- Documenting force close and purge actions. #1697
- Fixing of typos and avoiding a potential ambiguity #1691
bosun/expr:
- fix crash on invalid graphite tags #1663
collect:
- Flush purges internal collections as well. #1552
WIP:
- Govendor #1594
travisci:
- add start/signal/stop test for bosun #1580
other:
- List open incidents #1764
- revendor annotate after repo move #1708
- Add html function to templates #1721
- Adds resource reference to bosun_emitter #1727
- Remove unnecessary go get in travis #1706
- Fix merge #1673
- Small utility to clean up search data for a metric. #1632
- tod(scalar) was returning minutes as hours #1677
- Annotate edit view #1636
- moving version to _version #1637
- working party of elastic.v3 #1562
- Elastic v2 support and elastic expr refactor #1561
- Escape soapLogin credentials #1612
- Fixed link and Go version #1609
- WIP Incident Redis Refactoring Madness #1497
0.5.0-rc3
bosun:
- All internal bosun data now stored in redis. #1497
- add month func to get end of or start of month #1740
- series operations #1672
- support png on egraph api route #1712
- link multiple queries from graph UI #1676
- clear filters when changing metrics in graph view #1685
- support actions by incident id #1696
- fix issue removing last annotation from graph view #1639
- don't autocomplete old metrics on graph page #1660
- func to turn seconds (scalars) to duration string #1669
- series func to create series from scalars #1667
- add Ledis bind address config option #1651
- convert datastore to Ledisdb/redis implementation. #1332
- don't panic on opentsdb version #1629
- adding "unknownIsNormal" flag to alerts to convert unknown events into normal ones. #1620
- push annotations to top on graph page #1619
- add remote mac addresses to the host API #1551
- add UnmarshalJSON() to Status type #1555
- 2x API routes: Metadata for all metrics, metrics per tagk #1560
- denormalized metadata should resolve to parent metric #1519
- Show bar graph on the expr page if type == number #1011
- annotate support #1610
- fixing ledis error saving temp config. #1614
- Auto-Closing open alerts if alert doesn't exist anymore. #1604
- angular from 1.2.x to 1.5 #1603
- Implementing purge and forceClose actions #1599
- Add linelr func #1602
- optimizing redis access to get tag sets #1575
- serving temporary configs from redis. #1593
- Notifications moved to redis #1592
- fixing bug with chained notifications. #1584
- fixing empty email body, and handling absence better #1588
- fix migration from new install #1581
- performing a preliminary save on new incidents to keep templates consistent #1605
- Don't create filter in Opentsdb v2.1 #1569
- Add over, shift, and merge funcs #1598
- Shorten-only http proxy #1590
- Added missing bracket in dev.sample.conf #1536
other / bugfixes:
- Add html function to templates #1721
- Small utility to clean up search data for a metric. #1632
- tod(scalar) was returning minutes as hours #1677
- Annotate edit view #1636
- Elastic v2 support and elastic expr refactor #1561
- Escape soapLogin credentials #1612
travisci:
- add start/signal/stop test for bosun #1580
scollector:
- fastly status.io monitoring and status.io lib #1735
- Additional functionality into ExtraHop collector #1698
- fastly collector #1728
- Make MaxMem kill switch configurable #1652
- fix error: interval.go:64: c_google_analytics: #1641
- negative -f filters and total_time metric for httpunit collector #1630
- add metadata to redis collector #1622
- Fix for unset MaxQueueLen #1596
- Change Linux process monitoring to use regex for Comm… #1565
0.5.0-rc2
This is a major release with lots of features and bugfixes.
redis migration:
Bosun's internal data storage has been converted from purely in-memory to redis. If you still want a standalone mode, Bosun will fall back to a ledisdb instance hosted in-process. For configuration details see the docs. Data will be migrated from the state file when you first run this release, so make sure your desired config is in place before running it. Data will not be deleted from bolt.
scollector:
- fix error: interval.go:64: c_google_analytics: #1641
- negative -f filters and total_time metric for httpunit collector #1630
- add metadata to redis collector #1622
- Fix for unset MaxQueueLen #1596
- Change Linux process monitoring to use regex for Comm… #1565
- only send osProc stats if proc count > 0 #1527
- normalize process metrics into os namespace #1524
- continue snmp walk on row error #1514
- poll sysDesc (os version) from cisco devices #1506
- get serial and os version for fortinets #1507
- Collect system uptime via snmp #1488
- Convert gauges to counters for os.cpu #1494
- fix puppet.run_duration and a few win.net.tcp metrics #1541
- c_procstats_linux: Add buffered and cached memory usage to os.mem.free. #1520
- Added ExtraHop collector #1491
Build:
- Dependencies managed with govendor now #1594
processes_linux:
- Add a tiebreaker if the process creation time is identical. #1455
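Sorting processes by creation time alone is ambiguous when two processes start in the same clock tick. Below is a minimal Go sketch of adding a tiebreaker; using the PID as the secondary key is an assumption here, and the proc type is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// proc is a hypothetical process record.
type proc struct {
	PID       int
	StartTime uint64 // e.g. clock ticks since boot, as in /proc/<pid>/stat
}

// sortProcs orders processes by creation time and breaks ties by PID, so
// processes started in the same tick still get a deterministic order.
func sortProcs(ps []proc) {
	sort.Slice(ps, func(i, j int) bool {
		if ps[i].StartTime != ps[j].StartTime {
			return ps[i].StartTime < ps[j].StartTime
		}
		return ps[i].PID < ps[j].PID
	})
}

func main() {
	ps := []proc{{PID: 42, StartTime: 100}, {PID: 7, StartTime: 100}, {PID: 3, StartTime: 50}}
	sortProcs(ps)
	fmt.Println(ps) // [{3 50} {7 100} {42 100}]
}
```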
c_keepalived:
- Add new SNMP entry so we work with new keepalived 1.2.19, and fix off-by-one bug. #1501
metadata:
- fix a goroutine and memory leak in sendMetadata() #1476
bosun:
- func to turn seconds (scalars) to duration string #1669
- series func to create series from scalars #1667
- don't autocomplete old metrics on graph page #1660
- add Ledis bind address config option #1651
- convert datastore to Ledisdb/redis implementation. #1332
- fix issue removing last annotation from graph view #1639
- don't panic on opentsdb version #1629
- Annotate edit view #1636
- fix crash on invalid graphite tags #1663
- adding "unknownIsNormal" flag to alerts to convert unknown events into normal ones. #1620
- push annotations to top on graph page #1619
- add remote mac addresses to the host API #1551
- add UnmarshalJSON() to Status type #1555
- 2x API routes: Metadata for all metrics, metrics per tagk #1560
- denormalized metadata should resolve to parent metric #1519
- Show bar graph on the expr page if type == number #1011
- annotate support #1610
- fixing ledis error saving temp config. #1614
- Auto-Closing open alerts if alert doesn't exist anymore. #1604
- angular from 1.2.x to 1.5 #1603
- Implementing purge and forceClose actions #1599
- Add linelr func #1602
- optimizing redis access to get tag sets #1575
- serving temporary configs from redis. #1593
- Notifications moved to redis #1592
- fixing bug with chained notifications. #1584
- fixing empty email body, and handling absence better #1588
- fix migration from new install #1581
- performing a preliminary save on new incidents to keep templates consistent #1605
- Don't create filter in Opentsdb v2.1 #1569
- Add over, shift, and merge funcs #1598
- Shorten-only http proxy #1590
- Added missing bracket in dev.sample.conf #1536
- Add title text for dashboard glyphicons #1472
- Have auto filter on graph page #1487
- Make graph links from OpenTSDB 2.1 work with 2.2 #1530
- fix clearing tabs on graph page #1529
- Add proc information to and refactor host API #1526
collect:
- Flush purges internal collections as well. #1552
other:
- working party of elastic.v3 #1562
- Elastic v2 support and elastic expr refactor #1561
- Escape soapLogin credentials #1612
- Fixed link and Go version #1609
- WIP Incident Redis Refactoring Madness #1497
- Add new ASA-specific structs and new metric #1545
- Lscountv2 #1464
- Moving silence to redis #1458
- Move snmp into the bosun.org repo since third party source is gone #1512
- 1096 notification use body switch #1101
- Tune elasticsearch collector. #1500
- Logging on post notifications. #1483
- Stopping flapping notifications and empty emails #1473
- Adding logging if empty templates are generated. #1447
scollector collector:
- local listener #1435
c_nexpose:
- Gather data on asset groups. #1502
travisci:
- add start/signal/stop test for bosun #1580
scollector+bosun:
- get vsphere mounted datastores and expose via h… #1515
tsdbrelay:
- removing udp listener. Adding /api/count to tsdbrelay. #1518
0.5.0-rc1
This is a major release with lots of features and bugfixes.
redis migration:
Bosun's internal data storage has been converted from purely in-memory to redis. If you still want a standalone mode, Bosun will fall back to a ledisdb instance hosted in-process. For configuration details see the docs. Data will be migrated from the state file when you first run this release, so make sure your desired config is in place before running it. Data will not be deleted from bolt.
scollector:
- add metadata to redis collector #1622
- Fix for unset MaxQueueLen #1596
- Change Linux process monitoring to use regex for Comm… #1565
- only send osProc stats if proc count > 0 #1527
- normalize process metrics into os namespace #1524
- continue snmp walk on row error #1514
- poll sysDesc (os version) from cisco devices #1506
- get serial and os version for fortinets #1507
- Collect system uptime via snmp #1488
- Convert gauges to counters for os.cpu #1494
- fix puppet.run_duration and a few win.net.tcp metrics #1541
- c_procstats_linux: Add buffered and cached memory usage to os.mem.free. #1520
- Added ExtraHop collector #1491
Build:
- Dependencies managed with govendor now #1594
processes_linux:
- Add a tiebreaker if the process creation time is identical. #1455
c_keepalived:
- Add new SNMP entry so we work with new keepalived 1.2.19, and fix off-by-one bug. #1501
metadata:
- fix a goroutine and memory leak in sendMetadata() #1476
bosun:
- adding "unknownIsNormal" flag to alerts to convert unknown events into normal ones. #1620
- push annotations to top on graph page #1619
- add remote mac addresses to the host API #1551
- add UnmarshalJSON() to Status type #1555
- 2x API routes: Metadata for all metrics, metrics per tagk #1560
- denormalized metadata should resolve to parent metric #1519
- Show bar graph on the expr page if type == number #1011
- annotate support #1610
- fixing ledis error saving temp config. #1614
- Auto-Closing open alerts if alert doesn't exist anymore. #1604
- angular from 1.2.x to 1.5 #1603
- Implementing purge and forceClose actions #1599
- Add linelr func #1602
- optimizing redis access to get tag sets #1575
- serving temporary configs from redis. #1593
- Notifications moved to redis #1592
- fixing bug with chained notifications. #1584
- fixing empty email body, and handling absence better #1588
- fix migration from new install #1581
- performing a preliminary save on new incidents to keep templates consistent #1605
- Don't create filter in Opentsdb v2.1 #1569
- Add over, shift, and merge funcs #1598
- Shorten-only http proxy #1590
- Added missing bracket in dev.sample.conf #1536
- Add title text for dashboard glyphicons #1472
- Have auto filter on graph page #1487
- Make graph links from OpenTSDB 2.1 work with 2.2 #1530
- fix clearing tabs on graph page #1529
- Add proc information to and refactor host API #1526
collect:
- Flush purges internal collections as well. #1552
other:
- working party of elastic.v3 #1562
- Elastic v2 support and elastic expr refactor #1561
- Escape soapLogin credentials #1612
- Fixed link and Go version #1609
- WIP Incident Redis Refactoring Madness #1497
- Add new ASA-specific structs and new metric #1545
- Lscountv2 #1464
- Moving silence to redis #1458
- Move snmp into the bosun.org repo since third party source is gone #1512
- 1096 notification use body switch #1101
- Tune elasticsearch collector. #1500
- Logging on post notifications. #1483
- Stopping flapping notifications and empty emails #1473
- Adding logging if empty templates are generated. #1447
scollector collector:
- local listener #1435
c_nexpose:
- Gather data on asset groups. #1502
travisci:
- add start/signal/stop test for bosun #1580
scollector+bosun:
- get vsphere mounted datastores and expose via h… #1515
tsdbrelay:
- removing udp listener. Adding /api/count to tsdbrelay. #1518
0.5.0-alpha
This is a fairly large update from 0.4.0. This is a pre-release version and should be used with care.
redis migration:
We are in the process of migrating bosun's internal data storage from purely in-memory to redis. If you still want a standalone mode, Bosun will fall back to a ledisdb instance hosted in-process. For configuration details see #1332. Currently search data, metadata, and errors are stored in redis. They will be migrated from the state file when you first run this release, so make sure your desired config is in place before running it. Data will not be deleted from bolt.
bosun:
- convert datastore to Ledisdb/redis implementation. #1332
- Support seconds when representing "now" on the expression page #1426
- Add metric stats to host API #1405
- Add services and their running state to host API #1412
- exit on syscall.SIGTERM as well #1394
- search keeps set of all metric/tagsets #1361
- Add a list of open incidents to the Host API #1364
- Collect ack status info by notification #1358
- Allow querying Graphite with auth information #1350
- Fixing panic if bucket does not exist for restoreState #1344
- add complete influx cred config, docs: update docs to reflect new config #1345
- enable group by interval in the influx() query #1304
util:
- reworking how util exec works with timeouts #1335
c_nexpose:
- Gather scan runtime. #1432
scollector:
- Get bridge information from SNMP net devices #1343
- Collect ifspeed metric and master metadata for linux teams #1372
- Get Ip Addresses and subnetmask in windows. #1389
- Get Ip Addresses and subnet mask in linux+snmp #1388
- Collect controller and ps hardware metadata #1383
- Send physical disk metadata as json object instead #1380
- Collect ifspeed metric and master metadata for linux teams #1377
- Linux interface metadata changes #1368
- Setup a fake url when the -p option is given #1337
- Add warn and fail levels for hw system board wattage #1357
- Change mac address metadata format for linux to match… #1352
- handle SIGINT for somewhat graceful shutdown #1342
- Adding scale parameter to snmp metrics #1340
tsdbrelay:
- fixing tsdbrelay for metadata #1392
- Relaying metadata to secondary relays. Also making relay that sets host header properly. #1326
metadata:
- moving shared constants between bosun and scollector into metadata package #1382
other:
- Add more info to Clean() err message. #1439
- Adding bar for current status to dashboard. #1438
- Trying to fix go-get import issues. #1431
- Allowing MinGroupSize setting to avoid hiding things unnecessarily. #1421
- Storing error data in redis #1419
- Add google_analytics collector. #1422
- generating html page for each package so go-get works #1413
- Add ciscobgp snmp collector #1408
- Add nexpose collector. #1407
- Add collector for systemd service status. #1374
- Document variables available to templates relating to incidents #1409
- rule page won't init new search. #1406
- Persist last to redis #1402
- Fix hbase collector for 1.0. #1390
- Updating dependencies #1369
- Migrating search data to redis #1356
- Party update #1351
- scollector generic SNMP documentation syntax and typo fix #1338
- cmd/scollector: Add powered_state and connection_state for vsphere guests #1333
- a few simple placeholder conf items to ease timing on future migration. #1331
c_google_analytics:
- Gather realtime metrics for browsers, trafficType, deviceCategory, and OS. #1436
0.4.0
bosun:
- InfluxDB query support #1291
- Concurrent checks #1231
- Better Error flow #1301
- Separating core metric metadata into a different lookup for performance #1323
- Add last API route returns most recent datapoint for metri… #1330
- Remove metrics from the API host route #1318
- Don't return hosts from /api/host if older than 7 days #1320
- Don't let unknowns trigger until a full check interval after restorestate finishes. #1313
- Only giving the dashboard the most recent action. #1279
- Retrying tsdb requests up to 3 times. #1300
- allowing variables on expression page. #1283
- Button to download config to local machine #1164
- Able to clear silences again. #1275
- Going to the action page no longer uses the URL to transmit alert keys. #1273
- Fix slowness loading dashboard. #1170
- Fixing rule link from dashboard. #1169
- recovering from panic in chart rendering. #1237
- Add an API route to return all metrics and their associate… #1247
- Add normalize switch to graph page to display series on 0-… #1257
- Sending less data to dashboard. Lazily loading templates #1269
- cCount func counts the number of adjacent changes in a series #1268
- Export time.ParseDuration as ParseDuration to templates #1205
- reducing save frequency to 10 minutes. #1222
- adding debug endpoint to view status of schedule lock. #1220
- capturing timings around template rendering. #1196
- Refactor filter expr func to also filter out zero valued results #1206
- Only downsample OpenTSDB queries that don't have it specif… #1201
- Fix template functions to use the same time as the alert #1199
- making save rely on fewer goroutines needing to acquire the lock. #1187
- promote scalars to numberSets #1137
- Validating user and message on action. Requiring message to send notifications. #1167
- support / in lookup sections #1297
scollector:
- Move metadata command collection to scollector #1317
- count running linux processes from linux.proc collector #1316
- dfstat skip pseudo filesystems #1311
- Send a hi metric with a value of 1 #1319
- Get OS Version in CentOS 7 #1322
- Monitoring teams interfaces #1280
- send self metrics with tags from config #1303
- Propagate FullHost setting #1296
- github collector #1281
- Add MTU, AdminStatus, and OperationStatus to SNMP int… #1260
- RabbitMQ #1202
- Add metadata for SNMP network interface stats #1251
- Filter out elastic indices from monitoring via regexp #1243
- Identify the interface type via IF-MIB:ifType #1242
- use dmidecode to get serialNumber and model metadata … #1223
- Add additional Linux interfaces #1124
- riak allow custom URL #1204
- enable pprof = "ip:port" settings in TOML file #1190
- fix missing call to AddProcessDotNetConfig #1159
- Allowing multipart snmp keys in subtrees. #1271
tsdbrelay:
- support for secondary relays for cross-datacenter mirroring. #1210
travis:
docs:
- add resource page and update Bosun.org site #1272
- fix slack notification #1233
- Clarify that scollector supports TLS and HTTP Basic Auth. #1215
- Adding missing documentation #1182
collect:
other:
- Using slog everywhere instead of log. #1274
- Adding header to identify bosun host on all outgoing http requests. #1266
- Add User-Agent to HTTP notifications #1265
- Update logstashElasticHosts format #1259
- Snmp tester #1144
- Change Elasticsearch backend configuration key #1246
- Add jsonq which allows JSON object handling in templates #1176
- updating dependencies #1240
- fix duplicate key errors for graphiteHeader #1234
- support configuring misc http headers for Graphite requests #1221
- Always refer to docker-server-ip in the same way #1218
- a command line utility for silencing #1152
- making save run on a fixed clock #1192
- Return a NaN value instead of an error when the result is of zero len… #1171
- Fixed a typo in haproxy metric description #1166
- Fixed typo in configuration.md #1162
- removing solaris stuff #1155
0.3.0
bosun:
- improve graph tag display #1145
- implementing modulo operator % #1154
- update interval on duration change #1146
- Add dropl and dropg #1143
- Exposing version information via api/version #1134
- shorten the action page URLs #1126
- fix relative time graph expression links #1118
- download entire alerts dashboard at once #1112
- convert absolute to relative times for graph exprs #1105
- add pct template function #1083
- Improve silence page #1111
- Dashboard improvements #1098
- Graphite custom path #1091
- Concrete ls index name #1013
- Adding instrumentation around schedule lock. #1113
collect:
scollector:
- scollector.* metrics disabled properly with DisableSelf #1107
- Increase default batch size to 500 #1129
- Add descriptions to httpunit metrics hu.* #1127
- Order process identifiers by process creation timestamp. #1110
- fix ICMP TOML converter #1086
- WIP for DSC status #1084
- implement os.net.pause_frames #1090
- process_linux process start_time and uptime #1087
- error on unused toml keys #1080
- Replace external AD collector with built in collector #1074
- More generic SNMP collector #1089
- calculate os.net.bytes.total metric. #1141
build:
- making build status in github more granular #1121
doc:
- Add link to Monitorama 2015 presentation in the documentation. #1108
- Documenting function parameter types and returns #1071
- expression language doc edits #1128
bug:
- timeline template not rendered correctly. #1065
tsdbrelay:
- denormalization backfill app #1076