Releases: FoundationDB/fdb-kubernetes-operator
v1.33.0
Changes
Operator
- Allow specifying the storage engine when running e2e tests #1948
- Allow specifying a cluster name in the test suite #1946
- Correct docs and improve log output #1945
- Add podName selection functions #1944
- Add support for replacing failed process groups in a fault domain #1943
- Let the operator requeue another reconcile attempt if more failed process groups are detected #1942
- Use the common process selection function in the buggify crash-loop, buggify no-schedule, and restart commands #1941
- Add condition selection option to common processGroup selection function #1940
- Only log if the command was not timing out #1939
- Don't skip the exclusion for process groups without an address when using localities #1938
- Create a unified process selection function and use it in removeProcessGroups #1937
- Improve verbosity of cordon and remove process commands #1936
- Allow the remove sub command to remove by process class #1935
- Add ProcessGroupIdPrefix to the testing generateClusterStruct and correct test cases #1934
- If the database is unavailable and caching is enabled, allow operator to proceed #1932
- Making the cluster argument optional when we are removing pods in the… #1930
- Remove the global admin mutex for the admin client #1929
- Reduce the update calls for the FoundationDB resource #1926
- Make sure we test the status conditions before running the configure command #1852
v1.32.0
Changes
Operator
- Add estimates and pretty printing to exclusion status check #1927
- Allow the operator to restart the CC when tester processes are unreachable #1925
- Add additional safety checks for bouncing and excluding processes #1924
- Update the FDB versions used for e2e tests #1923
- Add cache client selector #1922
- Add additional tests for the maintenance mode #1921
- Make sure we exclude the FDB_NETWORK_OPTION_CLIENT_THREADS_PER_VERSION env variable and don't pass it down to fdbcli #1920
- Change base image to rocky9 #1894
v1.31.0
v1.30.0
v1.29.0
Changes
Operator
- Fix logic when sidecar is unreachable for missing process detection #1896
- Update the way the operator deployment is created and verified #1895
- Add more logs when the fetched status has issues #1893
- Allow setting the namespace in the three data hall example #1892
- Fix error in AllAddressesExcluded #1891
- Add cache to plugin #1888
- Improve exclusion check to make sure that enough processes are up and running #1887
- Ignore HA locality-based exclusion test if version doesn't support locality based exclusions #1886
- Three data hall: enable locking by default #1885
- Allow defining a creationTimestamp mock #1882
- Add check for plugin version upgrades #1881
- Make sure the exclusion is tried again #1880
- Make sure the e2e test suite only runs locality based e2e tests if the provided versions support it #1879
- Refactor the check process method and remove some duplicate work #1870
- Include processes using the excluded server list #1857
v1.28.1
v1.28.0
Changes
Operator
- Make sure the operator checks all provided addresses for exclusions #1875
- Simplify test setup #1871
- Three data hall fault domain storage check bug fix #1869
- Correct the version check for the locality based exclusions #1868
- Make sure the operator can proceed with exclusions even if multiple pods are failing #1867
- Add some buffer for the exclusion logic to mitigate dead locks #1866
- Add support for redwood storage engine #1865
- Make use of string builder for getting the configuration string #1863
- Fix resource limits for non-performance tests #1859
- If locality-based exclusions are enabled, only make use of the locality #1844
v1.27.0
Changes
Operator
- Update the FDB Go bindings to allow compiling on macOS #1858
- Add e2e test to make sure processes in maintenance zone are ignored #1848
- Update docs about resource limits and update defaults to 4GB for a minimal cluster #1847
- Improve the handling of test processes #1846
- Make sure that the PVC and service gets created if a process group gets quickly marked as removal #1845
- Add more logging for upgrade test failure #1843
- Fix flaky migration test setup #1842
- Remove the list call in the remove process groups reconciler and use the get method instead #1838
- Add failure condition to logging in automatic replacements #1834
- Make the verification of the coordinators more flexible and use the hard limits to validate the locality distribution #1833
- Bump golang.org/x/net from 0.10.0 to 0.17.0 #1832
- Allow e2e test suite to specify downgrade versions #1831
- Increase knob rollout time #1830
- Update docs on fault domain and coordinators #1826
- Add DataLoader to make sure cluster has minimal data #1762
- Initial support for three data hall replication #1651
- Initial doc for replacement buckets #1368
v1.26.0
Changes
Operator
- Correct the way an empty result is interpreted #1828
- Add coordinator fault tolerance check #1827
- Fix the exclusion logic for Pods that are recreated before being fully excluded #1825
- Let operator replace processes that are excluded but not yet marked as removed #1824
- Improve test stability #1823
- Handle large exclusions better #1822
- Add e2e test for changing a cluster to use DNS in cluster files #1821
- Add first upgrade test with DNS usage in cluster file and correct statement about DNS #1820
- Service missing namespace #1818
- Investigate HA test failures #1817
- Remove the MissingPVC condition for stateless processes #1815
- Add first set of tests for locality based exclusions and fix bug in locality based exclusions #1811
- Make use of errgroup and simplify some testing code #1810
- Refactor code to make use of the process group directly without passing the class and id number down #1807
v1.25.0
Changes
Operator
- Split upgrade tests into tests with chaos mesh and without #1812
- Split upgrade tests as we hit the timeout for our CI pipeline #1808
- Correct the namespace creation for e2e tests #1806
- Allow using the max timeout for get status and allow specifying the max timeout #1805
- Improve operator lock handling by releasing locks once cluster is reconciled #1803
- Correct the value used for testing #1802
- Make use of the new fault tolerance methods #1797
- Fix the lock ID to return the correct value #1796
- Only set the MissingProcesses condition if the machine-readable status contains at least one process #1794
- Increase timeout for default single cluster upgrade #1792
- Initial design doc for suspending Process Groups #1785
- Remove unused hot-shard tool #1784