Gatling index too large #250

Closed
1 of 4 tasks
wayneseymour opened this issue Apr 11, 2022 · 4 comments · May be fixed by #251

wayneseymour commented Apr 11, 2022

We have one index, gatling-data-2021-11, that holds too much data.
It is currently 134.5 GB.

We want it split into monthly chunks, because we think this one large index
will make upgrades difficult.

  • reindex only the November data from 2021-11 into a temporary 2021-11 index
  • delete the original 2021-11
  • reindex the temporary 2021-11 back to the original name
  • clean up gatling-data: reindex any data still in it, delete it, then immediately complete Gatling data retention #249
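
The first three tasks above can be sketched as an ordered list of requests. This is only a rough outline, not the commands that were actually run: the temporary index name `gatling-data-2021-11-temp` is a made-up placeholder, and the `timestamp` field is assumed to be the date field to filter on. The requests are built as plain Python data so they can be reviewed before anything is sent.

```python
import json

SOURCE = "gatling-data-2021-11"
TEMP = "gatling-data-2021-11-temp"  # hypothetical name, not from the issue


def november_only_plan(source=SOURCE, temp=TEMP):
    """Return (method, path, body) tuples for the three-step rewrite."""
    # Half-open range: everything from Nov 1 up to, but not including, Dec 1.
    november = {"range": {"timestamp": {"gte": "2021-11-01", "lt": "2021-12-01"}}}
    return [
        # 1. Copy only the November documents into the temporary index.
        ("POST", "_reindex?wait_for_completion=false",
         {"source": {"index": source, "query": november}, "dest": {"index": temp}}),
        # 2. Delete the original. Only safe once the task from step 1 has finished.
        ("DELETE", source, None),
        # 3. Copy the temporary index back under the original name.
        ("POST", "_reindex?wait_for_completion=false",
         {"source": {"index": temp}, "dest": {"index": source}}),
    ]


for method, path, body in november_only_plan():
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Because the reindex requests use `wait_for_completion=false`, each one returns a task rather than a result, so the delete in step 2 should wait until that task has completed and the document counts have been checked.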
wayneseymour added the enhancement (New feature or request) label Apr 11, 2022
wayneseymour self-assigned this Apr 11, 2022
wayneseymour commented

@marius-dr gave me some help, recording it here:

PUT gatling-data-2021-07
PUT gatling-data-2021-07/_mapping
{
      "properties": {
        "CI_BUILD_ID": {
          "type": "integer",
          "coerce": true
        },
        "CI_BUILD_URL": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "CI_RUN_URL": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "baseUrl": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "branch": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "buildHash": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "buildNumber": {
          "type": "long"
        },
        "deploymentId": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "esBuildDate": {
          "type": "date"
        },
        "esBuildHash": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "esLuceneVersion": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "esUrl": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "esVersion": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "isCloudDeployment": {
          "type": "boolean"
        },
        "isSnapshotBuild": {
          "type": "boolean"
        },
        "kibanaBranch": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "maxUsers": {
          "type": "long"
        },
        "message": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "method": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "requestBody": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "requestHeaders": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "requestSendStartTime": {
          "type": "date"
        },
        "requestTime": {
          "type": "long"
        },
        "responseBody": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "responseHeaders": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "responseReceiveEndTime": {
          "type": "date"
        },
        "responseStatus": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "scenario": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "status": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "timestamp": {
          "type": "date"
        },
        "url": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "userId": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "version": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }

}
PUT gatling-data-2021-11/_mapping
{
  "runtime": {
    "path": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['url.keyword'].value.replace(doc['baseUrl.keyword'].value, ''))",
        "lang": "painless"
      }
    }
  }
}
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "gatling-data-1",
    "query": {
      "range" : {
        "timestamp" : {
           "lte" : "2021-07-01"
        }
      }
    }
  },
  "dest": {
    "index": "gatling-data-2021-06"
  }
}
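
One thing to watch in the reindex request above: an inclusive upper bound (`lte`) on a month boundary can pull in documents that belong to the following month, and with no lower bound everything older lands in the same destination. A possible way to avoid both is a half-open range per month. The sketch below generates one reindex body per calendar month; the helper names and the `gatling-data-` prefix default are illustrative assumptions, and nothing here talks to a cluster.

```python
from datetime import date


def next_month(d):
    """First day of the month after the one containing d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def monthly_reindex_bodies(source, first, last, prefix="gatling-data-"):
    """Build one _reindex body per calendar month from first through last."""
    bodies = []
    month = date(first.year, first.month, 1)
    while month <= last:
        following = next_month(month)
        bodies.append({
            "source": {
                "index": source,
                # gte/lt gives [start of month, start of next month).
                "query": {"range": {"timestamp": {
                    "gte": month.isoformat(),
                    "lt": following.isoformat(),
                }}},
            },
            "dest": {"index": f"{prefix}{month:%Y-%m}"},
        })
        month = following
    return bodies


for body in monthly_reindex_bodies("gatling-data-1", date(2021, 6, 1), date(2021, 8, 1)):
    print(body["dest"]["index"], body["source"]["query"]["range"]["timestamp"])
```

Each body can then be sent with `POST _reindex?wait_for_completion=false`, the same way as the request above.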

This was referenced Apr 12, 2022
wayneseymour commented

It turns out I split the data into weekly groups, but we want it split by month.
This means gatling-data-2021-11 holds more than just November's data.

wayneseymour commented

Aggregate all the data in the index by month, dropping empty months:

GET gatling-data-2021-11/_search
{
  "size": 0,
  "aggs": {
    "monthly": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "actually-has-docs": {
          "bucket_selector": {
            "buckets_path": {
              "doc_count": "_count"
            },
            "script": "params.doc_count != 0"
          }
        }
      }
    }
  }
}
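
As a side note, `date_histogram` also accepts a `min_doc_count` option, so setting it to 1 should give the same "only months that have documents" listing without the nested `bucket_selector`. A minimal sketch of that request body, built in Python so it can be pasted into a script; the function name is made up for illustration:

```python
import json


def monthly_histogram(field="timestamp", min_doc_count=1):
    """Search body that counts documents per calendar month."""
    return {
        "size": 0,  # only the aggregation is wanted, not the hits
        "aggs": {
            "monthly": {
                "date_histogram": {
                    "field": field,
                    "calendar_interval": "month",
                    # Skip buckets with fewer documents than this.
                    "min_doc_count": min_doc_count,
                }
            }
        },
    }


print(json.dumps(monthly_histogram(), indent=2))
```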

Check the results of the reindex requests:

GET gatling-data-2022-01/_count
GET gatling-data-2022-02/_count
GET gatling-data-2022-03/_count
GET gatling-data-2022-04/_count
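
To confirm nothing was lost, the per-index counts can be compared with the per-month counts from the aggregation. The helper below is only a sketch of that comparison; the function name and the numbers in the usage lines are invented to show the call shape and are not values from the cluster.

```python
def compare_counts(expected, actual):
    """Return {index: (expected, actual)} for every index whose counts differ."""
    mismatches = {}
    for index, want in expected.items():
        got = actual.get(index)  # None when the index was never counted
        if got != want:
            mismatches[index] = (want, got)
    return mismatches


# Made-up numbers purely for illustration.
expected = {"gatling-data-2022-01": 100, "gatling-data-2022-02": 250}
actual = {"gatling-data-2022-01": 100, "gatling-data-2022-02": 249}
print(compare_counts(expected, actual))
```

An empty result means every monthly index holds exactly as many documents as the source had for that month.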

wayneseymour commented

@LeeDr @marius-dr
I've triple-checked, and kibana stats prod has no November 2021 data in it.

Request

GET gatling-data-2021-11/_search
{
  "size": 0,
  "aggs": {
    "monthly": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "month"
      }
    }
  }
}

Response

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "monthly" : {
      "buckets" : [
        {
          "key_as_string" : "2022-01-01T00:00:00.000Z",
          "key" : 1640995200000,
          "doc_count" : 5896381
        },
        {
          "key_as_string" : "2022-02-01T00:00:00.000Z",
          "key" : 1643673600000,
          "doc_count" : 18684255
        },
        {
          "key_as_string" : "2022-03-01T00:00:00.000Z",
          "key" : 1646092800000,
          "doc_count" : 11748230
        },
        {
          "key_as_string" : "2022-04-01T00:00:00.000Z",
          "key" : 1648771200000,
          "doc_count" : 3669161
        }
      ]
    }
  }
}
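
For anyone double-checking the response above, a short script can list the months present and total the documents. The buckets are copied from the response; the helper name is just for illustration. Note that `hits.total` shows 10000 with relation `gte` because the total hit count is capped by default, so the bucket counts are the numbers to trust.

```python
# Buckets copied from the aggregation response above.
response = {
    "aggregations": {
        "monthly": {
            "buckets": [
                {"key_as_string": "2022-01-01T00:00:00.000Z", "doc_count": 5896381},
                {"key_as_string": "2022-02-01T00:00:00.000Z", "doc_count": 18684255},
                {"key_as_string": "2022-03-01T00:00:00.000Z", "doc_count": 11748230},
                {"key_as_string": "2022-04-01T00:00:00.000Z", "doc_count": 3669161},
            ]
        }
    }
}


def docs_per_month(resp):
    """Map 'YYYY-MM' to the document count of that monthly bucket."""
    buckets = resp["aggregations"]["monthly"]["buckets"]
    return {b["key_as_string"][:7]: b["doc_count"] for b in buckets}


counts = docs_per_month(response)
print(counts)
print("total documents:", sum(counts.values()))
print("has November 2021 data:", "2021-11" in counts)
```

The four buckets add up to 39,998,027 documents, all from January through April 2022.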
