Below is a diagram of the LME architecture with labels referring to possible issues at that specific location. Refer to the chart below for protocol information, process information, log file locations, and common issues at each point in LME.
You can also find more detailed troubleshooting steps for each chapter after the chart.
Figure 1: Troubleshooting overview diagram
Diagram Ref | Protocol information | Process information | Log file location | Common issues |
---|---|---|---|---|
a | Outbound WinRM using TCP 5985. The link is HTTP; the underlying data is authenticated and encrypted with Kerberos. See this Microsoft article for more information. | On the Windows client, press Windows key + R, then type 'services.msc' to access services on this machine. You should have 'Windows Remote Management (WS-Management)' and 'Windows Event Log'; both should be set to start automatically and be running. WinRM is started via the GPO that is applied to clients. | Open Event Viewer on the Windows client. Expand 'Applications and Services Logs' -> 'Microsoft' -> 'Windows' -> 'Eventlog-ForwardingPlugin' -> 'Operational'. | "The WinRM client cannot process the request because the server name cannot be resolved." This is due to network issues (VPN not up, not on the local LAN) between the client and the Event Collector. |
b | Inbound WinRM TCP 5985 | On the Windows Event Collector, press Windows key + R, then type 'services.msc' to access services on this machine. You should have 'Windows Event Collector'; this should be set to start automatically and be running. It is enabled with the GPO for the Windows Event Collector. | Open Event Viewer on the Windows Event Collector. Expand 'Applications and Services Logs' -> 'Microsoft' -> 'Windows' -> 'EventCollector' -> 'Operational'. Also, in Event Viewer check that the subscription is active and clients are sending in logs: click 'Subscriptions', then right-click 'lme' and select 'Runtime Status'. This shows the total and active computers connected. | Restarting the Windows Event Collector machine can sometimes get clients to connect. |
c | Outbound TCP 5044. Lumberjack protocol using TLS mutual authentication. Certificates are generated as part of the install and downloaded as a ZIP from the Linux server. | On the Windows Event Collector, press Windows key + R, then type 'services.msc' to access services on this machine. You should have 'winlogbeat'; it should be set to start automatically and be running. | %programdata%\winlogbeat\logs\winlogbeat | TBC |
d | Inbound TCP 5044. Lumberjack protocol using TLS mutual authentication. Certificates generated as part of the install. | On the Linux server type 'sudo docker stack ps lme', and check that lme_logstash, lme_kibana and lme_elasticsearch all have a current status of running. | On the Linux server type: 'sudo docker service logs -f lme_logstash' | TBC |
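If you want a quick command-line pass over points a-d before digging into Event Viewer, the hedged sketch below checks the relevant Windows services and the subscription status, then the Linux containers. It assumes the subscription is named 'lme' as in the table above and that the Windows commands are run from an elevated prompt.

# On each Windows client: confirm the forwarding-related services are running
sc query WinRM
sc query EventLog
# On the Windows Event Collector: confirm the collector service and the 'lme' subscription status
sc query Wecsvc
wecutil gr lme
# On the Linux server: confirm the LME containers are up
sudo docker stack ps lme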
If you receive the error "Windows cannot find 'gpmc.msc'", you need to install the optional feature Group Policy Management Tools.

- For Windows Server, follow Microsoft's instructions here. In short, you need to add the "Group Policy Management" feature from the "Add Roles and Features" menu in Server Manager.
- For Windows 10/11, open the "Run" dialog box by pressing Windows key + R. Run the command ms-settings:optionalfeatures to open Windows Optional Features in Settings. Select "Add a Feature," then scroll down until you find RSAT: Group Policy Management Tools. Check the box next to it and select install.

Figure 2: Add a feature

Figure 3: Install RSAT: Group Policy Management Tools

- Note: You only need gpmc.msc installed on one machine to manage the others. For example, you can install it only on the Domain Controller and modify the Group Policy from that machine.
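As an alternative to the Settings UI, the tools can usually be added from an elevated PowerShell prompt. This is a hedged sketch: the exact RSAT capability name can vary by Windows build, so confirm it with Get-WindowsCapability before installing.

# Windows Server: add the Group Policy Management console feature
Install-WindowsFeature GPMC
# Windows 10/11: list the matching RSAT capability, then add it by its exact name
Get-WindowsCapability -Online -Name "Rsat.GroupPolicy*"
Add-WindowsCapability -Online -Name "Rsat.GroupPolicy.Management.Tools~~~~0.0.1.0"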
If you receive the error that dsa.msc cannot be found, you will need to install the Active Directory Domain Services tools. The process is nearly identical to the above section Installing Group Policy Management Tools, save for the following exceptions:

- For Windows Server, the feature is located under "Remote Server Administration Tools". Expand it by pressing the arrow on the left and check the box next to Role Administration Tools. The other nested features should be selected as well.
- For Windows 10/11, the Optional Feature to install is called RSAT: Active Directory Domain Services and Lightweight Directory Services Tools.
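A similar command-line route exists for the Active Directory tools. Again this is a hedged sketch; the RSAT capability name should be confirmed with Get-WindowsCapability on your own system before installing.

# Windows Server: add the AD DS administration tools feature group
Install-WindowsFeature RSAT-ADDS
# Windows 10/11: list the matching RSAT capability, then add it by its exact name
Get-WindowsCapability -Online -Name "Rsat.ActiveDirectory*"
Add-WindowsCapability -Online -Name "Rsat.ActiveDirectory.DS-LDS.Tools~~~~0.0.1.0"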
If you are not seeing Sysmon logs in the client's Event Viewer or forwarded logs on the WEC, first try restarting all of your systems and running gpupdate /force on the domain controller and clients.
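To confirm whether the GPO has actually been applied and Sysmon is writing events, the following commands can help when run from an elevated prompt on a client; the channel name assumes a standard Sysmon install.

# Force a group policy refresh and list the GPOs applied to this machine
gpupdate /force
gpresult /r
# Show the five most recent Sysmon events, if any exist
wevtutil qe Microsoft-Windows-Sysmon/Operational /c:5 /rd:true /f:text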
When diagnosing issues installing Sysmon on the clients using Group Policy, the first place to check is Task Scheduler on one of the clients. Look for LME-Sysmon-Task listed under "Active Tasks." Based on whether or not the task is listed, different troubleshooting steps will prove useful:

- If the task isn't listed, either the GPO hasn't been applied or the task isn't properly configured. See both Step 1 and Step 2.
- If the task is listed, the GPO has been applied, but either the task has yet to run or it isn't properly configured. See Step 2 and Step 3.
By default, Windows will update group policy settings only every 90 minutes. You can manually trigger a group policy update by running gpupdate /force in a Command Prompt window on the Domain Controller and the client.

If, after ensuring group policy is updated on the client, the client is still missing LME-Sysmon-Task, continue to Step 2.
Windows Tasks can be fickle: in order for a task to trigger for the first time, the trigger time must be set to some time in the future, even if the task is set to run repeatedly at a given interval.
If you don't see sysmon64 listed in services.msc, it's likely the install script failed somehow. Double-check that the files are organized correctly according to the diagram in the Chapter 2 checklist.
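A quick way to check the service state without opening services.msc is shown below; this assumes the 64-bit Sysmon service is registered under its default name of Sysmon64.

# Check whether the Sysmon service exists and is running
sc query Sysmon64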
The winlogbeat service installed in section 3.3 is responsible for sending events from the collector to Kibana. Confirm the winlogbeat service is running and check the log file (C:\ProgramData\winlogbeat\logs) for errors.
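The sketch below, assuming the default install paths, checks the service state and shows the tail of the most recent log file from an elevated PowerShell prompt.

# Confirm the winlogbeat service is installed and running
sc query winlogbeat
# Show the last lines of the newest winlogbeat log file
Get-ChildItem C:\ProgramData\winlogbeat\logs | Sort-Object LastWriteTime | Select-Object -Last 1 | Get-Content -Tail 50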
By default, the ForwardedEvents maximum log size is around 20MB, so events will be lost if the winlogbeat service stops. Consider increasing the size of the ForwardedEvents log file to help reduce log loss in this scenario; historical logs are sent once the winlogbeat service starts. You can increase the size through Event Viewer using the steps below, or with the command-line sketch that follows them.
- Open Microsoft Event Viewer (eventvwr)
- Expand Windows Logs and right-click Forwarded Events
- Click Properties
- Adjust _Maximum log size (KB)_ to a higher value. Note that the system will automatically adjust the size to the nearest multiple of 64KB.
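The same change can be made from an elevated command prompt with wevtutil. This sketch sets the ForwardedEvents log to roughly 1GB; adjust the value to suit your own retention needs and disk space.

# Show the current ForwardedEvents log configuration, including maximum size in bytes
wevtutil gl ForwardedEvents
# Raise the maximum size to ~1GB (value is in bytes, rounded to a multiple of 64KB)
wevtutil sl ForwardedEvents /ms:1073741824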
Please be aware that Logging Made Easy does not currently support logging Domain Controllers, and log volumes from servers with this role may be significant. If you wish to forward logs from your Domain Controllers, be aware that you do so at your own risk: monitoring such servers has not been tested and may have unintended side effects.
If there are size constraints on your system and it doesn't meet our expected requirements, you could run into issues like this ISSUE.
You can try this: DISK-SPACE-20.04
root@util:~# vgdisplay
root@util:~# lvextend -l +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv
root@util:~# resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
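After extending the logical volume and filesystem, it is worth confirming the extra space is visible; this is a minimal check assuming the default Ubuntu LVM layout used above.

root@util:~# df -h /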
If you have issues with containers restarting, there is probably something wrong with your host or the container itself. Like in the above sample, a wrong password could prevent the Elastic Stack from operating properly. You can check the container logs like so:
# List the names of the running containers
sudo docker ps --format "{{.Names}}"
# Using the name you found above, check its logs
sudo docker logs -f [CONTAINER_NAME]
Hopefully that is enough to determine the issue, but below we have some common issues you could encounter:
If you encounter errors like this in the container logs, the ownership or permissions of the mounted files on your host probably don't match what the container expects. In this case, the directory in question is /usr/share/elasticsearch/backups, which is mapped from /opt/lme/backups on the host.
You can see this in the docker-compose-stack.yml file:
╰─$ cat Chapter\ 3\ Files/docker-compose-stack.yml | grep -i volume -A 5
volumes:
- type: volume
source: esdata
target: /usr/share/elasticsearch/data
- type: bind
source: /opt/lme/backups
target: /usr/share/elasticsearch/backups
To fix this you can change the ownership to what the container expects:
sudo chown -R 1000:1000 /opt/lme/backups
The user id in the container is 1000, so by setting the proper owner we fix the directory permission issue.
We know this from investigating the backing Docker container image for Elasticsearch (LINK GITHUB).
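To verify the fix, you can compare the numeric owner on the host directory with the expected UID of 1000; this is a minimal check.

# Numeric owner and group of the host directory (should show 1000 1000 after the chown)
ls -ldn /opt/lme/backups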
This was a bug that was fixed in the current iteration of deploy.sh. It occurs if the elastic user password was already set in a previous deployment of LME. The easiest fix is to delete your old LME volumes, as that will clear out any old settings that would prevent the install.
#DONT RUN THIS IF YOU HAVE DATA YOU WANT TO PRESERVE!!
sudo docker volume rm lme_esdata
sudo docker volume rm lme_logstashdata
However, most users will probably want to preserve their data, so you can instead use the following method to reset the password for the built-in elastic user.
Run the following commands to reset the password to a known value:
#grab the name:
sudo docker ps --format "{{.Names}}" | grep -i elastic
#go into elasticsearch container
sudo docker exec -it ${NAME_HERE} /bin/bash
#ignore cert issues with our self signed cert:
echo "xpack.security.http.ssl.verification_mode: certificate" >> config/elasticsearch.yml
#reset in the container:
#add a -f if needed
elasticsearch-reset-password -v -u elastic -i --url https://localhost:9200
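Once the password has been reset, you can confirm the new credentials work; this is a minimal check from the Linux host using curl, assuming port 9200 is published to the host as in the default LME compose file, with -k to accept the self-signed certificate.

# Exit the container, then from the Linux host (curl will prompt for the password you just set):
curl -k -u elastic "https://localhost:9200/_cluster/health?pretty"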
If elasticsearch-reset-password is not available in your version of Elasticsearch, you may be able to recreate the container with a newer version of LME and run the same steps above. We have not tested this, so it is not officially supported, but it is worth a try if none of the above works.
Sometimes environmental differences can cause the installation process to fail (ISSUE). If you have the option, you can perform a full reinstall.
If you are unable to access https://<LINUX_SERVER_IP/HOSTNAME>, this is most likely because the elasticsearch service is failing to run on the Linux server. To perform a full reinstall:
cd /opt/lme/Chapter\ 3\ Files/
sudo ./deploy.sh uninstall
#delete everything:
sudo rm -r /opt/lme
#Reclone the LME repository into /opt/lme/:
git clone git@github.com:cisagov/LME.git /opt/lme/
#Navigate back to Chapter 3 Files:
cd /opt/lme/Chapter\ 3\ Files/
sudo ./deploy.sh install
#Save credentials, then continue with Chapter 3 installation
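After the reinstall completes, a quick sanity check that the stack is running and the web interface answers over HTTPS can save a second round of debugging; run this on the Linux server itself, with -k needed because of the self-signed certificate.

# Confirm all LME services report a running task
sudo docker stack ps lme
# Confirm the web interface responds (ignore the certificate warning)
curl -k -I https://localhost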
Optionally, you could uninstall Docker entirely and reinstall it from the deploy.sh script. If you do end up removing Docker, this link could be helpful: https://askubuntu.com/a/1021506.
This step should not be required by default, and should only be used if the installer has failed to automatically populate the expected dashboards or if you wish to make use of your own modified version of the supplied visualizations.
Each dashboard and its visualization objects are contained within an NDJSON file (previously JSON) and can be easily imported.
You can now import the dashboards by clicking 'Management' -> 'Stack Management' -> 'Saved Objects'. Please follow the steps in Figure 4; the NDJSON files are located in Chapter 4 Files\dashboards.
Figure 4 - Steps to import objects
Elastic maintains a series of troubleshooting guides which should be consulted as part of the standard investigation process if the issue you are experiencing is within the Elastic Stack component of LME.
These guides can be found here and cover a number of common issues.
If the Discover section of Kibana is persistently showing the wrong index by default, it is worth checking that the winlogbeat index pattern is still set as the default within Kibana. This can be done using the steps below:
Select "Stack Management" from the left-hand menu:
Select "Index Patterns" under Kibana Stack Management:
Verify that the "Default" label is set next to the winlogbeat-* index pattern:
If this index pattern is not selected as the default, this can be re-done by clicking on the winlogbeat-* pattern and then selecting the following option in the subsequent page:
There are a number of reasons why the cluster's health may be yellow or red, but a common cause is unassigned replica shards. As LME is a single-node instance by default, this means that replicas will never be assigned; this issue is commonly caused by built-in indices which do not have the index.auto_expand_replicas value correctly set. This will be fixed in a future release of Elastic, but can be temporarily diagnosed and resolved as follows:
Check the cluster health by running the following request against Elasticsearch (an easy way to do this is to navigate to Dev Tools in Kibana, under Management on the left-hand menu):
GET _cluster/health?filter_path=status,*_shards
If it shows any unassigned shards, these can be enumerated with the following command:
GET _cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state
If the UNASSIGNED shard is shown as r rather than p, this means it is a replica. In this case the error can be safely fixed in the single-node default installation of LME by forcing all indices to have a replica count of 0 using the following request:
PUT _settings
{
"index.number_of_replicas": 1
}
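After applying the setting, you can confirm the change took effect by re-running the health check from earlier; in a healthy single-node LME install the status should return to green once no unassigned replicas remain.

GET _cluster/health?filter_path=status,*_shards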
Further information on this, and general advice on troubleshooting an unhealthy cluster status, can be found here if the above solution was unable to resolve your issue.
For errors encountered when re-indexing existing data as part of an LME version upgrade, please review the Elastic re-indexing documentation for help, available here.
With the correct mapping in place it is not possible to store a string value in any of the fields which represent IP addresses, for example source.ip or destination.ip. If any of these values are represented in your current data as strings, such as LOCAL, it will not be possible to successfully re-index with the correct mapping. In this instance the simplest fix is to modify your existing data to store the relevant fields as valid IP representations using the update_by_query method, documented here.
An example of this is shown below, which may need to be modified for the particular field that is causing problems:
POST winlogbeat-11.06.2021/_update_by_query
{
"script": {
"source": "ctx._source.source.ip = '127.0.0.1'",
"lang": "painless"
},
"query": {
"match": {
"source.ip": "LOCAL"
}
}
}
Note that this will need to be run for each index that contains problematic data before re-indexing can be completed.
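To check whether an index still contains documents with the problematic string value before re-indexing, a count query such as the following can be used; the winlogbeat-11.06.2021 index name and the LOCAL value mirror the example above and should be replaced with your own index and value.

GET winlogbeat-11.06.2021/_count
{
"query": {
"match": {
"source.ip": "LOCAL"
}
}
}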
For security, the self-signed certificates generated for use by LME at install time only remain valid for a period of two years, and LME will stop functioning once these certificates expire. In this case the certificates can be recreated by following the instructions detailed here.
If you encounter an error when the dashboards are updated using the dashboard update script, either manually or as part of automatic updates, this may mean that your current version of Elastic is too old to support the minimum functionality required for the new dashboard versions. Ensure that the latest supported version of the Elastic stack is in use with the following command:
cd /opt/lme/Chapter\ 3\ Files/
sudo ./deploy.sh update
Then upload the latest dashboards by following one of the methods described here.
If you are on Windows 2016 or higher and are getting error code 2150859027, or messages about HTTP URLs not being available in your Windows logs, we suggest looking at this guide.
LME currently runs using the docker stack deployment architecture.
To Stop LME:
sudo docker stack rm lme
To Start LME:
sudo docker stack deploy lme --compose-file /opt/lme/Chapter\ 3\ Files/docker-compose-stack-live.yml
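After starting the stack it can take a minute or two for all services to come up; a simple way to watch progress is shown below.

# List the LME services and their current replica state
sudo docker stack services lme
# Show the individual tasks, including any that are restarting or have failed
sudo docker stack ps lme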