Splunk Technical Blog Part 1: Technical Support Cases, Missing Events Edition

Introduction

Hello, this is the Splunk support team at Macnica.
This may seem sudden, but how do you, as a Splunk user, handle maintenance and troubleshooting?
A great deal of technical information about Splunk is available on the web, including Splunk's own public documentation. Still, when a problem actually occurs, it can be hard to know how to apply that information. Our support staff work with customers in that situation every day to help resolve their issues.
In this article, we use a case of missing events in Splunk as an example of how such an issue is handled.

Issue: missing events

One of several indexers in the customer's environment went down, and some of the data sent during the outage was not indexed as events.

Customer requests

  • I want to import missing events into the indexer
  • I want to know how to prevent it from happening again

Inquiries

We received the following inquiries from the customer.

  1. How can the missing events be imported into Splunk (the indexer)?
  2. The forwarder has load balancing configured, but can events still be lost if an indexer fails?
  3. What should be done as a countermeasure?

Path to resolution

① How to import missing events into Splunk (indexer)

We provided the customer with the following steps.

1. Re-import the original data (log file) with the add oneshot command

Re-import the source data to recover the missing events. Because add oneshot ingests a whole file at a time, events from that file that were already indexed are ingested again and become duplicates.
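When several log files cover the outage window, the re-import can be scripted. The following is a minimal sketch, not the procedure used in this case: it only builds the `splunk add oneshot` command lines with the `-index` and `-sourcetype` options. The install path `/opt/splunk`, the file names, and the sourcetype `app_log` are assumptions for illustration.

```python
import subprocess

# Assumed install location; adjust to the actual $SPLUNK_HOME.
SPLUNK_BIN = "/opt/splunk/bin/splunk"


def build_oneshot_command(log_file, index, sourcetype):
    """Return the argument list for one `splunk add oneshot` invocation."""
    return [
        SPLUNK_BIN, "add", "oneshot", log_file,
        "-index", index,
        "-sourcetype", sourcetype,
    ]


def reimport(log_files, index, sourcetype, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    commands = [build_oneshot_command(f, index, sourcetype) for f in log_files]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical files covering the outage window.
missing_files = ["/var/log/app/app.log.1", "/var/log/app/app.log.2"]
commands = reimport(missing_files, index="AAAA", sourcetype="app_log")
```

Running with the default `dry_run=True` makes it possible to review the exact files and target index before anything is ingested, which matters because every re-imported file can introduce duplicates.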

2. Find the duplicate data

Search for the duplicated events with the following SPL. In the example, replace AAAA with the index that contains the duplicate events and BBBB with the data source that has duplicate events.

(Command example)

index="AAAA" source="BBBB"
| eval cd=_cd
| search [ search index="AAAA" source="BBBB"
    | stats count last(_cd) AS cd by _raw
    | search count > 1
    | fields cd ]

3. Remove the duplicate data with the delete command

After confirming that the events returned by the search above are the ones to be deleted, append the delete command to the end of the command example and run it to remove the duplicate data.

*The delete command only flags the target events as deleted so that they can no longer be searched. The data itself is not removed from disk, so the command cannot be used to free up disk space.
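To make the selection logic of the duplicate search easier to follow, here is a small Python sketch that mimics the `stats count last(_cd) AS cd by _raw | search count > 1` part on in-memory data. It is an illustration only; the `cd` values and event texts are made up, and the function name is hypothetical.

```python
from collections import defaultdict


def select_duplicates_for_deletion(events):
    """Mimic `stats count last(_cd) AS cd by _raw | search count > 1`.

    `events` is a list of (cd, raw) tuples in search-result order.
    Returns the cd of one copy for every _raw value seen more than once.
    """
    cds_by_raw = defaultdict(list)
    for cd, raw in events:
        cds_by_raw[raw].append(cd)
    # last(_cd) keeps the last value seen within each _raw group.
    return [cds[-1] for cds in cds_by_raw.values() if len(cds) > 1]


# Made-up sample: "login ok" was indexed twice, "disk full" three times.
events = [
    ("12:101", "login ok"),
    ("12:102", "logout"),
    ("15:7", "login ok"),
    ("15:8", "disk full"),
    ("15:9", "disk full"),
    ("16:1", "disk full"),
]

to_delete = select_duplicates_for_deletion(events)
print(to_delete)  # ['15:7', '16:1']
```

The sketch shows two properties worth checking before deleting anything. Only one copy per distinct _raw is selected on each run, so an event that was ingested three times needs the search to be run again until it returns no results. Also, any two events with identical _raw text are treated as duplicates, even if they were genuinely logged twice.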

② The forwarder has load balancing configured, but can events still be lost if an indexer fails?

Normally, when load balancing is configured, the forwarder switches the destination indexer according to its settings. If an indexer goes down, the forwarder changes the destination from indexer A to indexer B, as shown in the figure below. At first glance, it seems that events should never be lost.

(Figure: normal operation)

However, events can still be lost in the following cases:

Case 1. The data is lost on the network path after the forwarder sends it.

Case 2. The indexer goes down after the forwarder sends the data, so the data is never received.

Case 3. The indexer receives the data but goes down before it can index the data as events.

(Figures: Case 1, Case 2, Case 3)

③ What should be done as a countermeasure?

Our investigation found that, in the customer's environment, the indexer outage caused events to be lost through Cases 2 and 3. The following measures can be taken to prevent event loss in the future.

・Use the indexer acknowledgment feature

Indexer acknowledgment is a countermeasure for Cases 2 and 3 (it is also effective for Case 1). Once the data sent by the forwarder has been written to the indexer's disk, the indexer returns an ACK to the forwarder. The forwarder resends any data for which it receives no ACK, which prevents data loss. Enabling this feature has trade-offs, however: data ingestion may be delayed, and if an ACK fails to reach the forwarder because of a network failure, the forwarder resends data that was already indexed, resulting in duplicate events.
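In Splunk, indexer acknowledgment is enabled with the `useACK = true` setting in the forwarder's outputs.conf. The toy model below is not Splunk's implementation; it is a simplified sketch of the behavior described above, with hypothetical class and function names. Here `up = False` stands for an indexer that fails just after the forwarder has sent a block, and `ack_lost = True` stands for an ACK dropped on the network.

```python
class Indexer:
    def __init__(self):
        self.up = True          # False: indexer failed right after the send
        self.ack_lost = False   # True: ACK is dropped on the way back
        self.disk = []          # blocks actually written to disk

    def receive(self, block):
        """Return True if the forwarder gets an ACK for this block."""
        if not self.up:
            return False        # Cases 2 and 3: nothing is written
        self.disk.append(block)
        return not self.ack_lost


def forward(blocks, indexers, use_ack):
    """Send each block; with use_ack, try the next indexer until ACKed."""
    for block in blocks:
        for indexer in indexers:
            acked = indexer.receive(block)
            if acked or not use_ack:
                break  # without ACK the forwarder assumes success


def run(use_ack, a_up=True, a_ack_lost=False):
    a, b = Indexer(), Indexer()
    a.up, a.ack_lost = a_up, a_ack_lost
    forward(["event-1"], [a, b], use_ack)
    return a.disk + b.disk


print(run(use_ack=False, a_up=False))      # [] -> event lost
print(run(use_ack=True, a_up=False))       # ['event-1'] -> resent to B
print(run(use_ack=True, a_ack_lost=True))  # ['event-1', 'event-1'] -> duplicate
```

The three runs correspond to the trade-off in the text: without acknowledgment the event disappears, with acknowledgment it is resent to another indexer, and a lost ACK causes the same event to be indexed twice.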

In this case, we also found that antivirus software was causing the indexer to go down.

・Antivirus scan exclusion settings

It is recommended to exclude Splunk processes and the Splunk installation directory from antivirus scanning, and the customer configured these exclusions.

Result

The case has been closed, and to date we have received no reports of missing events recurring.

Conclusion

Thank you for reading to the end. We hope this article is of some help to you.
