CSV data field extraction method

Release date: 2018-06-18
Last updated: 2023-11-27
Version: Splunk Enterprise 9.0.4
Overview: How to import a CSV file and extract fields
Reference information

Contents

There are three ways to import a CSV file and extract fields.

  1. How to extract fields when searching
  2. How to extract fields when importing data
     2-1. How to extract fields using Splunk Web
     2-2. How to extract fields by writing them in a configuration file

1. How to extract fields when searching

To extract fields at search time, configure the following settings in the configuration files.

<Overview>

  1. Describe the settings for field extraction in props.conf and transforms.conf. (Details are explained later.)
  2. Write the source type created in step 1 to inputs.conf. (Details are explained later.)
  3. If you changed inputs.conf in step 2, restart the Splunk service on that server for the settings to take effect.
===
$SPLUNK_HOME/bin/splunk restart
===

*$SPLUNK_HOME is the installation directory. By default it is as follows.

Linux :
Splunk Enterprise : /opt/splunk
Universal Forwarder : /opt/splunkforwarder
Windows :
Splunk Enterprise : C:\Program Files\Splunk
Universal Forwarder : C:\Program Files\SplunkUniversalForwarder
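
To check which settings are actually in effect before and after a restart, you can use the btool utility (a quick check, run on the server holding the settings; the stanza names testcsv and test_header match the examples later in this article):

===
$SPLUNK_HOME/bin/splunk btool props list testcsv --debug
$SPLUNK_HOME/bin/splunk btool transforms list test_header --debug
===

The --debug option shows which configuration file each effective setting comes from.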

We will introduce two ways to set up steps 1 and 2 above.

Method 1: Extract using delimiter (comma) after importing data

For data that can be split simply by a delimiter (a comma in CSV), define the delimiter (DELIMS) and the field names (FIELDS) in transforms.conf to perform field extraction.

<Application>

Method 1 keeps working even when columns are added to the data, so it is the recommended method. An added column is not extracted automatically, but it can be extracted by adding its field name to the FIELDS setting (see the example after the notes below).

< Example data >

2015/01/01 00:00:00, aaaa, bbbb, ccc, ddd
2015/01/01 00:00:01,111,2222,333,ddd

<Setting example>

props.conf
Location: Splunk server that performs the search
Setting Example:

[testcsv]
KV_MODE = none
REPORT-testHeader = test_header

transforms.conf
Location: Splunk server that performs the search
Setting Example:

[test_header]
DELIMS = ","
FIELDS = "header_field1", "header_field2", "header_field3", "header_field4", "header_field5"

inputs.conf (optional: skip if already configured)
Location: Splunk server or forwarder for data input
Setting Example:

[monitor:///opt/test/data/test1.csv]
sourcetype=testcsv

*Specify the delimiter with DELIMS.
*Define the field names with FIELDS.
*Use only alphanumeric characters and underscores in field names.
*You do not need to restart the Splunk service after changing the props.conf and transforms.conf settings. After changing the inputs.conf settings, you do need to restart the Splunk service.
*If the data contains CSV escape characters (\\ or "), Method 1 cannot extract the fields. In that case, use Method 2.
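
As an example of the column addition described under <Application>, suppose a sixth column is appended to each row (the field name header_field6 is only an illustration). Adding it to FIELDS in transforms.conf is enough; the existing five fields continue to be extracted either way:

[test_header]
DELIMS = ","
FIELDS = "header_field1", "header_field2", "header_field3", "header_field4", "header_field5", "header_field6"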

Method 2: Extract using regular expressions after importing data

Field extraction is performed using regular expressions (REGEX) in the transforms.conf settings.

<Application>

Method 2 stops working when columns are added, so it is recommended only for extractions that Method 1 cannot handle. (If the regular expression no longer matches because a column was added, neither the existing fields nor the added one are extracted.)

< Example data >

2015/01/01 00:00:00, aaaa, bbbb, ccc, ddd
2015/01/01 00:00:01,111,2222,333,ddd

<Setting example>

props.conf
Location: Splunk server that performs the search
Setting Example:

[testcsv]
KV_MODE = none
REPORT-testHeader = test_header

transforms.conf
Location: Splunk server that performs the search
Setting Example:

[test_header]
REGEX = ^(?<header_field1>[^,]+),(?<header_field2>[^,]+),(?<header_field3>[^,]+),(?<header_field4>[^,]+),(?<header_field5>[^,]+)

inputs.conf (optional: skip if already configured)
Location: Splunk server or forwarder for data input
Setting Example:

[monitor:///opt/test/data/test2.csv]
sourcetype=testcsv

*header_field1 through header_field5 are the field names. (Specify each field name inside the <> of the named capture group.)
*You do not need to restart the Splunk service after changing the props.conf and transforms.conf settings. After changing the inputs.conf settings, you do need to restart the Splunk service.
*The regular expression above is just an example. Create one that matches your actual data.
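
Once the settings are in place, a simple search confirms that the fields are extracted at search time (a minimal check, assuming only the source type and field names used in this article):

sourcetype=testcsv
| table _time, header_field1, header_field2, header_field3, header_field4, header_field5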

2. How to extract fields when importing data
There are two ways to extract fields when importing data: configuring from Splunk Web and configuring by writing in a configuration file.
2-1. How to extract fields using Splunk Web
The procedure differs depending on whether or not the CSV file has a header row.

*Settings using this method are limited to the following environments. For other environments, use "2-2. How to extract fields by writing them in a configuration file."

  • Standalone environment
  • Heavy forwarder with Splunk Web enabled

If the CSV file has a header

  1. Log in to Splunk Web.
  2. Click Settings > Add Data.
  3. Select "Upload" on the Add Data screen.
  4. Drag and drop the CSV file onto the source selection screen.
  5. Click the "Next" button.
  6. Click the source type button on the source type setting screen and select Structured > csv.
     (If the source type is already "csv", leave it as is.)
  7. Make sure that the displayed event values are separated into fields.
  8. (Optional) Configure other settings.
  9. Click the "Save As" button.
  10. Set the source type name, destination app, etc., and click the "Save" button.
  11. Click the "Next" button.
  12. Specify the host and index, and click the "Confirm" button.
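
For reference, saving the source type in step 10 writes a stanza like the following to props.conf in the app you selected (a sketch of the typical result, assuming the source type name testcsv; the exact attributes depend on the options chosen on the screen):

[testcsv]
INDEXED_EXTRACTIONS = csv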

2-2. How to extract fields by writing them in a configuration file
The procedure differs depending on whether or not the CSV file has a header row.

If the CSV file has a header

The field names are taken from the header row when the CSV file is imported.

< Example data >

field1,field2,field3,field4,field5
2015/01/01 00:00:00,aaaa,bbbb,ccc,ddd
2015/01/01 00:00:01,111,2222,333,ddd

*The first line is the header line.

<Setting example>

props.conf
Location: Splunk server or forwarder with data input settings
Setting Example:

[testcsv]
INDEXED_EXTRACTIONS=csv

inputs.conf
Location: Splunk server or forwarder with data input settings
Setting Example:

[monitor:///opt/test/data/test2.csv]
sourcetype=testcsv

fields.conf
Location: Splunk server that performs the search
Setting Example:

[sourcetype::testcsv::*]
INDEXED=True

*After changing the props.conf and inputs.conf settings, you need to restart the Splunk service.
*After changing the settings in fields.conf, access the URL below and press the [refresh] button to apply the settings.

http://<Splunk server IP address>:8000/debug/refresh
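
If the timestamp should be taken from a specific header column rather than detected automatically, the props.conf stanza can also specify it (an optional addition; field1 holds the timestamp in the example data above, and the format string matches 2015/01/01 00:00:00):

[testcsv]
INDEXED_EXTRACTIONS=csv
TIMESTAMP_FIELDS=field1
TIME_FORMAT=%Y/%m/%d %H:%M:%S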

If the CSV file has no header

Specify header fields when importing a CSV file.

< Example data >

2015/01/01 00:00:00, aaaa, bbbb, ccc, ddd
2015/01/01 00:00:01,111,2222,333,ddd

<Setting example>

props.conf
Location: Splunk server or forwarder with data input settings
Setting Example:

[testcsv]
INDEXED_EXTRACTIONS=csv
FIELD_NAMES="header_field1", "header_field2", "header_field3", "header_field4", "header_field5"

inputs.conf
Location: Splunk server or forwarder with data input settings
Setting Example:

[monitor:///opt/test/data/test2.csv]
sourcetype=testcsv

fields.conf
Location: Splunk server that performs the search
Setting Example:

[sourcetype::testcsv::*]
INDEXED=True

*Define the field names with FIELD_NAMES. Use only alphanumeric characters and underscores.
*After changing the props.conf and inputs.conf settings, you need to restart the Splunk service.
*After changing the settings in fields.conf, access the URL below and press the [refresh] button to apply the settings.

http://<Splunk server IP address>:8000/debug/refresh
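
Because INDEXED=True in fields.conf marks these fields as indexed, they can also be queried with tstats, which is much faster than a normal search over raw events (a minimal example; replace index=main with the index you actually ingest into):

| tstats count where index=main sourcetype=testcsv by header_field1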

Supplementary information

Splunk identifies files to ingest by a checksum (CRC) of the first 256 bytes of each file.

If two different files begin with the same 256 bytes, which easily happens when CSV files share the same header, the new file is regarded as a file that has already been read and is not imported, even though its contents differ.

To have such files recognized as different files and imported, set the following in inputs.conf.

<Setting example>

[monitor:///tmp]
crcSalt = <SOURCE>

*Please change the path information (/tmp) for [monitor:///tmp] depending on your environment.
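
Alternatively, the number of bytes used for the checksum can be increased so that the files differ within the checked range (initCrcLength is a standard inputs.conf setting; 1024 is only an illustrative value and must exceed the length of the shared header):

[monitor:///tmp]
initCrcLength = 1024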

That's all.