CSV data field extraction method

Release date: 2018-06-18
Last updated: 2023-11-27
Version: Splunk Enterprise 9.0.4
Overview: How to import a CSV file and extract fields
Reference information

Contents

There are three ways to import a CSV file and extract fields.

  1. How to extract fields when searching
  2. How to extract fields when importing data
     2-1. How to extract fields using Splunk Web
     2-2. How to extract fields by writing them in a configuration file

1. How to extract fields when searching

To extract fields at search time, configure the following settings in the configuration files.

<Overview>

  1. Describe the settings for field extraction in props.conf and transforms.conf. (Details are explained later.)
  2. Write the source type created in step 1 to inputs.conf. (Details are explained later.)
  3. If you changed inputs.conf in step 2, restart the Splunk service on that server for the settings to take effect.
===
$SPLUNK_HOME/bin/splunk restart
===

*$SPLUNK_HOME is the installation directory. By default it is as follows.

Linux :
Splunk Enterprise : /opt/splunk
Universal Forwarder : /opt/splunkforwarder
Windows :
Splunk Enterprise : C:\Program Files\Splunk
Universal Forwarder : C:\Program Files\SplunkUniversalForwarder
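
To check which settings are actually in effect before and after a restart, you can use the btool utility (a quick check, run on the server holding the settings; the stanza names testcsv and test_header match the examples later in this article):

===
$SPLUNK_HOME/bin/splunk btool props list testcsv --debug
$SPLUNK_HOME/bin/splunk btool transforms list test_header --debug
===

The --debug option shows which configuration file each effective setting comes from.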

We will introduce two ways to set up steps 1 and 2 above.

Method 1: Extract using delimiter (comma) after importing data

For data that can be split simply by a delimiter (a comma in CSV), define the delimiter (DELIMS) and the field names (FIELDS) in transforms.conf to perform field extraction.

<Application>

Method 1 keeps working even when columns are added to the data, so it is the recommended method. An added column is not extracted automatically, but it can be extracted by adding its field name to the FIELDS setting (see the example after the notes below).

< Example data >

2015/01/01 00:00:00, aaaa, bbbb, ccc, ddd
2015/01/01 00:00:01,111,2222,333,ddd

<Setting example>

props.conf
Location: Splunk server that performs the search
Setting Example:

[testcsv]
KV_MODE = none
REPORT-testHeader = test_header

transforms.conf
Location: Splunk server that performs the search
Setting Example:

[test_header]
DELIMS = ","
FIELDS = "header_field1", "header_field2", "header_field3", "header_field4", "header_field5"

inputs.conf (optional: skip if already configured)
Location: Splunk server or forwarder for data input
Setting Example:

[monitor:///opt/test/data/test1.csv]
sourcetype=testcsv

*Specify the delimiter with DELIMS.
*Define the field names with FIELDS.
*Use only alphanumeric characters and underscores in field names.
*You do not need to restart the Splunk service after changing the props.conf and transforms.conf settings. After changing the inputs.conf settings, you do need to restart the Splunk service.
*If the data contains CSV escape characters (\\ or "), Method 1 cannot extract the fields. In that case, use Method 2.
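
As an example of the column addition described under <Application>, suppose a sixth column is appended to each row (the field name header_field6 is only an illustration). Adding it to FIELDS in transforms.conf is enough; the existing five fields continue to be extracted either way:

[test_header]
DELIMS = ","
FIELDS = "header_field1", "header_field2", "header_field3", "header_field4", "header_field5", "header_field6"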

Method 2: Extract using regular expressions after importing data

Field extraction is performed using regular expressions (REGEX) in the transforms.conf settings.

<Application>

Method 2 stops working when columns are added, so it is recommended only for extractions that Method 1 cannot handle. (If the regular expression no longer matches because a column was added, neither the existing fields nor the added one are extracted.)

< Example data >

2015/01/01 00:00:00, aaaa, bbbb, ccc, ddd
2015/01/01 00:00:01,111,2222,333,ddd

<Setting example>

props.conf
Location: Splunk server that performs the search
Setting Example:

[testcsv]
KV_MODE = none
REPORT-testHeader = test_header

transforms.conf
Location: Splunk server that performs the search
Setting Example:

[test_header]
REGEX = ^(?<header_field1>[^,]+),(?<header_field2>[^,]+),(?<header_field3>[^,]+),(?<header_field4>[^,]+),(?<header_field5>[^,]+)

inputs.conf (optional: skip if already configured)
Location: Splunk server or forwarder for data input
Setting Example:

[monitor:///opt/test/data/test2.csv]
sourcetype=testcsv

*header_field1 through header_field5 are the field names. (Specify each field name inside the <> of the named capture group.)
*You do not need to restart the Splunk service after changing the props.conf and transforms.conf settings. After changing the inputs.conf settings, you do need to restart the Splunk service.
*The regular expression above is just an example. Create one that matches your actual data.
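
Once the settings are in place, a simple search confirms that the fields are extracted at search time (a minimal check, assuming only the source type and field names used in this article):

sourcetype=testcsv
| table _time, header_field1, header_field2, header_field3, header_field4, header_field5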

2. How to extract fields when importing data
There are two ways to extract fields when importing data: configuring from Splunk Web and configuring by writing in a configuration file.
2-1. How to extract fields using Splunk Web
The procedure differs depending on whether or not the CSV file has a header row.

*Settings using this method are limited to the following environments. For other environments, use "2-2. How to extract fields by writing them in a configuration file."

  • Standalone environment
  • Heavy forwarder with Splunk Web enabled

If the CSV file has a header

  1. Log in to Splunk Web.
  2. Click Settings > Add Data.
  3. Select "Upload" on the Add Data screen.
  4. Drag and drop the CSV file onto the source selection screen.
  5. Click the "Next" button.
  6. Click the source type button on the source type setting screen and select Structured > csv.
     (If the source type is already "csv", leave it as is.)
  7. Make sure that the displayed event values are separated into fields.
  8. (Optional) Configure other settings.
  9. Click the "Save As" button.
  10. Set the source type name, destination app, etc., and click the "Save" button.
  11. Click the "Next" button.
  12. Specify the host and index, and click the "Confirm" button.
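
For reference, saving the source type in step 10 writes a stanza like the following to props.conf in the app you selected (a sketch of the typical result, assuming the source type name testcsv; the exact attributes depend on the options chosen on the screen):

[testcsv]
INDEXED_EXTRACTIONS = csv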

2-2. How to extract fields by writing them in a configuration file
The procedure differs depending on whether or not the CSV file has a header row.

If the CSV file has a header

The field names are taken from the header row when the CSV file is imported.

< Example data >

field1,field2,field3,field4,field5
2015/01/01 00:00:00,aaaa,bbbb,ccc,ddd
2015/01/01 00:00:01,111,2222,333,ddd

*The first line is the header line.

<Setting example>

props.conf
Location: Splunk server or forwarder with data input settings
Setting Example:

[testcsv]
INDEXED_EXTRACTIONS=csv

inputs.conf
Location: Splunk server or forwarder with data input settings
Setting Example:

[monitor:///opt/test/data/test2.csv]
sourcetype=testcsv

fields.conf
Location: Splunk server that performs the search
Setting Example:

[sourcetype::testcsv::*]
INDEXED=True

*After changing the props.conf and inputs.conf settings, you need to restart the Splunk service.
*After changing the settings in fields.conf, access the URL below and press the [refresh] button to apply the settings.

http://<Splunk server IP address>:8000/debug/refresh
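
If the timestamp should be taken from a specific header column rather than detected automatically, the props.conf stanza can also specify it (an optional addition; field1 holds the timestamp in the example data above, and the format string matches 2015/01/01 00:00:00):

[testcsv]
INDEXED_EXTRACTIONS=csv
TIMESTAMP_FIELDS=field1
TIME_FORMAT=%Y/%m/%d %H:%M:%S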

If the CSV file has no header

Specify header fields when importing a CSV file.

< Example data >

2015/01/01 00:00:00, aaaa, bbbb, ccc, ddd
2015/01/01 00:00:01,111,2222,333,ddd

<Setting example>

props.conf
Location: Splunk server or forwarder with data input settings
Setting Example:

[testcsv]
INDEXED_EXTRACTIONS=csv
FIELD_NAMES="header_field1", "header_field2", "header_field3", "header_field4", "header_field5"

inputs.conf
Location: Splunk server or forwarder with data input settings
Setting Example:

[monitor:///opt/test/data/test2.csv]
sourcetype=testcsv

fields.conf
Location: Splunk server that performs the search
Setting Example:

[sourcetype::testcsv::*]
INDEXED=True

*Define the field names with FIELD_NAMES. Use only alphanumeric characters and underscores.
*After changing the props.conf and inputs.conf settings, you need to restart the Splunk service.
*After changing the settings in fields.conf, access the URL below and press the [refresh] button to apply the settings.

http://<Splunk server IP address>:8000/debug/refresh
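
Because INDEXED=True in fields.conf marks these fields as indexed, they can also be queried with tstats, which is much faster than a normal search over raw events (a minimal example; replace index=main with the index you actually ingest into):

| tstats count where index=main sourcetype=testcsv by header_field1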

Supplementary information

Splunk identifies files to ingest by a checksum (CRC) of the first 256 bytes of each file.

If two different files begin with the same 256 bytes, which easily happens when CSV files share the same header, the new file is regarded as a file that has already been read and is not imported, even though its contents differ.

To have such files recognized as different files and imported, set the following in inputs.conf.

<Setting example>

[monitor:///tmp]
crcSalt = <SOURCE>

*Please change the path information (/tmp) for [monitor:///tmp] depending on your environment.
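
Alternatively, the number of bytes used for the checksum can be increased so that the files differ within the checked range (initCrcLength is a standard inputs.conf setting; 1024 is only an illustrative value and must exceed the length of the shared header):

[monitor:///tmp]
initCrcLength = 1024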

That's all.