Splunk Inc. is an American public multinational corporation based in San Francisco, California, that produces software for searching, monitoring, and analyzing machine-generated big data, via a Web-style interface.
Splunk (the product) captures, indexes, and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards, and visualizations.
Splunk’s mission is to make machine data accessible across an organization by identifying data patterns, providing metrics, diagnosing problems, and providing intelligence for business operations. Splunk is a horizontal technology used for application management, security and compliance, as well as business and web analytics.
The scope defines how you want to implement Splunk software and solutions in your data infrastructure.
- Splunk as a solution = Use Splunk software to address use cases for a single team, group or purpose.
- Splunk as a service = Use Splunk software to provide Splunk-related services for multiple teams, groups, and purposes.
- Splunk as a strategy = Use Splunk software to provide mature services that position Splunk as a competitive differentiator for your business.
- Go to https://www.splunk.com/ and click the “Free Splunk” tab. Fill in the requested details and click the “Software Download” option.
- Install the software using the default installer options. This installs the trial version, valid for 60 days from the date of download.
We need to configure forwarders (components that forward logs to the central Splunk installation) and receivers (components that process the forwarded logs, installed in the cloud or on a local system with sufficient bandwidth).
Step 1: After logging into Splunk, click Settings and then “Forwarding and receiving” to set up forwarders and receivers.
Step 2: Next, we configure “receiving” on the machine where we installed the full Splunk Enterprise instance.
Step 3: The default receiving port is 9997, and we will use it.
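For reference, enabling receiving through the UI is equivalent to adding a stanza to inputs.conf on the indexer. This is a sketch of the standard Splunk configuration; the exact path may vary by installation:

```ini
# $SPLUNK_HOME/etc/system/local/inputs.conf on the receiving indexer
[splunktcp://9997]
disabled = 0
```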
- The forwarder must be installed on the machine where the logs are generated. The installer can be downloaded from https://www.splunk.com/en_us/download/universal-forwarder.html
Follow the same process used to download and install Splunk Enterprise; the only difference is that the forwarder is installed on the machine where the logs are generated.
For example, if machine A generates logs and machine B receives them, then machine B gets the full Splunk Enterprise installation and machine A gets only the Splunk Universal Forwarder (or another forwarder type, depending on the requirement).
We can choose which events, logs, etc. to monitor as per our requirements. When monitoring a new process, it is best to check all the options initially.
We will also configure the hostname for the Deployment Server (if the forwarder is located on a different machine).
Hostname = the address of the receiving server; the default management port is 8089. Click Next.
We then need to specify the receiving indexer, which has the same address as the receiver (where Splunk Enterprise is installed); the only difference is the port number, 9997. The rest of the installation uses default settings.
Note: we installed the Universal Forwarder for this document.
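On the forwarder side, the same forwarding target set during installation can be expressed in outputs.conf. This is a sketch of the standard Splunk configuration; <receiver-address> is a placeholder for your indexer's address:

```ini
# $SPLUNK_HOME/etc/system/local/outputs.conf on the forwarder machine
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = <receiver-address>:9997
```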
Step 4: We navigate to the “Search & Reporting” tab and click Data Summary.
We should see that the host is available and that the last update is recent.
Extraction of Raw Data
Step 1: Click the Settings tab -> Data Inputs.
Step 2: Click “Add New” for the “Files & Directories” option.
Enter the folder address with the path shown below
Enter %userprofile%\AppData\Local\UiPath\Logs in the Run prompt to get the absolute folder path.
For example, the generic path above resolves to the local folder path C:\Users\VinayYadatiNagaraj\AppData\Local\UiPath\Logs.
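The same expansion can be reproduced programmatically. The sketch below uses Python's ntpath.expandvars, which understands the Windows %var% syntax on any platform; the username in the example is hypothetical:

```python
import ntpath
import os

# %userprofile% resolves against the userprofile environment variable on Windows
log_dir = r"%userprofile%\AppData\Local\UiPath\Logs"

# Simulate the Windows environment for the example (hypothetical user)
os.environ["userprofile"] = r"C:\Users\ExampleUser"

print(ntpath.expandvars(log_dir))
# C:\Users\ExampleUser\AppData\Local\UiPath\Logs
```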
Click Next, then Review. There we can create our own source type name, which will help us when searching for the logs we need in the search window. Then click Submit, followed by Done.
Click on “Start Searching” once file/folder input has been given.
Step 3: Then, in the search window, run a search on the source type name we just created. Set the time filter to “All time” in the Others section and run in Verbose mode.
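For example, assuming the source type was named UiPathLogs (the name used later in this document), the search would simply be:

```
sourcetype=UiPathLogs
```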
Step 4: By default, the search results contain only default fields such as “_time”, “host”, “source”, and “sourcetype” (which holds the custom name we created while importing the data). If we want to extract the machine name, robot name, or any other field of interest, we can do so by clicking “Extract New Fields”.
For example, we have chosen the machine name as the field that interests us in the first step of extraction.
Step 5: Then select a sample from the sequence of generated logs by clicking on a log in which the field we are looking for appears, in this case the machine name.
When selected, the log is highlighted in dark blue. The words highlighted inside it are field names that were extracted before this document was prepared; ideally, when this step is performed for the first time, no words will be highlighted.
Step 6: We will then click on Next and select “Regular Expression” and again click on Next to move to the “Select Fields” section.
Step 7: We will then select the word of interest in our sample log, in this example the machine name. This prompts us for the field name under which machine names will be stored for all logs of the same structure, as matched by the regular expression.
We will then click Add Extraction and click Next.
Step 8: We can validate the extraction in the next section if needed, or simply go ahead and save.
Step 9: Then we come back to the home search page and search using the two fields:
1) sourcetype = the custom source type name we created for our logs, and
2) host (machine name) = the name of the machine present in our logs.
Note: sometimes the Splunk-generated extraction is not correct, so we may have to refine the regular expression by trial and error.
For example, for the process name field we had to customize the regular expression as seen below.
processName = “processName”:”(?P<processName>\w+(\W\w+)*)
The field name is on the left-hand side and the regular expression on the right-hand side. The (?P<processName>...) construct is a named capture group, a syntax that originated in Python's regex flavor; <processName> is the name of the field being captured.
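The behavior of this named-capture regex can be checked outside Splunk. The sketch below applies the same pattern to a hypothetical UiPath log line (the process name and log structure are illustrative assumptions):

```python
import re

# The extraction regex from above; (?P<processName>...) names the captured group
pattern = re.compile(r'"processName":"(?P<processName>\w+(\W\w+)*)')

# Hypothetical UiPath log line for illustration
sample = '{"message":"Starting","processName":"Invoice_Processing","fileName":"Main.xaml"}'

match = pattern.search(sample)
print(match.group("processName"))
# Invoice_Processing
```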
Similarly, we can extract other fields such as robotName, processName, and fileName. We can also add a few customized log fields in UiPath, which will then be added here alongside the original logs.
Creation of Dashboards
Step 1: Go to the search tab and write a basic SPL (Search Processing Language) query to get a basic statistics report, as seen below. A table command is used after “|” to get a table format.
sourcetype=UiPathLogs (message=Starting OR message=Ending) StateName=* robotName=* processName=* fileName=* | table _time processName fileName message
Initially, run in Verbose mode; after the fields have been defined, switch to Fast mode for faster retrieval.
The string before “|” fetches from our logs the raw events we want to process, and the string after it lists the field names we want in our table. _time is a predefined Splunk field.
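The filter-then-project semantics of the pipe can be sketched in ordinary Python. The events below are hypothetical; Splunk's actual execution is distributed across indexers:

```python
# Hypothetical parsed events, as Splunk would hold them after field extraction
events = [
    {"_time": "2020-01-01 10:00:00", "processName": "InvoiceBot",
     "fileName": "Main.xaml", "message": "Starting"},
    {"_time": "2020-01-01 10:05:00", "processName": "InvoiceBot",
     "fileName": "Main.xaml", "message": "Ending"},
    {"_time": "2020-01-01 10:06:00", "processName": "InvoiceBot",
     "fileName": "Main.xaml", "message": "Info"},
]

# Before the pipe: keep only the matching events
matched = [e for e in events if e["message"] in ("Starting", "Ending")]

# After the pipe: project the requested columns, like "| table ..."
table = [(e["_time"], e["processName"], e["fileName"], e["message"]) for e in matched]

for row in table:
    print(row)
```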
Step 2: We then click the “Save As” tab to create a dashboard panel.
We can save the dashboard under a customized name best representing our project.
Step 3: If we use a created dashboard frequently, we can place it on the home screen for easy access when we log in to Splunk. Click the Dashboards tab, click Edit next to the newly created dashboard, and select “Set as Home Dashboard”.
Step 4: Creation of graphs and dashboards can also be fast-tracked by navigating to https://splunkbase.splunk.com/app/1603/ and downloading the Splunk Dashboard Examples app. Install the app and it will appear on the home screen as “Splunk Dashboard Examples”.
Splunk High-Level Architecture
Splunk Multi-Tier Architecture
The Splunk forwarder is the component used for collecting logs. If you want to collect logs from a remote machine, you can do so using Splunk's remote forwarders, which are independent of the main Splunk instance. The different types of Splunk forwarders are described below.
A Universal Forwarder is a simple component which performs minimal processing on the incoming data streams before forwarding them to an indexer. It forwards all the data from the input logs folder for processing.
A heavy forwarder typically parses the data at the source and intelligently routes it to the indexer, saving bandwidth and storage space. So, when a heavy forwarder parses the data, the indexer only needs to handle the indexing segment.
An indexer is the Splunk component which you will have to use for indexing and storing the data coming from the forwarder. Splunk instance transforms the incoming data into events and stores it in indexes for performing search operations efficiently. If you are receiving the data from a Universal forwarder, then the indexer will first parse the data and then index it. Parsing of data is done to eliminate the unwanted data. But, if you are receiving the data from a Heavy forwarder, the indexer will only index the data.
As the Splunk instance indexes your data, it creates a number of files. These files contain one of the below:
- Raw data in compressed form.
- Indexes that point to raw data (index files, also referred to as tsidx files), plus some metadata files.
Splunk Search Head
The search head is the component used for interacting with Splunk. It provides a graphical user interface for performing various operations. It can be installed on a separate server or on the same server as other Splunk components. There is no separate installation file for the search head; you simply enable the Splunk Web service on the Splunk server.
A Splunk instance can function as both a search head and a search peer. A search head that performs only searching, not indexing, is referred to as a dedicated search head, whereas a search peer performs indexing and responds to search requests from other search heads.
In a Splunk instance, a search head can send search requests to a group of indexers, or search peers, which perform the actual searches on their indexes. The search head then merges the results and sends them back to the user. This is a faster technique to search for data called distributed searching.
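Distributed searching is essentially a scatter-gather pattern. Here is a minimal conceptual sketch; the peer names, event data, and merge-by-time ordering are illustrative assumptions, not Splunk internals:

```python
# Each search peer holds its own slice of the indexed events (hypothetical data)
peers = {
    "indexer-1": [{"_time": 3, "host": "robotA"}, {"_time": 1, "host": "robotB"}],
    "indexer-2": [{"_time": 2, "host": "robotC"}],
}

def search_peer(events, predicate):
    """Each peer runs the search over its own indexes."""
    return [e for e in events if predicate(e)]

def distributed_search(peers, predicate):
    """The search head fans the request out and merges the partial results."""
    partial = [search_peer(events, predicate) for events in peers.values()]
    merged = [e for part in partial for e in part]
    # Merge step: present events newest-first, as a time-ordered search would
    return sorted(merged, key=lambda e: e["_time"], reverse=True)

results = distributed_search(peers, lambda e: True)
for event in results:
    print(event)
```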
Hope this has helped you!