Splunk

NTT Smart Connect Co., Ltd.

NTT Smart Connect strengthens its streaming business: Splunk centralizes the collection and analysis of a wide variety of logs and cuts the time to identify the cause of a failure to one-tenth

Before
  • Maintaining streaming service quality and recovering quickly from failures were ongoing challenges
  • Log analysis for troubleshooting depended entirely on skilled engineers
  • Because log analysis depended on individual skills, analysis know-how was lost whenever personnel were transferred
After
  • Streaming service quality is maintained by centrally analyzing a variety of logs
  • Multifaceted log analysis is easier, and the time to identify the cause of a failure has dropped to one-tenth
  • Work efficiency improves by registering engineers' analysis know-how as templates
NTT Smart Connect Corporation
Service Operation Department
Chief manager Mr. Yoshitaka Inoue

NTT Smart Connect Corporation
Service Operation Department
Mr. Naoto Tsuboi

Demand for high streaming quality makes quick recovery from failures an important task

NTT Smart Connect Co., Ltd. (hereafter, NTT Smart Connect) supports the strategic IT businesses of leading companies through three pillars: a "housing business" that provides services at one of the largest data centers in western Japan, a "cloud business" that offers comprehensive cloud services for everyone from beginners to professional users, and a "streaming business" that provides highly reliable streaming services built on its data center.

In particular, the streaming business has handled numerous video distribution services, such as live distribution of high school baseball, since the early days of the Internet, and the industry-leading reliability built on many years of experience and results is one of its strengths. Its core offering is "SmartSTREAM", a multi-device on-demand distribution service that delivers high-quality streaming in real time.

In recent years especially, the spread of broadband, including wireless, and the rapid adoption of smart devices have driven an explosive increase in video content for the web. Streaming services are now expected to match broadcast quality, and the requirements grow more complex every year.

However, streaming over Internet lines can suffer from problems such as slow playback, distorted image quality, or interrupted video because of various internal and external factors. For this reason, maintaining service quality and recovering quickly in the unlikely event of a delay or failure have become important tasks across the SmartSTREAM services.

"We investigate causes and locate failures by collecting and analyzing the logs of the streaming application, but this usually takes a long time and is difficult work. In the past, we had no choice but to rely on experienced, skilled engineers," says Yoshitaka Inoue, Manager of the Service Operation Department at NTT Smart Connect.

The servers installed for SmartSTREAM number on the order of 100, and many networks and applications are associated with them, so it is difficult to determine accurately under what circumstances traffic dropped. For this reason, engineers worked from a huge volume of logs, such as server load and network traffic status, analyzing each one with UNIX commands, writing their own scripts to format the logs, and so on. The work reportedly relied mainly on manual handling by individual engineers.

"However, if the skills remain dependent on individuals, the workload cannot be distributed, and the analysis know-how is lost when personnel are transferred. We wanted to introduce a system that could integrate and analyze the logs." (Mr. Inoue)

Selected Splunk for its log collection and analysis functions, ease of use, and analysis response speed

Mr. Naoto Tsuboi of the Service Operation Department at NTT Smart Connect explains the system selection criteria as follows. "We focused on a rich set of functions that can centrally collect log data from network equipment such as load balancers and switches through to servers and applications and analyze it across all of those sources, on dashboards that help maintain and improve service operation quality, and on ease of use."

Because the system is expected to scale out and the volume of logs to grow in the future, high performance with fast analysis response was also a requirement.

Starting around June 2014, the company compared and evaluated multiple integrated log management systems, and the one that met all of the above conditions was Splunk Enterprise (hereinafter referred to as Splunk). Macnica was selected as the vendor. Mr. Tsuboi goes on to explain the reason for this.

"Macnica has a long history of selling and supporting Splunk products and a rich track record of implementing them. As the only certified training partner in Japan, its capabilities are assured, and it continues to offer training courses for operations managers even after Splunk has been implemented. We judged that by taking these courses we could acquire the deeper knowledge needed for construction and operation."

Mr. Inoue also points out that what happens after Splunk is introduced is what matters, and that operational skills needed to improve. "Before the implementation, Macnica not only demonstrated the analysis that could be done with Splunk using actual data, but we also appreciated the effort they put into consultations to make sure Splunk would be operated properly."

Registering know-how as templates removes the dependence of analysis on individual skills and improves work efficiency

The company officially decided to adopt Splunk in September 2014. It was introduced in October and went into operation in the production environment.

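Mr. Inoue, who says the company will verify the effects in earnest going forward, already clearly feels the change since Splunk went into operation. "Now that we have consolidated the large volume of logs from each device into Splunk and can get the state of the system immediately from a dashboard, log analysis has become much easier. Analysis results can be aggregated freely, and the search function is powerful. By searching the logs of multiple related servers together in one place and analyzing them, my sense is that the time to identify a cause has been cut to about one-tenth of what it used to be."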
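The idea of searching the logs of several related servers together can be illustrated with a minimal Python sketch. This is not Splunk's search language or API; the source names (`lb01`, `edge02`, `app01`), the log line format, and the `search_across` helper are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical sample logs keyed by source; the line format is an assumption.
LOGS = {
    "lb01": [
        "2014-10-01T12:00:03 WARN backend timeout edge02",
        "2014-10-01T12:05:00 INFO health check ok",
    ],
    "edge02": [
        "2014-10-01T12:00:01 ERROR disk latency high",
        "2014-10-01T11:30:00 INFO segment served",
    ],
    "app01": [
        "2014-10-01T12:00:05 ERROR stream stalled session=42",
    ],
}


def parse(source, line):
    """Split one line into (timestamp, source, level, message)."""
    stamp, level, message = line.split(" ", 2)
    return datetime.fromisoformat(stamp), source, level, message


def search_across(logs, center, window_seconds, levels=("WARN", "ERROR")):
    """Return matching events from every source, ordered by time."""
    window = timedelta(seconds=window_seconds)
    events = [parse(src, line) for src, lines in logs.items() for line in lines]
    hits = [e for e in events if abs(e[0] - center) <= window and e[2] in levels]
    return sorted(hits, key=lambda e: e[0])


if __name__ == "__main__":
    # Look at warnings and errors from all sources within 60 s of the incident.
    for stamp, source, level, message in search_across(
        LOGS, datetime(2014, 10, 1, 12, 0, 0), 60
    ):
        print(stamp.isoformat(), source, level, message)
```

Merging every source into one time-ordered view is what makes it possible to see, in a single pass, which component reported trouble first, instead of inspecting each server's log separately.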

Mr. Inoue's assessment is that for NTT Smart Connect, which has a heterogeneous system configuration, Splunk supports a wide variety of logs, offers excellent functional coverage, and has the potential to turn huge amounts of data into better service quality.

Mr. Tsuboi, for his part, says, "The analysis know-how that skilled engineers used to hold individually can be converted into templates in Splunk and registered in large numbers, so I think work efficiency will improve dramatically." Where a template alone is not enough, users can carry out detailed analysis themselves through the GUI. Mr. Inoue points out that because Splunk is not limited to engineers and can be used for analysis by sales representatives close to the field without any special skills, it becomes easier to look at things from a different point of view.

Going forward, to enable troubleshooting and traffic reporting within the sales department, the company will create a dashboard that shares service usage by hour in real time, and it also plans to have sales staff use these reports as a sales tool.
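As a rough picture of what "service usage by hour" means as a computation, the following Python sketch counts access events per hour. It only illustrates the aggregation; the timestamp format and the `usage_by_hour` function are hypothetical and do not reflect how the planned Splunk dashboard is built.

```python
from collections import Counter
from datetime import datetime


def usage_by_hour(timestamps):
    """Count events per hour bucket from ISO 8601 timestamp strings."""
    counts = Counter(
        datetime.fromisoformat(t).strftime("%Y-%m-%d %H:00") for t in timestamps
    )
    # Sort by bucket label so the report reads in chronological order.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical access times for one streaming service.
    sample = [
        "2014-10-01T12:00:03",
        "2014-10-01T12:59:59",
        "2014-10-01T13:00:00",
    ]
    for hour, count in usage_by_hour(sample).items():
        print(hour, count)
```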

"I feel that Splunk is a rare tool with high expressiveness and flexibility: the more thought you put into it, the more it lets you do." (Mr. Tsuboi)

Clarifying the target of analysis is the key to expanding the scope of Splunk's use

Now that the company has a workable outlook for responding promptly after a failure occurs, NTT Smart Connect plans to accumulate incident cases, use them for trend analysis, and eventually extend this to detecting signs of failures and bottlenecks. Mr. Inoue has high hopes for this: "If we can detect that logs similar to past anomalies (exceptional events) have been picked up, it will be possible to predict trouble before it occurs."
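One simple way to picture "logs similar to past anomalies" is token overlap between a new log line and lines recorded during earlier incidents. The sketch below uses Jaccard similarity purely as an illustrative stand-in; the source does not say which method the company intends to use, and the function names, sample messages, and the 0.5 threshold are assumptions.

```python
def tokens(line):
    """Lower-case a log line and split it into a set of words."""
    return set(line.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_to_past(line, past_anomalies, threshold=0.5):
    """Return past anomaly lines whose token overlap with `line` meets the threshold."""
    new = tokens(line)
    return [p for p in past_anomalies if jaccard(new, tokens(p)) >= threshold]


if __name__ == "__main__":
    # Hypothetical messages recorded during earlier incidents.
    past = ["backend timeout on edge server", "disk latency high"]
    print(similar_to_past("disk latency high again", past))
```

A production system would more likely work on normalized fields than on raw words, but the principle is the same: accumulated incident cases become reference patterns that new logs are compared against.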

In addition, the company would like to use the "Splunk App for Stream", which captures streaming packets in real time, for service level monitoring such as tracking transaction response times and collecting network performance data.

Looking back on this project, Mr. Inoue is very satisfied with both the potential of Splunk as a data analysis platform and Macnica's technical capabilities in supporting it. "Not everything can be visualized, so it's important for users to think clearly about what they want to analyze. If they do that, I think the range of applications will expand without limit."

Macnica is also determined to fully support NTT Smart Connect's streaming business, taking pride in being Japan's leading Splunk partner.

User Profile

NTT Smart Connect Corporation
URL

http://www.nttsmc.com/

Founded in 2000 as a strategic antenna company for the NTT West Group. It operates one of the largest data centers in western Japan, equipped with robust facilities and a high-speed backbone connected directly to multiple IX sites at gigabit-class speeds and peered directly with many major ISPs. On this foundation it provides housing, hosting, streaming, and cloud services, offering a wide range of combined services centered on the Internet platform, backed by the support capabilities unique to the NTT Group.
