Wednesday, 27 December 2017

First, data desensitization

Celestica Fund holds a large amount of individual user information. Its log files retain the four elements of personal and bank card information, all of which is private data. The original ELK solution could not mask this sensitive data, so it could not solve the problem at the root. In the past, a developer who needed to view logs had to sit next to an operation and maintenance engineer, who opened the logs for them. Even a simple task such as checking a log therefore tied up an extra operation and maintenance engineer, which lowered collaboration efficiency and kept operation and maintenance staff stuck in a supervisory role.

The data desensitization function of Kangaroo Cloud Log solves this problem with a simple configuration. The security administrator selects the fields in the log file that need to be desensitized and converts them by expression matching. The system then automatically filters the information and converts it into desensitized form. Combined with the permission control function, sensitive data is automatically masked for users who do not have the right to view the log source.

Desensitizing sensitive data in logs is a common requirement among financial customers. Information that identifies a user, such as bank card numbers, ID card numbers and mobile phone numbers, is desensitized. Beyond these general data types, Kangaroo Cloud Log also supports custom desensitization rules, so users can incrementally add whatever rules they need.

Second, collection resource management and control

All of Celestica Fund's online business servers must provide service around the clock, and business and applications must remain highly available. No external program or third-party application may affect the stable operation of the production environment, and every program deployed on the servers must be non-invasive to the application systems. The collection program deployed on the servers also undergoes rigorous stress and performance testing to ensure that the collection process has no impact on the business systems. From the start of product design, Kangaroo Cloud Log considered how to minimize the impact of the log collection client on the server.

The first layer: resource limits
For example, CPU usage cannot exceed 5%, memory usage cannot exceed 100 MB, and bandwidth usage cannot exceed 500 KB/s. The thresholds can be freely customized through the web page. Once resource limits are enabled, the Agent runs within the allowed thresholds. If log volume suddenly increases, the Agent automatically throttles its resource usage.
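As a rough illustration of the first-layer bandwidth limit, the sketch below caps outgoing log traffic with a token bucket. This is an assumption made for illustration, not Kangaroo Cloud Log's actual mechanism: the article does not say how the Agent throttles itself, and the class name `BandwidthLimiter`, its fields and the 500 KB/s figure used as the default example are hypothetical. CPU and memory limits are left out.

```python
from dataclasses import dataclass


@dataclass
class BandwidthLimiter:
    """Token bucket capping how many bytes a collector may ship per second.

    Hypothetical helper for illustration; not the vendor's implementation.
    """
    rate_bytes_per_s: float   # sustained limit, e.g. 500 KB/s
    capacity_bytes: float     # largest burst allowed at once
    tokens: float = 0.0       # bytes currently available to send
    last_refill: float = 0.0  # timestamp (seconds) of the last refill

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding the burst capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_refill = max(self.last_refill, now)

    def try_send(self, size_bytes: int, now: float) -> bool:
        """Return True and spend tokens if the batch fits the budget."""
        self._refill(now)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False  # caller should hold the batch and retry later


# Usage: a 500 KB/s limit with a one-second burst allowance.
KB = 1024
limiter = BandwidthLimiter(rate_bytes_per_s=500 * KB,
                           capacity_bytes=500 * KB,
                           tokens=500 * KB)
first = limiter.try_send(400 * KB, now=0.0)    # fits in the initial budget
second = limiter.try_send(200 * KB, now=0.0)   # rejected: only 100 KB left
third = limiter.try_send(200 * KB, now=0.5)    # accepted after half a second
```

Passing the clock in as `now` keeps the sketch deterministic. A real collector would read a monotonic clock and queue rejected batches instead of dropping them.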
The second layer: Agent self-protection
If a very unusual situation causes the resource limits to fail and the resources used by the Agent exceed the configured thresholds, the Kangaroo Cloud Log Agent terminates its own process through a self-termination mechanism, fully protecting the business system. Once the system is stable, the Agent is restarted and restored, and it can re-collect the logs it missed so that no log data is lost.

Third, call link analysis

Celestica Fund's business systems use a distributed architecture and are developed with the SOFA framework from Ant Financial Cloud. The SOFA framework can be configured to generate log files, and each system produces a large volume of call link logs. In raw form these logs are of little use, but log analysis can extract value from them. In a log-based distributed call tracing system the key is the call chain: a globally unique ID (TraceId) is generated for each request, and through it the otherwise "isolated" call information from different systems is associated, restoring more valuable information.

Helping users analyze these logs was the problem Kangaroo Cloud Log had to solve. After a period of research on the SOFA log files, Kangaroo Cloud Log successfully parsed the call links in them. It visually presents the call relationships between the various centers, along with key information such as the number of failed interface calls and call latency. Specific application scenarios for call links include the following:

A. Locating anomalies and measuring time consumption
By looking up the TraceId from the error message in a service's exception log, you can see what happened across the call chain, locate the problem more intuitively within the chain, and confirm it by checking layer by layer.
B. Drill-down call reports
A distributed call tracing system not only provides the call chain itself but can also monitor the state of all the middleware involved. Forming call chains therefore also produces detailed call monitoring reports. Unlike other monitoring, these reports support drill-down: because the call chain can be reported along many dimensions, you can see the state of a service and of the services it calls, and get clear call chain information.

C. Full link analysis
The difference between the full link and the call chain is that the full link describes the application as a whole, while the call chain is the process of a single call. The value of analyzing the full link lies mainly in the following points:

Link topology analysis: analyze the call relationships between applications, including the source and destination of each call, to identify where unreasonable calls originate.
Dependency mapping and capacity estimation: identify problems such as fault-prone points, performance bottlenecks and interface error rates, and estimate capacity from link call ratios and peak QPS.

With these views, R&D and management personnel can quickly locate the faulty or problematic node, then drill into that node for detailed interface call analysis and statistics to find the problem easily. The biggest advantage of full link analysis and tracing is that the relationships between all distributed applications become transparent. Every transaction or order request can be traced through log analysis without manual inspection, which effectively reduces the troubleshooting time of O&M and R&D personnel.
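The TraceId association and the per-link statistics described in this section can be sketched roughly as follows. The record layout (`CallRecord` and its fields) is a hypothetical, simplified stand-in: the article does not show the SOFA log format, so this assumes the logs have already been parsed into caller, callee, latency and success flag.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One parsed call-link log entry (hypothetical, simplified layout)."""
    trace_id: str    # globally unique ID shared by every hop of one request
    caller: str      # application issuing the call
    callee: str      # application receiving the call
    elapsed_ms: int  # time spent in this call
    success: bool    # whether the call succeeded


def group_by_trace(records):
    """Rebuild call chains: collect every hop that shares a trace ID."""
    chains = defaultdict(list)
    for record in records:
        chains[record.trace_id].append(record)
    return dict(chains)


def edge_stats(records):
    """Aggregate calls, failures and total latency per caller -> callee edge."""
    stats = defaultdict(lambda: {"calls": 0, "failures": 0, "total_ms": 0})
    for record in records:
        edge = stats[(record.caller, record.callee)]
        edge["calls"] += 1
        edge["failures"] += 0 if record.success else 1
        edge["total_ms"] += record.elapsed_ms
    return dict(stats)


# Made-up sample data for illustration only.
sample = [
    CallRecord("t1", "gateway", "order", 30, True),
    CallRecord("t1", "order", "payment", 120, False),
    CallRecord("t2", "gateway", "order", 25, True),
]

chains = group_by_trace(sample)  # "t1" -> two hops, "t2" -> one hop
edges = edge_stats(sample)       # topology edges with failure counts
```

Grouping by trace ID gives the single-request call chain used to locate an anomaly, while the per-edge aggregation is the kind of input a topology view or error-rate report would draw on.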
Intelligent operation and maintenance is achieved with data and algorithms

Operation and maintenance has developed through standardization, tooling and automation to today's stage of intelligence. Each stage has brought a substantial increase in productivity and efficiency, and the overall trend is inevitable. The era of intelligent operation and maintenance is not about putting operation and maintenance staff out of work. Rather, it reflects strong demand for greater operation and maintenance efficiency: for example, how to quickly locate problems in a complex environment, and even predict failures in order to prevent them and guarantee application stability.

Lin Jie believes that intelligent operation and maintenance can be achieved by making use of data (operation and maintenance data) and algorithms. First, O&M capability does not jump directly to the intelligent stage. It must pass through standardization and tooling to automation, and only highly mature automation provides the necessary foundation. Second is the accumulation of data: a large amount of operational data, log data, network packet capture data, database data and so on is needed. There are also the annotations produced in day-to-day operation and maintenance. For example, when a fault occurs, operation and maintenance staff record how it was handled, and that record is fed back into the system, which in turn raises the level of operation and maintenance. Last is the algorithm: deciding which algorithm model to use and continuously optimizing it.

In the operation and maintenance department, Celestica Fund hopes to monitor the usage of the application systems' basic resources by collecting and analyzing server performance logs.
