Double Eleven has just ended. In fact, the most important story is not the shops' sales tallies, nor the shoppers waiting to pounce on flash-sale deals, but the operations and maintenance (O&M) teams working behind the scenes. What worries them most is network interruption, application lag, slow responses, server downtime, and the like.

Double Eleven is the top priority of the year for an e-commerce IT department. Before the big promotion, O&M staff have to prepare many sets of contingency plans well in advance. They stay on edge throughout and run hundreds of simulation drills, and nobody knows how many sleepless nights are spent at the back end. The seemingly simple Double Eleven involves the collaboration and testing of the entire commercial infrastructure, including payment, architecture, database, network, operations, power, customer service, and logistics.

What pitfalls has the O&M field run into over the years of Double Eleven promotions? Now that intelligent operations has made its debut, how should enterprises plan for it? With these questions, Info interviewed Lin Jie, chief operations and maintenance expert at Kangaroo Cloud. He previously supported O&M for business units (BUs) such as Taobao, Tmall, shared services, wireless mobile business, and Opinion.

Operations pitfalls of the Double Eleven promotions over the years
Lin Jie recalls: the Tmall Double Eleven promotion first started in 2009, when Taobao Mall had a GMV of only tens of millions a day and there was no such thing as a midnight rush yet. Before the promotion, engineers basically judged by experience. They looked at each server's current load and the application's current RT and QPS to estimate the maximum capacity each server could support. A few people then discussed it and made the final call on how many servers each core application should add. As for how many servers to add in total, nobody had firm numbers, and the fallback was to request extra capacity on the spot. In short, business volume at this stage was small, and the team could cope.

In the following years, as the Tmall brand grew, Double Eleven traffic exploded year after year and the original way of working no longer applied. With rapid business growth, the number of back-end applications increased significantly and the call chains between application systems became intricate. How many resources should be prepared before scaling out? You cannot decide on a hunch: request too many resources and the request may be rejected; request too few and you take on greater risk.

At this stage we solved the problem with online load testing. For example, one server is taken directly from the production environment and loaded, either by replaying simulated traffic or by routing several servers' worth of live traffic to it. The maximum carrying capacity of a single server is calculated from the load-test results, and the expansion request is then backed by numbers.

Even with this kind of capacity planning in place, traffic at the midnight peak may still exceed expectations and overwhelm the system.
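As a rough illustration of backing an expansion request with numbers, the sketch below turns a load-tested single-server capacity into a server count. The function names, the headroom parameter, and the figures in the usage example are assumptions made for illustration; they do not come from the interview.

```python
import math


def servers_needed(peak_qps: float, max_qps_per_server: float,
                   headroom: float = 0.25) -> int:
    """Servers required to carry peak_qps.

    max_qps_per_server is the single-server limit found by load testing.
    headroom is the fraction of that limit deliberately left unused
    (an assumed safety margin, not a figure from the article).
    """
    if peak_qps < 0:
        raise ValueError("peak_qps must be non-negative")
    if max_qps_per_server <= 0:
        raise ValueError("max_qps_per_server must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_qps = max_qps_per_server * (1 - headroom)
    return math.ceil(peak_qps / usable_qps)


def expansion_request(peak_qps: float, max_qps_per_server: float,
                      current_servers: int, headroom: float = 0.25) -> int:
    """Additional servers to request on top of the current fleet."""
    needed = servers_needed(peak_qps, max_qps_per_server, headroom)
    return max(0, needed - current_servers)


if __name__ == "__main__":
    # Hypothetical figures: 50,000 QPS expected at peak, 1,200 QPS per
    # server measured in the load test, 40 servers already deployed.
    print(servers_needed(50_000, 1_200))         # 56
    print(expansion_request(50_000, 1_200, 40))  # 16
```

The headroom keeps each server below its measured limit, which matters because the expected peak is itself only an estimate.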
To handle peaks that exceed the plan, rate limiting and degradation were introduced. Rate limiting sets a maximum threshold for each application and immediately rejects new requests beyond it; the benefit is that the application is protected and an avalanche is avoided. Degradation addresses the sheer number of applications: during the promotion, some non-core functions can be switched off so that the trading flow gets the maximum capacity.

Load testing at that stage was not completely accurate. The main limitation was that it measured a single application in isolation, while applications depend on one another. Some shared service centers in particular are called by basically every application. What then? A few years later a new tool was developed: full-link load testing. This is a new approach to capacity planning. It generates large volumes of traffic directly in the production environment by simulating copies of real requests, tests every link in the chain, and works with the corresponding monitoring systems to find where the bottlenecks are so they can be optimized quickly. The whole process is automated. Clearly, automated operations is the trend.

The strategists behind the midnight rush

E-commerce Double Eleven promotions still follow the midnight-rush model, so the core task in safeguarding application systems is to get through the first 15 minutes, or even the first few minutes, successfully. Lin Jie made the following suggestions:

a. Capacity planning. Run load tests in the production environment as far as possible; only after a load test can you be confident.
b. Rate limiting for critical applications. Midnight-rush traffic is likely to exceed expectations; only a rate limit protects your own application, otherwise an avalanche chain reaction follows.
c. Degrade non-core functions. Each Double Eleven consumes a lot of resources, and they are basically tilted toward core applications, so degrading non-core functions is acceptable to some extent.
d. Emergency plans. Prepare for possible abnormal conditions.
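Suggestions b and c can be sketched together in a few lines. This is a minimal, single-process illustration that assumes a fixed-window counter as the threshold mechanism; the class names, feature names, and return values are hypothetical, and it is not the tooling Lin Jie's team used.

```python
import time
from typing import Callable, Iterable


class ThresholdLimiter:
    """Fixed-window limiter: admits at most max_requests per window.

    Requests beyond the threshold are rejected immediately rather than
    queued, which is the behavior described for rate limiting above.
    """

    def __init__(self, max_requests: int, window_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def allow(self) -> bool:
        now = self._clock()
        if now - self._window_start >= self._window:
            # A new window begins: reset the counter.
            self._window_start = now
            self._count = 0
        if self._count >= self._max:
            return False
        self._count += 1
        return True


class FeatureSwitches:
    """In-memory switch that turns non-core features off during a promotion."""

    def __init__(self, non_core: Iterable[str]) -> None:
        self._non_core = set(non_core)
        self._degraded = False

    def degrade(self) -> None:
        self._degraded = True

    def restore(self) -> None:
        self._degraded = False

    def enabled(self, feature: str) -> bool:
        return not (self._degraded and feature in self._non_core)


def handle(feature: str, limiter: ThresholdLimiter,
           switches: FeatureSwitches) -> str:
    """Return 'degraded', 'rejected', or 'ok' for one incoming request."""
    if not switches.enabled(feature):
        return "degraded"   # non-core feature switched off
    if not limiter.allow():
        return "rejected"   # over the threshold: fail fast
    return "ok"
```

Degraded requests are answered before they reach the limiter, so switched-off features do not consume any of the capacity reserved for the trading flow. A production system would normally keep the counters and switches in shared configuration rather than in process memory.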
Double Eleven is the most typical elastic scenario

Elasticity is the biggest advantage of cloud computing, and a big promotion is the most typical elastic scenario. With the popularity of cloud computing, especially public cloud, O&M staff today basically no longer need to look after underlying facilities such as the machine room, network, and operating system. After years of drills, today's e-commerce platforms have adopted elastic, scalable cloud platforms, with distributed data and efficient CDN distribution for load balancing, to avoid collapsing under the high concurrency of Double Eleven. O&M staff can shift more of their energy to fast releases and rapid iteration in support of business development.

Traffic during large events is on a completely different order of magnitude from daily traffic. On-demand cloud resources can fully meet the expansion needs and also save a great deal of cost. Besides expansion, contingency plans are of course needed: sort out the abnormal situations that might occur on the day and rehearse them in advance.

Last year, just ten minutes after the Tmall Double Eleven opened, the world payment record was broken again. Alipay data show that at 0:39:12 the Alipay payment peak reached 120,000 transactions per second, 1.4 times the previous year's figure and a new record. In terms of payment methods, Huabei and Yu'e Bao were very popular with users, accounting for as much as 29% and 18% respectively. Withstanding huge transaction volumes, handling flash sales at the speed of light, a technical system that holds up, stable liquidity and yield: only what withstands the ultimate test of Double Eleven can be considered a real powerhouse!
Tianhong Fund: efficient operations based on log data analysis

For Tianhong Fund, ensuring Yu'e Bao's liquidity and returns and getting through Double Eleven smoothly is a major challenge. The most common way to locate problems in online systems is log analysis. Next, taking Yu'e Bao as an example, we focus on how Tianhong Fund made a breakthrough in log data analysis.

Before this, Tianhong Fund had been using the open source ELK logging stack; R&D and O&M staff used ELK to process log data and to query and search log files. As application scenarios deepened and internal demands grew, Tianhong hoped to solve new operations problems through log analysis. To that end, Tianhong Fund chose to work with Kangaroo Cloud. Specifically, this includes the following aspects:
Wednesday, 27 December 2017
Yu'e Bao 11.11: Log data analysis and efficient operation and maintenance





