Friday, 29 December 2017

Who do you choose to make friends with?

Eight: Choice

Most of what we do every day is actually a choice, so any discussion of a career has to touch on this topic. I have always believed that, to a large extent, we become the kind of person we decide to be; that decision is ours to make. Every day we make all sorts of choices. I could skip writing this article and go criticize other people's posts instead, or I could write these words to help others while organizing my own thoughts. I could take care with the formatting so that it is easy for others to read, or I could dash off a jumble and post it as it is. You can go to an interview unshaven, or you can check the mirror before leaving the house... Every day, at every moment, we make decisions like these; we can be careless, or we can take a little more care. Tens of thousands of small choices add up and decide what kind of person we finally become.

In a sense, our future is not handed to us by others; it is our own choice. Many people will say, "My life is hard, I have no choice." If you are thinking of choices such as "go to Microsoft or go to ..." or "become sales vice president or director", then yes, you have no choice, and neither do most people. But every day you can choose whether to be more considerate toward your customers, more patient with your colleagues, more meticulous in your work, clearer about your own situation, and whether to clear up issues that are still unclear...

You can also choose whether to persevere through pain, whether to let go of your negative thoughts, whether to forgive someone's mistake, whether to believe the words I have written here, whether to avoid making the same mistake twice... Every day life gives you the chance to choose, and every day it gives you the chance to change your life. You can choose to roll around on the ground throwing a tantrum, or you can choose to grit your teeth and stand up. You always have a choice. Some choices do not pay off immediately; they need to accumulate. A farmer can choose to water his fields regularly, or he can choose to leave it to the weather. It is true that watering today will not make the seedlings sprout today, but with regular watering most of them will eventually grow, and without it the harvest is bound to be poor. Every day, life is giving you a chance. It will not hand you a pile of cash or a good job, but it is giving you a chance all the same.

My family is an ordinary one, without any notable social connections. After graduating from college, my father was assigned to the frontier, to a small county town with only one street. His generation actually had more reason to complain than we do. What did they get? They grew up in an era that took their youth from them: they were not allowed to study, they were sent to support the frontier or to settle in rural production teams, and by the time they were old they were asked to step aside to give young people a chance. He had every reason to sit and nurse his grievances, as tens of thousands of educated youth did. Ten years after he was sent to the frontier, however, the country resumed graduate admissions, and he returned to his old university. After finishing his graduate degree he was assigned to a small work unit in Anhui. Three years later the country recruited doctoral students for the first time, and he returned to his old university once more and became one of China's first generation of PhDs; he was older then than I am now. Life did not give up on him, and he did not give up on life. Through ten years of waiting he made his own choice: he did not give up, and he did not write himself off. So when the moment came, he changed his life.

What kind of person you eventually become is decided by each of your small choices. What do you choose to believe? Who do you choose to make friends with? What do you choose to do? How do you choose to do it? ... We face a great many choices, and among them choices of mindset matter far more than choices of objective circumstances. For example, which product you choose to make matters less than how you choose to make it; which people you choose matters less than how you choose to use their talents.

Most of the time, which objective circumstances you pick does not matter much; most such choices are neither right nor wrong. What matters is how you choose to act. A college graduate may join Microsoft, sell pork, start a business, or work as a paid game leveler; as long as it breaks no law and harms no one, none of these is wrong. What matters is how he goes about it once he has chosen. Beyond this, you can also choose the timing and the environment. For example, you can choose to take on life's greatest difficulties when you are at your most energetic, or you can drift along step by step and leave it until you are 40. But the years after 40 are the most vulnerable in life: you have parents and children depending on you, and running into a career crisis then is truly painful. Better to endure the hardship in your twenties and thirties so that you are more comfortable when you are at your weakest. You can choose to grow up in a greenhouse, or you can choose to be tempered in the open field; you can choose to work in an air-conditioned office, or you can choose to visit customers in 40-degree heat. All of this accumulates and leads you to the future you deserve. I dare not say you have a choice in everything, but in the vast majority of things you do; it is just that you often fail to see them as choices. Take every choice seriously, and you will have a better future.

Thursday, 28 December 2017

Planning in life career sixth advantage of work online

Six: Waiting

This is the topic that impatient people least want to hear about. I did not originally want to discuss it, because it provokes too much debate and I have no wish to argue with anyone. But it cannot be avoided in any long-term view of a career, so I decided to write about it; if you do not like it, please skip it. Not everyone who runs a red light gets hit by a car; not every criminal is caught; not every mistake is punished; not every corrupt official is shot. Not every effort you make brings a reward; not every bit of persistence is noticed; not every contribution earns a fair return; not every act of goodwill is understood...

This is the world. Yes, the world is not good enough, but do you have the courage to overthrow it? If not, do you have a better solution? Very often people need a little patience and a little confidence. Everyone runs into a few unfair things, and usually calm acceptance is the best response. Very often we need to wait and to endure loneliness until the moment that belongs to us arrives. Chow Yun-fat waited, Andy Lau waited, Stephen Chow waited, Faye Wong waited, and Zhang Yimou waited too... You see their fame today, but have you seen the waiting and patience that came before it? Have you seen a Golden Horse Award winner squatting at a street stall? Have you seen the performers of Deyunshe telling crosstalk to a single audience member in the theater? Have you seen Stephen Chow playing roles without a single line?

Every successful person has been through a period of frustration and gloom. I can almost picture them drowning their sorrows in drink, and I can picture their anguish as they struggled to get by. In what should have been their best years they longed for success and had nothing to show for it, just like you now. No one promised them they would succeed later, yet they chose to persevere and endure the loneliness. If back then they had gone around saying "success belongs only to the privileged class", do you think they would be where they are today? I used to be unable to understand why some people less capable than I was sat above me, and why someone had to be my boss simply for being older.
Why do some rotten people make money without working hard? Why was it so easy to make money for those who started out at the beginning of reform and opening up, while everything has to follow the rules by the time it is our turn? One day I suddenly realized that while I was still in school, they were already struggling out in society, and that they had worked hard and accumulated for ten or twenty years. I had only just arrived, and I wanted what they had and I did not; speaking fairly, I was trying to grab it. I was in a hurry because I could not bear the loneliness: a person in their twenties with no money and no career, but with raging ambition. Everyone meets setbacks, everyone hits low points, everyone goes through times of not being understood, and everyone has times when they must swallow their pride. These are precisely the most crucial moments in life, because everyone meets setbacks and most people cannot get over that threshold. If you can, you will succeed. At such moments we need to wait patiently and with confidence, believing that life will not give up on us and that opportunities will always come. At the very least you are young, you are not in prison, you have no incurable illness, and you carry no debt you cannot repay. What are you afraid of? There are far more people less fortunate than you than people luckier than you.

The road has to be walked one step at a time. The final steps may be thrilling, but most steps are ordinary and even dull, and without them, or without the patience to endure them, you will never reach the thrilling finish. Adversity is God's way of knocking out your competitors for you. Remember: when you are suffering, so are the others; when you feel you cannot hold on, they feel the same. Do not tell others that you cannot hold on. It only gives them the confidence to persist, and lets your competitors smile as they watch you lose heart and quit the game. Victory belongs to those who are patient. In my most desperate moments I watch films such as "The Pursuit of Happyness" and "Jerry Maguire" to regain my courage, because no matter when, we always have hope. Even when everyone else leaves, I do not lose hope and I do not give up. Every day on the ride home from work I like to hum "Invisible Wings" and look out of the window. I know that I am waiting quietly, waiting for my moment.

Words I like from an original post by a fellow netizen, "Iraqi", copied here:

Everyone hopes to be the special one: born with a silver spoon, into a good family, with a job arranged at the electricity bureau on a monthly salary of 10,000. Naturally, everyone hopes such low-probability events will land on them. As for the 25,000-li Long March of the Red Army, being branded a rightist or a counter-revolutionary, or fighting on at the cost of one's life and dignity, those are best left to one's grandfather's generation and to other people. Of course, not everyone who has suffered is rewarded. But at any time, behind every vested interest stands the shadow of fathers and grandfathers who struggled and even shed blood. There is nothing wrong with envying others for having a good father. The question is: will your next generation have a good father? As for why people cannot all face competition with the same odds of winning, I can only ask in return: why do humans and monkeys not face competition with the same odds of winning? Natural selection. A monkey's soul is not necessarily any lower than yours, but behind you stand countless generations of apes that kept evolving.

Oracle open-sources Fn, joining the serverless fray

Oracle has released Fn, a new open source serverless platform that is independent of any cloud platform. At launch it offers extensive Java support, including a JUnit test framework, but it also supports "any programming language". Fn consists of four major components: Fn Server, the Fn FDK, Fn Flow, and the Fn Load Balancer. Fn Server, written in Go, is the platform that runs the code. Developers use an FDK (Function Development Kit) for their preferred language to build and test functions that implement business logic. Once packaged, a function is deployed to Fn Server. Fn Flow provides tooling for sequencing and orchestrating workflows, so that functions can be chained together into higher-level business processes. This avoids the coupling commonly seen in microservice architectures, where services need to call each other. The Load Balancer is the tool operations teams use to deploy clusters of Fn servers and route traffic to them.

Like the recently released Spring Cloud Function project, Oracle's Fn provides a framework that is neutral across cloud platforms. Functions are packaged as containers and can run on any platform that supports Docker. Being "container native" is an explicit goal of the Fn development team, and so is being open source. In a blog post, Chad Arimura, vice president of software development at Oracle, said the Fn team believes open source is the way software is delivered and deployed. The Fn project therefore uses the open source Apache 2.0 license, and this strategy seems to be working.
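
To make the orchestration idea behind Fn Flow more concrete, here is a minimal sketch in plain Python. It does not use the Fn FDK or the Fn Flow API; `resize`, `watermark`, `store`, and `run_flow` are hypothetical names invented for illustration. What it tries to show is that each function knows nothing about the others, and only the flow definition decides the order in which they run.

```python
from typing import Any, Callable, Iterable

# Hypothetical stand-ins for independently deployed functions.
# None of them calls another one directly.
def resize(image: dict) -> dict:
    return {**image, "width": 800}

def watermark(image: dict) -> dict:
    return {**image, "watermarked": True}

def store(image: dict) -> dict:
    return {**image, "stored": True}

def run_flow(steps: Iterable[Callable[[Any], Any]], payload: Any) -> Any:
    """Pass the payload through each step in order.

    The ordering lives here, in the orchestrator, not inside the functions.
    """
    for step in steps:
        payload = step(payload)
    return payload

# The flow is just data: reordering or swapping steps touches no function body.
result = run_flow([resize, watermark, store], {"name": "cat.png", "width": 3000})
```

Because the chain is defined in one place, changing the business process means editing the list passed to `run_flow`, not the functions themselves, which is the decoupling the article attributes to Fn Flow.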

Wednesday, 27 December 2017

MariaDB closes US$27 million Series C financing, led by Alibaba

MariaDB is a European company that develops and maintains the MariaDB database, one of the most popular open source databases. It is headquartered in Helsinki, Finland, with offices in Sweden and the United States, and the database has approximately 12 million users worldwide, including booking.com, Hewlett-Packard, Virgin Mobile, and Wikipedia. MariaDB is a fork of MySQL. After MySQL was acquired by Oracle, MySQL's creator, Monty, created MariaDB, named after his daughter Maria, to guarantee that a MySQL-compatible fork that would always be open source remained available.

Alibaba's long-planned investment

Why is Alibaba investing so generously in MariaDB? The reasons are fairly obvious:

1. Development is led by Monty, the father of MySQL, who certainly knows MySQL's weaknesses and can provide better compatibility and scalability. MySQL databases can for the most part be moved to MariaDB, and MariaDB's pace of development and releases is far ahead of MySQL's.
2. MariaDB is open source, fast-growing, and has strong appeal in the open source community.
3. The MySQL community has two major branches, Oracle MySQL and MariaDB, and few other branches to choose from. For anyone moving away from Oracle, this investment was inevitable.

Alibaba Cloud reportedly began contributing to the MariaDB project as early as 2012; important current MariaDB features such as multi-source replication, thread memory monitoring, and data flashback include Alibaba Cloud contributions. In particular, since AliSQL was open-sourced, many of its advanced features have been merged quickly into MariaDB.

Will MariaDB replace MySQL?

MariaDB is open source, free, and largely compatible with MySQL, and its new versions are developed faster than MySQL's. Companies and projects such as Apple, Google, Wikipedia, Red Hat, and Slackware have abandoned MySQL for MariaDB or other databases. There is a hot topic among users: is MySQL really in decline? Will MariaDB replace MySQL? Some experts offer a cautious view: as long as Oracle MySQL exists, and given how many users run Oracle's MySQL releases, MariaDB serves as a complement to MySQL; if MySQL's ownership changes, MariaDB will replace it.

From dolphin to seal: will it become more powerful?

What is MariaDB's biggest advantage over MySQL? Recall that early on, when MySQL AB was acquired by Sun, Monty took the core Server-layer staff with him. That team has excellent R&D capability, including in database resource planning and the efficiency of the SQL optimizer layer, and it absorbs community contributions very quickly. After MySQL was acquired, its release pace and performance optimization became very slow, and many outstanding problems went unaddressed in upgrades. Put simply, the most direct difference is that MariaDB queries and processes data quickly, consumes relatively fewer resources than MySQL, and is superior to MySQL in speed and in support for Unicode collation.

MariaDB aims to be fully compatible with MySQL, including the API and the command line. In the storage engine layer, MariaDB 10.0.9 reportedly uses XtraDB in place of MySQL's InnoDB. As for versioning, MariaDB followed MySQL's version numbers up to 5.5, so anyone using MariaDB 5.5 gets all the features of MySQL 5.5. Starting with version 10.0.0, released on November 12, 2012, it no longer follows MySQL's version numbers. MariaDB 10.0.x is based on version 5.5, plus features ported from MySQL 5.6 and new features developed by the MariaDB team itself. Compared with the latest MySQL 5.6, MariaDB includes a wealth of additions in performance, functionality, management, and NoSQL extensions, such as microsecond support, a thread pool, subquery optimization, group commit, and progress reporting.

In closing

Moving from MySQL to MariaDB is currently relatively easy, and the difficulty of migration will keep rising over time. For developers in China, is it time to start learning MariaDB now and to consider migrating?
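
The version-numbering rule described above can be turned into a small check. The following is a rough Python sketch, not part of any MariaDB or MySQL tooling. It assumes the server reports a version string shaped like `10.0.9-MariaDB` or `5.6.20`; that format, and the names `ServerVersion`, `parse_version`, and `tracks_mysql_numbering`, are assumptions made for illustration.

```python
import re
from typing import NamedTuple

class ServerVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    is_mariadb: bool

def parse_version(version: str) -> ServerVersion:
    """Parse a string such as '10.0.9-MariaDB' or '5.6.20' (assumed format)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return ServerVersion(major, minor, patch, "mariadb" in version.lower())

def tracks_mysql_numbering(v: ServerVersion) -> bool:
    """True when the number can be read as the matching MySQL release.

    Per the article: MariaDB mirrors MySQL numbering through 5.5 and
    diverges from 10.0 onward. MySQL itself trivially tracks its own numbers.
    """
    if not v.is_mariadb:
        return True
    return (v.major, v.minor) <= (5, 5)
```

A migration script could use a check like this to decide whether a MariaDB version number says anything about the MySQL feature level, or whether (from 10.0 onward) the feature set has to be looked up separately.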

The key to the success of large-scale transformation of the technical organization of the software engineering

Deming's red bead experiment shows that when what the upstream stage delivers is of unacceptable quality, nothing done downstream can raise the quality of the final output. Implementing automated testing in this area is therefore a false starting point for change.

What to change to?

- The second layer of resistance: lack of consensus on the feasibility of the solution.
- The third layer of resistance: lack of confidence that the solution will resolve the agreed problem.
- The fourth layer of resistance: fear that the new solution will in turn create new problems.

What an organization should change to is the transformation plan it adopts through decision-making. The plan should be a combination of strategies based on the organization's current situation and root causes, customized and targeted for each organization.

Make change happen!

The fifth layer of resistance: lack of a clear path around the obstacles to implementation. This is the most common and the most difficult resistance. Consensus has finally been reached at the top and the plan approved, but down in the teams the delivery tasks are piled high; where is the time to learn new tools and new methods? The deadlines have to be met first. At this point a seasoned expert is often needed to resolve the team's immediate problems quickly and "loosen" the team up, so that confidence and credibility are established fast and the team can move onto the new track.

The sixth layer of resistance: lack of follow-through. This is also the most common final resistance, and the reason customers often complain about "coaching" consultants: "We want you to bring in the capability to run the project, not to have us do the work!"

Large-scale transformation

After a successful launch and initial results, the next step is to consider scaling across the organization. There are usually two directions:

- Vertical deepening: replicate to teams of the same or similar size, business, or characteristics as the pilot team.
- Horizontal promotion: take one method, practice, or piece of experience and roll it out across the whole organization.

Because it involves large investments of people and money, as well as adjustments to organizational structure and roles, scaling has so far remained a hard problem for the transformation industry.

Experience summary: the key to the success of large-scale transformation of a technical organization is that, for software engineering practices to take hold, platform-level tools must support the technical practices and reduce the complexity of what the organization has to invest. A DevOps transformation therefore needs a set of DevOps tool platforms and standards to land.

Keys to implementing a DevOps tool platform

First, the foundation of foundations: version management. As the source of the code, it supports the entire build, deploy, test, and release process. Baseline-style version management is already a very outdated approach. Compared with git-flow, I prefer Netflix's model of three permanent branches, Test, Release, and Prod, which support development, release, and trunk respectively, with hotfix as a temporary branch for fixing production issues.

- Developers usually work on the Test branch; commits to it can trigger automatic deployment to the test environment.
- The Release branch is intended for weekly automated deployments. Submitting code to the Release branch triggers automated tests and integration tests; the process is initiated by a designated member or the deployment team and everything is automated. After the deployment completes, the code is automatically pushed to the Prod branch.
- If a feature needs to go out immediately rather than through the fixed weekly release, developers can submit it to the Prod branch. This automatically triggers a merge of the Release branch and an automated deployment process, which deploys immediately once the reviewer approves.

Second, system architecture governance, and even higher-level architecture governance, including standards for choosing languages and frameworks, modular refactoring, and so on. The debt and benefit of each system should be fully assessed to decide whether it is worth continued maintenance.

In a technical organization that lacks governance and standards, it is hard to support all systems on one platform. Faced with a variety of technologies, the platform itself may become another huge, complex project, which runs counter to what DevOps is meant to give the organization: the ability to consistently deliver reliable business versions at low cost and in short cycles.

Final advice

I worked as product manager and architect for a continuous integration cloud in an IT organization of more than 6,000 people, where we split up and rewrote Jenkins' most critical task scheduling and log storage and scaled build capacity horizontally. We delivered a beta supporting 800 developers in six months and a stable version within a year, supporting tens of thousands of daily builds and releases for more than 4,000 people. Later, as a consulting project lead, I set up an open source DevOps platform and team in a 2,000-person organization within two months to support the development and operations of the pilot team.

The difficulty of implementing DevOps lies not in the technology. The biggest obstacles are that the organization is too large and that the management and implementation of the various tools are fragmented across departments. Without consensus at the organizational level and a move to centralized decision-making, even mature commercial DevOps products on the market will be hard to put into effective use within the organization.
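
The three-branch flow described earlier in this post (Test, Release, Prod) can be written down as a small lookup of what a push to each branch triggers. This is only an illustrative Python sketch of the rules as the article states them; `actions_for_push` and the step names are hypothetical, no real CI system's API is used, and the temporary hotfix branch is left out because the article does not say what automation it triggers.

```python
from typing import List

def actions_for_push(branch: str, approved: bool = False) -> List[str]:
    """Return the automation steps a push to `branch` would trigger."""
    if branch == "test":
        # Day-to-day development: every commit goes to the test environment.
        return ["deploy to test environment"]
    if branch == "release":
        # Weekly release: fully automated, ends by promoting code to prod.
        return [
            "run automated tests",
            "run integration tests",
            "deploy weekly release",
            "push code to prod branch",
        ]
    if branch == "prod":
        # Urgent change: merge release, then deploy only after review.
        steps = ["merge release branch"]
        steps.append("deploy immediately" if approved else "wait for approval")
        return steps
    raise ValueError(f"no automation defined for branch {branch!r}")
```

Keeping the rules in one table-like function makes the policy easy to review: the only manual gate in the whole flow is the approval on a direct push to Prod.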

Summary of DevOps Transformation Experiences of Large Technological Organizations

The basic conditions for DevOps

Any practice has the soil in which it was born and in which it applies. Just as continuous integration is a central practice of agile, DevOps is the best practice for developing and delivering on cloud platforms. In other words, it exists because of two major changes brought by the cloud platform:

1. The operation and maintenance of IT infrastructure is simplified, but the number of servers under management increases.
2. Environments become consistent (network, operating system version and so on; the license restrictions of earlier production environments and the difficulty of preparing identical environments can now be solved with images and snapshots), which lets developers deploy to different environments with a small amount of script, and even control the allocation and reclamation of resources.

These two changes have reduced the non-overlapping areas of expertise between development and operations. This makes it possible to unify responsibilities around the ultimate business goal: the right software. But the most fundamental reason is the explosive growth of Internet applications. Instagram, for example, added 10 million users in 10 days and 60 million users in 5 months. Even on a cloud architecture (Instagram was initially built on AWS), without a set of advanced software engineering practices, photo uploads, compressed storage and access for that many users would mean that the operations engineering team had to expand rapidly. Traditional hiring and onboarding can hardly have any effect within ten days, so in that case the business could only look on in disappointment.

Large technical organizations

In a word: too large. In the past, as the scale of business grew, technical organizations of more than a thousand people did not, unlike the Instagram example above, safeguard overall quality after scaling up through simplification or governance at several levels:

- Is the infrastructure scalable?
- Is the architecture scalable?
- Are core competencies shared across the organization?
- Are manual errors avoided?

Of course, the lack of a clear business value proposition is also one of the reasons technical organizations keep getting larger. Instagram did not become a behemoth when its user base grew by 70 million; it remained a photo-sharing application, which is linked to its clear value proposition. That topic, however, is outside the scope of this article. In the process of growing larger, then, technical organizations usually face the following problems:

- Old technology and a lot of technical debt
- Uneven staff capability, with teams often working hard to little effect
- Frequent quality incidents in production and low business satisfaction

Practice cannot drive organizations

As mentioned earlier, DevOps is a software engineering practice, and today one of the most advanced. It implies the ability to deliver reliable business versions consistently, at low cost and in short cycles. An organization is not a project. Practice can change an individual's understanding of how results are achieved. An organization, however, is a structure that goes beyond the individual, and practice alone cannot change it. For the organization, implementing DevOps is a comprehensive challenge. Let's look at the DevOps coverage given by the wiki:

1. Code - code development and review, source code management tools, code merging
2. Build - continuous integration tools, build status
3. Test - continuous testing tools that provide feedback on business risks
4. Package - artifact repository, application pre-deployment staging
5. Release - change management, release approvals, release automation
6. Configure - infrastructure configuration and management, Infrastructure as Code tools
7. Monitor - applications performance monitoring, end-user experience
DevOps requires a consistent set of deployment standards and tools so that deployment environments are consistent. It requires teams to be disciplined about integrating their code, to be able to write automated tests, and to fix problems early. And automate everything.

This means continuously challenging the organization's existing standards, processes, tools, and even security policies (opening up the various deployment environments). It also means continually challenging the team's past project achievements, its working methods, and even the experience and résumés it once took pride in.

The first layer of resistance: lack of consensus on the root causes of the problem

Many organizations stumbled over automated testing during their agile transformations. The cause lies in a lack of analysis and understanding of the fundamental problem and its underlying causes (systemic root cause analysis). Why do automated testing? Is it because of current delivery quality, especially the quality of the versions currently delivered to the testers?

Tencent becomes a Gold Member of the OpenStack Foundation

Why not a Platinum membership?

Founded in 2012, the OpenStack Foundation is a not-for-profit organization funded by sponsorship dues. It manages the OpenStack project and helps promote the development, release, and use of OpenStack. The Foundation has both individual and corporate members. Individual membership is free. Participating companies are divided, according to their own choice and the amount of their sponsorship fee, into Platinum members, Gold members, Corporate Sponsors and supporting organizations. The OpenStack Foundation allows at most 8 Platinum members and 24 Gold members. At present the Foundation has eight Platinum members: AT&T, Canonical, Huawei, IBM, Intel, Rackspace, Red Hat and SUSE. The highest tier Tencent can currently join is therefore Gold, alongside members such as Canonical, Cisco, Dell EMC, Mirantis and many more. Of course, a Platinum member may withdraw in the future; if that happens, Tencent can be expected to compete for the Platinum seat.

Joining OpenStack was inevitable

Tencent's joining the OpenStack Foundation was expected. Earlier, TStack was based on infrastructure designed by Tencent itself and successfully managed more than 6,000 Xen virtual machines. However, the original TStack was not a cloud management platform with support for heterogeneous virtualization. It could not manage and take full advantage of many resources, including heterogeneous virtual machines, thousands of physical servers, and many third-party storage devices. In recent years OpenStack has grown rapidly, matured and created a vast ecosystem. Dozens of industry leaders from around the world participate in OpenStack and have deployed many projects. As a powerful force in cloud computing, OpenStack can be described as the first choice among open source cloud computing platforms.

As the most efficient solution for software-defined infrastructure, OpenStack has many advantages, such as being open source and having an advanced design. Based on the evaluation and testing results of its internal IT operations team, Tencent introduced OpenStack as the infrastructure for TStack and expects it to provide better service.

TStack is designed for large-scale environments. It manages more than 10,000 operating systems, 40% of which are deployed for more than 300 internal IT services, including OA authentication, WeChat gateways, RTX, mail systems, video surveillance, internal security, feature management and ERP. These services require 24/7 uptime. TStack also manages various product development and testing services for Tencent, such as WeChat, QQ, the browser, games and more. Tencent's private cloud TStack is reported to run 14 OpenStack clusters with a total of 6,000 nodes, supporting about 100 million users. In fact, Tencent was already one of the largest OpenStack users before it joined the OpenStack Foundation.

Bowyer Liu, Chief Architect, TStack Cloud, Tencent, said: "Tencent is committed to the cloud computing market and OpenStack is part of our strategy to create a complete hybrid cloud services ecosystem for the global market. Tencent hopes to grow together with OpenStack, make valuable contributions and bring prosperity to the OpenStack ecosystem."

In closing

In the enterprise market, OpenStack is undoubtedly one of the most talked-about open source segments. Many in the industry even liken it to an open source Android that may become the mainstream operating system of enterprise IT. Tencent's joining the OpenStack Foundation shows that the position and standing of Chinese enterprises in OpenStack are gradually rising and attracting attention from the global market.

Bank application down for 1 hour due to a Kubernetes bug: incident review and analysis

We had two major incidents last week, and many users were affected (sorry again). The first incident lasted nearly a week and affected only our prepaid product, i.e., Monzo Alpha and Beta. The second incident lasted 1.5 hours on Friday morning and affected not only our prepaid product but also our current account. This article mainly covers the latter.

The blog post I published last year (https://monzo.com/blog/2016/09/19/building-a-modern-bank-backend/) describes our overall backend architecture in more detail. To follow this article, however, it is more important to understand the role played by the following components of our technology stack:

- Kubernetes is the deployment and management system for all of our infrastructure. Monzo's backend consists of hundreds of microservices, packaged as Docker containers. Kubernetes manages these containers and makes sure they run properly on our AWS nodes.
- etcd is a distributed database that stores information about which services are deployed, where they are running, and what state they are in, and provides it to Kubernetes. Kubernetes needs a stable connection to etcd to work properly. If etcd stops running, all of our services continue to run, but they cannot be upgraded or scaled up or down.
- linkerd is a piece of software that we use to manage the communication between our backend services. In a system like ours, where thousands of network connections occur every second, linkerd routes and load-balances those connections. To know where to route requests, it relies on receiving updates from Kubernetes about where services are located.

Timeline

Two weeks ago: The platform team made changes to our etcd cluster, upgrading it to a new version and expanding it. Previously this cluster consisted of only three nodes (one per zone); this time we upgraded to nine nodes (three per zone).
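As a rough illustration of what moving from three to nine etcd members means for fault tolerance, the sketch below applies the majority-quorum rule (a cluster of n voting members needs floor(n/2) + 1 of them to agree). The zone names and helper functions are hypothetical and are not part of Monzo's tooling or of etcd itself.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum_size(members)


def survives(nodes_per_zone: dict, lost_zones: list, extra_lost_nodes: int = 0) -> bool:
    """True if quorum is kept after losing whole zones plus some extra nodes."""
    total = sum(nodes_per_zone.values())
    lost = sum(nodes_per_zone[zone] for zone in lost_zones) + extra_lost_nodes
    return total - lost >= quorum_size(total)


# Hypothetical zone names; three members per zone as in the upgraded cluster.
cluster = {"zone-a": 3, "zone-b": 3, "zone-c": 3}

print(quorum_size(3), tolerated_failures(3))   # three-node cluster
print(quorum_size(9), tolerated_failures(9))   # nine-node cluster
print(survives(cluster, ["zone-a"], extra_lost_nodes=1))
print(survives(cluster, ["zone-a"], extra_lost_nodes=2))
```

With nine members the majority is five, so four members can fail: one whole zone of three plus one more node elsewhere, but not two more.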
Because etcd relies on reaching a distributed quorum, this setup means we can tolerate the loss of an entire zone plus a single node in another zone. The upgrade was planned and involved no downtime. We confirmed that the cluster was behaving correctly, but it is important to note that the change triggered a bug in another system.

One day ago: One of our teams, which was developing a new feature for the current account, deployed a new service to production but noticed that it was experiencing problems. As a precaution they scaled the service down so that it had no running replicas, but the Kubernetes service still existed.

14:10: An engineer deployed a change to a service that is required to process payments for the current account. This is not unusual, and our engineers do it often: to minimize the risk of change, we ship in smaller, more granular and more frequent increments using a repeatable, well-defined process. However, once the deployment completed, all requests to the service began to fail. As a result, payments for our current account began to fail. The prepaid card was not affected, because it does not use the failing service.

14:12: We rolled back the change. This is standard practice for a failed deployment: when interfaces are changed, they should remain forward compatible so that rolling back is always safe. In this case, however, the error persisted even after the rollback and payments still could not succeed.

14:16: We immediately declared an internal incident. Team members began to gather to determine the impact of the problem and start debugging.

14:18: Engineers determined that linkerd appeared to be in an unhealthy state and tried to use an internal tool to identify the individual nodes affected and restart them. As mentioned earlier, linkerd is the system we use to manage communication between backend services. To know where to send a particular request, it takes a logical name from the request, such as service.foo, and converts it to an IP address and port. In this case, linkerd had not received updates from Kubernetes about where the new pods were running on the network. It therefore tried to route requests to IP addresses that no longer corresponded to a running process.

14:26: We decided that the best course was to restart all of the linkerd instances on the backend, hundreds of them, on the assumption that they all had the same problem. While we worked on the problem, many engineers tried to reduce the impact on customers making payments or receiving bank transfers by activating internal processes designed as backups. This meant that most customers could still use their cards successfully, despite the continuing instability.

14:37: The replacement linkerd instances failed to start because the Kubelets running on each of our nodes could not retrieve the appropriate configuration from the Kubernetes apiservers. At this point we suspected that Kubernetes or etcd had further problems and restarted the three apiserver processes. Once that was done, the replacement linkerd instances were able to start successfully.

15:13: All the linkerd pods had been restarted, but services that handle thousands of requests per second were now receiving no traffic.
At this point customers were completely unable to refresh their feed or balance in the Monzo app, and our internal COps ("Customer Operations") tools stopped working. The problem had escalated into a full platform outage, with no service able to serve requests. As you can imagine, almost all of our automated alerts had fired.

15:27: We noticed that linkerd was logging a NullPointerException (http://t.cn/Rl086mW) when trying to parse the service discovery response from the Kubernetes apiserver. We found that this was an incompatibility between the versions of Kubernetes and linkerd that we were running, specifically a failure to parse a service with no endpoints. Because a newer version of linkerd, which contained a fix for the incompatibility, had been tested in our staging environment for a couple of weeks, the platform team's engineers began deploying the new version of linkerd in an attempt to roll forward.

15:31: After inspecting the code change, engineers realized they could prevent the parsing error by deleting the Kubernetes service that had no endpoints (i.e., the aforementioned service that had been scaled down to zero replicas as a precaution). They deleted the offending service and linkerd successfully loaded the service discovery information. At this point the platform returned to normal, traffic began to move gracefully between services, and payments started working again. Incident over!

Root cause

At this point, although we had brought the system back online, we did not yet understand the root cause of the problem. Because of how frequently we deploy and how the platform reacts automatically to node and application failures, the network on our backend is very dynamic, so it is important that we can trust our deployment and request routing subsystems. Afterwards we found a bug in the Kubernetes and etcd clients (https://github.com/kubernetes/kubernetes/issues/47131) that caused requests to time out after the cluster reconfiguration we had performed the previous week.
Because of these timeouts, linkerd was unable to receive updates from Kubernetes about where services were on the network when they were deployed. Although well-intentioned,
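The parsing failure described in the 15:27 and 15:31 timeline entries comes down to a service discovery response in which a service has no endpoints. The sketch below is not linkerd's code (linkerd is not written in Python); it is only a minimal illustration of the defensive pattern, assuming the response is shaped like a Kubernetes Endpoints object whose `subsets` field may be missing or null when a service has been scaled to zero replicas. The function name `resolve_addresses` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def resolve_addresses(endpoints: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Turn an Endpoints-style object into a list of (ip, port) pairs.

    A service with no running replicas may carry `"subsets": None` or omit
    the key entirely, so every level is treated as optional rather than
    being dereferenced directly.
    """
    resolved = []
    for subset in endpoints.get("subsets") or []:
        ports = subset.get("ports") or []
        for address in subset.get("addresses") or []:
            for port in ports:
                resolved.append((address["ip"], port["port"]))
    return resolved


# Hypothetical sample data: one healthy service and one scaled to zero.
healthy = {
    "metadata": {"name": "service.foo"},
    "subsets": [
        {
            "addresses": [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}],
            "ports": [{"port": 8080}],
        }
    ],
}
scaled_to_zero = {"metadata": {"name": "service.bar"}, "subsets": None}

print(resolve_addresses(healthy))
print(resolve_addresses(scaled_to_zero))
```

Returning an empty list for the empty service lets the caller keep routing to every other service instead of failing on the whole response.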

New operations book recommendation

DevOps: Principles, Methods and Practice

Foreword

In recent years the DevOps development model has had a profound impact on the software industry, and a considerable number of software companies have begun to adopt it. Authoritative forecasts even suggest that more than 80% of the world's top 2,000 software companies will move to DevOps. In fact, DevOps has grown considerably faster and more widely than anyone expected. We think it is no accident that DevOps has had such a large impact. Its inherent features are well suited to the software environment of the so-called Internet age, where requirements are hard to pin down, changes demand rapid response, value must be delivered quickly, and reliability requirements are high.

As software engineering educators, we therefore have to think about what DevOps means for modern software engineering education. On the one hand, our teaching needs to show students how to combine tested and proven management techniques with the specific development techniques available, and how to apply process thinking and a systematic approach to the development and maintenance of all kinds of software systems. In this sense DevOps is an excellent vehicle for meeting these goals. Ignoring DevOps would not only miss a good opportunity to achieve the goals of software engineering education but, worse, might widen the gap between school education and industry practice. On the other hand, introducing DevOps into university classrooms also faces many challenges. First and foremost is the lack of a textbook devoted to DevOps that covers all of its aspects in a comprehensive and systematic manner. In view of this, we compiled such a textbook to try to make up for the shortcoming. Given that this book is primarily for DevOps beginners, we did not simply list DevOps buzzwords and tools when selecting and organizing the content.
Instead, we explain as far as possible the rationale behind the DevOps approach. Software engineering techniques and practices with typical DevOps features, such as the evolution of microservices architecture, lean management and container technology, are described in considerable detail. In doing so we try to convey the idea that DevOps, as a methodology, cannot simply be equated with a particular practice or tool; it is an organic whole that covers the underlying theory, techniques and tools of management. Taken together, this book has the following characteristics:

It covers all aspects of DevOps in a comprehensive and systematic way, so that readers can use it as an introductory book on DevOps (although the content is not elementary!).

It maintains an objective, neutral and cautious attitude. Although we advocate DevOps, we do not follow it blindly. In organizing the material and presenting the content, we introduce DevOps as a solution to practical problems. At the same time, we state our view clearly: DevOps does not exclude other methodologies.

Some of the knowledge points and the corresponding cases come directly from the work experience of front-line industry experts, which helps readers feel immersed and understand DevOps better.

The division of labor for this book is as follows: the first chapter was written by Wang Tianqing, Shao Dong, Zhang He and Ren Qun; the second chapter by Teng Lingling and Song Jun; the third chapter by Jiang Mengjie; the fourth chapter by Rong Guoping;
Brief introduction

The book was written by three senior teachers from the Software Institute of Nanjing University together with front-line industry experts. It gives a systematic overview of the principles, methods and practices of DevOps, the new software development model of the Internet era. The content is detailed, the structure is clear and the text is easy to understand. It is well suited to students and can also serve as a reference for industry newcomers to DevOps.

The first part of the book begins with the background of the times and introduces the origin of the DevOps model. Drawing on the characteristics of operations in the cloud era, it further explains why DevOps is the inevitable choice for developing, deploying and maintaining today's software systems. The second part introduces mainstream software development methods and processes; lean production and the Kanban method, as the theoretical basis of DevOps, are the focus of this part. The third part focuses on typical practices in the DevOps model, such as microservices architecture, continuous integration, continuous delivery (deployment), virtualization, Docker containers, automation, and more.

About the Authors

Rong Guoping, a teacher at the Software Institute of Nanjing University, has long been engaged in work related to software process improvement. Since 2006 he has taken part in numerous SEI training courses in the United States, and became the only SEI-authorized PSP trainer and TSP team coach in that year. He has published more than 40 papers in journals such as JSS and Software Journals, and at first-class international conferences including ICSE, ESEM, ICSSP, EASE, CSEE&T and APSEC. He is one of the founders of the DevOps China technology community.

Zhang He, a professor of software engineering and doctoral supervisor at Nanjing University, was selected for the Dengfeng Talent Project (A-level) and is a scientist at the CSIRO. He worked in software engineering research and practice in Europe and Australia for more than ten years and has been at Nanjing University since 2013. He has long worked on software processes, software architecture, service computing and empirical software engineering. He has led national research fund projects in Ireland (EU), Australia, China and other countries. He has two books in English and has published more than 100 papers in key international journals and conferences of software engineering, 10 of which won Best Paper awards at conferences.

Shao Dong is an Associate Professor at the School of Software, Nanjing University, director of the Embedded Technology Department and Assistant Dean of the Software Institute. He is mainly engaged in teaching and research in software engineering, with research interests in software process, high-tech market theory, agile software development and software engineering education. In 2005, 2009 and 2014 he won the "Second Prize of the National Teaching Achievement Award" issued by the Ministry of Education three times. He is a key member of the national teaching team "Software Engineering Main Course Teaching Team" and has written a textbook. As a teacher of the national excellent course "Computing and Software Engineering", he has twice won the title of "Nanjing University favorite teacher".

FiberHome releases FitOS 6.0 cloud operating system

On November 6-8, the open source event of the second half of 2017, the OpenStack Sydney Summit, was held as scheduled. Open source technology professionals, top cloud computing vendors and open source users from more than 50 countries came together to share their experience in technology innovation and application, with the aim of giving open source cloud computing users around the world a better business cloud platform. The summit focused on the growth of OpenStack deployments and applications in various fields worldwide and on its further deep integration with the container field.

The summit also specifically emphasized the important contribution of the Chinese market to the development of OpenStack: both China's community contributions and the scale at which Chinese enterprises use OpenStack rank among the highest in the world. At this conference FiberHome, a Gold Member of the OpenStack Foundation, presented its newly released FitOS 6.0 cloud operating system, its FitCloud solution and its latest cases from different industries to attendees.

On the first day of the conference, FiberHome released the latest version of the FitOS 6.0 cloud operating system. As the core product of FiberHome's FitCloud integration, the release of FitOS 6.0 during the Sydney Summit was of great significance. Built on the native OpenStack community platform, FitOS 6.0 provides users with a more secure, reliable, easy-to-use and smart cloud operating system through secondary development and hardening of native components and the addition of self-developed functional components. In addition, based on advance research into users' intelligent application scenarios, FiberHome's FitOS 6.0 and FitCloud cloud-network integration solutions open up a new phase of support for intelligent applications. FiberHome booth staff gave visitors detailed explanations of the newly released FitOS 6.0, the FitCloud solutions and the industry cases, and discussed the advantages of FiberHome products with them in depth.

Because three of this year's OpenStack superusers are in China, the case studies for large government and enterprise customers in government affairs, education, transportation and healthcare drew much attention during the exchange of industry cases. The summit set up a number of sharing sessions that brought participants a full range of content on technology, cases and ecosystem building. It also gave technology developers, product suppliers and industry users from different segments a platform for in-depth exchange, with detailed discussion of the further development of OpenStack, cooperation with other open source communities such as K8s and Ceph, and experience with OpenStack in production deployment, operation and maintenance, and testing.

At the same time, the various OpenStack vendors introduced users to their latest OpenStack-based products and solutions. The community contribution of Chinese enterprises drew much attention at this summit. FiberHome has always treated giving back "high-quality" contributions to the community as its mission. It asks its R&D staff to apply their respective expertise and, by developing together with the community, continuously strengthens the core competitiveness of FiberHome's computing products. FiberHome is a backbone contributor to projects such as Senlin, Zaqar, Zun and OPNFV, and has trained Core members. OpenStack technology is now widely recognized in the Chinese market. As a Gold Member of the OpenStack Foundation, FiberHome will continue to contribute new strength to the community. While vigorously upgrading its products, it will build the FiberHome cloud ecosystem to help enterprises keep innovating on the road of ICT transformation and development.

Tsinghua Tongfang to acquire UnitedStack, helping OpenStack in China accelerate

IT is driving the development of the times at an unprecedented speed and profoundly changing human life. As the form of IT for the future, cloud computing redefines the basic driving force of technological innovation and enterprise development. On November 8 the Tongfang Cloud computing strategy conference was held in Beijing. Taking "Tsinghua Science and Technology, cloud security" as its vision and mission, Tsinghua Tongfang positions itself as a provider of secure cloud computing solutions and integrated services for the industry market.

Market demand and national needs: Tongfang's dual-cloud road

IDC forecasts that more than 85% of large enterprises will adopt a hybrid (multi-cloud) IT environment by 2018; that by 2019, 43% of IoT data will be pre-processed by edge computing devices; and that by 2020, 80 Fortune 500 companies will provide digital services to customers through industry clouds. If technology is the primary productive force, then cloud computing is becoming the source of that force. Facing such an attractive industry trend, Tsinghua Tongfang, as a leader in China's information technology, is making its strategic move here.

At the just-concluded National Congress of the Communist Party of China, party and state leaders emphasized once again that the country must unswervingly follow the road of building a strong nation through science and technology, build a powerful nation in cyberspace, and follow a path of national security with Chinese characteristics, so as to safeguard development with security and promote security with development. As a high-tech enterprise under Tsinghua University, Tongfang has taken on the mission of safeguarding national information security. This is also the primary goal of Tsinghua University in commercializing its scientific and technological achievements. With the strong R&D support of Tsinghua University and its own 20 years of industrial experience, Tongfang already has a solid foundation.
A significant full-stack advantage that enables the industry's development

Tongfang provides cloud computing products and solutions for different industries. They cover the entire stack of IT construction: from the basic environment and hardware equipment, to the cloud platform operating system, to storage, big data and cloud security products and solutions, up to industry clouds and applications in various industries. Together they form a one-stop, end-to-end cloud computing service. With its global partners, Tongfang is building an open technology ecosystem that gives enterprise, government, finance and other users solid support for innovation, development and the rebuilding of their core competitiveness.

To speed up the strategic transformation of Tongfang Cloud, Tsinghua Tongfang expects to spend 500 million yuan to acquire a controlling stake in UnitedStack, a leading domestic open source cloud company. After the merger is completed, UnitedStack will change its name to Tongfangyunyun and continue to operate independently. It will provide secure, autonomous and controllable cloud computing solutions to education, government, finance, military and other industries.

Cloud Environment: Tongfang Cloud creates a dynamic data center with new design concepts, based on in-depth research on existing land-use plans and full respect for customer suggestions. To build a secure, sustainable data center with industry concentration and international standing, Tongfang follows the design principles of standardization, security, reliability, manageability, agility and scalability, availability and advancement. In addition, Tongfang's computer room planning follows standardized and modular design concepts, helping customers plan their data center and IT on the basis of the best cost-performance and the greatest lifecycle cost savings.
Infrastructure: The Tongfang Cloud hardware product family is designed for cloud computing environments. Its enterprise-class x86 servers and storage have been verified extensively in practice. Combined with a modular data center, they form a cloud computing data center that meets the requirements of enterprise-level production. It provides users with high computing performance and greatly improves the efficiency of enterprise IT operation and maintenance. It suits a wide range of applications and supports innovative enterprise management through intelligent automation.

Cloud Platform: The Tongfang Cloud computing platform draws on Tongfang's resources and on the research and technology of Tsinghua University, with cloud computing products and industry-specific custom development as its focus and core advantage. It significantly reduces IT operation and maintenance costs, enabling users to operate in a stable and reliable business In an

First, data desensitization

Celestica Fund holds a large amount of individual user information, and its log files retain the four elements of personal and bank card information. These data are private, and the original ELK setup could not mask sensitive data, so it could not solve the problem at its root. In the past, when a developer needed to view a log, an operations engineer had to sit alongside and supervise. Even a simple task such as checking logs therefore took up an additional operations engineer's time, which lowered coordination efficiency and tied operations staff up in supervision.

The log data desensitization function of Kangaroo Cloud Log solves this problem with a simple setting. The security administrator selects the fields in the log file that need to be desensitized and defines how they are converted by means of expression matching. The system then automatically filters and converts the matching information into desensitized information. Combined with permission control, sensitive data is automatically masked for users who do not have the right to view the log source.

Desensitizing sensitive data in logs is a common requirement for financial customers. Information that identifies a user, such as bank card numbers, ID card numbers and mobile phone numbers, is desensitized. In addition to these general data types, Kangaroo Cloud Log also supports custom desensitization rules, so any rule a user needs can be added incrementally.

Second, control of collection resources

All of Celestica Fund's online business servers must provide service 24 hours a day without interruption, and the business and its applications must be highly available. No external program or third-party application may affect the stable operation of the production environment, and every program deployed on the servers must be non-invasive to the application systems. The collection program deployed on the servers also undergoes rigorous stress and performance testing to ensure that collection has no impact on the business systems. From the beginning of product design, Kangaroo Cloud Log considered how to minimize the impact of the log collection client on the server.

The first layer: resource constraints. For example, CPU usage cannot exceed 5%, memory usage cannot exceed 100 MB, and bandwidth usage cannot exceed 500 KB/s. The thresholds can be freely customized through the web page. Once resource limits are enabled, the Agent runs within the allowed thresholds. If the volume of logs suddenly increases, the Agent automatically throttles its resource usage.
The second layer: Agent self-protection. If an exceptional situation makes the resource limit fail and the Agent's resource usage exceeds the configured threshold, the Kangaroo Cloud Log Agent terminates its own process through a self-protection mechanism, fully protecting the business system. Once the system is stable again, restarting the Agent lets it re-collect the logs it missed, so no log data is lost.
Third, call-chain analysis
Celestica Fund's business systems use a distributed architecture and are developed on Ant Financial Cloud's SOFA framework. The framework can be configured to generate log files, and every system produces a large volume of call-chain logs. The raw logs are of little use by themselves, but log analysis makes them valuable. In a log-based distributed call tracing system the key is the call chain: a globally unique ID (TraceId) is generated for each request, and through it the otherwise isolated call records of different systems are linked together to recover more valuable information. How to use these logs to help users analyze their systems was the problem Kangaroo Cloud Log had to solve. After studying the SOFA log files for some time, Kangaroo Cloud Log successfully parsed the call chains in them and now shows users, visually, the call relationships between the various centers along with key information such as the number of failed interface calls and call latency. Call-chain analysis applies in the following scenarios:
A. Locating anomalies and measuring latency. Using the TraceId found in the error message of a service exception log, you can look up the call chain, see exactly what happened at each step, locate the problem more intuitively within the chain, and confirm it by checking layer by layer.
B. Drill-down call reports. A distributed call tracing system provides not only the call chain but can also monitor the state of all middleware. Forming call chains therefore also produces detailed call monitoring reports, which differ from other monitoring in that they support drill-down. Because call chains can be reported along many dimensions, you can see not only how a service is doing but also how the services it calls are doing, which gives a clear picture of the call chain.
C. Full-link analysis. The difference between the full link and the call chain is that the full link is a concept covering the application as a whole, whereas a call chain is the course of a single call. The value of analyzing the full link lies mainly in the following points:
Link topology analysis: analyze the sources and destinations of calls through the topology between applications to identify unreasonable call sources.
Dependency mapping and capacity estimation: identify fault-prone points, performance bottlenecks and interface error rates, and estimate capacity from link call ratios and peak QPS.
With these views, R&D and management staff can quickly locate the faulty or problematic node, and from that node view detailed interface call analysis and statistics, so problems are easy to find. The biggest advantage of full-link analysis and tracing is that the relationships between all distributed applications become transparent. Every transaction or order request can be traced through log analysis without manual inspection, which effectively reduces the time O&M and R&D staff spend on troubleshooting.
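The call-chain idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kangaroo Cloud's or SOFA's actual log format: the record fields (`trace_id`, `service`, `ok`, `elapsed_ms`) and the function name `summarize_traces` are hypothetical.

```python
from collections import defaultdict

def summarize_traces(records):
    """Group log records by trace ID and summarize each call chain.

    Each record is a dict with hypothetical fields:
    trace_id (str), service (str), ok (bool), elapsed_ms (number).
    """
    chains = defaultdict(list)
    for rec in records:
        chains[rec["trace_id"]].append(rec)

    summary = {}
    for trace_id, calls in chains.items():
        summary[trace_id] = {
            # services touched by this request, in log order
            "services": [c["service"] for c in calls],
            # number of failed calls inside the chain
            "failed_calls": sum(1 for c in calls if not c["ok"]),
            # sum of the per-call elapsed times
            "sum_ms": sum(c["elapsed_ms"] for c in calls),
        }
    return summary

# Records from different systems, linked only by their trace ID.
records = [
    {"trace_id": "t1", "service": "gateway", "ok": True, "elapsed_ms": 12},
    {"trace_id": "t2", "service": "gateway", "ok": True, "elapsed_ms": 9},
    {"trace_id": "t1", "service": "trade", "ok": False, "elapsed_ms": 340},
    {"trace_id": "t1", "service": "account", "ok": True, "elapsed_ms": 25},
]
summary = summarize_traces(records)
```

Looking up `summary["t1"]` then plays the role of scenario A: starting from the trace ID in an error log, you see which services the request passed through and where it failed.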
Intelligent operation and maintenance is achieved with data and algorithms
Operation and maintenance has developed through standardization, tooling and automation to today's intelligent stage. Each stage brought a substantial increase in productivity and efficiency, and the overall trend is inevitable. The intelligent era is not about putting operations staff out of work. It answers a strong demand for greater operational efficiency: for example, locating problems quickly in a complex environment, or even predicting failures so that they can be prevented and application stability guaranteed. Lin Jie believes that intelligent operation and maintenance can be achieved by making use of data (operations data) and algorithms. First, O&M capability cannot jump directly to the intelligent stage. It has to pass through standardization and tooling to automation, and only mature automation provides the basic capabilities. Second comes the accumulation of data: large volumes of operations data, log data, network packet captures, database data and so on are needed, together with the annotations produced in daily operations. For example, when a fault occurs, operations staff record how it was handled, and that record is fed back into the system, which in turn raises the level of operations. Last is the algorithm: deciding which algorithmic model to use and optimizing it continuously. In the operation and maintenance department, Celestica hopes to monitor the basic resource usage of its application systems by collecting and analyzing server performance logs.

Yu'e Bao 11.11: Log data analysis and efficient operation and maintenance

Double Eleven has just ended. The people who matter most are not the shopkeepers tallying their stock, nor the shoppers watching for flash-sale items, but the operation and maintenance staff behind the scenes of online shopping. What they worry about most is network interruption, application stutter, slow responses, server downtime and so on. Double Eleven is the top priority for an e-commerce IT department. Before the big promotion, operations staff prepare multiple contingency plans well in advance, stay on edge throughout, and run hundreds of simulation drills; nobody knows how many sleepless nights they spend at the back end. The seemingly simple Double Eleven involves collaboration and testing across the entire commercial infrastructure, including payment, architecture, databases, networks, operations, power, customer service and logistics. What pitfalls has operations run into over the years of Double Eleven promotions? Now that intelligent operation and maintenance has arrived, how should enterprises plan for it? With these questions, Info interviewed Lin Jie, chief operation and maintenance expert at Kangaroo Cloud, who previously supported operations for business units such as Taobao, Tmall, shared services, the wireless mobile business and Opinion.
Pitfalls from years of Double Eleven operations
Lin Jie recalls: Tmall's Double Eleven promotion first started in 2009. At that time it was still Taobao Mall, GMV was only tens of millions a day, and there was no such thing as a midnight rush. Before the promotion, engineers mostly judged from experience: they looked at each server's current load and the application's current RT and QPS, estimated the maximum capacity each server could support, and then a few people talked it over and decided how many servers to add to each core application. Nobody was really sure how many servers to add in the end; if it came to it, more capacity was requested at short notice. In short, the business was small at this stage and this approach was enough to get by.
Over the next few years, as the Tmall brand was promoted and Double Eleven exploded year after year, the original way of operating no longer worked. The business grew rapidly, the number of back-end applications rose sharply, and the call links between application systems became intricate. How many resources should be prepared before expansion? You cannot decide on a whim: if you request too much, the request may be rejected; if you request too little, you carry more risk. We solved this with online stress testing. For example, one server is picked directly from the production environment and loaded through simulated replay or by routing several servers' worth of live traffic to it. The results give the maximum capacity of a single server, and the expansion request is then backed by numbers. Even with capacity planning done properly, the midnight peak may still exceed expectations and overwhelm the system.
Therefore rate limiting and degradation were introduced. Rate limiting sets a maximum threshold for each application; once the threshold is exceeded, new requests are rejected immediately. The benefit is that the application is protected and an avalanche is avoided. Degradation addresses the fact that there are too many applications: during the promotion, some non-core functions can be switched off so that the trading flow gets as much capacity as possible. Stress testing at that stage was not fully accurate. The main problem was its limited scope: it measured a single application in isolation, but applications depend on one another. Some shared service centers in particular are called by almost every application. What to do about that? A few years later a new stress testing tool was developed: full-link stress testing. This was a new idea for capacity planning. It generates large volumes of traffic directly in the production environment through simulated replication, exercises every link in the chain, and works with the corresponding monitoring system to find the bottlenecks so they can be optimized quickly. The whole process is automated. Clearly, automated operation and maintenance is the trend.
The strategy behind the midnight rush
E-commerce Double Eleven promotions still follow the midnight rush model, so the core task in safeguarding application systems is to carry the first 15 minutes, or even the first few minutes, successfully. Lin Jie makes the following suggestions:
a. Capacity planning. Stress test in the production environment as far as possible; only after stress testing do you know where you stand.
b. Rate limiting for critical applications. Midnight traffic may well exceed expectations. Only a rate limit protects your own application; without it there may be an avalanche chain reaction.
c. Degrade non-core functions. Every Double Eleven consumes a lot of resources, and they are tilted toward core applications, so some degradation of non-core functions is acceptable.
d. Emergency plans. Prepare for possible abnormal conditions.
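Rate limiting and degradation as described in this section can be sketched as follows. This is a minimal fixed-window illustration, not the mechanism actually used for Double Eleven, which the interview does not describe. The class `FixedWindowLimiter`, the function `handle_request` and all parameter names are hypothetical.

```python
import time

class FixedWindowLimiter:
    """Reject new requests once a per-window threshold is reached."""

    def __init__(self, max_requests, window_seconds=1.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock              # injectable so the limiter can be tested
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # a new window begins: reset the counter
            self.window_start = now
            self.count = 0
        if self.count >= self.max_requests:
            return False                # over the threshold: reject immediately
        self.count += 1
        return True

def handle_request(limiter, is_core, degrade_non_core):
    """Return what would happen to one request during a promotion."""
    if not is_core and degrade_non_core:
        return "degraded"               # non-core feature is switched off
    if not limiter.allow():
        return "rejected"               # fail fast to avoid an avalanche
    return "served"

limiter = FixedWindowLimiter(max_requests=2, window_seconds=1.0)
results = [handle_request(limiter, is_core=True, degrade_non_core=True)
           for _ in range(3)]
```

A fixed window is the simplest choice; it lets short bursts through at window boundaries, so a sliding window or token bucket may be preferable where that matters.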
Double Eleven is the most typical elastic scenario
Elasticity is the biggest advantage of cloud computing, and a big promotion is the most typical elastic scenario. With the spread of cloud computing, especially public cloud, operations staff today mostly no longer need to look after underlying facilities such as the machine room, network and operating system. After years of practice, today's e-commerce platforms run on elastic, scalable cloud computing platforms, with distributed data and efficient CDN distribution for load balancing, so that they do not collapse under the high concurrency of Double Eleven. Operations staff can shift more of their energy to rapid releases and rapid iteration in support of business development. Traffic during large events is on a completely different order of magnitude from daily traffic; using cloud resources on demand meets the expansion need and also saves a great deal of cost. Besides expansion, contingency plans are of course needed: list the abnormal situations that might occur on the day and rehearse them in advance. Last year, just ten minutes after Tmall's Double Eleven opened, the global payment record was broken again. Alipay data show that at 0:39:12 Alipay payments peaked at 120,000 transactions per second, 1.4 times the previous year's figure and a new record. Among payment methods, Huabei and Yu'e Bao proved very popular with users, accounting for 29% and 18% respectively. Withstanding huge transaction volumes, handling flash sales at lightning speed, keeping the technical systems up, keeping liquidity and yield stable: only a product that withstands the ultimate test of Double Eleven can be considered the real thing!
Efficient operation and maintenance at Celestica Fund based on log data analysis
For Celestica Fund, keeping Yu'e Bao's liquidity and returns stable through Double Eleven is a major challenge. The most common way to locate problems in online systems is log analysis. Next, taking Yu'e Bao as an example, we look at how Celestica Fund broke new ground in log data analysis. Before this, Celestica Fund had been using the open-source ELK log stack, which R&D and operations staff used to process log data and to query and search log files. As application scenarios deepened and internal demand grew, Celestica hoped to solve new operations problems through log analysis, and it chose to work with Kangaroo Cloud. The work covers the following aspects:

There are two persistence methods for Redis, AOF and RDB.

Third, persistence
Redis has two persistence methods, AOF and RDB. AOF persistence appends write commands to an aof file; RDB periodically saves a snapshot of memory to an rdb file.
Although RDB can save the snapshot in the background with the bgsave command, fork()ing the child process has overhead and takes a long time when the in-memory dataset is large. The shared data itself does not need to be copied, but the memory page table of the parent process does. With 40G of memory (and 8 bytes per page table entry) the page table is about 80M, and copying it takes time. In tests on our server nodes, a bgsave on 35G of data blocked for more than 200ms at the moment of the fork, so the general recommendation is that a Redis instance use no more than 20G of memory. There is also I/O cost. Online we enable rdb persistence on the slave nodes; disk performance is average, and persisting a 1.2G rdb file once a minute takes about 30s each time. The rdb frequency therefore must not be too high and should be configured to suit the situation.
AOF appends write commands to the aof file. The advantage is that data can be kept essentially lossless; the disadvantage is that the file grows quickly and needs a periodic bgrewrite (the BGREWRITEAOF command). bgrewrite consumes both CPU and disk I/O, with a single CPU reaching up to 100% utilization. During bgrewrite, new write requests are held in a temporary buffer; after bgrewrite finishes, the buffer is written to disk synchronously, and client requests are not processed during that synchronous write. If bgrewrite runs long, more data backs up in the buffer and the main thread is blocked for longer. So if aof must be enabled, the general recommendation is to find a few idle periods and schedule a script to run bgrewrite then.
AOF has another pitfall: the fsync policy for flushing to disk. There are three settings: always, everysec and no. With no, the timing of disk writes is left to the operating system, which largely sacrifices AOF's lossless advantage. With always, every command is flushed synchronously, which causes frequent I/O. The general advice is therefore everysec, in which Redis calls fsync once per second to flush the buffered data to disk. But when an fsync call takes longer than 1 second, Redis adopts a delayed-fsync strategy and waits one more second. After two seconds fsync is carried out regardless of how long it takes. Because the file descriptor is blocked during that fsync, the current write is blocked too, and since the write is synchronous the main thread is blocked. Enabling aof and demanding lossless data from Redis therefore places very high demands on disk performance.
Persistence gives Redis a mechanism for recovering data, but it comes at a cost: it can cause CPU stalls that affect the processing of client requests. Running without persistence is also risky. If the master is restarted by mistake, or, say, a master-slave switch fails and someone carelessly restarts the master, a master without persistence comes back empty and wipes the data on every slave. So whether to enable persistence, and how, is a real question.
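The page table estimate quoted for the fork() cost (40G of memory, 8 bytes per entry, about 80M of page table) can be reproduced with a short calculation. The 4 KB page size is an assumption, since the text gives only the entry size, and the sketch counts last-level page table entries only. The function name `page_table_bytes` is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def page_table_bytes(dataset_bytes, page_size=4096, entry_bytes=8):
    """Estimate the size of the page table that fork() has to copy.

    Assumes 4 KB pages (an assumption; the text only states 8 bytes
    per entry) and ignores the upper page table levels.
    """
    pages = -(-dataset_bytes // page_size)   # ceiling division
    return pages * entry_bytes

table_40g = page_table_bytes(40 * GIB) / MIB   # 80.0 (MiB)
table_35g = page_table_bytes(35 * GIB) / MIB   # 70.0 (MiB)
```

Under these assumptions the 35G instance mentioned in the test carries roughly 70M of page table, which is what the parent process copies while it is blocked in fork().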
We discussed several options with our operations colleagues; they are listed here for reference:
1. If a full data loss can be tolerated in extreme cases, turn off persistence on both master and slave.
2. If a full data loss cannot be tolerated in extreme cases but a partial loss can: when the in-memory dataset is small and not growing, enable rdb on both master and slave; when the dataset is large, or its growth trend is uncertain, turn off persistence on the master and enable rdb on the slave. Enabling rdb requires adequate CPU and disk performance. If the master has persistence off and the slave has rdb on, the slave's rdb must be protected from being overwritten when the master is restarted by mistake. There are several ways to do this:
 a. Wrap the restart script in an extra layer that backs up the rdb file to a backup directory, loading it over the network, before starting. This guards against a mistaken restart, but the standby deployment may need script changes, and the master's script also has to be changed to enable persistence.
 b. Periodically send the rdb file to the master node over the network. Larger files take longer, and as the file grows the script's interval has to be reconsidered, otherwise it causes sustained network I/O. Some data is still lost.
 c. Periodically back up the rdb to a backup directory on the slave and do nothing else; after a mistaken restart, copy the rdb to the master node manually. Some data is lost.
3. If data must be as lossless as possible, enable aof on both master and slave. Enabling aof requires adequate CPU and disk performance. Use everysec for the aof fsync policy, and run bgrewrite from a custom script at regular idle times to limit the incremental data buffered during bgrewrite.
At present most of our businesses tolerate partial data loss. To maximize Redis performance we turn off persistence on the master and enable rdb on the slave. To guard against a mistaken restart, the rdb is backed up every 5 minutes and the last hour of backup files is kept, to be copied manually into the master's data directory for recovery if needed. When hardware performance improves later, we will adjust the persistence mechanism accordingly.
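The backup routine described here (a copy of the rdb every 5 minutes, keeping the last hour) might look like the sketch below. It is an illustration under assumptions, not the authors' script: the function name `backup_rdb`, the file naming scheme and the idea of calling it from a scheduler such as cron every 5 minutes are all hypothetical. Twelve copies at 5-minute intervals cover one hour.

```python
import shutil
import time
from pathlib import Path

def backup_rdb(rdb_path, backup_dir, keep=12, now=None):
    """Copy the rdb file into backup_dir and keep only the newest copies.

    keep=12 corresponds to one hour of 5-minute backups (keep must be >= 1).
    `now` is a Unix timestamp; None means the current time.
    """
    rdb_path = Path(rdb_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A UTC timestamp in the name makes alphabetical order chronological.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    target = backup_dir / f"{rdb_path.stem}-{stamp}.rdb"
    shutil.copy2(rdb_path, target)

    # Delete everything except the newest `keep` backups.
    backups = sorted(backup_dir.glob(f"{rdb_path.stem}-*.rdb"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```

Recovery stays manual, as in the text: the most recent backup is copied into the master's data directory before the master is started again.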

Redis server optimization practices: configuration optimization, master-slave switch, persistence

 First, server configuration
allkeys-lru or noeviction?
The Redis server has the configuration parameter maxmemory-policy allkeys-lru, which tells Redis to evict keys with an LRU policy once memory use reaches the configured limit (other policies are available). We ran into the following problem: an improper hgetall on the application side pushed Redis memory over the limit and triggered key eviction, and important data was wiped (fortunately most businesses had a disaster recovery mechanism). After this incident we changed the eviction policy to noeviction, which forbids key eviction. So should the eviction policy be changed? It depends mainly on the business scenario: whether Redis serves as an evictable data cache or as an in-memory database holding core data. One of the core advantages of Redis is its rich data structures, and most of our businesses choose Redis for that reason. Once a dataset with a complex structure has been wiped, it cannot be rebuilt automatically unless the business has implemented a recovery mechanism. If your Redis is more than a simple string cache, consider carefully whether to disable key eviction.
client-output-buffer-limit normal 0 0 0?
The hgetall incident has exactly the same cause as an incident in which memory spiked after an engineer turned on the monitor command. By default this buffer is unlimited, and it counts against Redis memory, so a single large query can make memory use multiply; we traced memory blow-ups to this on several occasions. If key eviction has unfortunately not been disabled, the data is likely to be wiped. Redis provides many O(N) commands such as hgetall, smembers and keys. Use O(N) commands with caution if your application has high QPS or serves end-user requests. If the command targets a large dataset, then either the request runs long and monopolizes the CPU, or the large result makes Redis memory soar. Redis's single-threaded design is not meant for high-frequency queries over large datasets. We set this parameter to client-output-buffer-limit normal 10MB 5MB 10. A 10MB buffer for a single request is already generous and can be set lower depending on the application, but it should never be unlimited.
lua-time-limit
Support for Lua scripts is one of Redis's strengths, since it allows custom commands. Some of our services use Lua scripts. The run time of these custom commands needs to be tested and evaluated; in applications that need high QPS, a Lua script must never hold the CPU for long. lua-time-limit does not terminate a Lua script, but once the limit has passed Redis answers client requests with a BUSY error, which avoids large numbers of connections hanging and timing out without anyone knowing why. Set a time limit for Lua, but still test the performance of your Lua scripts.
rename-command
Redis's rename-command can rename dangerous commands so that they cannot be executed successfully. We recommend disabling the following commands:
 rename-command FLUSHALL "" /* Delete all existing databases */
 rename-command FLUSHDB "" /* Clear all keys in the current database */
 rename-command SHUTDOWN "" /* Shut down the Redis server */
 rename-command KEYS "" /* Complexity O(N), traverses all keys */
 rename-command MONITOR "" /* Debug command that shows the commands Redis is executing */
Our company has had an engineer run flushall by mistake and wipe the data, Meituan fell into the monitor pit, and we have also seen CPU stalls caused by keys. These commands are neither needed nor appropriate in application API calls. Rather than hoping engineers never call them by accident, it is better to disable them on the server side.
slaveof
This directive is special: it declares the current Redis instance a slave of some master instance. If it is hard-coded in the configuration file, the instance can only be a slave after it starts, unless Sentinel promotes it to master or someone manually runs slaveof no one. Sentinel adds this directive to or removes it from the configuration file dynamically, so it is best to let Sentinel decide whether it is present. Our earlier configuration pulled it in through a sub-file, and that caused a problem. If a slave was elected master by Sentinel, its slaveof directive should be removed, but the directive written in the sub-file could not be removed, so once this master instance was restarted it became a slave again. If you are not aware that the sub-file exists, the problem is hard to track down and you have no idea what happened. We never hit this online; it surfaced in the test environment when we did a master-slave switch and restarted the master during the procedure, and fell into the pit. The test example is covered later, since it also involves Sentinel and master-slave switching.
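The caution about O(N) commands such as hgetall in this section can be illustrated with an incremental read. The sketch below pages through a hash with HSCAN so that no single reply has to carry the whole hash. It assumes a client object that exposes `hscan(name, cursor=0, match=None, count=None)` returning `(next_cursor, dict)`, which is the shape of redis-py's method; the function name `iter_hash_in_batches` is hypothetical.

```python
def iter_hash_in_batches(client, key, batch_size=100):
    """Yield (field, value) pairs of a Redis hash using HSCAN, not HGETALL.

    `client` is assumed to behave like a redis-py client: hscan() returns
    (next_cursor, {field: value, ...}) and a cursor of 0 ends the scan.
    """
    cursor = 0
    while True:
        cursor, chunk = client.hscan(key, cursor=cursor, count=batch_size)
        yield from chunk.items()
        if cursor == 0:
            break
```

Two caveats apply: `count` is only a hint to the server, and a scan over a hash that is being modified may return a field more than once, so callers that need exact results should deduplicate.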

The second core question: does de-IOE save money?

If it does not save money, is it worth doing?
Why can de-IOE not cut costs significantly? How can I make this clear to everyone? I have thought about it for a long time. The plainest way to put it: there is no free lunch and no silver bullet, and you cannot remember only the gains and forget the pain.
Take building a house. IOE infrastructure is solid cement, and applications are the high-rise built on that cement, safe and stable. Some people say cement is too expensive and monopolized by oligarchs, and that this has seriously limited the building's scalability (it cannot go any higher) and even its stability. So now the reliable cement has to go and the house has to be built on unreliable yellow sand, which is cheaper and which nobody can monopolize. Architects then rack their brains over how to build reliable applications on an unreliable base. The core idea is: if you cannot scale up, scale out, so that capacity can expand without limit and a collapse brings down only one bungalow rather than the whole structure. To achieve this, the construction method (from waterfall to agile), the building architecture (IT architecture from monolithic to distributed), the building materials (from IOE to non-IOE) and the property management (from operations to SRE, operations development, integrated operations and, gradually, AIOps) all have to reinvent the wheel.
In essence, if you need to build a reliable house from unreliable materials, reliability has to come from somewhere else. All of the aspects above must be transformed and strengthened, and that takes money, cost above all. Technically, everything becomes software-defined: there is no stable infrastructure, only a stable architecture. Does software definition not cost money? Is architecture not built with money?
So only replacing the building materials (hardware, mainly I and E) saves money, and what it saves is investment, not cost. Put another way, direct maintenance costs fall sharply once I and E are gone, but they are followed by the indirect costs of additional overhead and software redefinition. I and E sit far from the application, so the effect there is not too serious; O sits close to the application, where the effect is especially pronounced, so removing O is the hardest part of de-IOE.
From a TCO point of view, because the statistical scope is not uniform I cannot give detailed figures. From experience, though, my personal view is that de-IOE saves significantly on investment overall while costs at least double, leaving overall TCO flat or slightly lower. The savings come mainly from I and E. For O, TCO may go either way depending on how it is counted: counted one way the overall TCO should still come down, counted another way it is certainly more expensive. However it is counted, from a pure cost point of view it must have risen substantially; doubling is a conservative estimate.

The Essential Reason and Technical Value Behind "De-IOE"

Zhejiang Mobile has been on the de-IOE road for several years, and the report card looks good: a considerable share of systems are on the cloud; I is almost gone (there is no technical bottleneck, and the few remaining old machines are being retired as planned); not much E is left (a small number of technical bottlenecks remain); O is essentially confined to the CRM core transaction database, and the CRM core database itself has been moved entirely onto x86. The core CRM applications have been turned into container-based microservices, the enterprise-level microservice platform is taking shape, and the agile transformation of development and operations continues. Now it is time to look back and re-examine de-IOE.
The first core question: what is de-IOE?
IOE is not a scourge and we bear it no grudge. De-IOE cannot be taken literally, and IOE cannot be removed entirely. IOE played a key role historically, and for some time to come there may still be scenarios that suit it. As the saying goes, and it applies to technology too, old soldiers never die, they just fade away; nothing happens overnight or in black and white. National information security? That level is too high; decisions are made at the top and we at the grass roots carry them out. I will not comment on what is not mine to discuss. I only need to make the cost and the plan of the technical transformation clear and do the job well. We turn the technical problem into an economic one: as long as the cost is affordable, we can make sure we are able to execute.
The essence of de-IOE
De-IOE is a marketing slogan coined by Ali, and technically it is not rigorous. Behind the loose wording lies a complete overhaul of Ali's IT infrastructure, which changed the underlying infrastructure and in fact went further and rebuilt the application architecture. On the one hand, it removed IOE in form (at least most of it) and laid the technical foundation for the Aliyun business. On the other hand, it pushed applications toward cloud services and microservices, drove comprehensive architectural decoupling, agile transformation and organizational restructuring, moved the business toward a small front end and a large middle platform, gave business innovation wings, and redefined the boundary between enterprise IT and the business. Ali's de-IOE shows well that the value of IT lies in innovation: it lets you do things you could not do before, not merely save costs. The value of IT is offensive rather than defensive!
Why did Ali rebuild its application architecture? A few years ago, Ali's technical staff gave these reasons for de-IOE:
 GMV grows rapidly every year (the business grows too fast)
 Annual infrastructure investment grows faster than GMV
 The dilemma of a centralized core database:
 - Instability (downtime) affects access across the whole network (centralized data storage cannot be isolated when a failure occurs)
 - Expansion is expensive (minicomputers and commercial databases cost too much)
From the information and data above, we can see that Ali's Double Eleven business volume grew 3,360-fold in just 8 years. The most important reason for Ali's de-IOE transformation of its enterprise IT systems is that the original centralized commercial database architecture could not keep up with ever-growing business requirements. We do not underestimate Ali's tremendous effort and achievement in de-IOE, but it cannot be interpreted purely in terms of national information security.
Among traditional enterprises, attitudes to de-IOE vary widely. For many years the mainstream view has stayed in two camps: first, the resistance is too great, so people neither dare nor want to do it; second, where it is pushed, it is often thought of as an innovation whose point is cost savings.