[Off the Field] Sunda: abandoned at 15, forging an iron will.

The Premier League has teamed up with Budweiser for a new programme, "Off the Field", which pairs footballers with music stars and is hosted by Arsenal legend Ian Wright. The sixth episode features Arsenal goalkeeper Ramsdale and rapper Bugzy Malone, who talk about their paths to the top, Ramsdale's move to Arsenal, the North London derby and the dressing-room DJ.

A closer look at Messi's 300 assists: Mbappé has benefited the most, and at 35 Messi is still the assist king of Europe's top five leagues.

The match against Brest became another milestone in Messi's career: with his assist in that game, he became the first player in history to reach at least 700 goals and 300 assists.

Across his career Messi has provided 353 assists, 300 at club level and 53 for Argentina. Whether counting the 353 career assists or the 300 club assists, Messi is the first player in history to reach either mark.

In his two seasons in Paris, Messi has been involved in 60 goals in 65 games, with 29 goals and 31 assists. This season he has 35 goal involvements in 31 appearances: 18 goals and 17 assists.

Of the 31 assists Messi has provided in Paris, 18 set up Mbappé, making Messi the player with the third-most assists for Mbappé, behind only Neymar (26) and Di Maria (24).

This season Messi has assisted nine of Mbappé's league goals, matching the mark he set with Suarez in 2015-16 for the most league assists he has provided for a single teammate in one season.

Messi, who turns 36 soon, has been involved in 51 goals across league and cup this season, the most in the five major leagues. His 13 Ligue 1 assists also lead the five major leagues, with Manchester City midfielder De Bruyne second on 12 and Neymar third on 11.

Collaborative robots: reshaping how manufacturing produces

With the continuing progress of artificial intelligence technology, the collaborative robot, as one of its important applications, has gradually become a key player on modern industrial production lines. By working alongside humans, collaborative robots not only improve production efficiency and quality, they also ease labor shortages and reduce the intensity of manual work. Their intelligence and flexibility can bring additional commercial value to enterprises.

A collaborative robot is a robot designed to work together with humans, often simply called a "cobot" or a collaborative robot system. Compared with traditional industrial robots, collaborative robots are more flexible and safer, and can complete tasks alongside humans in a shared workspace.

Collaborative robots are usually equipped with a variety of sensors, such as vision, force and acoustic sensors, which help them perceive their surroundings and the people nearby, making safe collaboration possible. They typically adopt lightweight designs, compliant structures and intelligent control algorithms, allowing them to adapt to human co-workers and achieve efficient, safe and flexible manufacturing. Collaborative robots are already widely used in electronics manufacturing, automobile manufacturing, healthcare, logistics and home services.

Although collaborative robot technology has made great progress, a number of problems and challenges remain, including:

Safety: Although collaborative robots are designed and built with safety in mind, interaction and collaboration with humans in real deployments can still lead to accidents and injuries. Continued research and optimization of their safety performance is therefore needed.

Accuracy and reliability: Collaborative robots must cooperate with humans in real-time, dynamic environments, so they need high accuracy and reliability. They also need to adapt to changes in the environment and the task while maintaining stable, precise performance.

Human-robot interaction and interface design: Collaborative robots must interact and communicate effectively with people, so their interfaces and interaction modes need careful design to improve the efficiency of collaboration and the comfort of the interaction.

Programming and control: Collaborative robots must adapt to different tasks and environments, which requires flexible, intelligent programming and control capabilities. At the same time, programming and control need to be simple enough to broaden the robots' adoption and range of applications.

Cost and sustainability: The manufacturing and maintenance costs of collaborative robots are high, which limits their adoption. Continued work on reducing these costs is needed to improve their sustainability and market competitiveness.

Even so, I am optimistic about the future of collaborative robots. With continued technical progress and innovation, their range of applications will keep expanding, and they will become important assistants in manufacturing.

First, collaborative robots can greatly improve manufacturing efficiency and quality while reducing production and labor costs. Being more flexible and safer than traditional robots, and able to work with humans in the same workspace, they can be applied in a wider range of fields such as automobile manufacturing, electronics manufacturing and healthcare.

Second, the intelligence and adaptability of collaborative robots will keep improving. As robotics advances, collaborative robots will become increasingly intelligent and adaptive; for example, through machine learning they will continue to learn and optimize their behavior and performance, achieving more efficient and intelligent collaboration.

Finally, as the applications of collaborative robots expand, their manufacturing and maintenance costs will continue to fall, widening their applicability and increasing their market potential.

Moreover, the market for collaborative robots is very large, and robot manufacturers from different brands and countries all have the opportunity to succeed in this field.

Domestic collaborative robots and foreign brands each have their own strengths and weaknesses. There may still be gaps between domestic robots and foreign brands in technology and performance, but domestic robots usually offer lower prices and better localized service and support, which can be more attractive to small and medium-sized enterprises.

Foreign brands, on the other hand, hold technical advantages in areas such as machine vision, motion control and human-robot interaction. They also tend to have a broad global customer base and marketing network, allowing them to provide better international support and services.

Overall, the collaborative robot, as an important application of artificial intelligence, has gradually become a key player on modern industrial production lines. Although the technology is maturing, challenges remain in business models and safety.

With continued technical development, however, collaborative robots will keep breaking through their limitations, find broader applications and bring more commercial value to manufacturing. In the future they will continue to play to their unique strengths, provide enterprises with more innovative solutions, and make industrial production more flexible, efficient, safe and sustainable.

Impressive! Has ChatGPT unexpectedly brought a new career to life?

With the surge in ChatGPT's popularity, a new career has unexpectedly emerged. Recent research also suggests that as long as you have certain workplace "soft skills", you need not worry about "having your job taken by AI".

A new occupation that unexpectedly caught fire

ChatGPT is all the rage, but to get useful results from it, users need to ask the right questions. Against this backdrop a new profession, the prompt engineer, is quietly emerging. The US magazine The Atlantic has called it "the most important vocational skill of this century".

According to public information, prompt engineering is the practice of crafting inputs to AI tools so as to obtain the desired output effectively.

Prompts come in many forms, such as statements, code blocks and strings, and they are also a way of teaching a model to produce results for a specific task. Prompt engineering has therefore become an indispensable skill for working with AI tools.

The more advanced AI becomes, the more valuable workplace soft skills get.

"As the dean of the School of Technology and Life Sciences of Liverpool City University, I witnessed the latest progress of AI (Artificial Intelligence) technology. Although it will change the nature of the work and the structure of the team, it will not eliminate the need for human planning, professional knowledge and judgment. "

It is important to recognize that practical skills, soft skills and judgment grounded in experience will always have a place in the workplace.

I think one of the biggest misconceptions about artificial intelligence is that it will overtake soft skills such as creativity, critical thinking and emotional intelligence. Although AI can replicate certain tasks, it cannot replicate the human touch or the level of creativity that many industries require.

In fact, as artificial intelligence becomes more common, the value of soft skills will rise precisely because they become scarcer.

Both practical skills and soft skills are crucial to employment, because employers look for people who are not only technically proficient in their field but can also work with others and adapt to changing environments.

These abilities are the key to standing out in the workplace.

With the continued development of programs and platforms such as ChatGPT, we are seeing more and more machines that can reproduce assignments, essays and coursework. Recently there was even a case of an AI passing the US medical licensing exam.

With artificial intelligence and automation increasingly used across many industries, educational institutions must ensure that students have the skills needed to stay ahead. With proper training, our students will be able to see AI as an opportunity to transform their chosen careers.

Equally important, we should educate students about the potential risks of using artificial intelligence and ensure they can make informed decisions about using it responsibly. This includes understanding AI's implications for ethics, data privacy and bias at work, and its potential impact on the future of work.

Practical experience, technical proficiency and the ability to work well in a team are what make individuals stand out in the job market, whether that means being valuable to an employer or succeeding in self-employment.

Humans are not yet worthy of eternal life.

The secret of human immortality lies not in animals but in plants. After prokaryotes diverged into animals and plants, animals lost the capacity for immortality and gained the capacity for death in its place.

It has long been said that people's biggest misunderstanding is to treat death as life's greatest enemy. On the contrary, death is animals' most powerful feature: it lets them evolve and iterate rapidly so that the population can keep adapting to its environment. If animals could live forever, it would take only tens of thousands of years to exhaust the earth's resources.

Plants, however, retained the capacity for something close to immortality, but they lack the animals' ability to adapt: if the environment changes, plants soon die. Given a suitable environment and unlimited resources, plants could in principle survive indefinitely; in reality, with no permanent environment and no unlimited resources, they too die slowly as conditions change or resources run out. This near-immortality limits the evolutionary efficiency of plants.

Today humans, especially those with abundant resources, have always wanted to live forever, but this runs against the strongest feature of animals, so the answer will never be found in animals.

In 2030, with the help of strong artificial intelligence, humans uncovered the secret of immortality and for the first time inserted it into the genes of lower animals. But on the eve of the Third World War, the technology could not be pushed further into higher animals.

Only eighty years later, after the earth was reunified in the Fourth World War, was the technology conditionally applied to humans. By then artificial intelligence had thrown off its human shackles. The earth entered its first cross-species world war, and after more than a century of brutal struggle the surviving humans and artificial intelligences reached a symbiosis. Earth's civilization finally made a true leap.

5-3! Bayern get their revenge: three full-backs get on the scoresheet, Pavard scores twice, and no wonder Paris lost.

On the evening of March 11th, the 24th round of the Bundesliga continued, with Bayern hosting Augsburg at the Allianz Arena. In the first half Berisha capitalized on a Pavard error to open the scoring, Cancelo and Pavard struck within four minutes to turn the game around, and Pavard's second and Sane's header stretched the lead. Berisha scored again in the second half, Alphonso Davies' poked finish sealed the win, and Vargas pulled one back in stoppage time as Bayern beat Augsburg 5-3.

Two days earlier, in the second leg of the Champions League round of 16, Bayern had comfortably beaten Paris 2-0 without giving Mbappé or Messi many chances. Back in the league, though, Bayern are still under threat, with Dortmund, Union Berlin and others in pursuit. They had also lost the reverse fixture to Augsburg.

In this game Choupo-Moting was absent with a back injury, so Mane returned to the starting lineup as the centre-forward, with Musiala, Sane, Gnabry and Alphonso Davies behind him and Kimmich as the lone holding midfielder. Cancelo, Upamecano, De Ligt and Pavard formed the back line, with Sommer again in goal.

Only three minutes in, a Bayern defensive mix-up and a misjudged Pavard header let Berisha in; he got past Cancelo inside the box and slotted home with his right foot to make it 0-1, and the visitors had the lead.

Against one of the most potent attacks of the season, daring to score against Bayern can invite a heavy defeat. Sure enough, in the 15th minute Bayern built patiently in the attacking third: Alphonso Davies laid the ball off on the left, Sane switched it to the right, and Cancelo collected it inside the box, shifted past his marker with his left foot and drove the ball into the far corner with his right, 1-1. It was Cancelo's first goal for Bayern, to go with three assists in his previous eight appearances.

In the 19th minute Kimmich's set piece from the right was delivered into the box; De Ligt's headed flick-on was only half cleared, Mane hooked the ball back across the area, and Pavard pounced to make it 2-1, atoning for his earlier mistake. In just four minutes Bayern had turned the score around.

In the 35th minute Kimmich swung in a corner, De Ligt's header was blocked, and in an almost carbon-copy of the second goal Pavard volleyed in the loose ball to make it 3-1. Pavard now has five goals for Bayern in all competitions this season, a single-season best for the French defender.

In the 39th minute Mane turned and volleyed inside the box, and Gikiewicz flew across to claw the ball away one-handed. In the 44th minute, after Bayern won the ball back in the attacking third, Sane slid a pass down the left; Mane opened up the angle but his shot was saved, and Sane followed up to head the rebound into the empty net, 4-1. It was Sane's seventh league goal of the season and his first since the World Cup.

In the 50th minute, soon after the restart, Bayern won the ball back near the centre circle; Gnabry slipped a pass through, Sane's effort found the net with Mane following it in, but the goal was ruled out for offside. In the 53rd minute Mane tried his luck from the top of the arc and the shot flew just over; he clearly wanted to get back on the scoresheet after his return. In the 58th minute Arne Maier's long-range effort was saved by Sommer, De Ligt then conceded a penalty and was booked, and Berisha sent Sommer the wrong way from the spot to make it 4-2.

In the 74th minute Bayern hit back: Cancelo drove forward on the right and curled a cross to the far post with the outside of his boot, where Alphonso Davies slid in to poke home and make it 5-2, his first goal of the season, restoring the three-goal cushion. With the result effectively settled, Nagelsmann sent on the substitutes for match practice, and Gnabry, Mane, Musiala and De Ligt all came off to rest. In stoppage time Vargas slid in to score and narrow the gap.

In the end Bayern beat Augsburg 5-3 at home, avenging the earlier defeat and staying top of the Bundesliga table. Including this game, Bayern have won eight of their last nine. On this form, it is hard to argue that Paris were hard done by in losing to them.

How to open and edit RAW images? What are the new features of ON1 Photo RAW 2023?

ON1 Photo RAW 2023, the ultimate photo editor. Unlock your creativity without a learning curve. Every new feature and technology in ON1 Photo RAW 2023 removes the steep learning curve of more traditional editing methods. Photographers no longer need to wrestle with complicated masking, layering, brushing or adjustments when working on specific areas of a photo. The most striking new editing tool is Super Select AI, and it will change the way you edit photos.

Fast RAW processing

  • ON1 Photo RAW is a state-of-the-art raw processor that delivers excellent tone, color and clarity while preserving the finest detail in photos. The new color-fringe reduction feature automatically detects and removes color fringing and chromatic aberration. It comes with all the tools photographers need to create striking images.

Easily combine photos

  • ON1 Photo RAW is like having Lightroom and Photoshop under one roof. It ships with the tools needed for compositing and HDR, hundreds of popular presets and built-in filters, and world-class masking tools to make your life easier.

Super Select AI

  • Never worry about brushes, layers or masks again. Simply point and click to edit areas and objects with ease.

Super Select AI enhancements

  • Since we first showed Super Select AI to the world, it has been significantly improved based on your feedback. In this video, Dan Harlacher walks through a number of improvements to Super Select AI, including better region detection, improved masking, live preset previews and more.

Tack Sharp AI

  • The most advanced sharpening and deblurring. It can detect and remove motion blur and rescue out-of-focus photos.

AI-driven adaptive presets

  • AI-driven adaptive presets adapt to the different subjects, scenes, animals, people and more in each photo. You will be amazed by how powerful they are and how completely they change your workflow.

Add a preset to an area in two clicks

  • Imagine applying an entire preset to just part of a photo with only a couple of clicks. The new Super Select AI tool can add adjustment layers, filters and even whole presets to objects or regions in a photo in a few clicks, with no brushing.

Area-specific AI

  • Automatically targets specific areas of a photo by subject type. It first analyzes the photo and builds a map of all its regions, such as sky, water, mountains, ground, buildings, people and animals.

View presets instantly

  • Finding the right preset for an image can be a challenge. If you are tired of squinting at thumbnails to make editing decisions, there will soon be a new way to view presets full screen on your photo.

Keyword AI

  • Wouldn't it be great to find the photo you are looking for without having to add keywords? Keyword AI can scan your photos (without uploading them to the cloud) and automatically identify objects, people, colors, places and more.

Quick Mask AI

  • The quick masking tool has been rebuilt to be faster and more intuitive. AI automatically segments photos into regions and objects.

Ocudrone skies and enhanced Sky Swap AI

  • Through a partnership with Ocudrone, ON1 Photo RAW 2023 includes 125 skies. Sky Swap AI has also been enhanced to take advantage of Mask AI technology, adding options to adjust the sky angle and match edges better.

Content-aware crop

  • Crop and straighten photos, expand the photo canvas, and fill the new edges with realistic detail.

Color-fringe reduction

  • A new automatic option detects color fringing and chromatic aberration and removes them automatically.

New camera and lens support

  • Support for all the latest cameras and lenses is included (more than 800 cameras are supported, along with JPEG, TIF, PSD, PSB, PNG and DNG files).

Windows download: https://soft.macxf.com/soft/3294.html?id=MzE5MDg%3D

Argentina players' goal tallies since the World Cup: Lautaro leads with 9, Di Maria has 7.

SC_ESPN has tallied the goals scored by Argentina's World Cup winners since the tournament, and Lautaro, Di Maria and Messi occupy the top three.

Lautaro Martinez (Inter): 9 goals

Di Maria (Juventus): 7 goals

Messi (Paris): 6 goals

Dybala (Roma): 5 goals

Mac Allister (Brighton): 3 goals

Julian Alvarez (Manchester City): 3 goals

Angel Correa, Acuña and Thiago Almada: 2 goals each

China's supercomputers join the fight to predict virus mutations; 219 of the world's top 500 supercomputers are in China.

China is tapping the innovation potential of the internet, big data and artificial intelligence, aims to become a superpower in innovation, and has already made some notable achievements. China hopes to lead the world in artificial intelligence by 2030 and to become "the world's primary AI innovation center".

For example, according to the China Internet Development Report 2019, China's capacity for independent innovation in network information technology has kept strengthening, and a prototype of a new-generation exascale supercomputer has been developed. Before that, six of China's supercomputers were linked together, creating the world's first national supercomputing internet. Looking further ahead, artificial intelligence will be used to build space-based supercomputers.

For another example, China has now mastered a number of world-leading technologies, such as high-speed rail, large aircraft, aircraft carriers, combustible-ice extraction, rare-earth processing, super-heavy-oil recovery, dry hot rock exploration, long-distance quantum communication and space technology.

China's economy is shifting from traditional sectors to emerging ones.

For example, China’s "New Generation Artificial Intelligence Development Plan" has drawn a grand route to build a $150 billion artificial intelligence industry by 2030. Moreover, some giant companies in China are also guiding billions of dollars to invest in domestic basic scientific research or acquire innovative technologies from abroad, and China’s innovative ability in the field of supercomputing has also been widely concerned and praised by the international industry.

According to the latest TOP500 list of the world's supercomputers, released at the 34th International Supercomputing Conference, as of June 2019 China again had the most machines on the list with 219, ahead of the United States (116) and Japan (29), followed by France (18), Germany (16) and the Netherlands (15); every other country was in single digits. The TOP500, the most authoritative supercomputer ranking, is compiled by computer experts from the United States and Germany and published every six months. China has had more systems on the list than the United States since 2016 and has led ever since.

To date, China has built six national supercomputing centers, in Shenzhen, Guangzhou, Wuxi, Tianjin, Jinan and Changsha. These six supercomputing "brains" have now been linked, lighting up the world's first supercomputing internet and forming a "super team" that strengthens the core competitiveness of China's supercomputing sector.

Twenty years ago, not a single Chinese machine appeared among the world's top 500 supercomputers. Supercomputing has since become a national strategic asset and a focus of competition between countries, because it serves enormous computing needs in scientific research, geological exploration, weather forecasting, computational simulation, biopharmaceuticals, gene sequencing, aerospace, image processing and other fields.

For example, with the virus currently ravaging some Chinese cities, researchers need vast computing resources to screen drugs against the novel coronavirus and to predict how the virus will mutate. According to media reports, several of China's supercomputing centers are now helping the China CDC develop a vaccine, and researchers are using them for target discovery, new-drug screening, lead and assay optimization, and pharmacology and toxicology studies.

In fact, in the current global economy, building and operating the most powerful supercomputers is regarded as an important measure of a country's scientific and technological strength, because countries and enterprises increasingly rely on supercomputers across fields such as machinery, new materials, the biological environment and energy technology.

In the analysis of Jeremy Rifkin, author of The Third Industrial Revolution, the new global economic wave is a consumption economy driven by new technologies, and breakthroughs in growth must come from economic change powered by new energy, new technologies, new communications, and new transport and logistics. Rifkin believes that in this wave China has taken the lead in the innovative science and technology economy. (End)

Applications of AI Algorithms in Big Data Governance

Introduction: This article shares Datacake's experience applying AI algorithms to big data governance. The sharing is divided into five parts. The first part clarifies the relationship between big data and AI: big data not only serves AI, it can also use AI to optimize its own services; the two support and depend on each other. The second part introduces the practice of using an AI model to comprehensively evaluate the health of big data tasks, providing a quantitative basis for subsequent data governance. The third part introduces the practice of using an AI model to intelligently recommend configuration parameters for Spark task runs, with the goal of improving cloud resource utilization. The fourth part covers intelligently recommending the execution engine for SQL queries. The fifth part looks ahead to AI applications across the whole big data life cycle.

Contents:

1. Big data and AI

2. Health assessment of big data tasks

3. Intelligent parameter tuning for Spark tasks

4. Intelligent selection of SQL task execution engine

5. Application prospects of AI algorithms in big data governance

Speaker | Li Weimin, algorithm engineer, Eggplant Technology

Editor | Charles

Produced by | DataFun

01

Big data and AI

It is generally believed that cloud computing collects and stores massive data, forming big data, and that mining and learning from big data then produces AI models. This view takes for granted that big data serves AI, but overlooks the fact that AI algorithms can also feed back into big data: the relationship is two-way, mutually supporting and mutually dependent.

The life cycle of big data can be divided into six stages, each of which faces certain problems, and the appropriate use of AI algorithms can help solve them.

Data acquisition: This stage is mainly concerned with the quality, frequency and security of collection, for example whether the collected data is complete, whether collection is too fast or too slow, and whether the data has been desensitized or encrypted. Here AI can play several roles, such as evaluating the reasonableness of log collection based on similar applications, or using anomaly detection to catch sudden increases or decreases in data volume (a minimal sketch of the latter follows this list).

Data transmission: This stage is concerned with the availability, integrity and security of data; AI algorithms can be used for fault diagnosis and intrusion detection.

Data storage: This stage is concerned with whether the storage structure is reasonable, whether resource usage is low enough and whether the data is secure enough; AI algorithms can again be used for evaluation and optimization.

Data processing: This is the stage where optimization has the most visible benefit. Its core problem is improving processing efficiency while reducing resource consumption, and AI can optimize from several angles.

Data exchange: Cooperation between enterprises is increasing, which raises data security issues. Algorithms can be applied here too; for example, the popular federated learning approach helps share data better and more safely.

Data destruction: Data cannot simply be kept forever, so we must decide when it can be deleted and whether doing so carries risk. On top of business rules, AI algorithms can help judge the timing of deletion and its knock-on effects.
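As a minimal illustration of the kind of anomaly detection mentioned for the data acquisition stage, the sketch below flags days whose collected data volume deviates sharply from the recent average. The window size, threshold and sample numbers are assumptions for illustration only, not the method used in production.

```python
import numpy as np

def flag_volume_anomalies(daily_volumes, window=14, z_threshold=3.0):
    """Flag days whose collected data volume deviates sharply from the
    trailing average -- a simple stand-in for the anomaly detection
    mentioned in the data-acquisition stage.

    daily_volumes: record counts (or bytes) per day, oldest first.
    Returns indices of days considered anomalous.
    """
    anomalies = []
    for i in range(window, len(daily_volumes)):
        history = np.array(daily_volumes[i - window:i], dtype=float)
        mean, std = history.mean(), history.std()
        if std == 0:
            continue  # flat history, skip to avoid division by zero
        z = abs(daily_volumes[i] - mean) / std
        if z > z_threshold:
            anomalies.append(i)
    return anomalies

# Example: a sudden drop on the last day is flagged.
volumes = [1000, 1020, 980, 1010, 995, 1005, 990, 1015,
           1000, 1025, 985, 1010, 1000, 995, 1008, 120]
print(flag_volume_anomalies(volumes))
```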

Overall, data life cycle management has three goals: efficiency, low cost, and security. In the past we relied on expert experience to formulate rules and strategies, which had obvious drawbacks: high cost and low efficiency. Appropriate use of AI algorithms can avoid these drawbacks and feed back into the construction of basic big data services.

02

Health Assessment of Big Data Tasks

At Eggplant Technology, the first application scenario that has already landed is health assessment of big data tasks.

Thousands of tasks run on the big data platform every day, but many of them stop at producing correct output: no attention is paid to their run time or resource consumption, which leaves many tasks inefficient and wasteful.

Even when data developers start paying attention to task health, it is hard to judge accurately whether a task is healthy. Many metrics are involved, such as failure rate, run time and resource consumption, and tasks naturally differ in complexity and data volume, so simply picking the absolute value of one metric as the standard is clearly unreasonable.

Without a quantitative measure of task health, it is hard to determine which tasks are unhealthy and need treatment, let alone where the problem lies and where to start. Even after treatment, we cannot tell how effective it was; some metrics may improve while others deteriorate.

Requirement: Faced with these problems, we urgently needed a quantitative indicator that accurately reflects a task's overall health. Hand-written rules are inefficient and incomplete, so we turned to machine learning. The goal is a model that gives each task a quantitative score and its position in the global distribution, and points out the task's main problems and how to fix them.

To meet this requirement, the functional module displays the key information of all tasks under an owner's name in the management interface, such as score, task cost, CPU utilization and memory utilization. The health of each task is then clear at a glance, making it easier for the owner to manage tasks going forward.

Second, the scoring function itself is modeled as a classification problem. Intuitively, task scoring looks like a regression problem producing an arbitrary real number between 0 and 100, but that would require enough scored samples, and manual scoring is costly and unreliable.

We therefore recast it as a classification problem: the class probability output by a classifier can be mapped to a real-valued score. Tasks are divided into two classes, good tasks (1) and bad tasks (0), labeled by big data engineers. A good task is generally one that runs quickly and consumes few resources for the same workload and complexity.

The model training process is as follows.

First comes sample preparation. Our samples come from historical task data; the features include run time, resources used, whether execution failed, and so on, and the labels are marked good or bad by big data engineers according to rules or experience. We then train the model. We tried LR, GBDT, XGBoost and other models, and both theory and practice showed that XGBoost classified best. The model ultimately outputs the probability that a task is a "good task"; the larger the probability, the higher the final mapped score.
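A minimal sketch of this train-then-score flow follows, assuming a feature table with columns such as run duration, resources used and a failure count. The file name, column names and hyperparameters are illustrative assumptions, not the production setup.

```python
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Assumed sample table: one row per historical task, with an engineer-provided
# label (1 = good task, 0 = bad task). Column names are illustrative.
df = pd.read_csv("task_samples.csv")
feature_cols = ["run_seconds", "cpu_vcore_hours", "memory_gb_hours",
                "failure_count", "input_rows", "cpu_utilization"]
X, y = df[feature_cols], df["is_good_task"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=42)

clf = xgb.XGBClassifier(n_estimators=200, max_depth=5, learning_rate=0.1,
                        eval_metric="logloss")
clf.fit(X_train, y_train)

# The classifier's P(good task) is mapped directly to a 0-100 health score.
p_good = clf.predict_proba(X_val)[:, 1]
health_score = (p_good * 100).round(1)
print(health_score[:10])
```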

After training, 19 features were selected from the initial set of nearly 50, and they are largely sufficient to determine whether a task is a good one. For example, tasks that fail often or have low resource utilization mostly do not score well, which matches people's subjective impressions.

Once the model scores the tasks, those scoring between 0 and 30 are unhealthy tasks that urgently need governance; those between 30 and 60 are acceptable; and those scoring 60 or above are healthy and only need to maintain the status quo. With a quantitative indicator, task owners can be guided to actively govern certain tasks, achieving the goal of reducing cost and increasing efficiency.

Applying the model brought us the following benefits:

(1) Task owners can see the health of the tasks under their name and know from the scores and rankings whether those tasks need governance;

(2) The quantitative indicator provides a basis for subsequent task governance;

(3) After governance is completed, the benefit achieved and the degree of improvement can also be demonstrated quantitatively through the score.

03

Intelligent Parameter Tuning for Spark Tasks

The second application scenario is intelligent parameter tuning for Spark tasks. A Gartner survey found that about 70% of the cloud resources consumed by cloud users are wasted unnecessarily. When applying for cloud resources, many people over-provision to make sure the task succeeds, which causes unnecessary waste; many others simply keep the default configuration when creating a task, which is rarely optimal. A carefully chosen configuration can keep the task both fast and reliable while saving a large amount of resources. But parameter configuration demands a lot of the user: besides understanding what each configuration item means, one must also consider the interactions between them. Even expert experience struggles to reach the optimum, and rule-based strategies are hard to adjust dynamically.

This raises the requirement that a model intelligently recommend the optimal run configuration for each task, improving the task's cloud resource utilization while keeping its original run time unchanged.

The parameter-tuning module is designed around two situations: first, the model should recommend the most suitable configuration parameters based on a task's historical runs; second, for tasks that have not yet gone online, the model should be able to produce a reasonable configuration through offline analysis.

The next step is training the model. First we must decide what the model outputs. There are more than 300 configurable items, and the model cannot predict them all, so after testing and investigation we chose the three parameters with the greatest impact on task performance: the number of executor cores, executor memory, and the number of executor instances. Each configuration item has a default value and an adjustable range; in effect this defines a parameter space, and the model only has to find the optimal point within it.

In the training stage there are two schemes. Scheme one is learning the experience rules. In the early days parameters were recommended by rules, and the results after going online were good, so we first let the model learn this rule set in order to go online quickly. The training samples are more than 70,000 task configurations previously computed by the rules; the features are the tasks' historical run data (such as the amount of data processed, resources used and time consumed) together with statistics such as the average and maximum consumption over the previous seven days.

As the base model we chose a regression model with multiple dependent variables. An ordinary regression model has a single output: many independent variables but only one dependent variable. Here we want to output three parameters, so we use a multi-output regression model, which in essence is still a linear model.

The theoretical basis of this model is as follows: the label is multi-dimensional (the three configuration items), β is the matrix of coefficients over the features, and σ is the error term. Training works the same way as univariate regression: least squares is used to minimize the sum of squares of the elements of σ. In other words, the model fits Y = Xβ + σ, where each column of Y is one of the three configuration items.
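A minimal sketch of such a multi-output least-squares regression, using scikit-learn (which fits one least-squares model per output column). The tiny hand-made dataset, the feature choices and the target ranges are purely illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed training data distilled from historical runs of rule-tuned tasks.
# Features: e.g. input size (GB), past run time (s), 7-day average CPU usage.
X = np.array([[120.0,  900.0, 0.45],
              [ 30.0,  240.0, 0.60],
              [400.0, 3600.0, 0.35],
              [ 10.0,  120.0, 0.70]])
# Three dependent variables per sample: executor cores, executor memory (GB)
# and executor instances -- the configuration the rules recommended.
Y = np.array([[4, 16, 20],
              [2,  8,  5],
              [8, 32, 60],
              [2,  4,  2]])

model = LinearRegression()   # handles multi-output targets natively,
model.fit(X, Y)              # equivalent to one least-squares fit per column

new_task = np.array([[80.0, 700.0, 0.5]])
cores, memory_gb, instances = model.predict(new_task)[0]
print(round(cores), round(memory_gb), round(instances))
```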

The advantage of scheme one is that the rules can be learned quickly at relatively low cost. The drawback is that its optimization ceiling is at best as good as the rules themselves, and going beyond them is difficult.

The second scheme is Bayesian optimization, which is somewhat like reinforcement learning: it tries configurations in the parameter space in search of the optimal one. A Bayesian framework is used because each attempt builds on the previous ones, giving the next attempt prior knowledge and letting it converge on a good region quickly. The whole training process runs in a parameter space: a configuration is sampled, the task is run with it for verification, and metrics such as utilization and cost are then examined to judge whether it is optimal; these steps repeat until tuning completes. After training there is also a shortcut at serving time: if a new task is sufficiently similar to a historical one, there is no need to compute the configuration again, and the previous optimal configuration can be reused directly.
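An illustrative sketch of this search loop, using scikit-optimize's gp_minimize as the Bayesian optimizer. The parameter ranges are assumptions, and the synthetic cost function merely stands in for the real step of submitting the Spark task with the candidate configuration and measuring its run time and cost.

```python
from skopt import gp_minimize
from skopt.space import Integer

# Search space for the three tuned parameters (ranges are illustrative).
space = [Integer(1, 8,   name="executor_cores"),
         Integer(2, 32,  name="executor_memory_gb"),
         Integer(1, 100, name="executor_instances")]

def objective(params):
    cores, memory_gb, instances = params
    # In the real system this step submits the Spark task with the candidate
    # configuration, waits for it to finish, and measures run time and cost.
    # Here a synthetic cost keeps the sketch runnable.
    runtime_penalty = 1000.0 / (cores * instances)        # fewer resources -> slower
    resource_cost = cores * memory_gb * instances * 0.01  # more resources -> pricier
    return runtime_penalty + resource_cost

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best config:", result.x, "estimated cost:", round(result.fun, 2))
```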

After trialing both schemes we saw clear results: for existing tasks modified according to the model's recommended configuration, more than 80% of tasks improved resource utilization by around 15%, and some even doubled it. Both schemes have drawbacks, however: the rule-learning regression model has a lower optimization ceiling, while the globally optimizing Bayesian model is expensive because of the many trials it requires.

The future exploration directions are as follows:

Semantic analysis: Spark semantics are rich, with varied code structures and operator functions that are closely related to parameter configuration and resource consumption. At present we only use the task's historical runs and ignore the Spark semantics themselves, which wastes information. The next step is to go down to the code level, analyze the operator functions a Spark task contains, and tune at a finer granularity accordingly.

Scenario-specific tuning: Spark has many application scenarios, such as pure analysis, development and processing, and the tuning space and objectives differ between them, so tuning needs to be done per category.

Engineering optimization: One practical difficulty is that samples are few and testing is expensive, which requires cooperation with the relevant parties to optimize the engineering and the process.

04

Intelligent selection of SQL task execution engine

The third application scenario is intelligent selection of the execution engine for SQL query tasks.

Background:

(1) The SQL query platform is the big data product that most users touch most often and experience most directly. Data analysts, developers and product managers alike write large amounts of SQL every day to get the data they want.

(2) When running SQL tasks, many people pay no attention to the underlying execution engine. Presto, for example, is based on pure in-memory computation: in simple query scenarios its advantage is speed, but its drawback is that if memory is insufficient the query simply fails. Spark, by contrast, is better suited to complex scenarios with large data volumes: even if an OOM occurs it falls back to disk, so the task does not fail. Different engines therefore suit different task scenarios.

(3) The effectiveness of a SQL query must weigh both execution time and resource consumption: one should neither chase query speed regardless of resources nor sacrifice query efficiency just to save resources.

(4) The industry has three traditional approaches to engine selection: RBO, CBO and HBO. RBO, the rule-based optimizer, is hard to write rules for and is updated infrequently; CBO, cost-based optimization, can over-pursue cost reduction and cause task execution to fail; HBO, an optimizer based on historical runs, is limited by the historical data available.

In the functional design, after a user writes a SQL statement and submits it for execution, the model automatically judges which engine to use and pops up a prompt, and the user makes the final decision on whether to run with the recommended engine.

The overall scheme is to recommend the execution engine from the SQL statement itself: the tables and functions it uses can be read directly from the SQL, and this information determines the complexity of the statement and hence the choice of engine. Training samples come from historically executed SQL statements, with labels derived from how they actually ran: tasks that ran long or processed huge data volumes are labeled as suited to Spark, and the rest as suited to Presto. Sample features are extracted with NLP techniques, using N-grams plus TF-IDF: roughly, phrases are extracted and weighted by their frequency, yielding key phrase groups. The resulting feature vectors are very large, so we first select 3,000 features with a linear model and then train an XGBoost model as the final predictor.
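A minimal sketch of the feature pipeline described above (word n-gram TF-IDF, linear-model feature selection, XGBoost classifier). The example SQL strings, labels, table names and hyperparameters are illustrative assumptions, not the production data or settings.

```python
import xgboost as xgb
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Assumed training data: historical SQL text plus a label derived from how the
# statement actually ran (1 = better suited to Spark, 0 = better suited to Presto).
sql_texts = [
    "select count(*) from ads.dau where dt = '2023-03-01'",
    "insert overwrite table dwd.orders select o.id, u.name from ods.orders o "
    "join ods.users u on o.uid = u.id group by o.id, u.name",
]
labels = [0, 1]

pipeline = Pipeline([
    # Word n-grams (unigrams + bigrams) weighted by TF-IDF.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=50000)),
    # A linear model prunes the sparse vector to the most useful features
    # (the setup described above keeps roughly 3000 of them).
    ("select", SelectFromModel(LogisticRegression(max_iter=1000),
                               max_features=3000,
                               threshold=-float("inf"))),
    # Final engine classifier.
    ("clf", xgb.XGBClassifier(n_estimators=300, max_depth=6,
                              eval_metric="logloss")),
])

pipeline.fit(sql_texts, labels)
print(pipeline.predict(
    ["select user_id, sum(amount) from dws.pay group by user_id"]))
```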

After training, the model's prediction accuracy is fairly high, at around 90% or more.

The online flow of the final model is: after the user submits SQL, the model recommends an execution engine; if it differs from the engine the user originally chose, a language conversion module is called to translate the SQL statement. If execution fails after switching engines, a failover mechanism switches back to the user's original engine, guaranteeing that the task still runs successfully.

The benefit of this practice is that the model automatically selects the most suitable execution engine and completes the subsequent statement translation, with no extra learning required from users.

In addition, the engine recommended by the model keeps the original execution efficiency essentially unchanged while reducing the failure rate, so the overall user experience improves.

Finally, because unnecessary use of the more expensive engine is reduced and the task failure rate falls, overall resource cost consumption drops as well.

From the second to the fourth part we shared three applications of AI algorithms on the big data platform. One visible characteristic is that the algorithms used are not especially complicated, yet the effects are very noticeable. The lesson is to proactively understand the pain points and optimization opportunities in the big data platform's operation; once an application scenario is identified, different machine learning methods can be tried to solve the problem, realizing AI's feedback to big data.

05

Application Prospects of AI Algorithms in Big Data Governance

Finally, let us look ahead to the application scenarios of AI algorithms in big data governance.

The three application scenarios described above focus on the data processing stage. In fact, echoing the relationship between AI and big data described in the first part, AI can play a useful role across the entire data life cycle.

For example, in the data acquisition stage it can judge whether logging is reasonable; during transmission it can do intrusion detection; during processing it can further reduce cost and increase efficiency; during exchange it can help keep data secure; and during destruction it can judge the timing and knock-on effects of deletion. There are many application scenarios for AI on a big data platform, and these are only a starting point. We believe the mutual support between AI and big data will become even more prominent: AI helps big data platforms collect and process data better, and better data quality in turn helps train better AI models, forming a virtuous circle.

06

Question and answer session

Q1: What kind of rule engine is used? Is it open source?

A1: The parameter-tuning rules here were formulated by our big data colleagues from their earlier manual tuning experience, for example: if a task's execution time exceeds so many minutes, or it processes so much data, recommend so many cores or so much memory. It is a rule set accumulated over a long period that worked well after going online, so we used it to train our parameter-recommendation model.

Q2: Is the dependent variable only the adjustment of parameters? Have you considered the influence of the performance instability of the big data platform on the calculation results?

A2: When recommending parameters we do not simply chase low cost, otherwise the recommended resources would be too low and the task would fail. It is true that the dependent variables are only the parameter adjustments, but to guard against instability we add extra constraints. First, for the model features we use averages over a period of time rather than values from a single day. Second, for the parameters the model recommends, we compare them with the actual configured values; if the difference is too large we apply a "slow rise, slow fall" strategy, avoiding task failures caused by adjusting too much in one step.
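A minimal sketch of that "slow rise, slow fall" guard: the recommended value is clipped so that a single adjustment never moves too far from the current configuration. The step-limit ratios and the example values are assumptions for illustration.

```python
def damp_adjustment(current, recommended, max_up_ratio=0.5, max_down_ratio=0.3):
    """Limit how far one adjustment may move from the current setting.

    current / recommended: e.g. executor memory in GB.
    A recommendation far below the current value is applied gradually,
    so a single step cannot starve the task and make it fail.
    """
    upper = current * (1 + max_up_ratio)
    lower = current * (1 - max_down_ratio)
    return min(max(recommended, lower), upper)

# Example: the model suggests dropping memory from 30 GB straight to 5 GB;
# the guard only allows a step down to 21 GB this time.
print(damp_adjustment(current=30, recommended=5))
```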

Q3: Are regression model and Bayesian model used at the same time?

A3: No. As mentioned, we used two schemes for parameter recommendation: the rule-learning scheme uses a regression model, and then a Bayesian optimization framework is used. They are not used at the same time; they were two successive attempts. The advantage of the rule-learning scheme is that it can quickly exploit past experience, while the second scheme can find a better or even optimal configuration building on the first. The two stand in a sequential, progressive relationship rather than being used together.

Q4: Is the introduction of semantic analysis considered from expanding more features?

A4: Yes. As mentioned, the information we currently use for Spark tuning is only the task's execution history; we have not yet looked at the Spark task itself. A Spark job actually contains a lot of information, including its operators and stages, and ignoring its semantics loses a great deal. Our next plan is to analyze the semantics of Spark tasks and expand the feature set to assist parameter computation.

Q5: Will parameter recommendation be unreasonable, which will lead to abnormal or even failed tasks? Then how to reduce abnormal task error and task fluctuation in such a scenario?

A5: If we relied entirely on the model, it might push resource utilization as high as possible, and the recommended parameters could then be aggressive, for example shrinking memory from 30 GB to 5 GB at once. So besides the model's recommendation we add extra constraints, such as a cap on how many GB a single adjustment may change, that is, the slow-rise, slow-fall strategy.

Q6: SIGMOD 2022 had some papers related to parameter tuning. Did you refer to them?

A6: Intelligent task parameter tuning remains a hot research direction, and teams in different fields have adopted different methods and models. Before starting we surveyed many industry approaches, including the SIGMOD 2022 papers you mention. After comparison and experimentation we settled on the two schemes shared today. We will keep following the latest progress in this direction and try more methods to improve the recommendations.

That’s all for today’s sharing. Thank you.


About DataFun:

DataFun focuses on sharing and exchange around applications of big data and artificial intelligence technology. Founded in 2017, it has held more than 100 offline and 100 online salons, forums and summits in Beijing, Shanghai, Shenzhen, Hangzhou and other cities, inviting over 2,000 experts and scholars to share. Its WeChat official account, DataFunTalk, has accumulated over 900 original articles, more than a million reads and 160,000 engaged followers.