Ten Hag: Champions League qualification matters most. Stop Manchester City? I will answer next week.

On May 20th, Beijing time, Manchester United manager Ten Hag was asked many questions about Manchester City at his pre-match press conference.

When asked whether Manchester United can reach the same level as Manchester City, Ten Hag said: "First of all, the most important thing for us is to qualify for the Champions League. At present, I don't think we can reach the level of Manchester City, but my focus is only on the next game, Manchester United's game against Bournemouth."

"We have to win this game to qualify for the Champions League next season. So for us, focusing on the game in front of us, the nearest one, is the most important thing."

Manchester City have the chance to repeat the treble that Manchester United once won, and Manchester United have the chance to stop them in the FA Cup and prevent their city rivals from matching that achievement. Asked about this, Ten Hag said: "At present, we still have three Premier League games, and these three games have not been played yet. You ask me if I would enjoy it? Yes, I would enjoy doing that, but until the Premier League season is over, this matter is not important at all."

"I think we must do what we have been doing all season: make progress every day, make progress in every game, and not look too far ahead. Focusing on the next game is the most important thing."

Manchester City have just beaten Champions League giants Real Madrid 4-0 at the Etihad. Asked whether Manchester City are invincible and whether they can be stopped, Ten Hag replied lightly: "I will give you that answer after next week."

Morris worries that Mudryk will become the "fall guy", and says Mount's profile is close to Götze's.

Chelsea's form has picked up. They beat Leicester City away, Potter eased the pressure on his job, and Havertz was the man of the match. Jody Morris, the former English player, believes that Kai's form naturally improves when he does not play as the center forward, and it was Mudryk who filled the number 9 role in this game. Morris is worried that Mudryk will become the "fall guy". Although the Ukrainian's performance was very good, the center-forward position at this club feels like a "sinkhole", and most excellent strikers struggle to succeed as Chelsea's main striker. Havertz was the "fall guy" in previous games, and Morris is worried that this role will pass to Mudryk.

Chilwell and Kovacic also scored in this game. Many people think that Mount's prospects at Stamford Bridge are fading, and that Boehly is likely to sell him. According to Premier League media, Liverpool are ready to sign Mount, but Chelsea's management does not want to sell the "Prince" to a league rival. Winstanley, the Blues' sporting director, wants Mount to join a foreign club. Morris addressed this topic: Mount's profile is close to Götze's, and if Mount wants to develop further, he should go to Klopp. Klopp raised Götze to Golden Boy level, and Mount also needs the guidance of this manager.

In Morris's view, Mount is a well-rounded player with no standout attribute, much like the young Götze. Why did Götze become a Golden Boy back then? Morris said that Götze's all-round ability in his early twenties was better than that of his peers, because at that age most young players have shortcomings. Götze had no weaknesses, so he was naturally ahead of others in his youth. Whether such players become great depends on how the coach develops and trains them. Morris said that if Götze had stayed with Klopp throughout, the German's career would not have peaked at the winning goal in the World Cup final in Brazil, and the same applies to Mount.

There may be a feng shui problem with the center-forward position at Stamford Bridge, where "whoever takes the role has bad luck". Havertz was in poor form over the previous two months, and he was said to have been put on the club's sale list. The fundamental reason is that he was forced to play as the number 9 at the time. Potter moved Kai to the wing in this game, where he operated mainly in midfield areas, and his performance immediately came to life. Someone still has to fill the center-forward position, so Mudryk has taken over the "fall guy" role, which is what Morris is worried about. Last season, Lukaku's poor form at Stamford Bridge was also related to his being the "fall guy".

Chelsea have had many excellent strikers in the new century. Shevchenko was the leading striker of his era, but unfortunately he could never find his touch at the Blues. Torres had a great time at Liverpool and Atletico Madrid, but his Chelsea career did not meet expectations. Both of these excellent strikers faded after arriving at Stamford Bridge, which is related to the club's "fall guy" problem in the center-forward position. However, someone must fill the number 9 role. Whether or not Boehly replaces Potter, Chelsea cannot do without a focal point in the penalty area. Morris is worried that Mudryk will become the "fall guy" because the Ukrainian is not a natural focal point.

Morris pointed out that Mount had no obvious shortcomings at the beginning of his career, had accumulated rich experience in the youth team, and was ahead of other young players in previous years. As he has grown older, his lack of a distinctive strength has been exposed, and his development has lost direction. Although he is the "Prince" of Chelsea, everyone who knows football understands that if Mount were not a young English player with a genuine academy background at the club, the Abramovich-era management would not have backed him. If Mount joins Liverpool from Chelsea, it will hurt his old club, but it will improve him.

Despite Liverpool's up-and-down results this season, the author thinks that Klopp will not be leaving for the time being, and Liverpool also need to add midfielders. Mount has some playmaking ability, and more importantly, Mount needs good development. Since Mount's profile is close to Götze's, players like "Mumbao" are in great need of an excellent mentor, and Klopp can play this role. Morris also hinted that if Potter's results improve and he continues to coach Chelsea next season, not only Mount but also Havertz will be held back.

Inspur Information falls nearly 4% as turnover of the artificial intelligence ETF (159819) continues to increase.

On March 10, 2023, as of 10:12, the artificial intelligence ETF was down slightly by 0.86% intraday, with turnover reaching 26.82 million yuan and continuing to increase. The ETF closely tracks the CSI artificial intelligence theme index, which fell by 0.62%. Among related constituent stocks, Inspur Information fell by 3.98%, Desai Siwei fell by 3.31%, and Zhongke Chuangda fell by 2.89%; Yihualu rose by 19.75%, Mayer PaCO rose by 2.96%, and Dahua shares rose by 2.8%.

In the news, a few days ago the relevant departments said that emerging industries are the new pillars and new tracks that will lead future development. They will focus on key areas such as 5G, artificial intelligence, bio-manufacturing, the industrial Internet, intelligent connected vehicles, and green and low-carbon development, and will continuously enrich and expand new application scenarios. They will also expand the layout of national manufacturing innovation centers in emerging industries, and implement the "Robot+" application action to promote the large-scale and intensive development of the Internet of Things industry.

According to TF Securities' analysis, the applications of artificial intelligence cover computer vision, speech recognition, natural language processing and data science. After deep learning was put forward in 2006, the AI industry entered a stage of rapid development. At present, it is widely used in security, finance, medical care, smart cars and other fields to improve productivity. Artificial intelligence expenditure has become one of the main forces supporting enterprises' spending on digital transformation, and the market has grown rapidly. According to Sullivan's report, global expenditure on artificial intelligence technology was 68.7 billion US dollars in 2020 and is expected to reach 221.2 billion US dollars in 2025, a compound annual growth rate of 26.3%, while the compound annual growth rate of China's artificial intelligence market is expected to reach 41.5%, the highest in the world.
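As a quick check of the growth figure quoted above, the compound annual growth rate can be recomputed from the two endpoint values. This is a minimal sketch: the function name `cagr` is illustrative, and it assumes the rate is measured over the five years from 2020 to 2025.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the report: USD 68.7 billion (2020), USD 221.2 billion (2025).
global_rate = cagr(68.7, 221.2, 2025 - 2020)
print(f"Implied global CAGR: {global_rate:.1%}")
```

The implied rate rounds to the 26.3% cited in the report, so the two endpoint figures and the growth rate are consistent with each other.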

IDC predicts that the global AI system expenditure will reach $154 billion in 2023.

The latest forecast in IDC's "Global Artificial Intelligence Expenditure Guide" is that global AI expenditure, including spending on software, hardware and services for AI-centric systems, will reach US$154 billion in 2023, an increase of 26.9% over 2022, with an estimated compound annual growth rate (CAGR) of 27.0% from 2022 to 2026. By 2026, total expenditure on AI-centric systems will exceed $300 billion.
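The 2026 figure can be sanity-checked from the numbers in the forecast. The sketch below backs out the implied 2022 base from the 2023 value and its year-over-year growth, then compounds it at the stated CAGR. The function name `project` and the assumption that the CAGR applies to the 2022 base over four years are illustrative, not taken from IDC.

```python
def project(base: float, annual_rate: float, years: int) -> float:
    """Compound `base` forward by `annual_rate` for `years` years."""
    return base * (1 + annual_rate) ** years


spend_2023 = 154.0                      # USD billions, from the forecast
base_2022 = spend_2023 / (1 + 0.269)    # implied by the 26.9% increase over 2022
spend_2026 = project(base_2022, 0.270, 2026 - 2022)

print(f"Implied 2022 spending:   {base_2022:.1f} billion USD")
print(f"Projected 2026 spending: {spend_2026:.1f} billion USD")
```

Under these assumptions the projection lands a little above US$300 billion, which agrees with the forecast's statement that spending will exceed that level by 2026.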

Mike Glenn, senior market research analyst of IDC's customer insight and analysis team, said, "Regardless of size, enterprises that fail to adopt AI technology quickly will gradually fall behind in the market. AI can make great contributions to enhancing human capabilities, automating repetitive tasks, providing personalized advice, and making data-driven decisions quickly and accurately. AI technology suppliers need to accurately predict and seize business opportunities, which requires data support. IDC's AI spending guide comprehensively covers go-to-market strategies for the various AI opportunities, which can provide a basis for messaging and point out the market focus that suits the company's capabilities."

In the next five years, only one of the 36 AI use cases identified by IDC will have a compound annual growth rate of less than 24% during the forecast period.

In terms of expenditure, the three largest AI use cases all focus on sales and customer service functions: augmented customer service agents, sales process recommendation and augmentation, and program advisors and recommendation systems. These three use cases attract strong investment across all industries and will account for more than a quarter of all AI expenditure in 2023. Other high-spending use cases this year support more diverse operational tasks, including IT optimization, augmented threat intelligence and prevention systems, and fraud analysis and investigation.

During the forecast period, the two industries that invest the most in AI are banking and retail. The next largest is professional services, followed by discrete manufacturing and process manufacturing. This year, these five industries will account for more than half of all spending on AI-centric systems. The fastest growth in AI expenditure comes from the media industry, with a five-year compound growth rate of 30.2%. As with use-case spending, only two industries are predicted to have a compound annual growth rate of AI expenditure below 25%.

Xueqing Zhang, senior market analyst of IDC China's Enterprise Research Department, believes that "AI technology will continue to empower users and industries. With the support of large pre-trained models and multi-modal technology, AI capabilities will be applied across the whole production process on a large scale. In the future, whether it is an urban problem at the government level or an everyday problem that touches everyone, each will go through the process of AI technology moving from concept to application, and will enjoy the wave of dividends brought by AI."

From a regional perspective, the United States will be the largest market for AI-centric systems, accounting for more than 50% of total global AI expenditure throughout the forecast period. Western Europe will account for more than 20% of global AI expenditure, with a five-year compound growth rate of 30.0%, the fastest growth in the forecast period. China is the third largest AI market, with a compound annual growth rate of 20.6%.

An incredible Feifan experience: a new luxury smart car built on artificial intelligence technology.

When I first learned about Feifan Automobile, I was attracted by it. I must say, this car is really unbelievable.

Let's start with the design of the new luxury smart car. The exterior looks very fashionable and high-end, with many high-tech elements. What surprised me most was the interior design. The new luxury smart car adopts artificial intelligence technology, which makes the various in-car operations more intelligent. For example, the in-car entertainment system can not only play music and videos, but also connect with your mobile phone to track your health and schedule. In this way, you can handle all kinds of affairs while on the road. In an increasingly fast-paced society, such a function will certainly attract a lot of attention.

In addition, the new luxury smart car has a very cool feature: the seats can rotate freely. You can choose to sit facing forward or backward, or facing left or right. This design is well suited to social occasions or to times when you want to talk with the other people in the car. I have never seen a design like this before. It is simply amazing.

Of course, as a high-end luxury car, the new luxury smart car also handles excellently. Its four-wheel drive system helps you drive safely in complex road conditions. In addition, it is equipped with an advanced electric motor, which makes its performance equally outstanding.

Overall, Feifan's new luxury smart car is, for me, an excellent and forward-looking car. Its design embodies the application of artificial intelligence technology in the automobile industry, and its intelligent in-car features are eye-catching. Its performance is also excellent. As Feifan Automobile's slogan "Let intelligence connect life" suggests, the arrival of the new luxury smart car may mark the beginning of an intelligent life based on artificial intelligence technology.

Metaverse: Quantum Entanglement between the Virtual World and the Real World

What is the metaverse? This question is a bit difficult to answer. Some say that it is the third-generation Internet, namely Web 3.0, a decentralized Internet running on blockchain technology. Some say it is a virtual world constructed with VR/AR and other technologies. The definition I once gave was a mirror image of the real world, that is, a digital real world built by artificial intelligence.

There are also three different attitudes towards the metaverse.

The first is a positive attitude. Its holders include Zuckerberg, who directly renamed Facebook as Meta, after the metaverse. China's attitude is positive from top to bottom. Shijingshan District of Beijing has made a specific plan to build a "metaverse city". The construction of "digital twin cities" is actually a concrete application of metaverse technology.

The second is a negative attitude. Its holders include Microsoft co-founder Bill Gates. He thinks the metaverse is useless. Compared with the metaverse, he values artificial intelligence more.

The third is a wait-and-see attitude. It is found mainly in countries or regions with insufficient capacity for scientific and technological innovation, which simply watch how events unfold.

On this question, I am relatively positive, because the emergence of any new technology is the combined result of objective and subjective factors, and of technical and natural ones. The emergence of the metaverse is not groundless. It is the result of the wide application and full accumulation of new technologies such as artificial intelligence, blockchain, AR/VR, ultra-large-scale computing, 5G/6G communication, precise satellite navigation and positioning, and digital maps, which can now be integrated and built upon. Since it has appeared, we should give it a place and provide it with enough application scenarios, so that it can play a positive role in promoting the development of social productive forces.

The metaverse is viable not because it exists in isolation, like a building-block model in a glass box, but because of the internal connection between the metaverse and the real world. The metaverse cannot exist apart from the real world; the virtual world and the real world form a unity of opposites. In this sense, it is more like quantum entanglement between the virtual world and the real world. When the real world moves, the virtual world must move with it; conversely, when the virtual world moves, the real world must also act accordingly.

For human beings, the metaverse has at least three functions. The first is the recording function. It is like an archive of the real world: everything in the real world can be completely and dynamically recorded in simulated form through the metaverse, and will last forever in the digital world. The second is the communication function. As a new generation of the Internet, the metaverse can establish a communication network between people that spans time and space through virtual humans and VR technology, so that real people can communicate face to face through digital people in the virtual world. The third is the innovation function. People's ideas can be sketched, designed and virtually constructed in the metaverse, and then brought back to the real world once the results are satisfactory, where the construction is completed, thus transforming and upgrading the real world.

Therefore, I don't agree with Bill Gates. Artificial intelligence technology is very good, but it also needs good application scenarios that give it a platform on which to play its role. The metaverse is actually an important platform for artificial intelligence to play its role.

It should be pointed out that applying the metaverse requires sufficient conditions, especially 5G/6G communication. Without the foundation of a new generation of intelligent communication networks with high throughput and low latency, the metaverse simply cannot operate. I concluded long ago that Zuckerberg's metaverse venture would fail, because both the United States and Europe lack such a foundation. It can be said that today, only China has sufficient conditions to pursue the metaverse. Mentougou, Beijing, is bound to succeed in building a "metaverse city".

It can be concluded that in the near future, China will become the "high ground of the metaverse".

5-3! Bayern take revenge as three full-backs score and two players hit braces; Paris have no grounds for complaint.

On the evening of March 11th, the 24th round of the Bundesliga continued, with Bayern hosting Augsburg at the Allianz Arena. In the first half, Berisha opened the scoring after Pavard's mistake, Cancelo and Pavard then scored twice in less than four minutes, and Pavard's second goal and Sane's follow-up header extended the lead. Berisha also completed his brace in the second half, Alphonso Davies' poked finish sealed the victory, Vargas pulled one back in injury time, and Bayern beat Augsburg 5-3.

Two days ago, in the second leg of the Champions League round of 16, Bayern comfortably beat Paris 2-0 without giving Mbappé or Messi many chances. However, back in the league, Bayern remain under threat, with teams such as Dortmund and Union Berlin chasing them. In addition, Bayern lost to Augsburg in the reverse fixture.

In this game, Choupo-Moting was absent because of a back injury. Mane returned to the starting lineup as center forward, with Musiala, Sane, Gnabry and Alphonso Davies lined up behind him and Kimmich as the lone holding midfielder. Cancelo, Upamecano, De Ligt and Pavard formed the defence, and the goalkeeper was again Sommer.

Only 3 minutes in, Bayern made a defensive error: Pavard's misjudged header inadvertently set up Berisha, who got past Cancelo in the penalty area and slotted home with his right foot for 0-1! The visitors had seized the initiative.

Bayern are one of the most prolific teams this season, and daring to score first against them can invite a heavy defeat. Sure enough, in the 15th minute Bayern built patiently in the attacking third. Alphonso Davies moved the ball on from the left, and Sane passed it to the right. Cancelo received the ball in the penalty area, feinted past the defender with his left foot, and finished into the far corner with his right to make it 1-1! It was Cancelo's first goal for Bayern, and he also has three assists in the past eight games.

In the 19th minute, Kimmich delivered a set piece from the right into the penalty area, De Ligt's header across goal was half-cleared, Mane hooked the ball back across the area, and Pavard pounced to make it 2-1! Pavard had made amends for his earlier error. In just four minutes, Bayern had turned the score around.

In the 35th minute, Kimmich took a corner and De Ligt's header was blocked. In a repeat of the earlier goal, Pavard volleyed in the rebound to make it 3-1! So far this season, Pavard has scored five goals for Bayern in all competitions, a single-season record for the French defender.

In the 39th minute, Mane turned and volleyed in the penalty area, and Gikiewicz flew across to block the ball with one hand. In the 44th minute, after Bayern won the ball in the attacking third, Sane played a through ball down the left, and Mane opened up the angle but saw his shot saved. Sane followed up in front of goal to head in the rebound for 4-1! It was Sane's seventh league goal and his first since the World Cup.

In the 50th minute, shortly after the restart, Bayern won the ball in the center circle, Gnabry played a through ball, Sane shot first time and Mane followed up to score, but the goal was ruled out for offside. In the 53rd minute, Mane tried a long shot from the top of the arc, and the ball went slightly over. It was clear that Mane badly wanted to get back on the scoresheet after his return. In the 58th minute, Arne Maier's long-range shot was saved by Sommer, then De Ligt committed a foul in the penalty area and was punished with both a yellow card and a penalty. Berisha took the penalty and sent Sommer the wrong way to make it 4-2!

In the 74th minute, Bayern struck again. After Cancelo advanced with the ball on the right, he crossed to the far post with the outside of his foot, and Alphonso Davies arrived to poke it in for 5-2! Bayern restored the three-goal lead, and it was also Alphonso Davies' first goal this season. With the result settled, Nagelsmann sent on substitutes to get some match rhythm, and the starters Gnabry, Mane, Musiala and De Ligt all went off to rest. In injury time, Vargas slid in to score and narrow the gap.

In the end, Bayern beat Augsburg 5-3 at home, avenged their defeat in the reverse fixture and stayed top of the Bundesliga standings. Including this game, Bayern have eight wins and one loss in their last nine games. Given Bayern's red-hot form, we can only say that Paris have no grounds for complaint about losing.

Wanda sells underwear for a limited time and her ex-husband Icardi likes the post. Netizens' comments: Get remarried soon.

On March 12th, Beijing time, Argentine celebrity Wanda Nara once again publicly promoted her underwear line on her social media account, with the sale running for a limited time. Just a few seconds after she posted, her ex-husband Icardi liked the post. The two have been on and off for more than half a year. Most of the netizens' comments have nothing to do with buying underwear, but instead urge the two to remarry as soon as possible.

Last November, Argentine social media celebrity Wanda divorced the former international Icardi. As it was during the World Cup, no one paid attention to their news. However, their divorce story is just like a TV series, episode after episode, season after season, and it never stops. A week ago, some media reported that the two were living together again. Icardi publicly stated that the two had made up, but Wanda did not give a clear statement.

Today, Wanda suddenly posted a message saying that over the next few days fans can buy underwear from her Wanda intimates line. The year before last, she twice posted content selling swimwear and was banned by the social media platform. But now that platform has competitors, and it does not want to lose Wanda. Wanda has 16 million followers, ranking high among the WAGs of players in Europe, behind only Georgina and a few others.

Marca said Wanda has a group of loyal fans, including Icardi. After seeing the news of her underwear products, the Argentine striker liked and shared the post at lightning speed. This shows the unusual relationship between the two, who divorced only a few months ago. It is reported that Wanda has now returned to Argentina, while Icardi has returned to Turkey to wait for the league to resume. However, the two give the impression of being divorced without really being apart.

It is reported that Icardi's income is very high, but most of it has to be handed over to Wanda. In fact, Wanda doesn't need the money. She has her own companies, and this underwear is one of their products. These companies belong 100% to Wanda personally and have nothing to do with Icardi. The swimsuit line launched earlier was expensive. However, thanks to her huge number of fans, Wanda's underwear company has received numerous orders.

Wanda's fans have said that she should get back together with Icardi. The two separated for a while, but they soon made up. Turkish media have reported that Icardi spent a lot of money to win Wanda back, including buying limited-edition bags from Wanda's favorite brand.

At the same time, many media outlets predict that Icardi, who is on loan at Galatasaray this season, is likely to return to Serie A. His next club may be Inter Milan, a team in Rome, or another side. One of the reasons is that Wanda's focus will shift to Italy in the future. Wanda has her own company in Italy and often appears on TV programs there. Icardi has also previously owned two properties and a farm in Milan.


91st-minute winner, 1-0! The national team fights its way to a miracle, advances to the World Youth Championship, and celebrates like champions.

The U20 Asian Cup in Uzbekistan has entered the quarter-finals, and the first match was between the Iranian and Iraqi men's teams. In the end, thanks to a goal in the 91st minute, Iraq eliminated Iran 1-0, becoming the first team to advance to the semi-finals and at the same time securing qualification for the World Youth Championship. China's men's team will play the Korean men's team in the quarter-finals on the next match day, and the winner will also advance to the World Youth Championship.

In the history of the Asian Youth Championship, Iraq has taken part 17 times and won the title five times, most recently in 2000. Iran has also won the cup four times, but all of those titles came in the last century.

In the group stage, Iran took 6 points from 2 wins and 1 loss, finishing first in the group ahead of Australia and Vietnam on goal difference. Iraq took 4 points from 1 win, 1 draw and 1 loss, edging out Indonesia for second place on head-to-head record. Under AFC regulations, the top four teams in this tournament qualify for the 2023 World Youth Championship, so whoever won this game would secure a ticket to the World Youth Championship.

This was a close contest. Iran and Iraq are fierce rivals at both senior and youth level, and now they were meeting in the Asian Youth Championship. Neither team scored in the first 90 minutes, and the game was almost evenly balanced.

In the 91st minute, Ali Jassim broke into the penalty area with the ball and scored with a low shot into the far corner, sealing a last-gasp 1-0 win for Iraq. After the goal, the players piled on one another in wild celebration, roaring to the sky in excitement. After the game, the whole team celebrated as if they had won the championship.

The match between China and South Korea will kick off at 18:00 tomorrow, and the winner will also advance to the finals of the 2023 World Youth Championship.

According to statistics, the two teams have met 18 times in history. China's men's team has a record of 3 wins, 2 draws and 13 losses, which is a clear disadvantage. It has also gone without a win in the last 8 meetings and has not beaten the Korean men's team for 18 consecutive years. In addition, the Korean men's team has won the cup 12 times, making it the most successful team in the history of the Asian Youth Championship, so pulling off an upset will be very difficult.

Nevertheless, China's players are not afraid of this opponent. It is reported that after China qualified from the group, the players celebrated with the fans at the stadium. The fans shouted loudly from the stands: "The next game is against South Korea", and some players responded: "Take down South Korea! We are going to the World Youth Championship!"

It is certainly a good thing to have confidence, and it gives fans more to look forward to. The last time China's men's team took part in the World Youth Championship was in 2005, 18 years ago.

Application of AI Algorithm in Big Data Governance

Introduction: This article shares Datacake's experience of applying AI algorithms to big data governance. It is divided into five parts. The first part clarifies the relationship between big data and AI: big data can not only serve AI, but can also use AI to optimize its own services, so the two support and depend on each other. The second part introduces the practice of using an AI model to comprehensively evaluate the health of big data tasks, which provides a quantitative basis for subsequent data governance. The third part introduces the practice of using an AI model to intelligently recommend configuration parameters for running Spark tasks, with the goal of improving the utilization of cloud resources. The fourth part introduces the practice of using a model to intelligently recommend the task execution engine in SQL query scenarios. The fifth part looks ahead to the application scenarios of AI across the whole life cycle of big data.

Table of contents:

1. Big data and AI

2. Health assessment of big data tasks

3. Spark task intelligent parameter adjustment

4. Intelligent selection of SQL task execution engine

5. The application prospect of AI algorithm in big data governance

Speaker | Li Weimin, algorithm engineer at Happy Eggplant

Editor | Charles

Production community | DataFun

01

Big data and AI

It is generally believed that cloud computing collects and stores massive amounts of data, thus forming big data, and that AI models are then built by mining and learning from that big data. This view takes for granted that big data serves AI, but ignores the fact that AI algorithms can also feed back into big data. The relationship between them is two-way, with each supporting and depending on the other.

The whole life cycle of big data can be divided into six stages, and each stage faces some problems. Proper use of AI algorithm is helpful to solve these problems.

Data acquisition: This stage is mostly concerned with the quality, frequency and security of data collection, for example whether the collected data is complete, whether data is collected too fast or too slowly, and whether the collected data has been desensitized or encrypted. AI can help here, for example by evaluating whether log collection is reasonable based on similar applications, and by using anomaly detection algorithms to find sudden increases or decreases in data volume.

Data transmission: This stage is mostly concerned with the availability, integrity and security of data. AI algorithms can be used for fault diagnosis and intrusion detection.

Data storage: This stage is mostly concerned with whether the storage structure of the data is reasonable, whether resource usage is low enough, and whether the data is safe enough. AI algorithms can also be used here for evaluation and optimization.

Data processing: This is the stage where optimization pays off most visibly. Its core problem is to improve the efficiency of data processing and reduce resource consumption, and AI can help optimize from several angles.

Data exchange: Enterprises cooperate more and more, which raises data security concerns. Algorithms can be applied here as well. For example, federated learning, which is popular at present, can help share data better and more safely.

Data destruction: Data cannot simply be kept forever and never deleted, so we need to consider when data can be deleted and whether deleting it is risky. On top of business rules, AI algorithms can help judge when to delete data and what the associated impact would be.

Overall, data lifecycle management has three goals: high efficiency, low cost, and security. In the past, we relied on expert experience to formulate rules and strategies, an approach with obvious drawbacks: high cost and low efficiency. Proper use of AI algorithms can avoid these drawbacks and feed back into the construction of basic big data services.
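As one illustration of the anomaly detection idea mentioned for the data acquisition stage, the sketch below flags days whose log volume deviates sharply from the preceding week. It is a minimal example and not the method used in the talk: the function name `volume_anomalies`, the 7-day window, the z-score rule and the threshold of 3 are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def volume_anomalies(daily_counts, window=7, threshold=3.0):
    """Return indexes of days whose volume is far from the previous `window` days.

    A day is flagged when its z-score against the trailing window exceeds
    `threshold`. Window size and threshold are illustrative assumptions.
    """
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual.
            if daily_counts[i] != mu:
                flagged.append(i)
            continue
        if abs(daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily log counts with one sudden spike at index 8.
counts = [100, 102, 98, 101, 99, 100, 103, 101, 300, 100]
spikes = volume_anomalies(counts)
```

A trailing window keeps the check cheap and lets the baseline follow slow drift, at the price of a spike temporarily inflating the baseline for the days that follow it.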

02

Health Assessment of Big Data Tasks

At Eggplant Technology, several application scenarios are already in production. The first is the health assessment of big data tasks.

On the big data platform, thousands of tasks run every day. However, many tasks stop at producing correct output, with no attention paid to their running time or resource consumption, which leaves many tasks inefficient and wasteful.

Even when data developers do begin to pay attention to task health, it is hard to evaluate accurately whether a task is healthy. Many indicators are related to a task, such as failure rate, running time and resource consumption, and tasks differ naturally in complexity and in the volume of data they process, so simply taking the absolute value of one indicator as the standard is clearly unreasonable.

Without a quantified measure of task health, it is hard to determine which tasks are unhealthy and need governance, let alone where the problem lies and where to start. Even after governance, we do not know how effective it was, and some indicators may improve while others get worse.

Requirement: Faced with these problems, we urgently need a quantitative index that accurately reflects the overall health of a task. Writing rules by hand is inefficient and incomplete, so we turned to machine learning models. The goal is for the model to give a quantitative score for each task and its position in the global distribution, and to point out the task’s main problems and possible solutions.

To meet this requirement, the functional module displays the key information of all tasks under an owner’s name in the management interface, such as score, task cost, CPU utilization and memory utilization. The health of each task is then clear at a glance, which makes later task governance easier for the owner.

Next, the scoring function is modeled as a classification problem. Intuitively, task scoring is a regression problem whose output should be any real number between 0 and 100. But regression requires enough samples labeled with scores, and manual score labeling is costly and unreliable.

Therefore, we transform the problem into a classification problem, and the class probability given by the classification model can then be mapped to a real-valued score. We divide tasks into two classes, good tasks (1) and bad tasks (0), labeled by big data engineers. A good task is usually one that takes less time and consumes fewer resources for the same workload and complexity.

The model training process is as follows:

The first step is sample preparation. Our samples come from historical task data, and the sample features include running time, resources used, whether execution failed, and so on. The sample labels, good or bad, are assigned by big data engineers according to rules or experience. Then we can train the model. We tried LR, GBDT, XGBoost and other models, and both theory and practice indicated that XGBoost classifies better. The model ultimately outputs the probability that a task is a "good task". The higher the probability, the higher the task score it is mapped to.
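The idea of training a binary classifier on labeled tasks and reading off the probability of "good" can be sketched in a few lines. To stay dependency-free, this sketch uses a plain logistic regression trained by gradient descent as a stand-in (assumed here to be what "LR" refers to), not the XGBoost model the talk settled on. The two features, the sample values, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, learning_rate=0.5, epochs=2000):
    """Fit a logistic regression with batch gradient descent (stand-in classifier)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def probability_good(model, x):
    """Probability that a task with features x is a 'good task'."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Hypothetical samples: (failure_rate, cpu_utilization); label 1 = good, 0 = bad.
samples = [(0.00, 0.80), (0.05, 0.70), (0.02, 0.90),
           (0.40, 0.20), (0.30, 0.10), (0.50, 0.30)]
labels = [1, 1, 1, 0, 0, 0]

model = train_logistic(samples, labels)
p_efficient = probability_good(model, (0.01, 0.85))  # rarely fails, well utilized
p_wasteful = probability_good(model, (0.45, 0.15))   # fails often, poorly utilized
```

Only binary labels are needed for training, yet the output is continuous, which is the property that makes the classification framing attractive when scored samples are unavailable.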

After training, 19 features were selected from the nearly 50 original features, and these are largely enough to determine whether a task is a good task. For example, tasks that fail often and tasks with low resource utilization mostly do not score highly, which is broadly consistent with human judgment.

After scoring tasks with the model, tasks scoring from 0 to 30 are unhealthy and need governance urgently; tasks scoring between 30 and 60 have acceptable health; and tasks scoring 60 or above are in good health and only need to maintain the status quo. With this quantitative indicator, task owners can be guided to govern their tasks proactively, which reduces cost and increases efficiency.
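The mapping from probability to score, the three health bands, and a task’s position in the global distribution can be sketched as follows. The talk does not state how probabilities are mapped to scores, so the linear mapping (probability times 100), the handling of the exact boundaries at 30 and 60, the percentile definition, and all function names are assumptions.

```python
def health_score(p_good):
    """Map the classifier's probability of 'good task' to a 0-100 score.

    A linear mapping is assumed here; the source does not specify one.
    """
    if not 0.0 <= p_good <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(100 * p_good, 1)


def health_band(score):
    """Bucket a score using the 30 and 60 thresholds described in the text."""
    if score < 30:
        return "unhealthy"   # needs governance urgently
    if score < 60:
        return "acceptable"
    return "healthy"         # maintain the status quo


def percentile_rank(score, all_scores):
    """Percentage of tasks whose score is at or below the given score."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100.0 * at_or_below / len(all_scores)


# Hypothetical probabilities for four tasks.
scores = [health_score(p) for p in (0.12, 0.45, 0.80, 0.95)]
bands = [health_band(s) for s in scores]
rank_of_second = percentile_rank(scores[1], scores)
```

Reporting the band alongside the percentile lets an owner see both whether a task needs attention in absolute terms and how it compares with everything else on the platform.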

Applying the model brought us the following benefits:

(1) Task owners can see the health of the tasks under their name, and can tell from scores and rankings whether a task needs governance;

(2) Quantitative indicators provide a basis for follow-up task governance;

(3) The gains and improvement achieved once task governance is complete can also be shown quantitatively through the score.

03

Spark task intelligent parameter adjustment

The second application scenario is intelligent parameter tuning for Spark tasks. A Gartner survey found that 70% of the cloud resources consumed by cloud users are unnecessarily wasted. When applying for cloud resources, many people request more than they need in order to make sure the task succeeds, which causes unnecessary waste. Many others simply use the default configuration when creating a task, which is usually not the optimal one. A carefully chosen configuration can work very well: it keeps the task efficient, ensures that it succeeds, and saves a lot of resources at the same time. However, configuring task parameters demands a lot from users. Besides understanding what each configuration item means, they also need to consider how configuration items interact. Even with expert experience it is hard to reach the optimum, and rule-based strategies are hard to adjust dynamically.

This leads to a requirement: the model should intelligently recommend the optimal parameter configuration for a task, improving the utilization of the task’s cloud resources while keeping its original running time unchanged.

For the parameter tuning module, our design covers two cases. In the first, the model should recommend the most suitable configuration parameters based on the task’s historical runs. In the second, for tasks that have not yet gone online, the model should give a reasonable configuration by analyzing the task itself.

The next step is to train the model. First we must determine the model’s output targets. There are more than 300 configurable items, and the model cannot reasonably produce all of them. After testing and investigation, we chose the three parameters with the greatest influence on task performance: the number of executor cores (cores), the total memory (memory), and the number of executor instances (instances). Each configuration item has a default value and an adjustable range. In effect this defines a parameter space, and the model only needs to find the optimal solution within it.

In the training stage we pursued two schemes. Scheme one is to learn from empirical rules. In the early stage, parameters were recommended by rules, and the results after going online were good, so we let the model learn this set of rules first in order to get online quickly. The training samples are more than 70,000 task configurations previously computed from the rules. The sample features are the task’s historical run data (such as the amount of data processed, the resources used, and the time consumed) plus some statistics (such as the average and maximum consumption over the past seven days).
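The statistical features mentioned above, the average and maximum consumption over the past seven days, are simple to derive from a task’s run history. The sketch below is illustrative only: the function name, the choice of memory as the measured quantity, and the feature names are assumptions.

```python
from statistics import mean


def recent_usage_features(daily_memory_gb, days=7):
    """Summarize the most recent `days` of memory usage as model features."""
    recent = daily_memory_gb[-days:]
    return {
        "avg_memory_gb": mean(recent),
        "max_memory_gb": max(recent),
    }


# Hypothetical peak memory per daily run, oldest first (eight days of history).
history = [50, 10, 12, 11, 13, 12, 14, 12]
features = recent_usage_features(history)
```

Using a window average rather than a single day’s value makes the features less sensitive to one unusual run.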

As the base model we chose a multiple regression model with multiple dependent variables. A common regression model has a single output: many independent variables but only one dependent variable. Here we want to output three parameters, so we adopt a multiple regression model with multiple dependent variables. In essence it is still an LR model.

In the theoretical formulation of this model, the left-hand side is the multi-label output, that is, the three configuration items; β holds the coefficient of each feature and σ is the error. Training works the same way as for single-output regression: the least squares method is used to estimate β by minimizing the sum of squares of all elements in σ.
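A multi-output least-squares fit of this kind can be sketched without any libraries: because the squared error sums over all outputs, each output column can be solved independently against the same feature matrix. This is a minimal illustration, not the production model. The two features (data size and running time), the synthetic "rule" that generates the targets, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multi_output(xs, ys):
    """Least-squares fit with one coefficient vector per output column."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1.0 is the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    coefs = []
    for out in range(len(ys[0])):
        xty = [sum(r[i] * y[out] for r, y in zip(rows, ys)) for i in range(k)]
        coefs.append(solve(xtx, xty))  # normal equations for this output
    return coefs


def predict(coefs, x):
    row = [1.0] + list(x)
    return [sum(c * v for c, v in zip(beta, row)) for beta in coefs]


# Hypothetical features: (data_gb, runtime_min). Targets: (cores, memory_gb,
# instances) produced by a made-up linear rule so the fit can be checked.
task_features = [(10, 5), (20, 8), (40, 30), (80, 20), (160, 60)]
rule_configs = [
    (2 + 0.01 * d, 4 + 0.1 * d + 0.05 * t, 1 + 0.05 * d + 0.1 * t)
    for d, t in task_features
]
coefs = fit_multi_output(task_features, rule_configs)
recommended = predict(coefs, (100, 40))
```

Since the targets here come from an exactly linear rule, the fitted model reproduces that rule, which mirrors the point made in the text that a model trained on rule outputs can match the rules but not easily exceed them.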

The advantage of scheme one is that the model can learn the rules quickly at a relatively small cost. Its drawback is the ceiling on optimization: at best it performs as well as the rules, and it is hard for it to do better.

The second scheme is Bayesian optimization, which is similar in spirit to reinforcement learning: it searches the parameter space for the optimal configuration by trial. A Bayesian framework is used because each attempt builds on the results of the previous ones, so the next attempt starts with some prior knowledge and can home in on a better region quickly. The whole training process takes place within a parameter space: a configuration is sampled at random and run for verification; after the run, indicators such as utilization and cost are checked to judge whether the configuration is optimal; these steps are then repeated until tuning is complete. Once the model is trained, there is also a shortcut at serving time: if a new task is sufficiently similar to a historical task, there is no need to compute a configuration again, and the previous optimal configuration can be reused directly.
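The reuse shortcut for similar tasks can be sketched as a nearest-neighbor lookup over previously tuned tasks. This covers only the lookup, not the Bayesian optimization itself. The talk does not say how similarity is measured, so the relative-distance measure, the 10% tolerance, the feature pair and the function names below are all assumptions.

```python
def relative_distance(a, b):
    """Largest relative difference between two feature tuples."""
    return max(
        abs(x - y) / max(abs(x), abs(y), 1e-9)
        for x, y in zip(a, b)
    )


def reuse_known_config(new_features, tuned_history, max_distance=0.1):
    """Return the config of the closest tuned task, or None if none is close enough."""
    best_config = None
    best_distance = max_distance
    for features, config in tuned_history:
        distance = relative_distance(new_features, features)
        if distance <= best_distance:
            best_config, best_distance = config, distance
    return best_config


# Hypothetical tuned tasks: (input_gb, runtime_min) -> best configuration found.
tuned_history = [
    ((100, 30), {"cores": 2, "memory_gb": 8, "instances": 10}),
    ((500, 120), {"cores": 4, "memory_gb": 16, "instances": 40}),
]

reused = reuse_known_config((105, 31), tuned_history)     # close to the first task
not_reused = reuse_known_config((250, 60), tuned_history)  # too far from both
```

When the lookup returns `None`, the task would go through the full tuning loop, so the shortcut only saves trial runs and never replaces the search.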

After trialing both schemes in practice, we saw clear results. For existing tasks, after applying the configuration parameters recommended by the model, more than 80% of tasks improved their resource utilization by about 15%, and some tasks even doubled it. However, both schemes have defects: the regression model that learns rules has a low optimization ceiling, while the Bayesian optimization model that searches for the global optimum is expensive because it requires many trial runs.

The future exploration directions are as follows:

Semantic analysis: Spark semantics are rich, including different code structures and operator functions, and these are closely related to task parameter configuration and resource consumption. At present we use only the historical runs of a task and ignore the Spark semantics themselves, which wastes information. The next step is to go down to the code level, analyze the operator functions a Spark task contains, and tune at a finer granularity accordingly.

Classification tuning: Spark has many application scenarios, such as pure analysis, development and processing. The tuning space and objectives differ between scenarios, so tuning should be done per category.

Engineering optimization: One difficulty encountered in practice is that samples are few and testing is costly, which requires the relevant parties to work together to optimize the engineering or the process.

04

Intelligent selection of SQL task execution engine

The third application scenario is the intelligent choice of SQL query task execution engine.

Background:

(1) The SQL query platform is the big data product that most users touch most often and whose experience they notice most. Data analysts, R&D staff and product managers alike write a lot of SQL every day to get the data they want;

(2) Many people pay no attention to the underlying execution engine when they run SQL tasks. For example, Presto computes purely in memory. In simple query scenarios its advantage is faster execution, but its disadvantage is that if memory is insufficient the query fails outright. In contrast, Spark is better suited to complex scenarios with large data volumes: even if an OOM occurs, it falls back to disk storage, which avoids task failure. Different engines therefore suit different task scenarios.

(3) The effect of a SQL query should be judged by both the execution time of the task and its resource consumption. We should neither chase query speed regardless of resource consumption nor hurt query efficiency in order to save resources.

(4) The industry has three traditional engine selection methods: RBO, CBO and HBO. RBO is a rule-based optimizer; its rules are hard to formulate and are updated infrequently. CBO is a cost-based optimizer; pursuing cost optimization too aggressively may cause task execution to fail. HBO is an optimizer based on historical task runs; it is limited to historical data.

In the design of the function module, after the user writes the SQL statement and submits it for execution, the model will automatically judge which engine to use and pop up a window to prompt, and the user will finally decide whether to use the recommended engine for execution.

The overall scheme of the model is to recommend the execution engine based on the SQL statement itself. The tables and functions used are visible in the SQL, and this information directly determines the complexity of the query and therefore affects the choice of execution engine. The training samples are SQL statements run in the past, and the labels are assigned according to how those runs went. For example, tasks with long execution times and very large data volumes are labeled as suitable for Spark, and the rest are labeled as suitable for Presto. Sample features are extracted with NLP techniques, using N-grams plus TF-IDF. The general principle is to extract phrases and look at their frequency in the statements, so that key phrase groups can be extracted. The feature vectors produced this way are very large, so we first select 3,000 features with a linear model and then train an XGBoost model as the final prediction model.
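The N-gram plus TF-IDF feature extraction described above can be sketched with the standard library alone. The tokenizer, the use of unigrams and bigrams, and the plain TF-IDF variant (term frequency times log of N over document frequency) are assumptions, and the linear feature selection and XGBoost training steps are not shown.

```python
import math
import re
from collections import Counter


def sql_ngrams(sql, n_max=2):
    """Split a SQL statement into lowercase word n-grams up to length n_max."""
    tokens = re.findall(r"[a-z_][a-z0-9_]*", sql.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf_vectors(corpus, n_max=2):
    """Return one sparse {ngram: weight} dict per statement in the corpus."""
    docs = [Counter(sql_ngrams(sql, n_max)) for sql in corpus]
    # Document frequency: in how many statements each n-gram appears.
    doc_freq = Counter(gram for doc in docs for gram in doc)
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            gram: (count / total) * math.log(n_docs / doc_freq[gram])
            for gram, count in doc.items()
        })
    return vectors


# Two hypothetical historical queries: a simple filter and a join with aggregation.
queries = [
    "SELECT count(*) FROM orders WHERE dt = '2023-01-01'",
    "SELECT a.uid, sum(b.amount) FROM users a JOIN payments b "
    "ON a.uid = b.uid GROUP BY a.uid",
]
vectors = tfidf_vectors(queries)
```

N-grams that occur in every statement, such as `select`, receive zero weight, while phrases like `join` or `group by` that mark heavier queries stand out, which is why this representation is useful for separating simple queries from complex ones.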

After training, we can see that the accuracy of the model prediction is still relatively high, about 90% or more.

The online application process of the final model is: after the user submits SQL, the model recommends the execution engine. If it is different from the engine originally selected by the user, the language conversion module will be called to complete the conversion of SQL statements. If the execution fails after switching engines, we will have a failover mechanism to switch back to the user’s original engine to ensure the success of task execution.
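The online flow, recommend, convert if needed, execute, and fall back on failure, can be sketched as below. Every function here is a hypothetical stand-in passed in by the caller: `recommend` for the model, `translate` for the language conversion module, and `execute` for the engine client. The stubs and the simulated failure exist only to exercise the fallback path.

```python
def run_query(sql, user_engine, recommend, translate, execute):
    """Run sql on the recommended engine, falling back to the user's engine.

    Returns (engine_used, result).
    """
    engine = recommend(sql)
    if engine != user_engine:
        try:
            converted = translate(sql, user_engine, engine)
            return engine, execute(engine, converted)
        except RuntimeError:
            pass  # failover: fall through and use the user's original engine
    return user_engine, execute(user_engine, sql)


# --- Hypothetical stubs for demonstration only ---
def recommend(sql):
    return "spark" if "join" in sql.lower() else "presto"


def translate(sql, source_engine, target_engine):
    return sql  # a real module would rewrite dialect-specific syntax


def execute(engine, sql):
    if engine == "spark":
        raise RuntimeError("simulated failure on the recommended engine")
    return f"{engine} ran: {sql}"


fallback_result = run_query(
    "SELECT * FROM a JOIN b ON a.id = b.id", "presto",
    recommend, translate, execute,
)
direct_result = run_query("SELECT 1", "presto", recommend, translate, execute)
```

Keeping the original statement untouched until the switched run succeeds is what makes the fallback safe: the user’s engine always receives the SQL exactly as written.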

The benefit of this practice is that the model automatically selects the most suitable execution engine and the subsequent statement conversion is handled for the user, so users do not need to learn anything extra.

In addition, the engine recommended by the model largely keeps the original execution efficiency while reducing the failure rate, so the overall user experience improves.

Finally, due to the reduction of the unnecessary use of high-cost engines and the decline in the failure rate of task execution, the overall resource cost consumption has decreased.

In parts two to four, we shared three applications of AI algorithms on the big data platform. One thing they have in common is that the algorithms used are not particularly complicated, yet the effect is very noticeable. This encourages us to actively look for pain points and room for optimization in the day-to-day operation of the big data platform. Once an application scenario is identified, we can try different machine learning methods to solve the problem, so that AI algorithms feed back into big data.

05

Application Prospect of AI Algorithm in Big Data Governance

Finally, we look forward to the application scenario of AI algorithm in big data governance.

The three application scenarios described above focus on the data processing stage. In fact, echoing the relationship between AI and big data discussed in the first part, AI can play a useful role across the whole data life cycle.

For example, in the data acquisition stage it can judge whether logs are reasonable; during transmission it can do intrusion detection; during processing it can further reduce cost and increase efficiency; during exchange it can help ensure data security; and at destruction it can judge the timing and the related impact. AI has many application scenarios on the big data platform, and this talk is only meant as a starting point for discussion. We believe the mutual support between AI and big data will become more prominent in the future: AI helps big data platforms collect and process data better, and better data quality helps train better AI models, forming a virtuous circle.

06

Question and answer session

Q1: What kind of rule engine is used? Is it open source?

A1: The parameter tuning rules mentioned here were formulated by our big data colleagues from their early experience of manual tuning, for example how many cores or how much memory to recommend when a task’s execution time exceeds a certain number of minutes or when it processes a certain amount of data. This set of rules was accumulated over a long time and worked well after going online, so we used it to train our parameter recommendation model.

Q2: Is the dependent variable only the adjustment of parameters? Have you considered the influence of the performance instability of the big data platform on the calculation results?

A2: When recommending parameters, we do not simply pursue low cost, otherwise the recommended resources would be too low and the task would fail. It is true that the dependent variables are only the parameter adjustments, but to guard against instability we added extra restrictions. First, for the model features we use the average over a period of time rather than the value from a single day. Second, we compare the parameters recommended by the model with the actual configured values. If the difference is too large, we adopt a slow-rise, slow-fall strategy to avoid task failures caused by adjusting too much at once.

Q3: Are regression model and Bayesian model used at the same time?

A3: No. As mentioned earlier, we used two schemes for parameter recommendation: rule learning uses a regression model, and after that we used a Bayesian optimization framework. They are not used at the same time; they were two separate attempts. The advantage of the former, rule learning, is that it can quickly make use of past experience. The second model can find a better or even optimal configuration on the basis of the first. The two are sequential or progressive rather than simultaneous.

Q4: Is semantic analysis being introduced in order to add more features?

A4: Yes. As mentioned earlier, the only information we use for Spark tuning is a task’s historical execution, and we have not yet looked at the Spark task itself. A Spark task actually contains a lot of information, including its various operators and stages. If we do not analyze its semantics, we lose much of that information. So our next plan is to analyze the semantics of Spark tasks and add more features to assist parameter calculation.

Q5: Could an unreasonable parameter recommendation cause tasks to behave abnormally or even fail? If so, how do you reduce task errors and fluctuation in that scenario?

A5: If we relied entirely on the model, it might push resource utilization as high as possible, and the recommended parameters could then be too aggressive, for example shrinking memory from 30 GB to 5 GB in one step. Therefore, on top of the model’s recommendation we add extra restrictions, such as a cap on how many GB a single parameter adjustment may span. This is the slow-rise, slow-fall strategy.
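The slow-rise, slow-fall restriction amounts to clamping each adjustment to a maximum step. A minimal sketch follows; the function name and the 5 GB step limit are assumptions, since the talk does not give the actual cap.

```python
def limit_adjustment(current_gb, recommended_gb, max_step_gb=5.0):
    """Move from the current value toward the recommendation by at most max_step_gb."""
    delta = recommended_gb - current_gb
    step = max(-max_step_gb, min(max_step_gb, delta))
    return current_gb + step


# The model recommends shrinking memory from 30 GB to 5 GB in one go;
# with a 5 GB cap the change is applied gradually over several rounds.
memory_gb = 30.0
applied = []
while memory_gb != 5.0:
    memory_gb = limit_adjustment(memory_gb, 5.0)
    applied.append(memory_gb)
```

Each intermediate value gets a real run before the next reduction, so a configuration that turns out to be too tight is caught while the shortfall is still small.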

Q6: SIGMOD 2022 has some papers related to parameter tuning. Did you refer to any of them?

A6: Intelligent parameter tuning for tasks is still a hot research direction, and teams in different fields have adopted different methods and models. Before we started, we surveyed many industry methods, including the SIGMOD 2022 papers you mentioned. After comparison and practice, we settled on trying the two schemes we shared. We will keep following the latest progress in this direction and try more methods to improve the recommendations.

That’s all for today’s sharing. Thank you.

