
Massive Cloudflare outage was triggered by file that suddenly doubled in size


When a Cloudflare outage disrupted large numbers of websites and online services yesterday, the company initially thought it was hit by a “hyper-scale” DDoS (distributed denial-of-service) attack.

“I worry this is the big botnet flexing,” Cloudflare co-founder and CEO Matthew Prince wrote in an internal chat room yesterday, while he and others discussed whether Cloudflare was being hit by attacks from the prolific Aisuru botnet. But upon further investigation, Cloudflare staff realized the problem had an internal cause: an important file had unexpectedly doubled in size and propagated across the network.

This broke software that reads the file to keep Cloudflare’s Bot Management system, which uses a machine learning model to protect against security threats, up to date. Cloudflare’s core CDN, security services, and several other services were affected.

“After we initially wrongly suspected the symptoms we were seeing were caused by a hyper-scale DDoS attack, we correctly identified the core issue and were able to stop the propagation of the larger-than-expected feature file and replace it with an earlier version of the file,” Prince wrote in a post-mortem of the outage.

Prince explained that the problem “was triggered by a change to one of our database systems’ permissions which caused the database to output multiple entries into a ‘feature file’ used by our Bot Management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network.”

These machines run software that routes traffic across the Cloudflare network. The software “reads this feature file to keep our Bot Management system up to date with ever changing threats,” Prince wrote. “The software had a limit on the size of the feature file that was below its doubled size. That caused the software to fail.”

Sorry for the pain, Internet

After replacing the bloated feature file with an earlier version, the flow of core traffic “largely” returned to normal, Prince wrote. But it took another two-and-a-half hours “to mitigate increased load on various parts of our network as traffic rushed back online.”

Like Amazon Web Services, Cloudflare is relied upon by many online services and can take down much of the web when it has a technical problem. “On behalf of the entire team at Cloudflare, I would like to apologize for the pain we caused the Internet today,” Prince wrote, saying that any outage is unacceptable because of “Cloudflare’s importance in the Internet ecosystem.”

Cloudflare’s bot management system classifies bots as good or bad with “a machine learning model that we use to generate bot scores for every request traversing our network,” Prince wrote. “Our customers use bot scores to control which bots are allowed to access their sites—or not.”

Prince explained that the configuration file this system relies upon describes “features,” or individual traits “used by the machine learning model to make a prediction about whether the request was automated or not.” This file is updated every five minutes “and published to our entire network and allows us to react to variations in traffic flows across the Internet. It allows us to react to new types of bots and new bot attacks. So it’s critical that it is rolled out frequently and rapidly as bad actors change their tactics quickly.”

Unexpected query response

Each new version of the file is generated by a query running on a ClickHouse database cluster, Prince wrote. When Cloudflare made a change granting additional permissions to database users, the query response suddenly contained more metadata than it previously had.

Cloudflare staff assumed “that the list of columns returned by a query like this would only include the ‘default’ database.” But the query didn’t include a filter for the database name, causing it to return duplicates of columns, Prince wrote.

This is the type of query that Cloudflare’s bot management system uses “to construct each input ‘feature’ for the file,” he wrote. The extra metadata more than doubled the rows in the response, “ultimately affecting the number of rows (i.e. features) in the final file output,” Prince wrote.
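The mechanism can be sketched in a few lines of Python. This is a toy model, not Cloudflare’s query or code: metadata rows are `(database, table, name)` tuples, and the table name `http_requests_features`, the second database `r0`, and the three feature names are hypothetical stand-ins. Once a second database becomes visible, a lookup that filters only on table name returns every column twice; naming the database restores the original list.

```python
from typing import NamedTuple


class ColumnRow(NamedTuple):
    """One row of column metadata: which database and table a column lives in."""
    database: str
    table: str
    name: str


TABLE = "http_requests_features"  # hypothetical table name

# Before the permissions change, only the "default" database is visible.
BEFORE = [ColumnRow("default", TABLE, f"feature_{i}") for i in range(3)]
# After it, the same columns also show up under a second (hypothetical) database.
AFTER = BEFORE + [row._replace(database="r0") for row in BEFORE]


def feature_names(rows, table, database=None):
    """Column names for `table`; restricted to one database only if one is given."""
    return [
        r.name
        for r in rows
        if r.table == table and (database is None or r.database == database)
    ]


unfiltered = feature_names(AFTER, TABLE)                    # no database filter
filtered = feature_names(AFTER, TABLE, database="default")  # explicit filter
print(len(unfiltered), len(filtered))  # 6 3
```

Deduplicating the names would also hide the symptom, but an explicit database filter encodes the assumption the original query silently relied on.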

Cloudflare’s proxy service has limits to prevent excessive memory consumption, with the bot management system having “a limit on the number of machine learning features that can be used at runtime.” This limit is 200, well above the actual number of features used.

“When the bad file with more than 200 features was propagated to our servers, this limit was hit—resulting in the system panicking” and outputting errors, Prince wrote.
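A minimal Python sketch of this kind of hard limit, assuming a loader that rejects any feature list longer than 200 entries. The names `load_features` and `FeatureLimitExceeded` and the 120-feature starting size are hypothetical, and this is not Cloudflare’s code. The sketch catches the error explicitly to show what it contains; left uncaught, it would end the program, loosely mirroring the panic described above.

```python
FEATURE_LIMIT = 200  # runtime limit on ML features, as described in the post-mortem


class FeatureLimitExceeded(Exception):
    """Raised when a feature file lists more features than the runtime allows."""


def load_features(names, limit=FEATURE_LIMIT):
    """Return the feature list, or raise if it exceeds the fixed limit."""
    if len(names) > limit:
        raise FeatureLimitExceeded(f"{len(names)} features > limit {limit}")
    return list(names)


good = [f"feature_{i}" for i in range(120)]  # hypothetical pre-incident file
bad = good * 2                               # every row duplicated: 240 entries

load_features(good)  # loads normally

# Catching the error here is exactly what an unprepared caller would not do;
# an uncaught exception would terminate the program instead.
try:
    load_features(bad)
    error = None
except FeatureLimitExceeded as err:
    error = str(err)
print(error)  # 240 features > limit 200
```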

Worst Cloudflare outage since 2019

The volume of HTTP 5xx error status codes served by the Cloudflare network is normally “very low” but soared after the bad file spread across the network. “The spike, and subsequent fluctuations, show our system failing due to loading the incorrect feature file,” Prince wrote. “What’s notable is that our system would then recover for a period. This was very unusual behavior for an internal error.”

This unusual behavior was explained by the fact “that the file was being generated every five minutes by a query running on a ClickHouse database cluster, which was being gradually updated to improve permissions management,” Prince wrote. “Bad data was only generated if the query ran on a part of the cluster which had been updated. As a result, every five minutes there was a chance of either a good or a bad set of configuration files being generated and rapidly propagated across the network.”

This fluctuation initially “led us to believe this might be caused by an attack. Eventually, every ClickHouse node was generating the bad configuration file and the fluctuation stabilized in the failing state,” he wrote.
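The flapping pattern follows directly from the partial rollout. In this toy Python model, a cluster is a list of booleans marking which nodes have the new permissions, and each five-minute run lands on one node. The post-mortem does not say how a node is chosen, so the round-robin selection here is an assumption; with random selection, the chance of a bad file on any given run would simply equal the fraction of updated nodes.

```python
def generate_file(node_updated: bool) -> str:
    """Output of one generation run: only updated nodes emit the duplicated file."""
    return "bad" if node_updated else "good"


def simulate(nodes: list[bool], cycles: int) -> list[str]:
    """One run per five-minute cycle; nodes are picked round-robin (an assumption)."""
    return [generate_file(nodes[i % len(nodes)]) for i in range(cycles)]


partial = [True, False, False, True]  # mid-rollout: half the nodes updated
complete = [True] * 4                 # rollout finished: every node updated

print(simulate(partial, 6))   # ['bad', 'good', 'good', 'bad', 'bad', 'good']
print(simulate(complete, 3))  # ['bad', 'bad', 'bad']
```

Once every node is updated, the output stops alternating and stays bad, matching the point at which the fluctuation “stabilized in the failing state.”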

Prince said that Cloudflare “solved the problem by stopping the generation and propagation of the bad feature file and manually inserting a known good file into the feature file distribution queue,” and then “forcing a restart of our core proxy.” The team then worked on “restarting remaining services that had entered a bad state” until the 5xx error code volume returned to normal later in the day.

Prince said the outage was Cloudflare’s worst since 2019 and that the firm is taking steps to protect against similar failures in the future. Cloudflare will work on “hardening ingestion of Cloudflare-generated configuration files in the same way we would for user-generated input; enabling more global kill switches for features; eliminating the ability for core dumps or other error reports to overwhelm system resources; [and] reviewing failure modes for error conditions across all core proxy modules,” according to Prince.
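One remediation item, treating Cloudflare-generated configuration files with the same suspicion as user input, can be sketched in Python as a validator with a last-known-good fallback. The class, its method names, and the specific checks (empty file, size limit, duplicate names) are illustrative assumptions, not anything Cloudflare has described. The idea is that a bad file is rejected and logged while the previous file keeps serving, rather than the process failing.

```python
from typing import List, Optional


class FeatureFileIngestor:
    """Hypothetical ingestor that validates internally generated feature files
    as strictly as untrusted user input."""

    def __init__(self, limit: int = 200) -> None:
        self.limit = limit
        self.active: List[str] = []  # last known good feature list

    def validate(self, names: List[str]) -> Optional[str]:
        """Return a rejection reason, or None if the file looks sane."""
        if not names:
            return "empty feature file"
        if len(names) > self.limit:
            return f"too many features ({len(names)} > {self.limit})"
        if len(set(names)) != len(names):
            return "duplicate feature names"
        return None

    def ingest(self, names: List[str]) -> bool:
        """Swap in a new file only if it validates; otherwise keep the old one."""
        reason = self.validate(names)
        if reason is not None:
            # Log and keep serving with the previous file instead of crashing.
            print(f"rejected feature file: {reason}")
            return False
        self.active = list(names)
        return True


ingestor = FeatureFileIngestor()
good = [f"feature_{i}" for i in range(120)]  # hypothetical file size
accepted_good = ingestor.ingest(good)      # True
accepted_bad = ingestor.ingest(good * 2)   # False: 240 entries exceed the limit
```

This automates, in miniature, the manual step of pushing a known good file back into the distribution queue.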

While Prince can’t promise that Cloudflare will never have another outage of the same scale, he said that previous outages have “always led to us building new, more resilient systems.”

freeAgent
7 hours ago
Los Angeles, CA

Joe Rogan Subreddit Bans 'Political Posts' But Still Wants 'Free Speech'


In a move that has confused and angered its users, the r/JoeRogan subreddit has banned all posts about politics. Adding to the confusion, the subreddit’s mods have said that political comments are still allowed, just not posts. “After careful consideration, internal discussion and tons of external feedback we have collectively decided that r/JoeRogan is not the place for politics anymore,” moderator OutdoorRink said in a post announcing the change today.

The new policy has not gone over well. For the last 10 years, the Joe Rogan Experience has been a central part of American political life. He interviews entertainers, yes, but also politicians and powerful businessmen. He had Donald Trump on the show and endorsed his bid for President. During the COVID and lockdown era, Rogan cast himself as an opposition figure to the heavy regulatory hand of the state. In a recent episode, Rogan’s guest was another podcaster, Adam Carolla, and the two spent hours talking about Covid lockdowns, Gavin Newsom, and specific environmental laws and building codes they argue are preventing Los Angeles from rebuilding after the Palisades fire.

To hear the mods tell it, the subreddit is banning politics out of concern for Rogan’s listeners. “For too long this subreddit has been overrun by users who are pushing a political agenda, both left and right, and that stops today,” the post announcing the ban said. “It is not lost on us that Joe has become increasingly political in recent years and that his endorsement of Trump may have helped get him elected. That said, we are not equipped to properly moderate, arbitrate and curate political posts…while also promoting free speech.” 

To be fair, as Rogan’s popularity has exploded over the years and his politics have shifted to the right, many Reddit users have turned to r/JoeRogan to complain about the direction Rogan and his podcast have taken. These posts are often antagonistic to Rogan and his fans, but are still “on-topic.”

Over the past few months, the moderator who announced the ban has posted several times about politics on r/JoeRogan. On November 3, they said that changes were coming to the moderation philosophy of the sub. “In the past few years, a significant group of users have been taking advantage of our ‘anything goes’ free speech policy,” they said. “This is not a political subreddit. Obviously Joe has dipped his toes in the political arena so we have allowed politics to become a component of the daily content here. That said, I think most of you will agree that it has gone too far and has attracted people who come here solely to push their political agenda with little interest in Rogan or his show.” A few days later the mod posted a link to a CBC investigation into MMA gym owners with neo-Nazi ties, a story connected to Rogan only by his interest in MMA and his work as a UFC commentator.

r/JoeRogan’s users see the new “no political posts” policy as hypocrisy, and many of them think it has everything to do with recent revelations about Jeffrey Epstein. The connections between Epstein, Trump, and various other Rogan guests have been building for years. A recent, poorly formatted dump of 200,000 Epstein files contained multiple references to Trump, and Congress is set to release more.

“Random new mod appears and want to ruin this sub on a pathetic power trip. Transparently an attempt to cover for the pedophiles in power that Joe endorsed and supports. Not going to work,” one commenter said under the original post announcing the new ban.

“Perfectly timed around the Epstein files due to be released as well. So much for being free speech warriors eh space chimps?,” said one.

“Talking politics was great when it was all dunking on trans people and brown people but now that people have to defend pedophiles that banned hemp it's not so fun anymore,” said another.

You can see the remnants of pre-ban political discussions lingering on r/JoeRogan. There are, of course, clips from the show and discussions of its guests, but there are also plenty of Epstein memes, posts about Epstein news, and fans questioning why Rogan hasn’t spoken out about Epstein recently after talking about it on the podcast for years.

Multiple guests Rogan has hosted on the show have turned up in the Epstein files, chief among them Donald Trump. The House GOP slipped a ban on hemp into the bill to re-open the government, a move that will close a loophole that’s allowed people to legally smoke weed in states like Texas. These are not the kinds of things the chill apes of Rogan’s fandom wanted.

“I think we all know what eventually happened to Joe and his podcast. The slow infiltration of right wing grifters coupled with Covid, it very much did change him. And I saw firsthand how that trickled down into the comedy community, especially one where he was instrumental in helping to rebuild. Instead of it being a platform to share his interests and eccentricities, it became a place to share his grievances and fears….how can we not expect to be allowed to talk about this?” user GreppMichaels said. “Do people really think this sub can go back to silly light chatter about aliens or conspiracies? Joe did this, how do the mods think we can pretend otherwise?”



freeAgent
7 hours ago
Los Angeles, CA

Mastodon CEO steps down as the social network restructures

Eugen Rochko is stepping down as CEO of decentralized social network Mastodon. Felix Hlatky will become executive director as the company is restructured as a nonprofit governed by a board.
freeAgent
1 day ago
Los Angeles, CA

Google CEO: If an AI bubble pops, no one is getting out clean


On Tuesday, Alphabet CEO Sundar Pichai warned of “irrationality” in the AI market, telling the BBC in an interview, “I think no company is going to be immune, including us.” His comments arrive as scrutiny over the state of the AI market has reached new heights, with Alphabet shares doubling in value over seven months to reach a $3.5 trillion market capitalization.

Speaking exclusively to the BBC at Google’s California headquarters, Pichai acknowledged that while AI investment growth is at an “extraordinary moment,” the industry can “overshoot” in investment cycles, as we’re seeing now. He drew comparisons to the late 1990s Internet boom, which saw early Internet company valuations surge before collapsing in 2000, leading to bankruptcies and job losses.

“We can look back at the Internet right now. There was clearly a lot of excess investment, but none of us would question whether the Internet was profound,” Pichai said. “I expect AI to be the same. So I think it’s both rational and there are elements of irrationality through a moment like this.”

Over the past year, some analysts and tech industry critics have expressed increasing skepticism about a web of $1.4 trillion in deals surrounding Google competitor OpenAI in particular. The company has committed to spending $1.4 trillion on infrastructure over eight years, while it expects to generate around $13 billion in revenue this year. OpenAI CEO Sam Altman told reporters at a private dinner in August that investors are “overexcited” about AI models and that “someone” will lose a “phenomenal amount of money.”
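For a sense of scale, here is a back-of-the-envelope comparison using only the two figures above, assuming (purely for illustration) that the $1.4 trillion is spent evenly across the eight years:

```latex
\frac{\$1.4\ \text{trillion}}{8\ \text{years}} = \$175\ \text{billion per year},
\qquad
\frac{\$175\ \text{billion}}{\$13\ \text{billion}} \approx 13.5
```

On that simplified assumption, the average yearly commitment would be roughly 13 times this year’s expected revenue; the calculation says nothing about how the spending is actually scheduled.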

Reacting to the Pichai comments, prominent AI industry critic Ed Zitron told Ars Technica, “I think that this is the first moment where a magnificent 7 feels it’s necessary to be on the right side of history, leaning on the shaky talking point of ‘there was a lot of over investment in the Internet too’ because there really isn’t a defense for the (to use his own terminology) ‘excess investment’ in AI.” He added, “I imagine others will follow.”

Market concerns and Google’s position

Alphabet’s recent market performance has been driven by investor confidence in the company’s ability to compete with OpenAI’s ChatGPT, as well as its development of specialized chips for AI that can compete with Nvidia’s. Nvidia recently reached a world-first $5 trillion valuation due to making GPUs that can accelerate the matrix math at the heart of AI computations.

Despite acknowledging that no company would be immune to a potential AI bubble burst, Pichai argued that Google’s unique position gives it an advantage. He told the BBC that the company owns what he called a “full stack” of technologies, from chips to YouTube data to models and frontier science research. This integrated approach, he suggested, would help the company weather any market turbulence better than competitors.

Pichai also told the BBC that people should not “blindly trust” everything AI tools output. The company currently faces repeated accuracy concerns about some of its AI models. Pichai said that while AI tools are helpful “if you want to creatively write something,” people “have to learn to use these tools for what they’re good at and not blindly trust everything they say.”

In the BBC interview, the Google boss also addressed the “immense” energy needs of AI, acknowledging that the intensive energy requirements of expanding AI ventures have caused slippage on Alphabet’s climate targets. However, Pichai insisted that the company still wants to achieve net zero by 2030 through investments in new energy technologies. “The rate at which we were hoping to make progress will be impacted,” Pichai said, warning that constraining an economy based on energy “will have consequences.”

Even with the warnings about a potential AI bubble, Pichai did not miss his chance to promote the technology, albeit with a hint of danger regarding its widespread impact. Pichai described AI as “the most profound technology” humankind has worked on.

“We will have to work through societal disruptions,” he said, adding that the technology would “create new opportunities” and “evolve and transition certain jobs.” He said people who adapt to AI tools “will do better” in their professions, whatever field they work in.

freeAgent
1 day ago
I'm pretty sure my local game shop will do just fine if the AI bubble pops.
Los Angeles, CA

Apple’s custom Wi-Fi chip gives the iPhone 17 a notable boost, according to speed tests

The iPhone 17 Pro in three different colors.

Ookla has found that Apple’s custom N1 networking chip that integrates the Wi-Fi 7, Bluetooth 6, and Thread radios in the iPhone 17 family “delivers a clear step-change in real-world Wi-Fi performance” when compared to the Broadcom chip used in the iPhone 16 models. In North America, the iPhone 17 family also outperformed flagship Android phones when it came to Wi-Fi download speeds during the same time period.

On paper, the N1 chip’s Wi-Fi capabilities appear “virtually identical to its Broadcom-based predecessor” in the iPhone 16, according to Ookla. The N1 is also limited to 160MHz channels and doesn’t take full advantage of Wi-Fi 7’s faster 320MHz channels, but for real-world users that limitation didn’t have a significant impact.

Using Speedtest Intelligence data gathered during the six weeks after Apple’s latest smartphones were released, Ookla found that the median download and upload speeds of the iPhone 17 family were both up to 40 percent higher than those of the iPhone 16 family around the world. At the 10th percentile, the N1’s advantage grew to 60 percent, implying the new chip’s performance improvements are even more noticeable in “challenging Wi-Fi conditions.”

The iPhone 17 family outperformed flagship Android devices like the Pixel 10 and Galaxy S25 families in North America, where Wi-Fi 7 devices can use up to three 320MHz channels and the N1 should therefore be at a disadvantage. The iPhone 17 family had the highest median and 90th-percentile Wi-Fi download speeds, at 416.14 Mbps and 976.39 Mbps, respectively. That could change as the number of 320MHz-capable Wi-Fi 7 routers in North America increases, but the result reinforces the finding that the N1 can “deliver more consistent performance in non-ideal Wi-Fi conditions.”
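Ookla’s comparison is between speed distributions, not single measurements. The Python sketch below shows how median, 10th-percentile, and 90th-percentile figures and a percentage gain can be computed with the standard library’s `statistics` module. The sample speeds are invented, and the article does not describe Ookla’s aggregation method, so treat this only as an illustration of the terms.

```python
import statistics


def summarize(speeds_mbps):
    """Return (10th percentile, median, 90th percentile) of speed samples in Mbps."""
    deciles = statistics.quantiles(speeds_mbps, n=10)  # 9 cut points: p10 ... p90
    return deciles[0], statistics.median(speeds_mbps), deciles[-1]


def pct_gain(new, old):
    """How many percent higher `new` is than `old`."""
    return (new - old) / old * 100


# Invented samples for illustration only; these are NOT Ookla measurements.
old_phone = [20, 40, 80, 150, 200, 250, 300, 400, 500, 600]
new_phone = [35, 60, 110, 200, 260, 320, 380, 480, 560, 650]

old_p10, old_med, old_p90 = summarize(old_phone)
new_p10, new_med, new_p90 = summarize(new_phone)
print(f"median gain: {pct_gain(new_med, old_med):.1f}%")
print(f"10th-percentile gain: {pct_gain(new_p10, old_p10):.1f}%")
```

With these made-up numbers the 10th-percentile gain is larger than the median gain, the same shape of result Ookla describes: the biggest improvement shows up on the slowest connections.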

freeAgent
1 day ago
I have noticed that Bluetooth performance is much better than my 14 Pro. Items pair/sync quicker and unlocking Bluetooth-enabled HID door locks at work is much quicker and works over a greater distance.
Los Angeles, CA

Polestar 3 can now power your home – and cut your energy bill


Polestar and home energy company dcbel are rolling out vehicle‑to‑home (V2H), blackout backup, and smart charging features for Polestar 3 owners in the US, starting in California.

freeAgent
1 day ago
Los Angeles, CA