World news | 20Fix.com


These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models

Every Sunday, NPR host Will Shortz, The New York Times’ crossword puzzle guru, gets to quiz thousands of listeners in a long-running segment called the Sunday Puzzle. While written to be solvable without too much foreknowledge, the brainteasers are usually challenging even for skilled contestants. That’s why some experts think they’re a promising way to […] © 2024 TechCrunch.

Researchers from Wellesley College, Oberlin College, the University of Texas at Austin, Northeastern University, Charles University, and the startup Cursor have developed an AI benchmark using riddles from NPR's Sunday Puzzle. The benchmark is designed to test AI reasoning models on problems that can be solved with general knowledge, unlike many existing benchmarks, which rely on specialized expertise. The study found that while models like OpenAI's o1 performed well, others such as DeepSeek's R1 sometimes gave incorrect answers or "gave up," revealing limitations in AI reasoning. The researchers plan to expand their testing to identify areas where AI models can be improved.

Politics | 5 days ago
									
Hamas frees 6 hostages but questions cloud the future of the Gaza ceasefire
Source: NBC Los Angeles
Events | 1 hour ago
									
Officials Are Fired at Traffic Safety Agency Investigating Musk’s Company
The National Highway Traffic Safety Administration has raised questions about crashes involving Tesla’s self-driving technology.
Events | 2 hours ago
									
Allison Epstein's 'Fagin the Thief' gives the Oliver Twist character a backstory
NPR's Scott Simon speaks with novelist Allison Epstein about her new novel "Fagin the Thief," which imagines a backstory for the character from the Charles Dickens book "Oliver Twist."
Politics | 3 hours ago
									
Dear listeners: "Bambi" is whatever the band Anxious says it is
We hear from musicians Grady Allen and Dante Melucci of the band Anxious about their second album, "Bambi." The young hardcore act says it's their most authentic outing yet.
Politics | 3 hours ago
									
NIH funding freeze stalls applications on $1.5 billion in medical research funds
The National Institutes of Health had to stop considering new grant applications, delaying funding for research into diseases ranging from heart disease and cancer to Alzheimer's and allergies.
Science | 3 hours ago
									
Burnout is a problem for caseworkers serving unhoused people
People who provide assistance to the unhoused often feel traumatized by their work.
Society | 4 hours ago
									
Economy, immigration, Elon Musk at center of German election; conservative candidate favored to win
Germans go to the polls Sunday. Chancellor Scholz's Social Democrats are likely to lose to the conservative CDU party, and the right-wing Alternative for Germany party is expected to make gains.
Politics | 4 hours ago
									
Trump administration plans mass firing at office that funds homelessness programs
Staffing at the HUD office that pays for housing and support services across the country is slated to be cut by 84%. Advocates warn such heavy cuts could make record-high homelessness even worse.
Politics | 4 hours ago
									
More New Tesla Model Y Arrive At Los Angeles Store Ahead Of Deliveries
Source: Forbes
Politics | 4 hours ago
									
Elon Musk blurs the line between his government and business roles
The tech titan and President Trump say they will avoid any conflicts of interest, but it's difficult for the public to verify that.
Politics | 5 hours ago
									
Want to reduce soreness after a workout? Make time for this 4
These simple post-workout activities can help reduce pain and even improve athletic performance. But many people don't prioritize recovery and self-care after exercise.
Politics | 6 hours ago
									
Hamas releases Israeli hostages, returns remains of Shiri Bibas
Hamas is set to release the last six living hostages whose freedom it agreed to under the current ceasefire deal.
Politics | 7 hours ago
									
Van Jones: Brown Was Fired Due to 'Slander that He's Woke and DEI', He Just Had Policies to Make Diverse Military 'Cohesive'
On Friday’s broadcast of CNN’s “Laura Coates Live,” CNN political commentator and former Obama adviser Van Jones said that the only reason he can find to dismiss Gen. CQ Brown as Chairman of the Joint Chiefs of Staff is “this slander […]
Source: Breitbart
Events | 7 hours ago
									
Arab leaders huddle in Saudi Arabia in pushback to Trump's Gaza plans
Leaders from Egypt, Jordan and other Arab states met in Saudi Arabia to discuss alternatives to President Trump's plan for Gaza's future, which calls for displacing Palestinians.
Politics | 9 hours ago
									
Federal prisons prep to move trans inmates as early as next week
The Bureau of Prisons is moving forward with plans to move transgender inmates out of prisons that align with their gender identity and into facilities that align with their assigned sex at birth.
Events | 12 hours ago
									
Trump on When He Owns Economy: 'Takes a Period of Six Months to a Year'
On Friday’s broadcast of Fox News Radio’s “Brian Kilmeade Show,” President Donald Trump responded to a question about when the economy will become his, and when the public will see whether his economic policies are working, by stating that “it takes […]
Source: Breitbart
Politics | 14 hours ago
									
Judge largely blocks Trump's executive orders ending federal support for DEI programs
A U.S. district judge granted a preliminary injunction blocking the administration from terminating or changing federal contracts it considers equity-related.
Politics | 14 hours ago
									
Education Department Moves to Reimbursement Model for COVID Relief Funds After Billions in Wasteful Spending
The Department of Education (DOE) has shifted all future spending related to the $4.4 billion in remaining COVID-19 school relief funds to a reimbursement structure after identifying massive waste, fraud, and abuse, including hundreds of thousands of dollars spent to rent out an MLB stadium and tens of thousands of dollars spent on casino hotel rooms.
Source: Breitbart
Politics | 15 hours ago
									
NBC settles defamation lawsuit with doctor falsely labeled "uterus collector"
MSNBC had aired stories falsely claiming the doctor performed mass hysterectomies on female detainees at an Immigration and Customs Enforcement facility in Georgia.
Politics | 15 hours ago