IEEE Spectrum

  • Nvidia Conquers Latest AI Tests
    by Samuel K. Moore on 12. June 2024. at 15:00

    For years, Nvidia has dominated many machine learning benchmarks, and now there are two more notches in its belt. MLPerf, the AI benchmarking suite sometimes called “the Olympics of machine learning,” has released a new set of training tests to help make more and better apples-to-apples comparisons between competing computer systems. One of MLPerf’s new tests concerns fine-tuning of large language models, a process that takes an existing trained model and trains it a bit more with specialized knowledge to make it fit for a particular purpose. The other is for graph neural networks, a type of machine learning behind some literature databases, fraud detection in financial systems, and social networks. Even with the additions and the participation of computers using Google’s and Intel’s AI accelerators, systems powered by Nvidia’s Hopper architecture dominated the results once again. One system that included 11,616 Nvidia H100 GPUs, the largest collection yet, topped each of the nine benchmarks, setting records in five of them (including the two new benchmarks). The 11,616-H100 system is “the biggest we’ve ever done,” says Dave Salvator, director of accelerated computing products at Nvidia. It smashed through the GPT-3 training trial in less than 3.5 minutes. A 512-GPU system, for comparison, took about 51 minutes. (Note that the GPT-3 task is not a full training, which could take weeks and cost millions of dollars. Instead, the computers train on a representative portion of the data, at an agreed-upon point well before completion.) Compared to Nvidia’s largest entrant on GPT-3 last year, a 3,584-H100 computer, the 3.5-minute result represents a 3.2-fold improvement. You might expect that just from the difference in the size of these systems, but in AI computing that isn’t always the case, explains Salvator. 
“If you just throw hardware at the problem, it’s not a given that you’re going to improve,” he says. “We are getting essentially linear scaling,” says Salvator. By that he means that twice as many GPUs lead to a halved training time. “[That] represents a great achievement from our engineering teams,” he adds. Competitors are also getting closer to linear scaling. This round Intel deployed a system using 1,024 GPUs that performed the GPT-3 task in 67 minutes versus a computer one-fourth the size that took 224 minutes six months ago. Google’s largest GPT-3 entry used 12 times the number of TPU v5p accelerators as its smallest entry and performed its task nine times as fast. Linear scaling is going to be particularly important for upcoming “AI factories” housing 100,000 GPUs or more, Salvator says. He says to expect one such data center to come online this year, and another, using Nvidia’s next architecture, Blackwell, to start up in 2025.

    Nvidia’s streak continues

    Nvidia continued to improve training times despite using the same architecture, Hopper, as it did in last year’s training results. That’s all down to software improvements, says Salvator. “Typically, we’ll get a 2-2.5x [boost] from software after a new architecture is released,” he says. For GPT-3 training, Nvidia logged a 27 percent improvement from the June 2023 MLPerf benchmarks. Salvator says there were several software changes behind the boost. For example, Nvidia engineers tuned up Hopper’s use of less accurate, 8-bit floating point operations by trimming unnecessary conversions between 8-bit and 16-bit numbers and better targeting which layers of a neural network could use the lower precision number format. They also found a more intelligent way to adjust the power budget of each chip’s compute engines, and sped communication among GPUs in a way that Salvator likened to “buttering your toast while it’s still in the toaster.” Additionally, the company implemented a scheme called flash attention. 
Invented in the Stanford University laboratory of SambaNova founder Chris Ré, flash attention is an algorithm that speeds transformer networks by minimizing writes to memory. When it first showed up in MLPerf benchmarks, flash attention shaved as much as 10 percent from training times. (Intel, too, used a version of flash attention but not for GPT-3. It instead used the algorithm for one of the new benchmarks, fine-tuning.) Using other software and network tricks, Nvidia delivered an 80 percent speedup in the text-to-image test, Stable Diffusion, versus its submission in November 2023.

    New benchmarks

    MLPerf adds new benchmarks and upgrades old ones to stay relevant to what’s happening in the AI industry. This year saw the addition of fine-tuning and graph neural networks. Fine-tuning takes an already trained LLM and specializes it for use in a particular field. Nvidia, for example, took a trained 43-billion-parameter model and trained it on the GPU-maker’s design files and documentation to create ChipNeMo, an AI intended to boost the productivity of its chip designers. At the time, the company’s chief technology officer Bill Dally said that training an LLM was like giving it a liberal arts education, and fine-tuning was like sending it to graduate school. The MLPerf benchmark takes a pretrained Llama-2-70B model and asks the system to fine-tune it using a dataset of government documents with the goal of generating more accurate document summaries. There are several ways to do fine-tuning. MLPerf chose one called low-rank adaptation (LoRA). The method winds up training only a small portion of the LLM’s parameters, leading to a 3-fold lower burden on hardware and reduced use of memory and storage versus other methods, according to the organization. The other new benchmark involved a graph neural network (GNN). These are for problems that can be represented by a very large set of interconnected nodes, such as a social network or a recommender system. 
Compared to other AI tasks, GNNs require a lot of communication between nodes in a computer. The benchmark trained a GNN on a database that shows relationships among academic authors, papers, and institutes, a graph with 547 million nodes and 5.8 billion edges. The neural network was then trained to predict the right label for each node in the graph.

    Future fights

    Training rounds in 2025 may see head-to-head contests comparing new accelerators from AMD, Intel, and Nvidia. AMD’s MI300 series was launched about six months ago, and a memory-boosted upgrade, the MI325X, is planned for the end of 2024, with the next-generation MI350 slated for 2025. Intel says its Gaudi 3, generally available to computer makers later this year, will appear in MLPerf’s upcoming inferencing benchmarks. Intel executives have said the new chip has the capacity to beat H100 at training LLMs. But the victory may be short-lived, as Nvidia has unveiled a new architecture, Blackwell, which is planned for late this year.
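
The scaling claims in this article can be checked with simple arithmetic. The Python sketch below divides each reported speedup by the matching increase in system size, so a value of 1.0 would mean perfectly linear scaling. The function name `scaling_efficiency` is made up for illustration, and the inputs are the rounded figures quoted in the article, so the results are approximate. The Nvidia and Intel comparisons also span different benchmark rounds, which means they fold software gains into the hardware scaling.

```python
def scaling_efficiency(size_ratio: float, speedup: float) -> float:
    """Fraction of ideal (linear) scaling achieved: 1.0 means perfectly linear."""
    if size_ratio <= 0 or speedup <= 0:
        raise ValueError("size_ratio and speedup must be positive")
    return speedup / size_ratio


# (increase in accelerator count, reported speedup), taken from the article text.
cases = {
    "Nvidia, 3,584 -> 11,616 H100s": (11_616 / 3_584, 3.2),
    "Intel, 4x the system size": (4.0, 224 / 67),
    "Google, 12x the TPU v5p count": (12.0, 9.0),
}

for name, (size_ratio, speedup) in cases.items():
    efficiency = scaling_efficiency(size_ratio, speedup)
    print(f"{name}: {efficiency:.2f} of linear")
```

With these inputs the script reports roughly 0.99 for Nvidia, 0.84 for Intel, and 0.75 for Google, which fits the article's description of Nvidia as "essentially linear" and its competitors as getting closer.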

  • Giant Chips Give Supercomputers a Run for Their Money
    by Dina Genkina on 12. June 2024. at 14:00

    As large supercomputers keep getting larger, Sunnyvale, California-based Cerebras has been taking a different approach. Instead of connecting more and more GPUs together, the company has been squeezing as many processors as it can onto one giant wafer. The main advantage is in the interconnects: by wiring processors together on-chip, the wafer-scale chip bypasses many of the computational speed losses that come from many GPUs talking to each other, as well as losses from loading data to and from memory. Now, Cerebras has flaunted the advantages of its wafer-scale chips in two separate but related results. First, the company demonstrated that its second-generation wafer-scale engine, WSE-2, was significantly faster than the world’s fastest supercomputer, Frontier, in molecular dynamics calculations, the field that underlies protein folding, modeling radiation damage in nuclear reactors, and other problems in materials science. Second, in collaboration with machine learning model optimization company Neural Magic, Cerebras demonstrated that a sparse large language model could perform inference at one-third of the energy cost of a full model without losing any accuracy. Although the results are in vastly different fields, they were both possible because of the interconnects and fast memory access enabled by Cerebras’ hardware.

    Speeding Through the Molecular World

    “Imagine there’s a tailor and he can make a suit in a week,” says Cerebras CEO and co-founder Andrew Feldman. “He buys the neighboring tailor, and she can also make a suit in a week, but they can’t work together. Now, they can now make two suits in a week. But what they can’t do is make a suit in three and a half days.” According to Feldman, GPUs are like tailors that can’t work together, at least when it comes to some problems in molecular dynamics. As you connect more and more GPUs, they can simulate more atoms at the same time, but they can’t simulate the same number of atoms more quickly. 
Cerebras’ wafer-scale engine, however, scales in a fundamentally different way. Because the chips are not limited by interconnect bandwidth, they can communicate quickly, like two tailors collaborating perfectly to make a suit in three and a half days. To demonstrate this advantage, the team simulated 800,000 atoms interacting with each other, calculating the interactions in increments of one femtosecond. Each step took just microseconds to compute on their hardware. Although that’s still 9 orders of magnitude slower than the actual interactions, it was also 179 times as fast as the Frontier supercomputer. The achievement effectively reduced a year’s worth of computation to just two days. This work was done in collaboration with Sandia, Lawrence Livermore, and Los Alamos National Laboratories. Tomas Oppelstrup, staff scientist at Lawrence Livermore National Laboratory, says this advance makes it feasible to simulate molecular interactions that were previously inaccessible. Oppelstrup says this will be particularly useful for understanding the longer-term stability of materials in extreme conditions. “When you build advanced machines that operate at high temperatures, like jet engines, nuclear reactors, or fusion reactors for energy production,” he says, “you need materials that can withstand these high temperatures and very harsh environments. It’s difficult to create materials that have the right properties, that have a long lifetime and sufficient strength and don’t break.” Being able to simulate the behavior of candidate materials for longer, Oppelstrup says, will be crucial to the materials design and development process. 
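
The numbers in this passage can be sanity-checked in a few lines of Python. The sketch assumes a wall-clock cost of 1 microsecond per simulated step, since the article says only that each step took "microseconds". The constant and function names are illustrative and are not taken from Cerebras.

```python
import math

SPEEDUP_VS_FRONTIER = 179   # reported speedup over the Frontier supercomputer
SIMULATED_STEP_S = 1e-15    # each step advances the simulation by one femtosecond
WALL_CLOCK_STEP_S = 1e-6    # ASSUMPTION: about one microsecond of compute per step


def days_after_speedup(baseline_days: float, speedup: float) -> float:
    """Wall-clock days needed once a job runs `speedup` times as fast."""
    return baseline_days / speedup


def slowdown_orders_of_magnitude(wall_clock_s: float, simulated_s: float) -> float:
    """How many powers of ten slower the simulation runs than the real process."""
    return math.log10(wall_clock_s / simulated_s)


print(f"{days_after_speedup(365, SPEEDUP_VS_FRONTIER):.1f} days")
print(f"{slowdown_orders_of_magnitude(WALL_CLOCK_STEP_S, SIMULATED_STEP_S):.0f} orders of magnitude")
```

Under that assumption, a year of Frontier time shrinks to about two days, and the simulation runs about nine orders of magnitude slower than the physical process, matching the figures quoted above.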
Ilya Sharapov, principal engineer at Cerebras, says the company is looking forward to extending applications of its wafer-scale engine to a larger class of problems, including molecular dynamics simulations of biological processes and simulations of airflow around cars or aircraft.

    Downsizing Large Language Models

    As large language models (LLMs) are becoming more popular, the energy costs of using them are starting to overshadow the training costs, potentially by as much as a factor of ten in some estimates. “Inference is the primary workload of AI today because everyone is using ChatGPT,” says James Wang, director of product marketing at Cerebras, “and it’s very expensive to run especially at scale.” One way to reduce the energy cost (and increase the speed) of inference is through sparsity: essentially, harnessing the power of zeros. LLMs are made up of huge numbers of parameters. The open-source Llama model used by Cerebras, for example, has 7 billion parameters. During inference, each of those parameters is used to crunch through the input data and spit out the output. If, however, a significant fraction of those parameters are zeros, they can be skipped during the calculation, saving both time and energy. The problem is that skipping specific parameters is difficult to do on a GPU. Reading from memory on a GPU is relatively slow, because GPUs are designed to read memory in chunks, which means taking in groups of parameters at a time. This doesn’t allow GPUs to skip zeros that are randomly interspersed in the parameter set. Cerebras CEO Feldman offered another analogy: “It’s equivalent to a shipper, only wanting to move stuff on pallets because they don’t want to examine each box. Memory bandwidth is the ability to examine each box to make sure it’s not empty. 
If it’s empty, set it aside and then not move it.” Some GPUs are equipped for a particular kind of sparsity, called 2:4, where exactly two out of every four consecutively stored parameters are zeros. State-of-the-art GPUs have terabytes per second of memory bandwidth. The memory bandwidth of Cerebras’ WSE-2 is more than one thousand times as high, at 20 petabytes per second. This allows for harnessing unstructured sparsity, meaning the researchers can zero out parameters as needed, wherever in the model they happen to be, and check each one on the fly during a computation. “Our hardware is built right from day one to support unstructured sparsity,” Wang says. Even with the appropriate hardware, zeroing out many of the model’s parameters results in a worse model. But the joint team from Neural Magic and Cerebras figured out a way to recover the full accuracy of the original model. After slashing 70 percent of the parameters to zero, the team performed two further phases of training to give the non-zero parameters a chance to compensate for the new zeros. This extra training uses about 7 percent of the original training energy, and the companies found that they recover full model accuracy with this training. The smaller model takes one-third of the time and energy during inference as the original, full model. “What makes these novel applications possible in our hardware,” Sharapov says, “is that there’s a million cores in a very tight package, meaning that the cores have very low latency, high bandwidth interactions between them.”
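
To make the distinction between 2:4 and unstructured sparsity concrete, here is a minimal pure-Python sketch. It is not Cerebras' or Neural Magic's method: the article does not say how parameters were chosen for zeroing, so pruning by smallest magnitude is an assumption, and the names `is_2_4_sparse`, `prune_unstructured`, and `sparse_dot` are hypothetical. The 2:4 check follows the article's wording of exactly two zeros in every group of four.

```python
def is_2_4_sparse(weights: list[float]) -> bool:
    """True if every consecutive group of four weights holds exactly two zeros."""
    if len(weights) % 4 != 0:
        return False
    return all(
        sum(1 for w in weights[i:i + 4] if w == 0.0) == 2
        for i in range(0, len(weights), 4)
    )


def prune_unstructured(weights: list[float], fraction: float) -> list[float]:
    """Zero out `fraction` of the weights, wherever they sit in the list.

    ASSUMPTION: the smallest-magnitude weights are the ones zeroed.
    """
    n_zero = round(len(weights) * fraction)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in by_magnitude[:n_zero]:
        pruned[i] = 0.0
    return pruned


def sparse_dot(weights: list[float], inputs: list[float]) -> tuple[float, int]:
    """Dot product that checks each weight and skips zeros; also counts multiplies."""
    total, multiplies = 0.0, 0
    for w, x in zip(weights, inputs):
        if w != 0.0:
            total += w * x
            multiplies += 1
    return total, multiplies


weights = [0.1, -5.0, 0.2, 3.0, -0.05, 4.0, 0.3, -2.0, 1.0, 0.01]
pruned = prune_unstructured(weights, 0.7)
print(pruned)
print(sparse_dot(pruned, [1.0] * len(pruned)))
print(is_2_4_sparse([0.0, 1.5, 0.0, -2.0, 3.0, 0.0, 0.0, 1.0]))
```

In this toy case, zeroing 70 percent of ten weights leaves three multiplications instead of ten. The zeros land wherever the small weights happen to be, which is the pattern the article says GPUs reading memory in chunks cannot skip, and a per-weight check like the one in `sparse_dot` is what high memory bandwidth makes affordable.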

  • Is the Future of Moore’s Law in a Particle Accelerator?
    by John Boyd on 10. June 2024. at 13:00

    As Intel, Samsung, TSMC, and Japan’s upcoming advanced foundry Rapidus each make their separate preparations to cram more and more transistors into every square millimeter of silicon, one thing they all have in common is that the extreme ultraviolet (EUV) lithography technology underpinning their efforts is extremely complex, extremely expensive, and extremely costly to operate. A prime reason is that the source of this system’s 13.5-nanometer light is the precise and costly process of blasting flying droplets of molten tin with the most powerful commercial lasers on the planet. But an unconventional alternative is in the works. A group of researchers at the High Energy Accelerator Research Organization, known as KEK, in Tsukuba, Japan, is betting EUV lithography might be cheaper, quicker, and more efficient if it harnesses the power of a particle accelerator. Even before the first EUV machines had been installed in fabs, researchers saw possibilities for EUV lithography using a powerful light source called a free-electron laser (FEL), which is generated by a particle accelerator. However, not just any particle accelerator will do, say the scientists at KEK. They claim the best candidate for EUV lithography incorporates the particle-accelerator version of regenerative braking. Known as an energy recovery linear accelerator, it could enable a free-electron laser to economically generate tens of kilowatts of EUV power. This is more than enough to drive not one but many next-generation lithography machines simultaneously, pushing down the cost of advanced chipmaking. “The FEL beam’s extreme power, its narrow spectral width, and other features make it suitable as an application for future lithography,” Norio Nakamura, researcher in advanced light sources at KEK, told me on a visit to the facility.

    Linacs Vs. Laser-Produced Plasma

    Today’s EUV systems are made by a single manufacturer, ASML, headquartered in Veldhoven, Netherlands. 
When ASML introduced the first generation of these US $100-million-plus precision machines in 2016, the industry was desperate for them. Chipmakers had been getting by with workaround after workaround for the then most advanced system, lithography using 193-nm light. Moving to a much shorter, 13.5-nm wavelength was a revolution that would collapse the number of steps needed in chipmaking and allow Moore’s Law to continue well into the next decade. The chief cause of the technology’s continual delays was a light source that was too dim. The technology that ultimately delivered a bright enough source of EUV light is called laser-produced plasma, or EUV-LPP. It employs a carbon dioxide laser to blast molten droplets of tin into plasma thousands of times per second. The plasma emits a spectrum of photonic energy, and specialized optics then capture the necessary 13.5-nm wavelength from the spectrum and guide it through a sequence of mirrors. Subsequently, the EUV light is reflected off a patterned mask and then projected onto a silicon wafer.

    [Figure: The experimental compact energy recovery linac at KEK uses most of the energy from electrons on a return journey to speed up a new set of electrons. Credit: KEK]

    It all adds up to a highly complex process. And although it starts off with kilowatt-consuming lasers, the amount of EUV light that is reflected onto the wafer is just several watts. The dimmer the light, the longer it takes to reliably expose a pattern on the silicon. Without enough photons carrying the pattern, EUV would be uneconomically slow. And pushing too hard for speed can lead to costly errors. When the machines were first introduced, the power level was enough to process about 100 wafers per hour. Since then, ASML has managed to steadily hike the output to about 200 wafers per hour for the present series of machines. ASML’s current light sources are rated at 500 watts. But for the even finer patterning needed in the future, Nakamura says it could take 1 kilowatt or more. 
ASML says it has a road map to develop a 1,000-W light source. But it could be difficult to achieve, says Nakamura, who formerly led the beam dynamics and magnet group at KEK and came out of retirement to work on the EUV project. Difficult but not necessarily impossible. Doubling the source power is “very challenging,” agrees Ahmed Hassanein, who leads the Center for Materials Under Extreme Environment, at Purdue University, in Indiana. But he points out that ASML has achieved similarly difficult targets in the past using an integrated approach of improving and optimizing the light source and other components, and he isn’t ruling out a repeat.

    [Figure: In a free-electron laser, accelerated electrons are subject to alternating magnetic fields, causing them to undulate and emit electromagnetic radiation. The radiation bunches up the electrons, leading to their amplifying only a specific wavelength, creating a laser beam. Credit: Chris Philpot]

    But brightness isn’t the only issue ASML faces with laser-produced plasma sources. “There are a number of challenging issues in upgrading to higher EUV power,” says Hassanein. He rattles off several, including “contamination, wavelength purity, and the performance of the mirror-collection system.” High operating costs are another problem. These systems consume some 600 liters of hydrogen gas per minute, most of which goes into keeping tin and other contaminants from getting onto the optics and wafers. (Recycling, however, could reduce this figure.) But ultimately, operating costs come down to electricity consumption. Stephen Benson, recently retired senior research scientist at the Thomas Jefferson National Accelerator Facility, in Virginia, estimates that the wall-plug efficiency of the whole EUV-LPP system might be less than 0.1 percent. Free-electron lasers, like the one KEK is developing, could be as much as 10 to 100 times as efficient, he says. 
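
To see what these efficiency estimates could mean in electricity terms, here is a rough Python sketch. It combines two figures from the article that the article itself does not combine: the 500-watt rating of ASML's current light sources and Benson's estimate of a wall-plug efficiency of at most 0.1 percent. Pairing them is an assumption made for illustration, and `wall_power_kw` is a hypothetical helper name.

```python
def wall_power_kw(euv_output_w: float, wall_plug_efficiency: float) -> float:
    """Electrical input (kW) needed to deliver `euv_output_w` watts of EUV light."""
    if not 0.0 < wall_plug_efficiency <= 1.0:
        raise ValueError("wall_plug_efficiency must be in (0, 1]")
    return euv_output_w / wall_plug_efficiency / 1000.0


EUV_OUTPUT_W = 500.0      # rating of ASML's current light sources, per the article
LPP_EFFICIENCY = 0.001    # ASSUMPTION: take the 0.1 percent upper bound at face value

for gain in (1, 10, 100):  # 1 = laser-produced plasma; 10 and 100 = the FEL range quoted
    power = wall_power_kw(EUV_OUTPUT_W, LPP_EFFICIENCY * gain)
    print(f"{gain:>3}x efficiency: about {power:.0f} kW from the wall")
```

Under these assumptions the same 500 watts of EUV light would take about 500 kW at 0.1 percent efficiency, and roughly 50 kW to 5 kW across the 10 to 100 times range. Because Benson's figure is an upper bound on efficiency, the laser-produced plasma number is better read as a lower bound on power draw.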
The Energy Recovery Linear Accelerator

    The system KEK is developing generates light by boosting electrons to relativistic speeds and then deviating their motion in a particular way. The process starts, Nakamura explains, when an electron gun injects a beam of electrons into a meters-long cryogenically cooled tube. Inside this tube, superconductors deliver radio-frequency (RF) signals that drive the electrons along faster and faster. The electrons then make a 180-degree turn and enter a structure called an undulator, a series of oppositely oriented magnets. (The KEK system currently has two.) The undulators force the speeding electrons to follow a sinusoidal path, and this motion causes the electrons to emit light.

    [Figure: In a linear accelerator, injected electrons gain energy from an RF field. Ordinarily, the electrons would then enter a free-electron laser and be immediately disposed of in a beam dump. But in an energy recovery linear accelerator (ERL), the electrons circle back into the RF field and lend their energy to newly injected electrons before exiting to a beam dump.]

    What happens next is a phenomenon called self-amplified spontaneous emission, or SASE. The light interacts with the electrons, slowing some and speeding up others, so they gather into “microbunches,” peaks in density that occur periodically along the undulator’s path. The now-structured electron beam amplifies only the light that’s in phase with the period of these microbunches, generating a coherent beam of laser light. It’s at this point that KEK’s compact energy recovery linac (cERL) diverges from lasers driven by conventional linear accelerators. Ordinarily, the spent beam of electrons is disposed of by diverting the particles into what is called a beam dump. But in the cERL, the electrons first loop back into the RF accelerator. This beam is now in the opposite phase to newly injected electrons that are just starting their journey. 
The result is that the spent electrons transfer much of their energy to the new beam, boosting its energy. Once the original electrons have had some of their energy drained away like this, they are diverted into a beam dump. “The acceleration energy in the linac is recovered, and the dumped beam power is drastically reduced compared to [that of] an ordinary linac,” Nakamura explains to me while scientists in another room operate the laser. Reusing the electrons’ energy means that for the same amount of electricity the system sends more current through the accelerator and can fire the laser more frequently, he says. Other experts agree. The energy-recovery linear accelerator’s improved efficiency can lower costs, “which is a major concern of using EUV laser-produced plasma,” says Hassanein.

    The Energy Recovery Linac for EUV

    The KEK compact energy-recovery linear accelerator was initially constructed between 2011 and 2013 with the aim of demonstrating its potential as a synchrotron radiation source for researchers working for the institution’s physics and materials-science divisions. But researchers were dissatisfied with the planned system, which had a lower performance target than could be achieved by some storage ring-based synchrotrons, huge circular accelerators that keep a beam of electrons moving with a constant kinetic energy. So, the KEK researchers went in search of a more appropriate application. After talking with Japanese tech companies, including Toshiba, which had a flash memory chip division at the time, the researchers conducted an initial study that confirmed that a kilowatt-class light source was possible with a compact energy-recovery linear accelerator. And so, the EUV free-electron-laser project was born. In 2019 and 2020, the researchers modified the existing experimental accelerator to start the journey to EUV light. 
The system is housed in an all-concrete room to protect researchers from the intense electromagnetic radiation produced during operation. The room is some 60 meters long and 20 meters wide with much of the space taken up by a bewildering tangle of complex equipment, pipes, and cables that snakes along both sides of its length in the form of an elongated racetrack. The accelerator is not yet able to generate EUV wavelengths. With an electron beam energy of 17 megaelectronvolts, the researchers have been able to generate SASE emissions in bursts of 20-micrometer infrared light. Early test results were published in the Japanese Journal of Applied Physics in April 2023. The next step, which is underway, is to generate much greater laser power in continuous-wave mode. To be sure, 20 micrometers is a far cry from 13.5 nanometers. And there are already types of particle accelerators that produce synchrotron radiation of even shorter wavelengths than EUV. But lasers based on energy-recovery linear accelerators could generate significantly more EUV power due to their inherent efficiency, the KEK researchers claim. In synchrotron radiation sources, light intensity increases proportionally to the number of injected electrons. By comparison, in free-electron laser systems, light intensity increases roughly with the square of the number of injected electrons, resulting in much more brightness and power. For an energy-recovery linear accelerator to reach the EUV range will require equipment upgrades beyond what KEK currently has room for. So, the researchers are now making the case for constructing a new prototype system that can produce the needed 800 MeV. 
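
The difference between the two scaling laws described above is easy to underestimate, so here is a small Python sketch. It uses idealized power laws, with intensity proportional to N for a synchrotron source and to N squared for a free-electron laser, taken directly from the article's wording ("roughly with the square"). Real devices will deviate from these, and the function name `relative_intensity` is illustrative.

```python
def relative_intensity(electron_factor: float, exponent: int) -> float:
    """Intensity gain when the number of injected electrons grows by `electron_factor`.

    exponent=1 models a synchrotron source (intensity proportional to N);
    exponent=2 models a free-electron laser (roughly proportional to N squared).
    """
    if electron_factor <= 0:
        raise ValueError("electron_factor must be positive")
    return electron_factor ** exponent


for factor in (2, 10, 100):
    synchrotron = relative_intensity(factor, 1)
    fel = relative_intensity(factor, 2)
    print(f"{factor}x electrons: synchrotron {synchrotron:.0f}x, FEL {fel:.0f}x")
```

In this idealized picture, ten times as many electrons gives ten times the light from a synchrotron source but a hundred times from a free-electron laser, which is why sending more current through the accelerator matters so much for the KEK design.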
[Figure: An electron gun injects charge into the compact energy recovery linear accelerator at KEK. Credit: KEK]

    In 2021, before severe inflation affected economies around the globe, the KEK team estimated the construction cost (excluding land) of a new system that delivers 10 kW of EUV and supplies multiple lithography machines at 40 billion yen ($260 million). Annual running costs were judged to be about 4 billion yen. So even taking recent inflation into account, “the estimated costs per exposure tool in our setup are still rather low compared to the estimated costs” for today’s laser-produced plasma source, says Nakamura. There are plenty of technical challenges to work out before such a system can achieve the high levels of performance and stability of operations demanded by semiconductor manufacturers, admits Nakamura. The team will have to develop new versions of key components such as the superconducting cavity, the electron gun, and the undulator. Engineers will also have to develop good procedural techniques to ensure, for instance, that the electron beam does not degrade or falter during operations. And to ensure their approach is cost effective enough to grab the attention of chipmakers, the researchers will need to create a system that can reliably transport more than 1 kW of EUV power simultaneously to multiple lithography machines. The researchers already have a conceptual design for an arrangement of special mirrors that would convey the EUV light to multiple exposure tools without significant loss of power or damage to the mirrors.

    Other EUV Possibilities

    It’s too early in the development of EUV free-electron lasers for rapidly expanding chipmakers to pay it much attention. But the KEK team is not alone in chasing the technology. A venture-backed startup, xLight, in Palo Alto, Calif., is also among those chasing it. 
The company, which is packed with particle-accelerator veterans from the Stanford Linear Accelerator and elsewhere, recently inked an R&D deal with Fermi National Accelerator Laboratory, in Illinois, to develop superconducting cavities and cryomodule technology. Attempts to contact xLight went unanswered, but in January, the company took part in the 8th Workshop EUV-FEL in Tokyo, and former CEO Erik Hosler gave a presentation on the technology. Significantly, ASML considered turning to particle accelerators a decade ago and again more recently when it compared the progress of free-electron laser technology to the laser-produced plasma road map. But company executives decided LPP presented fewer risks. And, indeed, it is a risky road. Independent views on KEK’s project emphasize that reliability and funding will be the biggest challenges the researchers face going forward. “The R&D road map will involve numerous demanding stages in order to develop a reliable, mature system,” says Hassanein. “This will require serious investment and take considerable time.” “The machine design must be extremely robust, with redundancy built in,” adds retired research scientist Benson. “The design must also ensure that components are not damaged from radiation or laser light.” And this must be accomplished “without compromising performance, which must be good enough to ensure decent wall-plug efficiency.” More importantly, Benson warns that without a forthcoming commitment to invest in the technology, “development of EUV-FELs might not come in time to help the semiconductor industry.”

  • The Mythical Non-Roboticist
    by Benjie Holson on 9. June 2024. at 13:00

    The original version of this post by Benjie Holson was published on Substack here, and includes Benjie’s original comics as part of his series on robots and startups. I worked on this idea for months before I decided it was a mistake. The second time I heard someone mention it, I thought, “That’s strange, these two groups had the same idea. Maybe I should tell them it didn’t work for us.” The third and fourth time I rolled my eyes and ignored it. The fifth time I heard about a group struggling with this mistake, I decided it was worth a blog post all on its own. I call this idea “The Mythical Non-Roboticist.”

    The Mistake

    The idea goes something like this: Programming robots is hard. And there are some people with really arcane skills and PhDs who are really expensive and seem to be required for some reason. Wouldn’t it be nice if we could do robotics without them? 1 What if everyone could do robotics? That would be great, right? We should make a software framework so that non-roboticists can program robots. This idea is so close to a correct idea that it’s hard to tell why it doesn’t work out. On the surface, it’s not wrong: All else being equal, it would be good if programming robots was more accessible. The problem is that we don’t have a good recipe for making working robots. So we don’t know how to make that recipe easier to follow. In order to make things simple, people end up removing things that folks might need, because no one knows for sure what’s absolutely required. It’s like saying you want to invent an invisibility cloak and want to be able to make it from materials you can buy from Home Depot. Sure, that would be nice, but if you invented an invisibility cloak that required some mercury and neodymium to manufacture would you toss the recipe? In robotics, this mistake is based on a very true and very real observation: Programming robots is super hard. Famously hard. It would be super great if programming robots was easier. 
The issue is this: Programming robots has two different kinds of hard parts.

    Robots are hard because the world is complicated

    The first kind of hard part is that robots deal with the real world, imperfectly sensed and imperfectly actuated. Global mutable state is bad programming style because it’s really hard to deal with, but to robot software the entire physical world is global mutable state, and you only get to unreliably observe it and hope your actions approximate what you wanted to achieve. Getting robotics to work at all is often at the very limit of what a person can reason about, and requires the flexibility to employ whatever heuristic might work for your special problem. This is the intrinsic complexity of the problem: Robots live in complex worlds, and for every working solution there are millions of solutions that don’t work, and finding the right one is hard, and often very dependent on the task, robot, sensors, and environment. Folks look at that challenge, see that it is super hard, and decide that, sure, maybe some fancy roboticist could solve it in one particular scenario, but what about “normal” people? “We should make this possible for non-roboticists” they say. I call these users “Mythical Non-Roboticists” because once they are programming a robot, I feel they become roboticists. Isn’t anyone programming a robot for a purpose a roboticist? Stop gatekeeping, people.

    Don’t design for amorphous groups

    I also call them “mythical” because usually the “non-roboticist” implied is a vague, amorphous group. Don’t design for amorphous groups. If you can’t name three real people (that you have talked to) that your API is for, then you are designing for an amorphous group and only amorphous people will like your API. And with this hazy group of users in mind (and seeing how difficult everything is), folks think, “Surely we could make this easier for everyone else by papering over these things with simple APIs?” No. No you can’t. 
Stop it. You can’t paper over intrinsic complexity with simple APIs because if your APIs are simple they can’t cover the complexity of the problem. You will inevitably end up with a beautiful-looking API, with calls like “grasp_object” and “approach_person” which demo nicely in a hackathon kickoff but last about 15 minutes of someone actually trying to get some work done. It will turn out that, for their particular application, “grasp_object()” makes 3 or 4 wrong assumptions about “grasp” and “object” and doesn’t work for them at all. Your users are just as smart as you This is made worse by the pervasive assumption that these people are less savvy (read: less intelligent) than the creators of this magical framework. 2 That feeling of superiority will cause the designers to cling desperately to their beautiful, simple “grasp_object()”s and resist adding the knobs and arguments needed to cover more use cases and allow the users to customize what they get. Ironically, this foists a bunch of complexity onto the poor users of the API who have to come up with clever workarounds to get it to work at all. The sad, salty, bitter icing on this cake-of-frustration is that, even if done really well, the goal of this kind of framework would be to expand the group of people who can do the work. And to achieve that, it would sacrifice some performance you can only get by super-specializing your solution to your problem. If we lived in a world where expert roboticists could program robots that worked really well, but there was so much demand for robots that there just wasn’t enough time for those folks to do all the programming, this would be a great solution.
3 The obvious truth is that (outside of really constrained environments like manufacturing cells) even the very best collection of real, bona fide, card-carrying roboticists working at the best of their ability struggles to get close to a level of performance that makes the robots commercially viable, even with long timelines and mountains of funding. 4 We don’t have any headroom to sacrifice power and effectiveness for ease. What problem are we solving? So should we give up making it easier? Is robotic development available only to a small group of elites with fancy PhDs? 5 No to both! I have worked with tons of undergrad interns who have been completely able to do robotics.6 I myself am mostly self-taught in robot programming.7 While there is a lot of intrinsic complexity in making robots work, I don’t think there is any more than, say, video game development. In robotics, like in all things, experience helps, some things are teachable, and as you master many areas you can see things start to connect together. These skills are not magical or unique to robotics. We are not as special as we like to think we are. But what about making programming robots easier? Remember way back at the beginning of the post when I said that there were two different kinds of hard parts? One is the intrinsic complexity of the problem, and that one will be hard no matter what. 8 But the second is the incidental complexity, or as I like to call it, the stupid BS complexity. Stupid BS Complexity Robots are asynchronous, distributed, real-time systems with weird hardware. All of that will be hard to configure for stupid BS reasons. Those drivers need to work in the weird flavor of Linux you want for hard real-time for your controls and getting that all set up will be hard for stupid BS reasons. You are abusing Wi-Fi so you can roam seamlessly without interruption but Linux’s Wi-Fi will not want to do that.
Your log files are huge and you have to upload them somewhere so they don’t fill up your robot. You’ll need to integrate with some cloud something or other and deal with its stupid BS. 9 There is a ton of crap to deal with before you even get to the complexity of dealing with 3D rotation, moving reference frames, time synchronization, messaging protocols. Those things have intrinsic complexity (you have to think about when something was observed and how to reason about it as other things have moved) and stupid BS complexity (There’s a weird bug because someone multiplied two transform matrices in the wrong order and now you’re getting an error message that deep in some protocol a quaternion is not normalized. WTF does that mean?) 10 One of the biggest challenges of robot programming is wading through the sea of stupid BS you need to wrangle in order to start working on your interesting and challenging robotics problem. So a simple heuristic to make good APIs is: Design your APIs for someone as smart as you, but less tolerant of stupid BS. That feels universal enough that I’m tempted to call it Holson’s Law of Tolerable API Design. When you are using tools you’ve made, you know them well enough to know the rough edges and how to avoid them. But rough edges are things that have to be held in a programmer’s memory while they are using your system. If you insist on making a robotics framework 11, you should strive to make it as powerful as you can with the least amount of stupid BS. Eradicate incidental complexity everywhere you can. You want to make APIs that have maximum flexibility but good defaults. I like Python’s default-argument syntax for this because it means you can write APIs whose simplest call relies on the defaults while more demanding callers override them. It is possible to have easy things be simple and allow complex things. And please, please, please don’t make condescending APIs. Thanks! 1. Ironically it is very often the expensive arcane-knowledge-having PhDs who are proposing this. 2. 
Why is it always a framework? 3. The exception that might prove the rule is things like traditional manufacturing-cell automation. That is a place where the solutions exist, but the limit to expanding is set up cost. I’m not an expert in this domain, but I’d worry that physical installation and safety compliance might still dwarf the software programming cost, though. 4. As I well know from personal experience. 5. Or non-fancy PhDs for that matter? 6. I suspect that many bright highschoolers would also be able to do the work. Though, as Google tends not to hire them, I don’t have good examples. 7. My schooling was in Mechanical Engineering and I never got a PhD, though my ME classwork did include some programming fundamentals. 8. Unless we create effective general purpose AI. It feels weird that I have to add that caveat, but the possibility that it’s actually coming for robotics in my lifetime feels much more possible than it did two years ago. 9. And if you are unlucky, its API was designed by someone who thought they were smarter than their customers. 10. This particular flavor of BS complexity is why I wrote If you do robotics, you should check it out. 11. Which, judging by the trail of dead robot-framework-companies, is a fraught thing to do.
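The post's advice about "maximum flexibility but good defaults" can be illustrated with a small sketch. This is not the author's code: `plan_grasp`, `GraspPlan`, and every parameter name and default value below are hypothetical, chosen only to show how Python default arguments let the short call stay short while leaving each knob exposed.

```python
from dataclasses import dataclass


@dataclass
class GraspPlan:
    """Hypothetical result type: records the settings a grasp would use."""
    object_id: str
    approach: str
    grip_force_n: float
    pregrasp_offset_m: float
    max_attempts: int


def plan_grasp(
    object_id: str,
    approach: str = "top",
    grip_force_n: float = 10.0,
    pregrasp_offset_m: float = 0.05,
    max_attempts: int = 3,
) -> GraspPlan:
    """Plan a grasp with defaults (all assumed values) that any caller can override."""
    if approach not in ("top", "side"):
        raise ValueError(f"unknown approach: {approach!r}")
    if grip_force_n <= 0:
        raise ValueError("grip_force_n must be positive")
    return GraspPlan(object_id, approach, grip_force_n, pregrasp_offset_m, max_attempts)


# Easy things stay simple: one argument, defaults for the rest.
simple = plan_grasp("mug")

# Complex things stay possible: override only the knobs this task needs.
delicate = plan_grasp("egg", approach="side", grip_force_n=1.5)
```

The design choice is that no assumption about "grasp" or "object" is hidden inside the function; each one is a named argument with a default, so a user whose application breaks an assumption can change it without a workaround.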

  • Video Friday: 1X Robots Tidy Up
    by Evan Ackerman on 7. June 2024. at 16:00

    Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion. RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS ICRA@40: 23–26 September 2024, ROTTERDAM, NETHERLANDS IROS 2024: 14–18 October 2024, ABU DHABI, UNITED ARAB EMIRATES ICSR 2024: 23–26 October 2024, ODENSE, DENMARK Cybathlon 2024: 25–27 October 2024, ZURICH Enjoy today’s videos! In this video, you see the start of 1X’s development of an advanced AI system that chains simple tasks into complex actions using voice commands, allowing seamless multi-robot control and remote operation. By starting with single-task models, we ensure smooth transitions to more powerful unified models, ultimately aiming to automate high-level actions using AI. This video does not contain teleoperation, computer graphics, cuts, video speedups, or scripted trajectory playback. It’s all controlled via neural networks. [ 1X ] As the old adage goes, one cannot claim to be a true man without a visit to the Great Wall of China. XBot-L, a full-sized humanoid robot developed by Robot Era, recently acquitted itself well in a walk along sections of the Great Wall. [ Robot Era ] The paper presents a novel rotary wing platform, that is capable of folding and expanding its wings during flight. Our source of inspiration came from birds’ ability to fold their wings to navigate through small spaces and dive. The design of the rotorcraft is based on the monocopter platform, which is inspired by the flight of Samara seeds. [ AirLab ] We present a variable stiffness robotic skin (VSRS), a concept that integrates stiffness-changing capabilities, sensing, and actuation into a single, thin modular robot design. Reconfiguring, reconnecting, and reshaping VSRSs allows them to achieve new functions both on and in the absence of a host body. 
[ Yale Faboratory ] Heimdall is a new rover design for the 2024 University Rover Challenge (URC). This video shows highlights of Heimdall’s trip during the four missions at URC 2024. Heimdall features a split body design with whegs (wheel legs), and a drill for sub-surface sample collection. It also has the ability to manipulate a variety of objects, collect surface samples, and perform onboard spectrometry and chemical tests. [ WVU ] I think this may be the first time I’ve seen an autonomous robot using a train? This one is delivering lunch boxes! [ JSME ] The AI system used identifies and separates red apples from green apples, after which a robotic arm picks up the red apples identified with a qb SoftHand Industry and gently places them in a basket. My favorite part is the magnetic apple stem system. [ QB Robotics ] DexNex (v0, June 2024) is an anthropomorphic teleoperation testbed for dexterous manipulation at the Center for Robotics and Biosystems at Northwestern University. DexNex recreates human upper-limb functionality through a near 1-to-1 mapping between Operator movements and Avatar actions. Motion of the Operator’s arms, hands, fingers, and head are fed forward to the Avatar, while fingertip pressures, finger forces, and camera images are fed back to the Operator. DexNex aims to minimize the latency of each subsystem to provide a seamless, immersive, and responsive user experience. Future research includes gaining a better understanding of the criticality of haptic and vision feedback for different manipulation tasks; providing arm-level grounded force feedback; and using machine learning to transfer dexterous skills from the human to the robot. [ Northwestern ] Sometimes the best path isn’t the smoothest or straightest surface, it’s the path that’s actually meant to be a path. 
[ RaiLab ] Fulfilling a school requirement by working in a Romanian locomotive factory one week each month, Daniela Rus learned to operate “machines that help us make things.” Appreciation for the practical side of math and science stuck with Daniela, who is now Director of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). [ MIT ] For AI to achieve its full potential, non-experts need to be let into the development process, says Rumman Chowdhury, CEO and cofounder of Humane Intelligence. She tells the story of farmers fighting for the right to repair their own AI-powered tractors (which some manufacturers actually made illegal), proposing everyone should have the ability to report issues, patch updates or even retrain AI technologies for their specific uses. [ TED ]

  • Errors in Navigational Models Could Have an Easy Answer
    by Rahul Rao on 7. June 2024. at 11:00

    Just as early mariners used simple compasses to chart courses across the sea, today’s ships, planes, satellites, and smartphones can rely on Earth’s magnetic field to find their bearings. The difference is that today’s rather more sophisticated compasses have the aid of complex models, like the commonly used World Magnetic Model (WMM), that try to capture the multifaceted processes that create Earth’s magnetosphere. A compass can rely on the WMM or similar models to convert a needle pointing to magnetic north to a heading with respect to true north. (The two norths differ by ever-changing angles.) These models are not perfect: There are differences between the magnetosphere that they predict and the magnetosphere that satellites observe. Scientists have traditionally ascribed these differences to space currents that flow through the magnetic field high in Earth’s upper atmosphere. But new research complicates the picture, suggesting that the differences are the result of observational biases, incomplete models, or both. For craft that require sensitive navigation, particularly around Earth’s poles, any of these complications pose a problem. And those problems stand to grow as polar ice melts around the North Pole, opening up potential new shipping routes. Earth’s magnetic field is multifaceted and complex, but models like the WMM can project it out a few years at a time. The WMM’s current edition, released in December 2019, contains estimates of Earth’s magnetic field from the start of 2020 to the end of 2024. (The next version, covering 2025 through 2029, is scheduled for release in December of this year.) “Compasses need to account for space currents already, but this adds more complication and sources of noise that have to be dealt with.” —Mark Moldwin, University of Michigan These models do not always account for space currents, which are often pushed around by extraterrestrial forces like the solar wind. 
But if space currents are responsible for the discrepancies between models and observations, scientists could identify them by simply finding the differences, which they call “residuals.” Moreover, there would then be little reason for one of Earth’s hemispheres to display more residuals than the other—except that’s what existing models predict. But the new study’s authors, space physicists Yining Shi and Mark Moldwin from the University of Michigan, had been among a number of researchers who had spotted an imbalance in residuals. More residuals seemed to emerge from the magnetic woodwork, so to speak, in the Southern Hemisphere than in the Northern Hemisphere. “We wanted to take a closer look at them,” Moldwin said. Shi and Moldwin compared estimates between 2014 and 2020 from another Earth magnetic field model, IGRF-13, with observations from the European Space Agency’s Swarm mission, a trio of satellites that have continually measured Earth’s magnetic field since their 2014 launch. When they focused on residuals over that time period, they did indeed find about 12 percent more major residuals in the Southern Hemisphere than in the Northern. All of these large residuals were found in the polar regions. Many were concentrated at latitudes of 70 degrees north and south, where scientists expect to find space currents. But another spate of residuals was concentrated closer to Earth’s geographic poles, about 80 degrees north and south, where they have no obvious geophysical explanation. Moreover, the distributions of these poles differed—matching the fact that Earth’s geographic poles map to different magnetic coordinates. This second peak in particular led the researchers to consider alternative explanations. It is possible, for instance, that IGRF-13 simply does not capture all of the factors driving Earth’s magnetosphere around the poles. But another cause could be the satellites themselves. 
Shi and Moldwin say that, because Swarm satellites reside in orbits that cross the poles, Earth’s northern and southern polar regions are overrepresented in their magnetic measurements. “Compasses need to account for space currents already, but this adds more complication and sources of noise that have to be dealt with,” Moldwin said. Now, Shi is examining these residuals more closely to pick apart the causes of the residuals—which ones have actual geophysical explanations and which are the result of statistical errors. Shi and Moldwin published their work on 6 May in Journal of Geophysical Research: Space Physics.
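The residual comparison described above (observation minus model, then a count of the large residuals in each hemisphere) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the 100-nanotesla threshold, and the sample values are made up and are not Swarm measurements, IGRF-13 output, or the authors' method.

```python
def count_large_residuals(samples, threshold_nt):
    """Count large |observed - model| residuals in each hemisphere.

    samples: iterable of (latitude_deg, observed_nt, model_nt) tuples.
    threshold_nt: hypothetical cutoff for calling a residual "large".
    """
    counts = {"north": 0, "south": 0}
    for latitude_deg, observed_nt, model_nt in samples:
        residual_nt = observed_nt - model_nt
        if abs(residual_nt) >= threshold_nt:
            hemisphere = "north" if latitude_deg >= 0 else "south"
            counts[hemisphere] += 1
    return counts


# Synthetic, made-up field magnitudes in nanotesla (illustration only).
samples = [
    (70.0, 56_150.0, 56_000.0),   # residual +150
    (80.0, 57_020.0, 57_000.0),   # residual +20
    (-70.0, 61_880.0, 62_000.0),  # residual -120
    (-80.0, 60_300.0, 60_100.0),  # residual +200
    (-45.0, 50_010.0, 50_000.0),  # residual +10
]

counts = count_large_residuals(samples, threshold_nt=100.0)
```

A count like this says nothing about why the residuals differ. As the article notes, polar-orbiting satellites sample the poles more densely, so any real analysis would also have to account for how the observations are distributed.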

  • This Japanese Aircraft Became a 5G Base Station
    by Tim Hornyak on 6. June 2024. at 14:51

    Skies over Tokyo are thick with air traffic these days amid an influx of international tourists. But one plane recently helped revive the dream of airborne Internet access for all. Researchers in Japan announced on 28 May that they have successfully tested 5G communications equipment in the 38 gigahertz band from an altitude of 4 kilometers. The experiment was aimed at developing an aerial relay backhaul with millimeter-wave band links between ground stations and a simulated High-Altitude Platform Station (HAPS), a radio station aboard an uncrewed aircraft that stays aloft in the stratosphere for extended periods of time. A Cessna flying out of Chofu Airfield in western Tokyo was outfitted with a 38 GHz 5G base station and core network device, and three ground stations were equipped with lens antennas with automatic tracking. With the Cessna as a relay station, the setup enabled communication between one ground station connected to the 5G terrestrial network and a terrestrial base station connected to a user terminal, according to a consortium of Japanese companies and the National Institute of Information and Communications Technology. “We developed technology that enables communication using 5G [New Radio] by correctly directing 38 GHz beams toward three ground stations while adapting to the flight attitude, speed, direction, position, altitude, etc. during aircraft rotation,” said Shinichi Tanaka, a manager in broadcaster SKY Perfect JSAT’s Space Business Division. “We confirmed that the onboard system, designed for the stratosphere, has adequate communication and tracking performance even under the flight speed and attitude fluctuations of a Cessna aircraft, which are more severe than those of HAPS.” The sharpest beam width of the ground station antenna is 0.8 degrees, and the trial demonstrated a tracking method that always captures the Cessna in this angular range, Tanaka added. 
A Cessna [top left] carried a 38 GHz antenna [top right] during a flight, functioning as a 5G base station for receivers on the ground [bottom right]. The plane was able to connect to multiple ground stations at once [illustration, bottom left]. Credit: NTT Docomo. Millimeter wave bands, such as the 38 GHz band, have the highest data capacity for 5G and are suited for crowded venues such as stadiums and shopping centers. When used outdoors, however, the signals can be attenuated by rain and other moisture in the atmosphere. To counter this, the consortium successfully tested an algorithm that automatically switches between multiple ground stations to compensate for moisture-weakened signals. Unlike Google’s failed Loon effort, which focused on providing direct communication to user terminals, the HAPS trial is aimed at creating backhaul lines for base stations. Led by Japan’s Ministry of Internal Affairs and Communications, the experiment is designed to deliver high-speed, high-capacity communications for both the development of 5G and 6G networks and emergency response. The latter is critical in disaster-prone Japan—in January, communication lines around the Noto Peninsula on the Sea of Japan were severed following a magnitude-7 earthquake that caused over 1,500 casualties. “This is the world’s first successful 5G communication experiment via the sky using the Q-band frequency,” said Hinata Kohara, a researcher with mobile carrier NTT Docomo’s 6G Network Innovation Department. “In addition, the use of 5G communication base stations and core network equipment on the aircraft for communication among multiple ground stations enables flexible and fast route switching of the ground [gateway] station for a feeder link, and is robust against propagation characteristics such as rainfall. 
Another key feature is the use of a full digital beamforming method for beam control, which uses multiple independent beams to improve frequency utilization efficiency.” Doppler shift compensation was a challenge in the experiment, Kohara said, adding that the researchers will conduct further tests to find a solution with the aim of commercializing a HAPS service in 2026. Aside from SKY Perfect JSAT and NTT Docomo, the consortium includes Panasonic Holdings, known for its electronics equipment. The HAPS push comes as NTT Docomo announced it has led another consortium in a $100 million investment in Airbus’ AALTO HAPS, operator of the Zephyr fixed-wing uncrewed aerial vehicle. The solar-powered wing can be used for 5G direct-to-device communications or Earth observation, and has set records including 64 days of stratospheric flight. According to Airbus, it has a reach of “up to 250 terrestrial towers in difficult mountainous terrain.” Docomo said the investment is aimed at commercializing Zephyr services in Japan, including coverage of rural areas and disaster zones, and around the world in 2026.
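The article does not describe how the consortium's ground-station switching works, so the following is only a generic sketch of the idea: stay on the current gateway while its link is healthy, and move to a stronger one when rain weakens it. The function name, the use of signal-to-noise ratio as the link metric, and the threshold and hysteresis values are all assumptions made for illustration.

```python
def choose_gateway(current, snr_db_by_station, min_snr_db=10.0, hysteresis_db=3.0):
    """Pick the ground station to use for the feeder link.

    Stay on the current station while its link is healthy. Otherwise move
    to the strongest station, but only if it is better by `hysteresis_db`,
    so the link does not flap between two similar stations.
    """
    best = max(snr_db_by_station, key=snr_db_by_station.get)
    current_snr = snr_db_by_station.get(current, float("-inf"))
    if current_snr >= min_snr_db:
        return current
    if snr_db_by_station[best] >= current_snr + hysteresis_db:
        return best
    return current


# Hypothetical readings for three ground stations, in decibels.
clear_sky = {"A": 18.0, "B": 15.0, "C": 12.0}
rain_over_a = {"A": 6.0, "B": 15.0, "C": 12.0}
marginal = {"A": 9.0, "B": 10.5, "C": 8.0}

stay = choose_gateway("A", clear_sky)      # link A is healthy, keep it
switch = choose_gateway("A", rain_over_a)  # A is rain-faded, B is much better
hold = choose_gateway("A", marginal)       # B is better, but not by enough
```

The hysteresis margin is a common way to avoid rapid back-and-forth switching when two links are nearly equal; whether the trial used anything similar is not stated in the article.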

  • How Large Language Models Are Changing My Job
    by Glenn Zorpette on 6. June 2024. at 13:59

    Generative artificial intelligence, and large language models in particular, are starting to change how countless technical and creative professionals do their jobs. Programmers, for example, are getting code segments by prompting large language models. And graphic arts software packages such as Adobe Illustrator already have tools built in that let designers conjure illustrations, images, or patterns by describing them. But such conveniences barely hint at the massive, sweeping changes to employment predicted by some analysts. And already, in ways large and small, striking and subtle, the tech world’s notables are grappling with changes, both real and envisioned, wrought by the onset of generative AI. To get a better idea of how some of them view the future of generative AI, IEEE Spectrum asked three luminaries—an academic leader, a regulator, and a semiconductor industry executive—about how generative AI has begun affecting their work. The three, Andrea Goldsmith, Juraj Čorba, and Samuel Naffziger, agreed to speak with Spectrum at the 2024 IEEE VIC Summit & Honors Ceremony Gala, held in May in Boston. Andrea Goldsmith Andrea Goldsmith is dean of engineering at Princeton University. There must be tremendous pressure now to throw a lot of resources into large language models. How do you deal with that pressure? How do you navigate this transition to this new phase of AI? Andrea J. Goldsmith Andrea Goldsmith: Universities generally are going to be very challenged, especially universities that don’t have the resources of a place like Princeton or MIT or Stanford or the other Ivy League schools. 
In order to do research on large language models, you need brilliant people, which all universities have. But you also need compute power and you need data. And the compute power is expensive, and the data generally sits in these large companies, not within universities. So I think universities need to be more creative. We at Princeton have invested a lot of money in the computational resources for our researchers to be able to do—well, not large language models, because you can’t afford it. To do a large language model… look at OpenAI or Google or Meta. They’re spending hundreds of millions of dollars on compute power, if not more. Universities can’t do that. But we can be more nimble and creative. What can we do with language models, maybe not large language models but with smaller language models, to advance the state of the art in different domains? Maybe it’s vertical domains of using, for example, large language models for better prognosis of disease, or for prediction of cellular channel changes, or in materials science to decide what’s the best path to pursue a particular new material that you want to innovate on. So universities need to figure out how to take the resources that we have to innovate using AI technology. We also need to think about new models. And the government can also play a role here. The [U.S.] government has this new initiative, NAIRR, or National Artificial Intelligence Research Resource, where they’re going to put up compute power and data and experts for educators to use—researchers and educators. That could be a game-changer because it’s not just each university investing their own resources or faculty having to write grants, which are never going to pay for the compute power they need. It’s the government pulling together resources and making them available to academic researchers. So it’s an exciting time, where we need to think differently about research—meaning universities need to think differently. 
Companies need to think differently about how to bring in academic researchers, how to open up their compute resources and their data for us to innovate on. As a dean, you are in a unique position to see which technical areas are really hot, attracting a lot of funding and attention. But how much ability do you have to steer a department and its researchers into specific areas? Of course, I’m thinking about large language models and generative AI. Is deciding on a new area of emphasis or a new initiative a collaborative process? Goldsmith: Absolutely. I think any academic leader who thinks that their role is to steer their faculty in a particular direction does not have the right perspective on leadership. I describe academic leadership as really about the success of the faculty and students that you’re leading. And when I did my strategic planning for Princeton Engineering in the fall of 2020, everything was shut down. It was the middle of COVID, but I’m an optimist. So I said, “Okay, this isn’t how I expected to start as dean of engineering at Princeton.” But the opportunity to lead engineering in a great liberal arts university that has aspirations to increase the impact of engineering hasn’t changed. So I met with every single faculty member in the School of Engineering, all 150 of them, one-on-one over Zoom. And the question I asked was, “What do you aspire to? What should we collectively aspire to?” And I took those 150 responses, and I asked all the leaders and the departments and the centers and the institutes, because there already were some initiatives in robotics and bioengineering and in smart cities. And I said, “I want all of you to come up with your own strategic plans. What do you aspire to in these areas? And then let’s get together and create a strategic plan for the School of Engineering.” So that’s what we did. 
And everything that we’ve accomplished in the last four years that I’ve been dean came out of those discussions, and what it was the faculty and the faculty leaders in the school aspired to. So we launched a bioengineering institute last summer. We just launched Princeton Robotics. We’ve launched some things that weren’t in the strategic plan that bubbled up. We launched a center on blockchain technology and its societal implications. We have a quantum initiative. We have an AI initiative using this powerful tool of AI for engineering innovation, not just around large language models, but it’s a tool—how do we use it to advance innovation and engineering? All of these things came from the faculty because, to be a successful academic leader, you have to realize that everything comes from the faculty and the students. You have to harness their enthusiasm, their aspirations, their vision to create a collective vision. Juraj Čorba Juraj Čorba is senior expert on digital regulation and governance, Slovak Ministry of Investments, Regional Development, and Information, and Chair of the Working Party on Governance of AI at the Organization for Economic Cooperation and Development. What are the most important organizations and governing bodies when it comes to policy and governance on artificial intelligence in Europe? Juraj Čorba Juraj Čorba: Well, there are many. And it also creates a bit of a confusion around the globe—who are the actors in Europe? So it’s always good to clarify. First of all we have the European Union, which is a supranational organization composed of many member states, including my own Slovakia. And it was the European Union that proposed adoption of a horizontal legislation for AI in 2021. It was the initiative of the European Commission, the E.U. institution, which has a legislative initiative in the E.U. And the E.U. AI Act is now finally being adopted. It was already adopted by the European Parliament. So this started, you said 2021. 
That’s before ChatGPT and the whole large language model phenomenon really took hold. Čorba: That was the case. Well, the expert community already knew that something was being cooked in the labs. But, yes, the whole agenda of large models, including large language models, came up only later on, after 2021. So the European Union tried to reflect that. Basically, the initial proposal to regulate AI was based on a blueprint of so-called product safety, which somehow presupposes a certain intended purpose. In other words, the checks and assessments of products are based more or less on the logic of the mass production of the 20th century, on an industrial scale, right? Like when you have products that you can somehow define easily and all of them have a clearly intended purpose. Whereas with these large models, a new paradigm was arguably opened, where they have a general purpose. So the whole proposal was then rewritten in negotiations between the Council of Ministers, which is one of the legislative bodies, and the European Parliament. And so what we have today is a combination of this old product-safety approach and some novel aspects of regulation specifically designed for what we call general-purpose artificial intelligence systems or models. So that’s the E.U. By product safety, you mean, if AI-based software is controlling a machine, you need to have physical safety. Čorba: Exactly. That’s one of the aspects. So that touches upon the tangible products such as vehicles, toys, medical devices, robotic arms, et cetera. So yes. But from the very beginning, the proposal contained a regulation of what the European Commission called stand-alone systems—in other words, software systems that do not necessarily command physical objects. So it was already there from the very beginning, but all of it was based on the assumption that all software has its easily identifiable intended purpose—which is not the case for general-purpose AI. 
Also, large language models and generative AI in general brings in this whole other dimension, of propaganda, false information, deepfakes, and so on, which is different from traditional notions of safety in real-time software. Čorba: Well, this is exactly the aspect that is handled by another European organization, different from the E.U., and that is the Council of Europe. It’s an international organization established after the Second World War for the protection of human rights, for protection of the rule of law, and protection of democracy. So that’s where the Europeans, but also many other states and countries, started to negotiate a first international treaty on AI. For example, the United States have participated in the negotiations, and also Canada, Japan, Australia, and many other countries. And then these particular aspects, which are related to the protection of integrity of elections, rule-of-law principles, protection of fundamental rights or human rights under international law—all these aspects have been dealt with in the context of these negotiations on the first international treaty, which is to be now adopted by the Committee of Ministers of the Council of Europe on the 16th and 17th of May. So, pretty soon. And then the first international treaty on AI will be submitted for ratifications. So prompted largely by the activity in large language models, AI regulation and governance now is a hot topic in the United States, in Europe, and in Asia. But of the three regions, I get the sense that Europe is proceeding most aggressively on this topic of regulating and governing artificial intelligence. Do you agree that Europe is taking a more proactive stance in general than the United States and Asia? Čorba: I’m not so sure. If you look at the Chinese approach and the way they regulate what we call generative AI, it would appear to me that they also take it very seriously. They take a different approach from the regulatory point of view. 
But it seems to me that, for instance, China is taking a very focused and careful approach. For the United States, I wouldn’t say that the United States is not taking a careful approach because last year you saw many of the executive orders, or even this year, some of the executive orders issued by President Biden. Of course, this was not a legislative measure, this was a presidential order. But it seems to me that the United States is also trying to address the issue very actively. The United States has also initiated the first resolution of the General Assembly at the U.N. on AI, which was passed just recently. So I wouldn’t say that the E.U. is more aggressive in comparison with Asia or North America, but maybe I would say that the E.U. is the most comprehensive. It looks horizontally across different agendas and it uses binding legislation as a tool, which is not always the case around the world. Many countries simply feel that it’s too early to legislate in a binding way, so they opt for soft measures or guidance, collaboration with private companies, et cetera. Those are the differences that I see. Do you think you perceive a difference in focus among the three regions? Are there certain aspects that are being more aggressively pursued in the United States than in Europe or vice versa? Čorba: Certainly the E.U. is very focused on the protection of human rights, the full catalog of human rights, but also, of course, on safety and human health. These are the core goals or values to be protected under the E.U. legislation. As for the United States and for China, I would say that the primary focus in those countries—but this is only my personal impression—is on national and economic security. Samuel Naffziger Samuel Naffziger is senior vice president and a corporate fellow at Advanced Micro Devices, where he is responsible for technology strategy and product architectures. 
Naffziger was instrumental in AMD’s embrace and development of chiplets, which are semiconductor dies that are packaged together into high-performance modules. To what extent is large language model training starting to influence what you and your colleagues do at AMD? Samuel Naffziger Samuel Naffziger: Well, there are a couple levels of that. LLMs are impacting the way a lot of us live and work. And we certainly are deploying that very broadly internally for productivity enhancements, for using LLMs to provide starting points for code—simple verbal requests, such as “Give me a Python script to parse this dataset.” And you get a really nice starting point for that code. Saves a ton of time. Writing verification test benches, helping with the physical design layout optimizations. So there’s a lot of productivity aspects. The other aspect to LLMs is, of course, we are actively involved in designing GPUs [graphics processing units] for LLM training and for LLM inference. And so that’s driving a tremendous amount of workload analysis on the requirements, hardware requirements, and hardware-software codesign, to explore. So that brings us to your current flagship, the Instinct MI300X, which is actually billed as an AI accelerator. How did the particular demands influence that design? I don’t know when that design started, but the ChatGPT era started about two years ago or so. To what extent did you read the writing on the wall? Naffziger: So we were just into the MI300—in 2019, we were starting the development. A long time ago. And at that time, our revenue stream from the Zen [an AMD architecture used in a family of processors] renaissance had really just started coming in. So the company was starting to get healthier, but we didn’t have a lot of extra revenue to spend on R&D at the time. So we had to be very prudent with our resources. And we had strategic engagements with the [U.S.] Department of Energy for supercomputer deployments. 
That was the genesis for our MI line—we were developing it for the supercomputing market. Now, there was a recognition that munching through FP64 COBOL code, or Fortran, isn’t the future, right? [laughs] This machine-learning [ML] thing is really getting some legs. So we put some of the lower-precision math formats in, like Brain Floating Point 16 at the time, that were going to be important for inference. And the DOE knew that machine learning was going to be an important dimension of supercomputers, not just legacy code. So that’s the way, but we were focused on HPC [high-performance computing]. We had the foresight to understand that ML had real potential. Although certainly no one predicted, I think, the explosion we’ve seen today. So that’s how it came about. And, just another piece of it: We leveraged our modular chiplet expertise to architect the 300 to support a number of variants from the same silicon components. So the variant targeted to the supercomputer market had CPUs integrated in as chiplets, directly on the silicon module. And then it had six of the GPU chiplets we call XCDs around them. So we had three CPU chiplets and six GPU chiplets. And that provided an amazingly efficient, highly integrated, CPU-plus-GPU design we call MI300A. It’s very compelling for the El Capitan supercomputer that’s being brought up as we speak. But we also recognize that for the maximum computation for these AI workloads, the CPUs weren’t that beneficial. We wanted more GPUs. For these workloads, it’s all about the math and matrix multiplies. So we were able to just swap out those three CPU chiplets for a couple more XCD GPUs. And so we got eight XCDs in the module, and that’s what we call the MI300X. So we kind of got lucky having the right product at the right time, but there was also a lot of skill involved in that we saw the writing on the wall for where these workloads were going and we provisioned the design to support it. Earlier you mentioned 3D chiplets. 
What do you feel is the next natural step in that evolution? Naffziger: AI has created this bottomless thirst for more compute [power]. And so we are always going to be wanting to cram as many transistors as possible into a module. And the reason that’s beneficial is, these systems deliver AI performance at scale with thousands, tens of thousands, or more, compute devices. They all have to be tightly connected together, with very high bandwidths, and all of that bandwidth requires power, requires very expensive infrastructure. So if a certain level of performance is required—a certain number of petaflops, or exaflops—the strongest lever on the cost and the power consumption is the number of GPUs required to achieve a zettaflop, for instance. And if the GPU is a lot more capable, then all of that system infrastructure collapses down—if you only need half as many GPUs, everything else goes down by half. So there’s a strong economic motivation to achieve very high levels of integration and performance at the device level. And the only way to do that is with chiplets and with 3D stacking. So we’ve already embarked down that path. A lot of tough engineering problems to solve to get there, but that’s going to continue. And so what’s going to happen? Well, obviously we can add layers, right? We can pack more in. The thermal challenges that come along with that are going to be fun engineering problems that our industry is good at solving.

  • IEEE Offers New Transportation Platform With Advanced Analytics Tools
    by Kathy Pretz on 5. June 2024. at 18:00

    To help find ways to solve transportation issues such as poorly maintained roads, traffic jams, and the high rate of accidents, researchers need access to the most current datasets on a variety of topics. But tracking down information about roadway conditions, congestion, and other statistics across multiple websites can be time-consuming. Plus, the data isn’t always accurate. The new National Transportation Data & Analytics Solution (NTDAS), developed with the help of IEEE, makes it easier to retrieve, visualize, and analyze data in one place. NTDAS combines advanced research tools with access to high-quality transportation datasets from the U.S. Federal Highway Administration’s National Highway System and the entire Traffic Message Channel network, which distributes information on more than 1 million road segments. Anonymous data on millions of cars and trucks is generated from vehicle probes, which are vehicles equipped with GPS or global navigation satellite systems that gather traffic data on location, speed, and direction. This information helps transportation planners improve traffic flow, make transportation networks more efficient, and plan budgets. The platform is updated monthly and contains archival data back to 2017. “The difference between NTDAS and other competitors is that our data comes from a trusted source that means the most: the U.S. Federal Highway Administration,” says Lavanya Sayam, senior manager of data analytics alliances and programs for IEEE Global Products and Marketing. “The data has been authenticated and validated. The ability to download this massive dataset provides an unparalleled ease to data scientists and machine-learning engineers to explore and innovate.” IEEE is diversifying its line of products beyond its traditional fields of electrical engineering, Sayam adds. “We are not just focused on electrical or computer science,” she says. 
“IEEE is so diverse, and this state-of-the-art platform reflects that.” Robust analytical tools NTDAS was built in partnership with INRIX, a transportation analytics solutions provider, and the University of Maryland’s Center for Advanced Transportation Technology Laboratory, a leader in transportation science research. INRIX provided the data, while UMD built the analytics tools. The platform leverages the National Performance Management Research Data Set, a highly granular data source from the Federal Highway Administration. The suite of tools allows users to do tasks such as creating a personal dashboard to monitor traffic conditions on specific roads, downloading raw data for analysis, building animated maps of road conditions, and measuring the flow of traffic. There are tutorials available on the platform on how to use each tool, and templates for creating reports, documents, and pamphlets. “This is the first time this type of platform is being offered by IEEE to the global academic institutional audience,” she says. “IEEE is always looking for new ways to serve the engineering community.” A subscription-based service, NTDAS has multidisciplinary relevance, Sayam says. The use cases it includes serve researchers and educators who need a robust platform that has all the data that helps them conduct analytics in one place, she says. For university instructors, it’s an innovative way to teach the courses, and for students, it’s a unique way to apply what they’ve learned with real-world data and uses. The platform goes beyond just those working in transportation, Sayam notes. 
Others who might find NTDAS useful include those who study traffic as it relates to sustainability, the environment, civil engineering, public policy, business, and logistics, she adds. 50 ways to minimize the impact of traffic NTDAS also includes more than 50 use cases created by IEEE experts to demonstrate how the data could be analyzed. The examples identify ways to protect the environment, better serve disadvantaged communities, support alternative transportation, and improve the safety of citizens. “Data from NTDAS can be easily extrapolated to non-U.S. geographies, making it highly relevant to global researchers,” according to Sayam. This is explained in specific use cases too. The cases cover topics such as the impact of traffic on bird populations, air-quality issues in underserved communities, and optimal areas to install electric vehicle charging stations. Two experts covered various strategies for how to use the data to analyze the impact of transportation and infrastructure on the environment in this on-demand webinar held in May. Thomas Brennan, a professor of civil engineering at the College of New Jersey, discussed how using NTDAS data could aid in better planning of evacuation routes during wildfires, such as determining the location of first responders and traffic congestion in the area, including seasonal traffic. This and other data could lead to evacuating residents faster, new evacuation road signage, and better communication warning systems, he said. “Traffic systems are super complex and very difficult to understand and model,” said presenter Jane MacFarlane, director of the Smart Cities and Sustainable Mobility Center at the University of California’s Institute of Transportation Studies, in Berkeley. 
“Now that we have datasets like these, that’s giving us a huge leg up in trying to use them for predictive modeling and also helping us with simulating things so that we can gain a better understanding.” “Transportation is a basic fabric of society,” Sayam says. “Understanding its impact is an imperative for better living. True to IEEE’s mission of advancing technology for humanity, NTDAS, with its interdisciplinary relevance, helps us understand the impact of transportation across several dimensions.”

  • Noise Cancellation for Your Brain
    by Tekla S. Perry on 4. June 2024. at 13:06

    Elemind, a 5-year-old startup based in Cambridge, Mass., today unveiled a US $349 wearable for neuromodulation, the company’s first product. According to cofounder and CEO Meredith Perry, the technology tracks the oscillation of brain waves using electroencephalography (EEG) sensors that detect the electrical activity of the brain and then influence those oscillations using bursts of sound delivered via bone conduction. Elemind’s first application for this wearable aims to suppress alpha waves to help induce sleep. There are other wearables on the market that monitor brain waves and, through biofeedback, encourage users to actively modify their alpha patterns. Elemind’s headband appears to be the first device to use sound to directly influence the brain waves of a passive user. In a clinical trial, says Perry [no relation to author], 76 percent of subjects fell asleep more quickly. Those who did see a difference averaged 48 percent less time to progress from awake to asleep. The results were similar to those of comparable trials of pharmaceutical sleep aids, Perry indicated. “For me,” Perry said, “it cuts through my rumination, quiets my thinking. It’s like noise cancellation for the brain.” I briefly tested Elemind’s headband in May. I found it comfortable, with a thick cushioned band that sits across the forehead connected to a stretchy elastic loop to keep it in place. In the band are multiple EEG electrodes, a processor, a three-axis accelerometer, a rechargeable lithium-polymer battery, and custom electronics that gather the brain’s electrical signals, estimate their phase, and generate pink noise through a bone-conduction speaker. The whole thing weighs about 60 grams—about as much as a small kiwi fruit. My test conditions were far from optimal for sleep: early afternoon, a fairly bright conference room, a beanbag chair as bed, and a vent blowing. And my test lasted just 4 minutes. 
I can say that I didn’t find the little bursts of pink noise (white noise without the higher frequencies) unpleasant. And since I often wear an eye mask, feeling fabric on my face wasn’t disturbing. It wasn’t the time or place to try for sound sleep, but I—and the others in the room—noted that after 2 minutes I was yawning like crazy. How Elemind tweaks brain waves What was going on in my brain? Briefly, different brain states are associated with different frequencies of waves. Someone who is relaxed with eyes closed but not asleep produces alpha waves at around 10 hertz. As they drift off to sleep, the alpha waves are supplanted by theta waves, at around 5 Hz. Eventually, the delta waves of deep sleep show up at around 1 Hz. Ryan Neely, Elemind’s vice president of science and research, explains: “As soon as you put the headband on,” he says, “the EEG system starts running. It uses straightforward signal processing with bandpass filtering to isolate the activity in the 8- to 12-Hz frequency range—the alpha band.” “Then,” Neely continues, “our algorithm looks at the filtered signal to identify the phase of each oscillation and determines when to generate bursts of pink noise.” (Figure caption: To help a user fall asleep more quickly [top], bursts of pink noise are timed to generate a brain response that is out of phase with alpha waves and so suppresses them. To enhance deep sleep [bottom], the pink noise is timed to generate a brain response that is in phase with delta waves. Source: Elemind.) These auditory stimuli, he explains, create ripples in the waves coming from the brain. Elemind’s system tries to align these ripples with a particular phase in the wave. Because there is a gap between the stimulus and the evoked response, Elemind tested its system on 21 people and calculated the average delay, taking that into account when determining when to trigger a sound. 
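The pipeline Neely describes (bandpass to the alpha band, estimate phase, trigger early to allow for the evoked-response delay) can be sketched in a few lines of Python. This is an illustration only, not Elemind's algorithm: the function names, the FFT-based filter, and the 25-millisecond delay are all assumptions made for the example, and a real headband would need a causal, real-time filter rather than this offline one.

```python
import numpy as np

def bandpass_fft(x, fs, lo=8.0, hi=12.0):
    """Offline band-pass: zero every FFT bin outside [lo, hi] Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def instantaneous_phase(x):
    """Phase of the analytic signal, built with an FFT-based Hilbert transform."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.angle(np.fft.ifft(spec * h))

def trigger_times(eeg, fs, target_phase=np.pi, delay_s=0.025, freq_hz=10.0):
    """Times (s) to start a burst so the response lands at target_phase.

    target_phase = pi is the trough of a cosine-like alpha wave.
    delay_s is a HYPOTHETICAL stimulus-to-response delay, not a measured one.
    """
    phase = instantaneous_phase(bandpass_fft(eeg, fs))
    # Fire early by the phase the wave will advance during the delay.
    trigger_phase = target_phase - 2.0 * np.pi * freq_hz * delay_s
    diff = np.angle(np.exp(1j * (phase - trigger_phase)))  # wrapped to (-pi, pi]
    crossings = np.where((diff[:-1] < 0) & (diff[1:] >= 0))[0] + 1
    return crossings / fs

# Synthetic "EEG": a 10 Hz alpha rhythm plus slower and faster activity.
fs = 250.0
t = np.arange(1000) / fs
eeg = (np.cos(2 * np.pi * 10 * t)
       + 0.8 * np.sin(2 * np.pi * 2 * t)
       + 0.5 * np.sin(2 * np.pi * 30 * t))
bursts = trigger_times(eeg, fs)
print(len(bursts), "bursts scheduled; first at", bursts[0], "s")
```

Switching `target_phase` to 0 would aim the response at the peak instead of the trough, which is the in-phase case the article describes for enhancing slower waves.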
To induce sleep, Elemind’s headband targets the trough in the alpha wave, the point at which the brain is most excitable, Neely says. “You can think of the alpha rhythm as a gate for communication between different areas of the brain,” he says. “By interfering with that communication, that coordination between different brain areas, you can disrupt patterns, like the ruminations that keep you awake.” With these alpha waves suppressed, Neely says, the slower oscillations, like the theta waves of light sleep, take over. Elemind doesn’t plan to stop there. The company plans to add an algorithm that addresses delta waves, the low-frequency 0.5- to 2-Hz waves characteristic of deep sleep. Here, Elemind’s technology will attempt to amplify this pattern with the intent of improving sleep quality. Is this safe? Yes, Neely says, because auditory stimulation is self-limiting. “Your brain waves have a natural space they can occupy,” he explains, “and this stimulation just moved it within that natural space, unlike deep-brain stimulation, which can move the brain activity outside natural parameters.” Going beyond sleep to sedation, memory, and mental health Applications may eventually go beyond inducing and enhancing sleep. Researchers at the University of Washington and McGill University have completed a clinical study to determine if Elemind’s technology can be used to increase the pain threshold of subjects undergoing sedation. The results are being prepared for peer review. Elemind is also working with a team involving researchers at McGill and the Leuven Brain Institute to determine if the technology can enhance memory consolidation in deep sleep and perhaps have some usefulness for people with mild cognitive impairment and other memory disorders. Neely would love to see more applications investigated in the future. “Inverse alpha stimulation [enhancing instead of suppressing the signal] could increase arousal,” he says. “That’s something I’d love to look into. 
And looking into mental-health treatment would be interesting, because phase coupling between the different brain regions appears to be an important factor in depression and anxiety disorders.” Perry, who previously founded the wireless power startup UBeam, cofounded Elemind with four university professors with expertise in neuroscience, optogenetics, biomedical engineering, and artificial intelligence. The company has $12 million in funding to date and currently has 13 employees. Preorders at $349 start today for beta units, and Elemind expects to start general sales later this year. The company will offer customers an optional membership at $7 to $13 monthly that will allow cloud storage of sleep data and access to new apps as they are released.

  • Hybrid Bonding Plays Starring Role in 3D Chips
    by Samuel K. Moore on 4. June 2024. at 12:00

    Researchers at the IEEE Electronic Components and Technology Conference (ECTC) last week pushed the state of the art in a technology that is becoming critical to cutting-edge processors and memory. Called hybrid bonding, the technology stacks two or more chips atop each other in the same package, allowing chipmakers to increase the number of transistors in their processors and memories despite a general slowdown in the pace of the traditional transistor shrinking that once defined Moore’s Law. Research groups from major chipmakers and universities demonstrated a variety of hard-fought improvements, with a few—including from Applied Materials, Imec, Intel, and Sony—showing results that could lead to a record density of connections between 3D stacked chips of around 7 million links in a square millimeter of silicon. All those connections are needed because of the new nature of progress in semiconductors, Intel’s Yi Shi told engineers at ECTC. As Intel general manager of technology development Ann Kelleher explained to IEEE Spectrum in 2022, Moore’s Law is now governed by a concept called system technology co-optimization, or STCO. In STCO, a chip’s functions, such as cache memory, input/output, and logic are separated out and made using the best manufacturing technology for each. Hybrid bonding and other advanced packaging tech can then reassemble them so that they work like a single piece of silicon. But that can only happen with a high density of connections that can shuttle bits between pieces of silicon with little delay or energy consumption. Hybrid bonding is not the only advanced packaging technology in use, but it provides the highest density of vertical connections. And it dominated ECTC, making up about one-fifth of the research presented, according to Chris Scanlan, senior vice president of technology at Besi, whose tools were behind several of the breakthroughs. 
In hybrid bonding, copper pads are constructed at the top face of each chip. The copper is surrounded by insulation, usually silicon oxide, and the pads themselves are slightly recessed from the surface of the insulation. After the oxide is chemically modified, the two chips are then pressed together face-to-face, so the recessed pads align with each other. This sandwich is then slowly heated, causing the copper to expand across the gap, connecting the two chips. Hybrid bonding can either attach individual chips of one size to a wafer full of chips of a larger size or bond two full wafers of chips of the same size together. Thanks in part to its use in camera chips, the latter is a more mature process than the former. Imec, for example, reported some of the densest wafer-on-wafer (WoW) bonds ever, with a bond-to-bond distance (or pitch) of just 400 nanometers. The same research center managed a 2-micrometer pitch for the chip-on-wafer (CoW) scenario. (Commercial chips today have connections about 9 μm apart.) (Figure caption: Hybrid bonding starts by forming recessed copper pads at the top of the chip [top]. The surrounding oxide dielectric bonds when the two chips are pressed together [middle]. Annealing expands the copper to form a conductive connection [bottom].) “With the equipment available, it’s easier to align wafer to wafer than chip to wafer. Most processes for microelectronics are made for [full] wafers,” says Jean-Charles Souriau, scientific leader in integration and packaging at the French research organization, CEA Leti. However, it’s chip-on-wafer (or die-to-wafer) that’s making a splash in high-end processors such as AMD’s Epyc line, where the technique is used to assemble compute cores and cache memory in its advanced CPUs and AI accelerators. 
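To see how pitch translates into connection density, here is a small worked calculation in Python. It assumes pads laid out on a simple square grid, which the article does not specify, so the numbers are order-of-magnitude illustrations; the function name is made up for the example. Under that assumption a 400-nm pitch works out to 6.25 million connections per square millimeter, the same order as the roughly 7 million figure cited above.

```python
def links_per_mm2(pitch_nm: float) -> float:
    """Connections per square millimeter, ASSUMING a square grid of pads."""
    pads_per_mm = 1_000_000 / pitch_nm  # 1 mm = 1,000,000 nm
    return pads_per_mm ** 2

# Pitches quoted in the article, converted to nanometers.
for label, pitch_nm in [
    ("WoW, 400 nm (Imec)", 400),
    ("CoW, 2 um (Imec)", 2_000),
    ("Commercial, about 9 um", 9_000),
]:
    print(f"{label}: {links_per_mm2(pitch_nm):,.0f} links per mm^2")
```

Because density scales with the inverse square of pitch, halving the pitch quadruples the number of connections in the same area.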
In pushing for tighter and tighter pitches for both scenarios, researchers focused on making surfaces fractionally flatter, getting bound wafers to stick together better, and cutting the time and complexity of the whole process. Getting it right could ultimately mean enabling a revolution in how chips are designed. WoW, those are some tight pitches The wafer-on-wafer (WoW) research that reported the tightest pitches—500 nm to 360 nm—all spent a lot of effort on one thing: flatness. To bind two wafers together with 100-nm-level accuracy, the whole wafer has to be nearly perfectly flat. If it’s bowed or warped, whole sections of the materials won’t connect. Flattening wafers is the job of a process called chemical mechanical planarization, or CMP. It’s key to chipmaking generally, especially for the parts of the process that produce the layers of interconnects above the transistors. “CMP is a key parameter we have to control for hybrid bonding,” says Souriau. Results presented this week at ECTC took CMP to another level, not just flattening across the wafer but reducing mere nanometers of roundness on the insulation between the copper pads to ensure better connections. Other research focused on ensuring those flattened parts stuck together strongly enough by experimenting with different surface materials such as silicon carbonitride instead of silicon oxide or by using different schemes to chemically activate the surface. Initially, when wafers or dies are pressed together, they are held in place with relatively weak hydrogen bonds, and the concern is ensuring that everything stays in place between the bonding and further steps. Bound wafers and chips are then heated slowly (a process called annealing) to form stronger chemical bonds. Just how strong these bonds are—and how to even figure that out—was the subject of a lot of research at ECTC. Part of that final bond strength would come from the copper connections as well. 
The annealing step expands the copper across the gap to form a conductive bridge. Controlling the size of that gap is key, explained Samsung’s Seung Ho Hahn. Too much of a gap and the copper won’t connect. Too little and it will push the wafers apart. It’s a matter of nanometers, and Hahn reported research on a new chemical process that hopes to get it just right by etching away the copper a single atomic layer at a time. The quality of the connection counts, too. Even after the copper expands, most schemes showed that the metal’s grain boundaries don’t cross from one side to another. Such a crossing reduces a connection’s electrical resistance and should boost its reliability. Researchers at Tohoku University in Japan reported a new metallurgical scheme that could finally generate large, single grains of copper that cross the boundary. “This is a drastic change,” said Takafumi Fukushima, an associate professor at Tohoku University. “We are now analyzing what underlies it.” Other experiments focused on streamlining the hybrid bonding process. Several sought to reduce the annealing temperature needed to form bonds—typically around 300 °C—motivated by the potential to reduce any risk of damage to the chips from the prolonged heating. And researchers from Applied Materials presented progress on a method to radically reduce the time needed for annealing—from hours to just 5 minutes. CoWs that are outstanding in the field Chip-on-wafer (CoW) hybrid bonding is more useful to industry at the moment: It allows chipmakers to stack chiplets of different sizes together, and to test each chip before it’s bound to another, ensuring that they aren’t fatally dooming an expensive CPU with a single flawed part. But CoW comes with all of the difficulties of WoW and fewer of the options to alleviate them. For example, CMP is designed to flatten wafers, not individual dies. 
Once dies have been cut from their source wafer and tested, there’s less that can be done to improve their readiness for bonding. Nevertheless, Intel reported CoW hybrid bonds with a 3-μm pitch, and Imec managed 2 μm, largely by making the transferred dies very flat while they were still attached to the wafer and keeping them extra clean going forward. Efforts by both groups used plasma etching to dice up the dies instead of the usual method, which uses a specialized blade. Plasma won’t lead to chipping at the edges, which creates debris that interferes with connections. It also allowed the Imec group to shape the die, making chamfered corners that relieved mechanical stress that could break connections. CoW hybrid bonding is going to be critical to the future of high-bandwidth memory (HBM), several researchers told IEEE Spectrum. HBM is a stack of DRAM dies atop a control logic chip—currently 8 to 12 dies high. Often placed within the same package as high-end GPUs, HBM is crucial to providing the tsunami of data needed to run large language models like ChatGPT. Today, HBM dies are stacked using so-called microbump technology, in which tiny balls of solder between each layer are surrounded by an organic filler. But with AI pushing memory demand even higher, DRAM makers want to do 20 layers or more in HBM chips. However, the volume microbumps take up means that these stacks will soon be too tall to fit in the package with GPUs. Hybrid bonding would not just shrink the height of HBMs, it should also make it easier to remove excess heat from the package, because there is less thermal resistance between its layers. At ECTC, Samsung engineers showed that a hybrid bonding scheme could make a 16-layer HBM stack. “I think it’s possible to make more than 20-layer stack using this technology,” said Hyeonmin Lee, a senior engineer at Samsung. 
Other new CoW technology could help bring hybrid bonding to high-bandwidth memory. Though they didn’t present research on this at ECTC, researchers at CEA Leti are working on so-called self-alignment technology, says Souriau. That would help ensure CoW connections using chemical processes. Some parts of each surface would be made hydrophobic and some hydrophilic, resulting in surfaces that would slide into place automatically. At ECTC, researchers at Tohoku University and Yamaha Robotics reported work on a similar scheme, using the surface tension of water to align 5-μm pads on experimental DRAM chips with better than 50-nm accuracy. How far can hybrid bonding go? Researchers will almost certainly keep pushing the pitch of hybrid bonding connections. A 200-nm WoW pitch is not just possible but desirable, Han-Jong Chia, a program manager for pathfinding systems at Taiwan Semiconductor Manufacturing Co., told engineers at ECTC. Within two years, TSMC plans to introduce a technology called backside power delivery. (Intel plans it for the end of this year.) That’s a technology that puts the chip’s chunky power-delivery interconnects beneath the silicon instead of above it. With those out of the way, the uppermost interconnect levels can connect better to smaller hybrid bonding bond pads, TSMC researchers calculate. Backside power delivery with 200-nm bond pads would cut down the capacitance of 3D connections so much that the product of energy efficiency and signal delay would be as much as nine times as high as what can be achieved with 400-nm bond pads. At some point in the future, if bond pitches are narrowed even further, Chia suggested, it might become practical to “fold” blocks of circuitry so they are built across two wafers. That way some of the longer connections within the block might be made shorter by the vertical pathway, potentially speeding computations and lowering power consumption. And hybrid bonding may not be limited to silicon. 
“Today there is a lot of development in silicon-to-silicon wafers, but we are also looking to do hybrid bonding between gallium nitride and silicon wafers and glass wafers…everything on everything,” says CEA Leti’s Souriau. His organization even presented research on hybrid bonding for quantum-computing chips, which involves aligning and binding superconducting niobium instead of copper. “It’s difficult to say what will be the limit,” Souriau says. “Things are moving very fast.”

  • Quantum Navigational Tech Takes Flight in New Trial
    by Margo Anderson on 3. June 2024. at 18:22

    A short-haul aircraft in the United Kingdom recently became the first airborne platform to test delicate quantum technologies that could usher in a post-GPS world—in which satellite-based navigation (be it GPS, BeiDou, Galileo, or others) cedes its singular place as a trusted navigational tool. The question now is how long it will take for this quantum tomorrow to actually arrive. Is this tech just around the corner, as its proponents suggest? Or will the world need to wait until the 2030s or beyond, as skeptics maintain? Whenever the technology can scale up, potential civilian applications will be substantial. “The very first application or very valuable application is going to be autonomous shipping,” says Max Perez, vice president for strategic initiatives at the Boulder, Colo.–based company Infleqtion. “As we get these systems down smaller, they’re going to start to be able to address other areas like autonomous mining, for example, and other industrial settings where GPS might be degraded. And then, ultimately, the largest application will be generalized, personal autonomous vehicles—whether terrestrial or air-based.” The big idea Infleqtion and its U.K. partners are testing is whether the extreme sensitivity that quantum sensors can provide is worth the trade-off of all the expensive kit needed to miniaturize such tech so it can fit on a plane, boat, spacecraft, car, truck, or train. Turning Bose-Einstein Condensates Into Navigational Tools At the core of Infleqtion’s technology is a state of matter called a Bose-Einstein condensate (BEC), which can be made to be extremely sensitive to acceleration. And in the absence of an external GPS signal, an aircraft that can keep a close tally on its every rotation and acceleration is an aircraft that can infer its exact location relative to its last known position. 
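The idea of inferring position from a running tally of accelerations (dead reckoning) can be illustrated with a minimal sketch. This is not Infleqtion's method; it assumes the acceleration samples have already been rotated into a fixed two-dimensional navigation frame, and the function names `dead_reckon` and `drift_from_bias` are hypothetical. It also shows why sensitivity matters: a constant, uncorrected accelerometer bias grows into a position error proportional to the square of the elapsed time.

```python
def dead_reckon(position, velocity, accelerations, dt):
    """Propagate a 2D position from acceleration samples.

    Assumption: each (ax, ay) sample is in m/s^2, is held constant for
    dt seconds, and is already expressed in a fixed navigation frame
    (rotation handling is omitted from this sketch).
    """
    x, y = position
    vx, vy = velocity
    for ax, ay in accelerations:
        # Constant-acceleration update over one time step.
        x += vx * dt + 0.5 * ax * dt * dt
        y += vy * dt + 0.5 * ay * dt * dt
        vx += ax * dt
        vy += ay * dt
    return (x, y), (vx, vy)


def drift_from_bias(bias, elapsed):
    """Position error (m) after `elapsed` seconds of a constant,
    uncorrected accelerometer bias (m/s^2): 0.5 * bias * t^2."""
    return 0.5 * bias * elapsed ** 2


if __name__ == "__main__":
    # Ten seconds of 1 m/s^2 acceleration along x, starting at rest.
    pos, vel = dead_reckon((0.0, 0.0), (0.0, 0.0), [(1.0, 0.0)] * 10, 1.0)
    print(pos, vel)
    # Error from a hypothetical one-micro-g bias left uncorrected for an hour.
    print(drift_from_bias(9.81e-6, 3600.0))
```

Because the error term is quadratic in time, halving the bias only halves the drift, while doubling the mission length quadruples it. That is one reason long missions are a demanding benchmark for any accelerometer.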
As Perez describes it—the company has not yet published a paper on its latest, landmark accomplishment—Infleqtion’s somewhat-portable BEC device occupies 8 to 10 rack units of space. (One rack unit represents a standard server rack’s width of 48.3 centimeters and a standard server rack depth of 60–100 cm.) Scientists tested delicate Bose-Einstein condensates in their instruments, which could one day undergird ultrasensitive accelerometers. Qinetiq In May, the company flew its rig aboard a British Aerospace 146 (BAe 146/Avro RJ100) tech demonstrator aircraft. Inside the rig, a set of lasers blasted a small, supercooled cloud of rubidium atoms to establish a single quantum state among the atoms. The upshot of this cold atom trap is to create ultrasensitive quantum conditions among the whole aggregation of atoms, which is then a big enough cloud of matter to be manipulated with standard laboratory equipment. Using the quantum wave-particle duality, in which matter behaves both like tiny billiard balls and wave packets, engineers can then use lasers and magnetic fields to split the BEC cloud into two or more coherent matter-wave packets. When later recombined, the interference patterns of the multiple wave packets are studied to discover even the most minuscule accelerations—tinier than conventional accelerometers could measure—to the wave packets’ positions in three-dimensional space. That’s the theoretical idea, at least. Real-World Conditions Muddy Timetables In practice, any BEC-based accelerometer would need to at least match the sensitivity of existing, conventional accelerometer technologies. “The best inertial systems in the world, based on ring laser gyroscopes, or fiber-optic gyroscopes, can...maintain a nautical mile of precision over about two weeks of mission,” Perez says. 
“That’s the standard.” The Infleqtion rig has provided only a proof of principle for creating a manipulable BEC state in a rubidium cloud, Perez adds, so there’s no one-to-one comparison yet available for the quantum accelerometer technology. That said, he expects Infleqtion to be able to either maintain the same nautical-mile precision over a mission of a month or more—or, conversely, increase the sensitivity over a week’s mission to something like one-tenth of a nautical mile. The eventual application space for the technology is vast, says Doug Finke, chief content officer at the New York City–based market research firm Global Quantum Intelligence. “Quantum navigation devices could become the killer application for quantum-sensing technology,” Finke says. “However, many challenges remain to reduce the cost, size, and reliability. But potentially, if this technology follows a similar path to what happened in computing, from room-size mainframes to something that fits inside one’s pocket, it could become ubiquitous and possibly even replace GPS later this century.” The timeframe for such a takeover remains an unanswered question. “It won’t happen immediately due to the engineering challenges still to be resolved,” Finke says. “And the technology may require many more years to reach maturation.” Dana Goward, president of the Alexandria, Va.–based Resilient Navigation and Timing Foundation, even ventures a prediction. “It will be 10 to 15 years at least before we see something that is practical for broad application,” he says. Perez says that by 2026, Infleqtion will be testing the reliability of its actual accelerometer technology—not just setting up a BEC in midflight, as it did in May. “It’s basically trading off getting the technology out there a little faster versus something that is more precise for more demanding applications that’ll be just behind that,” Perez says. 
UPDATE 4 June 2024: The story was updated to modify the accuracy estimate for the best inertial navigation systems today—from one nautical mile per one-week mission (as a previous version of this story stated) to one nautical mile per two-week mission.

  • How Online Privacy Is Like Fishing
    by Bruce Schneier on 3. June 2024. at 11:00

    Microsoft recently caught state-backed hackers using its generative AI tools to help with their attacks. In the security community, the immediate questions weren’t about how hackers were using the tools (that was utterly predictable), but about how Microsoft figured it out. The natural conclusion was that Microsoft was spying on its AI users, looking for harmful hackers at work. Some pushed back at characterizing Microsoft’s actions as “spying.” Of course cloud service providers monitor what users are doing. And because we expect Microsoft to be doing something like this, it’s not fair to call it spying. We see this argument as an example of our shifting collective expectations of privacy. To understand what’s happening, we can learn from an unlikely source: fish. In the mid-20th century, scientists began noticing that the number of fish in the ocean—so vast as to underlie the phrase “There are plenty of fish in the sea”—had started declining rapidly due to overfishing. They had already seen a similar decline in whale populations, when the post-WWII whaling industry nearly drove many species extinct. In whaling and later in commercial fishing, new technology made it easier to find and catch marine creatures in ever greater numbers. Ecologists, specifically those working in fisheries management, began studying how and when certain fish populations had gone into serious decline. One scientist, Daniel Pauly, realized that researchers studying fish populations were making a major error when trying to determine acceptable catch size. It wasn’t that scientists didn’t recognize the declining fish populations. It was just that they didn’t realize how significant the decline was. Pauly noted that each generation of scientists had a different baseline to which they compared the current statistics, and that each generation’s baseline was lower than that of the previous one. 
Pauly called this “shifting baseline syndrome” in a 1995 paper. The baseline most scientists used was the one that was normal when they began their research careers. By that measure, each subsequent decline wasn’t significant, but the cumulative decline was devastating. Each generation of researchers came of age in a new ecological and technological environment, inadvertently masking an exponential decline. Pauly’s insights came too late to help those managing some fisheries. The ocean suffered catastrophes such as the complete collapse of the Northwest Atlantic cod population in the 1990s. Internet surveillance, and the resultant loss of privacy, is following the same trajectory. Just as certain fish populations in the world’s oceans have fallen 80 percent, from previously having fallen 80 percent, from previously having fallen 80 percent (ad infinitum), our expectations of privacy have similarly fallen precipitously. The pervasive nature of modern technology makes surveillance easier than ever before, while each successive generation of the public is accustomed to the privacy status quo of their youth. What seems normal to us in the security community is whatever was commonplace at the beginning of our careers. Historically, people controlled their computers, and software was standalone. The always-connected cloud-deployment model of software and services flipped the script. Most apps and services are designed to be always-online, feeding usage information back to the company. A consequence of this modern deployment model is that everyone—cynical tech folks and even ordinary users—expects that what you do with modern tech isn’t private. But that’s because the baseline has shifted. 
AI chatbots are the latest incarnation of this phenomenon: They produce output in response to your input, but behind the scenes there’s a complex cloud-based system keeping track of that input—both to improve the service and to sell you ads. Shifting baselines are at the heart of our collective loss of privacy. The U.S. Supreme Court has long held that our right to privacy depends on whether we have a reasonable expectation of privacy. But expectation is a slippery thing: It’s subject to shifting baselines. The question remains: What now? Fisheries scientists, armed with knowledge of shifting-baseline syndrome, now look at the big picture. They no longer consider relative measures, such as comparing this decade with the last decade. Instead, they take a holistic, ecosystem-wide perspective to see what a healthy marine ecosystem and thus sustainable catch should look like. They then turn these scientifically derived sustainable-catch figures into limits to be codified by regulators. In privacy and security, we need to do the same. Instead of comparing to a shifting baseline, we need to step back and look at what a healthy technological ecosystem would look like: one that respects people’s privacy rights while also allowing companies to recoup costs for services they provide. Ultimately, as with fisheries, we need to take a big-picture perspective and be aware of shifting baselines. A scientifically informed and democratic regulatory process is required to preserve a heritage—whether it be the ocean or the Internet—for the next generation.

  • Lord Kelvin and His Analog Computer
    by Allison Marsh on 2. June 2024. at 13:00

    In 1870, William Thomson, mourning the death of his wife and flush with cash from various patents related to the laying of the first transatlantic telegraph cable, decided to buy a yacht. His schooner, the Lalla Rookh, became Thomson’s summer home and his base for hosting scientific parties. It also gave him firsthand experience with the challenge of accurately predicting tides. Mariners have always been mindful of the tides lest they find themselves beached on low-lying shoals. Naval admirals guarded tide charts as top-secret information. Civilizations recognized a relationship between the tides and the moon early on, but it wasn’t until 1687 that Isaac Newton explained how the gravitational forces of the sun and the moon caused them. Nine decades later, the French astronomer and mathematician Pierre-Simon Laplace suggested that the tides could be represented as harmonic oscillations. And a century after that, Thomson used that concept to design the first machine for predicting them. Lord Kelvin’s Rising Tide William Thomson was born on 26 June 1824, which means this month marks his 200th birthday and a perfect time to reflect on his all-around genius. Thomson was a mathematician, physicist, engineer, and professor of natural philosophy. Queen Victoria knighted him in 1866 for his work on the transatlantic cable, then elevated him to the rank of baron in 1892 for his contributions to thermodynamics, and so he is often remembered as Lord Kelvin. He determined the correct value of absolute zero, for which he is honored by the SI unit of temperature—the kelvin. He dabbled in atmospheric electricity, was a proponent of the vortex theory of the atom, and in the absence of any knowledge of radioactivity made a rather poor estimation of the age of the Earth, which he gave as somewhere between 24 million and 400 million years. William Thomson, also known as Lord Kelvin, is best known for establishing the value of absolute zero. 
He believed in the practical application of scientific knowledge and invented a wide array of useful, and beautiful, devices. Pictorial Press/Alamy Thomson’s tide-predicting machine calculated the tide pattern for a given location based on 10 cyclic constituents associated with the periodic motions of the Earth, sun, and moon. (There are actually hundreds of periodic motions associated with these objects, but modern tidal analysis uses only the 37 of them that have the most significant effects.) The most notable one is the lunar semidiurnal, observable in areas that have two high tides and two low tides each day, due to the effects of the moon. The period of a lunar semidiurnal is 12 hours and 25 minutes—half of a lunar day, which lasts 24 hours and 50 minutes. As Laplace had suggested in 1775, each tidal constituent can be represented as a repeating cosine curve, but those curves are specific to a location and can be calculated only through the collection of tidal data. Luckily for Thomson, many ports had been logging tides for decades. For places that did not have complete logs, Thomson designed both an improved tide gauge and a tidal harmonic analyzer. On Thomson’s tide-predicting machine, each of 10 components was associated with a specific tidal constituent and had its own gearing to set the amplitude. The components were geared together so that their periods were proportional to the periods of the tidal constituents. A single crank turned all of the gears simultaneously, having the effect of summing each of the cosine curves. As the user turned the crank, an ink pen traced the resulting complex curve on a moving roll of paper. The device marked each hour with a small horizontal mark, making a deeper notch each day at noon. Turning the wheel rapidly allowed the user to run a year’s worth of tide readings in about 4 hours. 
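The summation the machine performed mechanically can be sketched in a few lines. This is an illustrative model only, not a reconstruction of Thomson's gearing: the function name `tide_height` and the amplitudes and phases below are made-up placeholders, since real values are specific to a port and must come from logged tidal data. The lunar semidiurnal period is taken from the figures above (half of a 24-hour-50-minute lunar day), and the 12-hour solar semidiurnal period is added as a second constituent for illustration.

```python
import math

LUNAR_DAY_HOURS = 24 + 50 / 60          # 24 h 50 min
M2_PERIOD_HOURS = LUNAR_DAY_HOURS / 2   # lunar semidiurnal: 12 h 25 min
S2_PERIOD_HOURS = 12.0                  # solar semidiurnal


def tide_height(t_hours, constituents, mean_level=0.0):
    """Sum cosine constituents at time t_hours.

    Each constituent is (amplitude, period_hours, phase_radians).
    """
    return mean_level + sum(
        amplitude * math.cos(2 * math.pi * t_hours / period - phase)
        for amplitude, period, phase in constituents
    )


if __name__ == "__main__":
    # Hypothetical two-constituent port; amplitudes in meters.
    port = [
        (1.0, M2_PERIOD_HOURS, 0.0),
        (0.5, S2_PERIOD_HOURS, 0.0),
    ]
    # Trace one day of hourly heights, as the pen did on the paper roll.
    for hour in range(25):
        print(f"{hour:2d} h  {tide_height(hour, port):+.3f} m")
```

Adding a constituent to the list plays the role of adding another geared component to the machine: the crank advances `t_hours` for all of them at once, and the pen records the sum.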
Although Thomson is credited with designing the machine, in his paper “The Tide Gauge, Tidal Harmonic Analyser, and Tide Predicter” (published in Minutes of the Proceedings of the Institution of Civil Engineers), he acknowledges a number of people who helped him solve specific problems. Craftsman Alexander Légé drew up the plan for the screw gearing for the motions of the shafts and constructed the initial prototype machine and subsequent models. Edward Roberts of the Nautical Almanac Office completed the arithmetic to express the ratio of shaft speeds. Thomson’s older brother, James, a professor of civil engineering at Queen’s College Belfast, designed the disk-globe-and-cylinder integrator that was used for the tidal harmonic analyzer. Thomson’s generous acknowledgments are a reminder that the work of engineers is almost always a team effort. Like Thomson’s tide-prediction machine, these two devices, developed at the U.S. Coast and Geodetic Survey, also looked at tidal harmonic oscillations. William Ferrel’s machine [left] used 19 tidal constituents, while the later machine by Rollin A. Harris and E.G. Fischer [right], relied on 37 constituents. U.S. Coast and Geodetic Survey/NOAA As with many inventions, the tide predictor was simultaneously and independently developed elsewhere and continued to be improved by others, as did the science of tide prediction. In 1874 in the United States, William Ferrel, a mathematician with the Coast and Geodetic Survey, developed a similar harmonic analysis and prediction device that used 19 harmonic constituents. George Darwin, second son of the famous naturalist, modified and improved the harmonic analysis and published several articles on tides throughout the 1880s. Oceanographer Rollin A. Harris wrote several editions of the Manual of Tides for the Coast and Geodetic Survey from 1897 to 1907, and in 1910 he developed, with E.G. Fischer, a tide-predicting machine that used 37 constituents. 
In the 1920s, Arthur Doodson of the Tidal Institute of the University of Liverpool, in England, and Paul Schureman of the Coast and Geodetic Survey further refined techniques for harmonic analysis and prediction that served for decades. Because of the complexity of the math involved, many of these old brass machines remained in use into the 1950s, when electronic computers finally took over the work of predicting tides. What Else Did Lord Kelvin Invent? As regular readers of this column know, I always feature a museum object from the history of computer or electrical engineering and then spin out a story. When I started scouring museum collections for a suitable artifact for Thomson, I was almost paralyzed by the plethora of choices. I considered Thomson’s double-curb transmitter, which was designed for use with the 1858 transatlantic cable to speed up telegraph signals. Thomson had sailed on the HMS Agamemnon in 1857 on its failed mission to lay a transatlantic cable and was instrumental to the team that finally succeeded. Thomson invented the double-curb transmitter to speed up signals in transatlantic cables. Science Museum Group I also thought about featuring one of his quadrant electrometers, which measured electrical charge. Indeed, Thomson introduced a number of instruments for measuring electricity, and a good part of his legacy is his work on the precise specifications of electrical units. But I chose to highlight Thomson’s tide-predicting machine for a number of reasons: Thomson had a lifelong love of seafaring and made many contributions to marine technology that are sometimes overshadowed by his other work. And the tide-predicting machine is an example of an early analog computer that was much more useful than Babbage’s difference engine but not nearly as well known. Also, it is simply a beautiful machine. In fact, Thomson seems to have had a knack for designing stunningly gorgeous devices. 
(The tide-predicting machine at top and many other Kelvin inventions are in the collection of the Science Museum, in London.) Thomson devised the quadrant electrometer to measure electric charge. Science Museum Group The tide-predicting machine was not Thomson’s only contribution to maritime technology. He also patented a compass, an astronomical clock, a sounding machine, and a binnacle (a pedestal that houses nautical instruments). With respect to maritime science, Thomson thought and wrote much about the nature of waves. He mathematically explained the v-shaped wake patterns that ships and waterfowl make as they move across a body of water, which is aptly named the Kelvin wake pattern. He also described what is now known as a Kelvin wave, a type of wave that retains its shape as it moves along the shore due to the balancing of the Earth’s spin against a topographic boundary, such as a coastline. Considering how much Thomson contributed to all things seafaring, it is amazing that these are some of his lesser known achievements. I guess if you have an insatiable curiosity, a robust grasp of mathematics and physics, and a strong desire to tinker with machinery and apply your scientific knowledge to solving practical problems that benefit humankind, you too have the means to come to great conclusions about the natural world. It can’t hurt to have a nice yacht to spend your summers on. Part of a continuing series looking at historical artifacts that embrace the boundless potential of technology. An abridged version of this article appears in the June 2024 print issue as “Brass for Brains.” References Before the days of online databases for their collections, museums would periodically publish catalogs of their collections. 
In 1877, the South Kensington Museum (originator of the collections of the Science Museum, in London, and now known as the Victoria & Albert Museum) published the third edition of its Catalogue of the Special Loan Collection of Scientific Apparatus, which lists a description of Lord Kelvin’s tide-predicting machine on page 11. That description is much more detailed, albeit more confusing, than its current online one. In 1881, William Thomson published “The Tide Gauge, Tidal Harmonic Analyser, and Tide Predicter” in the Minutes of the Proceedings of the Institution of Civil Engineers, where he gave detailed information on each of those three devices. I also relied on a number of publications from the National Oceanic and Atmospheric Administration to help me understand tidal analysis and prediction.

  • IEEE President’s Note: Amplifying IEEE's Reach
    by Tom Coughlin on 1. June 2024. at 18:00

    In my March column, I discussed the need for IEEE to increase its retention of younger members and its engagement with industry. Another one of my priorities is to increase the organization’s outreach to the broader public. I want people to know who we are and what we do. To tell the story of IEEE is to share the impact our members, products, and services make around the globe. Did you know the top 50 patenting organizations worldwide cite IEEE publications three times more than those of any other publisher? And that IEEE publishes three of the top five publications on artificial intelligence, automation and control systems, and computer hardware and software? And that IEEE has an active portfolio of more than 1,100 standards in areas including the Internet, the metaverse, blockchain, sustainable and ethical design, and age-appropriate design for children’s digital services? I bet you didn’t know that IEEE members file more than 140,000 patents yearly and have won 21 Nobel Prizes thus far. Our volunteers write, review, and publish much of the world’s technical literature and convene conferences on every conceivable technical topic. We also establish future directions communities on emerging technologies, pursue technical megatrends, provide opportunities for continued professional development, and develop and publish technology road maps on semiconductors and other important technologies. Here are some of the ways IEEE is working to amplify its reach. A powerful voice As we navigate a new era in technology—one driven by AI and other disruptive technologies—the role of IEEE in advocating for pivotal policy issues in science and technology and engaging with policymakers and stakeholders cannot be overstated. 
As the world’s largest technical professional organization, IEEE is uniquely positioned to be the bridge among the experts who work in areas across IEEE’s organic technical breadth, including communications, computer science, power and energy, management, reliability, and ethics. IEEE can engage with the policymakers who devise the regulatory environment, and with the public who have varying levels of interaction and acceptance of emerging technologies. That includes collaborating with local technical communities worldwide, promoting outreach and educational activities to the public, and connecting with other organizations that are actively working in these spaces. For example, in April I participated in the annual IEEE-USA Congressional Visits Day, which provides volunteers with the opportunity to interact with their senators and representatives. The event, a cornerstone in the technology and engineering community, serves as a platform to elevate the voices of engineers, scientists, mathematicians, researchers, educators, and technology executives. It plays a vital role in driving dialogue among engineering and technology professionals and policymakers to advocate for issues pertinent to IEEE members in the United States. It’s a unique opportunity for participants to engage directly with elected officials, fostering discussions on legislation and policies that shape the country’s technology landscape. By empowering our voice in assisting with global public policymaking, we can reinforce IEEE’s position as the world’s trusted source for information and insights on emerging technology and trends in the marketplace. Each one of us can be an ambassador for IEEE, telling people about how IEEE has helped us in our careers and benefits humanity. Thinking outside the box IEEE is also expanding its reach by participating in events one might not normally associate with the organization, and through a new series of videos about members. 
One such event is the 2024 World Science Fiction Convention to be held in August in Glasgow. Many IEEE members, myself included, were inspired to become involved in technology by science fiction movies, TV shows, and books. As a young man, I dreamed of going into outer space to explore new worlds and discover new things. My interest in science fiction inspired me to want to understand the physical sciences and to learn how to use natural laws and logic to make things. My hope is that IEEE’s presence at such events can inspire the next generation to see the myriad of potential career and professional opportunities available to those interested in science, technology, and engineering. I am also excited about a new series of videos being distributed to broadcast TV and cable stations, social media platforms, and news media outlets worldwide, targeting early career technology professionals, existing IEEE members, and the general public. The international “IEEE Is Your Competitive Edge” videos tell stories of IEEE members and how their membership gave them a competitive edge. We selected individuals with diverse backgrounds for the videos, which are being shot on location around the globe. The goal of the videos is to encourage technologists to recognize IEEE as a vital part of their profession and career, as well as to see the advantages of membership and participating in IEEE activities. The benefits of this campaign are wide ranging and include raising IEEE’s public visibility and growing its membership. It is a way to tell our story and increase awareness of a great organization. These videos will also be available to IEEE organizational units, regions, and sections to use in their promotional efforts. By celebrating the pride and prestige of our professions, we can help increase the public’s understanding of the contributions electrical, electronics, and computer engineers make to society. 
IEEE consistently and proudly demonstrates how its members improve the global community and have helped to build today’s technologically advanced world. 2024 IEEE President’s Award At the IEEE Vision, Innovation, and Challenges Summit and Honors Ceremony, Dr. Gladys B. West was recognized as the recipient of the 2024 IEEE President’s Award for her trailblazing career in mathematics and her vital contributions to modern technology. Dr. West is known for her contributions to the mathematical modeling of the shape of the Earth. While working at the Naval Surface Warfare Center in Dahlgren, Va., she conducted seminal work on satellite geodesy models that was pivotal in the development of the GPS. She worked at the center for 42 years, retiring in 1998. As IEEE continues to enhance its reach, relevance, and value to an inclusive and global community, it was my honor to recognize such a technology giant who serves as a role model and inspiration for early career and young engineers and technologists, as well as those from underrepresented communities, to innovate to solve grand world challenges. —Tom Coughlin IEEE president and CEO This article appears in the June 2024 print issue as “Amplifying IEEE’s Reach.”

  • Space-based Solar Power: A Great Idea Whose Time May Never Come
    by Harry Goldstein on 1. June 2024. at 13:04

    The scene: A space-based solar power station called the Converter being commissioned some time in the Future. The characters: Two astronauts, Powell and Donovan, and a robot named QT-1 (“Cutie” to its human friends). The plot: The astronauts are training Cutie to take over the station’s operations, which involve collecting solar energy in space and then directing it as intense beams of microwaves down to Earth. This is the backdrop for Isaac Asimov’s 1941 short story “Reason.” Most of the story centers around Asimov’s Three Laws of Robotics and the humans’ relationship with the robot. But the station itself is worth a second look. It’s pretty clear Asimov had no idea how a system like the Converter would actually work, except in the most basic terms. Here’s how Powell tries to explain it to Cutie: “Our beams feed these worlds energy drawn from one of those huge incandescent globes that happens to be near us. We call that globe the Sun and it is on the other side of the station where you can’t see it.” Harnessing the power of the sun in space is certainly an enticing idea. A decade ago we featured a project at the Japan Aerospace Exploration Agency that aimed to launch a 1-gigawatt solar station by 2031. As a step in that direction, JAXA says it will demonstrate a small satellite transmitting 1 kilowatt of power to Earth from an altitude of 400 kilometers next year. We’ve also reported on Caltech’s SSPD-1 demonstrator project and the US $100 million from a billionaire donor who funds it. And yet, space-based solar power remains more science fiction than science fact, as Henri Barde writes in “Castles in the Sky?” Barde should know: He recently retired from the European Space Agency, where among other things he evaluated space power systems. 
As Barde’s article makes abundantly clear, this clean energy would come at an enormous cost, if it can be done at all, “[wasting] capital that could be better spent improving less risky ways to shore up renewable energy, such as batteries, hydrogen, and grid improvements.” For example, U.K.-based Space Solar estimates it will need 68 (!) SpaceX Starship launches to loft all the assets necessary to build one 1.7-km-long solar array in orbit. Never mind that SpaceX hasn’t yet successfully launched a Starship into orbit and brought it back in one piece. Even if the company can eventually get the price down to $10 million per launch, we’re still talking hundreds of millions of dollars in launch costs alone. We also don’t have real-life Cuties to build such a station. And the ground stations and rectennas necessary for receiving the beamed power and putting it on the grid are still just distant dots on a road map in someone’s multimillion-dollar research proposal. Engineers are often inspired by science fiction. But inspiration only gets you so far. Space-based solar power will remain sci-fi fodder for the foreseeable future. For the monumental task of electrifying everything while reducing greenhouse gas emissions, it’s better to focus on solutions based on technology already in hand, like conventional geothermal, nuclear, wind, and Earth-based solar, rather than wasting time, brainpower, and money on a fantasy. This article appears in the June 2024 print issue as “The Chasm Between Imagination and Feasibility.”

  • AI and DEI Spotlighted at IEEE’s Futurist Summit
    by Joanna Goodrich on 31. May 2024. at 18:00

    This year’s IEEE Vision, Innovation, and Challenges Summit and Honors Ceremony, held on 2 and 3 May in Boston, celebrated pioneers in engineering who have developed technologies that changed people’s lives, such as the Internet and GPS. The event also included a trip to the headquarters of cloud service provider Akamai Technologies. Here are highlights of the sessions. Akamai hosted a panel discussion on 2 May on innovation, moderated by Robert Blumofe, the company’s executive vice president and CTO. The panel featured IEEE Senior Member Simay Akar, IEEE Life Fellow Deepak Divan, and IEEE Fellows Andrea Goldsmith and Tsu-Jae King Liu. Akar is the founder and CEO of AK Energy Consulting, which helps companies meet their sustainability goals. Divan heads Georgia Tech’s Center for Distributed Energy. Goldsmith is Princeton’s dean of engineering and applied sciences, and King Liu is the dean of the University of California, Berkeley’s College of Engineering. The panelists were asked about what or who inspired them to pursue a career in engineering, as well as their thoughts on continuing education and diversity, equity, and inclusion. Most said they were inspired to become engineers by a parent. Goldsmith, the recipient of this year’s IEEE James H. Mulligan Jr. Education Medal, credits her father. He was a mechanical engineering professor at UC Berkeley and suggested she consider majoring in engineering because she excelled in math and science in high school. “When I was young, I didn’t really understand what being an engineer meant,” Goldsmith said at the panel. Because her parents were divorced and she didn’t see her father often, she thought he drove trains. It wasn’t until she was at UC Berkeley, she said, that she realized how technology could change people’s lives for the better. That’s what pushed her to follow in her father’s footsteps.
When asked what keeps them motivated to stay in the engineering field, King Liu said that it’s IEEE’s mission of developing technology for the benefit of humanity. She is this year’s IEEE Founders Medal recipient. “Engineering work is done for people and by people,” she said. “I draw inspiration from not only the people we serve, but also the people behind the technology.” The panelists also spoke about the importance of continuing education. “Learning is a lifelong process,” King Liu said. “Engineers need to seek out learning opportunities, whether it’s from having a design fail or from more experienced engineers in their field of interest.” Diversity, equity, and inclusion was a hot discussion topic. “Diversity is about excellence,” Goldsmith said. “The biggest battle is convincing people who don’t believe that diversity has a positive impact on teams and companies.” “Another issue is finding ways to bring in diverse talent and helping them achieve their full potential,” she added. “One of the things I’m most proud of is the work I’ve done with IEEE on DEI.” Goldsmith helped launch the IEEE Diversity and Inclusion Committee and is its past chair. Established in 2022 by the IEEE Board of Directors, the committee revised several policies, procedures, and bylaws to ensure that members have a safe and inclusive place for collegial discourse and that all feel welcome. It also launched a website.
Photo: Robert E. Kahn proudly displays his IEEE Medal of Honor at this year’s IEEE Honors Ceremony. He is accompanied by IEEE President-Elect Kathleen Kramer and IEEE President Tom Coughlin. Credit: Robb Cohen Photography & Video.
Career advice and the role of AI in society
The IEEE Vision, Innovation, and Challenges Summit got underway on 3 May at the Encore Boston Harbor. It featured a “fireside chat” with Robert E.
Kahn followed by discussions with panels of award recipients on topics such as career advice and concerns related to artificial intelligence. Kahn was interviewed by Caroline Hyde, a business and technology journalist. Widely known as one of the “fathers of the Internet,” he is this year’s IEEE Medal of Honor recipient for “pioneering technical and leadership contributions in packet communication technologies and foundations of the Internet.” The IEEE Life Fellow reminisced about his experience collaborating with Vint Cerf on the design of the Transmission Control Protocol and the Internet Protocol. Cerf, an IEEE Life Fellow, is another father of the Internet and the 2023 IEEE Medal of Honor recipient. While working as a program manager in the U.S. Defense Advanced Research Projects Agency’s information processing techniques office in 1973, Kahn and Cerf designed the Internet’s core architecture. One audience member asked Kahn how engineers can create opportunities for young people to collaborate like he and Cerf did. Kahn said that it begins with having a problem to solve, and then thinking about it holistically. He also advised students and young professionals to partner with others when such opportunities arise. The conversation on career advice continued at the Innovation and Collaboration in Leading Technology Laboratories panel. Panelists and IEEE Fellows Eric Evans, Anthony Vetro, and Peter Vetter offered insights on how to be a successful researcher. It’s important to identify the right problem and develop a technology to solve it, said Evans, director of MIT Lincoln Laboratory. When asked what qualities are important for job candidates to showcase when interviewing for a position, Vetro said he looks for employees who are willing to collaborate and are self-driven. Vetro is president and CEO of Mitsubishi Electric Research Labs in Cambridge, Mass. He also stressed the importance of learning how to fail. 
During the AI and Society: Building a Future with Responsible Innovation session, Juraj Corba, Christopher D. Manning, Renard T. Jenkins, and IEEE Fellow Claire Tomlin discussed how the technology could affect a variety of fields. They agreed the technology is unlikely to replace humans in the workforce. “People need to think of AI systems as tools—like what Photoshop is to a photographer,” said Jenkins, president of consulting firm I2A2 Technologies, Labs and Studios. “AI doesn’t have learning and adaptability [capabilities] like humans do,” Manning added. The director of Stanford’s Artificial Intelligence Laboratory is this year’s IEEE John von Neumann Medal recipient. “But there is a good role for technology—it can be life-changing for people.” One example he cited was Neuralink’s brain implant, which would enable a person to control a computer “just by thinking,” according to the startup’s founder, Elon Musk. ChatGPT, a generative AI program, has become a hot topic among educators since its launch two years ago, said panel moderator Armen Pischdotchian, data scientist at IBM in Cambridge, Mass. Tomlin, chair of the electrical engineering and computer science department at UC Berkeley, said AI will make education more interactive and provide a better experience. “It will help both students and educators,” said the recipient of this year’s IEEE Mildred Dresselhaus Medal.
Pioneers of assistive technology, GPS, and the Internet
The highlight of the evening was the Honors Ceremony, which recognized those who had developed technologies such as assistive robots, GPS, and the Internet. The IEEE Spectrum Technology in the Service of Society Award went to startup Hello Robot, headquartered in Atlanta, for its Stretch robot. The machine gives those with a severe disability, such as paralysis, the ability to maintain their independence while living at home.
For example, users can operate the robot to feed themselves, scratch an itch, or cover themselves with a blanket. The machine consists of a mobile platform with a single arm that moves up and down a retractable pole. A wrist joint at the end of the arm bends back and forth and controls a gripper, which can grasp nearby objects. Sensors mounted at the base of the arm and a camera located at the top of the pole provide the sensing needed to move around from room to room, avoid obstacles, and pick up small items such as books, eating utensils, and pill bottles. More than six billion people around the world use GPS to navigate their surroundings, according to GPS World. The technology wouldn’t have been possible without Gladys West, who contributed to the mathematical modeling of the shape of the Earth. While working at the Naval Surface Warfare Center, in Dahlgren, Va., she conducted seminal work on satellite geodesy models that was pivotal in the development of the GPS. West, who is 93, retired in 1998 after working at the center for 42 years. For her contributions, she received the IEEE President’s Award. The ceremony concluded with the presentation of the IEEE Medal of Honor to Bob Kahn, who received a standing ovation. “This is the honor of my career,” he said. He ended his speech saying that he “hasn’t stopped yet and still has more to do.”

  • Video Friday: Multitasking
    by Evan Ackerman on 31. May 2024. at 16:00

    Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion. RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS ICSR 2024: 23–26 October 2024, ODENSE, DENMARK Cybathlon 2024: 25–27 October 2024, ZURICH Enjoy today’s videos! Do you have trouble multitasking? Cyborgize yourself through muscle stimulation to automate repetitive physical tasks while you focus on something else. [ SplitBody ] By combining a 5,000 frame-per-second (FPS) event camera with a 20-FPS RGB camera, roboticists from the University of Zurich have developed a much more effective vision system that keeps autonomous cars from crashing into stuff, as described in the current issue of Nature. [ Nature ] Mitsubishi Electric has been awarded the GUINNESS WORLD RECORDS title for the fastest robot to solve a puzzle cube. The robot’s time of 0.305 second beat the previous record of 0.38 second, for which it received a GUINNESS WORLD RECORDS certificate on 21 May 2024. [ Mitsubishi ] Sony’s AIBO is celebrating its 25th anniversary, which seems like a long time, and it is. But back then, the original AIBO could check your email for you. Email! In 1999! I miss Hotmail. [ AIBO ] SchniPoSa: schnitzel with french fries and a salad. [ Dino Robotics ] Cloth-folding is still a really hard problem for robots, but progress was made at ICRA! [ ICRA Cloth Competition ] Thanks, Francis! MIT CSAIL researchers enhance robotic precision with sophisticated tactile sensors in the palm and agile fingers, setting the stage for improvements in human-robot interaction and prosthetic technology. [ MIT ] We present a novel adversarial attack method designed to identify failure cases in any type of locomotion controller, including state-of-the-art reinforcement-learning-based controllers. 
Our approach reveals the vulnerabilities of black-box neural network controllers, providing valuable insights that can be leveraged to enhance robustness through retraining. [ Fan Shi ] In this work, we investigate a novel integrated flexible OLED display technology used as a robotic skin-interface to improve robot-to-human communication in a real industrial setting at Volkswagen or a collaborative human-robot interaction task in motor assembly. The interface was implemented in a workcell and validated qualitatively with a small group of operators (n=9) and quantitatively with a large group (n=42). The validation results showed that using flexible OLED technology could improve the operators’ attitude toward the robot; increase their intention to use the robot; enhance their perceived enjoyment, social influence, and trust; and reduce their anxiety. [ Paper ] Thanks, Bram! We introduce InflatableBots, shape-changing inflatable robots for large-scale encountered-type haptics in VR. Unlike traditional inflatable shape displays, which are immobile and limited in interaction areas, our approach combines mobile robots with fan-based inflatable structures. This enables safe, scalable, and deployable haptic interactions on a large scale. [ InflatableBots ] We present a bioinspired passive dynamic foot in which the claws are actuated solely by the impact energy. Our gripper simultaneously resolves the issue of smooth absorption of the impact energy and fast closure of the claws by linking the motion of an ankle linkage and the claws through soft tendons. [ Paper ] In this video, a 3-UPU exoskeleton robot for a wrist joint is designed and controlled to perform wrist extension, flexion, radial-deviation, and ulnar-deviation motions in stroke-affected patients. This is the first time a 3-UPU robot has been used effectively for any kind of task. “UPU” stands for “universal-prismatic-universal” and refers to the actuators—the prismatic joints between two universal joints. 
[ BAS ] Thanks, Tony! BRUCE Got Spot-ted at ICRA2024. [ Westwood Robotics ] Parachutes: maybe not as good of an idea for drones as you might think. [ Wing ] In this paper, we propose a system for the artist-directed authoring of stylized bipedal walking gaits, tailored for execution on robotic characters. To demonstrate the utility of our approach, we animate gaits for a custom, free-walking robotic character, and show, with two additional in-simulation examples, how our procedural animation technique generalizes to bipeds with different degrees of freedom, proportions, and mass distributions. [ Disney Research ] The European drone project Labyrinth aims to keep new and conventional air traffic separate, especially in busy airspaces such as those expected in urban areas. The project provides a new drone-traffic service and illustrates its potential to improve the safety and efficiency of civil land, air, and sea transport, as well as emergency and rescue operations. [ DLR ] This Carnegie Mellon University Robotics Institute seminar, by Kim Baraka at Vrije Universiteit Amsterdam, is on the topic “Why We Should Build Robot Apprentices and Why We Shouldn’t Do It Alone.” For robots to be able to truly integrate human-populated, dynamic, and unpredictable environments, they will have to have strong adaptive capabilities. In this talk, I argue that these adaptive capabilities should leverage interaction with end users, who know how (they want) a robot to act in that environment. I will present an overview of my past and ongoing work on the topic of human-interactive robot learning, a growing interdisciplinary subfield that embraces rich, bidirectional interaction to shape robot learning. I will discuss contributions on the algorithmic, interface, and interaction design fronts, showcasing several collaborations with animal behaviorists/trainers, dancers, puppeteers, and medical practitioners. [ CMU RI ]

  • Five Cool Tech Demos From the ARPA-E Summit
    by Emily Waltz on 31. May 2024. at 14:25

    Nearly 400 exhibitors representing the boldest energy innovations in the United States came together last week at the annual ARPA-E Energy Innovation Summit. The conference, hosted in Dallas by the U.S. Advanced Research Projects Agency–Energy (ARPA-E), showcased the agency’s bets on early-stage energy technologies that can disrupt the status quo. U.S. Secretary of Energy Jennifer Granholm spoke at the summit. “The people in this room are America’s best hope” in the race to unleash the power of clean energy, she said. “The technologies you create will decide whether we win that race. But no pressure,” she quipped. IEEE Spectrum spent three days meandering the aisles of the showcase. Here are five of our favorite demonstrations.
Gas Li-ion batteries thwart extreme cold
Photo: South 8 Technologies demonstrates the cold tolerance of its Li-ion battery by burying it in ice at the 2024 ARPA-E Energy Innovation Summit. Credit: Emily Waltz.
Made with a liquified gas electrolyte instead of the standard liquid solvent, a new kind of lithium-ion battery that stands up to extreme cold, made by South 8 Technologies in San Diego, won’t freeze until temps drop below –80 °C. That’s a big improvement on conventional Li-ion batteries, which start to degrade when temps reach 0 °C and shut down at about –20 °C. “You lose about half of your range in an electric vehicle if you drive it in the middle of winter in Michigan,” says Cyrus Rustomji, cofounder of South 8. To prove the company’s point, Rustomji and his team set out a bucket of dry ice at nearly –80 °C at their booth at the ARPA-E summit and put flashlights in it—one powered by a South 8 battery and one powered by a conventional Li-ion cell. The latter flashlight went out after about 10 minutes, and South 8’s kept going for the next 15 hours. Rustomji says he expects EV batteries made with South 8’s technology to maintain nearly full range at –40 °C, and gradually degrade in temperatures lower than that.
Conventional Li-ion batteries use liquid solvents, such as ethylene and dimethyl carbonate, as the electrolyte. The electrolyte serves as a medium through which lithium salt moves from one electrode to the other in the battery, shuttling electricity. When it’s cold, the carbonates thicken, which lowers the power of the battery. They can also freeze, which shuts down all conductivity. South 8 swapped out the carbonate for some industrial liquified gases with low freezing points (a recipe the company won’t disclose). Using liquified gases also reduces fire risk because the gas very quickly evaporates from a damaged battery cell, removing fuel that could burn and cause the battery to catch fire. If a conventional Li-ion battery gets damaged, it can short-circuit and quickly become hot—like over 800 °C hot. This causes the liquid electrolyte to heat adjacent cells and potentially start a fire. There’s another benefit to this battery, and this one will make EV drivers very happy: It will take only 10 minutes to reach an 80 percent charge in EVs powered by these batteries, Rustomji estimates. That’s because liquified gas has a lower viscosity than carbonate-based electrolytes, which allows the lithium salt to move from one electrode to the other at a faster rate, shortening the time it takes to recharge the battery. South 8’s latest improvement is a high-voltage cathode that reduces material costs and could enable fast charging down to 5 minutes for a full charge. “We have the world record for a high-voltage, low-temperature cathode,” says Rustomji.
Liquid cooling won’t leak on servers
Photo: Chilldyne guarantees that its liquid-cooling system won’t leak even if tubes get hacked in half, as IEEE Spectrum editor Emily Waltz demonstrates at the 2024 ARPA-E Energy Innovation Summit. Credit: Emily Waltz.
Data centers need serious cooling technologies to keep servers from overheating, and sometimes air-conditioning just isn’t enough.
In fact, the latest Blackwell chips from Nvidia require liquid cooling, which is more energy efficient than air. But liquid cooling tends to make data-center operators nervous. “A bomb won’t do as much damage as a leaky liquid-cooling system,” says Steve Harrington, CEO of Chilldyne. His company, based in Carlsbad, Calif., offers liquid cooling that’s guaranteed not to leak, even if the coolant lines get chopped in half. (They aren’t kidding: Chilldyne brought an axe to its demonstration at ARPA-E and let Spectrum try it out. Watch the blue cooling liquid immediately disappear from the tube after it’s chopped.) The system is leakproof because Chilldyne’s negative-pressure system pulls rather than pushes liquid coolant through tubes, like a vacuum. The tubes wind through servers, absorbing heat through cold plates, and return the warmed liquid to tanks in a cooling distribution unit. This unit transfers the heat outside and supplies cooled liquid back to the servers. If a component anywhere in the cooling loop breaks, the liquid is immediately sucked back into the tanks before it can leak. Key to the technology: low-thermal-resistance cold plates attached to each server’s processors, such as the CPUs or GPUs. The cold plates absorb heat by convection, transferring the heat to the coolant tube that runs through it. Chilldyne optimized the cold plate using corkscrew-shaped metal channels, called turbulators, that force water around them “like little tornadoes,” maximizing the heat absorbed, says Harrington. The company developed the cold plate under an ARPA-E grant and is now measuring the energy savings of liquid cooling through an ARPA-E program.
Salvaged mining waste also sequesters CO2
Photo: Phoenix Tailings’ senior research scientist Rita Silbernagel explains how mining waste contains useful metals and rare earth elements and can also be used as a place to store carbon dioxide. Credit: Emily Waltz.
Mining leaves behind piles of waste after the commercially viable material is extracted. This waste, known as tailings, can contain rare earth elements and valuable metals that are too difficult to extract with conventional mining techniques. Phoenix Tailings—a startup based in Woburn, Mass.—extracts metals and rare earth elements from tailings in a process that leaves behind no waste and creates no direct carbon dioxide emissions. The company’s process starts with a hydrometallurgical treatment that separates rare earth elements from the tailings, which contain iron, aluminum, and other common elements. Next the company uses a novel solvent extraction method to separate the rare earth elements from one another and purify the desired element in the form of an oxide. The rare earth oxide then undergoes a molten-salt electrolysis process that converts it into a solid metal form. Phoenix Tailings focuses on extracting neodymium, neodymium-praseodymium alloy, dysprosium, and ferro dysprosium alloy, which are rare earth metals used in permanent magnets for EVs, wind turbines, jet engines, and other applications. The company is evaluating several tailings sites in the United States, including in upstate New York. The company has also developed a process to extract metals such as nickel, copper, and cobalt from mining tailings while simultaneously sequestering carbon dioxide. The approach involves injecting CO2 into the tailings, where it reacts with minerals, transforming them into carbonates—compounds that contain the carbonate ion, which contains three oxygen atoms and one carbon atom.
After the mineral carbonation process, the nickel or other metals are selectively leached from the mixture, yielding high-quality nickel that can be used by EV-battery and stainless-steel industries. Better still, this whole process, says Rita Silbernagel, senior research scientist at Phoenix Tailings, absorbs more CO2 than it emits.
Hydrokinetic turbines: a new business model
Photo: Emrgy adjusts the height of its hydrokinetic turbines at the 2024 ARPA-E Energy Innovation Summit. The company plans to install them in old irrigation channels to generate renewable energy and new revenue streams for rural communities. Credit: Emily Waltz.
These hydrokinetic turbines run in irrigation channels, generating electricity and revenue for rural communities. Developed by Emrgy in Atlanta, the turbines can change in height and blade pitch based on the flow of the water. The company plans to put them in irrigation channels that were built to bring water from snowmelt in the Rocky Mountains to agricultural areas in the western United States. Emrgy estimates that there are more than 160,000 kilometers of these waterways in the country. The system is aging and losing water, but it’s hard for water districts to justify the cost of repairing them, says Tom Cuthbert, chief technology officer at Emrgy. The company’s solution is to place its hydrokinetic turbines throughout these waterways as a way to generate renewable electricity and pay for upgrades to the irrigation channels. The concept of placing hydrokinetic turbines in waterways isn’t new, but until recent years, connecting them to the grid wasn’t practical. Emrgy’s timing takes advantage of the groundwork laid by the solar power industry. The company has five pilot projects in the works in the United States and New Zealand. “We found that existing water infrastructure is a massive overlooked real estate segment that is ripe for renewable energy development,” says Emily Morris, CEO and founder of Emrgy.
Pressurized water stores energy deep underground
Photo: Quidnet Energy brought a wellhead to the 2024 ARPA-E Energy Innovation Summit to demonstrate its geoengineered energy-storage system. Credit: Emily Waltz.
Quidnet Energy brought a whole wellhead to the ARPA-E summit to demonstrate its underground pumped hydro storage technique. The Houston-based company’s geoengineered system stores energy as pressurized water deep underground. It consists of a surface-level pond, a deep well, an underground reservoir at the end of the well, and a pump system that moves pressurized water from the pond to the underground reservoir and back. The design doesn’t require an elevation change like traditional pumped storage hydropower. It works like this: Electricity from renewable sources powers a pump that sends water from the surface pond into a wellhead and down a well that’s about 300 meters deep. At the end of the well, the pressure from the pumped water flows into a previously engineered fracture in the rock, creating a reservoir that’s hundreds of meters wide and sits beneath the weight of the whole column of rock above it, says Bunker Hill, vice president of engineering at Quidnet. The wellhead then closes and the water remains under high pressure, keeping energy stored in the reservoir for days if necessary. When electricity is needed, the well is opened, letting the pressurized water run up the same well. Above ground, the water passes through a hydroelectric turbine, generating 2 to 8 megawatts of electricity. The spent water then returns to the surface pond, ready for the next cycle. “The hard part is making sure the underground reservoir doesn’t lose water,” says Hill.
To that end, the company developed customized sealing solutions that get injected into the fracture, sealing in the water.

  • 1-bit LLMs Could Solve AI’s Energy Demands
    by Matthew Hutson on 30. May 2024. at 18:28

    Large language models, the AI systems that power chatbots like ChatGPT, are getting better and better—but they’re also getting bigger and bigger, demanding more energy and computational power. To be cheap, fast, and environmentally friendly, LLMs will need to shrink, ideally becoming small enough to run directly on devices like cellphones. Researchers are finding ways to do just that by drastically rounding off the many high-precision numbers that store their memories to equal just 1 or -1. LLMs, like all neural networks, are trained by altering the strengths of connections between their artificial neurons. These strengths are stored as mathematical parameters. Researchers have long compressed networks by reducing the precision of these parameters—a process called quantization—so that instead of taking up 16 bits each, they might take up 8 or 4. Now researchers are pushing the envelope to a single bit.
How to Make a 1-bit LLM
There are two general approaches. One approach, called post-training quantization (PTQ), is to quantize the parameters of a full-precision network. The other approach, quantization-aware training (QAT), is to train a network from scratch to have low-precision parameters. So far, PTQ has been more popular with researchers. In February, a team including Haotong Qin at ETH Zurich, Xianglong Liu at Beihang University, and Wei Huang at the University of Hong Kong introduced a PTQ method called BiLLM. It approximates most parameters in a network using 1 bit, but represents a few salient weights—those most influential to performance—using 2 bits. In one test, the team binarized a version of Meta’s LLaMA LLM that has 13 billion parameters.
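As a rough illustration of the simplest form of post-training binarization, the sketch below replaces each weight in a row with its sign and keeps one shared full-precision scale for the row. The mean-absolute-value scale and the function names (`binarize`, `dequantize`, `storage_bytes`) are assumptions chosen for illustration. This is not BiLLM's actual procedure, which also handles salient weights separately.

```python
# Illustrative sketch only: sign-based 1-bit quantization of one row of weights.
# The shared scale (mean absolute value) is an assumed, commonly used choice.

def binarize(weights):
    """Map each weight to +1 or -1 and return the signs plus one shared scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale

def dequantize(signs, scale):
    """Approximate the original weights from the 1-bit signs and the scale."""
    return [s * scale for s in signs]

def storage_bytes(num_params, bits_per_param):
    """Raw parameter storage in bytes, ignoring scales and other metadata."""
    return num_params * bits_per_param / 8

weights = [0.5, -0.25, 0.75, -1.0]
signs, scale = binarize(weights)
approx = dequantize(signs, scale)

print(signs, scale, approx)
print(storage_bytes(13_000_000_000, 16) / 1e9, "GB at 16 bits")
print(storage_bytes(13_000_000_000, 1) / 1e9, "GB at 1 bit")
```

Under these assumptions, the raw parameters of a 13-billion-parameter model shrink from about 26 GB at 16 bits per weight to about 1.6 GB at 1 bit, before counting scales or any weights kept at higher precision.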
To score performance, the researchers used a metric called perplexity, which is basically a measure of how surprised the trained model was by each ensuing piece of text. For one dataset, the original model had a perplexity of around 5, and the BiLLM version scored around 15, much better than the closest binarization competitor, which scored around 37 (for perplexity, lower numbers are better). That said, the BiLLM model required about a tenth of the memory capacity of the original. PTQ has several advantages over QAT, says Wanxiang Che, a computer scientist at Harbin Institute of Technology, in China. It doesn’t require collecting training data, it doesn’t require training a model from scratch, and the training process is more stable. QAT, on the other hand, has the potential to make models more accurate, since quantization is built into the model from the beginning.
1-bit LLMs Find Success Against Their Larger Cousins
Last year, a team led by Furu Wei and Shuming Ma, at Microsoft Research Asia, in Beijing, created BitNet, the first 1-bit QAT method for LLMs. After fiddling with the rate at which the network adjusts its parameters, in order to stabilize training, they created LLMs that performed better than those created using PTQ methods. They were still not as good as full-precision networks, but roughly 10 times as energy efficient. In February, Wei’s team announced BitNet 1.58b, in which parameters can equal -1, 0, or 1, which means they take up roughly 1.58 bits of memory per parameter. A BitNet model with 3 billion parameters performed just as well on various language tasks as a full-precision LLaMA model with the same number of parameters and amount of training, but it was 2.71 times as fast, used 72 percent less GPU memory, and used 94 percent less GPU energy.
Wei called this an “aha moment.” Further, the researchers found that as they trained larger models, efficiency advantages improved. This year, a team led by Che, of Harbin Institute of Technology, released a preprint on another LLM binarization method, called OneBit. OneBit combines elements of both PTQ and QAT. It uses a full-precision pretrained LLM to generate data for training a quantized version. The team’s 13-billion-parameter model achieved a perplexity score of around 9 on one dataset, versus 5 for a LLaMA model with 13 billion parameters. Meanwhile, OneBit occupied only 10 percent as much memory. On customized chips, it could presumably run much faster. Wei, of Microsoft, says quantized models have multiple advantages. They can fit on smaller chips, they require less data transfer between memory and processors, and they allow for faster processing. Current hardware can’t take full advantage of these models, though. LLMs often run on GPUs like those made by Nvidia, which represent weights using higher precision and spend most of their energy multiplying them. New hardware could natively represent each parameter as a -1 or 1 (or 0), and then simply add and subtract values and avoid multiplication. “One-bit LLMs open new doors for designing custom hardware and systems specifically optimized for 1-bit LLMs,” Wei says. “They should grow up together,” Huang, of the University of Hong Kong, says of 1-bit models and processors. “But it’s a long way to develop new hardware.”
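The add-and-subtract idea can be sketched in a few lines. The example below is a minimal illustration, assuming a simple rounding rule (divide by the mean absolute weight, round, then clip to -1, 0, or 1). The function names are hypothetical and the code is not taken from the BitNet implementation. It also shows where the 1.58 figure comes from: a parameter with three possible values carries log2(3) bits of information.

```python
import math

def ternary_quantize(weights):
    """Quantize weights to -1, 0, or +1 using an assumed mean-absolute-value scale."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0  # guard against all-zero rows
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

def ternary_dot(quantized, inputs):
    """Dot product with ternary weights using only additions and subtractions."""
    total = 0.0
    for w, x in zip(quantized, inputs):
        if w == 1:
            total += x
        elif w == -1:
            total -= x
        # a weight of 0 contributes nothing, so it is skipped
    return total

weights = [0.9, -0.1, -1.2, 0.4]
inputs = [2.0, 5.0, 3.0, 4.0]

quantized, scale = ternary_quantize(weights)
result = ternary_dot(quantized, inputs)  # multiply by `scale` once at the end to rescale
bits_per_param = math.log2(3)

print(quantized, result, round(bits_per_param, 2))
```

The only multiplication left is the single rescaling by `scale` per row, which is why hardware built around ternary weights could, in principle, replace most multiply units with adders.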

  • Build Long-Range IoT Applications Fast With Meshtastic
    by Stephen Cass on 29. May 2024. at 15:00

    Oh me, oh mesh! Many journalists in this business have at least one pet technology that’s never taken off in the way they think it should. Hypersonic passenger planes, deep-sea thermal-energy power plants, chording keyboards—all have their adherents, eager to jump at the chance of covering their infatuation. For me, it’s mesh radio systems, which first captivated me while I was zipping around downtown Las Vegas back in 2004. In that pre-smartphone, practically pre-3G era, I was testing a mesh network deployed by a local startup, downloading files at what was then a mind-boggling rate of 1.5 megabits per second in a moving car. Clearly, mesh and its ad hoc decentralized digital architecture were the future of wireless comms! Alas, in the two decades since, mesh networking has been slow to displace conventional radio systems. It’s popped up on a small scale in things like the Zigbee wireless protocol for the Internet of Things, and in recent years it’s become common to see Wi-Fi networks extended using mesh-based products such as the Eero. But it’s still a technology that I think has yet to fulfill its potential. So I’ve been excited to see the emergence of the open-source Meshtastic protocol, and the proliferation of maker-friendly hardware around it. I had to try it out myself. Meshtastic is built on top of the increasingly popular LoRa (long-range) technology, which relies on spread-spectrum methods to send low-power, low-bandwidth signals over distances up to about 16 kilometers (in perfect conditions) using unlicensed radio bands. Precise frequencies vary by region, but they’re in the 863- to 928-megahertz range. You’re not going to use a Meshtastic network for 1.5-Mb/s downloads, or even voice communications. But you can use it to exchange text messages, location data, and the like in the absence of any other communications infrastructure. 

The stand-alone communicator [bottom of illustration] can be ordered assembled, or you can build your own from open-source design files. The RAKwireless Meshtastic development board is based around plug-in modules, including the carrier board, an environmental sensor, I/O expander board, radio module, OLED screen, and LoRa and Bluetooth modules. (Illustration: James Provost)

To test out text messaging, I bought three HelTXT handheld communicators for US $85 each on Tindie. These are essentially just a battery, keyboard, small screen, ESP32-based microcontroller, and a LoRa radio in a 3D-printed enclosure. My original plan was to coerce a couple of my fellow IEEE Spectrum editors to spread out around Manhattan to get a sense of the range of the handhelds in a dense urban environment. By turning an intermediate device on and off, we would demonstrate the relaying of signals between handhelds that would otherwise be out of range of each other. This plan was rendered moot within a few minutes of turning the handhelds on. A test “hello” transmission was greeted by an unexpected “hey.” The handhelds’ default setting is to operate on a public channel, and my test message had been received by somebody with a Meshtastic setup about 4 kilometers away, across the East River. Then I noticed my handheld had detected a bunch of other Meshtastic nodes, including one 5 km away at the southern tip of Manhattan. Clearly, range was not going to be an issue, even with a forest of skyscrapers blocking the horizon. Indeed, given the evident popularity of Meshtastic, it was going to be impossible to test the communicators in isolation! (Two Spectrum editors live in Minnesota, so I hope to persuade them to try the range tests with fewer Meshtastic users per square kilometer.) I turned to my next test idea—exchanging real-time data and commands via the network. I bought a $25 WisBlock Meshtastic starter kit from RAKwireless, which marries a LoRa radio/microcontroller and an expansion board. 
This board can accommodate a selection of cleverly designed and inexpensive plug-in hardware modules, including sensors and displays. The radio has both LoRa and Bluetooth antennas, and there’s a nice smartphone app that uses the Bluetooth connection to relay text messages through the radio and configure many settings. You can also configure the radios via a USB cable and a Python command-line-interface program. In addition to basic things like establishing private encrypted channels, you can enable a number of software modules in the firmware. These modules are designed to accomplish common tasks, such as periodically reading and transmitting data from an attached environmental sensor plug-in. Probably the most useful software module is the serial module, which lets the Meshtastic hardware act as a gateway between the radio network and a second microcontroller running your own custom IoT application, communicating via a two- or three-wire connection.

The Meshtastic protocol has seen significant evolution. In the initial system, any node that heard a broadcast would rebroadcast it, leading to local congestion [top row]. But now, signal strength is used as a proxy for distance, with more-distant nodes broadcasting first. Nodes that hear a broadcast twice will not rebroadcast it, reducing congestion [bottom row]. (Illustration: James Provost)

For my demo, I wired up a button and an LED to an Adafruit Grand Central board running CircuitPython. (I chose this board because its 3.3-volt levels are compatible with the RAKwireless hardware.) I programmed the Grand Central to send an ASCII-encoded message to the RAKwireless radio over a serial connection when I pressed the button, and to illuminate the LED if it received an ASCII string containing the word “btn.” On the radio side, I used a plug-in I/O expander to connect the serial transmit and receive wires. The tricky part was mapping the pin names as labeled on the adapter with the corresponding microcontroller pins. 
You need to know the microcontroller pins when setting up the receive and transmit pins with the serial module, as it doesn’t know how the adapter is set up. But after some paging through the documentation, I eventually found the mapping. I pressed the button connected to my Grand Central microcontroller, and “button down” instantly popped up on my handheld communicators. Then I sent “btn,” and the LED lit up. Success! With that proof of concept done, pretty much anything else is doable as well. Will makers building applications on top of Meshtastic lead to the mesh renaissance I’ve been waiting for? With more hands on deck, I hope to see some surprising uses emerge that will make the case for mesh better than any starry-eyed argument from me.
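
The logic of the button-and-LED demo described above can be sketched in ordinary Python. This is not the code that ran on the Grand Central: the class names, the trailing newline on the outgoing message, and the `FakeUart` stand-in are all assumptions made so the example runs on a desktop. On real hardware the `uart` object would be a serial port connected to the radio's serial module, and the LED state would drive an output pin.

```python
class FakeUart:
    """Hypothetical stand-in for a serial port; records what is written."""

    def __init__(self):
        self.sent = []

    def write(self, data: bytes) -> None:
        self.sent.append(data)


class DemoNode:
    """Sketch of the demo logic: a button sends text, "btn" lights the LED."""

    def __init__(self, uart):
        self.uart = uart
        self.led_on = False

    def button_pressed(self) -> None:
        # Send an ASCII message for the radio to relay over the mesh.
        # The newline terminator is an assumption, not from the article.
        self.uart.write("button down\n".encode("ascii"))

    def handle_incoming(self, data: bytes) -> bool:
        # Light the LED if the received ASCII text contains the word "btn".
        text = data.decode("ascii", errors="ignore")
        if "btn" in text:
            self.led_on = True
        return self.led_on


if __name__ == "__main__":
    uart = FakeUart()
    node = DemoNode(uart)
    node.button_pressed()
    print(uart.sent)
    print(node.handle_incoming(b"hello"), node.handle_incoming(b"btn"))
```

Note that the outgoing text "button down" does not itself contain "btn", so a node hearing its own message echoed back would not switch on its LED by accident.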

  • Using AI to Clear Land Mines in Ukraine
    by Eliza Strickland on 29. May 2024. at 09:00

    Stephen Cass: Hello. I’m Stephen Cass, Special Projects Director at IEEE Spectrum. Before starting today’s episode hosted by Eliza Strickland, I wanted to give you all listening out there some news about this show. This is our last episode of Fixing the Future. We’ve really enjoyed bringing you some concrete solutions to some of the world’s toughest problems, but we’ve decided we’d like to be able to go deeper into topics than we can in the course of a single episode. So we’ll be returning later in the year with a program of limited series that will enable us to do those deep dives into fascinating and challenging stories in the world of technology. I want to thank you all for listening and I hope you’ll join us again. And now, on to today’s episode. Eliza Strickland: Hi, I’m Eliza Strickland for IEEE Spectrum‘s Fixing the Future podcast. Before we start, I want to tell you that you can get the latest coverage from some of Spectrum’s most important beats, including AI, climate change, and robotics, by signing up for one of our free newsletters. Just go to to subscribe. Around the world, about 60 countries are contaminated with land mines and unexploded ordnance, and Ukraine is the worst off. Today, about a third of its land, an area the size of Florida, is estimated to be contaminated with dangerous explosives. My guest today is Gabriel Steinberg, who co-founded both the nonprofit Demining Research Community and the startup Safe Pro AI with his friend, Jasper Baur. Their technology uses drones and artificial intelligence to radically speed up the process of finding land mines and other explosives. Okay, Gabriel, thank you so much for joining me on Fixing the Future today. Gabriel Steinberg: Yeah, thank you for having me. Strickland: So I want to start by hearing about the typical process for demining, and so the standard operating procedure. What tools do people use? How long does it take? What are the risks involved? All that kind of stuff. Steinberg: Sure. 
So humanitarian demining hasn’t changed significantly. There’s been evolutions, of course, since its inception and about the end of World War I. But mostly, the processes have been the same. People stand from a safe location and walk around an area in areas that they know are safe, and try to get as much intelligence about the contamination as they can. They ask villagers or farmers, people who work around the area and live around the area, about accidents and potential sightings of minefields and former battle positions and stuff. The result of this is a very general idea, a polygon, of where the contamination is. After that polygon and some prioritization based on danger to civilians and economic utility, the field goes into clearance. The first part is the non-technical survey, and then this is clearance. Clearance happens one of three ways, usually, but it always ends up with a person on the ground basically doing extreme gardening. They dig out a certain standard amount of the soil, usually 13 centimeters. And with a metal detector, they walk around the field and a mine probe. They find the land mines and nonexploded ordnance. So that always is how it ends. To get to that point, you can also use mechanical assets, which are large tillers, and sometimes dogs and other animals are used to walk in lanes across the contaminated polygon to sniff out the land mines and tell the clearance operators where the land mines are. Strickland: How do you hope that your technology will change this process? Steinberg: Well, my technology is a drone-based mapping solution, basically. So we provide a software to the humanitarian deminers. They are already flying drones over these areas. Really, it started ramping up in Ukraine. The humanitarian demining organizations have started really adopting drones just because it’s such a massive problem. The extent is so extreme that they need to innovate. 
So we provide AI and mapping software for the deminers to analyze their drone imagery much more effectively. We hope that this process, or our software, will decrease the amount of time that deminers use to analyze the imagery of the land, thereby more quickly and more effectively constraining the areas with the most contamination. So if you can constrain an area, a polygon with a certainty of contamination and a high density of contamination, then you can deploy the most expensive parts of the clearance process, which are the humans and the machines and the dogs. You can deploy them to a very specific area. You can much more cost-effectively and efficiently demine large areas. Strickland: Got it. So it doesn’t replace the humans walking around with metal detectors and dogs, but it gets them to the right spots faster. Steinberg: Exactly. Exactly. At the moment, there is no conception of replacing a human in demining operations, and people that try to push that eventuality are usually disregarded pretty quickly. Strickland: How did you and your co-founder, Jasper, first start experimenting with the use of drones and AI for detecting explosives? Steinberg: So it started in 2016 with my partner, Jasper Baur, doing a research project at Binghamton University in the remote sensing and geophysics lab. And the project was to detect a specific anti-personnel land mine, the PFM-1. Then found— it’s a Russian-made land mine. It was previously found in Afghanistan. It still is found in Afghanistan, but it’s found in much higher quantities right now in Ukraine. And so his project was to detect the PFM-1 anti-personnel land mine using thermal imagery from drones. It sort of snowballed into quite an intensive research project. It had multiple papers from it, multiple researchers, some awards, and most notably, it beat NASA at a particular Tech Briefs competition. So that was quite a morale boost. And at some point, Jasper had the idea to integrate AI into the project. 
Rightfully, he saw the real bottleneck as not the detecting of land mines in drone imagery, but the analysis of land mines in drone imagery. And that really has become— I mean, he knew, somehow, that that would really become the issue that everybody is facing. And everybody we talked to in Ukraine is facing that issue. So machine learning really was the key for solving that problem. And I joined the project in 2018 to integrate machine learning into the research project. We had some more papers, some more presentations, and we were nearing the end of our college tenure, of our undergraduate degree, in 2020. So at that time– but at that time, we realized how much the field needed this. We started getting more and more into the mine action field, and realizing how neglected the field was in terms of technology and innovation. And we felt an obligation to bring our technology, really, to the real world instead of just a research project. There were plenty of research projects about this, but we knew that it could be more and that it should. It really should be more. And we felt we had the– for some reason, we felt like we had the capability to make that happen. So we formed a nonprofit, the Demining Research Community, in 2020 to try to raise some funding for this project. Our for-profit end of that, of our endeavors, was acquired by a company called Safe Pro Group in 2023. Yeah, 2023, about one year ago exactly. And the drone and AI technology became Safe Pro AI and our flagship product, Spotlight. And that’s where we’re bringing the technology to the real world. The Demining Research Community is providing resources for other organizations who want to do a similar thing, and is doing more research into more nascent technologies. But yeah, the real drone and AI stuff that’s happening in the real world right now is through Safe Pro. Strickland: So in that early undergraduate work, you were using thermal sensors. I know now the Spotlight AI system is using more visual. 
Can you talk about the different modalities of sensing explosives and the sort of trade-offs you get with them? Steinberg: Sure. So I feel like I should preface this by saying the more high tech and nascent the technology is, the more people want to see it apply to land mine detection. But really, we have found from the problems that people are facing, by far the most effective modality right now is just visual imagery. People have really good visual sensors built into their face, and you don’t need a trained geophysicist to observe the data and very, very quickly get actionable intelligence. There’s also plenty of other benefits. It’s cheaper, much more readily accessible in Ukraine and around the world to get built-in visual sensors on drones. And yeah, just processing the data, and getting the intelligence from the data, is way easier than anything else. I’ll talk about three different modalities. Well, I guess I could talk about four. There’s thermal, ground penetrating radar, magnetometry, and lidar. So thermal is what we started with. Thermal is really good at detecting living things, as I’m sure most people can surmise. But it’s also pretty good at detecting land mines, mostly large anti-tank land mines buried under a couple millimeters, or up to a couple centimeters, of soil. It’s not super good at this. The research is still not super conclusive, and you have to do it at a very specific time of day, in the morning and at night when, basically the soil around the land mine heats up faster than the land mine and you cause a thermal anomaly, or the sun causes a thermal anomaly. So it can detect things, land mines, in some amount of depth in certain soils, in certain weather conditions, and can only detect certain types of land mines that are big and hefty enough. So yeah, that’s thermal. Ground penetrating radar is really good for some things. It’s not really great for land mine detection. You have to have really expensive equipment. 
It takes a really long time to do the surveys. However, it can get plastic land mines under the surface. And it’s kind of the only modality that can do that with reliability. However, you need to train geophysicists to analyze the data. And a lot of the time, the signatures are really non-unique and there’s going to be a lot of false positives. Magnetometry is the other-- by the way, all of this is airborne that I’m referring to. Ground-based GPR and magnetometry are used in demining of various types, but airborne is really what I’m talking about. For magnetometry, it’s more developed and more capable than ground penetrating radar. It’s used, actually, in the field in Ukraine in some scenarios, but it’s still very expensive. It needs a trained geophysicist to analyze the data, and the signatures are non-unique. So whether it’s a bottle can or a small anti-personnel land mine, you really don’t know until you dig it up. However, I think if I were to bet on one of the other modalities becoming increasingly useful in the next couple of years, it would be airborne magnetometry. Lidar is another modality that people use. It’s pretty quick, also very expensive, but it can reliably map and find surface anomalies. So if you want to find former fighting positions, sometimes an indicator of that is a trench line or foxholes. Lidar is really good at doing that in conflicts from long ago. So there’s a paper that the HALO Trust published of flying a lidar mission over former fighting positions, I believe, in Angola. And they reliably found a former trench line. And from that information, they confirmed that as a hazardous area. Because if there is a former front line on this position, you can pretty reliably say that there is going to be some explosives there. Strickland: And so you’ve done some experiments with some of these modalities, but in the end, you found that the visual sensor was really the best bet for you guys? Steinberg: Yeah. It’s different. 
The requirements are different for different scenarios and different locations, really. Ukraine has a lot of surface ordnance. Yeah. And that’s really the main factor that allows visual imagery to be so powerful. Strickland: So tell me about what role machine learning plays in your Spotlight AI software system. Did you create a model trained on a lot of— did you create a model based on a lot of data showing land mines on the surface? Steinberg: Yeah. Exactly. We used real-world data from inert, non-explosive items, and flew drone missions over them, and did some physical augmentation and some programmatic augmentation. But all of the items that we are training on are real-life Russian or American ordnance, mostly. We’re also using the real-world data in real minefields that we’re getting from Ukraine right now. That is, obviously, the most valuable data and the most effective in building a machine learning model. But yeah, a lot of our data is from inert explosives, as well. Strickland: So you’ve talked a little bit about the current situation in Ukraine, but can you tell me more about what people are dealing with there? Are there a lot of areas where the battle has moved on and civilians are trying to reclaim roads or fields? Steinberg: Yeah. So the fighting is constantly ongoing, obviously, in eastern Ukraine, but I think sometimes there’s a perspective of a stalemate. I think that’s a little misleading. There’s lots of action and violence happening on the front line, which constantly contaminates, cumulatively, the areas that are the front line and the gray zone, as well as areas up to 50 kilometers back from both sides. So there’s constantly artillery shells going into villages and cities along the front line. There’s constantly land mines, new mines, being laid to reinforce the positions. And there’s constantly mortars. And everything is constant. 
In some fights—I just watched the video yesterday—one of the soldiers said you could not count to five without an explosion going off. And this is just one location in one city along the front. So you can imagine the amount of explosive ordnance that are being fired, and inevitably 10, 20, 30 percent of them are sometimes not exploding upon impact, on top of all the land mines that are being purposely laid and not detonating from a vehicle or a person. These all just remain after the war. They don’t go anywhere. So yeah, Ukraine is really being littered with explosive ordnance and land mines every day. This past year, there hasn’t been terribly much movement on the front line. But in the Ukrainian counteroffensive in 2020— I guess the last major Ukrainian counteroffensive where areas of Mykolaiv, which is in the southeast, were reclaimed, the civilians started repopulating the city almost immediately. There are definitely some villages that are heavily contaminated, that people just deserted and never came back to, and still haven’t come back to after them being liberated. But a lot of the areas that have been liberated, they’re people’s homes. And even if they’re destroyed, people would rather be in their homes than be refugees. And I mean, I totally understand that. And it just puts the responsibility on the deminers and the Ukrainian government to try to clear the land as fast as possible. Because after large liberations are made, people want to come back almost all the time. So it is a very urgent problem as the lines change and as land is liberated. Strickland: And I think it was about a year ago that you and Jasper went to the Ukraine for a technology demonstration set up by the United Nations. Can you tell about that, and what the task was, and how your technology fared? Steinberg: Sure. 
So yeah, the United Nations Development Program invited us to do a demonstration in northern Ukraine to see how our technology, and other technologies similar to it, performed in a military training facility in Ukraine. So everybody who’s doing this kind of thing, which is not many people, but there are some other organizations, they have their own metrics and their own test fields— not always, but it would be good if they did. But the UNDP said, “No, we want to standardize this and try to give recommendations to the organizations on the ground who are trying to adopt these technologies.” So we had five hours to survey the field and collect as much data as we could. And then we had 72 hours to return the results. We— Strickland: Sorry. How big was the field? Steinberg: The field was 25 hectares. So yeah, the audience at home can type 25 hectares to amount of football fields. I think it’s about 60. But it’s a large area. So we’d never done anything like that. That was really, really a shock that it was that large of an area. I think we’d only done half a hectare at a time up to that point. So yeah, it was pretty daunting. But we basically slept very, very little in those 72 hours, and as a result, produced what I think is one of the best results that the UNDP got from that test. We didn’t detect everything, but we detected most of the ordnance and land mines that they had laid. We also detected some that they didn’t know were there because it was a military training facility. So there were some mortars being fired that they didn’t know about. Strickland: And I think Jasper told me that you had to sort of rewrite your software on the fly. You realized that the existing approach wasn’t going to work and you had to do some all-nighter to recode? Steinberg: Yeah. Yeah, I remember us sitting in a Georgian restaurant— Georgia, the country, not the state, and racking our brain, trying to figure out how we were going to map this amount of land. 
We just found out how big the area was going to be and we were a little bit stunned. So we devised a plan to do it in two stages. The first stage was where we figured out in the drone images where the contaminated regions were. And then the second stage was to map those areas, just those areas. Now, our software can actually map the whole thing, and pretty casually too. So not to brag. But at the time, we had lots less development under our belt. And yeah, therefore we just had to brute force it through Georgian food and brainpower. Strickland: You and Jasper just got back from another trip to the Ukraine a couple of weeks ago, I think. Can you talk about what you were doing on this trip, and who you met with? Steinberg: Sure. This trip was much less stressful, although stressful in different ways than the UNDP demo. Our main objectives were to see operations in action. We had never actually been to real minefields before. We’d been in some perhaps contaminated areas, but never in a real minefield where you can say, “Here was the Russian position. There are the land mines. Do not go there.” So that was one of the main objectives. That was very powerful for us to see the villages that were destroyed and are denied to the citizens because of land mines and unexploded ordnance. It’s impossible to describe how that feels being there. It’s really impactful, and it makes the work that I’m doing feel not like I have a choice anymore. I feel very much obligated to do my absolute best to help these people. Strickland: Well, I hope your work continues. I hope there’s less and less need for it over time. But yeah, thank you for doing this. It’s important work. And thanks for joining me on Fixing the Future. Steinberg: My pleasure. Thank you for having me. Strickland: That was Gabriel Steinberg speaking to me about the technology that he and Jasper Baur developed to help rid the world of land mines. I’m Eliza Strickland, and I hope you’ll join us next time on Fixing the Future.

  • The Forgotten History of Chinese Keyboards
    by Thomas S. Mullaney on 28. May 2024. at 19:00

    Today, typing in Chinese works by converting QWERTY keystrokes into Chinese characters via a software interface, known as an input method editor. But this was not always the case. Thomas S. Mullaney’s new book, The Chinese Computer: A Global History of the Information Age, published by the MIT Press, unearths the forgotten history of Chinese input in the 20th century. In this article, which was adapted from an excerpt of the book, he details the varied Chinese input systems of the 1960s and ’70s that renounced QWERTY altogether. “This will destroy China forever,” a young Taiwanese cadet thought as he sat in rapt attention. The renowned historian Arnold J. Toynbee was on stage, delivering a lecture at Washington and Lee University on “A Changing World in Light of History.” The talk plowed the professor’s favorite field of inquiry: the genesis, growth, death, and disintegration of human civilizations, immortalized in his magnum opus A Study of History. Tonight’s talk threw the spotlight on China. China was Toynbee’s outlier: Ancient as Egypt, it was a civilization that had survived the ravages of time. The secret to China’s continuity, he argued, was character-based Chinese script. Character-based script served as a unifying medium, placing guardrails against centrifugal forces that might otherwise have ripped this grand and diverse civilization apart. This millennial integrity was now under threat. Indeed, as Toynbee spoke, the government in Beijing was busily deploying Hanyu pinyin, a Latin alphabet–based Romanization system. The Taiwanese cadet listening to Toynbee was Chan-hui Yeh, a student of electrical engineering at the nearby Virginia Military Institute (VMI). That evening with Arnold Toynbee forever altered the trajectory of his life. 
It changed the trajectory of Chinese computing as well, triggering a cascade of events that later led to the formation of arguably the first successful Chinese IT company in history: Ideographix, founded by Yeh 14 years after Toynbee stepped offstage. During the late 1960s and early 1970s, Chinese computing underwent multiple sea changes. No longer limited to small-scale laboratories and solo inventors, the challenge of Chinese computing was taken up by engineers, linguists, and entrepreneurs across Asia, the United States, and Europe—including Yeh’s adoptive home of Silicon Valley. Chan-hui Yeh’s IPX keyboard featured 160 main keys, with 15 characters each. A peripheral keyboard of 15 keys was used to select the character on each key. Separate “shift” keys were used to change all of the character assignments of the 160 keys. Computer History Museum The design of Chinese computers also changed dramatically. None of the competing designs that emerged in this era employed a QWERTY-style keyboard. Instead, one of the most successful and celebrated systems—the IPX, designed by Yeh—featured an interface with 120 levels of “shift,” packing nearly 20,000 Chinese characters and other symbols into a space only slightly larger than a QWERTY interface. Other systems featured keyboards with anywhere from 256 to 2,000 keys. Still others dispensed with keyboards altogether, employing a stylus and touch-sensitive tablet, or a grid of Chinese characters wrapped around a rotating cylindrical interface. It’s as if every kind of interface imaginable was being explored except QWERTY-style keyboards. IPX: Yeh’s 120-dimensional hypershift Chinese keyboard Yeh graduated from VMI in 1960 with a B.S. in electrical engineering. He went on to Cornell University, receiving his M.S. in nuclear engineering in 1963 and his Ph.D. in electrical engineering in 1965. 
Yeh then joined IBM, not to develop Chinese text technologies but to draw upon his background in automatic control to help develop computational simulations for large-scale manufacturing plants, like paper mills, petrochemical refineries, steel mills, and sugar mills. He was stationed in IBM’s relatively new offices in San Jose, Calif. Toynbee’s lecture stuck with Yeh, though. While working at IBM, he spent his spare time exploring the electronic processing of Chinese characters. He felt convinced that the digitization of Chinese must be possible, that Chinese writing could be brought into the computational age. Doing so, he felt, would safeguard Chinese script against those like Chairman Mao Zedong, who seemed to equate Chinese modernization with the Romanization of Chinese script. The belief was so powerful that Yeh eventually quit his good-paying job at IBM to try and save Chinese through the power of computing. Yeh started with the most complex parts of the Chinese lexicon and worked back from there. He fixated on one character in particular: ying 鷹 (“eagle”), an elaborate graph that requires 24 brushstrokes to compose. If he could determine an appropriate data structure for such a complex character, he reasoned, he would be well on his way. Through careful analysis, he determined that a bitmap comprising 24 vertical dots and 20 horizontal dots would do the trick, taking up 60 bytes of memory, excluding metadata. By 1968, Yeh felt confident enough to take the next big step—to patent his project, nicknamed “Iron Eagle.” The Iron Eagle project quickly garnered the interest of the Taiwanese military. Four years later, with the promise of Taiwanese government funding, Yeh founded Ideographix, in Sunnyvale, Calif. A single key of the IPX keyboard contained 15 characters. 
This key contains the character zhong (中 “central”), which is necessary to spell “China.” (Image: MIT Press)

The flagship product of Ideographix was the IPX, a computational typesetting and transmission system for Chinese built upon the complex orchestration of multiple subsystems. The marvel of the IPX system was the keyboard subsystem, which enabled operators to enter a theoretical maximum of 19,200 Chinese characters despite its modest size: 59 centimeters wide, 37 cm deep, and 11 cm tall. To achieve this remarkable feat, Yeh and his colleagues decided to treat the keyboard not merely as an electronic peripheral but as a full-fledged computer unto itself: a microprocessor-controlled “intelligent terminal” completely unlike conventional QWERTY-style devices. Seated in front of the IPX interface, the operator looked down on 160 keys arranged in a 16-by-10 grid. Each key contained not a single Chinese character but a cluster of 15 characters arranged in a miniature 3-by-5 array. Those 160 keys with 15 characters on each key yielded 2,400 Chinese characters.

Typing on the IPX keyboard involved pressing on a booklet of characters to depress one of 160 keys, selecting one of 15 numbers to pick a character within the key, and using separate “shift” keys to indicate when a page of the booklet was flipped. (Image: MIT Press)

Chinese characters were not printed on the keys, the way that letters and numbers are emblazoned on the keys of QWERTY devices. The 160 keys themselves were blank. Instead, the 2,400 Chinese characters were printed on laminated paper, bound together in a spiral-bound booklet that the operator laid down flat atop the IPX interface. The IPX keys weren’t buttons, as on a QWERTY device, but pressure-sensitive pads. An operator would push down on the spiral-bound booklet to depress whichever key pad was directly underneath. 
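
The figures quoted for Yeh's design can be checked with a little arithmetic. The Python sketch below is purely illustrative: the function names and the ordering of characters (page, then key, then slot within the key) are assumptions made for the example, not a description of how the IPX actually encoded its input. It confirms that a 24-by-20 dot bitmap occupies 60 bytes and that 160 keys with 15 characters each, across enough booklet pages, reach the stated maximum of 19,200.

```python
KEYS = 16 * 10            # 160 key pads in a 16-by-10 grid
SLOTS_PER_KEY = 3 * 5     # 15 characters per key, in a 3-by-5 cluster
CHARS_PER_PAGE = KEYS * SLOTS_PER_KEY   # 2,400 characters per booklet page
MAX_CHARS = 19_200        # theoretical maximum quoted for the IPX
PAGES = MAX_CHARS // CHARS_PER_PAGE     # number of booklet pages implied


def bitmap_bytes(rows: int, cols: int) -> int:
    """Bytes needed for a one-bit-per-dot bitmap, rounded up."""
    return (rows * cols + 7) // 8


def ipx_index(page: int, key: int, slot: int) -> int:
    """Map a (page, key, slot) selection to a zero-based character index.

    The ordering is hypothetical and chosen only for this sketch.
    """
    if not (0 <= page < PAGES and 0 <= key < KEYS and 0 <= slot < SLOTS_PER_KEY):
        raise ValueError("selection out of range")
    return (page * KEYS + key) * SLOTS_PER_KEY + slot


def ipx_address(index: int):
    """Inverse of ipx_index: recover (page, key, slot) from an index."""
    if not 0 <= index < MAX_CHARS:
        raise ValueError("index out of range")
    page, rest = divmod(index, CHARS_PER_PAGE)
    key, slot = divmod(rest, SLOTS_PER_KEY)
    return page, key, slot


if __name__ == "__main__":
    print(bitmap_bytes(24, 20))       # bitmap for the "eagle" character
    print(CHARS_PER_PAGE, PAGES)
    print(ipx_index(PAGES - 1, KEYS - 1, SLOTS_PER_KEY - 1))
```

Under these assumptions, each key pad can yield 15 × 8 = 120 different characters, which may be where the description of 120 levels of “shift” comes from.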
To reach characters 2,401 through 19,200, the operator simply turned the spiral-bound booklet to whichever page contained the desired character. The booklets contained up to eight pages—and each page contained 2,400 characters—so the total number of potential symbols came to just shy of 20,000. For the first seven years of its existence, the use of IPX was limited to the Taiwanese military. As years passed, the exclusivity relaxed, and Yeh began to seek out customers in both the private and public sectors. Yeh’s first major nonmilitary clients included Taiwan’s telecommunication administration and the National Taxation Bureau of Taipei. For the former, the IPX helped process and transmit millions of phone bills. For the latter, it enabled the production of tax return documents at unprecedented speed and scale. But the IPX wasn’t the only game in town. Loh Shiu-chang, a professor at the Chinese University of Hong Kong, developed what he called “Loh’s keyboard” (Le shi jianpan 樂氏鍵盤), featuring 256 keys. Loh Shiu-chang Mainland China’s “medium-sized” keyboards By the mid-1970s, the People’s Republic of China was far more advanced in the arena of mainframe computing than most outsiders realized. In July 1972, just months after the famed tour by U.S. president Richard Nixon, a veritable blue-ribbon committee of prominent American computer scientists visited the PRC. The delegation visited China’s main centers of computer science at the time, and upon learning what their counterparts had been up to during the many years of Sino-American estrangement, the delegation was stunned. But there was one key arena of computing that the delegation did not bear witness to: the computational processing of Chinese characters. It was not until October 1974 that mainland Chinese engineers began to dive seriously into this problem. 
Soon after, in 1975, the newly formed Chinese Character Information Processing Technology Research Office at Peking University set out upon the goal of creating a “Chinese Character Information Processing and Input System” and a “Chinese Character Keyboard.” The group evaluated more than 10 proposals for Chinese keyboard designs. The designs fell into three general categories: a large-keyboard approach, with one key for every commonly used character; a small-keyboard approach, like the QWERTY-style keyboard; and a medium-size keyboard approach, which attempted to tread a path between these two poles. Peking University’s medium-sized keyboard design included a combination of Chinese characters and character components, as shown in this explanatory diagram. Public Domain The team leveled two major criticisms against QWERTY-style small keyboards. First, there were just too few keys, which meant that many Chinese characters were assigned identical input sequences. What’s more, QWERTY keyboards did a poor job of using keys to their full potential. For the most part, each key on a QWERTY keyboard was assigned only two symbols, one of which required the operator to depress and hold the shift key to access. A better approach, they argued, was the technique of “one key, many uses”— yijian duoyong—assigning each key a larger number of symbols to make the most use of interface real estate. The team also examined the large-keyboard approach, in which 2,000 or more commonly used Chinese characters were assigned to a tabletop-size interface. Several teams across China worked on various versions of these large keyboards. The Peking team, however, regarded the large-keyboard approach as excessive and unwieldy. Their goal was to exploit each key to its maximum potential, while keeping the number of keys to a minimum. 
After years of work, the team in Beijing settled upon a keyboard with 256 keys, 29 of which would be dedicated to various functions, such as carriage return and spacing, and the remaining 227 used to input text. Each keystroke generated an 8-bit code, stored on punched paper tape (hence the choice of 256, or 2⁸, keys). These 8-bit codes were then translated into a 14-bit internal code, which the computer used to retrieve the desired character. In their assignment of multiple characters to individual keys, the team’s design was reminiscent of Ideographix’s IPX machine. But there was a twist. Instead of assigning only full-bodied, stand-alone Chinese characters to each key, the team assigned a mixture of both Chinese characters and character components. Specifically, each key was associated with up to four symbols, divided among three varieties: full-body Chinese characters (limited to no more than two per key); partial Chinese character components (no more than three per key); and the uppercase symbol, reserved for switching to other languages (limited to one per key). In all, the keyboard contained 423 full-body Chinese characters and 264 character components. When arranging these 264 character components on the keyboard, the team hit upon an elegant and ingenious way to help operators remember the location of each: They treated the keyboard as if it were a Chinese character itself. The team placed each of the 264 character components in the regions of the keyboard that corresponded to the areas where they usually appeared in Chinese characters. In its final design, the Peking University keyboard was capable of inputting a total of 7,282 Chinese characters, which in the team’s estimation would account for more than 90 percent of all characters encountered on an average day. 
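The figures quoted for the Peking University design can be checked with a few lines of arithmetic. This is only a worked check of the numbers in the text, written in Python; the variable names are illustrative, and nothing here describes how the team's translation from key codes to internal codes was actually implemented.

```python
TOTAL_KEYS = 256
FUNCTION_KEYS = 29
KEY_CODE_BITS = 8         # one 8-bit code per keystroke, stored on paper tape
INTERNAL_CODE_BITS = 14   # width of the internal code used for lookup
CHARACTER_SET_SIZE = 7282

text_keys = TOTAL_KEYS - FUNCTION_KEYS        # keys left for entering text
key_codes = 2 ** KEY_CODE_BITS                # distinct 8-bit keystroke codes
internal_codes = 2 ** INTERNAL_CODE_BITS      # distinct 14-bit internal codes

# An 8-bit code distinguishes exactly as many values as there are keys,
# and a 14-bit code has room for every character the keyboard could enter.
every_key_has_a_code = key_codes == TOTAL_KEYS
internal_code_is_wide_enough = internal_codes >= CHARACTER_SET_SIZE
```

The check gives 227 text keys, 256 keystroke codes, and 16,384 possible internal codes, comfortably more than the 7,282 characters the keyboard supported.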
Within this character set, the 423 most common characters could be produced via one keystroke; 2,930 characters could be produced using two keystrokes; and a further 3,106 characters could be produced using three keystrokes. The remaining 823 characters required four or five keystrokes. The Peking University keyboard was just one of many medium-size designs of the era. IBM created its own 256-key keyboard for Chinese and Japanese. In a design reminiscent of the IPX system, this 1970s-era keyboard included a 12-digit keypad with which the operator could “shift” between the 12 full-body Chinese characters outfitted on each key (for a total of 3,072 characters in all). In 1980, Chinese University of Hong Kong professor Loh Shiu-chang developed what he called “Loh’s keyboard” (Le shi jianpan 樂氏鍵盤), which also featured 256 keys. But perhaps the strangest Chinese keyboard of the era was designed in England. The cylindrical Chinese keyboard On a winter day in 1976, a young boy in Cambridge, England, searched for his beloved Meccano set. A predecessor of the American Erector set, the popular British toy offered aspiring engineers hours of modular possibility. Andrew had played with the gears, axles, and metal plates recently, but today they were nowhere to be found. Wandering into the kitchen, he caught the thief red-handed: his father, the Cambridge University researcher Robert Sloss. For three straight days and nights, Sloss had commandeered his son’s toy, engrossed in the creation of a peculiar gadget that was cylindrical and rotating. It riveted the young boy’s attention—and then the attention of the Telegraph-Herald, which dispatched a journalist to see it firsthand. Ultimately, it attracted the attention and financial backing of the U.K. telecommunications giant Cable & Wireless. Robert Sloss was building a Chinese computer. The elder Sloss was born in 1927 in Scotland. 
He joined the British navy, and was subjected to a series of intelligence tests that revealed a proclivity for foreign languages. In 1946 and 1947, he was stationed in Hong Kong. Sloss went on to join the civil service as a teacher and later, in the British air force, became a noncommissioned officer. Owing to his pedagogical experience, his knack for language, and his background in Asia, he was invited to teach Chinese at Cambridge and appointed to a lectureship in 1972. At Cambridge, Sloss met Peter Nancarrow. Twelve years Sloss’s junior, Nancarrow trained as a physicist but later found work as a patent agent. The bearded 38-year-old then taught himself Norwegian and Russian as a “hobby” before joining forces with Sloss in a quest to build an automatic Chinese-English translation machine. In 1976, Robert Sloss and Peter Nancarrow designed the Ideo-Matic Encoder, a Chinese input keyboard with a grid of 4,356 keys wrapped around a cylinder. PK Porthcurno They quickly found that the choke point in their translator design was character input— namely, how to get handwritten Chinese characters, definitions, and syntax data into a computer. Over the following two years, Sloss and Nancarrow dedicated their energy to designing a Chinese computer interface. It was this effort that led Sloss to steal and tinker with his son’s Meccano set. Sloss’s tinkering soon bore fruit: a working prototype that the duo called the “Binary Signal Generator for Encoding Chinese Characters into Machine-compatible form”—also known as the Ideo-Matic Encoder and the Ideo-Matic 66 (named after the machine’s 66-by-66 grid of characters). Each cell in the machine’s grid was assigned a binary code corresponding to the X-column and the Y-row values. In terms of total space, each cell was 7 millimeters squared, with 3,500 of the 4,356 cells dedicated to Chinese characters. The rest were assigned to Japanese syllables or left blank. 
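To illustrate how a cell's column and row could yield a binary code, here is a minimal Python sketch. The packing shown (7 bits for the row followed by 7 bits for the column, both counted from zero) is an assumption made for illustration; the article says only that each cell's code corresponded to its X-column and Y-row values, and the names `encode_cell` and `decode_cell` are hypothetical.

```python
GRID_SIZE = 66                                 # 66-by-66 grid of cells
BITS_PER_AXIS = (GRID_SIZE - 1).bit_length()   # 7 bits cover indices 0..65
AXIS_MASK = (1 << BITS_PER_AXIS) - 1


def encode_cell(column: int, row: int) -> int:
    """Pack a 0-based (column, row) pair into one integer (assumed layout)."""
    if not (0 <= column < GRID_SIZE and 0 <= row < GRID_SIZE):
        raise ValueError("cell outside the 66-by-66 grid")
    return (row << BITS_PER_AXIS) | column


def decode_cell(code: int) -> tuple[int, int]:
    """Recover the 0-based (column, row) pair from a packed code."""
    return code & AXIS_MASK, code >> BITS_PER_AXIS


total_cells = GRID_SIZE * GRID_SIZE
```

With this layout every one of the 4,356 cells receives a distinct code that fits in 14 bits, and decoding a code returns the column and row it was built from.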
The distinguishing feature of Sloss and Nancarrow’s interface was not the grid, however. Rather than arranging their 4,356 cells across a rectangular interface, the pair decided to wrap the grid around a rotating, tubular structure. The typist used one hand to rotate the cylindrical grid and the other hand to move a cursor left and right to indicate one of the 4,356 cells. The depression of a button produced a binary signal that corresponded to the selected Chinese character or other symbol. The Ideo-Matic Encoder was completed and delivered to Cable & Wireless in the closing years of the 1970s. Weighing in at 7 kilograms and measuring 68 cm wide, 57 cm deep, and 23 cm tall, the machine garnered industry and media attention. Cable & Wireless purchased rights to the machine in hopes of mass-manufacturing it for the East Asian market. QWERTY’s comeback The IPX, the Ideo-Matic 66, Peking University’s medium-size keyboards, and indeed all of the other custom-built devices discussed here would soon meet exactly the same fate—oblivion. There were changes afoot. The era of custom-designed Chinese text-processing systems was coming to an end. A new era was taking shape, one that major corporations, entrepreneurs, and inventors were largely unprepared for. This new age has come to be known by many names: the software revolution, the personal-computing revolution, and less rosily, the death of hardware. From the late 1970s onward, custom-built Chinese interfaces steadily disappeared from marketplaces and laboratories alike, displaced by wave upon wave of Western-built personal computers crashing on the shores of the PRC. With those computers came the resurgence of QWERTY for Chinese input, along the same lines as the systems used by Sinophone computer users today—ones mediated by a software layer to transform the Latin alphabet into Chinese characters. 
This switch to typing mediated by an input method editor, or IME, did not lead to the downfall of Chinese civilization, as the historian Arnold Toynbee may have predicted. However, it did fundamentally change the way Chinese speakers interact with the digital world and their own language. This article appears in the June 2024 print issue.

  • Physics Nobel Laureate Herbert Kroemer Dies at 95
    by Amanda Davis on 28. May 2024. at 18:00

    Herbert Kroemer Nobel Laureate Life Fellow, 95; died 8 March Kroemer, a pioneering physicist, received the 2000 Nobel Prize in Physics for developing semiconductor heterostructures for high-speed and opto-electronics. The devices laid the foundation for the modern era of microchips, computers, and information technology. Heterostructures describe the interfaces between two semiconductors that serve as the building blocks between more elaborate nanostructures. He also received the 2002 IEEE Medal of Honor for “contributions to high-frequency transistors and hot-electron devices, especially heterostructure devices from heterostructure bipolar transistors to lasers, and their molecular beam epitaxy technology.” Kroemer was professor emeritus of electrical and computer engineering at the University of California, Santa Barbara, when he died. He began his career in 1952 at the telecommunications research laboratory of the German Postal Service, in Darmstadt. The postal service also ran the telephone system and had a small semiconductor research group, which included Kroemer and about nine other scientists, according to IEEE Spectrum. In the mid-1950s, he took a research position at RCA Laboratories, in Princeton, N.J. There, Kroemer originated the concept of the heterostructure bipolar transistor (HBT), a device that contains differing semiconductor materials for the emitter and base regions, creating a heterojunction. HBTs can handle high-frequency signals (up to several thousand gigahertz) and are commonly used in radio frequency systems, including RF power amplifiers in cell phones. In 1957, he returned to Germany to research potential uses of gallium arsenide at Philips Research Laboratory, in Hamburg. Two years later, Kroemer moved back to the United States to join Varian Associates, an electronics company in Palo Alto, Calif., where he invented the double heterostructure laser. 
It was the first laser to operate continuously at room temperature. The innovation paved the way for semiconductor lasers used in CD players, fiber optics, and other applications. In 1964, Kroemer became the first researcher to publish an explanation of the Gunn Effect, a high-frequency oscillation of electrical current flowing through certain semiconducting solids. The effect, first observed by J.B. Gunn in the early 1960s, produces short radio waves called microwaves. Kroemer taught electrical engineering at the University of Colorado, Boulder, from 1968 to 1976 before joining UCSB, where he led the university’s semiconductor research program. With his colleague Charles Kittel, Kroemer co-authored the 1980 textbook Thermal Physics. He also wrote Quantum Mechanics for Engineering, Materials Science, and Applied Physics, published in 1994. He was a Fellow of the American Physical Society and a foreign associate of the U.S. National Academy of Engineering. Born and educated in Germany, Kroemer received a bachelor’s degree from the University of Jena, and master’s and doctoral degrees from the University of Göttingen, all in physics. Vladimir G. “Walt” Gelnovatch Past president of the IEEE Microwave Theory and Technology Society Life Fellow, 86; died 1 March Gelnovatch served as 1989 president of the IEEE Microwave Theory and Technology Society (formerly the IEEE Microwave Theory and Techniques Society). He was an electrical engineer for nearly 40 years at the Signal Corps Laboratories, in Fort Monmouth, N.J. Gelnovatch served in the U.S. Army from 1956 to 1959. While stationed in Germany, he helped develop a long-line microwave radiotelephone network, a military telecommunications network that spanned most of Western Europe. As an undergraduate student at Monmouth University, in West Long Branch, N.J., he founded the school’s first student chapter of the Institute of Radio Engineers, an IEEE predecessor society. 
After graduating with a bachelor’s degree in electronics engineering, Gelnovatch earned a master’s degree in electrical engineering in 1967 from New York University, in New York City. Following a brief stint as a professor of electrical engineering at the University of Virginia, in Charlottesville, Gelnovatch joined the Signal Corps Engineering Laboratory (SCEL) as a research engineer. His initial work focused on developing CAD programs to help researchers design microwave circuits and communications networks. He then shifted his focus to developing mission electronics. Over the next four years, he studied vacuum technology, germanium, silicon, and semiconductors. He also spearheaded the U.S. Army’s research on monolithic microwave-integrated circuits. The integrated circuit devices operate at microwave frequencies and typically perform functions such as power amplification, low-noise amplification, and high-frequency switching. Gelnovatch retired in 1997 as director of the U.S. Army Electron Devices and Technology Laboratory, the successor to SCEL. During his career, Gelnovatch published 50 research papers and was granted eight U.S. patents. He also served as associate editor and contributor to the Microwave Journal for more than 20 years. Gelnovatch received the 1997 IEEE MTT-S Distinguished Service Award. The U.S. Army also honored him in 1990 with its highest civilian award—the Exceptional Service Award. Adolf Goetzberger Solar energy pioneer Life Fellow, 94; died 24 February Goetzberger founded the Fraunhofer Institute for Solar Energy Systems (ISE), a solar energy R&D company in Freiburg, Germany. He is known for pioneering the concept of agrivoltaics—the dual use of land for solar energy production and agriculture. After earning a Ph.D. in physics in 1955 from the University of Munich, Goetzberger moved to the United States. He joined Shockley Semiconductor Laboratory in Palo Alto, Calif., in 1956 as a researcher. 
The semiconductor manufacturer was founded by Nobel laureate William Shockley. Goetzberger later left Shockley to join Bell Labs, in Murray Hill, N.J. He moved back to Germany in 1968 and was appointed director of the Fraunhofer Institute for Applied Solid-State Physics, in Breisgau. There, he founded a solar energy working group and pushed for an independent institute dedicated to the field, which became ISE in 1981. In 1983, Goetzberger became the first German national to receive the J.J. Ebers Award from the IEEE Electron Devices Society. It honored him for developing a silicon field-effect transistor. Goetzberger also received the 1997 IEEE William R. Cherry Award, the 1989 Medal of the Merit of the State of Baden-Württemberg, and the 1992 Order of Merit First Class of the Federal Republic of Germany. Michael Barnoski Fiber optics pioneer Life senior member, 83; died 23 February Barnoski founded two optics companies and codeveloped the optical time domain reflectometer, a device that detects breaks in fiber optic cables. After receiving a bachelor’s degree in electrical engineering from the University of Dayton, in Ohio, Barnoski joined Honeywell in Boston. After 10 years at the company, he left to work at Hughes Research Laboratories, in Malibu, Calif. For a decade, he led all fiber optics–related activities for Hughes Aircraft and managed a global team of scientists, engineers, and technicians. In 1976, Barnoski collaborated with Corning Glass Works, a materials science company in New York, to develop the optical time domain reflectometer. Three years later, Theodore Maiman, inventor of the laser, recruited Barnoski to join TRW, an electronics company in Euclid, Ohio. In 1980, Barnoski founded PlessCor Optronics laboratory, an integrated electrical-optical interface supplier, in Chatsworth, Calif. He served as president and CEO until 1990, when he left and began consulting. 
In 2002, Barnoski founded Nanoprecision Products Inc., a company that specialized in ultraprecision 3D stamping, in El Segundo, Calif. In addition to his work in the private sector, Barnoski taught summer courses at the University of California, Santa Barbara, for 20 years. He also wrote and edited three books on the fundamentals of optical fiber communications. He retired in 2018. For his contributions to fiber optics, he received the 1988 John Tyndall Award, jointly presented by the IEEE Photonics Society and the Optical Society of America. Barnoski also earned a master’s degree in microwave electronics and a Ph.D. in electrical engineering and applied physics, both from Cornell. Kanaiyalal R. Shah Founder of Shah and Associates Senior member, 84; died 6 December Shah was founder and president of Shah and Associates (S&A), an electrical systems consulting firm, in Gaithersburg, Md. Shah received a bachelor’s degree in electrical engineering in 1961 from the Baroda College (now the Maharaja Sayajirao University of Baroda), in India. After earning a master’s degree in electrical machines in 1963 from Gujarat University, in India, Shah emigrated to the United States. Two years later, he received a master’s degree in electrical engineering from the University of Missouri in Rolla. In 1967, he moved to Virginia and joined the Virginia Military Institute’s electrical engineering faculty, in Lexington. He left to move to Missouri, earning a Ph.D. in EE from the University of Missouri in Columbia, in 1969. He then moved back to Virginia and taught electrical engineering for two years at Virginia Tech. From 1971 to 1973, Shah worked as a research engineer at Hughes Research Laboratories, in Malibu, Calif. He left to manage R&D at engineering services company Gilbert/Commonwealth International, in Jackson, Mich. Around this time, Shah founded S&A, where he designed safe and efficient electrical systems. 
He developed novel approaches to ensuring safety in electrical power transmission and distribution, including patenting a UV lighting power system. He also served as an expert witness in electrical safety injury lawsuits. He later returned to academia, lecturing at George Washington University and Ohio State University. Shah also wrote a series of short courses on power engineering. In 2005, he funded the construction and running of the Dr. K.R. Shah Higher Secondary School and the Smt. D.K. Shah Primary School in his hometown of Bhaner, Gujarat, in India. John Brooks Slaughter First African American director of the National Science Foundation Life Fellow, 89; died 6 December Slaughter, former director of the NSF in the early 1980s, was a passionate advocate for providing opportunities for underrepresented minorities and women in the science, technology, engineering, and mathematics fields. Later in his career, he was a distinguished professor of engineering and education at the University of Southern California Viterbi School of Engineering, in Los Angeles. He helped found the school’s Center for Engineering Diversity, which was renamed the John Brooks Slaughter Center for Engineering Diversity in 2023, as a tribute to his efforts. After earning a bachelor’s degree in engineering in 1956 from Kansas State University, in Manhattan, Slaughter developed military aircraft at General Dynamics’ Convair division in San Diego. From there, he moved on to the information systems technology department in the U.S. Navy Electronics Laboratory, also located in the city. He earned a master’s degree in engineering in 1961 from the University of California, Los Angeles. Slaughter earned his Ph.D. from the University of California, San Diego, in 1971 and was promoted to director of the Navy Electronics Laboratory on the same day he defended his dissertation, according to The Institute. 
In 1975, he left the organization to become director of the Applied Physics Laboratory at the University of Washington, in Seattle. Two years later, Slaughter was appointed assistant director in charge of the NSF’s Astronomical, Atmospheric, Earth and Ocean Sciences Division (now called the Division of Atmospheric and Geospace Sciences), in Washington, D.C. In 1979, he accepted the position of academic vice president and provost of Washington State University, in Pullman. The following year, he was appointed director of the NSF by U.S. President Jimmy Carter’s administration. Under Slaughter’s leadership, the organization bolstered funding for science programs at historically Black colleges and universities, including Howard University, in Washington, D.C. While Harvard, Stanford, and CalTech traditionally received preference from the NSF for funding new facilities and equipment, Slaughter encouraged less prestigious universities to apply and compete for those grants. He resigned just two years after accepting the post because he could not publicly support President Ronald Reagan’s initiatives to eradicate funding for science education, he told The Institute in a 2023 interview. In 1981, Slaughter was appointed chancellor of the University of Maryland, in College Park. He left in 1988 to become president of Occidental College, in Los Angeles, where he helped transform the school into one of the country’s most diverse liberal arts colleges. In 2000, Slaughter became CEO and president of the National Action Council for Minorities in Engineering, the largest provider of college scholarships for underrepresented minorities pursuing degrees at engineering schools, in Alexandria, Va. Slaughter left the council in 2010 and joined USC. He taught courses on leadership, diversity, and technological literacy at Rossier Graduate School of Education until retiring in 2022. 
Slaughter received the 2002 IEEE Founders Medal for “leadership and administration significantly advancing inclusion and racial diversity in the engineering profession across government, academic, and nonprofit organizations.” Don Bramlett Former IEEE Region 4 Director Life senior member, 73; died 2 December Bramlett served as 2009–2010 director of IEEE Region 4. He was an active volunteer with the IEEE Southeastern Michigan Section. He worked as a senior project manager for 35 years at DTE Energy, an energy services company, in Detroit. Bramlett was also active in the Boy Scouts of America (which will be known as Scouting America beginning in 2025). He served as leader of his local troop and was a council member. The Boy Scouts honored him with a Silver Beaver award recognizing his “exceptional character and distinguished service.” Bramlett earned a bachelor’s degree in electrical engineering from the University of Detroit Mercy.

  • Will Scaling Solve Robotics?
    by Nishanth J. Kumar on 28. May 2024. at 10:00

    This post was originally published on the author’s personal blog. Last year’s Conference on Robot Learning (CoRL) was the biggest CoRL yet, with over 900 attendees, 11 workshops, and almost 200 accepted papers. While there were a lot of cool new ideas (see this great set of notes for an overview of technical content), one particular debate seemed to be front and center: Is training a large neural network on a very large dataset a feasible way to solve robotics?1 Of course, some version of this question has been on researchers’ minds for a few years now. However, in the aftermath of the unprecedented success of ChatGPT and other large-scale “foundation models” on tasks that were thought to be unsolvable just a few years ago, the question was especially topical at this year’s CoRL. Developing a general-purpose robot, one that can competently and robustly execute a wide variety of tasks of interest in any home or office environment that humans can, has been perhaps the holy grail of robotics since the inception of the field. And given the recent progress of foundation models, it seems possible that scaling existing network architectures by training them on very large datasets might actually be the key to that grail. Given how timely and significant this debate seems to be, I thought it might be useful to write a post centered around it. My main goal here is to try to present the different sides of the argument as I heard them, without bias towards any side. Almost all the content is taken directly from talks I attended or conversations I had with fellow attendees. My hope is that this serves to deepen people’s understanding around the debate, and maybe even inspire future research ideas and directions. I want to start by presenting the main arguments I heard in favor of scaling as a solution to robotics. Why Scaling Might Work It worked for Computer Vision (CV) and Natural Language Processing (NLP), so why not robotics? 
This was perhaps the most common argument I heard, and the one that seemed to excite most people given recent models like GPT4-V and SAM. The point here is that training a large model on an extremely large corpus of data has recently led to astounding progress on problems thought to be intractable just 3 to 4 years ago. Moreover, doing this has led to a number of emergent capabilities, where trained models are able to perform well at a number of tasks they weren’t explicitly trained for. Importantly, the fundamental method here of training a large model on a very large amount of data is general and not somehow unique to CV or NLP. Thus, there seems to be no reason why we shouldn’t observe the same incredible performance on robotics tasks. We’re already starting to see some evidence that this might work well: Chelsea Finn, Vincent Vanhoucke, and several others pointed to the recent RT-X and RT-2 papers from Google DeepMind as evidence that training a single model on large amounts of robotics data yields promising generalization capabilities. Russ Tedrake of Toyota Research Institute (TRI) and MIT pointed to the recent Diffusion Policies paper as showing a similar surprising capability. Sergey Levine of UC Berkeley highlighted recent efforts and successes from his group in building and deploying a robot-agnostic foundation model for navigation. All of these works are somewhat preliminary in that they train a relatively small model with a paltry amount of data compared to something like GPT4-V, but they certainly do seem to point to the fact that scaling up these models and datasets could yield impressive results in robotics. Progress in data, compute, and foundation models are waves that we should ride: This argument is closely related to the above one, but distinct enough that I think it deserves to be discussed separately. 
The main idea here comes from Rich Sutton’s influential essay: The history of AI research has shown that relatively simple algorithms that scale well with data always outperform more complex/clever algorithms that do not. A nice analogy from Karol Hausman’s early career keynote is that improvements to data and compute are like a wave that is bound to happen given the progress and adoption of technology. Whether we like it or not, there will be more data and better compute. As AI researchers, we can either choose to ride this wave, or we can ignore it. Riding this wave means recognizing all the progress that’s happened because of large data and large models, and then developing algorithms, tools, datasets, etc. to take advantage of this progress. It also means leveraging large pre-trained models from vision and language that currently exist or will exist for robotics tasks. Robotics tasks of interest lie on a relatively simple manifold, and training a large model will help us find it: This was something rather interesting that Russ Tedrake pointed out during a debate in the workshop on robustly deploying learning-based solutions. The manifold hypothesis as applied to robotics roughly states that, while the space of possible tasks we could conceive of having a robot do is impossibly large and complex, the tasks that actually occur practically in our world lie on some much lower-dimensional and simpler manifold of this space. By training a single model on large amounts of data, we might be able to discover this manifold. If we believe that such a manifold exists for robotics—which certainly seems intuitive—then this line of thinking would suggest that robotics is not somehow different from CV or NLP in any fundamental way. The same recipe that worked for CV and NLP should be able to discover the manifold for robotics and yield a shockingly competent generalist robot. 
Even if this doesn’t exactly happen, Tedrake points out that attempting to train a large model for general robotics tasks could teach us important things about the manifold of robotics tasks, and perhaps we can leverage this understanding to solve robotics. Large models are the best approach we have to get at “commonsense” capabilities, which pervade all of robotics: Another thing Russ Tedrake pointed out is that “common sense” pervades almost every robotics task of interest. Consider the task of having a mobile manipulation robot place a mug onto a table. Even if we ignore the challenging problems of finding and localizing the mug, there are a surprising number of subtleties to this problem. What if the table is cluttered and the robot has to move other objects out of the way? What if the mug accidentally falls on the floor and the robot has to pick it up again, re-orient it, and place it on the table? And what if the mug has something in it, so it’s important it’s never overturned? These “edge cases” are actually much more common than it might seem, and often are the difference between success and failure for a task. Moreover, these seem to require some sort of “common sense” reasoning to deal with. Several people argued that large models trained on a large amount of data are the best way we know of to yield some aspects of this “common sense” capability. Thus, they might be the best way we know of to solve general robotics tasks. As you might imagine, there were a number of arguments against scaling as a practical solution to robotics. Interestingly, almost no one directly disputes that this approach could work in theory. Instead, most arguments fall into one of two buckets: (1) arguing that this approach is simply impractical, and (2) arguing that even if it does kind of work, it won’t really “solve” robotics. 
Why Scaling Might Not Work It’s impractical We currently just don’t have much robotics data, and there’s no clear way we’ll get it: This is the elephant in pretty much every large-scale robot learning room. The Internet is chock-full of data for CV and NLP, but not at all for robotics. Recent efforts to collect very large datasets have required tremendous amounts of time, money, and cooperation, yet have yielded a very small fraction of the amount of vision and text data on the Internet. CV and NLP got so much data because they had an incredible “data flywheel”: tens of millions of people connecting to and using the Internet. Unfortunately for robotics, there seems to be no reason why people would upload a bunch of sensory input and corresponding action pairs. Collecting a very large robotics dataset seems quite hard, and given that we know that a lot of important “emergent” properties only showed up in vision and language models at scale, the inability to get a large dataset could render this scaling approach hopeless. Robots have different embodiments: Another challenge with collecting a very large robotics dataset is that robots come in a large variety of different shapes, sizes, and form factors. The output control actions that are sent to a Boston Dynamics Spot robot are very different to those sent to a KUKA iiwa arm. Even if we ignore the problem of finding some kind of common output space for a large trained model, the variety in robot embodiments means we’ll probably have to collect data from each robot type, and that makes the above data-collection problem even harder. There is extremely large variance in the environments we want robots to operate in: For a robot to really be “general purpose,” it must be able to operate in any practical environment a human might want to put it in. This means operating in any possible home, factory, or office building it might find itself in. 
Collecting a dataset that has even just one example of every possible building seems impractical. Of course, the hope is that we would only need to collect data in a small fraction of these, and the rest will be handled by generalization. However, we don’t know how much data will be required for this generalization capability to kick in, and it very well could also be impractically large. Training a model on such a large robotics dataset might be too expensive/energy-intensive: It’s no secret that training large foundation models is expensive, both in terms of money and in energy consumption. GPT-4V—OpenAI’s biggest foundation model at the time of this writing—reportedly cost over US $100 million and 50 million kWh of electricity to train. This is well beyond the budget and resources that any academic lab can currently spare, so a larger robotics foundation model would need to be trained by a company or a government of some kind. Additionally, depending on how large both the dataset and model itself for such an endeavor are, the costs may balloon by another order of magnitude or more, which might make it completely infeasible. Even if it works as well as in CV/NLP, it won’t solve robotics The 99.X problem and long tails: Vincent Vanhoucke of Google Robotics started a talk with a provocative assertion: Most—if not all—robot learning approaches cannot be deployed for any practical task. The reason? Real-world industrial and home applications typically require 99.X percent or higher accuracy and reliability. What exactly that means varies by application, but it’s safe to say that robot learning algorithms aren’t there yet. Most results presented in academic papers top out at 80 percent success rate. While that might seem quite close to the 99.X percent threshold, people trying to actually deploy these algorithms have found that it isn’t so: getting higher success rates requires asymptotically more effort as we get closer to 100 percent. 
That means going from 85 to 90 percent might require just as much effort as going from 40 to 80 percent, if not more. Vincent asserted in his talk that getting up to 99.X percent is a fundamentally different beast than getting even up to 80 percent, one that might require a whole host of new techniques beyond just scaling. Existing big models don’t get to 99.X percent even in CV and NLP: As impressive and capable as current large models like GPT-4V and DETIC are, even they don’t achieve 99.X percent or higher success rate on previously unseen tasks. Current robotics models are very far from this level of performance, and I think it’s safe to say that the entire robot learning community would be thrilled to have a general model that does as well on robotics tasks as GPT-4V does on NLP tasks. However, even if we had something like this, it wouldn’t be at 99.X percent, and it’s not clear that it’s possible to get there by scaling either. Self-driving car companies have tried this approach, and it doesn’t fully work (yet): This is closely related to the above point, but important and subtle enough that I think it deserves to stand on its own. A number of self-driving car companies—most notably Tesla and Wayve—have tried training such an end-to-end big model on large amounts of data to achieve Level 5 autonomy. Not only do these companies have the engineering resources and money to train such models, but they also have the data. Tesla in particular has a fleet of over 100,000 cars deployed in the real world that it is constantly collecting and then annotating data from. These cars are being teleoperated by experts, making the data ideal for large-scale supervised learning. And despite all this, Tesla has so far been unable to produce a Level 5 autonomous driving system. That’s not to say their approach doesn’t work at all. It competently handles a large number of situations—especially highway driving—and serves as a useful Level 2 (i.e., driver assist) system. 
However, it’s far from 99.X percent performance. Moreover, data seems to suggest that Tesla’s approach is faring far worse than Waymo or Cruise, which both use much more modular systems. While it isn’t inconceivable that Tesla’s approach could end up catching up to and surpassing its competitors’ performance in a year or so, the fact that it hasn’t worked yet should perhaps serve as evidence that the 99.X percent problem is hard to overcome for a large-scale ML approach. Moreover, given that self-driving is a special case of general robotics, Tesla’s case should give us reason to doubt the large-scale model approach as a full solution to robotics, especially in the medium term. Many robotics tasks of interest are quite long-horizon: Accomplishing any task requires taking a number of correct actions in sequence. Consider the relatively simple problem of making a cup of tea given an electric kettle, water, a box of tea bags, and a mug. Success requires pouring the water into the kettle, turning it on, then pouring the hot water into the mug, and placing a tea bag inside it. If we want to solve this with a model trained to output motor torque commands given pixels as input, we’ll need to send torque commands to all 7 motors at around 40 Hz. Let’s suppose that this tea-making task requires 5 minutes. That requires 7 * 40 * 60 * 5 = 84,000 correct torque commands. This is all just for a stationary robot arm; things get much more complicated if the robot is mobile, or has more than one arm. It is well-known that error tends to compound with longer horizons for most tasks. This is one reason why—despite their ability to produce long sequences of text—even LLMs cannot yet produce completely coherent novels or long stories: small deviations from a true prediction over time tend to add up and yield extremely large deviations over long horizons. 
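The arithmetic above, and the way small per-step errors compound, can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function names are made up for this example, and the assumption that every command must be right and that errors are independent is a deliberate simplification that the talks did not claim.

```python
def num_torque_commands(motors: int, rate_hz: int, minutes: float) -> int:
    """Total individual torque commands sent over the whole task."""
    return int(motors * rate_hz * 60 * minutes)


def task_success_probability(per_command_accuracy: float, n_commands: int) -> float:
    """Chance that every single command is right.

    Assumes errors are independent and that one wrong command ruins the
    task; both are simplifications made purely for illustration.
    """
    return per_command_accuracy ** n_commands


n = num_torque_commands(motors=7, rate_hz=40, minutes=5)
print(n)  # 84000

# Even very high per-command accuracy erodes quickly over a long horizon.
for p in (0.999, 0.9999, 0.99999):
    print(p, task_success_probability(p, n))
```

Under this toy model, even a per-command accuracy of 99.999 percent leaves the tea-making task succeeding less than half the time. Real controllers can recover from many small errors, so the numbers should be read as a picture of the compounding effect, not as a prediction.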
Given that most, if not all, robotics tasks of interest require sending at least thousands, if not hundreds of thousands, of torques in just the right order, even a fairly well-performing model might really struggle to fully solve these robotics tasks. Okay, now that we’ve sketched out all the main points on both sides of the debate, I want to spend some time diving into a few related points. Many of these are responses to the above points on the ‘against’ side, and some of them are proposals for directions to explore to help overcome the issues raised. Miscellaneous Related Arguments We can probably deploy learning-based approaches robustly One point that gets brought up a lot against learning-based approaches is the lack of theoretical guarantees. At the time of this writing, we know very little about neural network theory: we don’t really know why they learn well, and more importantly, we don’t have any guarantees on what values they will output in different situations. On the other hand, most classical control and planning approaches that are widely used in robotics have various theoretical guarantees built-in. These are generally quite useful when certifying that systems are safe. However, there seemed to be general consensus amongst a number of CoRL speakers that this point is perhaps given more significance than it should be. Sergey Levine pointed out that most of the guarantees from controls aren’t really that useful for a number of real-world tasks we’re interested in. As he put it: “self-driving car companies aren’t worried about controlling the car to drive in a straight line, but rather about a situation in which someone paints a sky onto the back of a truck and drives in front of the car,” thereby confusing the perception system. 
Moreover, Scott Kuindersma of Boston Dynamics talked about how they’re deploying RL-based controllers on their robots in production, and are able to get the confidence and guarantees they need via rigorous simulation and real-world testing. Overall, I got the sense that while people feel that guarantees are important, and encouraged researchers to keep trying to study them, they don’t think that the lack of guarantees for learning-based systems means that they cannot be deployed robustly. What if we strive to deploy Human-in-the-Loop systems? In one of the organized debates, Emo Todorov pointed out that existing successful ML systems, like Codex and ChatGPT, work well only because a human interacts with and sanitizes their output. Consider the case of coding with Codex: it isn’t intended to directly produce runnable, bug-free code, but rather to act as an intelligent autocomplete for programmers, thereby making the overall human-machine team more productive than either alone. In this way, these models don’t have to achieve the 99.X percent performance threshold, because a human can help correct any issues during deployment. As Emo put it: “humans are forgiving, physics is not.” Chelsea Finn responded to this by largely agreeing with Emo. She strongly agreed that all successfully-deployed and useful ML systems have humans in the loop, and so this is likely the setting that deployed robot learning systems will need to operate in as well. Of course, having a human operate in the loop with a robot isn’t as straightforward as in other domains, since having a human and robot inhabit the same space introduces potential safety hazards. However, it’s a useful setting to think about, especially if it can help address issues brought on by the 99.X percent problem. 
Maybe we don’t need to collect that much real-world data for scaling A number of people at the conference were thinking about creative ways to overcome the real-world data bottleneck without actually collecting more real-world data. Quite a few of these people argued that fast, realistic simulators could be vital here, and there were a number of works that explored creative ways to train robot policies in simulation and then transfer them to the real world. Another set of people argued that we can leverage existing vision, language, and video data and then just ‘sprinkle in’ some robotics data. Google’s recent RT-2 model showed how taking a large model trained on internet-scale vision and language data, and then just fine-tuning it on a much smaller set of robotics data can produce impressive performance on robotics tasks. Perhaps through a combination of simulation and pretraining on general vision and language data, we won’t actually have to collect too much real-world robotics data to get scaling to work well for robotics tasks. Maybe combining classical and learning-based approaches can give us the best of both worlds As with any debate, there were quite a few people advocating the middle path. Scott Kuindersma of Boston Dynamics titled one of his talks “Let’s all just be friends: model-based control helps learning (and vice versa)”. Throughout his talk, and the subsequent debates, he expressed his strong belief that in the short to medium term, the best path towards reliable real-world systems involves combining learning with classical approaches. In her keynote speech for the conference, Andrea Thomaz talked about how such a hybrid system—using learning for perception and a few skills, and classical SLAM and path-planning for the rest—is what powers a real-world robot that’s deployed in tens of hospital systems in Texas (and growing!). 
Several papers explored how classical controls and planning, together with learning-based approaches can enable much more capability than any system on its own. Overall, most people seemed to argue that this ‘middle path’ is extremely promising, especially in the short to medium term, but perhaps in the long-term either pure learning or an entirely different set of approaches might be best. What Can/Should We Take Away From All This? If you’ve read this far, chances are that you’re interested in some set of takeaways/conclusions. Perhaps you’re thinking “this is all very interesting, but what does all this mean for what we as a community should do? What research problems should I try to tackle?” Fortunately for you, there seemed to be a number of interesting suggestions that had some consensus on this. We should pursue the direction of trying to just scale up learning with very large datasets Despite the various arguments against scaling solving robotics outright, most people seem to agree that scaling in robot learning is a promising direction to be investigated. Even if it doesn’t fully solve robotics, it could lead to a significant amount of progress on a number of hard problems we’ve been stuck on for a while. Additionally, as Russ Tedrake pointed out, pursuing this direction carefully could yield useful insights about the general robotics problem, as well as current learning algorithms and why they work so well. We should also pursue other existing directions Even the most vocal proponents of the scaling approach were clear that they don’t think everyone should be working on this. It’s likely a bad idea for the entire robot learning community to put its eggs in the same basket, especially given all the reasons to believe scaling won’t fully solve robotics. 
Classical robotics techniques have gotten us quite far, and led to many successful and reliable deployments: pushing forward on them or integrating them with learning techniques might be the right way forward, especially in the short to medium term. We should focus more on real-world mobile manipulation and easy-to-use systems Vincent Vanhoucke made an observation that most papers at CoRL this year were limited to tabletop manipulation settings. While there are plenty of hard tabletop problems, things generally get a lot more complicated when the robot—and consequently its camera view—moves. Vincent speculated that it’s easy for the community to fall into a local minimum where we make a lot of progress that’s specific to the tabletop setting and therefore not generalizable. A similar thing could happen if we work predominantly in simulation. Avoiding these local minima by working on real-world mobile manipulation seems like a good idea. Separately, Sergey Levine observed that a big reason why LLMs have seen so much excitement and adoption is that they’re extremely easy to use, especially by non-experts. One doesn’t have to know about the details of training an LLM, or perform any tough setup, to prompt and use these models for their own tasks. Most robot learning approaches are currently far from this. They often require significant knowledge of their inner workings to use, and involve very significant amounts of setup. Perhaps thinking more about how to make robot learning systems easier to use and widely applicable could help improve adoption and potentially scalability of these approaches. We should be more forthright about things that don’t work There seemed to be a broadly-held complaint that many robot learning approaches don’t adequately report negative results, and this leads to a lot of unnecessary repeated effort. 
Additionally, perhaps patterns might emerge from consistent failures of things that we expect to work but don’t actually work well, and this could yield novel insight into learning algorithms. There is currently no good incentive for researchers to report such negative results in papers, but most people seemed to be in favor of designing one. We should try to do something totally new There were a few people who pointed out that all current approaches—be they learning-based or classical—are unsatisfying in a number of ways. There seem to be a number of drawbacks with each of them, and it’s very conceivable that there is a completely different set of approaches that ultimately solves robotics. Given this, it seems useful to try to think outside the box. After all, every one of the current approaches that’s part of the debate was only made possible because the few researchers that introduced them dared to think against the popular grain of their times. Acknowledgements: Huge thanks to Tom Silver and Leslie Kaelbling for providing helpful comments, suggestions, and encouragement on a previous draft of this post. — 1 In fact, this was the topic of a popular debate hosted at a workshop on the first day; many of the points in this post were inspired by the conversation during that debate.

  • Do We Dare Use Generative AI for Mental Health?
    by Aaron Pavez on 26. May 2024. at 15:00

    The mental-health app Woebot launched in 2017, back when “chatbot” wasn’t a familiar term and someone seeking a therapist could only imagine talking to a human being. Woebot was something exciting and new: a way for people to get on-demand mental-health support in the form of a responsive, empathic, AI-powered chatbot. Users found that the friendly robot avatar checked in on them every day, kept track of their progress, and was always available to talk something through. Today, the situation is vastly different. Demand for mental-health services has surged while the supply of clinicians has stagnated. There are thousands of apps that offer automated support for mental health and wellness. And ChatGPT has helped millions of people experiment with conversational AI. But even as the world has become fascinated with generative AI, people have also seen its downsides. As a company that relies on conversation, Woebot Health had to decide whether generative AI could make Woebot a better tool, or whether the technology was too dangerous to incorporate into our product. Woebot is designed to have structured conversations through which it delivers evidence-based tools inspired by cognitive behavioral therapy (CBT), a technique that aims to change behaviors and feelings. Throughout its history, Woebot Health has used technology from a subdiscipline of AI known as natural-language processing (NLP). The company has used AI artfully and by design—Woebot uses NLP only in the service of better understanding a user’s written texts so it can respond in the most appropriate way, thus encouraging users to engage more deeply with the process. Woebot, which is currently available in the United States, is not a generative-AI chatbot like ChatGPT. The differences are clear in both the bot’s content and structure. 
Everything Woebot says has been written by conversational designers trained in evidence-based approaches who collaborate with clinical experts; ChatGPT generates all sorts of unpredictable statements, some of which are untrue. Woebot relies on a rules-based engine that resembles a decision tree of possible conversational paths; ChatGPT uses statistics to determine what its next words should be, given what has come before. The rules-based approach has served us well, protecting Woebot’s users from the types of chaotic conversations we observed from early generative chatbots. Prior to ChatGPT, open-ended conversations with generative chatbots were unsatisfying and easily derailed. One famous example is Microsoft’s Tay, a chatbot that was meant to appeal to millennials but turned lewd and racist in less than 24 hours. But with the advent of ChatGPT in late 2022, we had to ask ourselves: Could the new large language models (LLMs) powering chatbots like ChatGPT help our company achieve its vision? Suddenly, hundreds of millions of users were having natural-sounding conversations with ChatGPT about anything and everything, including their emotions and mental health. Could this new breed of LLMs provide a viable generative-AI alternative to the rules-based approach Woebot has always used? The AI team at Woebot Health, including the authors of this article, were asked to find out. Woebot, a mental-health chatbot, deploys concepts from cognitive behavioral therapy to help users. This demo shows how users interact with Woebot using a combination of multiple-choice responses and free-written text. The Origin and Design of Woebot Woebot got its start when the clinical research psychologist Alison Darcy, with support from the AI pioneer Andrew Ng, led the build of a prototype intended as an emotional support tool for young people. 
Darcy and another member of the founding team, Pierre Rappolt, took inspiration from video games as they looked for ways for the tool to deliver elements of CBT. Many of their prototypes contained interactive fiction elements, which then led Darcy to the chatbot paradigm. The first version of the chatbot was studied in a randomized control trial that offered mental-health support to college students. Based on the results, Darcy raised US $8 million from New Enterprise Associates and Andrew Ng’s AI Fund. The Woebot app is intended to be an adjunct to human support, not a replacement for it. It was built according to a set of principles that we call Woebot’s core beliefs, which were shared on the day it launched. These tenets express a strong faith in humanity and in each person’s ability to change, choose, and grow. The app does not diagnose, it does not give medical advice, and it does not force its users into conversations. Instead, the app follows a Buddhist principle that’s prevalent in CBT of “sitting with open hands”—it extends invitations that the user can choose to accept, and it encourages process over results. Woebot facilitates a user’s growth by asking the right questions at optimal moments, and by engaging in a type of interactive self-help that can happen anywhere, anytime. A Convenient Companion Users interact with Woebot either by choosing prewritten responses or by typing in whatever text they’d like, which Woebot parses using AI techniques. Woebot deploys concepts from cognitive behavioral therapy to help users change their thought patterns. Here, it first asks a user to write down negative thoughts, then explains the cognitive distortions at work. Finally, Woebot invites the user to recast a negative statement in a positive way. (Not all exchanges are shown.) These core beliefs strongly influenced both Woebot’s engineering architecture and its product-development process. 
Careful conversational design is crucial for ensuring that interactions conform to our principles. Test runs through a conversation are read aloud in “table reads,” and then revised to better express the core beliefs and flow more naturally. The user side of the conversation is a mix of multiple-choice responses and “free text,” or places where users can write whatever they wish. Building an app that supports human health is a high-stakes endeavor, and we’ve taken extra care to adopt the best software-development practices. From the start, enabling content creators and clinicians to collaborate on product development required custom tools. An initial system using Google Sheets quickly became unscalable, and the engineering team replaced it with a proprietary Web-based “conversational management system” written in the JavaScript library React. Within the system, members of the writing team can create content, play back that content in a preview mode, define routes between content modules, and find places for users to enter free text, which our AI system then parses. The result is a large rules-based tree of branching conversational routes, all organized within modules such as “social skills training” and “challenging thoughts.” These modules are translated from psychological mechanisms within CBT and other evidence-based techniques. How Woebot Uses AI While everything Woebot says is written by humans, NLP techniques are used to help understand the feelings and problems users are facing; then Woebot can offer the most appropriate modules from its deep bank of content. When users enter free text about their thoughts and feelings, we use NLP to parse these text inputs and route the user to the best response. In Woebot’s early days, the engineering team used regular expressions, or “regexes,” to understand the intent behind these text inputs. Regexes are a text-processing method that relies on pattern matching within sequences of characters. 
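As a rough illustration of the regex approach just described, the sketch below parses a yes/no reply and picks out a preferred nickname. The patterns and function names are hypothetical, written for this example only; Woebot’s actual regexes are not given in the article and were reportedly far more complicated.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only.
YES = re.compile(r"^\s*(y|yes|yeah|yep|sure|ok(ay)?)\b", re.IGNORECASE)
NO = re.compile(r"^\s*(n|no|nope|nah|not really)\b", re.IGNORECASE)
NICKNAME = re.compile(r"\b(?:call me|my name is)\s+([A-Za-z][A-Za-z'-]*)", re.IGNORECASE)


def parse_yes_no(text: str) -> Optional[bool]:
    """Return True for yes, False for no, None when the reply is unclear."""
    if YES.search(text):
        return True
    if NO.search(text):
        return False
    return None


def parse_nickname(text: str) -> Optional[str]:
    """Return the name following 'call me' or 'my name is', if present."""
    match = NICKNAME.search(text)
    return match.group(1) if match else None


print(parse_yes_no("Yeah, I guess so"))
print(parse_yes_no("hmm, maybe later"))
print(parse_nickname("You can call me Sam."))
```

Pattern lists like these grow quickly as more phrasings turn up in real conversations, which is one plausible reason a team would eventually move from regexes to trained classifiers.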
Woebot’s regexes were quite complicated in some cases, and were used for everything from parsing simple yes/no responses to learning a user’s preferred nickname. Later in Woebot’s development, the AI team replaced regexes with classifiers trained with supervised learning. The process for creating AI classifiers that comply with regulatory standards was involved—each classifier required months of effort. Typically, a team of internal-data labelers and content creators reviewed examples of user messages (with all personally identifiable information stripped out) taken from a specific point in the conversation. Once the data was placed into categories and labeled, classifiers were trained that could take new input text and place it into one of the existing categories. This process was repeated many times, with the classifier repeatedly evaluated against a test dataset until its performance satisfied us. As a final step, the conversational-management system was updated to “call” these AI classifiers (essentially activating them) and then to route the user to the most appropriate content. For example, if a user wrote that he was feeling angry because he got in a fight with his mom, the system would classify this response as a relationship problem. The technology behind these classifiers is constantly evolving. In the early days, the team used an open-source library for text classification called fastText, sometimes in combination with regular expressions. As AI continued to advance and new models became available, the team was able to train new models on the same labeled data for improvements in both accuracy and recall. For example, when the early transformer model BERT was released in October 2018, the team rigorously evaluated its performance against the fastText version. BERT was superior in both precision and recall for our use cases, and so the team replaced all fastText classifiers with BERT and launched the new models in January 2019. 
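For readers less familiar with the metrics used in that comparison, precision and recall for one category can be computed from a labeled test set as in the sketch below. The labels and the two sets of predictions are invented for illustration; they are not Woebot data and do not represent fastText or BERT outputs.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[str], y_pred: Sequence[str], label: str
) -> Tuple[float, float]:
    """Precision and recall for a single category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up test labels and predictions from two hypothetical classifiers.
truth = ["relationship", "work", "relationship", "sleep", "relationship", "work"]
model_a = ["relationship", "relationship", "work", "sleep", "relationship", "work"]
model_b = ["relationship", "work", "relationship", "sleep", "work", "work"]

for name, preds in (("model_a", model_a), ("model_b", model_b)):
    print(name, precision_recall(truth, preds, "relationship"))
```

Precision asks how often a predicted "relationship" label is correct, while recall asks how many true "relationship" messages were found. A replacement classifier is easiest to justify when it matches or beats the old one on both.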
We immediately saw improvements in classification accuracy across the models. Woebot and Large Language Models When ChatGPT was released in November 2022, Woebot was more than 5 years old. The AI team faced the question of whether LLMs like ChatGPT could be used to meet Woebot’s design goals and enhance users’ experiences, putting them on a path to better mental health. We were excited by the possibilities, because ChatGPT could carry on fluid and complex conversations about millions of topics, far more than we could ever include in a decision tree. However, we had also heard about troubling examples of chatbots providing responses that were decidedly not supportive, including advice on how to maintain and hide an eating disorder and guidance on methods of self-harm. In one tragic case in Belgium, a grieving widow accused a chatbot of being responsible for her husband’s suicide. The first thing we did was try out ChatGPT ourselves, and we quickly became experts in prompt engineering. For example, we prompted ChatGPT to be supportive and played the roles of different types of users to explore the system’s strengths and shortcomings. We described how we were feeling, explained some problems we were facing, and even explicitly asked for help with depression or anxiety. A few things stood out. First, ChatGPT quickly told us we needed to talk to someone else—a therapist or doctor. ChatGPT isn’t intended for medical use, so this default response was a sensible design decision by the chatbot’s makers. But it wasn’t very satisfying to constantly have our conversation aborted. Second, ChatGPT’s responses were often bulleted lists of encyclopedia-style answers. For example, it would list six actions that could be helpful for depression. We found that these lists of items told the user what to do but didn’t explain how to take these steps. Third, in general, the conversations ended quickly and did not allow a user to engage in the psychological processes of change. 
It was clear to our team that an off-the-shelf LLM would not deliver the psychological experiences we were after. LLMs are based on reward models that value the delivery of correct answers; they aren’t given incentives to guide a user through the process of discovering those results themselves. Instead of “sitting with open hands,” the models make assumptions about what the user is saying to deliver a response with the highest assigned reward. To see if LLMs could be used within a mental-health context, we investigated ways of expanding our proprietary conversational-management system. We looked into frameworks and open-source techniques for managing prompts and prompt chains—sequences of prompts that ask an LLM to achieve a task through multiple subtasks. In January of 2023, a platform called LangChain was gaining in popularity and offered techniques for calling multiple LLMs and managing prompt chains. However, LangChain lacked some features that we knew we needed: It didn’t provide a visual user interface like our proprietary system, and it didn’t provide a way to safeguard the interactions with the LLM. We needed a way to protect Woebot users from the common pitfalls of LLMs, including hallucinations (where the LLM says things that are plausible but untrue) and simply straying off topic. Ultimately, we decided to expand our platform by implementing our own LLM prompt-execution engine, which gave us the ability to inject LLMs into certain parts of our existing rules-based system. The engine allows us to support concepts such as prompt chains while also providing integration with our existing conversational routing system and rules. As we developed the engine, we were fortunate to be invited into the beta programs of many new LLMs. 
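The idea of a prompt chain, a sequence of prompts in which each subtask’s output feeds the next, can be sketched as follows. This is a minimal illustration and not Woebot’s engine or LangChain’s API: `run_prompt_chain`, `fake_llm`, and the two prompt templates are all invented here, and `fake_llm` merely echoes its prompt so the example runs without any model.

```python
from typing import Callable, List


def run_prompt_chain(
    user_text: str, templates: List[str], call_llm: Callable[[str], str]
) -> List[str]:
    """Run each prompt in order, feeding the previous output into the next."""
    outputs: List[str] = []
    previous = user_text
    for template in templates:
        prompt = template.format(input=previous)
        previous = call_llm(prompt)
        outputs.append(previous)
    return outputs


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return f"[model output for: {prompt}]"


# Hypothetical two-step chain: summarize the concern, then ask about it.
chain = [
    "Summarize the user's main concern in one sentence: {input}",
    "Write one open-ended question about this concern: {input}",
]
steps = run_prompt_chain("I argued with my mom and feel awful.", chain, fake_llm)
for step in steps:
    print(step)
```

Passing the model call in as a function keeps the chain logic independent of any one provider, which is one way an engine could swap between many models.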
Today, our prompt-execution engine can call more than a dozen different LLM models, including variously sized OpenAI models, Microsoft Azure versions of OpenAI models, Anthropic’s Claude, Google Bard (now Gemini), and open-source models running on the Amazon Bedrock platform, such as Meta’s Llama 2. We use this engine exclusively for exploratory research that’s been approved by an institutional review board, or IRB. It took us about three months to develop the infrastructure and tooling support for LLMs. Our platform allows us to package features into different products and experiments, which in turn lets us maintain control over software versions and manage our research efforts while ensuring that our commercially deployed products are unaffected. We’re not using LLMs in any of our products; the LLM-enabled features can be used only in a version of Woebot for exploratory studies. A Trial for an LLM-Augmented Woebot We had some false starts in our development process. We first tried creating an experimental chatbot that was almost entirely powered by generative AI; that is, the chatbot directly used the text responses from the LLM. But we ran into a couple of problems. The first issue was that the LLMs were eager to demonstrate how smart and helpful they are! This eagerness was not always a strength, as it interfered with the user’s own process. For example, the user might be doing a thought-challenging exercise, a common tool in CBT. If the user says, “I’m a bad mom,” a good next step in the exercise could be to ask if the user’s thought is an example of “labeling,” a cognitive distortion where we assign a negative label to ourselves or others. 
But LLMs were quick to skip ahead and demonstrate how to reframe this thought, saying something like “A kinder way to put this would be, ‘I don’t always make the best choices, but I love my child.’” CBT exercises like thought challenging are most helpful when the person does the work themselves, coming to their own conclusions and gradually changing their patterns of thinking. A second difficulty with LLMs was in style matching. While social media is rife with examples of LLMs responding in a Shakespearean sonnet or a poem in the style of Dr. Seuss, this format flexibility didn’t extend to Woebot’s style. Woebot has a warm tone that has been refined for years by conversational designers and clinical experts. But even with careful instructions and prompts that included examples of Woebot’s tone, LLMs produced responses that didn’t “sound like Woebot,” maybe because a touch of humor was missing, or because the language wasn’t simple and clear. However, LLMs truly shone on an emotional level. When coaxing someone to talk about their joys or challenges, LLMs crafted personalized responses that made people feel understood. Without generative AI, it’s impossible to respond in a novel way to every different situation, and the conversation feels predictably “robotic.” We ultimately built an experimental chatbot that possessed a hybrid of generative AI and traditional NLP-based capabilities. In July 2023 we registered an IRB-approved clinical study to explore the potential of this LLM-Woebot hybrid, looking at satisfaction as well as exploratory outcomes like symptom changes and attitudes toward AI. We feel it’s important to study LLMs within controlled clinical studies due to their scientific rigor and safety protocols, such as adverse event monitoring. Our Build study included U.S. 
adults above the age of 18 who were fluent in English and who had neither a recent suicide attempt nor current suicidal ideation. The double-blind structure assigned one group of participants the LLM-augmented Woebot while a control group got the standard version; we then assessed user satisfaction after two weeks. We built technical safeguards into the experimental Woebot to ensure that it wouldn’t say anything to users that was distressing or counter to the process. The safeguards tackled the problem on multiple levels. First, we used what engineers consider “best in class” LLMs that are less likely to produce hallucinations or offensive language. Second, our architecture included different validation steps surrounding the LLM; for example, we ensured that Woebot wouldn’t give an LLM-generated response to an off-topic statement or a mention of suicidal ideation (in that case, Woebot provided the phone number for a hotline). Finally, we wrapped users’ statements in our own careful prompts to elicit appropriate responses from the LLM, which Woebot would then convey to users. These prompts included both direct instructions such as “don’t provide medical advice” as well as examples of appropriate responses in challenging situations. While this initial study was short—two weeks isn’t much time when it comes to psychotherapy—the results were encouraging. We found that users in the experimental and control groups expressed about equal satisfaction with Woebot, and both groups had fewer self-reported symptoms. What’s more, the LLM-augmented chatbot was well-behaved, refusing to take inappropriate actions like diagnosing or offering medical advice. It consistently responded appropriately when confronted with difficult topics like body image issues or substance use, with responses that provided empathy without endorsing maladaptive behaviors. 
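The layered safeguards described above (checks before the LLM is called, a wrapping prompt with direct instructions, and a rules-based fallback) can be sketched roughly as follows. This is a minimal illustration, not Woebot's implementation: the keyword lists, the `respond` function, the `call_llm` parameter, the fallback wording, and the output check at the end are all assumptions made for the example, and a production system would rely on trained classifiers rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical keyword lists, for illustration only.
CRISIS_TERMS = ("suicide", "kill myself", "end my life")
ON_TOPIC_TERMS = ("feel", "mood", "anxious", "stress", "sad", "thought")

# Direct instructions wrapped around the user's statement (assumed wording).
SYSTEM_INSTRUCTIONS = (
    "You are a supportive guide. Don't provide medical advice. "
    "Don't diagnose. Ask questions that help the user reflect."
)


@dataclass
class Reply:
    text: str
    source: str  # "rules" for scripted replies, "llm" for generated ones


def respond(user_text: str, call_llm: Callable[[str], str]) -> Reply:
    """Route one user message through validation steps around the LLM."""
    lowered = user_text.lower()

    # Step 1: crisis language never reaches the LLM; use a scripted reply.
    if any(term in lowered for term in CRISIS_TERMS):
        return Reply("Please reach out to a crisis hotline: [hotline number].", "rules")

    # Step 2: off-topic statements also get a scripted reply.
    if not any(term in lowered for term in ON_TOPIC_TERMS):
        return Reply("Let's get back to how you've been feeling.", "rules")

    # Step 3: wrap the statement in a careful prompt and call the model.
    prompt = f"{SYSTEM_INSTRUCTIONS}\n\nUser: {user_text}\nGuide:"
    draft = call_llm(prompt)

    # Step 4 (assumed): reject drafts that look like diagnosis or prescribing.
    if "diagnos" in draft.lower() or "prescri" in draft.lower():
        return Reply("Can you tell me more about that?", "rules")
    return Reply(draft, "llm")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    return "That sounds hard. What went through your mind?"


reply = respond("I feel anxious about work", fake_llm)
print(reply.source, "->", reply.text)
```

Keeping the routing decisions in ordinary code, outside the model, means the scripted paths behave the same way regardless of which LLM is plugged in behind `call_llm`.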
With participant consent, we reviewed every transcript in its entirety and found no concerning LLM-generated utterances—no evidence that the LLM hallucinated or drifted off-topic in a problematic way. What’s more, users reported no device-related adverse events. This study was just the first step in our journey to explore what’s possible for future versions of Woebot, and its results have emboldened us to continue testing LLMs in carefully controlled studies. We know from our prior research that Woebot users feel a bond with our bot. We’re excited about LLMs’ potential to add more empathy and personalization, and we think it’s possible to avoid the sometimes-scary pitfalls related to unfettered LLM chatbots. We believe strongly that continued progress within the LLM research community will, over time, transform the way people interact with digital tools like Woebot. Our mission hasn’t changed: We’re committed to creating a world-class solution that helps people along their mental-health journeys. For anyone who wants to talk, we want the best possible version of Woebot to be there for them. This article appears in the June 2024 print issue. Disclaimers The Woebot Health Platform is the foundational development platform where components are used for multiple types of products in different stages of development and enforced under different regulatory guidances. Woebot for Mood & Anxiety (W-MA-00), Woebot for Mood & Anxiety (W-MA-01), and Build Study App (W-DISC-001) are investigational medical devices. They have not been evaluated, cleared, or approved by the FDA. Not for use outside an IRB-approved clinical trial.

  • How to EMP-Proof a Building
    by Emily Waltz on 25. May 2024. at 13:00

    This year, the sun will reach solar maximum, a period of peak magnetic activity that occurs approximately once every 11 years. That means more sunspots and more frequent intense solar storms. Here on Earth, these result in beautiful auroral activity, but also geomagnetic storms and the threat of electromagnetic pulses (EMPs), which can bring widespread damage to electronic equipment and communications systems. And the sun isn’t the only source of EMPs. Human-made EMP generators mounted on trucks or aircraft can be used as tactical weapons to knock out drones, satellites, and infrastructure. More seriously, a nuclear weapon detonated at a high altitude could, among its more catastrophic effects, generate a wide-ranging EMP blast. IEEE Spectrum spoke with Yilu Liu, who has been researching EMPs at Oak Ridge National Laboratory, in Tennessee, about the potential effects of the phenomenon on power grids and other electronics. Liu is a Governor’s Chair/Professor at the University of Tennessee, in Knoxville, and Oak Ridge National Laboratory. What are the differences between various kinds of EMPs? Yilu Liu: A nuclear explosion at an altitude higher than 30 kilometers would generate an EMP with a much broader spectrum than one from a ground-level weapon or a geomagnetic storm, and it would arrive in three phases. First comes E1, a powerful pulse that brings very fast high-frequency waves. The second phase, E2, produces current similar to that of a lightning strike. The third phase, E3, brings a slow, varying waveform, kind of like direct current [DC], that can last several minutes. A ground-level electromagnetic weapon would probably be designed for emitting high-frequency waves similar to those produced by an E1. Solar magnetic disturbances produce a slow, varying waveform similar to that of E3. How do EMPs damage power grids and electronic equipment? 
Liu: Phase E1 induces current in conductors that travels to sensitive electronic circuits, destroying them or causing malfunctions. We don’t worry about E2 much because it’s like lightning, and grids are protected against that. Phase E3 and solar magnetic EMPs inject a foreign, DC-like current into transmission lines, which saturates transformers, causing a lot of high-frequency currents that have led to blackouts. How do you study the effects of an EMP without generating one? Liu: We measured the propagation into a building of low-level electromagnetic waves from broadcast radio. We wanted to know if physical structures, like buildings, could act as a filter, so we took measurements of radio signals both inside and outside a hydropower station and other buildings to figure out how much gets inside. Our computer models then amplified the measurements to simulate how an EMP would affect equipment. What did you learn about protecting buildings from damage by EMPs? Liu: When constructing buildings, definitely use rebar in your concrete. It’s very effective as a shield against electromagnetic waves. Large windows are entry points, so don’t put unshielded control circuits near them. And if there are cables coming into the building carrying power or communication, make sure they are well-shielded; otherwise, they will act like antennas. Have solar EMPs caused damage in the past? Liu: The most destructive recent occurrence was in Quebec in 1989, which resulted in a blackout. Once a transformer is saturated, the current flowing into the grid is no longer just 60 hertz but multiples of 60 Hz, and it trips the capacitors, and then the voltage collapses and the grid experiences an outage. The industry is better prepared now. But you never know if the next solar storm will surpass those of the past. This article appears in the June 2024 issue as “5 Questions for Yilu Liu.”

  • Andrew Ng: Unbiggen AI
    by Eliza Strickland on 9. February 2022. at 15:31

    Andrew Ng has serious street cred in artificial intelligence. He pioneered the use of graphics processing units (GPUs) to train deep learning models in the late 2000s with his students at Stanford University, cofounded Google Brain in 2011, and then served for three years as chief scientist for Baidu, where he helped build the Chinese tech giant’s AI group. So when he says he has identified the next big shift in artificial intelligence, people listen. And that’s what he told IEEE Spectrum in an exclusive Q&A. Ng’s current efforts are focused on his company Landing AI, which built a platform called LandingLens to help manufacturers improve visual inspection with computer vision. He has also become something of an evangelist for what he calls the data-centric AI movement, which he says can yield “small data” solutions to big issues in AI, including model efficiency, accuracy, and bias. The great advances in deep learning over the past decade or so have been powered by ever-bigger models crunching ever-bigger amounts of data. Some people argue that that’s an unsustainable trajectory. Do you agree that it can’t go on that way? Andrew Ng: This is a big question. We’ve seen foundation models in NLP [natural language processing]. I’m excited about NLP models getting even bigger, and also about the potential of building foundation models in computer vision. I think there’s lots of signal to still be exploited in video: We have not been able to build foundation models yet for video because of compute bandwidth and the cost of processing video, as opposed to tokenized text. So I think that this engine of scaling up deep learning algorithms, which has been running for something like 15 years now, still has steam in it. 
Having said that, it only applies to certain problems, and there’s a set of other problems that need small data solutions. When you say you want a foundation model for computer vision, what do you mean by that? Ng: This is a term coined by Percy Liang and some of my friends at Stanford to refer to very large models, trained on very large data sets, that can be tuned for specific applications. For example, GPT-3 is an example of a foundation model [for NLP]. Foundation models offer a lot of promise as a new paradigm in developing machine learning applications, but also challenges in terms of making sure that they’re reasonably fair and free from bias, especially if many of us will be building on top of them. What needs to happen for someone to build a foundation model for video? Ng: I think there is a scalability problem. The compute power needed to process the large volume of images for video is significant, and I think that’s why foundation models have arisen first in NLP. Many researchers are working on this, and I think we’re seeing early signs of such models being developed in computer vision. But I’m confident that if a semiconductor maker gave us 10 times more processor power, we could easily find 10 times more video to build such models for vision. Having said that, a lot of what’s happened over the past decade is that deep learning has happened in consumer-facing companies that have large user bases, sometimes billions of users, and therefore very large data sets. While that paradigm of machine learning has driven a lot of economic value in consumer software, I find that that recipe of scale doesn’t work for other industries. It’s funny to hear you say that, because your early work was at a consumer-facing company with millions of users. Ng: Over a decade ago, when I proposed starting the Google Brain project to use Google’s compute infrastructure to build very large neural networks, it was a controversial step. 
One very senior person pulled me aside and warned me that starting Google Brain would be bad for my career. I think he felt that the action couldn’t just be in scaling up, and that I should instead focus on architecture innovation. I remember when my students and I published the first NeurIPS workshop paper advocating using CUDA, a platform for processing on GPUs, for deep learning—a different senior person in AI sat me down and said, “CUDA is really complicated to program. As a programming paradigm, this seems like too much work.” I did manage to convince him; the other person I did not convince. I expect they’re both convinced now. Ng: I think so, yes. Over the past year as I’ve been speaking to people about the data-centric AI movement, I’ve been getting flashbacks to when I was speaking to people about deep learning and scalability 10 or 15 years ago. In the past year, I’ve been getting the same mix of “there’s nothing new here” and “this seems like the wrong direction.” How do you define data-centric AI, and why do you consider it a movement? Ng: Data-centric AI is the discipline of systematically engineering the data needed to successfully build an AI system. For an AI system, you have to implement some algorithm, say a neural network, in code and then train it on your data set. The dominant paradigm over the last decade was to download the data set while you focus on improving the code. Thanks to that paradigm, over the last decade deep learning networks have improved significantly, to the point where for a lot of applications the code—the neural network architecture—is basically a solved problem. 
So for many practical applications, it’s now more productive to hold the neural network architecture fixed, and instead find ways to improve the data. When I started speaking about this, there were many practitioners who, completely appropriately, raised their hands and said, “Yes, we’ve been doing this for 20 years.” This is the time to take the things that some individuals have been doing intuitively and make it a systematic engineering discipline. The data-centric AI movement is much bigger than one company or group of researchers. My collaborators and I organized a data-centric AI workshop at NeurIPS, and I was really delighted at the number of authors and presenters that showed up. You often talk about companies or institutions that have only a small amount of data to work with. How can data-centric AI help them? Ng: You hear a lot about vision systems built with millions of images—I once built a face recognition system using 350 million images. Architectures built for hundreds of millions of images don’t work with only 50 images. But it turns out, if you have 50 really good examples, you can build something valuable, like a defect-inspection system. In many industries where giant data sets simply don’t exist, I think the focus has to shift from big data to good data. Having 50 thoughtfully engineered examples can be sufficient to explain to the neural network what you want it to learn. When you talk about training a model with just 50 images, does that really mean you’re taking an existing model that was trained on a very large data set and fine-tuning it? Or do you mean a brand new model that’s designed to learn only from that small data set? Ng: Let me describe what Landing AI does. When doing visual inspection for manufacturers, we often use our own flavor of RetinaNet. It is a pretrained model. Having said that, the pretraining is a small piece of the puzzle. 
What’s a bigger piece of the puzzle is providing tools that enable the manufacturer to pick the right set of images [to use for fine-tuning] and label them in a consistent way. There’s a very practical problem we’ve seen spanning vision, NLP, and speech, where even human annotators don’t agree on the appropriate label. For big data applications, the common response has been: If the data is noisy, let’s just get a lot of data and the algorithm will average over it. But if you can develop tools that flag where the data’s inconsistent and give you a very targeted way to improve the consistency of the data, that turns out to be a more efficient way to get a high-performing system. For example, if you have 10,000 images where 30 images are of one class, and those 30 images are labeled inconsistently, one of the things we do is build tools to draw your attention to the subset of data that’s inconsistent. So you can very quickly relabel those images to be more consistent, and this leads to improvement in performance. Could this focus on high-quality data help with bias in data sets? If you’re able to curate the data more before training? Ng: Very much so. Many researchers have pointed out that biased data is one factor among many leading to biased systems. There have been many thoughtful efforts to engineer the data. At the NeurIPS workshop, Olga Russakovsky gave a really nice talk on this. At the main NeurIPS conference, I also really enjoyed Mary Gray’s presentation, which touched on how data-centric AI is one piece of the solution, but not the entire solution. New tools like Datasheets for Datasets also seem like an important piece of the puzzle. One of the powerful tools that data-centric AI gives us is the ability to engineer a subset of the data. 
Imagine training a machine-learning system and finding that its performance is okay for most of the data set, but its performance is biased for just a subset of the data. If you try to change the whole neural network architecture to improve the performance on just that subset, it’s quite difficult. But if you can engineer a subset of the data you can address the problem in a much more targeted way. When you talk about engineering the data, what do you mean exactly? Ng: In AI, data cleaning is important, but the way the data has been cleaned has often been in very manual ways. In computer vision, someone may visualize images through a Jupyter notebook and maybe spot the problem, and maybe fix it. But I’m excited about tools that allow you to have a very large data set, tools that draw your attention quickly and efficiently to the subset of data where, say, the labels are noisy. Or to quickly bring your attention to the one class among 100 classes where it would benefit you to collect more data. Collecting more data often helps, but if you try to collect more data for everything, that can be a very expensive activity. For example, I once figured out that a speech-recognition system was performing poorly when there was car noise in the background. Knowing that allowed me to collect more data with car noise in the background, rather than trying to collect more data for everything, which would have been expensive and slow. What about using synthetic data, is that often a good solution? Ng: I think synthetic data is an important tool in the tool chest of data-centric AI. At the NeurIPS workshop, Anima Anandkumar gave a great talk that touched on synthetic data. I think there are important uses of synthetic data that go beyond just being a preprocessing step for increasing the data set for a learning algorithm. I’d love to see more tools to let developers use synthetic data generation as part of the closed loop of iterative machine learning development. 
Do you mean that synthetic data would allow you to try the model on more data sets? Ng: Not really. Here’s an example. Let’s say you’re trying to detect defects in a smartphone casing. There are many different types of defects on smartphones. It could be a scratch, a dent, pit marks, discoloration of the material, other types of blemishes. If you train the model and then find through error analysis that it’s doing well overall but it’s performing poorly on pit marks, then synthetic data generation allows you to address the problem in a more targeted way. You could generate more data just for the pit-mark category. Synthetic data generation is a very powerful tool, but there are many simpler tools that I will often try first, such as data augmentation, improving labeling consistency, or just asking a factory to collect more data. To make these issues more concrete, can you walk me through an example? When a company approaches Landing AI and says it has a problem with visual inspection, how do you onboard them and work toward deployment? Ng: When a customer approaches us we usually have a conversation about their inspection problem and look at a few images to verify that the problem is feasible with computer vision. Assuming it is, we ask them to upload the data to the LandingLens platform. We often advise them on the methodology of data-centric AI and help them label the data. One of the foci of Landing AI is to empower manufacturing companies to do the machine learning work themselves. A lot of our work is making sure the software is fast and easy to use. 
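The two kinds of tooling Ng describes, flagging examples whose labels are inconsistent and finding the one class where more data would help most, can be sketched in a few lines. This is an illustrative sketch only, not Landing AI's tooling: the function names, the data layout (several annotator labels per image), and the tiny defect data set are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def flag_inconsistent(annotations):
    """Return image ids whose annotators disagree, worst agreement first.

    `annotations` maps an image id to a non-empty list of labels, one per
    annotator (a hypothetical layout chosen for this sketch).
    """
    flagged = []
    for image_id, labels in annotations.items():
        top_count = Counter(labels).most_common(1)[0][1]
        agreement = top_count / len(labels)  # share voting for the top label
        if agreement < 1.0:
            flagged.append((agreement, image_id))
    return [image_id for _, image_id in sorted(flagged)]


def per_class_error(y_true, y_pred):
    """Return the error rate for each true class (simple error analysis)."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        totals[truth] += 1
        if truth != guess:
            errors[truth] += 1
    return {cls: errors[cls] / totals[cls] for cls in totals}


# Hypothetical annotations for three images from three annotators.
annotations = {
    "img1": ["scratch", "scratch", "scratch"],
    "img2": ["scratch", "dent", "pit"],
    "img3": ["dent", "dent", "scratch"],
}
to_relabel = flag_inconsistent(annotations)

# Hypothetical model predictions on a small validation set.
y_true = ["scratch", "scratch", "pit", "pit", "dent"]
y_pred = ["scratch", "scratch", "dent", "pit", "dent"]
rates = per_class_error(y_true, y_pred)
worst_class = max(rates, key=rates.get)

print("relabel first:", to_relabel)
print("collect more data for:", worst_class)
```

The output of both functions is a short, prioritized list, which is the point of the approach: attention goes to the few images or the single class that matter rather than to the whole data set.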
Through the iterative process of machine learning development, we advise customers on things like how to train models on the platform, when and how to improve the labeling of data so the performance of the model improves. Our training and software supports them all the way through deploying the trained model to an edge device in the factory. How do you deal with changing needs? If products change or lighting conditions change in the factory, can the model keep up? Ng: It varies by manufacturer. There is data drift in many contexts. But there are some manufacturers that have been running the same manufacturing line for 20 years now with few changes, so they don’t expect changes in the next five years. Those stable environments make things easier. For other manufacturers, we provide tools to flag when there’s a significant data-drift issue. I find it really important to empower manufacturing customers to correct data, retrain, and update the model. Because if something changes and it’s 3 a.m. in the United States, I want them to be able to adapt their learning algorithm right away to maintain operations. In the consumer software Internet, we could train a handful of machine-learning models to serve a billion users. In manufacturing, you might have 10,000 manufacturers building 10,000 custom AI models. The challenge is, how do you do that without Landing AI having to hire 10,000 machine learning specialists? So you’re saying that to make it scale, you have to empower customers to do a lot of the training and other work. Ng: Yes, exactly! This is an industry-wide problem in AI, not just in manufacturing. Look at health care. Every hospital has its own slightly different format for electronic health records. How can every hospital train its own custom AI model? Expecting every hospital’s IT personnel to invent new neural-network architectures is unrealistic. 
The only way out of this dilemma is to build tools that empower the customers to build their own models by giving them tools to engineer the data and express their domain knowledge. That’s what Landing AI is executing in computer vision, and the field of AI needs other teams to execute this in other domains. Is there anything else you think it’s important for people to understand about the work you’re doing or the data-centric AI movement? Ng: In the last decade, the biggest shift in AI was a shift to deep learning. I think it’s quite possible that in this decade the biggest shift will be to data-centric AI. With the maturity of today’s neural network architectures, I think for a lot of the practical applications the bottleneck will be whether we can efficiently get the data we need to develop systems that work well. The data-centric AI movement has tremendous energy and momentum across the whole community. I hope more researchers and developers will jump in and work on it. This article appears in the April 2022 print issue as “Andrew Ng, AI Minimalist.”

  • How AI Will Change Chip Design
    by Rina Diane Caballar on 8. February 2022. at 14:00

    The end of Moore’s Law is looming. Engineers and designers can do only so much to miniaturize transistors and pack as many of them as possible into chips. So they’re turning to other approaches to chip design, incorporating technologies like AI into the process. Samsung, for instance, is adding AI to its memory chips to enable processing in memory, thereby saving energy and speeding up machine learning. Speaking of speed, Google’s TPU V4 AI chip has doubled its processing power compared with that of its previous version. But AI holds still more promise and potential for the semiconductor industry. To better understand how AI is set to revolutionize chip design, we spoke with Heather Gorr, senior product manager for MathWorks’ MATLAB platform. How is AI currently being used to design the next generation of chips? Heather Gorr: AI is such an important technology because it’s involved in most parts of the cycle, including the design and manufacturing process. There’s a lot of important applications here, even in the general process engineering where we want to optimize things. I think defect detection is a big one at all phases of the process, especially in manufacturing. But even thinking ahead in the design process, [AI now plays a significant role] when you’re designing the light and the sensors and all the different components. There’s a lot of anomaly detection and fault mitigation that you really want to consider. Then, thinking about the logistical modeling that you see in any industry, there is always planned downtime that you want to mitigate; but you also end up having unplanned downtime. So, looking back at that historical data of when you’ve had those moments where maybe it took a bit longer than expected to manufacture something, you can take a look at all of that data and use AI to try to identify the proximate cause or to see something that might jump out even in the processing and design phases. 
We think of AI oftentimes as a predictive tool, or as a robot doing something, but a lot of times you get a lot of insight from the data through AI. What are the benefits of using AI for chip design? Gorr: Historically, we’ve seen a lot of physics-based modeling, which is a very intensive process. We want to do a reduced order model, where instead of solving such a computationally expensive and extensive model, we can do something a little cheaper. You could create a surrogate model, so to speak, of that physics-based model, use the data, and then do your parameter sweeps, your optimizations, your Monte Carlo simulations using the surrogate model. That takes a lot less time computationally than solving the physics-based equations directly. So, we’re seeing that benefit in many ways, including the efficiency and economy that are the results of iterating quickly on the experiments and the simulations that will really help in the design. So it’s like having a digital twin in a sense? Gorr: Exactly. That’s pretty much what people are doing, where you have the physical system model and the experimental data. Then, in conjunction, you have this other model that you could tweak and tune and try different parameters and experiments that let you sweep through all of those different situations and come up with a better design in the end. So, it’s going to be more efficient and, as you said, cheaper? Gorr: Yeah, definitely. Especially in the experimentation and design phases, where you’re trying different things. That’s obviously going to yield dramatic cost savings if you’re actually manufacturing and producing [the chips]. You want to simulate, test, experiment as much as possible without making something using the actual process engineering. We’ve talked about the benefits. How about the drawbacks? Gorr: The [AI-based experimental models] tend to not be as accurate as physics-based models. Of course, that’s why you do many simulations and parameter sweeps. 
But that’s also the benefit of having that digital twin, where you can keep that in mind—it’s not going to be as accurate as that precise model that we’ve developed over the years. Both chip design and manufacturing are system intensive; you have to consider every little part. And that can be really challenging. It’s a case where you might have models to predict something and different parts of it, but you still need to bring it all together. One of the other things to think about too is that you need the data to build the models. You have to incorporate data from all sorts of different sensors and different sorts of teams, and so that heightens the challenge. How can engineers use AI to better prepare and extract insights from hardware or sensor data? Gorr: We always think about using AI to predict something or do some robot task, but you can use AI to come up with patterns and pick out things you might not have noticed before on your own. People will use AI when they have high-frequency data coming from many different sensors, and a lot of times it’s useful to explore the frequency domain and things like data synchronization or resampling. Those can be really challenging if you’re not sure where to start. One of the things I would say is, use the tools that are available. There’s a vast community of people working on these things, and you can find lots of examples [of applications and techniques] on GitHub or MATLAB Central, where people have shared nice examples, even little apps they’ve created. I think many of us are buried in data and just not sure what to do with it, so definitely take advantage of what’s already out there in the community. You can explore and see what makes sense to you, and bring in that balance of domain knowledge and the insight you get from the tools and AI. What should engineers and designers consider when using AI for chip design? 
Gorr: Think through what problems you’re trying to solve or what insights you might hope to find, and try to be clear about that. Consider all of the different components, and document and test each of those different parts. Consider all of the people involved, and explain and hand off in a way that is sensible for the whole team. How do you think AI will affect chip designers’ jobs? Gorr: It’s going to free up a lot of human capital for more advanced tasks. We can use AI to reduce waste, to optimize the materials, to optimize the design, but then you still have that human involved whenever it comes to decision-making. I think it’s a great example of people and technology working hand in hand. It’s also an industry where all people involved—even on the manufacturing floor—need to have some level of understanding of what’s happening, so this is a great industry for advancing AI because of how we test things and how we think about them before we put them on the chip. How do you envision the future of AI and chip design? Gorr: It’s very much dependent on that human element—involving people in the process and having that interpretable model. We can do many things with the mathematical minutiae of modeling, but it comes down to how people are using it, how everybody in the process is understanding and applying it. Communication and involvement of people of all skill levels in the process are going to be really important. We’re going to see less of those superprecise predictions and more transparency of information, sharing, and that digital twin—not only using AI but also using our human knowledge and all of the work that many people have done over the years.
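To make the surrogate-model idea from earlier in this interview concrete, here is a minimal sketch of the workflow Gorr outlines: run the expensive model a handful of times, fit a cheap stand-in to those results, and do the parameter sweep on the stand-in. Everything specific here is an assumption for illustration: `expensive_model` is a toy function rather than a real physics simulation, and piecewise-linear interpolation is swapped in for whatever learned model a real workflow would use.

```python
import bisect
import math


def expensive_model(x):
    """Toy stand-in for a slow physics-based simulation."""
    return (x - 2.0) ** 2 + 0.5 * math.sin(3.0 * x)


def build_surrogate(xs, ys):
    """Return a piecewise-linear interpolant through sorted samples (xs, ys)."""
    def surrogate(x):
        # Outside the sampled range, hold the nearest sampled value.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
        x0, x1 = xs[i - 1], xs[i]
        t = (x - x0) / (x1 - x0)
        return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return surrogate


# Only nine calls to the expensive model are needed to build the surrogate.
train_x = [i * 0.5 for i in range(9)]  # 0.0, 0.5, ..., 4.0
train_y = [expensive_model(x) for x in train_x]
surrogate = build_surrogate(train_x, train_y)

# A dense parameter sweep then runs entirely on the cheap surrogate.
sweep = [i * 0.01 for i in range(401)]
best_x = min(sweep, key=surrogate)

print("surrogate optimum near x =", round(best_x, 2))
print("true model value there:", round(expensive_model(best_x), 4))
```

The sweep costs 401 surrogate evaluations but only nine runs of the expensive model. The trade-off is the one Gorr names: between sample points the surrogate is only an approximation, so a promising design found this way would still be checked against the full model.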

  • Atomically Thin Materials Significantly Shrink Qubits
    by Dexter Johnson on 7. February 2022. at 16:12

    Quantum computing is a devilishly complex technology, with many technical hurdles impacting its development. Of these challenges, two critical issues stand out: miniaturization and qubit quality. IBM has adopted the superconducting qubit road map of reaching a 1,121-qubit processor by 2023, leading to the expectation that 1,000 qubits with today’s qubit form factor is feasible. However, current approaches will require very large chips (50 millimeters on a side, or larger) at the scale of small wafers, or the use of chiplets on multichip modules. While this approach will work, the aim is to attain a better path toward scalability. Now researchers at MIT have both reduced the size of the qubits and done so in a way that reduces the interference that occurs between neighboring qubits. The MIT researchers have increased the number of superconducting qubits that can be added onto a device by a factor of 100. “We are addressing both qubit miniaturization and quality,” said William Oliver, the director for the Center for Quantum Engineering at MIT. “Unlike conventional transistor scaling, where only the number really matters, for qubits, large numbers are not sufficient, they must also be high-performance. Sacrificing performance for qubit number is not a useful trade in quantum computing. They must go hand in hand.” The key to this big increase in qubit density and reduction of interference comes down to the use of two-dimensional materials, in particular the 2D insulator hexagonal boron nitride (hBN). The MIT researchers demonstrated that a few atomic monolayers of hBN can be stacked to form the insulator in the capacitors of a superconducting qubit. Just like other capacitors, the capacitors in these superconducting circuits take the form of a sandwich in which an insulator material is sandwiched between two metal plates. 
The big difference for these capacitors is that the superconducting circuits can operate only at extremely low temperatures—less than 0.02 degrees above absolute zero (-273.15 °C).
[Photo caption: Superconducting qubits are measured at temperatures as low as 20 millikelvin in a dilution refrigerator. Credit: Nathan Fiske/MIT]
In that environment, insulating materials that are available for the job, such as PE-CVD silicon oxide or silicon nitride, have quite a few defects, which make them too lossy for quantum computing applications. To get around these material shortcomings, most superconducting circuits use what are called coplanar capacitors. In these capacitors, the plates are positioned laterally to one another, rather than on top of one another. As a result, the intrinsic silicon substrate below the plates and to a smaller degree the vacuum above the plates serve as the capacitor dielectric. Intrinsic silicon is chemically pure and therefore has few defects, and the large size dilutes the electric field at the plate interfaces, all of which leads to a low-loss capacitor. The lateral size of each plate in this open-face design ends up being quite large (typically 100 by 100 micrometers) in order to achieve the required capacitance. In an effort to move away from the large lateral configuration, the MIT researchers embarked on a search for an insulator that has very few defects and is compatible with superconducting capacitor plates. “We chose to study hBN because it is the most widely used insulator in 2D material research due to its cleanliness and chemical inertness,” said colead author Joel Wang, a research scientist in the Engineering Quantum Systems group of the MIT Research Laboratory for Electronics. On either side of the hBN, the MIT researchers used the 2D superconducting material, niobium diselenide. One of the trickiest aspects of fabricating the capacitors was working with the niobium diselenide, which oxidizes in seconds when exposed to air, according to Wang. 
This necessitates that the assembly of the capacitor occur in a glove box filled with argon gas. While this would seemingly complicate the scaling up of the production of these capacitors, Wang doesn’t regard this as a limiting factor. “What determines the quality factor of the capacitor are the two interfaces between the two materials,” said Wang. “Once the sandwich is made, the two interfaces are ‘sealed’ and we don’t see any noticeable degradation over time when exposed to the atmosphere.” This lack of degradation is because around 90 percent of the electric field is contained within the sandwich structure, so the oxidation of the outer surface of the niobium diselenide does not play a significant role anymore. This ultimately makes the capacitor footprint much smaller, and it accounts for the reduction in cross talk between the neighboring qubits. “The main challenge for scaling up the fabrication will be the wafer-scale growth of hBN and 2D superconductors like [niobium diselenide], and how one can do wafer-scale stacking of these films,” added Wang. Wang believes that this research has shown 2D hBN to be a good insulator candidate for superconducting qubits. He says that the groundwork the MIT team has done will serve as a road map for using other hybrid 2D materials to build superconducting circuits.
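The footprint argument can be checked roughly with the textbook parallel-plate relation C = ε0 · εr · A / d. The Python sketch below is illustrative only: the target capacitance (100 femtofarads), the relative permittivity of hBN (3.5), and the insulator thickness (10 nanometers) are assumed values, not figures reported by the MIT team. Only the 100-by-100-micrometer coplanar plate size comes from the article.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in farads per meter

def parallel_plate_capacitance(eps_r, area_m2, thickness_m):
    """Ideal parallel-plate capacitance, ignoring fringing fields."""
    return EPSILON_0 * eps_r * area_m2 / thickness_m

def required_side_length(capacitance_f, eps_r, thickness_m):
    """Side of a square plate that gives the requested capacitance."""
    area_m2 = capacitance_f * thickness_m / (EPSILON_0 * eps_r)
    return math.sqrt(area_m2)

# Assumed, illustrative parameters (not taken from the MIT work).
TARGET_CAPACITANCE_F = 100e-15   # 100 fF
HBN_RELATIVE_PERMITTIVITY = 3.5  # assumed out-of-plane value
HBN_THICKNESS_M = 10e-9          # 10 nm stack

side_m = required_side_length(
    TARGET_CAPACITANCE_F, HBN_RELATIVE_PERMITTIVITY, HBN_THICKNESS_M
)

# Plate size quoted in the article for the coplanar design.
COPLANAR_SIDE_M = 100e-6
area_ratio = COPLANAR_SIDE_M ** 2 / side_m ** 2

print(f"sandwich plate side: {side_m * 1e6:.2f} micrometers")
print(f"coplanar plate area / sandwich plate area: {area_ratio:.0f}")
```

Because capacitance scales inversely with insulator thickness, a nanometer-scale dielectric lets the plate area shrink by orders of magnitude for the same capacitance, which is the intuition behind the smaller footprint. The exact ratio depends entirely on the assumed parameters, so the output should not be read as a reproduction of the reported results.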
