Explanation-based learning

Explanation-based learning (EBL) is a form of machine learning that exploits a very strong, or even perfect, domain theory (i.e. a formal theory of an application domain akin to a domain model in ontology engineering, not to be confused with Scott's domain theory) in order to make generalizations or form concepts from training examples. It is also linked with Encoding (memory) to help with Learning. == Details == An example of EBL using a perfect domain theory is a program that learns to play chess through example. A specific chess position that contains an important feature such as "Forced loss of black queen in two moves" includes many irrelevant features, such as the specific scattering of pawns on the board. EBL can take a single training example and determine what are the relevant features in order to form a generalization. A domain theory is perfect or complete if it contains, in principle, all information needed to decide any question about the domain. For example, the domain theory for chess is simply the rules of chess. Knowing the rules, in principle, it is possible to deduce the best move in any situation. However, actually making such a deduction is impossible in practice due to combinatoric explosion. EBL uses training examples to make searching for deductive consequences of a domain theory efficient in practice. In essence, an EBL system works by finding a way to deduce each training example from the system's existing database of domain theory. Having a short proof of the training example extends the domain-theory database, enabling the EBL system to find and classify future examples that are similar to the training example very quickly. The main drawback of the method—the cost of applying the learned proof macros, as these become numerous—was analyzed by Minton. === Basic formulation === EBL software takes four inputs: a hypothesis space (the set of all possible conclusions) a domain theory (axioms about a domain of interest) training examples (specific facts that rule out some possible hypothesis) operationality criteria (criteria for determining which features in the domain are efficiently recognizable, e.g. which features are directly detectable using sensors) == Application == An especially good application domain for an EBL is natural language processing (NLP). Here a rich domain theory, i.e., a natural language grammar—although neither perfect nor complete, is tuned to a particular application or particular language usage, using a treebank (training examples). Rayner pioneered this work. The first successful industrial application was to a commercial NL interface to relational databases. The method has been successfully applied to several large-scale natural language parsing systems, where the utility problem was solved by omitting the original grammar (domain theory) and using specialized LR-parsing techniques, resulting in huge speed-ups, at a cost in coverage, but with a gain in disambiguation. EBL-like techniques have also been applied to surface generation, the converse of parsing. When applying EBL to NLP, the operationality criteria can be hand-crafted, or can be inferred from the treebank using either the entropy of its or-nodes or a target coverage/disambiguation trade-off (= recall/precision trade-off = f-score). EBL can also be used to compile grammar-based language models for speech recognition, from general unification grammars. Note how the utility problem, first exposed by Minton, was solved by discarding the original grammar/domain theory, and that the quoted articles tend to contain the phrase grammar specialization—quite the opposite of the original term explanation-based generalization. Perhaps the best name for this technique would be data-driven search space reduction. Other people who worked on EBL for NLP include Guenther Neumann, Aravind Joshi, Srinivas Bangalore, and Khalil Sima'an.

Geofence warrant

A geofence warrant or a reverse location warrant is a search warrant issued by a court to allow law enforcement to search a database to find all active mobile devices within a particular geo-fence area. Courts have granted law enforcement geo-fence warrants to obtain information from databases such as Google's Sensorvault, which collects users' historical geolocation data. Geo-fence warrants are a part of a category of warrants known as reverse search warrants. == History == Geofence warrants were first used in 2016. Google reported that it had received 982 such warrants in 2018, 8,396 in 2019, and 11,554 in 2020. A 2021 transparency report showed that 25% of data requests from law enforcement to Google were geo-fence data requests. Google is the most common recipient of geo-fence warrants and the main provider of such data, although companies including Apple, Snapchat, Lyft, and Uber have also received such warrants. == Legality == === United States === Some lawyers and privacy experts believe reverse search warrants are unconstitutional under the Fourth Amendment to the United States Constitution, which protects people from unreasonable searches and seizures, and requires any search warrants be specific to what and to whom they apply. The Fourth Amendment specifies that warrants may only be issued "upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized." Some lawyers, legal scholars, and privacy experts have likened reverse search warrants to general warrants, which were made illegal by the Fourth Amendment. Groups including the Electronic Frontier Foundation have opposed geo-fence warrants in amicus briefs filed in motions to quash such orders to disclose geo-fence data. In 2024, a panel of the United States Fourth Circuit Court of Appeals considered data acquired from Google’s Sensorvault not to be a search, but non-private business records when users opt-in to Google’s location history. However, upon a rehearing en banc, the Court vacated that decision. In April 2025, the full Court affirmed the judgment solely on the 'good faith' exception, leaving the underlying constitutional question of whether geofence warrants constitute a search unsettled in the Circuit. However, the United States Fifth Circuit Court of Appeals found that geofence warrants are "categorically prohibited by the Fourth Amendment." The split in Circuits prompted the United States Supreme Court to agree to hear Chatrie v. United States in January 2026.

Deep learning speech synthesis

Deep learning speech synthesis refers to the application of deep learning models to generate natural-sounding human speech from written text (text-to-speech) or spectrum (vocoder). Deep neural networks are trained using large amounts of recorded speech and, in the case of a text-to-speech system, the associated labels and/or input text. == Formulation == Given an input text or some sequence of linguistic units Y {\displaystyle Y} , the target speech X {\displaystyle X} can be derived by X = arg ⁡ max P ( X | Y , θ ) {\displaystyle X=\arg \max P(X|Y,\theta )} where θ {\displaystyle \theta } is the set of model parameters. Typically, the input text will first be passed to an acoustic feature generator, then the acoustic features are passed to the neural vocoder. For the acoustic feature generator, the loss function is typically L1 loss (Mean Absolute Error, MAE) or L2 loss (Mean Square Error, MSE). These loss functions impose a constraint that the output acoustic feature distributions must be Gaussian or Laplacian. In practice, since the human voice band ranges from approximately 300 to 4000 Hz, the loss function will be designed to have more penalty on this range: l o s s = α loss human + ( 1 − α ) loss other {\displaystyle loss=\alpha {\text{loss}}_{\text{human}}+(1-\alpha ){\text{loss}}_{\text{other}}} where loss human {\displaystyle {\text{loss}}_{\text{human}}} is the loss from human voice band and α {\displaystyle \alpha } is a scalar, typically around 0.5. The acoustic feature is typically a spectrogram or Mel scale. These features capture the time-frequency relation of the speech signal, and thus are sufficient to generate intelligent outputs. The Mel-frequency cepstrum feature used in the speech recognition task is not suitable for speech synthesis, as it reduces too much information. == History == In September 2016, DeepMind released WaveNet, which demonstrated that deep learning-based models are capable of modeling raw waveforms and generating speech from acoustic features like spectrograms or mel-spectrograms. Although WaveNet was initially considered to be computationally expensive and slow to be used in consumer products at the time, a year after its release, DeepMind unveiled a modified version of WaveNet known as "Parallel WaveNet," a production model 1,000 faster than the original. This was followed by Google AI's Tacotron 2 in 2018, which demonstrated that neural networks could produce highly natural speech synthesis but required substantial training data—typically tens of hours of audio—to achieve acceptable quality. Tacotron 2 used an autoencoder architecture with attention mechanisms to convert input text into mel-spectrograms, which were then converted to waveforms using a separate neural vocoder. When trained on smaller datasets, such as 2 hours of speech, the output quality degraded while still being able to maintain intelligible speech, and with just 24 minutes of training data, Tacotron 2 failed to produce intelligible speech. In 2019, Microsoft Research introduced FastSpeech, which addressed speed limitations in autoregressive models like Tacotron 2. FastSpeech utilized a non-autoregressive architecture that enabled parallel sequence generation, significantly reducing inference time while maintaining audio quality. Its feedforward transformer network with length regulation allowed for one-shot prediction of the full mel-spectrogram sequence, avoiding the sequential dependencies that bottlenecked previous approaches. The same year saw the release of HiFi-GAN, a generative adversarial network (GAN)-based vocoder that improved the efficiency of waveform generation while producing high-fidelity speech. In 2020, the release of Glow-TTS introduced a flow-based approach that allowed for fast inference and voice style transfer capabilities. In March 2020, the free text-to-speech website 15.ai was launched. 15.ai gained widespread international attention in early 2021 for its ability to synthesize emotionally expressive speech of fictional characters from popular media with minimal amount of data. The creator of 15.ai (known pseudonymously as 15) stated that 15 seconds of training data is sufficient to perfectly clone a person's voice (hence its name, "15.ai"), a significant reduction from the previously known data requirement of tens of hours. 15.ai is credited as the first platform to popularize AI voice cloning in memes and content creation. 15.ai used a multi-speaker model that enabled simultaneous training of multiple voices and emotions, implemented sentiment analysis using DeepMoji, and supported precise pronunciation control via ARPABET. The 15-second data efficiency benchmark was later corroborated by OpenAI in 2024. == Semi-supervised learning == Currently, self-supervised learning has gained much attention through better use of unlabelled data. Research has shown that, with the aid of self-supervised loss, the need for paired data decreases. == Zero-shot speaker adaptation == Zero-shot speaker adaptation is promising because a single model can generate speech with various speaker styles and characteristic. In June 2018, Google proposed to use pre-trained speaker verification models as speaker encoders to extract speaker embeddings. The speaker encoders then become part of the neural text-to-speech models, so that it can determine the style and characteristics of the output speech. This procedure has shown the community that it is possible to use only a single model to generate speech with multiple styles. == Neural vocoder == In deep learning-based speech synthesis, neural vocoders play an important role in generating high-quality speech from acoustic features. The WaveNet model proposed in 2016 achieves excellent performance on speech quality. Wavenet factorised the joint probability of a waveform x = { x 1 , . . . , x T } {\displaystyle \mathbf {x} =\{x_{1},...,x_{T}\}} as a product of conditional probabilities as follows p θ ( x ) = ∏ t = 1 T p ( x t | x 1 , . . . , x t − 1 ) {\displaystyle p_{\theta }(\mathbf {x} )=\prod _{t=1}^{T}p(x_{t}|x_{1},...,x_{t-1})} where θ {\displaystyle \theta } is the model parameter including many dilated convolution layers. Thus, each audio sample x t {\displaystyle x_{t}} is conditioned on the samples at all previous timesteps. However, the auto-regressive nature of WaveNet makes the inference process dramatically slow. To solve this problem, Parallel WaveNet was proposed. Parallel WaveNet is an inverse autoregressive flow-based model which is trained by knowledge distillation with a pre-trained teacher WaveNet model. Since such inverse autoregressive flow-based models are non-auto-regressive when performing inference, the inference speed is faster than real-time. Meanwhile, Nvidia proposed a flow-based WaveGlow model, which can also generate speech faster than real-time. However, despite the high inference speed, parallel WaveNet has the limitation of needing a pre-trained WaveNet model, so that WaveGlow takes many weeks to converge with limited computing devices. This issue has been solved by Parallel WaveGAN, which learns to produce speech through multi-resolution spectral loss and GAN learning strategies.

Flux (text-to-image model)

Flux (also known as FLUX.1 and FLUX.2) is a text-to-image model developed by Black Forest Labs (BFL), based in Freiburg im Breisgau, Germany. Black Forest Labs was founded by former employees of Stability AI. As with other text-to-image models, Flux generates images from natural language descriptions, called prompts. == History == Black Forest Labs (BFL) was founded in 2024 by Robin Rombach, Andreas Blattmann, and Patrick Esser, former employees of Stability AI. All three founders had previously researched the artificial intelligence image generation at LMU Munich as research assistants under Björn Ommer. They published their research results on image generation in 2022, which resulted in creation of Stable Diffusion. Investors in BFL included venture capital firm Andreessen Horowitz, Brendan Iribe, Michael Ovitz, Garry Tan, and Vladlen Koltun. The company received an initial investment of US$31 million. In August 2024, Flux was integrated into the Grok chatbot developed by xAI and made available as part of premium feature on X (formerly Twitter). Grok later switched to its own text-to-image model Aurora in December 2024. On 18 November 2024, Mistral AI announced that its Le Chat chatbot had integrated Flux Pro as its image generation model. On 21 November 2024, BFL announced the release of Flux.1 Tools, a suite of editing tools designed to be used on top of existing Flux models. The tools consisting of Flux.1 Fill for inpainting and outpainting, Flux.1 Depth for control based on extracted depth map of input images and prompts, Flux.1 Canny for control based on extracted canny edges of input images and prompts, and Flux.1 Redux for mixing existing input images and prompts. Each tools are available in both Pro and Dev models. In January 2025, BFL announced a partnership with Nvidia for inclusion of Flux models as foundation models for Nvidia's Blackwell microarchitecture. The company also announced the release of Flux Pro Finetuning API, designed for customisation and fine-tuning of Flux-generated images and a partnership with German media company Hubert Burda Media for usage of Flux Pro as part of content creation. On 29 May 2025, BFL announced Flux.1 Kontext, a suite of models that enable in-context image generation and editing, allowing users to prompt with both text and images. Alongside this, BFL Playground, an interface for testing Flux models was released. On 31 July 2025, BFL announced Flux.1 Krea Dev, a model developed in collaboration with Krea AI that trained to achieve better performance, more varied aesthetics, and better realism compared to existing text-to-image models. In September 2025, Adobe Inc. announced that Photoshop (beta) users can use Flux.1 Kontext Pro as a model for its generative fill tool. BFL collaborated with Meta on Vibes, a video-generation app. On 25 November 2025, BFL announced the release of Flux.2 model series, consisting of Pro, Flex, Dev, and Apache 2.0-licensed Klein (meaning Little or Small in German language) models along with Flux.2 variational autoencoder which also released as open-source software under Apache 2.0 licence. This series claimed improvements for image reference, photorealism, typography, and prompt understanding. == Models == Flux is a series of text-to-image models. The models are based on rectified flow transformer blocks scaled to 12 billion parameters. Flux.1 models were released under different licences with Schnell (meaning Fast or Quick in German language) released as open-source software under Apache License, Dev released as source-available software under a non-commercial licence (users can obtain a self-serving commercial licence for Dev from BFL), and Pro released as proprietary software and only available as API that can be licensed by third-party users. Users retained the ownership of resulting output regardless of models used. An improved flagship model, Flux 1.1 Pro was released on 2 October 2024. Two additional modes were added on 6 November, Ultra which can generate image at four times higher resolution and up to 4 megapixel without affecting generation speed and Raw which can generate hyper-realistic image in the style of candid photography. Flux.1 Kontext is a series with in-context image generation and editing capabilities. It is available in Max, Pro, and Dev models. Max is the highest quality model and can be used to iteratively modify an existing image by using prompt while Pro is optimized to balance quality and speed of generation. Dev is an open-weight model released under non-commercial license, same as Flux.1 Dev. Flux.2 models are based on latent flow matching architecture with Mistral AI's Mistral-3 model (24 billion parameters) for its vision-language model. As with Flux.1, Flux.2 models were also released under different licences with Klein released as open-source software under Apache License, Dev released as source-available software under a non-commercial licence (users can obtain a self-serving commercial licence from BFL), and both Flex and Pro released as proprietary software and only available as API. The models can be used either online or locally by using generative AI user interfaces such as ComfyUI, Recraft Studio and Stable Diffusion WebUI Forge (a fork of Automatic1111 WebUI). Related to Flux is a text-to-video model by Black Forest Labs, under development as of February 2026. == Reception == According to a test performed by Ars Technica, the outputs generated by Flux.1 Dev and Flux.1 Pro are comparable with DALL-E 3 in terms of prompt fidelity, with the photorealism closely matched Midjourney 6 and generated human hands with more consistency over previous models such as Stable Diffusion XL. Flux has been criticised for its very realistic generated images. According to media reports, depictions ranged from an image of Donald Trump posing with guns to disturbing scenes, which triggered discussions about ethical implications of Flux models. After the release of the model, social media platform X was flooded with Flux-generated images. Black Forest Labs has not provided exact details of the data used to train the model. Ars Technica suspected that Flux is based on a large, unauthorised collection of images scraped from the internet, a controversial practice with potential legal consequences. According to a test performed by Japanese technology news website Gigazine for Flux.1 Kontext, the model series has a good understanding of the English language and can easily transfer style of the image from photorealistic into anime-style according to prompts given by the user; however, its ability to understand Japanese is quite poor. == Availability == In addition to the official BFL Playground on its website, the Flux models are also widely available through various third-party platforms for creative and professional use. These include repositories on platforms like Hugging Face and Replicate. == Further readings == FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space (29 May 2025) FLUX.2: Analyzing and Enhancing the Latent Space of FLUX – Representation Comparison (25 November 2025)

Hostile Waters: Antaeus Rising

Hostile Waters, released as Hostile Waters: Antaeus Rising in America, is a hybrid vehicle and strategy game developed and published by Rage Software for Microsoft Windows. It was inspired by Carrier Command (Realtime Games, 1988). It has won several awards and one unofficial award from Rock Paper Shotgun as a "lost classic" or "The best game you've never played". == Plot == Hostile Waters takes place in a Utopian future where war has been abolished. In the year 2012, a revolutionary war takes place between the corrupt and power-hungry politicians, leaders and businessmen (described as the "Old Guard") and the people. The Old Guard were defeated, with only a few of their leaders escaping. By 2032, the world has been rebuilt as a utopia, with the help of nano-technological assemblers, which are used in "creation engines" to create matter from energy and waste, for free. The newly united world is governed from a capital city known as Central. Missile attacks are suddenly launched against major cities all over the world from an unknown location. This is eventually discovered to be an island chain in the South Pacific Ocean. A response to the missile attacks was a special forces team sent in to investigate the area for preliminary investigations. The Ministry of Intelligence (MinIntel) loses contact with it shortly thereafter. The world government authorises a reactivation of the Antaeus program, a series of warships able to create any weapon of their choosing using their on-board nano-technological creation engine. Two of these were left on the seabed in the case of an emergency, capable of being re-activated and refloating itself. On board are a series of "soulcatcher" chips, a classified 1990s military program researched into for the storage of human brain functions on a silicon chip. The soulcatcher technology was used to store the minds of every crew member ever assigned to an Antaeus vessel. It is soon discovered that one of the cruisers does not respond to the awakening signal. The other cruiser, however, is refloated and re-activated, with heavy damage to vital ship components. A course is plotted for a nearby disused wet-dock. As the Antaeus progresses from the wet-dock, unusual biological life-forms are discovered amongst the enemy bases on the islands. The identity of the aggressor firing the missiles is confirmed as the leftovers of the old, pre-Central forces, known as the Cabal. Outnumbering Central's army a thousand to one, they are fighting with thousands of troops and weapons that they hid away when it was apparent that the war was lost. The Antaeus is deployed into the chicane to stop the Cabal's operations there. It's later discovered that along with their superior numbers, they have also biologically engineered a species of organic machines, designed in the popular likeness of extraterrestrials, which they intend to use to create the fear of an alien invasion, to facilitate their taking over the world and the removal of the public use of creation engines. The Cabal later lose control of the species, which eventually turns on its masters, destroying them. The species starts spreading, modifying the planetary climate and geographical features in an attempt to exterminate humanity and make the planet more hospitable to itself. Having exterminated its creators, the species resolves to cleanse humanity as a whole from the planet using a massive 'disassembler cannon', only to be stopped by the Antaeus. The species subsequently attempts to flee into the cosmos and colonise the surrounding planets and stars, by launching a massive number of 'culture stones' (information devices that also double as creation engines) into space from an enormous, artificially-grown organic "island", the final staging point. Central's only option is to bind the Antaeus' creation engine and the disassembler cannon stolen from the aliens together to create a makeshift bomb, and detonate it at the central "column" containing the culture stones. The plan succeeds, and the Antaeus is sacrificed to save the world. The final cinematic show the organic disassembler cannon and the Antaeus' creation engine moving closer together and fusing, creating something new. A post-credits scene also shows that two of the species' culture stones have managed to get into space. == Gameplay == Each Mission takes place on and or near a fortified enemy island containing various forms of anti-air and ground defence, with scattered unit-production complexes powered by oil-derricks and fuel containers (which are dependent on the oil-derricks) that the player can destroy to keep the enemy from replacing destroyed forces. Vehicles are built on the Antaeus and, if desired, land vehicles can be delivered to a location by the air-lifting "magpie". Units are created by providing Antaeus with a number of resources which are obtained at the beginning of the level and debris which are taken from destroyed enemy units and structures. Transport helicopters such as the "Pegasus" can fly to an object and airlift it to the ship-board recycling system with little resources required. The carrier can analyse objects it disassembles at the rear of the Antaeus cruiser, and several of the game's vehicles and items are unlocked by "sampling" them in this fashion. The game has a number of vehicles that are progressively unlocked as the missions progress. Vehicles contain a number of slots for equipment and a selection of different types of weapons to use in the vehicle. A variety of vehicle equipment combinations can be designed. Vehicles have an individual damage multiplier such that different vehicles with the same weapon will do different damage. In addition to this, each soul-chip personality specializes in one unit along with specific equipment, which, if equipped will gain them a bonus in efficiency. == Development == The game was developed by 12 people. == Reception == The game received "favourable" reviews according to the review aggregation website Metacritic. Carla Harker of NextGen said, "You'll feel like a real battlefield general when you take to the field in Antaeus Rising." Jake The Snake of GamePro said, "If the usual game categories leave you unscathed, get bloodied in these Hostile Waters."

Compute (machine learning)

In machine learning and deep learning, compute is the amount of computing power or computational resources required to train machine learning models and large language models. More broadly, compute is the computational power or resources necessary for a computer or computer program to function. == Definition == Compute is commonly defined as the amount of computing power or computational resources required to train machine learning and large language models. The term "compute" has also been more broadly applied to cloud computing, referencing processing power, memory, networking, storage, and other resources required for the computation of any program. Compute is measured in petaflop/s-days and is used to document AI training. A petaflop/s-day (pfs-day) consists of performing 1015 neural net operations per second for one day, or a total of about 1020 operations. The compute-time product serves as a mental convenience, similar to kilowatt-hour for energy. An amount of compute is meant to give an idea of the number of actual operations performed. == History == In a 2018 analysis titled "AI and compute", artificial intelligence company OpenAI introduced the concept of compute. OpenAI identified two eras of training AI systems in terms of compute-usage. From 1959 to 2012, compute roughly followed Moore’s law. Between 2012 and 2018, the amount of compute used in the largest AI training runs increased exponentially, growing by more than 300,000 times — roughly doubling every 3.4 months. By comparison, Moore’s Law doubled every two years over the same period. One of the largest models, released in 2020, used 600,000 times more computing power than the 2012 model. After 2020, compute growth began to slow down, with the compute needed for the largest AI models continuing to slow down in 2023. The notion of compute has become increasingly used from the mid-2020s onwards. == Compute growth and AI progress == Larger AI models trained on more data and using more computational resources, tend to perform better. This happens even if the algorithms themselves remain unchanged. As early as 2018, OpenAI noted the exponential increase in compute to be have a key role in AI progress. OpenAI considers three factors drive the advance of AI: algorithmic innovation, data, and the amount of compute available for training. AI models with more compute not only improve in the tasks they were trained on but can develop emergent abilities. Incremental improvements can lead to more abrupt leaps in capabilities. AI provider SpaceXAI said in 2026 that their AI progress is driven by compute and used it a key metric in the AI training of its supercomputer Colossus, the which contains 1 million GPUs. Anthropic has a contract of $1.25 billion per month with SpaceXAI to buy all the compute capacity at Colossus 1 data center. === Criticism and policy === Increasing, promoting or constraining progress in artificial intelligence has often be done via controlling the amount of compute. Policymarkers have enacted policies and provided support to make compute resources more accessible to domestic AI researchers. In a January 2022 report, the Center for Security and Emerging Technology (CSET) suggested to institutions that increasingly powerful and generalizable AI (AGI) will likely require other strategies than maximizing compute. Some AI researchers are also concerned that government might exclusively focus on scaling compute instead of other strategies. The CSET has reported on the various bottlenecks which could explain why deep learning needs for compute have slow down: training is expensive and training extremely large models generates traffic jams across many processors that are difficult to manage. there is a limited supply of AI chips (see AI chip memory shortage). CSET advances that the main resource is human capital, specifically talented researchers — according to a 2023 published survey of more than 400 AI researchers, academic and private sector workers. The survey found that AI researchers are not primarily or exclusively constrained by compute access. However, both academic and industry AI researchers equally report concerns that insufficient compute could prevent them from contributing meaningfully to AI research in the future. High compute users are more concerned about compute access. When asked about which resource provided by the government would be the most useful to them, some AI researchers select compute, other prefer grant funding. For this goal, CSET advised policymakers to ensure that even researchers with smaller budgets could effectively contribute to AI research. Other proposed strategies include using contemporary AI algorithms, managing modern AI infrastructure or focusing on interdisciplinary work between the AI field and other fields of computer science. A 2024 study on compute access found that academic-only AI research teams often have less compute intensive research topics, especially foundation models, compared to industry AI labs. As a consequence, academia is likely to play a smaller role in advancing such techniques. The researchers suggest nationally-sponsored computing infrastructure as well as open science initiatives to boost academic compute access. === Data === A 2022 study found that current large language models are significantly under-trained, a consequence of focusing on scaling language models whilst keeping the amount of training data constant. By training over 400 language models of various parameter and token size, they found that "for compute-optimal training", the model size and the number of training tokens should ideally be scaled equally: for every doubling of model size the number of training tokens should also be doubled.

Random-fuzzy variable

In measurements, the measurement obtained can suffer from two types of uncertainties. The first is the random uncertainty which is due to the noise in the process and the measurement. The second contribution is due to the systematic uncertainty which may be present in the measuring instrument. Systematic errors, if detected, can be easily compensated as they are usually constant throughout the measurement process as long as the measuring instrument and the measurement process are not changed. But it can not be accurately known while using the instrument if there is a systematic error and if there is, how much? Hence, systematic uncertainty could be considered as a contribution of a fuzzy nature. This systematic error can be approximately modeled based on our past data about the measuring instrument and the process. Statistical methods can be used to calculate the total uncertainty from both systematic and random contributions in a measurement. However, the computational complexity is very high, and hence not desirable. L.A.Zadeh introduced the concepts of fuzzy variables and fuzzy sets. Fuzzy variables are based on the theory of possibility and hence are possibility distributions. This makes them suitable to handle any type of uncertainty, i.e., both systematic and random contributions to the total uncertainty. Random-fuzzy variable (RFV) is a type 2 fuzzy variable, defined using the mathematical possibility theory, used to represent the entire information associated to a measurement result. It has an internal possibility distribution and an external possibility distribution called membership functions. The internal distribution is the uncertainty contributions due to the systematic uncertainty and the bounds of the RFV are because of the random contributions. The external distribution gives the uncertainty bounds from all contributions. == Definition == A random-fuzzy Variable (RFV) is defined as a type 2 fuzzy variable which satisfies the following conditions: Both the internal and the external functions of the RFV can be identified. Both the internal and the external functions are modeled as possibility distributions (PD). Both the internal and external functions have a unitary value for possibility to the same interval of values. An RFV can be seen in the figure. The external membership function is the distribution in blue and the internal membership function is the distribution in red. Both the membership functions are possibility distributions. Both the internal and external membership functions have a unitary value of possibility only in the rectangular part of the RFV. Therefore, all three conditions have been satisfied. If there are only systematic errors in the measurement, then the RFV simply becomes a fuzzy variable which consists of just the internal membership function. Similarly, if there is no systematic error, then the RFV becomes a fuzzy variable with just the random contributions and therefore, is just the possibility distribution of the random contributions. == Construction == A random-fuzzy variable can be constructed using an internal possibility distribution (rinternal) and a random possibility distribution (rrandom). === The random distribution (rrandom) === rrandom is the possibility distribution of the random contributions to the uncertainty. Any measurement instrument or process suffers from random error contributions due to intrinsic noise or other effects. This is completely random in nature and is a normal probability distribution when several random contributions are combined according to the central limit theorem. However, there can also be random contributions from other probability distributions, such as a uniform distribution, gamma distribution and so on. The probability distribution can be modeled from the measurement data. Then, the probability distribution can be used to model an equivalent possibility distribution using the maximally specific probability-possibility transformation. Some common probability distributions and the corresponding possibility distributions can be seen in the figures. === The internal distribution (rinternal) === rinternal is the internal distribution in the RFV which is the possibility distribution of the systematic contribution to the total uncertainty. This distribution can be built based on the information that is available about the measuring instrument and the process. The largest possible distribution is the uniform or rectangular possibility distribution. This means that every value in the specified interval is equally possible. This actually represents the state of total ignorance according to the theory of evidence which means it represents a scenario in which there is maximum lack of information. This distribution is used for the systematic error when we have absolutely no idea about the systematic error except that it belongs to a particular interval of values. This is quite common in measurements. However, in certain cases, it may be known that certain values have a higher or lower degrees of belief than certain other values. In this case, depending on the degrees of belief for the values, an appropriate possibility distribution could be constructed. === The construction of the external distribution (rexternal) and the RFV === After modeling the random and internal possibility distribution, the external membership function, rexternal, of the RFV can be constructed by using the following equation: where x ∗ {\displaystyle x^{}} is the mode of r random {\displaystyle r_{\textit {random}}} , which is the peak in the membership function of r r a n d o m {\displaystyle r_{random}} and Tmin is the minimum triangular norm. RFV can also be built from the internal and random distributions by considering the α-cuts of the two possibility distributions (PDs). An α-cut of a fuzzy variable F can be defined as Therefore, essentially an α-cut is the set of values for which the value of the membership function μ F ( a ) {\displaystyle \mu _{\rm {F}}(a)} of the fuzzy variable is greater than α. This gives the upper and lower bounds of the fuzzy variable F for each α-cut. The α-cut of an RFV, however, has 4 specific bounds and is given by R F V α = [ X a α , X b α , X c α , X d α ] {\displaystyle RFV^{\alpha }=[X_{a}^{\alpha },X_{b}^{\alpha },X_{c}^{\alpha },X_{d}^{\alpha }]} . X a α {\displaystyle X_{a}^{\alpha }} and X d α {\displaystyle X_{d}^{\alpha }} are the lower and upper bounds respectively of the external membership function (rexternal) which is a fuzzy variable on its own. X b α {\displaystyle X_{b}^{\alpha }} and X c α {\displaystyle X_{c}^{\alpha }} are the lower and upper bounds respectively of the internal membership function (rinternal) which is a fuzzy variable on its own. To build the RFV, let us consider the α-cuts of the two PDs i.e., rrandom and rinternal for the same value of α. This gives the lower and upper bounds for the two α-cuts. Let them be [ X L R α , X U R α ] {\displaystyle [X_{LR}^{\alpha },X_{UR}^{\alpha }]} and [ X L I α , X U I α ] {\displaystyle [X_{LI}^{\alpha },X_{UI}^{\alpha }]} for the random and internal distributions respectively. [ X L R α , X U R α ] {\displaystyle [X_{LR}^{\alpha },X_{UR}^{\alpha }]} can be again divided into two sub-intervals [ X L R α , x ∗ ] {\displaystyle [X_{LR}^{\alpha },x^{}]} and [ x ∗ , X U R α ] {\displaystyle [x^{},X_{UR}^{\alpha }]} where x ∗ {\displaystyle x^{}} is the mode of the fuzzy variable. Then, the α-cut for the RFV for the same value of α, R F V α = [ X a α , X b α , X c α , X d α ] {\displaystyle RFV^{\alpha }=[X_{a}^{\alpha },X_{b}^{\alpha },X_{c}^{\alpha },X_{d}^{\alpha }]} can be defined by Using the above equations, the α-cuts are calculated for every value of α which gives us the final plot of the RFV. A random-fuzzy variable is capable of giving a complete picture of the random and systematic contributions to the total uncertainty from the α-cuts for any confidence level as the confidence level is nothing but 1-α. An example for the construction of the corresponding external membership function (rexternal) and the RFV from a random PD and an internal PD can be seen in the following figure.