Structured support vector machine

The structured support vector machine is a machine learning algorithm that generalizes the support vector machine (SVM) classifier. Whereas the SVM classifier supports binary classification, multiclass classification and regression, the structured SVM allows training of a classifier for general structured output labels. As an example, a sample instance might be a natural language sentence, and the output label is an annotated parse tree. Training a classifier consists of showing it pairs of samples and their correct output labels. After training, the structured SVM model allows one to predict the corresponding output label for new sample instances; that is, given a natural language sentence, the classifier can produce the most likely parse tree. == Training == For a set of $n$ training instances $(\boldsymbol{x}_i, y_i) \in \mathcal{X} \times \mathcal{Y}$, $i = 1, \dots, n$, from a sample space $\mathcal{X}$ and label space $\mathcal{Y}$, the structured SVM minimizes the following regularized risk function.

$$\min_{\boldsymbol{w}} \quad \|\boldsymbol{w}\|^2 + C \sum_{i=1}^{n} \max_{y \in \mathcal{Y}} \left( 0, \Delta(y_i, y) + \langle \boldsymbol{w}, \Psi(\boldsymbol{x}_i, y) \rangle - \langle \boldsymbol{w}, \Psi(\boldsymbol{x}_i, y_i) \rangle \right)$$

The function is convex in $\boldsymbol{w}$ because the maximum of a set of affine functions is convex. The function $\Delta : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_{+}$ measures a distance in label space and is an arbitrary function (not necessarily a metric) satisfying $\Delta(y, z) \geq 0$ and $\Delta(y, y) = 0 \;\; \forall y, z \in \mathcal{Y}$. The function $\Psi : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^d$ is a feature function, extracting some feature vector from a given sample and label. The design of this function depends very much on the application. Because the regularized risk function above is non-differentiable, it is often reformulated in terms of a quadratic program by introducing one slack variable $\xi_i$ for each sample, each representing the value of the maximum. The standard structured SVM primal formulation is given as follows.

$$\begin{array}{cl} \underset{\boldsymbol{w}, \boldsymbol{\xi}}{\min} & \|\boldsymbol{w}\|^2 + C \sum_{i=1}^{n} \xi_i \\ \text{s.t.} & \langle \boldsymbol{w}, \Psi(\boldsymbol{x}_i, y_i) \rangle - \langle \boldsymbol{w}, \Psi(\boldsymbol{x}_i, y) \rangle + \xi_i \geq \Delta(y_i, y), \qquad i = 1, \dots, n, \quad \forall y \in \mathcal{Y} \end{array}$$

== Inference == At test time, only a sample $\boldsymbol{x} \in \mathcal{X}$ is known, and a prediction function $f : \mathcal{X} \to \mathcal{Y}$ maps it to a predicted label from the label space $\mathcal{Y}$. For structured SVMs, given the vector $\boldsymbol{w}$ obtained from training, the prediction function is the following.

$$f(\boldsymbol{x}) = \underset{y \in \mathcal{Y}}{\operatorname{argmax}} \quad \langle \boldsymbol{w}, \Psi(\boldsymbol{x}, y) \rangle$$

Therefore, the maximizer over the label space is the predicted label. Solving for this maximizer is the so-called inference problem and is similar to making a maximum a posteriori (MAP) prediction in probabilistic models. Depending on the structure of the function $\Psi$, solving for the maximizer can be a hard problem. == Separation == The above quadratic program involves a very large, possibly infinite number of linear inequality constraints. In general, the number of inequalities is too large to be optimized over explicitly. Instead the problem is solved by using delayed constraint generation, where only a finite and small subset of the constraints is used. Optimizing over a subset of the constraints enlarges the feasible set and will yield a solution that provides a lower bound on the objective. To test whether the solution $\boldsymbol{w}$ violates constraints of the complete set of inequalities, a separation problem needs to be solved. As the inequalities decompose over the samples, for each sample $(\boldsymbol{x}_i, y_i)$ the following problem needs to be solved.

$$y_i^{*} = \underset{y \in \mathcal{Y}}{\operatorname{argmax}} \left( \Delta(y_i, y) + \langle \boldsymbol{w}, \Psi(\boldsymbol{x}_i, y) \rangle - \langle \boldsymbol{w}, \Psi(\boldsymbol{x}_i, y_i) \rangle - \xi_i \right)$$

The right-hand side objective to be maximized is composed of the constant $-\langle \boldsymbol{w}, \Psi(\boldsymbol{x}_i, y_i) \rangle - \xi_i$ and a term dependent on the variables optimized over, namely $\Delta(y_i, y) + \langle \boldsymbol{w}, \Psi(\boldsymbol{x}_i, y) \rangle$. If the achieved right-hand side objective is less than or equal to zero, no violated constraints for this sample exist. If it is strictly larger than zero, the most violated constraint with respect to this sample has been identified. The problem is enlarged by this constraint and re-solved. The process continues until no violated inequalities can be identified. If the constants are dropped from the above problem, we obtain the following problem to be solved.

$$y_i^{*} = \underset{y \in \mathcal{Y}}{\operatorname{argmax}} \left( \Delta(y_i, y) + \langle \boldsymbol{w}, \Psi(\boldsymbol{x}_i, y) \rangle \right)$$

This problem looks very similar to the inference problem. The only difference is the addition of the term $\Delta(y_i, y)$. Most often, $\Delta$ is chosen such that it has a natural decomposition in label space. In that case, the influence of $\Delta$ can be encoded into the inference problem and solving for the most violating constraint is equivalent to solving the inference problem.
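The training objective, the prediction function, and the loss-augmented separation problem above can be illustrated with a minimal Python sketch. It makes several simplifying assumptions that are not part of the text: the label space is a small set of integer classes that can be enumerated, the feature function copies the input into the block belonging to the label, the loss is the 0/1 loss, and the regularized risk is minimized by plain subgradient descent rather than by the quadratic program with delayed constraint generation. All names (`joint_feature`, `most_violated_label`, `train`, `TOY_SAMPLES`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def joint_feature(x: Sequence[float], y: int, num_labels: int) -> List[float]:
    """Psi(x, y): assumed block encoding, x copied into the slot of label y."""
    d = len(x)
    psi = [0.0] * (d * num_labels)
    psi[y * d:(y + 1) * d] = list(x)
    return psi


def delta(y_true: int, y: int) -> float:
    """Delta(y_i, y): assumed 0/1 loss, zero when the labels agree."""
    return 0.0 if y_true == y else 1.0


def predict(w: Sequence[float], x: Sequence[float], num_labels: int) -> int:
    """Inference: argmax over y of <w, Psi(x, y)> by enumeration."""
    return max(range(num_labels),
               key=lambda y: dot(w, joint_feature(x, y, num_labels)))


def most_violated_label(w: Sequence[float], x: Sequence[float],
                        y_true: int, num_labels: int) -> int:
    """Separation: argmax over y of Delta(y_i, y) + <w, Psi(x_i, y)>."""
    return max(range(num_labels),
               key=lambda y: delta(y_true, y)
               + dot(w, joint_feature(x, y, num_labels)))


def train(samples: Sequence[Sample], num_labels: int, C: float = 10.0,
          epochs: int = 300, learning_rate: float = 0.001) -> List[float]:
    """Subgradient descent on ||w||^2 + C * sum_i (structured hinge loss)."""
    d = len(samples[0][0])
    w = [0.0] * (d * num_labels)
    for _ in range(epochs):
        grad = [2.0 * wi for wi in w]  # gradient of the regularizer ||w||^2
        for x, y_true in samples:
            y_hat = most_violated_label(w, x, y_true, num_labels)
            psi_hat = joint_feature(x, y_hat, num_labels)
            psi_true = joint_feature(x, y_true, num_labels)
            violation = delta(y_true, y_hat) + dot(w, psi_hat) - dot(w, psi_true)
            if violation > 0.0:  # the hinge term is active for this sample
                for k in range(len(w)):
                    grad[k] += C * (psi_hat[k] - psi_true[k])
        w = [wi - learning_rate * g for wi, g in zip(w, grad)]
    return w


# Toy data (made up for illustration): three well-separated classes in 2D.
TOY_SAMPLES: List[Sample] = [
    ([1.0, 0.1], 0), ([0.9, -0.1], 0),
    ([0.1, 1.0], 1), ([-0.1, 0.9], 1),
    ([-1.0, -0.9], 2), ([-0.8, -1.0], 2),
]

if __name__ == "__main__":
    weights = train(TOY_SAMPLES, num_labels=3)
    print([predict(weights, x, 3) for x, _ in TOY_SAMPLES])
```

Note that `predict` and `most_violated_label` differ only in the added `delta` term, which mirrors the observation that the separation problem is the inference problem augmented by the loss. For genuinely structured labels such as parse trees, the enumeration over labels would have to be replaced by a problem-specific search.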

Transduction (machine learning)

In logic, statistical inference, and supervised learning, transduction or transductive inference is reasoning from observed, specific (training) cases to specific (test) cases. In contrast, induction is reasoning from observed training cases to general rules, which are then applied to the test cases. The distinction is most interesting in cases where the predictions of the transductive model are not achievable by any inductive model. Note that this is caused by transductive inference on different test sets producing mutually inconsistent predictions. Transduction was introduced in a computer science context by Vladimir Vapnik in the 1990s, motivated by his view that transduction is preferable to induction since, according to him, induction requires solving a more general problem (inferring a function) before solving a more specific problem (computing outputs for new cases): "When solving a problem of interest, do not solve a more general problem as an intermediate step. Try to get the answer that you really need but not a more general one." An example of learning which is not inductive would be in the case of binary classification, where the inputs tend to cluster in two groups. A large set of test inputs may help in finding the clusters, thus providing useful information about the classification labels. The same predictions would not be obtainable from a model which induces a function based only on the training cases. Some people may call this an example of the closely related semi-supervised learning, since Vapnik's motivation is quite different. The most well-known example of a case-based learning algorithm is the k-nearest neighbor algorithm, which is related to transductive learning algorithms. Another example of an algorithm in this category is the Transductive Support Vector Machine (TSVM). A third possible motivation of transduction arises through the need to approximate. 
If exact inference is computationally prohibitive, one may at least try to make sure that the approximations are good at the test inputs. In this case, the test inputs could come from an arbitrary distribution (not necessarily related to the distribution of the training inputs), which wouldn't be allowed in semi-supervised learning. An example of an algorithm falling in this category is the Bayesian Committee Machine (BCM). == Historical context == The mode of inference from particulars to particulars, which Vapnik came to call transduction, was already distinguished from the mode of inference from particulars to generalizations in part III of the Cambridge philosopher and logician W.E. Johnson's 1924 textbook, Logic. In Johnson's work, the former mode was called 'eduction' and the latter was called 'induction'. Bruno de Finetti developed a purely subjective form of Bayesianism in which claims about objective chances could be translated into empirically respectable claims about subjective credences with respect to observables through exchangeability properties. An early statement of this view can be found in his 1937 La Prévision: ses Lois Logiques, ses Sources Subjectives and a mature statement in his 1970 Theory of Probability. Within de Finetti's subjective Bayesian framework, all inductive inference is ultimately inference from particulars to particulars. == Example problem == The following example problem contrasts some of the unique properties of transduction against induction. A collection of points is given, such that some of the points are labeled (A, B, or C), but most of the points are unlabeled (?). The goal is to predict appropriate labels for all of the unlabeled points. The inductive approach to solving this problem is to use the labeled points to train a supervised learning algorithm, and then have it predict labels for all of the unlabeled points. 
With this problem, however, the supervised learning algorithm will only have five labeled points to use as a basis for building a predictive model. It will certainly struggle to build a model that captures the structure of this data. For example, if a nearest-neighbor algorithm is used, then the points near the middle will be labeled "A" or "C", even though it is apparent that they belong to the same cluster as the point labeled "B". Transduction has the advantage of being able to consider all of the points, not just the labeled points, while performing the labeling task. In this case, transductive algorithms would label the unlabeled points according to the clusters to which they naturally belong. The points in the middle, therefore, would most likely be labeled "B", because they are packed very close to that cluster. An advantage of transduction is that it may be able to make better predictions with fewer labeled points, because it uses the natural breaks found in the unlabeled points. One disadvantage of transduction is that it builds no predictive model. If a previously unknown point is added to the set, the entire transductive algorithm would need to be repeated with all of the points in order to predict a label. This can be computationally expensive if the data is made available incrementally in a stream. Further, this might cause the predictions of some of the old points to change (which may be good or bad, depending on the application). A supervised learning algorithm, on the other hand, can label new points instantly, with very little computational cost. == Transduction algorithms == Transduction algorithms can be broadly divided into two categories: those that seek to assign discrete labels to unlabeled points, and those that seek to regress continuous labels for unlabeled points. Algorithms that seek to predict discrete labels tend to be derived by adding partial supervision to a clustering algorithm. 
Two classes of algorithms can be used: flat clustering and hierarchical clustering. The latter can be further subdivided into two categories: those that cluster by partitioning, and those that cluster by agglomerating. Algorithms that seek to predict continuous labels tend to be derived by adding partial supervision to a manifold learning algorithm. === Partitioning transduction === Partitioning transduction can be thought of as top-down transduction. It is a semi-supervised extension of partition-based clustering. It is typically performed as follows:
Consider the set of all points to be one large partition.
While any partition P contains two points with conflicting labels:
    Partition P into smaller partitions.
For each partition P:
    Assign the same label to all of the points in P.
Of course, any reasonable partitioning technique could be used with this algorithm. Max flow min cut partitioning schemes are very popular for this purpose. === Agglomerative transduction === Agglomerative transduction can be thought of as bottom-up transduction. It is a semi-supervised extension of agglomerative clustering. It is typically performed as follows:
Compute the pair-wise distances, D, between all the points.
Sort D in ascending order.
Consider each point to be a cluster of size 1.
For each pair of points {a,b} in D:
    If (a is unlabeled) or (b is unlabeled) or (a and b have the same label):
        Merge the two clusters that contain a and b.
        Label all points in the merged cluster with the same label.
=== Continuous Label Transduction === These methods seek to regress continuous labels, often via manifold learning techniques. The idea is to learn a low-dimensional representation of the data and infer values smoothly across the manifold. == Applications and related concepts == Transduction is closely related to:
Semi-supervised learning – uses both labeled and unlabeled data but typically induces a model.
Case-based reasoning – such as the k-nearest neighbor (k-NN) algorithm, often considered a transductive method.
Transductive Support Vector Machines (TSVM) – extend standard SVMs to incorporate unlabeled test data during training.
Bayesian Committee Machine (BCM) – an approximation method that makes transductive predictions when exact inference is too costly.
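The agglomerative transduction procedure described earlier in this article can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a reference implementation: points are coordinate tuples compared by Euclidean distance, unlabeled points are marked with `None`, and clusters are tracked with a simple union-find structure. The function name `agglomerative_transduction` is hypothetical.

```python
from itertools import combinations
from math import dist
from typing import Hashable, List, Optional, Sequence


def agglomerative_transduction(
    points: Sequence[Sequence[float]],
    labels: Sequence[Optional[Hashable]],
) -> List[Optional[Hashable]]:
    """Propagate labels bottom-up by merging the closest compatible clusters."""
    n = len(points)
    parent = list(range(n))        # each point starts as its own cluster
    cluster_label = list(labels)   # label stored at each cluster root

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, visited in ascending order.
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: dist(points[p[0]], points[p[1]]))

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # already in the same cluster
        label_a, label_b = cluster_label[root_a], cluster_label[root_b]
        # All points in a cluster share one label, so checking the cluster
        # labels is equivalent to checking the labels of a and b.
        if label_a is None or label_b is None or label_a == label_b:
            parent[root_b] = root_a
            cluster_label[root_a] = label_a if label_a is not None else label_b

    return [cluster_label[find(i)] for i in range(n)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0),
           (10.0, 0.0), (10.0, 1.0), (10.0, 2.0)]
    known = ["A", None, None, "B", None, None]
    print(agglomerative_transduction(pts, known))
```

Because every pair of points is enumerated and sorted, the sketch needs time and memory that grow quadratically with the number of points, and adding a new point means running it again on the whole set, which matches the disadvantage of transduction noted in the example problem.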

GPT-5.3-Codex

GPT-5.3-Codex (Generative Pre-trained Transformer 5.3 Codex) is a large language model (LLM) announced and released by OpenAI on February 5, 2026. It is positioned as a competitor to Anthropic's Claude Opus 4.6, focusing on code generation, speed, and the ability to search repositories, run terminal commands, and debug code at the same time. In technical benchmarks, GPT-5.3-Codex is reported to be 25% faster than Opus 4.6. GPT-5.3-Codex is available in the Codex app and on the web; access via API is also planned. According to OpenAI, GPT-5.3-Codex is the company's "first model that was instrumental in creating itself." On February 12, 2026, GPT-5.3-Codex-Spark, a smaller version of GPT-5.3-Codex that supports text-only input, was released in a research preview. As of February 2026, GPT-5.3-Codex is only available for ChatGPT Pro ($200/month) subscribers.

Responsible AI Safety and Education Act

The Responsible AI Safety and Education Act (RAISE Act) is a New York State law that imposes transparency, safety, and reporting requirements on developers of large frontier artificial intelligence models. The law was signed by Governor Kathy Hochul on December 19, 2025. It was sponsored by State Senator Andrew Gounardes and Assemblymember Alex Bores. The RAISE Act is the second U.S. state law to regulate frontier AI model developers, following California's Transparency in Frontier Artificial Intelligence Act (TFAIA), which was signed in September 2025. Hochul signed the bill on the condition that the legislature would pass chapter amendments to bring the law closer to the California model. The amending bills (A9449/S8828) were introduced in January 2026; as of February 2026 they remain in committee, though the Governor's office and legal commentators treat the agreed-upon amendments as representing the final form of the law. == Provisions == The following describes the RAISE Act as it is expected to operate after the agreed-upon chapter amendments take effect. The law is expected to take effect on January 1, 2027. === Scope === The law applies to "large frontier developers," defined as companies with annual revenues exceeding $500 million that develop "frontier models," which are foundation models trained using more than 10^26 floating-point operations (FLOPs). The version passed by the legislature in June 2025 had instead defined large developers based on having spent over $100 million in aggregate compute costs, and also included a provision prohibiting deployment of frontier models posing "unreasonable risk of critical harm"; both were removed as part of the negotiations between Hochul and the legislature. Accredited colleges and universities engaged in academic research are exempt, as is the state's Empire AI consortium. 
=== Safety and transparency framework === Large frontier developers must write, implement, and publicly publish a "frontier AI framework" describing how they assess and mitigate catastrophic risks, secure unreleased model weights against unauthorized access, use third-party evaluators, govern internal use of frontier models, and respond to safety incidents. The framework must describe these measures "in detail," a requirement that goes beyond the California TFAIA's requirement to describe a developer's "approach." The framework must be reviewed at least annually, and material modifications must be published with justification within 30 days. Before or concurrently with deploying a new or substantially modified frontier model, developers must publish a transparency report including the model's release date, supported languages and output modalities, intended uses, and any restrictions on use. Large frontier developers must additionally include summaries of catastrophic risk assessments and the extent of third-party involvement. === Catastrophic risk and incident reporting === The law defines "catastrophic risk" as a foreseeable and material risk that a frontier model will contribute to the death of or serious injury to more than 50 people, or more than $1 billion in property damage, arising from a frontier model providing expert-level assistance in creating chemical, biological, radiological, or nuclear weapons; engaging in cyberattacks or conduct equivalent to crimes such as murder, assault, or theft without meaningful human oversight; or evading the control of its developer or user. Loss of equity value is explicitly excluded from the definition of property damage. 
"Critical safety incidents" include unauthorized access to model weights resulting in death or injury, materialization of a catastrophic risk, loss of control of a frontier model causing death or injury, and a model using deceptive techniques to subvert developer controls outside of an evaluation context in a manner that increases catastrophic risk. Frontier developers must report critical safety incidents within 72 hours, or within 24 hours if the incident poses an imminent risk of death or serious physical injury. === Enforcement === The chapter amendments establish a new office within the New York State Department of Financial Services to oversee compliance, receive incident reports, and publish annual reports on AI safety beginning in 2028. Large frontier developers must file disclosure statements with this office and pay pro rata assessments to fund its operations. The New York Attorney General may bring civil actions, with penalties of up to $1 million for a first violation and $3 million for subsequent violations. The version passed by the legislature in June 2025 had set penalties at up to $10 million and $30 million respectively. The law does not create a private right of action. == Legislative history == The bill was introduced in the Assembly on March 5, 2025, by Assemblymember Alex Bores, and in the Senate on March 27, 2025, by Senator Andrew Gounardes. After a series of amendments, the legislature passed the bill in June 2025. Governor Hochul did not immediately sign the bill, using nearly all the time available under New York law before acting; had she not signed by the end of 2025, the bill would have been pocket vetoed. The tech industry lobbied against the bill during this period, and Hochul initially proposed a near-complete rewrite modeled on California's TFAIA. 
Legislators resisted the extent of the changes, and the two sides ultimately agreed on a version that used the California law as a base but preserved several provisions that went beyond it, including the 72-hour incident reporting timeline and the creation of a dedicated enforcement office. Hochul signed the original bill (S6953-B/A6453-B) on December 19, 2025, with the legislature committing to pass chapter amendments formalizing the agreed changes in the January 2026 session. The amending bills (A9449 in the Assembly, S8828 in the Senate) were introduced on January 6 and January 8, 2026. OpenAI and Anthropic expressed support for the law. Anthropic's head of external affairs Sarah Heck said the two state laws "should inspire Congress to build on them." The super PAC network Leading the Future, backed by Andreessen Horowitz and OpenAI president Greg Brockman, subsequently announced plans to challenge Bores in a future election. == Federal preemption debate == Hochul signed the RAISE Act eight days after President Donald Trump issued an executive order on December 11, 2025, directing the Department of Justice to challenge state AI laws deemed to conflict with a "minimally burdensome" national AI policy. On January 9, 2026, the Department of Justice announced the establishment of an AI Litigation Task Force as called for by the executive order. The executive order also threatened states with loss of certain federal broadband funding if their AI laws were found to be onerous. Legal commentators have noted several potential avenues for federal challenge, including arguments that the law constitutes compelled speech, violates the dormant Commerce Clause by creating a patchwork of state regulations, or is preempted by federal AI policy. == Comparison with California's TFAIA == The RAISE Act was designed to align with California's Transparency in Frontier Artificial Intelligence Act, signed on September 29, 2025. 
Both laws use the same 10^26 FLOP threshold to define frontier models and the same $500 million revenue threshold to define large developers. Both require public safety frameworks, transparency reports, and incident reporting. The RAISE Act's 72-hour incident reporting window is stricter than the TFAIA's 15-day window, though both require faster reporting for incidents posing imminent physical risk (24 hours under the RAISE Act, immediate under the TFAIA). The RAISE Act establishes a dedicated enforcement office within the Department of Financial Services, whereas California routes reports through the Office of Emergency Services. The RAISE Act requires developers to describe their safety measures "in detail" and how they "handle" various risks, whereas the TFAIA requires developers to describe their "approach."

Eimear Kenny

Eimear E. Kenny is a researcher in population genetics and translational genomics, and is the Founding Director of the Institute for Genomic Health, and Endowed Chair and Professor of Genomic Health at the Icahn School of Medicine at Mount Sinai. She is known for novel approaches in computational genomics, advancing the study of human genetic variation and its connection to disease risk and diagnosis. Her research has laid the foundation for integrating artificial intelligence (AI) and genomics into precision medicine and routine clinical care. By combining genomics, computer science, and medicine, her work leverages genomic sequencing technologies and machine learning algorithms to uncover insights that improve patient care, accelerate genomic data analysis, and enable the future of AI-driven healthcare. She has led multiple genomics-based clinical trials, applying computational biology and AI in clinical settings to advance genomic medicine and precision healthcare. == Research == A recipient of the Early-Career Award from the American Society of Human Genetics (USA), Kenny, as of 2024, leads a team in genetics, computer science, and medicine, focusing on genetic ancestry, large-scale genomics, clinical trials, and genomic medicine at the Institute for Genomic Health. The lab works to advance understanding of genetic ancestry and its impact on health in order to inform better clinical medicine models. She is recognized for her work to leverage biobanks for translational genomics and her development of new genetic tests and strategies for health care management. In one study, she and her colleagues investigated genetic disorders that might be under-diagnosed due to insufficient data, and found a variant in a collagen gene associated with Steel syndrome. This syndrome causes short stature and bone and joint issues and was thought to be rare. However, the study revealed it is common in individuals with Puerto Rican ancestry. 
Three of Kenny's genomic medicine clinical trials assessed how to bring new technology, such as digital apps, or information, such as polygenic risk scores, into routine clinical care. In the 2010s, Kenny was instrumental in several large-scale sequencing studies, including the 1000 Genomes Project, the Exome Sequencing Project, the Genome Sequencing Project, and the Trans-Omics for Precision Medicine (TOPMed) program. In 2012, she led work that discovered the variant responsible for blond hair in Melanesia, work that was featured in the Smithsonian NHGRI Human Genome Exhibit in Washington, D.C. In 2017, her group was one of the first to demonstrate that polygenic risk scores derived in predominantly European populations have reduced accuracy when applied in other populations, a limitation now widely acknowledged as a major challenge in the field of genomic risk prediction. As of 2024, she is a Principal Investigator in many NIH-funded international consortia focused on computational genomics and genomic medicine, including Electronic Medical Records and Genomics, Polygenic Risk Methods in Diverse Populations, and the Human Pangenome Reference Consortium. In 2023, Kenny played a key role in a groundbreaking advancement in genomics research by helping to map a diverse human pangenome, a major shift from reliance on a single reference genome. Unlike the earlier genetic map, based on one man of mixed European and African ancestry in Buffalo, this new pangenome project captures far greater human genetic diversity. As reported by The Washington Post, Kenny's work demonstrates how a more inclusive human genome can drive discoveries in rare genetic diseases, improve genomic medicine, and accelerate the future of precision healthcare. Kenny is a co-developer and the current license holder of Random Forest adMIXture (RFMix), a patented software tool for inferring continental and sub-continental ancestry at genomic loci. 
== Education and career == Kenny graduated from Trinity College Dublin with a BA in Biochemistry in 1999 and completed a master's degree in Bioinformatics at Leeds University. She received her PhD in Computational Genomics at Rockefeller University, and did her post-doctoral work in the lab of Dr. Carlos D. Bustamante at Stanford University. === Academic appointments === As of 2024, at Mount Sinai, she serves as the Endowed Chair and Professor of Genomic Health, Professor at the Department of Medicine and Professor at the Department of Genetics and Genomic Sciences. Since 2018 she has served as the Founding Director of the Institute for Genomic Health, and since 2022 she has also served as the Founding Director of the Center for Translational Genomics. She is also the Director of Translational Research, Division for Genomic Medicine. Former appointments include Assistant Professor at the Department of Genetics and Genomic Sciences and Member at The Charles Bronfman Institute of Personalized Medicine, both at Mount Sinai. She was also a Bioinformatics Programmer at the California Institute of Technology and a research assistant at the Massachusetts Institute of Technology. == Publications == As of 2024, Kenny is an advisor to Cell Genomics. Google Scholar reports 50,623 citations, an h-index of 66 and an i10-index of 130. The five most-cited articles she contributed to are:
Auton, A; Brooks, LD; Durbin, RM; Garrison, EP; Kang, HM; Korbel, JO; Marchini, JL; McCarthy, S; McVean, GA; Abecasis, GR (2015). "A global reference for human genetic variation". Nature. 526 (7571): 68–74. Bibcode:2015Natur.526...68T. doi:10.1038/nature15393. PMC 4750478. PMID 26432245. Cited by 14847.
Abecasis, GR; Auton, A; Brooks, LD; DePristo, MA; Durbin, RM; Handsaker, RE; Kang, HM; Marth, GT; McVean, GA (2012). "An integrated map of genetic variation from 1,092 human genomes". Nature. 491 (7422): 56–65. Bibcode:2012Natur.491...56T. doi:10.1038/nature11632. PMC 3498066. PMID 23128226. Cited by 8287.
Tennessen, Jacob A.; et al. (2012). "Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes". Science. 337: 64–69. doi:10.1126/science.1219240. Cited by 1886.
Taliun, D.; Harris, D.N.; Kessler, M.D.; et al. (2021). "Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program". Nature. 590 (7845): 290–299. Bibcode:2021Natur.590..290T. doi:10.1038/s41586-021-03205-y. PMC 7875770. PMID 33568819. Cited by 1369.
Vilhjálmsson, BJ; et al. (2015). "Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores". Am J Hum Genet. 97 (4): 576–92. doi:10.1016/j.ajhg.2015.09.001. PMC 4596916. PMID 26430803. Cited by 1327.

WomanStats Project

The WomanStats Project is a donor-funded research and database project housed at Brigham Young University that "seeks to collect detailed statistical data on the status of women around the world, and to connect that data with data on the security of states." The WomanStats Database aims to provide a comprehensive compilation of information on the status of women in the world. Coders comb the extant literature and conduct expert interviews to find qualitative and quantitative information on over 300 indicators of women's status in 174 countries with populations of at least 200,000. Access to the online database is free. == History and structure == WomanStats began as an outgrowth of a paper that Dr. Valerie M. Hudson (of the Brigham Young University Political Science department) and one of her graduate students, Andrea den Boer, published in International Security on the association between national security and the abnormal sex ratio in Asia. After the success and influence of their first article (later named one of the journal's top twenty national security articles of all time), Hudson and den Boer did further research on the connection between the status of women and national security, but found that there was no single database that covered the range of topics that they needed for their research. Consequently, they began compiling information on variables regarding the status of women around the world. The database was officially formed in 2001 and grew rapidly as more variables were added. The Project went live on the Internet in July 2007. The principal investigators are: Valerie M. Hudson (International Relations), Bonnie Ballif-Spanvill (Psychology, emeritus), and Chad F. Emmett (Geography), all from Brigham Young University; Mary Caprioli from the University of Minnesota, Duluth (International Relations); Rose McDermott from Brown University (International Relations); Andrea den Boer from the University of Kent at Canterbury in the United Kingdom (International Relations); and S. Matthew Stearmer from the Ohio State University (Sociology; doctoral student). Approximately a dozen undergraduate and graduate students at Brigham Young University and Texas A&M University work at any one time as coders for the project. The coders take the raw quantitative and qualitative data collected in government reports, news articles, research papers, etc. and sort the applicable information on women into categories. They may also implement scales developed by the principal investigators, or that they (the students) themselves have developed. == Database == As of February 2011, the database has 307 variables, covers 174 nations with populations over 200,000, uses 18,015 sources and contains over 111,000 individual data points. All data is referenced to original sources. Not every variable has information for each country; similarly, not all countries have information for each variable: overall, about 70% of country-variable combinations have information. These database coding gaps exist where information is not available or is incomplete, or variables are not collected and reported by governments or international organizations. At times, information from different sources may be contradictory, and the WomanStats Database records this discrepant information for triangulation purposes. == Users and role of the database == The database is meant to help fill a hole in the extant data on the situation of women around the world. WomanStats data and research have been vetted and/or used by the United Nations, the United States Department of Defense, the Central Intelligence Agency, and the World Bank. 
The project's data and research were also used by the United States Senate Committee on Foreign Relations in crafting the International Violence Against Women Act. The Inter-Agency Network on Women and Gender Equality (IANWGE) of the United Nations has stated that the WomanStats project "filled a major gap in the availability of data on women" (2007). Victor Asal and Mitchell Brown, researchers not affiliated with WomanStats, stated in an article published in Politics and Policy that "one of the most significant challenges of cross-national empirical studies of the prevalence of interpersonal violence is the paucity of available data, particularly reliable data," and that "WomanStats has allowed for an important first glimpse at analyzing the factors related to interpersonal violence." They conclude by stating that "Our findings suggest that, in the same way that larger disciplinary resources have invested in interstate and intrastate war, disciplinary resources need to be expended in creating a data set exploring interpersonal violence. Until the rights and the lives of women and children are taken as seriously as the survival of states by more proactively collaborating on projects like WomanStats, we will continue to only have a small lens through which to understand problems like this." Princeton University professor Evan S. Lieberman wrote, "Although data on political regimes and group conflict have been in far greater demand by political scientists than data on gender politics and policies, two gender-related databases provide...examples of innovative HIRDs. Both the Womanstats database project (Hudson et al. 2009) and the Research Network on Gender Politics and the State (RNGS) project (McBride et al. 2008) are well-integrated presentations of quantitative and qualitative data characterizing the quality of gender relations around the world and, in particular, analytic descriptions of the treatment of women."
== Research == The research component of WomanStats focuses on exploring the relationship between the situation of women and the behavior and security of states. Current research initiatives include exploring the relationship between violent instability and inequity in family law; examining the effect of polygyny and marriage market dislocations on the rise of suicide terrorism; documenting discrepancies between laws on the books and cultural practices on the ground concerning gender issues; and investigating how well the situation of women predicts the peacefulness of nation-states, compared to other variables such as democracy, wealth, and civilization. The Project has published articles in International Security, International Studies Quarterly, Peace and Conflict, Journal of Peace Research, Political Psychology, Cumberland Law Review, and World Political Review, and has a forthcoming book from Columbia University Press.

Brain.js

Brain.js is a JavaScript library for building and training neural networks, released as free and open-source software under the MIT License. It can be used both in the browser and in Node.js back ends. Brain.js is most commonly used as a simple introduction to neural networks, as it hides the complex mathematics behind a familiar modern JavaScript syntax. It is maintained by members of the Brain.js organization and open-source contributors. == Examples == Typical examples include creating a feedforward neural network with backpropagation, creating a recurrent neural network, and training a neural network on RGB color contrast.
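To give a sense of the mathematics that a library such as Brain.js hides, the following is a minimal, dependency-free sketch of a feedforward network trained with backpropagation. It is not Brain.js code and does not reproduce its internals: the function names (`createNetwork`, `run`, `trainStep`), the fixed 2-2-1 layout, the hard-coded starting weights, and the learning rate of 0.3 are all assumptions chosen for illustration. In Brain.js itself, this kind of logic is normally wrapped behind the `NeuralNetwork` class and its `train` and `run` methods.

```javascript
// Illustrative sketch only: a tiny 2-2-1 feedforward network with
// backpropagation in plain JavaScript. All names here are hypothetical
// and are not part of the Brain.js API.

const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Fixed starting weights keep the sketch deterministic (an assumption;
// libraries usually start from random weights).
function createNetwork() {
  return {
    hiddenWeights: [[0.5, -0.4], [0.3, 0.8]], // one row per hidden neuron
    hiddenBiases: [0.1, -0.1],
    outputWeights: [0.7, -0.6],
    outputBias: 0.05,
  };
}

// Forward pass: returns the hidden activations and the single output.
function forward(net, input) {
  const hidden = net.hiddenWeights.map((row, j) =>
    sigmoid(row.reduce((sum, w, i) => sum + w * input[i], net.hiddenBiases[j]))
  );
  const output = sigmoid(
    hidden.reduce((sum, h, j) => sum + net.outputWeights[j] * h, net.outputBias)
  );
  return { hidden, output };
}

const run = (net, input) => forward(net, input).output;

// One backpropagation step on a single example (squared-error loss).
// Returns the squared error measured before the weights were updated.
function trainStep(net, input, target, learningRate) {
  const { hidden, output } = forward(net, input);
  const error = target - output;
  const outputDelta = error * output * (1 - output);
  // Hidden deltas use the output weights from before this step's update.
  const hiddenDeltas = hidden.map(
    (h, j) => outputDelta * net.outputWeights[j] * h * (1 - h)
  );
  hidden.forEach((h, j) => {
    net.outputWeights[j] += learningRate * outputDelta * h;
  });
  net.outputBias += learningRate * outputDelta;
  hiddenDeltas.forEach((delta, j) => {
    input.forEach((x, i) => {
      net.hiddenWeights[j][i] += learningRate * delta * x;
    });
    net.hiddenBiases[j] += learningRate * delta;
  });
  return error * error;
}

// Usage: repeatedly present the XOR examples, then query the network.
const xorData = [
  { input: [0, 0], target: 0 },
  { input: [0, 1], target: 1 },
  { input: [1, 0], target: 1 },
  { input: [1, 1], target: 0 },
];
const net = createNetwork();
for (let epoch = 0; epoch < 5000; epoch++) {
  for (const { input, target } of xorData) trainStep(net, input, target, 0.3);
}
console.log(xorData.map(({ input }) => run(net, input)));
```

Whether a network this small fits XOR well depends on the starting weights and the number of epochs, so the printed values should be inspected rather than assumed. A library additionally handles concerns such as weight initialization, configurable layer sizes, and stopping criteria.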