About this Event
32 VASSAR ST, Cambridge, MA 02139
AI for code and science is a very active area of current research that bridges multiple fields, including machine learning, programming languages, and software engineering. Much recent work in this domain has succeeded by blending symbolic and neural techniques.
In this two-day tutorial session, students will gain practical experience with artificial intelligence tools for developing code for science applications. Students will learn to combine neural network methods with (symbolic) program synthesis and to apply these techniques in science. Students wishing to take part should have some programming experience.
As part of the session, we will provide a practical overview of systems and tools for getting started with AI for code and science applications. Students will also hear talks from researchers in the field. Hands-on activities may include using popular transformer-based models such as CodeBERT, CodeT5, and Codex. We will touch on recent ideas for improving or adapting each of these models for computer science problems such as program repair. This tutorial will be hands-on, with time for participants to experiment with working code, try to solve real benchmark cases, and get feedback on ideas they may want to pursue in the future.
Seats are limited; please express your interest by filling out this form: https://forms.gle/3koDexDjTpDmWcVp6
Day 1
9am-12pm
Workshop 1: Neurosymbolic programming for Science with applications for molecular generation in chemistry
Speakers: Omar Costilla-Reyes and Minghao Guo, MIT CSAIL
Abstract: Neurosymbolic Programming (NP) techniques have the potential to accelerate scientific discovery. These models combine neural and symbolic components to learn complex patterns and representations from data, using high-level concepts or known constraints. NP techniques can interface with symbolic domain knowledge from scientists, such as prior knowledge and experimental context, to produce interpretable outputs. In this hands-on tutorial, we explore applications of neurosymbolic programming in health and biology.
The problem of molecular generation has received significant attention recently. Existing methods are typically based on deep neural networks and require training on large datasets with tens of thousands of samples. In practice, however, class-specific chemical datasets are usually small (e.g., dozens of samples) because experimentation and data collection are labor-intensive. Another major challenge is to generate only physically synthesizable molecules. This is a non-trivial task for neural network-based generative models, since the relevant chemical knowledge can only be extracted and generalized from the limited training data. In this tutorial, we explore a data-efficient neurosymbolic generative model that can be learned from datasets that are orders of magnitude smaller than common benchmarks. At the heart of this method is a learnable graph grammar that generates molecules from a sequence of production rules. Additional chemical knowledge can be incorporated into the model through further grammar optimization.
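To give a feel for how a sequence of production rules determines a molecule, here is a minimal sketch using a toy string grammar over SMILES-like atom symbols. This is a simplification assumed purely for illustration: the method described above learns a graph grammar over molecular graphs, whereas the rules, the names `RULES` and `derive`, and the linear-chain chemistry below are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical toy grammar (not the tutorial's learned graph grammar).
# Symbols in NONTERMINALS get rewritten; "C" and "O" are atom terminals.
NONTERMINALS: Set[str] = {"S", "CHAIN"}
RULES: List[Tuple[str, List[str]]] = [
    ("S", ["C", "CHAIN"]),      # rule 0: start with one carbon, then a chain
    ("CHAIN", ["C", "CHAIN"]),  # rule 1: grow the chain by one carbon
    ("CHAIN", ["O"]),           # rule 2: terminate the chain with an oxygen
    ("CHAIN", []),              # rule 3: terminate the chain with nothing
]

def derive(rule_ids: List[int], start: str = "S") -> str:
    """Apply rules in order, each one rewriting the left-most nonterminal."""
    symbols = [start]
    for rid in rule_ids:
        lhs, rhs = RULES[rid]
        # Locate the left-most nonterminal still waiting to be expanded.
        idx = next((i for i, s in enumerate(symbols) if s in NONTERMINALS), None)
        if idx is None:
            raise ValueError("derivation already complete")
        if symbols[idx] != lhs:
            raise ValueError(f"rule {rid} rewrites {lhs}, but found {symbols[idx]}")
        symbols[idx:idx + 1] = rhs  # replace the nonterminal with the rule's right side
    if any(s in NONTERMINALS for s in symbols):
        raise ValueError("derivation incomplete: nonterminals remain")
    return "".join(symbols)

print(derive([0, 1, 1, 2]))
```

The rule sequence 0, 1, 1, 2 derives `CCCO`, a three-carbon chain ending in oxygen. Because every output is produced by the rules, any validity constraint encoded in the grammar holds by construction; in a learned model, the open question becomes which rules to apply and in what order.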
12-1pm
Lunch
1-4pm
Workshop 2: An Introduction to Symbolic Regression with PySR and SymbolicRegression.jl
Speaker: Miles Cranmer, Princeton
Abstract: PySR (https://github.com/MilesCranmer/PySR) is an open-source library for practical symbolic regression, a type of machine learning that discovers human-interpretable symbolic models in the form of simple mathematical expressions. PySR is built on a high-performance distributed backend, SymbolicRegression.jl, which offers a flexible search algorithm, and interfaces with several deep learning packages. In this tutorial I will describe the nuts and bolts of the search algorithm and how PySR may be used in machine learning and scientific workflows. I will review existing applications of the software to science (https://astroautomata.com/PySR/papers/), and then present an interactive coding tutorial where we will go through several example symbolic regression problems with different levels of customization. Following this, we will look at using PySR as a distillation tool for translating deep neural networks into an interpretable scientific language, and go through additional examples.
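Symbolic regression searches a space of mathematical expressions for one that fits the data. As a minimal sketch of the idea only, the brute-force search below enumerates small expression trees over `x`, the constants 1 and 2, and the operators `+`, `-`, `*`, then keeps the expression with the lowest mean squared error, breaking ties by string length as a crude simplicity measure. The leaves, operators, and function names are assumptions for this example. PySR does not enumerate like this; it uses an evolutionary search, since exhaustive enumeration grows combinatorially with expression depth.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

Expr = Tuple[str, Callable[[float], float]]  # (printable form, evaluator)

# Assumed building blocks for this toy search.
LEAVES: List[Expr] = [("x", lambda x: x), ("1", lambda x: 1.0), ("2", lambda x: 2.0)]
OPS = [("+", lambda a, b: a + b), ("-", lambda a, b: a - b), ("*", lambda a, b: a * b)]

def enumerate_exprs(max_depth: int) -> List[Expr]:
    """All binary expression trees up to max_depth (duplicates allowed)."""
    exprs = list(LEAVES)
    for _ in range(max_depth):
        new: List[Expr] = []
        for (sa, fa), (sb, fb) in itertools.product(exprs, repeat=2):
            for so, fo in OPS:
                # Default arguments freeze the current children and operator.
                new.append((f"({sa} {so} {sb})",
                            lambda x, fa=fa, fb=fb, fo=fo: fo(fa(x), fb(x))))
        exprs += new
    return exprs

def fit(xs: Sequence[float], ys: Sequence[float],
        max_depth: int = 2) -> Tuple[str, Callable[[float], float], float]:
    """Return (expression, evaluator, MSE) with lowest MSE, shorter strings on ties."""
    def mse(f: Callable[[float], float]) -> float:
        return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    scored = [(mse(f), len(s), s, f) for s, f in enumerate_exprs(max_depth)]
    best = min(scored, key=lambda t: (t[0], t[1]))
    return best[2], best[3], best[0]

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x * x + 1 for x in xs]
expr, f, err = fit(xs, ys)
print(expr, err)
```

On these six points the search recovers an expression equivalent to `x * x + 1` with zero error. The tie-break by length is a stand-in for the complexity penalty that practical tools balance against accuracy.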
Day 2
9am-12pm
Workshop 3: Learning to automatically fix compiler errors in C
Speaker: Jose Cambronero, Microsoft
Abstract: In this tutorial, we will introduce participants to the automated repair of compiler errors. We will focus our efforts on a collection of C programs written by students in an introductory programming class. We will explore different neural approaches to fixing such compiler errors, including large pretrained language models and smaller fine-tuned models. By the end of this tutorial, participants will have practical experience with multiple repair approaches, pointers towards extensions/improvements of the approaches surveyed, and a foundation to explore automatically repairing such errors in their own research.
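Many repair pipelines, neural or not, begin by localizing the error from the compiler's output. The sketch below assumes diagnostics in the `file:line:column: error: message` form that GCC and Clang print, parses them, and extracts a few lines of surrounding source that could be passed to a repair model. The `Diagnostic` type, the helper names, and the sample `stderr` text are hypothetical illustrations, not part of the tutorial's materials.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches lines like "prog.c:3:12: error: expected ';' before 'return'".
DIAG_RE = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<kind>error|warning|note): (?P<msg>.*)$"
)

@dataclass
class Diagnostic:
    file: str
    line: int
    col: int
    kind: str
    msg: str

def parse_diagnostics(stderr: str) -> List[Diagnostic]:
    """Keep only file:line:col: kind: msg lines; skip excerpts and summaries."""
    diags = []
    for raw in stderr.splitlines():
        m = DIAG_RE.match(raw)
        if m:
            diags.append(Diagnostic(m["file"], int(m["line"]), int(m["col"]),
                                    m["kind"], m["msg"]))
    return diags

def error_context(source: str, diag: Diagnostic, window: int = 1) -> str:
    """Return the reported line plus `window` lines on each side, numbered from 1."""
    lines = source.splitlines()
    lo = max(0, diag.line - 1 - window)
    hi = min(len(lines), diag.line + window)
    return "\n".join(f"{i + 1}: {lines[i]}" for i in range(lo, hi))

# Hypothetical student program (missing semicolon) and compiler output.
source = "#include <stdio.h>\nint main(void) {\n  int x = 1\n  return x;\n}\n"
stderr = "prog.c: In function 'main':\nprog.c:3:12: error: expected ';' before 'return'\n"
errors = [d for d in parse_diagnostics(stderr) if d.kind == "error"]
print(error_context(source, errors[0]))
```

Passing only a window around the reported line keeps the model's input short. A reported location can differ from where the mistake actually is, which is one reason to include context on both sides.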
12-1pm
Lunch
1-4pm
Workshop 4: Generating code that activates our brains
Speaker: Shashank Srikant, MIT CSAIL
Abstract: In this tutorial, we will introduce how our brains respond when we comprehend code. We will then explore how a program can be automatically modified so that the modified program is predicted to elicit high responses in specific regions of our brains. The system we build to achieve this will introduce and use backpropagation and the Gumbel-softmax trick. We will hack through some popular code models available on Hugging Face to build this system.
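The Gumbel-softmax trick replaces a hard, non-differentiable choice among discrete options (such as which token to substitute in a program) with a relaxed, nearly one-hot vector: add Gumbel(0, 1) noise g_i to each logit, divide by a temperature τ, and apply a softmax, giving y_i = exp((logit_i + g_i)/τ) / Σ_j exp((logit_j + g_j)/τ). A minimal forward-pass sketch in plain Python follows. The function name and the pure-Python form are assumptions; in practice this would be written in an autodiff framework so that gradients can flow back to the logits, and how the tutorial wires it into specific Hugging Face code models is not described here.

```python
import math
import random
from typing import List, Optional

def gumbel_softmax(logits: List[float], tau: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + g) / tau), with g_i ~ Gumbel(0, 1)."""
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        # Inverse-CDF sampling: g = -log(-log(u)) for u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    peak = max(noisy)  # subtract the max before exponentiating for stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
print(gumbel_softmax([math.log(0.7), math.log(0.3)], tau=0.5, rng=rng))
```

Taking the argmax of the relaxed vector gives an exact sample from softmax(logits) (the Gumbel-max trick), while lowering τ pushes the vector closer to one-hot at the cost of higher-variance gradients.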
Requirements:
Organizers:
Omar Costilla-Reyes, MIT CSAIL
Jose Cambronero, Microsoft Research