BEGIN:VCALENDAR
VERSION:2.0
PRODID:icalendar-ruby
CALSCALE:GREGORIAN
X-WR-CALNAME:MIT AI for Code and Science workshop - day 1
X-WR-TIMEZONE:Eastern Time (US & Canada)
BEGIN:VEVENT
DTSTAMP:20260520T160029Z
UID:tag:localist.com\,2008:EventInstance_42042296578655
DTSTART;VALUE=DATE:20230202
DESCRIPTION:AI for code and science presents a very active area of current 
 research that bridges multiple areas\, including machine learning\, progra
 mming languages\, and software engineering. A substantial portion of recen
 t work in this domain has found success by blending symbolic and neural te
 chniques. \n\nIn this 2-day tutorial session\, students will get practical
  experience in artificial intelligence tools to develop code targeted for 
 science applications. Students will learn to combine neural network method
 s and program synthesis (symbolic) and how to apply these techniques in sc
 ience. Students wishing to take part should have some programming experien
 ce. \n\nAs part of the session\, we will provide a practical overview of s
 ystems/tools to get started in interesting AI for code and science applica
 tions. Students will also hear talks from researchers in the space. Hands-
 on activities may include using popular transformer-based models such as C
 odeBERT\, CodeT5\, and Codex. We will touch on recent ideas that can be a
 pplied to improve/adapt each of these models to computer science problems 
 such as program repair. This tutorial will be hands-on\, with time for par
 ticipants to play and experiment with working code\, try to solve real ben
 chmark cases and get feedback on ideas they may want to pursue in the futu
 re.\n\nSeats for this activity are limited. Please express your interest b
 y filling out this form: https://forms.gle/3koDexDjTpDmWcVp6\n\nDay 1\n
 \n 9am-12pm\n\nWorkshop 1: Neurosymbolic programming for Science with appl
 ications for molecular generation in chemistry\n\nSpeakers: Omar Costilla-
 Reyes and Minghao Guo\, MIT CSAIL\n\nAbstract: Neurosymbolic Programming (
 NP) techniques have the potential to accelerate scientific discovery. Thes
 e models combine neural and symbolic components to learn complex patterns 
 and representations from data\, using high-level concepts or known constra
 ints. NP techniques can interface with symbolic domain knowledge from scie
 ntists\, such as prior knowledge and experimental context\, to produce int
 erpretable outputs. In this hands-on tutorial\, we explore applic
 ations of neurosymbolic programming in health and biology. \n\nThe prob
 lem of molecu
 lar generation has received significant attention recently. Existing metho
 ds are typically based on deep neural networks and require training on lar
 ge datasets with tens of thousands of samples. In practice\, however\, the
  size of class-specific chemical datasets is usually limited (e.g.\, dozen
 s of samples) due to labor-intensive experimentation and data collection. 
 Another major challenge is to generate only physically synthesizable molec
 ules. This is a non-trivial task for neural network-based generative model
 s since the relevant chemical knowledge can only be extracted and generali
 zed from the limited training data. In this tutorial\, we explore a data-e
 fficient neurosymbolic generative model that can be learned from datasets 
 with orders of magnitude smaller sizes than common benchmarks. At the hear
 t of this method is a learnable graph grammar that generates molecules fro
 m a sequence of production rules. Additional chemical knowledge can be inc
 orporated in the model by further grammar optimization.\n\n12-1pm\n\n Lunc
 h \n\n1-4pm\n\nWorkshop 2: An Introduction to Symbolic Regression with PyS
 R and SymbolicRegression.jl\n\nSpeaker: Miles Cranmer\, Princeton\n\nAbstr
 act: PySR (https://github.com/MilesCranmer/PySR) is an open-source library
  for practical symbolic regression\, a type of machine learning that disco
 vers human-interpretable symbolic models in the form of simple mathematica
 l expressions. PySR is built on a high-performance distributed backend\, S
 ymbolicRegression.jl\, which offers a flexible search algorithm\, and inte
 rfaces with several deep learning packages. In this tutorial I will descri
 be the nuts and bolts of the search algorithm and how PySR may be used in 
 machine learning and scientific workflows. I will review existing applicat
 ions of the software to science (https://astroautomata.com/PySR/papers/)\,
  and then present an interactive coding tutorial where we will go through 
 several example symbolic regression problems with different levels of cust
 omization. Following this\, we will look at using PySR as a distillation t
 ool for translating deep neural networks into an interpretable scientific 
 language\, and go through additional examples.\n\n \n\nDay 2\n\n 9am-12pm\
 n\nWorkshop 3: Learning to automatically fix compiler errors in C\n\nSpeak
 er: Jose Cambronero\, Microsoft\n\n \n\nAbstract: In this tutorial\, we wi
 ll introduce participants to the automated repair of compiler errors. We w
 ill focus our efforts on a collection of C programs written by students in
  an introductory programming class. We will explore different neural appro
 aches to fixing such compiler errors\, including large pretrained language
  models and smaller fine-tuned models. By the end of this tutorial\, parti
 cipants will have practical experience with multiple repair approaches\, p
 ointers towards extensions/improvements of the approaches surveyed\, and a
  foundation to explore automatically repairing such errors in their own re
 search.\n\n 12-1pm\n\nLunch \n\n1-4pm\n\nWorkshop 4: Generating code that 
 activates our brains\n\nSpeaker: Shashank Srikant\, MIT CSAIL\n \n\nAbstra
 ct
 : In this tutorial\, we will introduce how our brains respond to code comp
 rehension. Further\, we will explore how a program can be automatically mo
 dified\, such that the modified program predicts high responses in specifi
 c regions of our brains. The system we will build to achieve this will int
 roduce and utilize backpropagation and the Gumbel softmax trick. We will h
 ack through some popular code models available on Huggingface to build thi
 s system. \n\n Requirements:\n\nLaptop\n\nSome programming experi
 ence \n\nOrga
 nizers:\n\nOmar Costilla-Reyes\, MIT CSAIL\n\nJose Cambronero\, Microsoft 
 Research
GEO:42.361965;-71.090261
LOCATION:Building 32\, tba
SUMMARY:MIT AI for Code and Science workshop - day 1
URL;VALUE=URI:https://calendar.mit.edu/event/mit_ai_for_code_and_science_wo
 rkshop
END:VEVENT
END:VCALENDAR
