Thoughts on the conference
My overall impression of AISTATS was that it's a really good conference if statistical ML is what you are interested in or working on. I learnt a fair bit about the basics of topics I didn't know and made a few friends who work in widely different areas. It's a single-track conference, so there's more interaction among people and more exposure to the oral sessions. There's barely any industry presence (either in sponsors or in papers) except for Google Research/Brain/DeepMind papers. And thankfully, the AISTATS community does not seem to be on the hype train (yet), and there's ample time to ask questions in the poster session.
I am not really a statistical ML person, and there were very few people who understood robotics/RL in general. People were definitely more interested in the theoretical analysis than in the major takeaways of the papers, and this was reflected in the posters, which had insane amounts of math in them (not that it's a bad thing, but it means newcomers to the field can't understand much about the work without asking an excessively large number of questions).
Interesting Papers
For each day, I will list the few papers that I liked, with a line or two about what stood out in them.
Day 1:
- Pathwise Derivatives for Multivariate Distributions: An interesting new take on the reparameterization trick, showing that it is actually solving a differential equation to get the gradient estimator. Given that insight, you can design a new estimator by minimizing the variance, and in simple cases you obtain a more accurate gradient estimate than the one from the reparameterization trick (the standard trick is sketched after this list).
- Exponential convergence rates for Batch Normalization: Another paper along the lines of the Madry paper explaining what Batch Normalization is really doing (and no, it's not tackling the internal covariate shift problem).
- Sequential Neural Likelihood: If you have real data and a simulator with parameters that need to be tuned, how do you get MLE estimates without knowing the likelihood function? Looks relevant to robotics problems; however, they use a proposal distribution over the data, which could be a bad idea if the data is high-dimensional (which they don't test on).
- Active Exploration in MDPs: I only caught small parts of the poster, but it is basically similar to what Sham and Hazan's new paper does: how do you explore an MDP efficiently if you don't know the reward function and dynamics? Sham took a MaxEnt approach; these authors take a different route.
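As a refresher on the baseline the pathwise-derivatives paper improves on, here is a minimal sketch of the standard reparameterization trick for a univariate Gaussian. The test function `f`, sample size, and seed are all made up for illustration; the paper's pathwise estimator for general multivariate distributions is more involved than this.

```python
import numpy as np

def f(x):
    # Hypothetical test function whose expectation we want to differentiate.
    return x ** 2

def reparam_grad(mu, sigma, n_samples=10_000, seed=0):
    """Monte Carlo gradient of E_{x ~ N(mu, sigma^2)}[f(x)] w.r.t. (mu, sigma),
    using the reparameterization x = mu + sigma * eps with eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)
    x = mu + sigma * eps
    # Pathwise (reparameterized) gradient: df/dx * dx/dtheta.
    h = 1e-5
    dfdx = (f(x + h) - f(x - h)) / (2 * h)  # numerical df/dx, exact for f(x) = x^2
    grad_mu = np.mean(dfdx)           # dx/dmu = 1
    grad_sigma = np.mean(dfdx * eps)  # dx/dsigma = eps
    return grad_mu, grad_sigma

# For f(x) = x^2, E[f] = mu^2 + sigma^2, so the exact gradients are (2*mu, 2*sigma).
print(reparam_grad(mu=1.0, sigma=0.5))  # roughly (2.0, 1.0)
```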
Day 2:
- Does data interpolation contradict statistical optimality: A Mikhail Belkin paper that had no one standing in front of the poster. Very math-y poster which I could barely understand. Need to read the paper, but the premise seems very interesting.
- Negative Momentum for improved game dynamics: Tackles the oscillating behavior of min-max games (like GANs). Using negative momentum for both players' steps gives you an extragradient-like step where the iterate takes an inner step closer towards the limit point. Interesting (a toy sketch follows this list).
- Distributional RL with linear function approximation: Had a long chat with Marc Bellemare about distributional RL and why it should work. He is super convinced by empirical results that it is a great idea; however, he admits there's no theory backing it up. This paper and a subsequent one show, in theory and in practice, that with linear function approximation you will do worse with distributional RL than with good old point-estimate RL. Marc claims deep nonlinear models are essential for the good performance (no theory yet).
- The termination critic: Paper from Doina Precup's group about the discovery of options. How do you force the agent to discover options on its own? You train a separate critic that tries to predict the termination of an option, and you train it by forcing any policy that reliably reaches a part of the state space to become an option. Need to read further.
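To make the negative-momentum idea concrete, here is a toy sketch on the bilinear game f(x, y) = x * y with alternating heavy-ball updates. The step size, momentum value, and iteration count are made up, and this is not the paper's exact algorithm or tuning; it just illustrates the qualitative behavior.

```python
import numpy as np

def alternating_gda(beta, eta=0.1, steps=1000):
    """Alternating gradient descent-ascent with heavy-ball momentum on the
    toy bilinear game f(x, y) = x * y (x minimizes, y maximizes).
    The equilibrium is (0, 0); beta < 0 is the 'negative momentum' setting."""
    x, y = 1.0, 1.0
    x_prev, y_prev = x, y
    for _ in range(steps):
        # x-step: grad_x f = y, plus a momentum term on the previous displacement.
        x_new = x + beta * (x - x_prev) - eta * y
        x_prev, x = x, x_new
        # y-step: grad_y f = x, using the freshly updated x (alternating updates).
        y_new = y + beta * (y - y_prev) + eta * x
        y_prev, y = y, y_new
    return np.hypot(x, y)  # distance to the equilibrium

# With zero momentum the iterates keep cycling around (0, 0) without converging;
# with negative momentum they spiral inwards.
print("beta =  0.0 :", alternating_gda(beta=0.0))
print("beta = -0.5 :", alternating_gda(beta=-0.5))
```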
Day 3:
- Derivative-free methods for policy optimization: Very similar to my own work, except they analyze the LQR setting and are more concerned with stabilizing controllers. It turns out the (non-convex) LQR objective satisfies a strong growth condition that makes it "almost" strongly convex, so zeroth-order methods converge really quickly to the global optimum with high probability (the basic zeroth-order estimator is sketched at the end of this list).
- Model-free LQ control via reduction to expert prediction: Yasin Abbasi and Csaba Szepesvari paper. Looked interesting, but it was a very math-y poster with no one standing beside it :( Will need to read it.
- Sample-efficient IL via GANs: GAIL with an actor-critic replacing TRPO. The way they deal with off-policy samples (whose rewards were computed several iterations ago) is very hack-y, but the empirical performance is very good (one to two orders of magnitude fewer environment interactions than GAIL).
- Distilling Policy Distillation: DeepMind paper. Apparently, there's a term called 'policy distillation' in the RL literature, which basically means doing RL with a few not-so-optimal expert demonstrations. This paper tries to give a common framework for understanding the numerous policy distillation approaches proposed in the past. Might be worth a read.
- Theoretical analysis of efficiency and robustness of softmax and gap-increasing operators in RL: Why value iteration fails in the presence of function approximation and why approaches like mellowmax and advantage learning help. The one poster I couldn't go to, since my own poster was that day. Need to read.
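For context on the derivative-free paper above, here is a minimal sketch of the generic two-point zeroth-order gradient estimator that such methods build on, applied to a made-up quadratic cost rather than an actual LQR rollout; the smoothing radius, number of directions, and step size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(theta):
    # Stand-in for the true objective (e.g. the LQR cost of a policy
    # parameterized by theta); here just a made-up smooth function.
    return np.sum((theta - 1.0) ** 2)

def zeroth_order_grad(f, theta, delta=1e-2, n_dirs=20):
    """Two-point random-direction gradient estimator:
    averages d/(2*delta) * (f(theta + delta*u) - f(theta - delta*u)) * u
    over directions u drawn uniformly from the unit sphere, so it approximates
    grad f(theta) using only function evaluations."""
    d = theta.size
    grad = np.zeros(d)
    for _ in range(n_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        grad += (f(theta + delta * u) - f(theta - delta * u)) / (2 * delta) * d * u
    return grad / n_dirs

# Plain zeroth-order gradient descent on the toy cost.
theta = np.zeros(3)
for _ in range(200):
    theta -= 0.05 * zeroth_order_grad(cost, theta)
print(theta)  # should approach the minimizer [1, 1, 1]
```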