My personal list of conferences, journals, and papers I use for my research and projects, with one-sentence summaries of the papers I have read.
Machine Learning Research Conferences and Journals
- ICLR
- IJCAI
- JAIR
- NIPS
- Journal of Machine Learning Research
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Artificial Intelligence
- Machine Learning
Deep Reinforcement Learning
- Playing Atari with Deep Reinforcement Learning
- Continuous Control with Deep Reinforcement Learning
- Deterministic Policy Gradient Algorithms
- Actor-Critic Methods
- Summary: an actor network selects actions (the student) while a critic network evaluates the actor's actions (the teacher); see the sketch after this list
- Progressive Neural Networks (reinforcement learning context)
- Summary: adding a new column of layers for each new task results in better transfer learning than partial or complete fine-tuning, which causes catastrophic forgetting
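A minimal sketch of the actor-critic idea referenced above, assuming a toy 1-D chain environment, tabular preferences and values, and illustrative step sizes (none of these details come from the papers listed):

```python
import numpy as np

# Actor-critic sketch with tabular function approximation on a toy 1-D chain.
# The environment, step sizes, and episode count are illustrative assumptions.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
gamma, alpha_actor, alpha_critic = 0.95, 0.05, 0.1

theta = np.zeros((n_states, n_actions))  # actor: action preferences per state
w = np.zeros(n_states)                   # critic: state-value estimates

def policy(s):
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs) / np.exp(prefs).sum()
    return p

def step(s, a):
    # toy dynamics: action 1 moves right, action 0 moves left; reward at the end
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, r, done

for episode in range(500):
    s, done = 0, False
    while not done:
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        s_next, r, done = step(s, a)

        # critic evaluates: the TD error is the "teacher" signal
        td_error = r + (0.0 if done else gamma * w[s_next]) - w[s]
        w[s] += alpha_critic * td_error

        # actor improves: policy-gradient step weighted by the critic's TD error
        grad_log_pi = -p
        grad_log_pi[a] += 1.0
        theta[s] += alpha_actor * td_error * grad_log_pi

        s = s_next

print("learned state values:", np.round(w, 2))
```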
Deep Convolutional Neural Networks
- Wide Residual Networks
- Summary: a variation of residual networks in which increasing width rather than depth yields better performance
- SqueezeNet
- Summary: AlexNet-level accuracy with 50x fewer parameters and a <0.5 MB model size; a sketch of the Fire module idea follows this list
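A minimal sketch of SqueezeNet's Fire module idea (a 1x1 "squeeze" convolution followed by concatenated 1x1 and 3x3 "expand" convolutions); the channel sizes here are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Squeeze to few channels, then expand with parallel 1x1 and 3x3 convs."""

    def __init__(self, in_channels, squeeze_channels, expand_channels):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_channels, expand_channels, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_channels, expand_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        # concatenate the two expand branches along the channel dimension
        return torch.cat(
            [self.relu(self.expand1x1(x)), self.relu(self.expand3x3(x))], dim=1
        )

# usage: a Fire module mapping 96 input channels to 128 output channels
fire = Fire(in_channels=96, squeeze_channels=16, expand_channels=64)
out = fire(torch.randn(1, 96, 55, 55))
print(out.shape)  # torch.Size([1, 128, 55, 55])
```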
Deep Neural Networks
- A shared neural ensemble links distinct contextual memories encoded close in time
- Summary: spatial memories acquired close together in time are associated with overlapping neuronal ensembles in the brain's hippocampus
- Memories linked within a window of time
- Summary: a theory called temporal context memory (TCM) explains why people have a better memory for words that occur close together in a list than for words that are further apart
- Learning Step Size Controllers for Robust Neural Network Training
- Summary: identifying informative state features, learning a step-size controller from them, and showing generalization to different tasks
- Weight Features for Predicting Future Model Performance of Deep Neural Networks
- Summary: using statistics of the weights instead of the raw weights to predict future model performance
- Compete to Compute
- Summary: using competing linear units to outperform non-competing nonlinear units and avoid catastrophic forgetting when training sets change over time
- HyperNetworks
- Summary: replacing a basic LSTM cell with a HyperLSTM cell, in which a small LSTM with few parameters generates the many parameters of a larger LSTM
- Non-Local Interaction via Diffusible Resource Prevents Coexistence of Cooperators and Cheaters in a Lattice Model
- Decoupled Neural Interfaces using Synthetic Gradients
- Summary: by modelling error gradients (synthetic gradients), we can decouple subgraphs and update them independently and asynchronously
- Distilling the Knowledge in a Neural Network
- Summary: by training on soft targets produced by the large network instead of hard targets, a much smaller network can achieve similar performance; a sketch of the distillation loss follows this list
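A minimal sketch of the distillation loss referenced above: the student is trained on the teacher's temperature-softened distribution combined with the usual hard-label loss. The logits, temperature, and mixing weight below are illustrative assumptions:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.7):
    soft_targets = softmax(teacher_logits, T)   # teacher's softened distribution
    soft_preds = softmax(student_logits, T)     # student's softened distribution
    soft_loss = -np.sum(soft_targets * np.log(soft_preds + 1e-12))
    hard_preds = softmax(student_logits, T=1.0)
    hard_loss = -np.log(hard_preds[hard_label] + 1e-12)
    # T**2 rescales the soft-target term so both losses stay comparable in scale
    return alpha * (T ** 2) * soft_loss + (1 - alpha) * hard_loss

teacher_logits = np.array([6.0, 2.0, 1.0])
student_logits = np.array([3.0, 1.5, 0.5])
print(distillation_loss(student_logits, teacher_logits, hard_label=0))
```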
Hyper-parameter Optimization
- Learning to learn by gradient descent by gradient descent
- Summary: learning an optimization algorithm that works on a class of optimization problems by parameterizing the optimizer
- Direct Feedback Alignment Provides Learning in Deep Neural Networks
- Summary: an alternative to error backpropagation that propagates the error through fixed random feedback connections directly from the output layer to each hidden layer; see the sketch after this list
- DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks
- Summary: approximating the training trajectory with a convex combination of the starting and ending weights so that reverse-mode hyperparameter gradients can be computed without storing the whole trajectory
- Gradient-based Hyperparameter Optimization through Reversible Learning
- Summary: tuning hyperparameters by computing gradients of the validation loss with respect to them, obtained by running stochastic gradient descent in reverse
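A minimal sketch of the direct feedback alignment idea from the entry above, using a tiny two-hidden-layer network: the output error is sent to each hidden layer through fixed random matrices instead of the transposed weights. The shapes, toy data, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h1, n_h2, n_out = 8, 16, 16, 4
lr = 0.1

W1 = rng.normal(0, 0.1, (n_in, n_h1))
W2 = rng.normal(0, 0.1, (n_h1, n_h2))
W3 = rng.normal(0, 0.1, (n_h2, n_out))
B1 = rng.normal(0, 0.1, (n_out, n_h1))   # fixed random feedback to layer 1
B2 = rng.normal(0, 0.1, (n_out, n_h2))   # fixed random feedback to layer 2

def tanh_deriv(h):
    return 1.0 - h ** 2

# toy data: random inputs, random one-hot targets
X = rng.normal(size=(64, n_in))
Y = np.eye(n_out)[rng.integers(0, n_out, size=64)]

for step in range(200):
    h1 = np.tanh(X @ W1)
    h2 = np.tanh(h1 @ W2)
    logits = h2 @ W3
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    e = probs - Y                       # output error (softmax + cross-entropy)
    d2 = (e @ B2) * tanh_deriv(h2)      # error sent directly to hidden layer 2
    d1 = (e @ B1) * tanh_deriv(h1)      # error sent directly to hidden layer 1

    W3 -= lr * h2.T @ e / len(X)
    W2 -= lr * h1.T @ d2 / len(X)
    W1 -= lr * X.T @ d1 / len(X)

loss = -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))
print("final training loss:", round(loss, 3))
```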
Deep Recurrent Neural Networks
- HyperNetworks
- Summary: using a small LSTM to generate a large LSTM for substantial model compression
- Exploring Sparsity in Recurrent Neural Networks
- Summary: pruning weights during the initial training of the network reduces model size by 90% and gives a 2× to 7× speed-up while maintaining accuracy; see the sketch after this list
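A minimal sketch of magnitude-based weight pruning during training, as referenced above: weights whose absolute value falls below a gradually rising threshold are zeroed and masked out of later updates. The toy regression task, threshold schedule, and learning rate are illustrative assumptions, not the paper's recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_steps, lr = 32, 2000, 0.05

# toy regression problem whose true weights are mostly zero (sparse target)
true_w = np.zeros(n_features)
true_w[:4] = rng.normal(size=4)
X = rng.normal(size=(256, n_features))
y = X @ true_w

w = rng.normal(0, 0.1, n_features)
mask = np.ones(n_features)

for step in range(n_steps):
    grad = 2 * X.T @ (X @ (w * mask) - y) / len(X)
    w -= lr * grad * mask              # pruned weights stay at zero

    # the threshold ramps up over training; prune small-magnitude weights
    threshold = 0.05 * min(1.0, step / (0.5 * n_steps))
    mask = np.where(np.abs(w) > threshold, mask, 0.0)
    w *= mask

sparsity = 1.0 - mask.mean()
print(f"sparsity: {sparsity:.0%}, remaining weights: {int(mask.sum())}")
```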