My personal list of conferences, journals, and papers I use for my research and projects, with one-sentence summaries of the papers I have read.
Machine Learning Research Conferences and Journals
- ICLR
- IJCAI
- JAIR
- NIPS
- Journal of Machine Learning Research
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Artificial Intelligence
- Machine Learning
Deep Reinforcement Learning
- Playing Atari with Deep Reinforcement Learning
- Continuous Control with Deep Reinforcement Learning
- Deterministic Policy Gradient Algorithms
- Actor-Critic Methods
- Summary: an actor network selects actions (the student) while a critic network evaluates the actor's actions (the teacher); see the sketch after this list
- Progressive Neural Networks (reinforcement learning context)
- Summary: adding a new column of layers for each new task results in better transfer learning than partial or complete fine-tuning, which causes catastrophic forgetting
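A minimal sketch of the actor-critic idea referenced above, assuming a toy 1-D chain environment, tabular preferences and values, and illustrative step sizes (none of these details come from the papers listed):

```python
import numpy as np

# Actor-critic sketch with tabular function approximation on a toy 1-D chain.
# The environment, step sizes, and episode count are illustrative assumptions.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
gamma, alpha_actor, alpha_critic = 0.95, 0.05, 0.1

theta = np.zeros((n_states, n_actions))  # actor: action preferences per state
w = np.zeros(n_states)                   # critic: state-value estimates

def policy(s):
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs) / np.exp(prefs).sum()
    return p

def step(s, a):
    # toy dynamics: action 1 moves right, action 0 moves left; reward at the end
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, r, done

for episode in range(500):
    s, done = 0, False
    while not done:
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        s_next, r, done = step(s, a)

        # critic evaluates: the TD error is the "teacher" signal
        td_error = r + (0.0 if done else gamma * w[s_next]) - w[s]
        w[s] += alpha_critic * td_error

        # actor improves: policy-gradient step weighted by the critic's TD error
        grad_log_pi = -p
        grad_log_pi[a] += 1.0
        theta[s] += alpha_actor * td_error * grad_log_pi

        s = s_next

print("learned state values:", np.round(w, 2))
```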
Deep Convolutional Neural Networks
- Wide Residual Networks
- Summary: a variation of residual networks in which increasing width rather than depth yields better performance
- SqueezeNet
- Summary: AlexNet-level accuracy with 50x fewer parameters and a <0.5 MB model size; a sketch of the Fire module idea follows this list
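A minimal sketch of SqueezeNet's Fire module idea (a 1x1 "squeeze" convolution followed by concatenated 1x1 and 3x3 "expand" convolutions); the channel sizes here are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Squeeze to few channels, then expand with parallel 1x1 and 3x3 convs."""

    def __init__(self, in_channels, squeeze_channels, expand_channels):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_channels, expand_channels, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_channels, expand_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        # concatenate the two expand branches along the channel dimension
        return torch.cat(
            [self.relu(self.expand1x1(x)), self.relu(self.expand3x3(x))], dim=1
        )

# usage: a Fire module mapping 96 input channels to 128 output channels
fire = Fire(in_channels=96, squeeze_channels=16, expand_channels=64)
out = fire(torch.randn(1, 96, 55, 55))
print(out.shape)  # torch.Size([1, 128, 55, 55])
```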
Deep Neural Networks
- A shared neural ensemble links distinct contextual memories encoded close in time
- Summary: spatial memories acquired close together in time are associated with overlapping neuronal ensembles in the brain's hippocampus
- Memories linked within a window of time
- Summary: a theory called temporal context memory (TCM) explains why people have a better memory for words that occur close together in a list than for words that are further apart
- Learning Step Size Controllers for Robust Neural Network Training
- Summary: identifying informative state features, learning a step-size controller from them, and showing generalization to different tasks
- Weight Features for Predicting Future Model Performance of Deep Neural Networks
- Summary: using statistics of the weights instead of the raw weights to predict future model performance
- Compete to Compute
- Summary: using competing linear units to outperform non-competing nonlinear units and avoid catastrophic forgetting when training sets change over time
- HyperNetworks
- Summary: replacing a basic LSTM cell with a HyperLSTM cell, in which a small LSTM with few parameters generates the many parameters of a larger LSTM
- Non-Local Interaction via Diffusible Resource Prevents Coexistence of Cooperators and Cheaters in a Lattice Model
- Decoupled Neural Interfaces using Synthetic Gradients
- Summary: by modelling error gradients (synthetic gradients), we can decouple subgraphs and update them independently and asynchronously
- Distilling the Knowledge in a Neural Network
- Summary: by training on soft targets produced by the large network instead of hard targets, a much smaller network can achieve similar performance; a sketch of the distillation loss follows this list
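A minimal sketch of the distillation loss referenced above: the student is trained on the teacher's temperature-softened distribution combined with the usual hard-label loss. The logits, temperature, and mixing weight below are illustrative assumptions:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.7):
    soft_targets = softmax(teacher_logits, T)   # teacher's softened distribution
    soft_preds = softmax(student_logits, T)     # student's softened distribution
    soft_loss = -np.sum(soft_targets * np.log(soft_preds + 1e-12))
    hard_preds = softmax(student_logits, T=1.0)
    hard_loss = -np.log(hard_preds[hard_label] + 1e-12)
    # T**2 rescales the soft-target term so both losses stay comparable in scale
    return alpha * (T ** 2) * soft_loss + (1 - alpha) * hard_loss

teacher_logits = np.array([6.0, 2.0, 1.0])
student_logits = np.array([3.0, 1.5, 0.5])
print(distillation_loss(student_logits, teacher_logits, hard_label=0))
```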
Hyper-parameter Optimization
- Learning to learn by gradient descent by gradient descent
- Summary: learning an optimization algorithm that works on a class of optimization problems by parameterizing the optimizer
- Direct Feedback Alignment Provides Learning in Deep Neural Networks
- Summary: an alternative to error backpropagation that propagates the error through fixed random feedback connections directly from the output layer to each hidden layer; see the sketch after this list
- DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks
- Summary: approximating the training trajectory with a convex combination of the starting and ending weights so that reverse-mode hyperparameter gradients can be computed without storing the whole trajectory
- Gradient-based Hyperparameter Optimization through Reversible Learning
- Summary: tuning hyperparameters by computing gradients of the validation loss with respect to them, obtained by running stochastic gradient descent in reverse
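A minimal sketch of the direct feedback alignment idea from the entry above, using a tiny two-hidden-layer network: the output error is sent to each hidden layer through fixed random matrices instead of the transposed weights. The shapes, toy data, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h1, n_h2, n_out = 8, 16, 16, 4
lr = 0.1

W1 = rng.normal(0, 0.1, (n_in, n_h1))
W2 = rng.normal(0, 0.1, (n_h1, n_h2))
W3 = rng.normal(0, 0.1, (n_h2, n_out))
B1 = rng.normal(0, 0.1, (n_out, n_h1))   # fixed random feedback to layer 1
B2 = rng.normal(0, 0.1, (n_out, n_h2))   # fixed random feedback to layer 2

def tanh_deriv(h):
    return 1.0 - h ** 2

# toy data: random inputs, random one-hot targets
X = rng.normal(size=(64, n_in))
Y = np.eye(n_out)[rng.integers(0, n_out, size=64)]

for step in range(200):
    h1 = np.tanh(X @ W1)
    h2 = np.tanh(h1 @ W2)
    logits = h2 @ W3
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    e = probs - Y                       # output error (softmax + cross-entropy)
    d2 = (e @ B2) * tanh_deriv(h2)      # error sent directly to hidden layer 2
    d1 = (e @ B1) * tanh_deriv(h1)      # error sent directly to hidden layer 1

    W3 -= lr * h2.T @ e / len(X)
    W2 -= lr * h1.T @ d2 / len(X)
    W1 -= lr * X.T @ d1 / len(X)

loss = -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))
print("final training loss:", round(loss, 3))
```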
Deep Recurrent Neural Networks
- HyperNetworks
- Summary: using a small LSTM to generate a large LSTM for substantial model compression
- Exploring Sparsity in Recurrent Neural Networks
- Summary: pruning weights during the initial training of the network reduces model size by 90% and gives a 2× to 7× speed-up while maintaining accuracy; see the sketch after this list
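A minimal sketch of magnitude-based weight pruning during training, as referenced above: weights whose absolute value falls below a gradually rising threshold are zeroed and masked out of later updates. The toy regression task, threshold schedule, and learning rate are illustrative assumptions, not the paper's recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_steps, lr = 32, 2000, 0.05

# toy regression problem whose true weights are mostly zero (sparse target)
true_w = np.zeros(n_features)
true_w[:4] = rng.normal(size=4)
X = rng.normal(size=(256, n_features))
y = X @ true_w

w = rng.normal(0, 0.1, n_features)
mask = np.ones(n_features)

for step in range(n_steps):
    grad = 2 * X.T @ (X @ (w * mask) - y) / len(X)
    w -= lr * grad * mask              # pruned weights stay at zero

    # the threshold ramps up over training; prune small-magnitude weights
    threshold = 0.05 * min(1.0, step / (0.5 * n_steps))
    mask = np.where(np.abs(w) > threshold, mask, 0.0)
    w *= mask

sparsity = 1.0 - mask.mean()
print(f"sparsity: {sparsity:.0%}, remaining weights: {int(mask.sum())}")
```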