
Welcome to the Convergence wiki!

This project aims to empirically analyze the impact that the order of training samples has on learning a specific task.

Two main criteria have been analyzed:

* The loss of a sample

For example, we analyzed what happens when the next batch used for gradient descent contains (a selection sketch follows the list):

  * the samples with the highest loss (since the highest-loss samples carry little meaning right after initialization, a few epochs of normal training can be performed first as a warm-up)
  * the samples with the lowest loss
  * the samples with the lowest loss that are wrongly classified
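A minimal sketch of how such a batch could be selected is below. It is not the project's actual code; it assumes a PyTorch classifier `model`, tensors `inputs` and `labels` holding the candidate pool, and a hypothetical `batch_size`.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_batch(model, inputs, labels, batch_size, criterion="highest_loss"):
    """Pick the next batch from a candidate pool according to per-sample loss."""
    model.eval()
    logits = model(inputs)
    # per-sample losses (no averaging over the pool)
    losses = F.cross_entropy(logits, labels, reduction="none")
    preds = logits.argmax(dim=1)

    if criterion == "highest_loss":
        idx = losses.argsort(descending=True)[:batch_size]
    elif criterion == "lowest_loss":
        idx = losses.argsort()[:batch_size]
    elif criterion == "lowest_loss_misclassified":
        wrong = (preds != labels).nonzero(as_tuple=True)[0]
        idx = wrong[losses[wrong].argsort()[:batch_size]]
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return inputs[idx], labels[idx]
```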

* The trajectory a sample followed throughout training

After a few batches of training, you might ask the following question: which samples can be considered similar from a training perspective? One intuitive answer is to look at how their latent representations at some layer k changed during training; in other words, how the 'code' of a sample differs from its code when the network was just initialized, or from its code before the last few gradient descent steps. With this representation you can do various things: for example, make sure a batch covers the space of representations uniformly, or build a batch from only a single cluster of that space.
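A sketch of this trajectory idea, again hypothetical rather than the project's code: take the layer-k code of each sample at two training snapshots, measure how far it moved, and cluster samples whose codes moved similarly (here with scikit-learn's KMeans). A batch could then be drawn from a single cluster, or spread across clusters to cover the representation space.

```python
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def layer_codes(encoder, inputs):
    # `encoder` is assumed to be the sub-network up to layer k
    return encoder(inputs).flatten(start_dim=1)

@torch.no_grad()
def representation_shift(encoder_init, encoder_now, inputs):
    # displacement of each sample's code between two training snapshots
    # (e.g. at initialization vs. after the last few gradient descent steps)
    return layer_codes(encoder_now, inputs) - layer_codes(encoder_init, inputs)

def cluster_by_trajectory(shift, n_clusters=10):
    # group samples whose codes moved in similar directions during training
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(shift.cpu().numpy())
```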