Home
This project aims to empirically analyze the impact that the order of training samples has on learning a specific task.
For example, we analyzed what happens when the next batch used for gradient descent contains (see the sketch after this list):
- the samples with the highest loss (since the highest-loss samples are not meaningful right after initialization, a few epochs of normal training can be performed as a warm-up)
- the samples with the lowest loss
- the wrongly classified samples with the lowest loss
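
A minimal PyTorch sketch of these loss-ordered batch strategies. The model, inputs, and strategy names are placeholders for illustration, not the project's actual code:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_batch(model, inputs, targets, batch_size, strategy="highest_loss"):
    """Pick the indices of the next training batch according to a strategy."""
    logits = model(inputs)
    # Per-sample losses: reduction="none" keeps one loss value per example.
    losses = F.cross_entropy(logits, targets, reduction="none")
    if strategy == "highest_loss":
        order = torch.argsort(losses, descending=True)
    elif strategy == "lowest_loss":
        order = torch.argsort(losses)
    elif strategy == "lowest_loss_misclassified":
        # Restrict to wrongly classified samples, then sort those by loss.
        wrong = (logits.argmax(dim=1) != targets).nonzero(as_tuple=True)[0]
        order = wrong[torch.argsort(losses[wrong])]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return order[:batch_size]
```

In practice the selection would run over a held pool of candidate samples (or the whole training set) before each gradient step, and the returned indices would be used to assemble the next batch.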
After a few batches of training, you might ask: which samples can be considered similar from a training perspective? One intuitive answer is to look at how their latent representations at layer k changed during training, i.e., how the 'code' of a sample differs from its code when the network was just initialized, or from its code before the last few gradient descent steps. With this representation you can do various things: for example, you can make sure that a batch covers the space of representations uniformly, or build a batch from only a single cluster of that space, as sketched below.
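
A hedged sketch of this representation-drift idea: compare each sample's layer-k code between two training snapshots, cluster the drift vectors, then build a batch either uniformly across clusters or from a single cluster. `encode_to_layer_k` is a hypothetical helper standing in for whatever extracts the layer-k activations; everything here is illustrative.

```python
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def representation_drift(model_init, model_now, inputs, encode_to_layer_k):
    """Drift of each sample's layer-k code between two training snapshots."""
    code_before = encode_to_layer_k(model_init, inputs)
    code_after = encode_to_layer_k(model_now, inputs)
    return (code_after - code_before).flatten(start_dim=1)

def cluster_batch(drift, batch_size, n_clusters=8, single_cluster=None):
    """Build a batch covering all clusters uniformly, or from one cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        drift.cpu().numpy()
    )
    labels = torch.from_numpy(labels)
    if single_cluster is not None:
        # Batch drawn from a single cluster of the drift space.
        pool = (labels == single_cluster).nonzero(as_tuple=True)[0]
        return pool[torch.randperm(len(pool))[:batch_size]]
    # Otherwise, round-robin over clusters for uniform coverage.
    per_cluster = max(1, batch_size // n_clusters)
    batch = []
    for c in range(n_clusters):
        pool = (labels == c).nonzero(as_tuple=True)[0]
        batch.append(pool[torch.randperm(len(pool))[:per_cluster]])
    return torch.cat(batch)[:batch_size]
```

The same machinery works if the reference snapshot is the model from a few gradient steps ago rather than the freshly initialized network; only the first argument to `representation_drift` changes.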