Abstract
When training an energy-based model, it is important to set a good margin. However, it is almost impossible to learn a good margin with stochastic gradient descent (SGD), because the margin saturates to zero as the cost function is minimized. For that reason, the margin is usually set as a non-trainable scalar that linearly penalizes offending answers until they are pushed more than a certain distance apart from correct ones. Good performance depends on both the size of the margin and the dimensionality of the space into which the features are mapped. In this paper, we show that a larger margin does not always lead to better performance, and we affirm that a well-tuned margin can achieve better results.
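The fixed margin enters training through a hinge loss: the penalty vanishes only once the energy of the correct answer falls at least the margin below the energy of the offending answer. Below is a minimal sketch of such a loss, assuming PyTorch (the paper does not specify a framework); the names `margin_hinge_loss`, `e_pos`, and `e_neg` are illustrative, not taken from the paper.

```python
import torch

def margin_hinge_loss(e_pos: torch.Tensor, e_neg: torch.Tensor,
                      margin: float = 1.0) -> torch.Tensor:
    """Hinge loss for an energy-based model with a fixed scalar margin.

    Penalizes an example linearly whenever the energy of the correct
    answer (e_pos) is not at least `margin` below the energy of the
    offending answer (e_neg). The margin is deliberately a plain float
    rather than a trainable parameter: if it were optimized jointly
    with the model, SGD would shrink it toward zero to minimize the loss.
    """
    return torch.clamp(margin + e_pos - e_neg, min=0.0).mean()

# Illustrative usage with random energies for a batch of 32 examples.
e_pos = torch.randn(32)  # energies of correct answers
e_neg = torch.randn(32)  # energies of offending answers
loss = margin_hinge_loss(e_pos, e_neg, margin=1.0)
```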
| Original language | English |
|---|---|
| Title of host publication | Proceedings of 2017 International Conference on Industrial Design Engineering, ICIDE 2017 |
| Publisher | Association for Computing Machinery |
| Pages | 34-37 |
| Number of pages | 4 |
| ISBN (Electronic) | 9781450348669 |
| DOIs | |
| State | Published - 29 Dec 2017 |
| Event | 2017 International Conference on Industrial Design Engineering, ICIDE 2017 - Dubai, United Arab Emirates (Duration: 28 Dec 2017 → 31 Dec 2017) |
Publication series
| Name | ACM International Conference Proceeding Series |
|---|---|
Conference
| Conference | 2017 International Conference on Industrial Design Engineering, ICIDE 2017 |
|---|---|
| Country/Territory | United Arab Emirates |
| City | Dubai |
| Period | 28/12/17 → 31/12/17 |
Bibliographical note
Publisher Copyright: © 2017 Association for Computing Machinery.
Keywords
- Energy based models
- Hinge loss
- Margin
- Negative log-likelihood loss
- Triplet loss