Questions on Dense Interspecies Face Embedding (DIFE):
Q: Is the "Source Image" shown in the paper's experiment results the synthesized image or the original input image from the dataset?
A: The Source Image is the original input image from the dataset, because labeled data is needed.
[Answer from the author]
Q: How does the method in the paper perform inference?
A: The images are preprocessed and fed into the model, which outputs their corresponding DIFE.
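The inference step described above can be sketched as a single forward pass through a dense encoder. This is a minimal illustration, not the paper's actual architecture: the encoder layers, embedding dimension, and input size here are all assumptions.

```python
import torch
import torch.nn as nn

class DIFEEncoder(nn.Module):
    """Hypothetical dense encoder: maps an image to one embedding
    vector per pixel. Layer choices are illustrative only."""
    def __init__(self, embed_dim=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, embed_dim, 1),
        )

    def forward(self, image):
        # image: (B, 3, H, W) -> per-pixel embedding map (B, D, H, W)
        return self.backbone(image)

# Inference: preprocess the image (crop/resize/normalize), then one forward pass.
encoder = DIFEEncoder(embed_dim=16).eval()
with torch.no_grad():
    img = torch.rand(1, 3, 64, 64)   # stand-in for a preprocessed face crop
    dife = encoder(img)              # dense embedding for every pixel
print(dife.shape)                    # torch.Size([1, 16, 64, 64])
```

The key point is that the output preserves the spatial resolution of the input, so each pixel of the face gets its own embedding vector.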
Q: Does DIFE learn StyleGAN2's feature information by reducing the distance between the domain-specific embedding and the feature map that StyleGAN2 extracts from the original image?
A: That process only plays an indirect role. The DIFE encoder is trained both by its distance to CSE and by semantic matching to pseudo-paired images.
[Answer from the author]
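The two training signals in the answer above (distance to CSE and semantic matching to pseudo-paired images) could be combined roughly as follows. Both loss functions here are simplified stand-ins under assumed shapes, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def cse_distance_loss(dife, cse):
    """Pull DIFE toward the CSE embedding at each pixel (simplified:
    plain MSE over (B, D, H, W) maps)."""
    return F.mse_loss(dife, cse)

def semantic_matching_loss(src_emb, tgt_emb, tau=0.1):
    """Match each source pixel to a softly weighted region of the
    pseudo-paired image (simplified attention-style matching)."""
    s = F.normalize(src_emb.flatten(2), dim=1)   # (B, D, HW), unit vectors
    t = F.normalize(tgt_emb.flatten(2), dim=1)
    attn = torch.softmax(s.transpose(1, 2) @ t / tau, dim=-1)  # (B, HW, HW)
    matched = attn @ t.transpose(1, 2)           # weighted target embeddings
    # 1 - cosine similarity between each source pixel and its matched region
    return (1 - (s.transpose(1, 2) * matched).sum(-1)).mean()

dife = torch.rand(2, 8, 16, 16)
total = cse_distance_loss(dife, torch.rand_like(dife)) \
      + semantic_matching_loss(dife, torch.rand(2, 8, 16, 16))
```

The CSE term anchors the embedding space to an existing human embedding, while the matching term transfers it across species via pseudo-pairs; StyleGAN2 contributes only indirectly, through the generation of those pseudo-paired images.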
Q: In the semantic matching loss, is the pixel embedding of the original image matched with a weighted small-region embedding of the pseudo-paired image?
A: Yes. A pixel embedding means that for each pixel in the input image, a vector is generated at the corresponding position to encode that pixel's local feature information. A feature embedding, by contrast, encodes a set of high-dimensional features into a low-dimensional vector to capture global feature information, rather than the relationship between features and pixels in the feature map.
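To make the "weighted small-region embedding" concrete, here is a toy worked example for a single source pixel: compute its similarity to each pixel in a region of the pseudo-paired image, soften those similarities into weights, and form the weighted region embedding. All numbers and dimensions are illustrative only.

```python
import numpy as np

# One source pixel's embedding and a 4-pixel region of the pseudo-paired
# image (toy 2-D embeddings for readability).
src_pixel = np.array([1.0, 0.0])
region = np.array([[0.9, 0.1],
                   [0.0, 1.0],
                   [0.8, 0.2],
                   [-1.0, 0.0]])

# Cosine similarity of the source pixel to each pixel in the region ...
sims = region @ src_pixel / (
    np.linalg.norm(region, axis=1) * np.linalg.norm(src_pixel))

# ... softened into weights (temperature 0.1), then a weighted
# combination of the region's embeddings.
weights = np.exp(sims / 0.1)
weights /= weights.sum()
weighted_region_emb = weights @ region

# A matching loss would pull src_pixel toward weighted_region_emb,
# so similar-looking region pixels dominate the match.
```

Because the weights form a soft distribution rather than a hard argmax, the match tolerates small misalignments between the original image and its pseudo-pair.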
Think further! We may need prompt-based knowledge to help adaptively estimate animal poses, i.e., predict complex animals with dense landmarks and simple animals with sparse landmarks. Since we never know the complexity of a given animal in advance, prompt-based knowledge could supply it.