Zelin Report

The report will be updated on Wednesdays and Saturdays.

2.11 Sat

The reviews all agreed on the value of studying attribute bias; the concerns focus on the novelty and feasibility of the proposed method.

R1.1:

The reviewer cannot identify the main difference between our method and existing attribute-enhanced methods.

R1.2 & R3.2b:

The reviewer doubts the applicability of our method due to its dependence on ground-truth attribute labels.

R1.3:

The reviewer suggests exploring the influence of different combinations of training distribution on attribute bias, which has been done in our IJCAI version paper. :heavy_check_mark:

For R1.1 and R1.2, the main concern can be summarized as:

Although the attribute bias problem has not been explored, the idea of using attribute information to super-resolve images has been studied. These attribute-enhanced methods suffer from the need for ground-truth attribute labels, which reduces their practical value.

To justify our superiority to attribute-enhanced methods and answer R1.2, we should

  1. Examine our method design and try to reduce its reliance on given attributes. So there are two application scenarios that we should emphasize in the new version:

1). The attributes are not given.

The model can slightly alleviate the attribute bias problem.

2). The attributes are given.

The model can largely alleviate the attribute bias problem.

  2. Emphasize that the attribute-enhanced methods also suffer from low restoration realism and limited resolution, which we address through the application of a GAN prior.

R2.1 & R3.3: Reviewer 2 thinks the advantage of CLIP over ArcFace is not well proven, while Reviewer 3 thinks ArcFace is not a good baseline method. The advantage of CLIP is shown by a t-SNE visualization, which indicates that CLIP embeddings are more discriminative at the attribute level than ArcFace embeddings, and we do obtain higher classification accuracy with CLIP embeddings. But we must acknowledge that it is not necessary to prove CLIP is better than ArcFace: their training tasks differ, so it is natural that CLIP is more sensitive to attributes, as it learned to match images with descriptions (most of which are attribute descriptions), while ArcFace learned to recognize identity while ignoring changes in age or gender.

We will replace the CLIP experiments with a simple justification: its advantage comes from its training task.

R2.2a: The reviewer questions the value of face images in person identification; however, we focus only on face recognition. Face restoration can be used to verify face recognition results: humans cannot match a degraded face with its counterpart, so we may not be confident in the conclusions of face recognition models. In this situation, the degraded face can be restored and compared with the gallery image, which increases confidence in the recognition results.

R2.2b: The reviewer questions the influence of restoration on face recognition. However, as mentioned above, the restoration result is not fed into the face recognition model; it is only used to improve confidence in the result.

R2.2c: The reviewer wonders about the restoration result when there is a large difference between the input attribute and the actual attribute. The restoration mixes information from the degraded image and the input attribute; as a result, it contains characteristics of both.

We will add a demonstration in Sec (we have done it in the IJCAI version, but it should be improved by adding more explanation of how image and attribute information are mixed).

R2.3: The doubts about comparison fairness come from the data preprocessing strategy: the reviewer thinks VQFR and GFPGAN did not learn how to do 8x super-resolution, so it is not fair to compare them on 16x and 32x super-resolution tasks.

However, for the blind face restoration task there is no fixed requirement on the data preprocessing strategy, and different methods use different strategies. We selected the 16x and 32x super-resolution tasks because the attribute bias problem is obvious at these resolutions. GPEN learned super-resolution from 1x to 200x, yet it still suffers from the attribute bias problem.

We will fine-tune the official models with images processed by our strategy to ensure fairness.

R3.1: The difference between the COX and SCFace distributions affects the confidence of the conclusion, which has been addressed in our IJCAI version paper. :heavy_check_mark:

R3.2a: The reviewer thinks a two-stage work (SR+Edit) makes more sense.

We should clarify the difference between our work and such two-stage approaches in the new version of the paper.

R3.2c: Since the age of a single face image is ambiguous, the reviewer thinks it is hard to choose the attribute input. However, human interaction can be introduced to choose the correct attribute. (Fine-grained control of age is introduced in our IJCAI version paper; we can also further explain the necessity of fine-grained control.)

R3.4: The image quality is sacrificed for altering face attributes.

To be finished

A good way to organize the reviewer’s comments is like this:

[Reviewer 1] C1: one sentence summarizing the reviewer’s first comment, if Q1 and Q2 can be addressed together. (Q1 and Q2 are from the original list provided by the reviewer.)

For example,

[Reviewer 1] C1: Difference from the existing attribute-enhanced FR methods (Q1.1)

Our answer, reminder: we could include this part in the related work.

C2: Compare with more attribute-enhanced FR methods rather than AACNN (Q1.2)

Answer why AACNN is sufficient and check if other methods are outdated. Include necessary methods if there are any.

Use [Done], [Unfinished, expect to be done: xxx], [Todo] to mark the progress.

Please re-organize the point-to-point response asap.

[Reviewer 1] C1: Difference from the existing attribute-enhanced FR methods(Q1.1)

The problem can be rephrased as: although the attribute bias problem has not been explored, the idea of using attribute information to super-resolve images has been studied. Actually, there are many differences between existing attribute-enhanced FR methods and ours: 1). Existing attribute-enhanced FR methods did not involve a GAN prior or other reference prior, which results in less realistic restorations. 2). They focused on a fixed-resolution super-resolution task (e.g., 16x, 32x), which reduces their generalization to real-world scenarios. Reminder: we can include this part in the related work. [Todo]

[Reviewer 1] C2: Reliance on ground-truth attributes reduces the application value (Q1.2, Q3.2b)

There are two application scenarios that we should emphasize:

1). The attributes are not given.

The model can slightly alleviate the attribute bias problem.

2). The attributes are given.

The model can largely alleviate the attribute bias problem. Reminder: This design should be explored. [Todo]

[Reviewer 1] C3: The reviewer suggests exploring the influence of different combinations of training distribution on attribute bias (Q1.3), which has been done in our IJCAI version paper. Reminder: [Done]

[Reviewer 2] C1: The advantage of CLIP over ArcFace is not well proven, and ArcFace is not a good baseline method (Q2.1, Q3.3).

The advantage of CLIP is shown by a t-SNE visualization, which indicates that CLIP embeddings are more discriminative at the attribute level than ArcFace embeddings, and we do obtain higher classification accuracy with CLIP embeddings. But we must acknowledge that it is not necessary to prove CLIP is better than ArcFace: their training tasks differ, so it is natural that CLIP is more sensitive to attributes, as it learned to match images with descriptions (most of which are attribute descriptions), while ArcFace learned to recognize identity while ignoring changes in age or gender. Reminder: this part should be replaced with a simple justification: its advantage comes from its training task. [Todo]
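The attribute-level separability claim can be sanity-checked without t-SNE via a simple probe on the embeddings. A minimal sketch, with synthetic stand-ins for the real CLIP/ArcFace features (`attribute_probe_accuracy` is a hypothetical helper, not part of our codebase):

```python
import numpy as np

def attribute_probe_accuracy(embeddings, labels):
    """Nearest-centroid probe: how separable are attribute classes in a
    given embedding space? Real CLIP/ArcFace features would be plugged in."""
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=-1)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == labels).mean())

# Toy stand-in: one space where the attribute strongly shifts the features
# (CLIP-like) and one where it barely does (ArcFace-like, identity-focused).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
clip_like = rng.normal(size=(200, 64)) + 3.0 * labels[:, None]     # attribute-sensitive
arcface_like = rng.normal(size=(200, 64)) + 0.1 * labels[:, None]  # attribute-blind

acc_clip = attribute_probe_accuracy(clip_like, labels)
acc_arcface = attribute_probe_accuracy(arcface_like, labels)
```

If the simple-justification route is taken, a one-number probe like this is also cheaper to report than a full t-SNE figure.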

[Reviewer 2] C2: The value of face images in person identification. (Q2.2a)

We focus only on face recognition. Face restoration can be used to verify face recognition results: humans cannot match a degraded face with its counterpart, so we may not be confident in the conclusions of face recognition models. In this situation, the degraded face can be restored and compared with the gallery image, which increases confidence in the recognition results.

[Reviewer 2] C3: The influence of restoration on face recognition. (Q2.2b)

As mentioned above, the restoration result is not fed into the face recognition model; it is only used to improve confidence in the result.

[Reviewer 2] C4: The restoration result when there is a big difference between the input attribute and the actual attribute. (Q2.2c)

The restoration mixes information from the degraded image and the input attribute; as a result, it contains characteristics of both.

We will add a demonstration in Sec… Reminder: we have done it in the IJCAI version [Done], but it should be improved by adding more explanation of how image and attribute information are mixed [Todo].

[Reviewer 2] C5: Unfair comparisons.

The doubts about comparison fairness come from the data preprocessing strategy: the reviewer thinks VQFR and GFPGAN did not learn how to do 8x super-resolution, so it is not fair to compare them on 16x and 32x super-resolution tasks.

However, for the blind face restoration task there is no fixed requirement on the data preprocessing strategy, and different methods use different strategies. We selected the 16x and 32x super-resolution tasks because the attribute bias problem is obvious at these resolutions. GPEN learned super-resolution from 1x to 200x, yet it still suffers from the attribute bias problem. Reminder: we will fine-tune the official models with images processed by our strategy to ensure fairness [Todo].

[Reviewer 3] C1: The difference between COX and SCFace distributions influences the confidence of the conclusion. (Q3.1)

Reminder: we have done it in the IJCAI version [Done]

[Reviewer 3] C2: A two-stage work (SR+Edit) makes more sense.

Reminder: We should clarify, from various aspects, the difference between our work and the two-stage approach in the new version of the paper [Todo]

[Reviewer 3] C3: Since the age of a single-face image is ambiguous, it’s hard to choose the attribute input.

Actually, human interaction can be introduced to choose the correct attribute. Reminder: fine-grained control of age is introduced in our IJCAI version paper; we can also further explain the necessity of fine-grained control [Todo]

[Reviewer 3] C4: The image quality is sacrificed for altering face attributes.

Reminder: Thinking… [Todo]

2.15 Wed

  1. Collect papers to read (the titles are in the reading lists) and skim them.

  2. Restructure the responses and to-do list.

  3. Think about the possible improvement to the current method.

Current attribute-enhanced FR methods use a conditional GAN to recover LR images conditioned on the input attributes. They do not explore the relationship between attributes and output image quality. In contrast, our method focuses on the attribute bias problem in face restoration: attributes are explicitly/implicitly used to de-bias the image generation process, and aligning the LR images and attributes from two domains via the latent space of the GAN prior becomes our key technical challenge. Thus, our method can be regarded as an unconditional GAN that combats the attribute bias problem from a fresh perspective.

  • Indeed, requiring ground-truth attributes makes our method less generalizable to various applications, but it remains important for criminal search in surveillance scenarios.

  • Additionally, our method works quite well with less accurate attributes (which can be predicted from the input LR) **[Exp required].** We keep the framework that takes attributes as direct input, but use coarser-grained attributes, or a smoothed attribute distribution, and check whether the results remain close to those obtained with the ground-truth attributes.

  • To make our method more widely applicable, we extend it to de-bias attributes in FR when only LR inputs are given. As expected, de-biasing is not as effective as using the GT attributes. The results show that our method still outperforms other FR methods in preserving attributes.
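The smoothed-attribute experiment above can be sketched as a small conditioning helper. This is a hypothetical illustration of one possible smoothing scheme (mixing with a uniform prior; the knob `eps` and the function name are assumptions, not our actual implementation):

```python
import numpy as np

def smooth_attribute(one_hot, eps=0.2):
    """Soften a (possibly predicted, possibly wrong) one-hot attribute
    vector by mixing it with a uniform distribution over the k classes,
    so the restoration model is conditioned on a coarser signal."""
    k = one_hot.shape[-1]
    return (1.0 - eps) * one_hot + eps / k

age_bin = np.array([0.0, 0.0, 1.0, 0.0])  # e.g. a predicted coarse age bin
cond = smooth_attribute(age_bin, eps=0.2)  # fed to the model instead of GT
```

The experiment would then compare restorations conditioned on `cond` against those conditioned on the GT attribute.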

Simplify the description. Checked!

Our FR results can help human inspectors. It is unwise to use restored faces for face recognition, since unavoidable hallucinations will be produced, which harm the extraction of discriminative features for recognition.

This can be extended as an important section. Inconsistency (Gap) between attributes and input LR.

Your response is too offensive.
We could say:

When studying the FR problem, it does not matter how the training data is generated; it is more of an engineering trick. For example, GPEN generates training samples by randomly downsampling from 1x to 200x, which differs from the upscaling factor of the test data. To fully address R3’s concerns, we will tune the comparison methods using the same training data as ours.

A possible answer: high image quality (perceptual quality) is easy to obtain; a GAN-based method can easily generate photorealistic images. Possible reasons for the observed drop in image quality: 1) the granularity of the provided attributes does not fully match the granularity of the attributes implicit in the generated image; for example, a gender label cannot provide explicit constraints such as face shape or contour; 2) a face contains far more attributes than those we currently explore, and modifying only a subset of the defined attributes without jointly adjusting all attributes inevitably causes a slight drop in image quality.

2.22 Wed

  1. Read “DifFace: Blind Face Restoration with Diffused Error Contraction” and write a reading note.
  2. Write the first draft of the HUARUN patent. :frowning:
  3. Read “Improving Person Re-identification by Attribute and Identity Learning”
  4. Read “Image Super-Resolution via Iterative Refinement” and watch an online Bilibili course on the probabilistic diffusion model (theory plus a complete PyTorch code walkthrough) :frowning:

3.1 Wed

  1. DifFace Reproduction

  2. Tuning VQFR

  3. Survey on efficient image algorithms: currently searching with efficient/real-time as keywords. The difficulty is still justifying the application scenario: we need to argue that making existing restoration algorithms smaller or faster is meaningful :(

    a). edge–SR: Super–Resolution For The Masses
    b). Pyramid Architecture Search for Real-Time Image Deblurring

Researching efficient SR/deblurring does not seem very popular. First, this problem is orthogonal to SR or deblurring itself; second, work on real-time processing is usually not limited to images, as people are more interested in video.

What interests people about image restoration is its ill-posed nature. Besides, when a research topic is constrained by its application scenario, the novelty of the final method is easily challenged.

3.4 Sat

  1. Finished some experiments on DifFace (natural, 32x, 64x).
  2. Tune VQFR
  3. Study the diffusion model and read SR3

3.8 Wed

  1. Finished studying DDPM and understood the proof of SR3.
  2. Browsed latent diffusion; it is the same as SR3 in the reconstruction equation (conditional noise generation).
  • I aim to prove the equivalence of DDPM and the score-based diffusion model and to reproduce DDPM, reading source code in my following work.
  3. Read “Jointly De-Biasing Face Recognition and Demographic Attribute Estimation”

3.11 Sat

  1. Study the score-based generative model and try to figure out its equivalence to DDPM
  • Learn MCMC and SDEs
  • DDPM trains the model to predict the noise of each reverse step, which can be regarded as denoising score matching.
  2. Finish training VQFR and compare different restoration scales with our model.
  3. Look through the design of latent diffusion.
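The claimed link between noise prediction and denoising score matching can be written out explicitly. A sketch of the standard derivation, in DDPM notation ($\bar\alpha_t = \prod_{s \le t} \alpha_s$):

```latex
% forward marginal of DDPM
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% score of the forward marginal given x_0
\nabla_{x_t} \log q(x_t \mid x_0)
  = -\frac{x_t - \sqrt{\bar\alpha_t}\, x_0}{1-\bar\alpha_t}
  = -\frac{\epsilon}{\sqrt{1-\bar\alpha_t}}

% hence a noise predictor defines a score model up to scale,
s_\theta(x_t, t) = -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1-\bar\alpha_t}}
% and the DDPM \epsilon-prediction loss is denoising score matching
% with a per-timestep weighting.
```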

3.15

  1. Tune PSFRGAN and GFPGAN to compare different restoration scales with our model.
  2. Quantitative comparison with the tuned baselines