Interesting ideas

  • abstract text to video: exploring the design of ‘abstract degree prompts’ (low, medium, or high abstraction) in video generation networks could be fascinating. The key is an additional design component that handles ‘abstract text’.

  • animal pose estimation: investigating the use of LLMs for effective animal joint detection is interesting. CLAMP is a feasible approach, but not the most elegant one. Handling challenges such as ‘occluded joints’ and ‘ambiguous joints’ in the prompt design would be really novel. In addition, animal body-part-aware learning (head, frontal body, back body) could mitigate imbalanced learning.

  • face SR: ensuring high fidelity in the output HR face is what makes face super-resolution (SR) distinct from natural-image SR. 1) Handling large pose variations in the LR input is challenging yet practical. Simply augmenting the training data with different poses cannot solve this problem. How can we coherently learn a 3D prior and SR together for pose-robust SR? 2) Maintaining identity consistency between the LR and SR images is crucial. How good are existing diffusion models and LLMs at SR? It would be interesting to strengthen identity preservation in their final high-quality images.

  • FER: when long-tailed class distributions meet noisy labels, how to handle both problems in a unified framework would be interesting.

  • to be continued…