Prof. Dr. JI, Mengqi 

季梦奇

Postdoc at Tsinghua University

supervised by:

Prof. Qionghai Dai & Prof. Lu Fang

Research interests:

3D vision, health technology

Publications

# means equal contribution; * identifies the corresponding author(s).

    2021

    SurRF: Unsupervised Multi-view Stereopsis by Learning Surface Radiance Field

    Jinzhi Zhang#, Mengqi Ji#, Guangyu Wang, Zhiwei Xue, Shengjin Wang, Lu Fang*

    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2021)

    Abstract:

    The recent success in supervised multi-view stereopsis (MVS) relies on the onerously collected real-world 3D data. While the latest differentiable rendering techniques enable unsupervised MVS, they are restricted to discretized (e.g., point cloud) or implicit geometric representation, suffering from either low integrity for a textureless region or less geometric details for complex scenes. In this paper, we propose SurRF, an unsupervised MVS pipeline by learning Surface Radiance Field, i.e., a radiance field defined on a continuous and explicit 2D surface. Our key insight is that, in a local region, the explicit surface can be gradually deformed from a continuous initialization along view-dependent camera rays by differentiable rendering. That enables us to define the radiance field only on a 2D deformable surface rather than in a dense volume of 3D space, leading to compact representation while maintaining complete shape and realistic texture for large-scale complex scenes. We experimentally demonstrate that the proposed SurRF produces competitive results over the state of the art on various real-world challenging scenes, without any 3D supervision. Moreover, SurRF shows great potential in owning the joint advantages of mesh (scene manipulation), continuous surface (high geometric resolution), and radiance field (realistic rendering).

    Latex Bibtex:

    @article{zhang2021surrf,
      title={SurRF: Unsupervised Multi-view Stereopsis by Learning Surface Radiance Field},
      author={Zhang, Jinzhi and Ji, Mengqi and Wang, Guangyu and Xue, Zhiwei and Wang, Shengjin and Fang, Lu},
      journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
      year={2021},
      publisher={IEEE}
    }

    A Modular Hierarchical Array Camera

    Xiaoyun Yuan, Mengqi Ji, Jiamin Wu, David J. Brady, Qionghai Dai, Lu Fang

    Light: Science & Applications (LSA 2021, Nature Press, Cover Paper)

    Abstract:

    Array cameras removed the optical limitations of a single camera and paved the way for high-performance imaging via the combination of micro-cameras and computation to fuse multiple aperture images. However, existing solutions use dense arrays of cameras that require laborious calibration and lack flexibility and practicality. Inspired by the cognition function principle of the human brain, we develop an unstructured array camera system that adopts a hierarchical modular design with multiscale hybrid cameras composing different modules. Intelligent computations are designed to collaboratively operate along both intra- and intermodule pathways. This system can adaptively allocate imagery resources to dramatically reduce the hardware cost and possesses unprecedented flexibility, robustness, and versatility. Large scenes of real-world data were acquired to perform human-centric studies for the assessment of human behaviours at the individual level and crowd behaviours at the population level requiring high-resolution long-term monitoring of dynamic wide-area scenes.

    Latex Bibtex:

    @article{yuan2021modular,
      title={A modular hierarchical array camera},
      author={Yuan, Xiaoyun and Ji, Mengqi and Wu, Jiamin and Brady, David J and Dai, Qionghai and Fang, Lu},
      journal={Light: Science \& Applications},
      volume={10},
      number={1},
      pages={1--9},
      year={2021},
      publisher={Nature Publishing Group}
    }

    Boosting Single Image Super-Resolution Learnt From Implicit Multi-Image Prior

    Dingjian Jin, Mengqi Ji, Lan Xu, Gaochang Wu, Liejun Wang, Lu Fang*

    IEEE Transactions on Image Processing (TIP 2021)

    Abstract:

    Learning-based single image super-resolution (SISR) aims to learn a versatile mapping from a low resolution (LR) image to its high resolution (HR) version. The critical challenge is to bias the network training towards continuous and sharp edges. For the first time in this work, we propose an implicit boundary prior learnt from multi-view observations to significantly mitigate the challenge in SISR we outline. Specifically, the multi-image prior that encodes both disparity information and boundary structure of the scene supervises a SISR network for edge preservation. For simplicity, in the training procedure of our framework, light field (LF) serves as an effective multi-image prior, and a hybrid loss function jointly considers the content, structure, variance as well as disparity information from 4D LF data. Consequently, for inference, such a general training scheme boosts the performance of various SISR networks, especially for the regions along edges. Extensive experiments on representative backbone SISR architectures consistently show the effectiveness of the proposed method, leading to around 0.6 dB gain without modifying the network architecture.

    Latex Bibtex:

    @article{jin2021boosting,
      title={Boosting Single Image Super-Resolution Learnt From Implicit Multi-Image Prior},
      author={Jin, Dingjian and Ji, Mengqi and Xu, Lan and Wu, Gaochang and Wang, Liejun and Fang, Lu},
      journal={IEEE Transactions on Image Processing},
      volume={30},
      pages={3240--3251},
      year={2021},
      publisher={IEEE}
    }
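
    To put the "around 0.6 dB gain" quoted in the abstract above into perspective, here is a minimal sketch relating a dB gain to a change in mean squared error. It assumes the gain is measured in PSNR, which the abstract does not state explicitly; the function names and the baseline error value are hypothetical and chosen only for illustration.

```python
import math


def psnr(mse, max_value=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE must shrink to raise PSNR by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


# Hypothetical baseline error (not taken from the paper).
baseline_mse = 40.0
improved_mse = baseline_mse * mse_ratio_for_gain(0.6)

# PSNR difference between the improved and the baseline reconstruction.
gain = psnr(improved_mse) - psnr(baseline_mse)
print(f"MSE ratio: {mse_ratio_for_gain(0.6):.3f}, PSNR gain: {gain:.2f} dB")
```

    Under this assumption, a 0.6 dB improvement corresponds to the mean squared error dropping to roughly 87% of its baseline value.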

    GigaMVS: A Benchmark for Ultra-large-scale Gigapixel-level 3D Reconstruction

    Jianing Zhang#, Jinzhi Zhang#, Shi Mao#, Mengqi Ji, Guangyu Wang, Zequn Chen, Tian Zhang, Xiaoyun Yuan, Qionghai Dai, Lu Fang*

    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2021)

    Abstract:

    Multiview stereopsis (MVS) methods, which can reconstruct both the 3D geometry and texture from multiple images, have been rapidly developed and extensively investigated from the feature engineering methods to the data-driven ones. However, there is no dataset containing both the 3D geometry of large-scale scenes and high-resolution observations of small details to benchmark the algorithms. To this end, we present GigaMVS, the first gigapixel-image-based 3D reconstruction benchmark for ultra-large-scale scenes. The gigapixel images, with both wide field-of-view and high-resolution details, can clearly observe both the Palace-scale scene structure and Relievo-scale local details. The ground-truth geometry is captured by the laser scanner, which covers ultra-large-scale scenes with an average area of 8667 m^2 and a maximum area of 32007 m^2. Due to the extremely large scale, complex occlusion, and gigapixel-level images, GigaMVS brings the problem to light that emerged from the poor effectiveness and efficiency of the existing MVS algorithms. We thoroughly investigate the state-of-the-art methods in terms of geometric and textural measurements, which point to the weakness of existing methods and promising opportunities for future works. We believe that GigaMVS can benefit the community of 3D reconstruction and support the development of novel algorithms balancing robustness, scalability, and accuracy.

    Latex Bibtex:

    @article{zhang2021gigamvs,
      title={GigaMVS: A Benchmark for Ultra-large-scale Gigapixel-level 3D Reconstruction},
      author={Zhang, Jianing and Zhang, Jinzhi and Mao, Shi and Ji, Mengqi and Wang, Guangyu and Chen, Zequn and Zhang, Tian and Yuan, Xiaoyun and Dai, Qionghai and Fang, Lu},
      journal={IEEE Transactions on Pattern Analysis \& Machine Intelligence},
      number={01},
      pages={1--1},
      year={2021},
      publisher={IEEE Computer Society}
    }

    2020

    SurfaceNet+: An End-to-end 3D Neural Network for Very Sparse Multi-view Stereopsis

    Mengqi Ji#, Jinzhi Zhang#, Qionghai Dai, Lu Fang*

    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2020)

    Abstract:

    Multi-view stereopsis (MVS) tries to recover the 3D model from 2D images. As the observations become sparser, the significant 3D information loss makes the MVS problem more challenging. Instead of only focusing on densely sampled conditions, we investigate sparse-MVS with large baseline angles since sparser sampling is always more favorable in practice. By investigating various observation sparsities, we show that the classical depth-fusion pipeline becomes powerless for the case with larger baseline angle that worsens the photo-consistency check. As another line of solution, we present SurfaceNet+, a volumetric method to handle the 'incompleteness' and 'inaccuracy' problems induced by very sparse MVS setup. Specifically, the former problem is handled by a novel volume-wise view selection approach. It owns superiority in selecting valid views while discarding invalid occluded views by considering the geometric prior. Furthermore, the latter problem is handled via a multi-scale strategy that consequently refines the recovered geometry around the region with repeating pattern. The experiments demonstrate the tremendous performance gap between SurfaceNet+ and the state-of-the-art methods in terms of precision and recall. Under the extreme sparse-MVS settings in two datasets, where existing methods can only return very few points, SurfaceNet+ still works as well as in the dense MVS setting.

    Latex Bibtex:

    @article{ji2020surfacenet+,
      title={SurfaceNet+: An End-to-end 3D Neural Network for Very Sparse Multi-view Stereopsis},
      author={Ji, Mengqi and Zhang, Jinzhi and Dai, Qionghai and Fang, Lu},
      journal={arXiv preprint arXiv:2005.12690},
      year={2020}
    }

    Augmenting Vascular Disease Diagnosis by Vasculature-aware Unsupervised Learning

    Yong Wang#, Mengqi Ji#, Shengwei Jiang#, Xukang Wang, Jiamin Wu, Feng Duan, Jingtao Fan, Laiqiang Huang, Shaohua Ma*, Lu Fang*, Qionghai Dai*

    Nature Machine Intelligence (NMI 2020, Nature Press)

    Abstract:

    Vascular disease is one of the leading causes of death and threatens human health worldwide. Imaging examination of vascular pathology with reduced invasiveness is challenging due to the intrinsic vasculature complexity and non-uniform scattering from bio-tissues. Here, we report VasNet, a vasculature-aware unsupervised learning algorithm that augments pathovascular recognition from small sets of unlabelled fluorescence and digital subtraction angiography images. VasNet adopts a multi-scale fusion strategy with a domain adversarial neural network loss function that induces biased pattern reconstruction by strengthening features relevant to the retinal vasculature reference while weakening irrelevant features. VasNet delivers the outputs ‘Structure + X’ (where X refers to multi-dimensional features such as blood flows, the distinguishment of blood dilation and its suspicious counterparts, and the dependence of new pattern emergence on disease progression). Therefore, explainable imaging output from VasNet and other algorithm extensions holds the promise to augment medical diagnosis, as it improves performance while reducing the cost of human expertise, equipment and time consumption.

    Latex Bibtex:

    @article{wang2020augmenting,
      title={Augmenting vascular disease diagnosis by vasculature-aware unsupervised learning},
      author={Wang, Yong and Ji, Mengqi and Jiang, Shengwei and Wang, Xukang and Wu, Jiamin and Duan, Feng and Fan, Jingtao and Huang, Laiqiang and Ma, Shaohua and Fang, Lu and others},
      journal={Nature Machine Intelligence},
      volume={2},
      number={6},
      pages={337--346},
      year={2020},
      publisher={Nature Publishing Group}
    }

    Zoom in to the Details of Human-centric Videos

    Guanghan Li, Yaping Zhao, Mengqi Ji, Xiaoyun Yuan, Lu Fang*

    IEEE International Conference on Image Processing (ICIP 2020)

    Abstract:

    Presenting high-resolution (HR) human appearance is always critical for human-centric videos. However, current imagery equipment can hardly capture HR details all the time. Existing super-resolution algorithms barely mitigate the problem by only considering universal and low-level priors of image patches. In contrast, our algorithm is biased towards human body super-resolution by taking advantage of a high-level prior defined by HR human appearance. Firstly, a motion analysis module extracts the inherent motion pattern from the HR reference video to refine the pose estimation of the low-resolution (LR) sequence. Furthermore, a human body reconstruction module maps the HR texture in the reference frames onto a 3D mesh model. Consequently, super-resolved HR human sequences are generated conditioned on the original LR videos as well as a few HR reference frames. Experiments on an existing dataset and real-world data captured by hybrid cameras show that our approach generates superior visual quality of the human body compared with the traditional method.

    Latex Bibtex:

    @inproceedings{li2020zoom,
      title={Zoom in to the details of human-centric videos},
      author={Li, Guanghan and Zhao, Yaping and Ji, Mengqi and Yuan, Xiaoyun and Fang, Lu},
      booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
      pages={3089--3093},
      year={2020},
      organization={IEEE}
    }

    A Learning-Based Model to Evaluate Hospitalization Priority in COVID-19 Pandemics

    Yichao Zheng*, Yinheng Zhu*, Mengqi Ji, Rongpin Wang, Xinfeng Liu, Mudan Zhang, Jun Liu, Xiaochun Zhang, Choo Hui Qin, Lu Fang, Shaohua Ma

    Cell Patterns 1 (6), 100092

    Abstract:

    The emergence of novel coronavirus disease 2019 (COVID-19) is placing an increasing burden on the healthcare systems. Although the majority of infected patients have non-severe symptoms and can be managed at home, some individuals may develop severe disease and are demanding the hospital admission. Therefore, it becomes paramount to efficiently assess the severity of COVID-19 and identify hospitalization priority with precision. In this respect, a 4-variable assessment model, including lymphocyte, lactate dehydrogenase (LDH), C-reactive protein (CRP) and neutrophil, is established and validated using the XGBoost algorithm. This model is found effective to identify severe COVID-19 cases on admission, with a sensitivity of 84.6%, a specificity of 84.6%, and an accuracy of 100% to predict the disease progression toward rapid deterioration. It also suggests that a computation-derived formula of clinical measures is practically applicable for the healthcare administrators to distribute hospitalization resources to the most needed in epidemics and pandemics.

    Latex Bibtex:

    @article{2020A,
      title={A Learning-based Model to Evaluate Hospitalization Priority in COVID-19 Pandemics},
      author={Zheng, Y. and Zhu, Y. and Ji, M. and Wang, R. and Ma, S.},
      journal={Patterns},
      volume={1},
      number={6},
      pages={100092},
      year={2020}
    }
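
    The sensitivity and specificity figures quoted in the abstract above follow the standard confusion-matrix definitions, sketched below. The counts are toy values picked only so that the ratios land near 84.6%; they are not the study's data, and the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positive (e.g., severe) cases that the model flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual negative (e.g., non-severe) cases that the model clears."""
    return true_neg / (true_neg + false_pos)


# Toy confusion-matrix counts (hypothetical, not from the paper).
tp, fn = 11, 2
tn, fp = 22, 4

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```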


    2019

    SPI-Optimizer: An Integral-Separated PI Controller for Stochastic Optimization

    Dan Wang*, Mengqi Ji*, Yong Wang, Haoqian Wang, Lu Fang*

    IEEE International Conference on Image Processing (ICIP 2019)

    Abstract:

    To overcome the oscillation problem in the classical momentum-based optimizer, recent work associates it with the proportional-integral (PI) controller, and artificially adds D term producing a PID controller. It suppresses oscillation with the sacrifice of introducing extra hyper-parameter. In this paper, we analyze that the fluctuation problem relates to the lag effect of the integral (I) term, and propose SPI-Optimizer, an integral-Separated PI controller based optimizer WITHOUT introducing extra hyper-parameter. It separates momentum term adaptively when the inconsistency of current and historical gradient direction occurs. Extensive experiments demonstrate that SPI-Optimizer generalizes well on popular network architectures to eliminate the oscillation, and owns competitive performance with faster convergence speed (up to 40% epochs reduction ratio) and more accurate classification result on MNIST, CIFAR10, and CIFAR100 (up to 27.5% error reduction ratio) than state-of-the-art methods.

    Latex Bibtex:

    @inproceedings{wang2019spi,
      title={SPI-Optimizer: an integral-Separated PI Controller for Stochastic Optimization},
      author={Wang, Dan and Ji, Mengqi and Wang, Yong and Wang, Haoqian and Fang, Lu},
      booktitle={2019 IEEE International Conference on Image Processing (ICIP)},
      pages={2129--2133},
      year={2019},
      organization={IEEE}
    }
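
    The integral-separation idea described in the SPI-Optimizer abstract above can be illustrated with a minimal sketch. It assumes a per-coordinate rule in which the accumulated momentum (the integral term) is dropped whenever it points against the current descent direction; this is an illustrative reading of the abstract, not the authors' exact update rule, and the function name and default values are hypothetical.

```python
def spi_step(theta, velocity, grad, lr=0.1, momentum=0.9):
    """One momentum step with per-coordinate integral separation (illustrative)."""
    new_theta, new_velocity = [], []
    for t, v, g in zip(theta, velocity, grad):
        descent = -lr * g  # proportional term: plain gradient step
        if v * descent >= 0:
            # History agrees with the current direction: keep the momentum.
            v_next = momentum * v + descent
        else:
            # History disagrees: separate (discard) the accumulated momentum.
            v_next = descent
        new_velocity.append(v_next)
        new_theta.append(t + v_next)
    return new_theta, new_velocity


# Toy usage: the first coordinate keeps its momentum, the second is reset.
theta, velocity = spi_step([1.0, 1.0], [-0.5, 0.5], [2.0, 2.0])
print(theta, velocity)
```

    Note that the sketch adds no hyper-parameter beyond the learning rate and momentum coefficient of a classical momentum optimizer, which matches the claim in the abstract.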

    2018

    Crossnet: An End-to-end Reference-based Super Resolution Network Using Cross-scale Warping

    Haitian Zheng, Mengqi Ji, Haoqian Wang, Yebin Liu, Lu Fang*

    Proceedings of the European conference on computer vision (ECCV 2018)

    Abstract:

    The Reference-based Super-resolution (RefSR) super-resolves a low-resolution (LR) image given an external high-resolution (HR) reference image, where the reference image and LR image share similar viewpoint but with significant resolution gap (8×). Existing RefSR methods work in a cascaded way such as patch matching followed by synthesis pipeline with two independently defined objective functions, leading to the inter-patch misalignment, grid effect and inefficient optimization. To resolve these issues, we present CrossNet, an end-to-end and fully-convolutional deep neural network using cross-scale warping. Our network contains image encoders, cross-scale warping layers, and fusion decoder: the encoder serves to extract multi-scale features from both the LR and the reference images; the cross-scale warping layers spatially align the reference feature map with the LR feature map; the decoder finally aggregates feature maps from both domains to synthesize the HR output. Using cross-scale warping, our network is able to perform spatial alignment at pixel-level in an end-to-end fashion, which improves the existing schemes both in precision (around 2dB-4dB) and efficiency (more than 100 times faster).

    Latex Bibtex:

    @inproceedings{zheng2018crossnet,
      title={Crossnet: An end-to-end reference-based super resolution network using cross-scale warping},
      author={Zheng, Haitian and Ji, Mengqi and Wang, Haoqian and Liu, Yebin and Fang, Lu},
      booktitle={Proceedings of the European conference on computer vision (ECCV)},
      pages={88--104},
      year={2018}
    }

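    RegNet: Learning the Optimization of Direct Image-to-Image Pose Registration

    Lei Han, Mengqi Ji, Lu Fang, Matthias Nießner

    arXiv preprint arXiv:1812.10212 (2018)

    Abstract: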

    Direct image-to-image alignment that relies on the optimization of photometric error metrics suffers from limited convergence range and sensitivity to lighting conditions. Deep learning approaches have been applied to address this problem by learning better feature representations using convolutional neural networks, yet they still require a good initialization. In this paper, we demonstrate that the inaccurate numerical Jacobian limits the convergence range, which could be improved greatly using learned approaches. Based on this observation, we propose a novel end-to-end network, RegNet, to learn the optimization of image-to-image pose registration. By jointly learning feature representation for each pixel and partial derivatives that replace handcrafted ones (e.g., numerical differentiation) in the optimization step, the neural network facilitates end-to-end optimization. The energy landscape is constrained on both the feature representation and the learned Jacobian, hence providing more flexibility for the optimization, which in turn leads to more robust and faster convergence. In a series of experiments, including a broad ablation study, we demonstrate that RegNet is able to converge for large-baseline image pairs with fewer iterations.

    Latex Bibtex:

    @article{han2018regnet,
      title={Regnet: Learning the optimization of direct image-to-image pose registration},
      author={Han, Lei and Ji, Mengqi and Fang, Lu and Nie{\ss}ner, Matthias},
      journal={arXiv preprint arXiv:1812.10212},
      year={2018}
    }

    2017

    SurfaceNet: An End-To-End 3D Neural Network for Multiview Stereopsis

    Mengqi Ji, Juergen Gall, Haitian Zheng, Yebin Liu, Lu Fang

    IEEE International Conference on Computer Vision (ICCV 2017)

    Abstract:

    This paper proposes an end-to-end learning framework for multiview stereopsis. We term the network SurfaceNet. It takes a set of images and their corresponding camera parameters as input and directly infers the 3D model. The key advantage of the framework is that both photo-consistency as well as geometric relations of the surface structure can be directly learned for the purpose of multiview stereopsis in an end-to-end fashion. SurfaceNet is a fully 3D convolutional network which is achieved by encoding the camera parameters together with the images in a 3D voxel representation. We evaluate SurfaceNet on the large-scale DTU benchmark.

    Latex Bibtex:

    @inproceedings{ji2017surfacenet,
      title={Surfacenet: An end-to-end 3d neural network for multiview stereopsis},
      author={Ji, Mengqi and Gall, Juergen and Zheng, Haitian and Liu, Yebin and Fang, Lu},
      booktitle={Proceedings of the IEEE International Conference on Computer Vision},
      pages={2307--2315},
      year={2017}
    }

    Learning Cross-scale Correspondence and Patch-based Synthesis for Reference-based Super-Resolution

    Haitian Zheng, Mengqi Ji, Lei Han, Ziwei Xu, Haoqian Wang, Yebin Liu, Lu Fang*

    British Machine Vision Conference (BMVC 2017)

    Abstract:

    In this paper, we explore the Reference-based Super-Resolution (RefSR) problem, which aims to super-resolve a low definition (LR) input to a high definition (HR) output, given another HR reference image that shares similar viewpoint or capture time with the LR input. We solve this problem by proposing a learning-based scheme, denoted as RefSR-Net. Specifically, we first design a Cross-scale Correspondence Network (CC-Net) to indicate the cross-scale patch matching between reference and LR image. The CC-Net is formulated as a classification problem which predicts the correct matches from the candidate patches within the search range. Using dilated convolution, the training and feature map generation are efficiently implemented. Given the reference patch selected via CC-Net, we further propose a Super-resolution image Synthesis Network (SS-Net) for the synthesis of the HR output, by fusing the LR patch and the reference patch at multiple scales. Experiments on MPI Sintel Dataset and Light-Field (LF) video dataset demonstrate our learned correspondence features outperform existing features, and our proposed RefSR-Net substantially outperforms conventional single image SR and exemplar-based SR approaches.

    Latex Bibtex:

    @inproceedings{zheng2017learning,
      title={Learning Cross-scale Correspondence and Patch-based Synthesis for Reference-based Super-Resolution},
      author={Zheng, Haitian and Ji, Mengqi and Han, Lei and Xu, Ziwei and Wang, Haoqian and Liu, Yebin and Fang, Lu},
      booktitle={BMVC},
      volume={1},
      pages={2},
      year={2017}
    }

    2016

    Deep Learning for Surface Material Classification Using Haptic and Visual Information

    Haitian Zheng, Lu Fang, Mengqi Ji, Matti Strese, Yigitcan Özer, Eckehard Steinbach

    IEEE Transactions on Multimedia (TMM 2016)

    Abstract:

    When a user scratches a hand-held rigid tool across an object surface, an acceleration signal can be captured, which carries relevant information about the surface material properties. More importantly, such haptic acceleration signals can be used together with surface images to jointly recognize the surface material. In this paper, we present a novel deep learning method dealing with the surface material classification problem based on a fully convolutional network, which takes the aforementioned acceleration signal and a corresponding image of the surface texture as inputs. Compared to the existing surface material classification solutions which rely on a careful design of hand-crafted features, our method automatically extracts discriminative features utilizing advanced deep learning methodologies. Experiments performed on the TUM surface material database demonstrate that our method achieves state-of-the …

    Latex Bibtex:

    @article{zheng2016deep,
      title={Deep learning for surface material classification using haptic and visual information},
      author={Zheng, Haitian and Fang, Lu and Ji, Mengqi and Strese, Matti and {\"O}zer, Yigitcan and Steinbach, Eckehard},
      journal={IEEE Transactions on Multimedia},
      volume={18},
      number={12},
      pages={2407--2416},
      year={2016},
      publisher={IEEE}
    }