• MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering


    Title: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering. Source: CVPR 2022 (https://arxiv.org/abs/2203.09138); code: https://github.com/AndersonStra/MuKEA 1. Problem statement: General knowledge-based visual question answering (KB-VQA) requires the ability to associate external knowledge in order to achieve open-ended cross-modal scene understanding. Existing research mainly focuses on acquiring relevant knowledge from structured knowledge graphs, such as ConceptNet and DBpedia, or […]

  • Paper reading: Multimodal Graph Networks for Compositional Generalization in Visual Question Answering


    Title: Multimodal Graph Networks for Compositional Generalization in Visual Question Answering. Source: NeurIPS 2020 (https://proceedings.neurips.cc/paper/2020/hash/1fd6c4e41e2c6a6b092eb13ee72bce95-Abstract.html); code: https://github.com/raeidsaqur/mgn 1. Problem statement. Key issue: the compositional generalization problem. Example: in natural language, people can learn the meaning of a new word and then apply it in other linguistic contexts. If a person learns the meaning of a new […]

  • Interpreting the knowledge distillation model TinyBERT


    Summary: This article focuses on improving the optimization mechanism of the information bottleneck, and explains it around two points: the difficulty of estimating mutual information in high-dimensional space, and the trade-off problem in the information bottleneck's optimization mechanism. This article is shared from the Huawei Cloud community: 《[cloud co creation] appreciation of American articles: […]

  • ACM MM 2021 | Integrating “knowledge + graph” into multimodal training: methods and e-commerce applications


    Introduction: With the continuous development of artificial intelligence technology, the knowledge graph, as a knowledge pillar of the field, has attracted extensive attention in academia and industry for its strong knowledge representation and reasoning ability. In recent years, knowledge graphs have been widely used in semantic search, question answering, knowledge management and other […]

  • Using AVPlayer to build a custom player that supports full screen (V): Swift rewrite


    Preface: I open-sourced a simple video player long ago. After a long period without maintenance, it works poorly. I recently took the time to refactor it thoroughly. The old version will no longer be maintained; the new version is implemented in Swift, and more features will be added in the future. If you don’t want […]

  • Client development (Electron): understanding windows, part 2


    Hello everyone, I’m “front-end Xiaoxin”. I have worked in front-end and Android development for a long time, I’m keen on technology, and I keep going further down the programming road. Electron is a framework for building desktop applications using JavaScript, HTML, and CSS. By embedding Chromium and Node.js into its binary, Electron allows you […]

  • JavaScript plug-ins supported by Bootstrap


    1. Importing JavaScript plug-ins. In addition to rich web components such as drop-down menus, button groups, navigation, and pagination, Bootstrap also includes some JavaScript plug-ins. Bootstrap’s JavaScript plug-ins can be imported into the page individually or all at once. Because they depend on the jQuery library, you must import the jQuery […]

  • Introduction to MUGE, a Chinese multimodal evaluation benchmark


    Background: In recent years, the successful application of large-scale neural network models and pre-training techniques has not only driven the rapid development of computer vision and natural language processing, but also advanced research on multimodal representation learning. In 2020, Jeff Dean pointed out that multimodal research will be a major trend in future research. […]

  • 34. Qt Quick Popup customization


    1. Introduction to Popup. Popup is a pop-up window control. Its common properties are as follows: anchors.centerIn: Object, sets the object within which the popup is centered; closePolicy: enumeration, sets the popup’s closing policy. The default value is Popup.CloseOnEscape | Popup.CloseOnPressOutside. The possible values are: Popup.NoAutoClose: the pop-up window will not close […]

  • Good news! Baidu won the National Technological Invention Award


    On November 3, the 2020 National Science and Technology Award Conference was held in Beijing. Baidu’s “key technologies and applications of knowledge-enhanced cross-modal semantic understanding” won the second prize of the National Technological Invention Award. This technology aims to solve the problem of fused representation across the semantic spaces of different modalities by building a large-scale knowledge […]

  • Communication between components with Redux


    Demo address. Question: When working on projects, I kept running into a problem that bothered me: how do two independent child components communicate? Suppose the current structure is as follows: ListItem is a to-do list component with a Delete operation; when you click Add remarks, a modal box pops up […]

  • Counterfactual VQA: A Cause-Effect Look at Language Bias


    Abstract: VQA models may tend to rely on language bias as a shortcut, and thus fail to sufficiently learn multimodal knowledge from both vision and language. Recent methods have been proposed to exclude the language prior during inference. However, they fail to disentangle the “good” context […]