The Stanford I2V Dataset

Imagine being able to search for video content not by keyword, but by using other visual information, like a still image, a screenshot, or a single frame from another video. As images and video become ever more ubiquitous in our culture, it is easy to see how useful this kind of functionality could be.

Recently, Bernd Girod's Image, Video, and Multimedia Systems Group released a video dataset that stands to substantially advance current research in visual search technology. The "Stanford I2V dataset", announced in the Proceedings of the 6th ACM Multimedia Systems Conference, is archived and accessible in the Stanford Digital Repository.

The associated paper presents image-to-video search, or I2V, as a real-world challenge with many practical uses: "Example applications for I2V are advertisement monitoring, video lecture search using slides, organizing and searching a personal video collection or an archive of video, and content linking where a relevant video has to be found based on an image from a certain event (e.g., from a website or news article)."

The Stanford I2V dataset is newsworthy to visual search researchers due to its size. As the paper says, "... Previous work was mostly limited to evaluations on small or medium scale benchmark datasets. However, we think that I2V is now at a stage where algorithms and systems have to be evaluated on a larger scale to draw conclusions about their performance… Compared to existing datasets, Stanford I2V is more diverse and much larger."

I asked the depositor, André Filgueiras de Araujo, a few questions about his research and use of the SDR:

What do you hope to accomplish by sharing your data? How does it further your research?

"This is the first truly large-scale dataset for research on video search-by-image; sharing it is essential to allow the broad research community to make progress on this problem. Also, it provides more credibility to our own research work, since others can more easily reproduce and test the techniques we develop."

What are some of the challenges that you continue to face in your research?

"Processing such large-scale data is difficult due to high computational and storage costs, so one challenging aspect we continue to work on is to make our systems faster and lighter, while still high-performing."

As a graduate student, are you learning about the role that digital repositories will play in your career? How is the Stanford Digital Repository positioned to support your work?

"The Stanford Digital Repository provides our research group with effective storage solutions, allowing us to easily share our data -- a crucial aspect to ensure continuous progress in the field. As a graduate student in a field where shared datasets are so important for research to make progress, I believe digital repositories are a very useful tool."

André and his colleagues will present their work at the ACM Multimedia Systems 2015 Conference in Portland, Oregon, later this month.

This Data Story was written by Hannah Frost.

Find out more about the data featured in this Data Story.