[R] VLMs Behavior for Long Video Understanding
About this article
I have extensively searched on long video understanding datasets such as Video-MME, MLVU, VideoBench, LongVideoBench and etc. What I have seen there these datasets are focused on different categories such dramas, films, TV shows, documentaries where focus on tasks like ordering, counting, reasoning and etc. I feel that multi-step reasoning is less explored and then what i have did i designed the questions with no options just ground truth and asked the VLM to give me the answer but VLMs unabl...
You've been blocked by network security.To continue, log in to your Reddit account or use your developer tokenIf you think you've been blocked by mistake, file a ticket below and we'll look into it.Log in File a ticket