We examined the factors that obstruct the semantic integration of speech and iconic gestures in young children. We presented speech and iconic gestures for eight actions on video to 4‒6-year-old children (N = 49). The children were then instructed to select the photograph that best matched the message from among four photographs. We assigned the children to two groups to examine whether they were aware of the need to integrate speech and gesture information. One group was presented with speech and gestures accompanied by directive words, and the other was presented with speech and gestures without directive words. We followed Miyake and Sugimura (2016) but modified parts of the photographs (i.e., the pose of the action and the background) and the sentence-final expression of the speech (e.g., “Nage-masu”) to reduce the participants' cognitive load when receiving the integrated speech and gesture information. The results showed that, in the revised task, children in the group with directive words were better at integrating speech and iconic gestures than those in the group without directive words. The proportion of children who made the correct choice, reflecting both speech and gesture, did not differ by grade. Based on these results, we discuss the obstacles to integrating speech and iconic gesture information.