The gross mechanism seems to be, as you say, a comparison of the times at which matching sounds arrive at each ear. Also, as you correctly say, sounds originating very close to the median plane of your head cannot be uniquely located by that principle, since they reach both ears at practically the same moment wherever in that plane they start. And of course there are similar, though less severe, ambiguities in locating sounds in other positions.
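To make the arrival-time ambiguity concrete, here is a rough sketch in Python. It is only an illustration under loud assumptions: the ears are treated as two points in open air with nothing between them, the ear spacing of 0.18 m and sound speed of 343 m/s are round figures I picked, and the function name `left_minus_right_delay` is my own invention.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed round figure for air at room temperature
EAR_SEPARATION = 0.18   # m, assumed spacing between the two ears


def left_minus_right_delay(x, y, z):
    """Arrival time at the left ear minus arrival time at the right ear, in seconds.

    Coordinates are in metres: x points to the listener's right, y forward,
    z up. The ears are modelled as points at (-d/2, 0, 0) and (+d/2, 0, 0);
    the head itself is ignored. A positive result means the right ear
    hears the sound first.
    """
    half = EAR_SEPARATION / 2
    source = (x, y, z)
    to_left = math.dist(source, (-half, 0.0, 0.0))
    to_right = math.dist(source, (half, 0.0, 0.0))
    return (to_left - to_right) / SPEED_OF_SOUND


if __name__ == "__main__":
    # Sources in the median plane (x = 0): ahead, behind, and overhead.
    for source in [(0.0, 2.0, 0.0), (0.0, -2.0, 0.0), (0.0, 0.0, 2.0)]:
        print(source, left_minus_right_delay(*source))

    # Sources off to the right: ahead, behind, and above at matching offsets.
    for source in [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0), (1.0, 0.0, 1.0)]:
        print(source, left_minus_right_delay(*source))
```

In this toy model every median-plane source gives a zero difference, and the three off-centre sources, though in quite different places, give the same difference as one another. That is the sense in which arrival time alone cannot settle the position.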
However, though arrival-time comparison is the obvious clue, it certainly is not the only one. For one thing, we make extremely heavy use of clues of context. Even if the sound of speech comes from one of two heads in roughly the same direction, if the lips of the other are moving, we will be pretty sure to hear the sound as coming from the moving lips. Similarly, if we hear something drop, we are likely to hear the noise as coming from near our feet, not from above our head.
Then again, other external circumstances, such as sound-absorbing and sound-reflecting surfaces, affect our perceptions. And the way sound bends around our head, and more particularly around our external ears, can give us surprisingly elaborate, though often misleading, clues as to the direction of the sources of sound.