initW = np.random.randn(len(vocab.vocabulary_), embedding_dim).astype(np.float32) / np.sqrt(len(vocab.vocabulary_)) why there exists the np.sqrt(len(vocab.vocabulary))?what is the aim?