emoji-semantic-search/server/app.py at main · lilianweng/emoji-semantic-search · GitHub

build

Construct a msg2emoji JSON object that maps each message to its emojis:

msg2emoji = {
    "Hello! How are you?": ["😊", "👋"],
    "I'm doing great!": ["👍"],
    "What about you?": ["❓"],
    "Me too!": ["😄"]
}

Convert it into an array of description strings, one per emoji:

descriptions = [
    "The emoji 😊 is about feeling happy.",
    "The emoji 👋 is about saying hello.",
    "The emoji 👍 is about showing approval.",
    "The emoji ❓ is about asking a question.",
    "The emoji 😄 is about expressing joy."
]
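A minimal sketch of how such description strings could be generated. The `emoji_to_meaning` mapping and the `build_descriptions` helper are hypothetical names used only for illustration; the notes do not say where the short meanings come from.

```python
def build_descriptions(emoji_to_meaning):
    """Turn {emoji: short meaning} into one description sentence per emoji."""
    return [
        f"The emoji {emoji} is about {meaning}."
        for emoji, meaning in emoji_to_meaning.items()
    ]


# Hypothetical input; the meanings are copied from the example above.
emoji_to_meaning = {
    "😊": "feeling happy",
    "👋": "saying hello",
}
descriptions = build_descriptions(emoji_to_meaning)
```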

Call the embeddings API to get a vector for each description:

[
    {"emoji": "😊", "message": "feeling happy", "embed": [0.1, 0.2, 0.3]},
    {"emoji": "👋", "message": "saying hello", "embed": [0.4, 0.5, 0.6]},
    {"emoji": "👍", "message": "showing approval", "embed": [0.7, 0.8, 0.9]},
    {"emoji": "❓", "message": "asking a question", "embed": [0.2, 0.3, 0.4]},
    {"emoji": "😄", "message": "expressing joy", "embed": [0.5, 0.6, 0.7]}
]

Then save the results to emoji-embeddings.jsonl.gz, so the embeddings do not have to be computed again.
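The save step might look roughly like the sketch below: one JSON object per line, gzip-compressed, using only the standard library. The `save_embeddings` function name is an assumption for illustration, not necessarily how the repository does it.

```python
import gzip
import json


def save_embeddings(records, path="emoji-embeddings.jsonl.gz"):
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps the emoji readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Sample records in the shape shown above (toy 3-dimensional vectors).
records = [
    {"emoji": "😊", "message": "feeling happy", "embed": [0.1, 0.2, 0.3]},
    {"emoji": "👋", "message": "saying hello", "embed": [0.4, 0.5, 0.6]},
]

if __name__ == "__main__":
    save_embeddings(records)
```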

At search time, read the local emoji-embeddings.jsonl.gz file and parse it into the format needed for lookup.
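A sketch of the loading step, assuming the file holds one JSON record per line with `emoji`, `message`, and `embed` keys as in the example records. The `load_embeddings` name and the choice to return a metadata list plus a NumPy matrix are assumptions made here for illustration.

```python
import gzip
import json

import numpy as np


def load_embeddings(path="emoji-embeddings.jsonl.gz"):
    """Return (emoji_info, embeddings): a list of dicts and an (N, D) matrix."""
    emoji_info, vectors = [], []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            emoji_info.append(
                {"emoji": record["emoji"], "message": record["message"]}
            )
            vectors.append(record["embed"])
    return emoji_info, np.array(vectors, dtype=np.float32)
```

Row `i` of the matrix corresponds to `emoji_info[i]`, which is what lets a single matrix product score every emoji at once.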

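Send the query text to the embedding API to get its vector, then score it against every stored emoji embedding with a dot product: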

dotprod = np.matmul(self.embeddings, np.array(query_embed).T)

Return the 20 most similar emojis.
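The scoring and top-20 selection can be sketched as follows. `top_k` and its parameters are hypothetical names; the toy vectors are the ones from the example records. Note that ranking by dot product matches ranking by cosine similarity only if the embedding vectors are unit-length, which is an assumption about the embedding API being used.

```python
import numpy as np


def top_k(embeddings, emoji_info, query_embed, k=20):
    """Rank emojis by dot product with the query and return the best k."""
    # (N, D) matrix times (D,) vector gives one score per emoji.
    dotprod = np.matmul(embeddings, np.array(query_embed).T)
    k = min(k, len(dotprod))
    # Negate so argsort yields indices from highest to lowest score.
    top_idx = np.argsort(-dotprod)[:k]
    return [{**emoji_info[i], "score": float(dotprod[i])} for i in top_idx]


embeddings = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])
emoji_info = [
    {"emoji": "😊", "message": "feeling happy"},
    {"emoji": "👋", "message": "saying hello"},
    {"emoji": "👍", "message": "showing approval"},
]
results = top_k(embeddings, emoji_info, [1.0, 0.0, 0.0], k=2)
```

For larger collections, `np.argpartition` would avoid a full sort, but with a few thousand emojis a plain `argsort` is simple and fast enough.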