The Technical Solution
We built a dual-encoder NLP recommendation engine that matches news headlines to semantically relevant archive photos. It embeds text with GloVe (50-dimensional word vectors) and Universal Sentence Encoder v4 (512-dimensional sentence vectors), then scores headline-photo pairs by cosine similarity. The system processes headlines in real time, ranks photos by semantic relevance, and surfaces the top matches to editors.
python
# Dual-encoder similarity scoring (Universal Sentence Encoder side)
import tensorflow_hub as hub
from sklearn.metrics.pairwise import cosine_similarity

# Load Universal Sentence Encoder v4 once, at import time
use_model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")


def get_recommendations(headline: str, photo_descriptions: list[str], top_k: int = 5):
    """Match headline to archive photos via cosine similarity."""
    # Nothing to rank: avoid calling the encoder on an empty batch
    if not photo_descriptions or top_k <= 0:
        return []
    # Convert TF tensors to NumPy arrays before handing them to scikit-learn
    headline_emb = use_model([headline]).numpy()        # shape: (1, 512)
    photo_embs = use_model(photo_descriptions).numpy()  # shape: (N, 512)
    scores = cosine_similarity(headline_emb, photo_embs)[0]
    # Indices of the top_k highest scores, best match first
    top_indices = scores.argsort()[-top_k:][::-1]
    return [(photo_descriptions[i], float(scores[i])) for i in top_indices]

Python, TensorFlow Hub, USE v4, GloVe, scikit-learn, Flask
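The snippet above covers only the Universal Sentence Encoder side. The write-up does not say how the 50-d GloVe word vectors are turned into a single vector per headline or photo description. One common choice is mean pooling, and the following is a minimal sketch under that assumption. The tiny 3-d `TOY_VECTORS` table is made up for illustration and stands in for real GloVe embeddings; `embed_text`, `cosine`, and `rank_by_glove` are hypothetical names, not part of the original system.

```python
import numpy as np

# Hypothetical toy vectors standing in for real 50-d GloVe embeddings.
TOY_VECTORS = {
    "storm":  np.array([1.0, 0.0, 0.0]),
    "floods": np.array([0.9, 0.1, 0.0]),
    "city":   np.array([0.0, 1.0, 0.0]),
    "street": np.array([0.1, 0.9, 0.0]),
    "goal":   np.array([0.0, 0.0, 1.0]),
    "match":  np.array([0.0, 0.1, 0.9]),
}


def embed_text(text, vectors, dim=3):
    """Mean-pool the vectors of known words (assumed pooling strategy)."""
    words = [w for w in text.lower().split() if w in vectors]
    if not words:
        # No known words: fall back to a zero vector
        return np.zeros(dim)
    return np.mean([vectors[w] for w in words], axis=0)


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def rank_by_glove(headline, descriptions, vectors, top_k=5):
    """Rank descriptions by cosine similarity to the headline, best first."""
    h = embed_text(headline, vectors)
    scored = [(d, cosine(h, embed_text(d, vectors))) for d in descriptions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


ranked = rank_by_glove(
    "storm floods city",
    ["flooded street after storm", "goal in cup match", "city street"],
    TOY_VECTORS,
)
for description, score in ranked:
    print(f"{score:.3f}  {description}")
```

With real embeddings, `vectors` would be a word-to-vector mapping loaded from a GloVe file and `dim` would be 50. How the GloVe score and the USE score are combined into one ranking is not described in the write-up, so this sketch leaves that step out.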