The document discusses detecting blogs independently of language and content. It presents research on a representation of web pages that allows a machine learning classifier to identify blogs regardless of the blog's language, style, author, or specific content. The approach focuses on the visual characteristics and structural features of blogs. Experimental results show that the resulting representation, called CRX, significantly improves blog classification accuracy compared with other representations such as bag-of-words models. Proposed future work includes handling new languages, strengthening entity extraction, and adding temporal analysis.