Towards building a word similarity dictionary for personality bias classification of phishing email contents


Phishing attacks are a form of social engineering used to steal private information from users through emails. A general approach to phishing susceptibility analysis is to profile the user’s personality using a personality model such as the Five Factor Model (FFM) and to estimate the user’s susceptibility to a set of phishing attempts. The FFM is a personality profiling system that scores participants on five separate personality traits: openness to experience (O), conscientiousness (C), extraversion (E), agreeableness (A), and neuroticism (N). However, existing approaches do not take into account that susceptibility also depends on the content of the email. For example, a phishing email offering an enticing free prize might be very effective on a dominant O-personality (curious, open to new experiences) but not on an N-personality (tendency to experience negative emotions).

Therefore, it is necessary to consider the personality bias of phishing email contents during susceptibility analysis. In this paper, we propose a method to construct a dictionary based on the semantic similarity of prospective words describing the FFM traits. Words generated through this dictionary can be used to label phishing emails according to their personality bias and serve as the key component of a personality bias classification system for phishing emails. We validate our dictionary construction on a large public corpus of phishing email data; the results show the potential of the proposed system in anti-phishing research.
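
To make the labeling idea concrete, the following is a minimal sketch of how a trait dictionary could be used to assign a personality bias label to an email. It is not the method evaluated in this paper: the word lists in `TRAIT_WORDS`, the function names, and the simple hit-count scoring are all assumptions chosen for illustration, and the semantic-similarity step that would build the dictionary is not shown.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: these words are illustrative placeholders,
# not entries from the dictionary constructed in the paper.
TRAIT_WORDS = {
    "O": {"new", "discover", "free", "prize", "explore"},
    "C": {"deadline", "verify", "required", "policy"},
    "E": {"friends", "party", "join", "social"},
    "A": {"help", "donate", "please", "support"},
    "N": {"urgent", "suspended", "warning", "risk"},
}


def trait_scores(text):
    """Count how many tokens of the email fall into each trait's word set."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for trait, words in TRAIT_WORDS.items():
        counts[trait] = sum(1 for token in tokens if token in words)
    return counts


def label_email(text):
    """Return the trait with the most dictionary hits, or None if there are none.

    Ties are broken by the order of traits in TRAIT_WORDS (an arbitrary
    choice made for this sketch).
    """
    scores = trait_scores(text)
    best_trait, hits = max(scores.items(), key=lambda item: item[1])
    return best_trait if hits > 0 else None
```

Under these assumptions, an email built around a free prize would be labeled as O-biased, while one threatening account suspension would be labeled as N-biased. A practical system would likely need weighting and normalization beyond raw counts.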