Introduction to the Project
- LSA-Bot is a new kind of chat-bot built on Latent Semantic Analysis (LSA).
LSA relates words to vector representations, which makes it possible to build a chat-bot that interprets human language and can also answer natural-language questions.
Project hosted on SourceForge.net
Some information about LSA-Bot
- I have been developing LSA-Bot at university since 12 September 2004 (the birth date of its first class).
- LSA-Bot is written in Java and works by applying LSA (Latent Semantic Analysis) theory to a large collection of text documents (the corpus). Many chat-bot systems exist, and most of them use the AIML language to recognize users’ questions. Such bots can answer users, but the botmaster has to anticipate every kind of question a user could possibly ask the bot.
- Using LSA, it is possible to give the chat-bot some intelligence, allowing it to ignore, for instance, wrong words, stop-words, and anything else that is not needed for the deep meaning of a sentence.
- LSA-Bot uses the vectors associated with the words found in the corpus to compute the distance between the user’s question and every possible answer; answers can be simple sentences, small documents, or whatever the programmer wishes. The word vectors are obtained by applying Singular Value Decomposition (SVD) to the matrix of word occurrences in the documents, using Matlab or any other software that can compute an SVD. Once these vectors are available, LSA-Bot uses them to build a vector for every word and for every question a user may compose. The distance between the question and each candidate answer can be computed as a cosine, rejection-over-projection, or Tanimoto distance. The answer whose vector lies at the minimum distance from the question is shown to the user.
- Another feature is that LSA-Bot’s knowledge base can be extended (learn mode) by giving the bot a new sentence to learn; a vector representing that sentence is then computed and added to the knowledge base.
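The project computes the SVD outside LSA-Bot, with Matlab or similar software. Purely to illustrate what that step produces, here is a plain-Java sketch that derives term vectors from a raw term-document count matrix. It swaps in power iteration with deflation on A·Aᵀ (whose eigenvectors are the left singular vectors of A and whose eigenvalues are the squared singular values), which is only practical for small toy matrices. It also assumes the LSA vector of term t is row t of U_k·Σ_k, with no weighting (such as tf-idf) applied to the counts. The class and method names are hypothetical, not LSA-Bot's code.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch: LSA term vectors (rows of U_k * Sigma_k) from a
 * term-document count matrix, via a truncated SVD computed by power iteration.
 */
final class LsaSvd {

    static final class Result {
        final double[] sigma;         // top-k singular values, largest first
        final double[][] termVectors; // termVectors[t] = LSA vector of term t (length k)

        Result(double[] sigma, double[][] termVectors) {
            this.sigma = sigma;
            this.termVectors = termVectors;
        }
    }

    /** counts[t][d] = number of occurrences of term t in document d. */
    static Result decompose(double[][] counts, int k) {
        int m = counts.length, n = counts[0].length;
        // B = A * A^T: eigenvectors = left singular vectors of A, eigenvalues = sigma^2.
        double[][] b = new double[m][m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                for (int d = 0; d < n; d++) b[i][j] += counts[i][d] * counts[j][d];

        double[] sigma = new double[k];
        double[][] termVectors = new double[m][k];
        Random rnd = new Random(42); // fixed seed => reproducible start vectors
        for (int c = 0; c < k; c++) {
            double[] v = new double[m];
            for (int i = 0; i < m; i++) v[i] = rnd.nextDouble() - 0.5;
            boolean nonZero = normalize(v);
            for (int iter = 0; iter < 1000 && nonZero; iter++) {
                v = multiply(b, v);
                nonZero = normalize(v);
            }
            if (!nonZero) break; // what is left of B is (numerically) zero
            double lambda = dot(v, multiply(b, v)); // Rayleigh quotient, approx. sigma^2
            sigma[c] = Math.sqrt(Math.max(lambda, 0));
            for (int t = 0; t < m; t++) termVectors[t][c] = v[t] * sigma[c];
            // Deflate: remove this component so the next pass finds the next one.
            for (int i = 0; i < m; i++)
                for (int j = 0; j < m; j++) b[i][j] -= lambda * v[i] * v[j];
        }
        return new Result(sigma, termVectors);
    }

    static double[] multiply(double[][] b, double[] v) {
        double[] w = new double[b.length];
        for (int i = 0; i < b.length; i++)
            for (int j = 0; j < v.length; j++) w[i] += b[i][j] * v[j];
        return w;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    /** Scales v to unit length in place; returns false if v is (numerically) zero. */
    static boolean normalize(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        if (norm < 1e-12) return false;
        for (int i = 0; i < v.length; i++) v[i] /= norm;
        return true;
    }

    public static void main(String[] args) {
        // Made-up counts: rows are terms, columns are documents.
        double[][] counts = {
            {2, 0, 1},
            {1, 0, 1},
            {0, 3, 0},
        };
        Result r = decompose(counts, 2);
        System.out.println("singular values: " + Arrays.toString(r.sigma));
        for (double[] row : r.termVectors) System.out.println(Arrays.toString(row));
    }
}
```

For a real corpus, Matlab (as in the original workflow) or a dedicated linear-algebra library is the better choice. The sketch only makes the shape of the computation visible: each term ends up with a k-dimensional vector, and terms that occur in similar documents end up with similar vectors.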
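The matching step can be sketched as follows, assuming the word vectors have already been loaded into a `Map<String, double[]>`. Several details are assumptions rather than LSA-Bot's documented behaviour: a sentence vector is taken to be the plain sum of its word vectors, tokens missing from the vocabulary are simply skipped (one way stop-words and wrong words can end up ignored), and "rejection over projection" is read as the ratio |rejection| / |projection|, i.e. the tangent of the angle between the two vectors. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch: pick the answer whose vector is closest to the question vector. */
final class LsaMatcher {

    // Assumption: sentence vector = sum of its word vectors. Tokens not in the map
    // (stop-words left out of the vocabulary, misspellings, unseen words) are skipped.
    static double[] sentenceVector(String sentence, Map<String, double[]> wordVectors, int dim) {
        double[] v = new double[dim];
        for (String token : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
            double[] w = wordVectors.get(token);
            if (w == null) continue;
            for (int i = 0; i < dim; i++) v[i] += w[i];
        }
        return v;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // 1 - cos(angle): 0 when both vectors point the same way.
    static double cosineDistance(double[] q, double[] a) {
        double denom = Math.sqrt(dot(q, q) * dot(a, a));
        return denom == 0 ? 1.0 : 1.0 - dot(q, a) / denom;
    }

    // 1 - Tanimoto coefficient, where T = q.a / (|q|^2 + |a|^2 - q.a).
    static double tanimotoDistance(double[] q, double[] a) {
        double qa = dot(q, a);
        double denom = dot(q, q) + dot(a, a) - qa;
        return denom == 0 ? 1.0 : 1.0 - qa / denom;
    }

    // Assumed reading: |rejection of q from a| / |projection of q onto a|.
    // Vectors pointing away from each other (q.a <= 0) count as infinitely far.
    static double rejectionOverProjection(double[] q, double[] a) {
        double aa = dot(a, a);
        double qa = dot(q, a);
        if (aa == 0 || qa <= 0) return Double.POSITIVE_INFINITY;
        double scale = qa / aa;
        double proj2 = 0, rej2 = 0;
        for (int i = 0; i < q.length; i++) {
            double p = scale * a[i];
            double r = q[i] - p;
            proj2 += p * p;
            rej2 += r * r;
        }
        return Math.sqrt(rej2 / proj2);
    }

    // Returns the answer at minimum distance, or null if no distance is finite.
    static String bestAnswer(String question, List<String> answers, Map<String, double[]> wordVectors,
                             int dim, ToDoubleBiFunction<double[], double[]> distance) {
        double[] q = sentenceVector(question, wordVectors, dim);
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (String answer : answers) {
            double d = distance.applyAsDouble(q, sentenceVector(answer, wordVectors, dim));
            if (d < bestDistance) {
                bestDistance = d;
                best = answer;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy 2-D vectors, made up for illustration only.
        Map<String, double[]> wv = new HashMap<>();
        wv.put("java", new double[] {1.0, 0.0});
        wv.put("language", new double[] {0.9, 0.1});
        wv.put("coffee", new double[] {0.0, 1.0});
        wv.put("drink", new double[] {0.1, 0.9});
        List<String> answers = List.of("Java is a programming language", "Coffee is a drink");
        System.out.println(bestAnswer("What is the Java language?", answers, wv, 2, LsaMatcher::cosineDistance));
        System.out.println(bestAnswer("Tell me about a hot drink", answers, wv, 2, LsaMatcher::tanimotoDistance));
    }
}
```

With the made-up vectors in `main`, the first question is matched to the Java sentence and the second to the coffee sentence. Passing the distance as a `ToDoubleBiFunction` lets the three measures be swapped without touching the search loop.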
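Learn mode could look like the following sketch, assuming precomputed word vectors and sentence vectors formed by summing them. The `KnowledgeBase` class and its methods are hypothetical. In this sketch the word vectors themselves are not updated, so words that never appeared in the corpus contribute nothing to a learned sentence unless the SVD is recomputed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of a learn mode: a new sentence gets a vector built from
 * the existing word vectors and is appended to the knowledge base.
 */
final class KnowledgeBase {
    private final Map<String, double[]> wordVectors; // precomputed, e.g. from the SVD step
    private final int dim;
    private final List<String> sentences = new ArrayList<>();
    private final List<double[]> vectors = new ArrayList<>();

    KnowledgeBase(Map<String, double[]> wordVectors, int dim) {
        this.wordVectors = wordVectors;
        this.dim = dim;
    }

    // Assumption: sentence vector = sum of the vectors of its known words.
    private double[] vectorOf(String sentence) {
        double[] v = new double[dim];
        for (String token : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
            double[] w = wordVectors.get(token);
            if (w == null) continue; // unknown word or stop-word: ignored
            for (int i = 0; i < dim; i++) v[i] += w[i];
        }
        return v;
    }

    /** Learn mode: compute a vector for the new sentence and store both. */
    void learn(String sentence) {
        sentences.add(sentence);
        vectors.add(vectorOf(sentence));
    }

    int size() {
        return sentences.size();
    }

    /** Returns the stored sentence with the highest positive cosine, or null if none. */
    String answer(String question) {
        double[] q = vectorOf(question);
        String best = null;
        double bestCos = 0.0; // only positively similar sentences qualify
        for (int i = 0; i < sentences.size(); i++) {
            double c = cosine(q, vectors.get(i));
            if (c > bestCos) {
                bestCos = c;
                best = sentences.get(i);
            }
        }
        return best;
    }

    private static double cosine(double[] a, double[] b) {
        double ab = 0, aa = 0, bb = 0;
        for (int i = 0; i < a.length; i++) {
            ab += a[i] * b[i];
            aa += a[i] * a[i];
            bb += b[i] * b[i];
        }
        return (aa == 0 || bb == 0) ? 0.0 : ab / Math.sqrt(aa * bb);
    }

    public static void main(String[] args) {
        Map<String, double[]> wv = new HashMap<>(); // made-up 2-D vectors
        wv.put("java", new double[] {1.0, 0.0});
        wv.put("coffee", new double[] {0.0, 1.0});
        KnowledgeBase kb = new KnowledgeBase(wv, 2);
        kb.learn("Java runs on the JVM");
        kb.learn("Coffee keeps me awake");
        System.out.println(kb.answer("I want a coffee"));
    }
}
```

Requiring a positive cosine means a question made only of unknown words gets no answer instead of an arbitrary one; a real bot might prefer a tunable threshold.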