Alibek Jakupov

Overview of modern approaches applied in conversational modeling (Part 1)

Updated: May 17



Personal Digital Assistants (PDAs) and agents with a conversational interface are increasingly popular technologies that have become a primary focus for many software developers. Thanks to the rapid development of natural language understanding technologies, which provide high-bandwidth information flow and dialog management, this domain is on the rise.


Currently, PDAs are promoted as gateways to applications and services, providing a meta layer of intelligence that can arbitrate between apps for a given user query [12].


They have been shown to be highly effective in a wide range of tasks, for example suggesting that the user do something based on events the assistant has been tracking (proactive assistance) or responding to the user's explicit spoken or typed request (reactive assistance). Outstanding examples of such systems include Google Now (http://www.googlenow.fr/google-now/), Amazon's Alexa (https://developer.amazon.com/fr/docs/alexa-voiceservice/api-overview.html), Apple's Siri (https://www.apple.com/ios/siri/) and Microsoft's Cortana (https://www.microsoft.com/fr-fr/windows/cortana). Naturally, industrial-scale PDA platforms must satisfy a set of requirements [12]:

  1. the breadth of language understanding domains and experiences

  2. the naturalness of user language

  3. the complexity of dialogs

  4. the range of modalities and devices the PDA can interact with

  5. the supported range of expertise of experience authors

  6. the latency and capacity of back-end or cloud services that can be accommodated

  7. the overall latency and accuracy of system responses

  8. allowable costs, e.g. computational, implementation, and maintenance

  9. support for development of uniform user experiences, e.g. the PDA 'personality'

  10. support for easy upgrading of experiences

In many cases developers struggle to find the optimal balance between latency and accuracy. For instance, Cortana's automation level ranges from fully automated dialogs to human-in-the-loop, the latter allowing more complex queries to be handled by a human agent [12]. The approach applied in Google's seq2seq consists of mapping one sequence to another (for example, mapping queries to responses using recurrent networks) [15]. This may be useful for conversational modeling, in other words for predicting the next sequence given the previous sequence or sequences. Surprisingly, this approach does well at generating fluent and accurate replies in conversations [15]. Another technique, applied in Google's Smart Reply, is based on response ranking and may demonstrate common sense knowledge. However, the majority of existing solutions use the standard approach based on intents and entities [7].
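To make the idea of response ranking more concrete, here is a minimal sketch in Python. It is not how Smart Reply works internally: a production system scores candidates with learned neural encoders, whereas this sketch swaps in a plain bag-of-words cosine similarity between the incoming message and a stored example message attached to each canned response. The candidate list and the function names (`bag_of_words`, `cosine`, `rank_responses`) are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

# Hypothetical (example message, canned response) pairs.
# A real system would learn its scoring from large volumes of conversation data.
CANDIDATES = [
    ("are you free for lunch tomorrow", "Sure, what time?"),
    ("can you send me the report", "I will send it shortly."),
    ("thanks for your help", "You're welcome!"),
]


def bag_of_words(text):
    """Lowercase the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_responses(message, candidates=CANDIDATES):
    """Return (score, response) pairs sorted from best to worst match."""
    query = bag_of_words(message)
    scored = [
        (cosine(query, bag_of_words(example)), response)
        for example, response in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for score, response in rank_responses("are you free for lunch today"):
        print(f"{score:.2f}  {response}")
```

The key design point is that the system never generates text: it only chooses among a fixed set of replies, which keeps latency low and output predictable, at the cost of coverage.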


This approach proves to be efficient for simple tasks like creating an alarm or making restaurant reservations. Thus a selling point of such systems, as often claimed, is that they enable users to get many things done via a single entry point, i.e. replacing the search for a forgotten site or application [11].
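The intent-and-entity approach can be illustrated with a small Python sketch for the two simple tasks named above. Everything here is an assumption made for illustration: the intent names, the regular expressions, and the `parse` function are hypothetical, and a real assistant would rely on trained classifiers and slot taggers rather than hand-written patterns.

```python
import re

# Hypothetical intent patterns, checked in order; the first match wins.
INTENT_PATTERNS = {
    "create_alarm": re.compile(r"\b(wake me|set an alarm|alarm)\b"),
    "book_restaurant": re.compile(r"\b(book|reserve|reservation)\b"),
}

# Hypothetical entity pattern: a 12-hour clock time such as "7 am" or "7:30 pm".
TIME_PATTERN = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)\b")


def parse(utterance):
    """Map an utterance to an intent name and a dict of extracted entities."""
    text = utterance.lower()

    # Intent detection: which task does the user want done?
    intent = next(
        (name for name, pattern in INTENT_PATTERNS.items() if pattern.search(text)),
        "unknown",
    )

    # Entity extraction: which parameters does that task need?
    entities = {}
    match = TIME_PATTERN.search(text)
    if match:
        hour = int(match.group(1)) % 12
        if match.group(3) == "pm":
            hour += 12
        minute = int(match.group(2) or 0)
        entities["time"] = f"{hour:02d}:{minute:02d}"

    return {"intent": intent, "entities": entities}


if __name__ == "__main__":
    print(parse("Set an alarm for 7:30 am"))
    print(parse("Book a table at 8 pm"))
```

A dialog manager would then dispatch on the returned intent and ask follow-up questions for any entity that is still missing. Utterances that match no pattern fall through to the "unknown" intent, which already hints at the limits of this approach once requests become less constrained.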


In the next part we are going to discuss the issues caused by these techniques.


References

[1] D. Ameixa, L. Coheur, P. Fialho, and P. Quaresma. Luke, I am your father: Dealing with out-of-domain requests by using movies subtitles. In T. Bickmore, S. Marsella, and C. Sidner, editors, Intelligent Virtual Agents, pages 13-21, Cham, 2014. Springer International Publishing.

[2] A. S. Ashoor. Anomaly detection algorithm using multiagents. International Journal of Scientific and Technology Research, 1, 2012.

[3] C. Chemudugunta, P. Smyth, and M. Steyvers. Text modeling using unsupervised topic models and concept hierarchies. CoRR, abs/0808.0973, 2008.

[4] S. Chaudhuri, U. Dayal, and V. Narasayya. An overview of business intelligence technology. Commun. ACM, 54(8):88-98, Aug. 2011.

[5] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 2003.

[6] J. A. Hansen, E. K. Ringger, and K. D. Seppi. Probabilistic explicit topic modeling using Wikipedia. In I. Gurevych, C. Biemann, and T. Zesch, editors, Language Processing and Knowledge in the Web, pages 69-82, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.

[7] M. Henderson, R. Al-Rfou, B. Strope, Y. Sung, L. Lukacs, R. Guo, S. Kumar, B. Miklos, and R. Kurzweil. Efficient natural language response suggestion for smart reply. CoRR, abs/1705.00652, 2017.

[8] J. H. Lau, K. Grieser, D. Newman, and T. Baldwin. Automatic labeling of topic models. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 1536-1545. Association for Computational Linguistics, 2011.

[9] R. S. Margalit, D. Roter, M. A. Dunevant, S. Larson, and S. Reis. Electronic medical record use and physician-patient communication: an observational study of Israeli primary care encounters. Patient Education and Counseling, 1, 2006.

[10] R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. In Proceedings of EMNLP-04 and the 2004 Conference on Empirical Methods in Natural Language Processing, July 2004.

[11] R. Sarikaya. The technology powering personal digital assistants. In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, 2015.

[12] R. Sarikaya, P. A. Crook, A. Marin, M. Jeong, J.-P. Robichaud, A. Celikyilmaz, Y.-B. Kim, A. Rochette, O. Z. Khan, D. X. Liu, D. Boies, T. Anastasakos, Z. Feizollahi, N. Ramesh, H. Suzuki, R. Holenstein, E. Krawczyk, and V. Radostev. An overview of end-to-end language understanding and dialog management for personal digital assistants. IEEE, December 2016.

[13] D. Tarasov. В поисках разума: можно ли сделать “универсальный” чат-бот с помощью нейронных сетей? [In search of intelligence: can a "universal" chatbot be built with neural networks?], October 2017.

[14] D. S. Tarasov and E. D. Izotova. Common sense knowledge in large scale neural conversational models. International Conference on Neuroinformatics. Springer, Cham, 736, 2017.

[15] O. Vinyals and Q. Le. A neural conversational model. 2015.

[16] G. Zhu and C. A. Iglesias. In SumPre 2015 - 1st International Workshop on Summarizing and Presenting Entities and Ontologies Co-located with the 12th Extended Semantic Web Conference, Portoroz, Slovenia, June. ESWC.

