Alibek Jakupov

From surfing to diving: a new approach to scientific research

Updated: Nov 19


We live in a world where there is more and more information, and less and less meaning.

Jean Baudrillard


Infoxication


The term "information overload" was first used in Bertram Gross' 1964 book, The Managing of Organizations, and was further popularized by Alvin Toffler in his bestselling 1970 book Future Shock. Speier noted that when input exceeds processing capacity, information overload occurs, which is likely to reduce the quality of decisions.

In the Internet age, this infobesity is associated with excessive viewing of, and over-exposure to, information. As a result, readers are inundated with too much data and have too little time to comprehend it all.



Turning the information into knowledge


When it comes to scientific research, the situation becomes even more critical. Unrestricted access to the Web allows scientists to manage their own research using search engines, which help them find any piece of information quickly.


But turning information into knowledge is very tricky. A published article is not always reliable. Of course, we are not talking about a lack of authority approval or a missing accuracy check before publication, as there is a peer-review and proofreading process before an article is finally published. The issue here is more profound: we simply do not know whether we can trust the paper or not.

Internet information lacks credibility as the Web's search engines do not have the abilities to filter and manage information and misinformation

There's the so-called h-index, an author-level indicator that takes into account both the productivity and the citation impact of a researcher's publications. This indicator correlates with important outcomes such as winning the Nobel Prize, being accepted for research fellowships, and holding positions at top universities. So at least you have a reliable indicator to filter out useless information junk. However, when you are discovering a domain, you will inevitably start with a broad search that returns a plethora of highly ranked papers. This results in the researcher having to cross-check each scientific publication before using it for decision-making, which takes up even more time.
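The h-index mentioned above has a simple definition: it is the largest number h such that the author has h papers with at least h citations each. A minimal sketch of that computation (the citation counts here are invented for illustration):

```python
def h_index(citations):
    """Return the largest h such that the author has h papers
    with at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank  # this paper still has >= rank citations
        else:
            break
    return h

# An author with papers cited 10, 8, 5, 4 and 3 times has h-index 4:
# four papers have at least 4 citations each.
print(h_index([10, 8, 5, 4, 3]))  # 4
```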


You have probably seen many techniques for reading scientific papers faster, like skimming the paper, reading the abstract, reading the conclusion, and finally reading the methods. While this methodology may be useful for domain experts, for PhD students willing to discover a completely new domain it remains a challenge. So, when you want to seriously read a paper you find relevant, you need to:

  • Read the paper in its entirety

  • Look through a few of the previous papers from that group

  • Read a couple of articles on the same topic

  • If there's a statement you find particularly interesting, get the reference (if any) and look it up

  • Should you need more detail, access any provided data repositories or supplemental information, for instance when there's a concept you are not familiar with

In general, for good papers, the process takes 1-2 weeks. Why so long? Simply because we want to extract knowledge from the paper, not merely read it and tick it off our to-do list. This is a hard but good way to read papers. The great thing is that after reading one paper this way, you become more familiar with the domain. The bad thing is that it's time-consuming. Moreover, when diving deep into one paper, you risk losing the overall view, or the so-called 'big question'.



Define the "Big Question"


The big question is not "What is this paper about?" but "What problem is this entire field trying to solve?". For this, you need to find a perfect balance between "surfing" and "diving": in other words, look through all the papers and yet have a good understanding of what each of them is about. Not a trivial task at all, let me assure you.


Moreover, what if we wanted to visualize the domain? See the papers and their impact on the domain, see how the field grows, see new subfields appear and vanish, and watch different domains fuse to produce something great and beautiful? There are two ways to do that: either you have a phenomenal memory and imagination, or you look for great visualization tools. I've found two of them:

  1. Connected Papers: a tool that gives you a visual overview of a new academic field

  2. Scinan: a graphical browser that offers a new way of exploring a descriptive and flexible knowledge database, tailored precisely to each case

The latter allows you not only to build a graphical map of an academic field, but also to manipulate it: see subdomains and papers in a 3D space and 'travel' through the map.


What is important is that you keep reading in the same 'serious' way (i.e. reading each paper in its entirety, studying the unknown concepts, etc.), but now with a personal GPS navigator that shows you the right direction.



Semantic Search


Thus, 3D mapping seems to be a perfect tool for research, right? It looks so, but, as we know, there is always room for improvement, so here are some ideas for building the perfect research tool, such as integrating AI-powered semantic search. This idea was inspired by Microsoft Search.

A couple of years ago at Ignite, Microsoft officials introduced "Bing for Business," part of a plan to make Bing work as an intranet search service, not just an Internet search service. At Ignite 2018, Microsoft rechristened this capability "Microsoft Search in Bing." This intranet-centric Microsoft Search technology is being integrated into the new Chromium-based Edge, Windows 10, Office.com, various Office apps, and more.

In other words, the ultimate motivation behind this idea is to provide insights not just for simple Google-like queries, but also for more personalized, complex ones, such as "How to perform an ablation study for deception detection?" or "What is the best way to vectorize your textual data?".


What could power this semantic search over 3D mapping is Microsoft Research's "Project Turing," which officials described as the codename for a large-scale, deep-learning effort inside Microsoft. AI-powered semantic search based on the Turing model would help get around the limitations of current matching algorithms, developed in the 1970s and 1980s, which are based on terms rather than concepts. Such term matching falls short of understanding a user query expressed in natural language. As a result, the researcher has to click through multiple documents to find the exact paper(s) they're looking for.

Our goal is natural-language understanding using state-of-the-art, generalizable end-to-end deep learning models

Youngji Kim, Principal Program Manager


Thus, Turing in tools like Scinan would help understand semantics and query intent by searching by concept instead of keyword. Semantic understanding means researchers, PhD students, and scientists don't need their queries to contain the exact keywords that appear in the search results (e.g. when searching for "word embeddings", results containing "word2vec", "skipgram" or "GloVe" would also appear as indirect matches).
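The concept-over-keyword idea above can be sketched in a few lines: represent each indexed term as a vector and rank by cosine similarity rather than exact string match. The embedding table below is purely illustrative (hand-made toy vectors); in a real system the vectors would come from a pretrained model such as word2vec or a Transformer encoder.

```python
import math

# Toy embedding table -- values invented for illustration only.
# In practice these vectors would come from a pretrained model.
EMBEDDINGS = {
    "word embeddings": [0.9, 0.8, 0.1],
    "word2vec":        [0.85, 0.75, 0.15],
    "GloVe":           [0.8, 0.7, 0.2],
    "ablation study":  [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def semantic_search(query, k=2):
    """Rank indexed terms by vector similarity to the query,
    instead of requiring an exact keyword match."""
    q = EMBEDDINGS[query]
    ranked = sorted(EMBEDDINGS, key=lambda t: cosine(EMBEDDINGS[t], q),
                    reverse=True)
    return [t for t in ranked if t != query][:k]

# "word2vec" and "GloVe" surface even though they share no keyword
# with the query string:
print(semantic_search("word embeddings"))  # ['word2vec', 'GloVe']
```

A keyword matcher would return nothing for this query, since none of the related terms contain the literal string "word embeddings"; the vector-space ranking is what makes the indirect matches appear.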


The Turing researchers are employing machine reading, as well, to help with contextual search/results.

Mary Jo Foley, journalist



We all live in a fascinating world where innovations can improve our learning, allowing passionate people to make our lives better, step by step. It seems to me that in a couple of years we will see a technological breakthrough across all fields of study, thanks to a better search engine for scientists.


Hope you found it interesting!
