Scientific Articles Crawler and Processing Application  

Project Domain / Category

Web Application 

Abstract / Introduction

The rapid growth in the number of research articles is creating an information overload problem for researchers. As a result, both novice and expert researchers find it very difficult to download the research articles of a specific journal or conference, or those linked from a web page. There is therefore a need for an application that can download all research articles from a specific journal, conference, or web page. To address this problem, we will develop a research articles crawler application that downloads freely available scientific articles of interest to the user in PDF or Doc/Docx format and processes those documents.

Functional Requirements: 

1. Sign-Up: Create a Sign-up module. Users will be required to register themselves in the application.

2. Sign-In: Create a Sign-in module. Only registered users will be able to use the application.

3. Articles Scraping and Downloading with Creation of Web Pages: Create a webpage that takes the URL of a conference, journal, or other webpage and downloads all related scientific articles.

4. Download Status: Show the titles of all downloaded articles as a list on the webpage at run time, below the URL input text box and the download button.

5. Maintain Articles History: Show the history of downloaded articles on a separate webpage.

6. Browse Downloaded Scientific Articles for Processing: Create another webpage through which the user can browse and select one or more PDFs from the downloaded PDFs.

7. Process and Store Data: Extract the different sections of the downloaded PDFs (e.g. Title, Authors, Keywords, Abstract, References). Save them in an Excel or CSV file column-wise, e.g. the first column is named “Title”, the second “Authors”, and so on.

8. Convert CSV/Excel to JSON File Format: Create another page that converts this CSV or Excel file to a JSON file.
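
As an illustration of requirement 8, the CSV-to-JSON step could be sketched with Python's standard library alone. This is only a minimal sketch: the function name csv_to_json and the sample row are hypothetical, the input is assumed to be a CSV file whose first row holds the column names from requirement 7, and Excel input would need an additional library that is not shown here.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    Each CSV row becomes one JSON object keyed by the column names
    (e.g. "Title", "Authors", "Keywords").
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [dict(row) for row in reader]
    # ensure_ascii=False keeps non-ASCII author names readable in the output.
    return json.dumps(records, indent=2, ensure_ascii=False)


# Made-up sample data, for illustration only.
sample_csv = (
    "Title,Authors,Keywords\n"
    '"A Sample Paper","A. Author; B. Author","crawling; PDF"\n'
)

print(csv_to_json(sample_csv))
```

In a Django or Flask view, the same function could be applied to the text of an uploaded file and the result returned as a downloadable JSON file.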

Allowed Tools:  

Programming Language: Python

Framework: Django or Flask

IDE: PyCharm, Visual Studio or any other

Database: MySQL, MongoDB or any other
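
With Python as the implementation language, the link-collection part of requirement 3 might be sketched as follows. This is a sketch under stated assumptions, not a prescribed design: the names ArticleLinkParser and find_article_links and the example.org URLs are hypothetical, articles are assumed to be reachable through plain anchor tags whose paths end in .pdf, .doc, or .docx, and fetching the page and downloading the files are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# File extensions treated as downloadable articles (assumption based on
# the PDF and Doc/Docx formats named in the abstract).
ARTICLE_EXTENSIONS = (".pdf", ".doc", ".docx")


class ArticleLinkParser(HTMLParser):
    """Collect absolute URLs of anchors that point to article files."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Check only the path so query strings do not hide the extension.
        path = urlparse(absolute).path.lower()
        if path.endswith(ARTICLE_EXTENSIONS) and absolute not in self.links:
            self.links.append(absolute)


def find_article_links(html: str, base_url: str) -> list:
    """Return de-duplicated article links found in an HTML page."""
    parser = ArticleLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up page content, for illustration only.
sample_html = """
<a href="/papers/one.pdf">Paper one</a>
<a href="about.html">About</a>
<a href="https://example.org/files/two.docx?download=1">Paper two</a>
"""

print(find_article_links(sample_html, "https://example.org/conf/2020/"))
```

The returned list could then drive both the download step and the run-time status list described in requirement 4.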
