Your first task consists of fixing these issues by scraping the urls when necessary and populating the empty fields

CODING ASSESSMENT - ML FOR PEACE PROJECT

We want to improve our prediction models so that they correctly detect changes in a country's civic space from time-series data. Below is an example of our work, along with the related tasks you need to complete.

PROBLEM STATEMENT - 

You are given a dataset of news articles scraped from public-domain news sources. The crawled data comes from three sources: leconomiste.com, elheraldo.hn and reuters.com. Your objective is to build a machine learning model that classifies the events into categories. You have been provided with a sample train set and test set. Run your analysis on the test data (test.xlsx) and curate the results.

REQUIREMENTS -

To produce the results, you will need to complete the tasks below. The assessment must be completed in Python. The tasks are divided as follows:

Task 1. You'll notice a series of problems with the articles in the test data: some lack the title, the text of the article, or the date of publication. Your first task is to fix these issues by scraping the URLs where necessary and populating the empty fields. You can use any scraping library (Beautiful Soup etc.).

Hint: For the text, you can take only the first paragraph of each article, wherever it is available.
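
One way Task 1 could be approached is sketched below, using only the Python standard library so that it runs without extra installs; Beautiful Soup would work just as well. The column names (`title`, `text`, `date`), the helper names, and the use of the `article:published_time` meta tag as the date source are all assumptions for illustration. The three sites may expose these fields differently, so the selectors would need checking against the real pages.

```python
import math
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ArticleParser(HTMLParser):
    """Collects <title>, the first non-empty <p>, and a published-time meta tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.first_paragraph = ""
        self.date = ""
        self._in_title = False
        self._in_p = False
        self._p_done = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "p" and not self._p_done:
            self._in_p = True
            self._buf = []
        elif tag == "meta" and attrs.get("property") == "article:published_time":
            # Assumption: the site publishes an Open Graph style date tag.
            self.date = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            text = " ".join("".join(self._buf).split())
            if text:  # skip empty paragraphs, keep the first one with content
                self.first_paragraph = text
                self._p_done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def parse_article(html):
    """Return the scraped title, first paragraph and date from an HTML string."""
    parser = ArticleParser()
    parser.feed(html)
    return {"title": parser.title.strip(),
            "text": parser.first_paragraph,
            "date": parser.date}


def fetch_html(url, timeout=10):
    """Download a page; the User-Agent value is an arbitrary placeholder."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def is_missing(value):
    """True for None, NaN (as pandas reads empty Excel cells) or blank strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() == ""


def fill_missing(row, html):
    """Return a copy of row with empty title/text/date filled from the page."""
    scraped = parse_article(html)
    filled = dict(row)
    for field in ("title", "text", "date"):
        if is_missing(filled.get(field)) and scraped[field]:
            filled[field] = scraped[field]
    return filled
```

For a row with gaps, `fill_missing(row, fetch_html(row["url"]))` would fetch the page only when needed and leave fields that are already populated untouched.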

Task 2. To classify events, the articles need to be in English. So, as a second step in preparing the data, you need to translate the title and the text of each article from its original language into English (for both train and test). You can use any API to complete this task. We recommend using Facebook's Hugging Face models. The models are keyed on two-letter ISO language codes; find the appropriate model for your task provided here. Follow the Hugging Face installation instructions to install the libraries.
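
A minimal sketch of the translation step follows, under several assumptions that are not in the brief: that leconomiste.com articles are in French, elheraldo.hn articles in Spanish and reuters.com articles already in English; and that the `Helsinki-NLP/opus-mt-{src}-en` checkpoints on Hugging Face are used, which is a substitute for whichever model list the brief links to. The `transformers` library is imported lazily, so the helper functions can be exercised without it. The `translator` parameter is a hypothetical hook for injecting a different backend.

```python
from urllib.parse import urlparse

# Assumed source languages; verify against the actual articles.
SOURCE_LANG = {"leconomiste.com": "fr", "elheraldo.hn": "es", "reuters.com": "en"}


def language_for_url(url):
    """Map an article URL to a two-letter language code, or None if unknown."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return SOURCE_LANG.get(host)


def model_name_for(src_lang):
    """Assumed checkpoint naming pattern for X-to-English OPUS-MT models."""
    return f"Helsinki-NLP/opus-mt-{src_lang}-en"


def translate_texts(texts, src_lang, translator=None):
    """Translate a list of strings into English, leaving blanks in place."""
    texts = list(texts)
    todo = [i for i, t in enumerate(texts) if t and t.strip()]
    if src_lang == "en" or not todo:
        return texts  # nothing to translate
    if translator is None:
        # Lazy import: only needed when a real model is requested.
        from transformers import pipeline
        translator = pipeline("translation", model=model_name_for(src_lang))
    # truncation=True clips inputs that exceed the model's maximum length.
    outputs = translator([texts[i] for i in todo], truncation=True)
    result = list(texts)
    for i, out in zip(todo, outputs):
        result[i] = out["translation_text"]
    return result
```

Grouping rows by `language_for_url(url)` and calling `translate_texts` once per language keeps each model loaded only once. Because inputs are truncated, taking only the first paragraph of each article (as the Task 1 hint suggests) also helps keep texts within the model's length limit.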

Task 3. Determine the civic-space event category for each article. Feel free to model this however you think is appropriate. Each event is labeled with one of three categories: 'arrest', 'disaster' and 'violencelethal'.

Hint: Your approach to modeling the text data will be given more weight than the accuracy of your model.
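
As one possible baseline for Task 3, the sketch below implements a multinomial Naive Bayes classifier over bag-of-words counts in plain Python. This is an illustrative choice, not a method prescribed by the brief; a TF-IDF model in scikit-learn or a fine-tuned transformer would be natural next steps. The training sentences are invented toy examples, not rows from train.xlsx.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_one(self, text):
        # Words never seen in training carry no evidence, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            # log P(class) + sum of log P(word | class)
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


# Invented toy examples, one pair per category in the brief.
train_texts = [
    "Police arrested the opposition leader on Monday",
    "Officers detained three journalists after the protest",
    "An earthquake destroyed hundreds of homes",
    "Floods displaced thousands after heavy rain",
    "Gunmen killed five people in an attack",
    "A bombing left ten dead in the capital",
]
train_labels = ["arrest", "arrest", "disaster", "disaster",
                "violencelethal", "violencelethal"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
predictions = model.predict(["Police detained a journalist",
                             "Heavy floods destroyed homes"])
```

A baseline like this gives a reference score and makes the effect of each preprocessing decision (translation quality, title versus body text, stop-word handling) easy to measure before moving to a heavier model.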

Follow-up (Bonus): As you might have noticed already, stories include different actors. We are also interested in 'who did what to whom' questions. For instance, for the 'arrest' category we would like to know who was arrested and by whom. In addition to extracting named entities (e.g., a person in this case), this requires classifying the actors using a specific actor classification scheme such as CAMEO. For instance, a working model should identify Angela Merkel as the person in the following sentence, but it should also classify her as a government actor:

“German Chancellor Angela Merkel said on Wednesday that social distancing rules to contain the spread of the coronavirus would remain in place until at least May 3 but some shops could reopen next week.”

Obviously, there is no actor in certain events (such as disasters). Please summarize how you would approach this actor extraction problem. Be specific about the steps you will follow, as well as any models you will use. You can demonstrate your logic by using the actual events provided to you.
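
The two steps involved (find the actor, then assign a role) can be illustrated with a deliberately simple rule-based sketch. It is a stand-in, not a working solution: a real pipeline would more likely use a trained named-entity recognizer plus an actor dictionary, and the title list here is a tiny hypothetical sample. The role codes (`GOV`, `MIL`, `JUD`) follow my understanding of CAMEO-style role codes and should be checked against the codebook.

```python
import re

# Hypothetical title-to-role lookup; codes are CAMEO-style and unverified.
TITLE_TO_ROLE = {
    "Chancellor": "GOV",
    "President": "GOV",
    "Minister": "GOV",
    "General": "MIL",
    "Judge": "JUD",
}

# A title followed by one or more capitalized words, read as a person name.
ACTOR_PATTERN = re.compile(
    r"\b(" + "|".join(TITLE_TO_ROLE) + r")\s+([A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)"
)


def extract_actors(sentence):
    """Return a list of {name, title, role} dicts found in the sentence."""
    return [
        {"name": match.group(2),
         "title": match.group(1),
         "role": TITLE_TO_ROLE[match.group(1)]}
        for match in ACTOR_PATTERN.finditer(sentence)
    ]


sentence = ("German Chancellor Angela Merkel said on Wednesday that social "
            "distancing rules to contain the spread of the coronavirus would "
            "remain in place until at least May 3 but some shops could reopen "
            "next week.")
actors = extract_actors(sentence)
```

On the example sentence this yields one actor, Angela Merkel, with role `GOV`; a sentence with no titled person returns an empty list, which matches the case of events without actors. Separating source from target actor ('by whom' versus 'whom') would need syntactic information such as dependency parses, which this sketch does not attempt.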

Task 4. Imagine you are using database storage for this exercise. How would you store this data in an unstructured format? Write code or pseudocode that transforms the processed output CSV (your dataframe) into a MongoDB collection.
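
A minimal sketch of that transformation is given below. The column names, database name (`ml4p`), collection name (`articles`) and connection URI are placeholders chosen for illustration. Each CSV row becomes one document, and empty fields are simply omitted, which is one way to take advantage of MongoDB's schema-less documents. `pymongo` is imported lazily, so the conversion step can be run without a database.

```python
import csv
import io


def rows_to_documents(csv_text):
    """Convert CSV text into a list of dicts, dropping empty fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        doc = {
            key: value.strip()
            for key, value in row.items()
            if key is not None and isinstance(value, str) and value.strip()
        }
        if doc:
            documents.append(doc)
    return documents


def insert_into_mongo(documents,
                      uri="mongodb://localhost:27017",  # placeholder URI
                      db_name="ml4p",                   # hypothetical name
                      collection="articles"):           # hypothetical name
    """Insert the documents into a collection and return how many were stored."""
    if not documents:
        return 0  # insert_many rejects an empty list
    from pymongo import MongoClient  # lazy import: needs pymongo and a server
    client = MongoClient(uri)
    try:
        result = client[db_name][collection].insert_many(documents)
        return len(result.inserted_ids)
    finally:
        client.close()


# Invented sample rows with the assumed columns.
sample_csv = (
    "url,title,text,date,category\n"
    "http://a,T1,Body,2020-04-15,arrest\n"
    "http://b,,Body2,,disaster\n"
)
docs = rows_to_documents(sample_csv)
```

With a processed pandas dataframe, `df.to_dict("records")` would produce a similar list of dicts, although missing values would then appear as NaN and need filtering before insertion.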

RESULTS - 

You will need to submit a CSV file with the predictions, along with your script(s). Please include a short write-up that explains your methodology, including how the data was preprocessed, which algorithms you used to build your model, and the rationale behind your choices. If you have questions, make a reasonable assumption and explain every assumption you made in the write-up.

test.xlsx

train.xlsx
