Machine learning project, urgent (Python or R)

10,000 RUB per project
21 March 2021, 21:51 • 0 responses • 22 views
Job description:

Instructions

Your client is a car insurance company. They want to price their car insurance competitively, which means having a good model for identifying customers at risk of getting into accidents. They have shared with you a sample of data in CSV format (attached to the email that included these instructions) that they would like you to analyse. Each row corresponds to a customer, and the outcome column records whether the customer made a claim in the previous year. The client has informed you that the other columns should be self-explanatory.

Note: The data for this exercise has been generated randomly, so it may display some regularity that would not be expected of real-world data.

1. Obtain the rest of the data


The data you've been given is just a sample. The rest of the data can be obtained from an API, the details of which are given below:

Note: The data available through the API includes the data we supplied in CSV format. If you obtain data from the API, don't combine it with the CSV data we supplied, or you will have duplicates in your dataset.

Resource URL

////////////////////////////////////

Resource Information

● Response format: JSON

● Requires authentication: Yes, via a key supplied in the request header

● Max records per query: 1000


Authenticating


To access the API, pass an API key as the value of the UserAPI-Key header. Your API key is supplied in the api-key.txt file sent to you with these instructions.

Hint

If working in Python, you will find the requests package useful. If working in R you will find the httr package useful. Alternatively you can use cURL or httpie from the command line if you prefer.

The API will only return 1000 records per request. There are 100,000 records in total, so you will need to make multiple requests to get everything. To do this, request that results are sorted by id and use the min_id argument to ask only for records you haven't already seen, setting it to a value larger than the largest id you have retrieved so far.
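
The paging approach described above could be sketched in Python roughly as follows. This is only an illustration under several assumptions that the instructions do not confirm: the response body is taken to be a JSON list of records that each carry an "id" field, min_id is taken to be inclusive, and the query parameter that requests sorting is given the hypothetical name "sort". The resource URL is not filled in here and has to be supplied by the caller. The function names fetch_all and make_requests_fetcher are invented for this sketch.

```python
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]


def fetch_all(fetch_page: Callable[[int], List[Record]],
              start_id: int = 0) -> Iterator[Record]:
    """Yield every record by requesting pages of ids >= min_id until a page is empty."""
    min_id = start_id
    while True:
        page = fetch_page(min_id)
        if not page:
            break  # nothing left at or above min_id
        for record in page:
            yield record
        # Assumption: min_id is inclusive, so move just past the largest id seen.
        min_id = max(int(record["id"]) for record in page) + 1


def make_requests_fetcher(url: str, api_key: str) -> Callable[[int], List[Record]]:
    """Build a page fetcher on top of the requests package (imported lazily)."""
    import requests  # third-party package suggested in the hint

    def fetch_page(min_id: int) -> List[Record]:
        response = requests.get(
            url,
            headers={"UserAPI-Key": api_key},  # header name as given in the instructions
            params={"min_id": min_id, "sort": "id"},  # "sort" is a hypothetical name
            timeout=30,
        )
        response.raise_for_status()
        return response.json()  # assumption: a JSON list of record objects

    return fetch_page
```

With the real URL and the key read from api-key.txt, list(fetch_all(make_requests_fetcher(url, api_key))) would collect all records, provided the assumptions above hold. Keeping the paging loop separate from the HTTP call makes it easy to check the loop against a fake fetcher before touching the API.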

2. Exploratory Questions

The client has some specific questions that they would like you to answer that they have not been able to answer themselves. Perform any preprocessing / cleaning of the data necessary to answer these questions.

2.1 What proportion of customers with a credit score below 0.2 made a claim in the last year?

2.2 What is the average number of speeding violations among customers with driving experience between 20 and 29 years (inclusive)?

2.3 What if you consider only the people in this group who drive a sports car?

2.4 What is the standard deviation in annual mileage?
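
As a rough illustration of how questions 2.1 to 2.4 might be computed, here is a sketch that uses only the Python standard library and works on a list of dicts, such as records parsed from JSON. Every field name below is hypothetical, since the instructions do not list the columns. It also assumes that driving experience is stored as a number of years, that sports cars are marked with the value "sports car", and that records with missing values are simply skipped. Question 2.4 does not say whether the sample or population standard deviation is wanted; the sketch uses the sample version.

```python
from statistics import mean, stdev

# Hypothetical field names; the real dataset's column names may differ.
CREDIT, OUTCOME = "credit_score", "outcome"
EXPERIENCE, VIOLATIONS = "driving_experience", "speeding_violations"
VEHICLE, MILEAGE = "vehicle_type", "annual_mileage"


def present(records, *fields):
    """Keep only records where every named field exists and is not None."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def claim_rate_below(records, threshold=0.2):
    """2.1: proportion of customers with credit score below threshold who claimed."""
    group = [r for r in present(records, CREDIT, OUTCOME) if r[CREDIT] < threshold]
    if not group:
        return None
    return sum(1 for r in group if r[OUTCOME]) / len(group)


def mean_violations(records, low=20, high=29, vehicle=None):
    """2.2 / 2.3: mean speeding violations for experience in [low, high] inclusive."""
    group = [
        r for r in present(records, EXPERIENCE, VIOLATIONS)
        if low <= r[EXPERIENCE] <= high
        and (vehicle is None or r.get(VEHICLE) == vehicle)
    ]
    return mean(r[VIOLATIONS] for r in group) if group else None


def mileage_std(records):
    """2.4: sample standard deviation (n - 1 denominator) of annual mileage."""
    values = [r[MILEAGE] for r in present(records, MILEAGE)]
    return stdev(values) if len(values) > 1 else None
```

For 2.3, the same function would be called with vehicle="sports car". How missing values are handled can change these answers, so whatever choice is made should be stated alongside the results.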

As well as these specific questions, you suspect that they really just want to understand their data better.

2.5 Are there particular customer types?

2.6 What do claimants have in common?

2.7 How does the number of claims vary between postcodes?

Explore the data and present some of your findings to help the client understand their data better. These could be summary statistics or visualisations.

In addition to understanding the data they have, the client is interested to know how they should collect data in the future in order to better support data science work.

2.8 Are there any problems with the data you've been given that should be kept in mind when modelling?

2.9 Has the client collected the right data for their business needs?

2.10 What recommendations would you make to the client for future data collection?

3. Modelling

The client is interested to know if the customer data can be used to predict the likelihood that a claim is made in the next year. Your task is to investigate this and make a recommendation. You should complete the following tasks:

3.1 Briefly discuss any assumptions being made about the data,

3.2 Build a proof of concept model to predict the outcome column from the customer data, including any necessary data processing,

3.3 Test your model using appropriate metrics and state how you would expect it to perform on unseen data.

3.4 The client is keen to be able to interpret the model you build, and would be particularly interested in understanding which features are most important to the model's decisions.

3.5 They're also new to data science and interested in how this exciting new model you've built them works. Write a brief (no longer than a paragraph) description of how your model works that can be understood by someone without a technical background.

Submitting

Please send us a Jupyter Notebook or R Markdown document. Either is an ideal format for combining your code, documentation and presentation of the results into a single file.

Files