Use the dataset here for this question.
The data set contains the following variables:
EXRATE: the exchange rate for the Mexican Peso (pesos per dollar), the value for the rate is taken at the end of the month
LAGEXRATE: the lagged exchange rate
MXINFL: the Mexican inflation rate
USINFL: the U.S. inflation rate
The observations run from January 2015 to December 2020. There is also a trend variable, T, which is 1 in the first period and increases by 1 each period. The file also has a column indicating the year and another indicating the month.
Details on the data sources will be available with the key.
a) Construct a histogram of the exchange rate. Use the option breaks=5. Based on the histogram (i) the data are highly left skewed (ii) the data are highly right skewed (iii) the distribution is approximately uniform (iv) the distribution is approximately bell shaped.
[ Select ] [“(i)”, “(iv)”, “(ii)”, “(iii)”] .
b) Now construct a histogram of the exchange rates before 2020. That is, exclude the last 12 observations from your graph. Does your graph for the pre 2020 observations have the same basic appearance as for the entire data set?
[ Select ] [“No, the graphs are noticeably different”, “Yes, the graphs are similar.”] .
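A minimal sketch of parts (a) and (b), assuming simulated values because the actual data set is not reproduced here. The vector exrate is a hypothetical stand-in for the EXRATE column.

```r
# Simulated stand-in for EXRATE (72 monthly values); not the real data
set.seed(1)
exrate <- 15 + cumsum(rnorm(72, mean = 0.08, sd = 0.5))

# Part (a): breaks = 5 is only a suggestion; hist() picks "pretty" cut points near it
h_all <- hist(exrate, breaks = 5,
              main = "Exchange rate, all months", xlab = "Pesos per dollar")

# Part (b): head(x, -12) drops the last 12 observations (the 2020 months)
exrate_pre2020 <- head(exrate, -12)
h_pre <- hist(exrate_pre2020, breaks = 5,
              main = "Exchange rate, before 2020", xlab = "Pesos per dollar")

# Bin edges and counts that R actually used
print(h_all$breaks)
print(h_all$counts)
```

Comparing h_all$counts with h_pre$counts is one way to back up a visual impression of the two shapes.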
c) Create a new variable that is the difference in the inflation rates. Take the Mexican inflation rate and subtract the U.S. inflation rate to get the difference. Find the summary statistics for this new variable. What is the mean for this variable?
[ Select ] [“0.1106”, “0.086”, “0.0782”, “0.0954”] .
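Part (c) could be sketched as follows. The numbers are simulated, and INFLDIFF is a hypothetical name for the new variable; only MXINFL and USINFL come from the question.

```r
# Simulated stand-ins for MXINFL and USINFL; not the real data
set.seed(2)
dat <- data.frame(MXINFL = rnorm(72, mean = 0.35, sd = 0.3),
                  USINFL = rnorm(72, mean = 0.15, sd = 0.2))

# Mexican inflation minus U.S. inflation
dat$INFLDIFF <- dat$MXINFL - dat$USINFL

summary(dat$INFLDIFF)   # min, quartiles, median, mean, max
mean(dat$INFLDIFF)
```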
d) Other things equal, if country A has persistently higher inflation than country B we expect that country A’s currency will depreciate. That is, country A’s currency will buy fewer units of country B’s currency over time. Regress the exchange rate on the lagged exchange rate and the difference in the inflation rates. Are the results consistent with previous sentences in part (d)? That is, do the estimated coefficients have the signs implied by the previous statements about inflation and currency depreciation? Reminder: the exchange rate is specified as Pesos/Dollar.
[ Select ] [“No”, “Yes”] .
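A sketch of the regression in part (d) on simulated data. INFLDIFF is a hypothetical column name, and the coefficients used to generate the data are arbitrary, so the output says nothing about the real answer.

```r
# Simulated data; the generating coefficients are arbitrary
set.seed(3)
n <- 72
infldiff <- rnorm(n, mean = 0.1, sd = 0.3)
exrate <- numeric(n)
exrate[1] <- 15
for (i in 2:n) {
  exrate[i] <- 1 + 0.95 * exrate[i - 1] + 0.4 * infldiff[i] + rnorm(1, sd = 0.3)
}

dat <- data.frame(EXRATE = exrate,
                  LAGEXRATE = c(NA, exrate[-n]),  # previous month's rate
                  INFLDIFF = infldiff)

# lm() drops the first row automatically because LAGEXRATE is NA there
fit <- lm(EXRATE ~ LAGEXRATE + INFLDIFF, data = dat)
summary(fit)

# EXRATE is pesos per dollar, so a depreciating peso means a rising EXRATE;
# the statement in part (d) therefore implies a positive sign on INFLDIFF
sign(coef(fit))
```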
e) Conduct a test for autocorrelation. The test leads you to the conclusion that the errors are
[ Select ] [“correlated”, “not correlated”] .
f) If you conclude that the errors are correlated use the Cochrane-Orcutt procedure to estimate the coefficient on the difference in inflation rates. If you conclude the errors are not correlated, just report the OLS estimate for the coefficient on the difference in inflation rates. The estimated coefficient is [ Select ] [“0.3652”, “0.3708”, “0.3513”, “0.3431”] .
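For parts (e) and (f), here is a base-R sketch of a Durbin-Watson statistic and a single Cochrane-Orcutt step on simulated data with AR(1) errors. Which test the course expects is not stated, so treating it as Durbin-Watson is an assumption. That statistic is also known to be unreliable when a lagged dependent variable is among the regressors, so a different test may be intended. The full procedure repeats the step until the estimate of rho stops changing.

```r
# Simulated regression with AR(1) errors by construction; not the real data
set.seed(4)
n <- 72
x <- rnorm(n)
e <- as.numeric(arima.sim(list(ar = 0.6), n = n))
y <- 2 + 0.5 * x + e

ols <- lm(y ~ x)
res <- resid(ols)

# Durbin-Watson statistic: values near 2 suggest little first-order autocorrelation
dw <- sum(diff(res)^2) / sum(res^2)

# One Cochrane-Orcutt step: estimate rho from the residuals,
# quasi-difference both variables, then refit by OLS
rho <- sum(res[-1] * res[-n]) / sum(res[-n]^2)
y_star <- y[-1] - rho * y[-n]
x_star <- x[-1] - rho * x[-n]
co <- lm(y_star ~ x_star)

c(DW = dw, rho = rho, slope = unname(coef(co)["x_star"]))
```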
g) Exchange rates will also fluctuate with trade policy changes. Create a dummy variable (aka indicator variable) for the Trump administration. This will be 1 for November 2016 through October 2020; while the inauguration takes place in January, markets tend to adjust in anticipation of future changes so use the time frame stated. Before November 2016 this variable will be 0 and after October 2020 this variable will also be 0.
Run an OLS regression where EXRATE is the dependent variable and the three explanatory variables are the lagged exchange rate, the difference in the inflation rate and the dummy variable. Based on the signs of the estimated coefficients does it appear that the Trump administration led to a strengthening of the dollar or a weakening of the dollar? Reminder: strong dollar means lots of Pesos per dollar and weak dollar means few Pesos per dollar. [ Select ] [“weaker dollar”, “stronger dollar”] .
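The dummy in part (g) can be built from the trend variable. With T = 1 in January 2015, November 2016 is T = 23 and October 2020 is T = 70. In this sketch TRUMP and INFLDIFF are hypothetical column names, and the other columns are simulated only so that the regression call runs.

```r
set.seed(5)
dat <- data.frame(T = 1:72)

# 1 from November 2016 (T = 23) through October 2020 (T = 70), 0 otherwise
dat$TRUMP <- as.integer(dat$T >= 23 & dat$T <= 70)
table(dat$TRUMP)

# Simulated columns; not the real data
dat$EXRATE <- 15 + cumsum(rnorm(72, mean = 0.08, sd = 0.5))
dat$LAGEXRATE <- c(NA, head(dat$EXRATE, -1))
dat$INFLDIFF <- rnorm(72, mean = 0.1, sd = 0.3)

fit <- lm(EXRATE ~ LAGEXRATE + INFLDIFF + TRUMP, data = dat)
coef(fit)
```

A positive coefficient on TRUMP would mean more pesos per dollar during that period, other things equal.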
h) Create a plot of the exchange rate against time or against a default index in R. That is, create a plot of the exchange rates where the left most value is the exchange rate from the first period and then the values are ordered by time such that the last value, the right most value, is from the last period in the data set. Immediately after creating the plot give the following command to R, abline(v=22.5, col="red", lwd=2). This creates a vertical line between the October and November 2016 values. Also, run this line of code, abline(v=62.5, col="blue", lwd=2). This creates a vertical line between the February and March 2020 values (T = 62 and T = 63), which are pre and post COVID-19 related quarantine implementations. Suppose you were regressing the exchange rate on the time trend up until before the COVID-19 quarantines; that is, until March 2020. You are no longer including a lagged value in the regression but instead using T as the explanatory variable. Based on the plot you just created choose the best statement. (i) it appears appropriate to include a dummy variable for the Trump administration but not an interaction term (ii) it appears appropriate to include an interaction term but not the dummy variable (iii) it appears appropriate to include both the dummy variable and an interaction term (iv) it appears that neither the dummy variable nor the interaction term belong in the model.
[ Select ] [“(i)”, “(ii)”, “(iv)”, “(iii)”] .
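To make the choices in part (h) concrete: a dummy variable shifts the intercept of the trend line, while a dummy-by-trend interaction lets the slope differ between the two periods. The sketch below fits the model with both terms on simulated data. TRUMP is a hypothetical column name, and the simulated shift and slope change are arbitrary.

```r
# Simulated series with an arbitrary level shift and slope change; not the real data
set.seed(6)
dat <- data.frame(T = 1:72)
dat$TRUMP <- as.integer(dat$T >= 23 & dat$T <= 70)
dat$EXRATE <- 15 + 0.2 * dat$T + dat$TRUMP * (3 - 0.18 * dat$T) +
  rnorm(72, sd = 0.4)

# Keep January 2015 (T = 1) through February 2020 (T = 62)
pre <- subset(dat, T <= 62)

plot(pre$T, pre$EXRATE, type = "l", xlab = "T", ylab = "EXRATE")
abline(v = 22.5, col = "red", lwd = 2)

# T * TRUMP expands to T + TRUMP + T:TRUMP
fit <- lm(EXRATE ~ T * TRUMP, data = pre)
summary(fit)
```

Dropping TRUMP or T:TRUMP from the formula gives the restricted models described in options (i) and (ii).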
i) Create a dummy variable (indicator variable) for the first four months of the COVID quarantine period. That is, a variable which is 1 for the months of March-June 2020 and 0 for all other months. Regress the exchange rate on the lagged exchange rate, the difference in the inflation rates and the two dummy variables. There are four explanatory variables in your model, one is a dummy for the Trump administration and the other is an indicator for the first 4 months of quarantine. This is an OLS regression.
Do the errors appear to be heteroskedastic? You can use a combination of plots and formal tests to answer this.
[ Select ] [“NO”, “YES”] .
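One way to combine a plot with a formal test for part (i) is sketched below in base R. The formal test shown is the studentized Breusch-Pagan statistic computed by hand. Whether that is the test used in the course is an assumption, and the data are simulated with constant error variance.

```r
# Simulated homoskedastic data; x1 and x2 are placeholder regressors
set.seed(7)
n <- 71
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Visual check: the vertical spread should look roughly constant
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Studentized Breusch-Pagan: n * R^2 from regressing squared residuals
# on the regressors, compared with a chi-square (df = number of regressors)
aux <- lm(resid(fit)^2 ~ x1 + x2)
bp <- n * summary(aux)$r.squared
p_value <- pchisq(bp, df = 2, lower.tail = FALSE)
c(BP = bp, p.value = p_value)
```

A small p-value would be evidence against constant variance.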
j) Test the hypothesis that the errors are correlated. Use α = 0.1 for this test. Do you reject or fail to reject the null hypothesis of independent errors?
[ Select ] [“Fail to reject”, “Reject”] .
k) Based on your regression results does the quarantine appear to have had an impact on the exchange rate?
[ Select ] [“NO”, “YES”] .
l) Using this last model what is the estimated standard deviation of the residual?
[ Select ] [“0.53 dollars”, “0.49 pesos”, “0.87 pesos”, “0.35 dollars”, “0.7 pesos”] .
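The estimated standard deviation of the residual is what summary() prints as the "Residual standard error". It is measured in the units of the dependent variable. A sketch on simulated data, showing the built-in extractor and the same value computed by hand:

```r
# Simulated data; not the real model
set.seed(8)
x <- rnorm(71)
y <- 2 + 0.5 * x + rnorm(71, sd = 0.7)
fit <- lm(y ~ x)

# sigma() returns sqrt(RSS / residual degrees of freedom)
s <- sigma(fit)
s_manual <- sqrt(sum(resid(fit)^2) / df.residual(fit))
c(sigma = s, manual = s_manual)
```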
Category: R
-
“Analyzing Exchange Rate Data for the Mexican Peso and US Dollar” “Effects of COVID-19 Quarantine on Exchange Rates: An OLS Regression Analysis”
-
Title: Fine-Tuning Data Mining Methods for Classification Using R Software
d) Data Mining
i. Use the chosen data mining methods for exploring, analyzing, and extracting
important information from the prepared data set.
ii. Perform the data mining process based on the chosen method by using R
software.
iii. You are required to fine tune the parameter setting of the data mining methods
in order to achieve high quality of model. Show the parameter tuning process
and select the best parameter setting as default setting.
iv. Describe the data mining methods, the resulting data mining models, and any
important information obtained from the mining process.
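The parameter tuning asked for in item iii can be illustrated with K-Nearest Neighbour. This is a minimal sketch, assuming the built-in iris data as a stand-in for the prepared data set, with a hand-written nearest-neighbour vote and leave-one-out accuracy as the tuning criterion. The grid of k values is arbitrary, and a real project would more likely use a dedicated package.

```r
# iris ships with R and stands in for the prepared data set
X <- scale(iris[, 1:4])          # standardize so no feature dominates the distance
y <- iris$Species
D <- as.matrix(dist(X))          # pairwise Euclidean distances

# Leave-one-out accuracy of k-nearest-neighbour for a given k
loo_accuracy <- function(k) {
  pred <- sapply(seq_len(nrow(X)), function(i) {
    nn <- order(D[i, -i])[1:k]             # k closest points, excluding point i
    votes <- table(y[-i][nn])
    names(votes)[which.max(votes)]         # majority class; ties go to the first level
  })
  mean(pred == y)
}

grid <- c(1, 3, 5, 7, 9, 11, 15)
acc <- sapply(grid, loo_accuracy)
data.frame(k = grid, accuracy = acc)

# Setting with the highest leave-one-out accuracy (first one in case of ties)
best_k <- grid[which.max(acc)]
best_k
```

The same pattern (a grid of settings, a resampling estimate of accuracy, then picking the best) carries over to the other classifiers.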
FYI, I have finished the data preparation; now I need help with the data mining step, for classification only, using these methods:
• Decision Tree
• Support Vector Machine
• Naïve Bayes
• Neural Network
• K-Nearest Neighbour
-
“Analyzing Gold Coin Sales on eBay: A Statistical Analysis” “Analysis of Factors Affecting Auction Sales for Gold Coins”
Download the dataset here for this question.
The data set contains information on sales of 1oz gold coins on eBay. Further details will be available in the key after the exam ends. The file contains the following variables:
DATE: date of the sale
SALE: final selling price of the coin
GOLDPRICE: price of gold, one ounce, at the end of trading on the date of the sale, or, if the sale is on a weekend or holiday, the end of the previous day of trading.
BIDS: the number of bids submitted for the auction (these eBay sales were in an auction format)
TYPE: E for Eagle or a US coin, KR for Krugerrand or a South African coin and ML for Maple Leaf or a Canadian coin.
SHIPPING: cost of shipping; this is an additional fee the buyer must pay so that SALE+SHIPPING is the total cost to the buyer.
SLABBER: P for PCGS, N for NGC or U for not slabbed; a slabbed coin is a coin inside a tamper proof holder that also indicates the coin’s condition or grade.
GRADE: the grade of slabbed coins. If SLABBER="U" then this is 0.
other: additional characteristics of slabbed coins are noted here; for example, FD means the “slab” or coin holder notes that the coin was minted on the first day of minting, and FDIFLAG means that it is labeled as first day of issue and the holder has an image of a flag on it.
a) What is the average for SALE?
[ Select ] [“1915”, “1651”, “1654”, “1930”] .
b) What is the maximum for BIDS?
[ Select ] [“55”, “60”, “57”, “62”] .
c) Create a boxplot of SALE. You should see that there are 3 (three) outliers. Look at those three observations and choose the correct statement. (i) the observations either have only 1 bid or other="BURNISHED", (ii) the observations all have SLABBER="P", (iii) the observation(s) with low value(s) for SALE has/have only 1 or 2 bids while the observation(s) with high value(s) for SALE has/have numbers of bids near the maximum, say within 5 of the maximum, (iv) the observations either have other="ME" or "LD".
[ Select ] [“(ii)”, “(iii)”, “(i)”, “(iv)”] .
d) In R type the following command, table(yourdataset$other), where yourdataset is the name you gave to the dataset with the eBay coin sales. This will produce a table showing the value for “other” and the number of observations which have that value. For example, it will show the value “0” and under that the number 20, meaning that there are 20 observations where other is 0 and then it will show BURNISHED and under that a 2, meaning there are 2 coins where other is BURNISHED. How many observations are there where other is “LD”?
[ Select ] [“4”, “6”, “1”, “2”] .
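A toy illustration of the table() command from part (d). The data frame coins and its seven values are made up for the example.

```r
# Made-up values; the real "other" column has more categories
coins <- data.frame(other = c("0", "0", "FD", "LD", "BURNISHED", "0", "LD"))

tab <- table(coins$other)
tab           # each distinct value with its count underneath

tab["LD"]     # count for a single value
```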
e) Run a regression where SALE is the dependent variable and GOLDPRICE, BIDS, and SHIPPING are the explanatory variables. Consider the following statements and select which ones are correct (1) although there is little explanatory power the model is basically a good model (2) the model has minimal explanatory power (3) none of the independent variables have statistically significant coefficients at standard levels of significance (4) at least 1 of the estimated coefficients has the wrong sign, (5) some combination of items (2), (3) and (4) suggest this is not a good model.
[ Select ] [“(1), (2) and (3)”, “(1) and (3)”, “(2) and (4)”, “(1) and (2)”, “(2) and (3)”, “(2), (3), (4) and (5)”] .
f) Run a regression where SALE is the dependent variable and GOLDPRICE, BIDS, SHIPPING and a set of dummy variables for the values of other are the explanatory variables. NOTE: remove the observations where other="ME" since there is only one such observation. Because there is only one observation with "ME" it will have a residual of 0 since the "ME" will perfectly explain why it is different from all other observations. This means that your regression is run with only 46 observations and you should see the df for the F statistic being 10 and 35.
What is the R² value for this regression?
[ Select ] [“0.6268”, “0.6801”, “0.431”, “0.5527”, “0.3496”] .
g) Using this model what is the expected value for SALE for an auction with a gold price of $1650, 5 bids, free shipping (SHIPPING=0), and other= FDIFLAG?
[ Select ] [“$1988”, “$1945”, “$1956”, “$1919”, “$1972”] .
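Parts (f) and (g) could be sketched as below on simulated data. The data frame names coins and kept are hypothetical, and the toy data has only four categories of other besides "ME", so the degrees of freedom will not match the 10 and 35 quoted in part (f).

```r
# Simulated data: 46 ordinary rows plus a single "ME" row; not the real data
set.seed(11)
n <- 47
coins <- data.frame(
  GOLDPRICE = runif(n, 1550, 1750),
  BIDS = sample(1:50, n, replace = TRUE),
  SHIPPING = sample(c(0, 5, 7.5), n, replace = TRUE),
  other = c(rep(c("0", "FD", "FDIFLAG", "LD"), length.out = 46), "ME")
)
coins$SALE <- 200 + coins$GOLDPRICE + 2 * coins$BIDS + rnorm(n, sd = 40)

# Drop the single "ME" observation
kept <- subset(coins, other != "ME")

# factor(other) makes lm() create one dummy per category, minus a baseline
fit <- lm(SALE ~ GOLDPRICE + BIDS + SHIPPING + factor(other), data = kept)
summary(fit)$r.squared

# Expected SALE for the auction described in part (g)
new_auction <- data.frame(GOLDPRICE = 1650, BIDS = 5, SHIPPING = 0,
                          other = "FDIFLAG")
predict(fit, newdata = new_auction)
```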
h) Is the coefficient on FDIFLAG statistically significant at the 0.05 level?
[ Select ] [“NO”, “YES”] .
i) Examine the residual plots. Find the observation with the largest absolute residual and the observation with the largest Cook’s Distance. Identify the correct statement. (i) the observation with the largest absolute residual is an outlier in the residual space and this is due to an extremely low sale price which might relate to only receiving one bid (ii) the observation with the largest Cook’s Distance is influential and has high leverage which might be because it has an unusual grade, GRADE, for a slabbed coin (iii) the observation with the largest absolute residual is an outlier in the residual space and this is due to an extremely high sale price which might relate to the unusually high price for gold at the time of the sale (iv) the value for the largest absolute residual is not an outlier and the largest value for Cook’s Distance does not qualify as being influential.
[ Select ] [“(i)”, “(iii)”, “(ii)”, “(iv)”] .
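Locating the two observations asked about in part (i) could look like this. The data are simulated, with one point placed far from the rest on purpose so that it stands out.

```r
# Simulated data with one deliberately unusual point in row 46
set.seed(12)
x <- rnorm(45)
y <- 1 + 2 * x + rnorm(45)
x[46] <- 6
y[46] <- 0
fit <- lm(y ~ x)

# Row with the largest absolute residual
which.max(abs(resid(fit)))

# Row with the largest Cook's distance
which.max(cooks.distance(fit))

plot(fit, which = 1)   # residuals vs fitted values
plot(fit, which = 4)   # Cook's distance by observation
```

With the real model, the row numbers returned here can be used to inspect the corresponding auctions and to drop them for part (j).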
j) Remove the observations or observation from part (i) that had the largest absolute residual and the largest Cook’s Distance. If those are the same observation then remove only one observation. If they are different then remove them both, i.e., two observations. With this smaller dataset (which also has other="ME" removed from before) regress SALE on GOLDPRICE, BIDS, SHIPPING and a set of dummy variables for the values of other. The estimated coefficient on GOLDPRICE is [ Select ] [“2.5083”, “1.5076”, “2.1763”, “1.763”, “2.0756”] .
k) Using the most recent model, from part (j), test the hypothesis that the coefficient on GOLDPRICE is 1. The t test statistic for this test is
[ Select ] [“1.232”, “1.733”, “0.833”, “1.497”] .
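The t statistic printed by summary() tests a coefficient against 0, so a test against 1 has to be computed from the estimate and its standard error: t = (estimate - 1) / standard error. A sketch on simulated data, where gold and sale are hypothetical stand-ins:

```r
# Simulated data; not the real model from part (j)
set.seed(13)
gold <- runif(44, 1550, 1750)
sale <- 100 + 1.3 * gold + rnorm(44, sd = 40)
fit <- lm(sale ~ gold)

est <- coef(summary(fit))["gold", "Estimate"]
se  <- coef(summary(fit))["gold", "Std. Error"]

# Test of H0: coefficient on gold equals 1
t_stat <- (est - 1) / se
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)
c(t = t_stat, p.value = p_val)
```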
l) Examine the model results, from part (j). Based on these results, if you were auctioning off a gold coin to maximize your revenue, would you rather offer free shipping or would you rather charge $7.5 for shipping?
[ Select ] [“Offer free shipping.”, “It doesn’t appear to matter.”, “Charge $7.5 for shipping.”] .
m) Again, using the model from part (j), test whether the errors have constant variance using the test covered in the lectures. What is the p-value?
-
“Applying the Principles of Good Design in R: Solving a Real-World Problem” Problem to Solve: As a graphic designer, I want to create a visually appealing and effective flyer to promote a local charity event. I need to determine the
Review The Power of Good Design and select three of the ten principles noted for good design. Next in R, utilize these three principles in a problem that you will solve. First note the problem to solve, the dataset (where the information was pulled from), and what methods you are going to take to solve the problem. Ensure the problem is simple enough to complete within a two-page document. For example, I need to purchase a house and want to know what my options are given x amount of dollars and x location based on a sample of data from Zillow within each location. Ensure there is data visualization in the homework and note how it relates to the three principles selected.
-
“Exploring the Relationship Between Income and Education Level: A Data Analysis using R Programming”
This project requires you to strictly follow the requirements (I have uploaded the requirements as an attachment). It involves completing the R programming and writing a report in APA format based on the conclusions drawn from the R analysis.