I have a dataset that shows whether each person accepted a training course, declined it, or has not yet responded.
Among others, the dataset has the following columns:
* Unique person ID
* Unique training course ID
* Course status (accepted, declined, pending)
* City (that they're based in)
* Age Range
* Profession (grouped by categories)
The aim of the analysis is to understand where we have the best and worst rates of people accepting courses. I then plan to do further analysis on why this might be and how to improve it.
So far, I've created a bunch of pivot tables in Excel that show the data in various ways, e.g. accepted course % by age range, accepted course % by city, and accepted course % by city AND age range (to see, for example, whether there was a higher rate for 18-24 year olds in LA vs New York).
I have a two-part question:
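To make the cuts concrete, here is a minimal Python sketch of what the pivot tables compute. The key names (`city`, `age_range`, `status`) and the rows are made up for illustration, and I'm assuming that pending responses count in the denominator, which may not be the right choice.

```python
from collections import defaultdict

def acceptance_rates(rows, keys):
    """Return {group: (accepted, total, rate)} for the given grouping columns."""
    counts = defaultdict(lambda: [0, 0])  # group -> [accepted, total]
    for row in rows:
        group = tuple(row[k] for k in keys)
        counts[group][1] += 1  # assumption: pending rows count toward the total
        if row["status"] == "accepted":
            counts[group][0] += 1
    return {g: (a, n, a / n) for g, (a, n) in counts.items()}

# Made-up rows purely for illustration; not real data.
rows = [
    {"city": "LA", "age_range": "18-24", "status": "accepted"},
    {"city": "LA", "age_range": "18-24", "status": "declined"},
    {"city": "LA", "age_range": "25-34", "status": "accepted"},
    {"city": "New York", "age_range": "18-24", "status": "pending"},
    {"city": "New York", "age_range": "18-24", "status": "accepted"},
]

by_city = acceptance_rates(rows, ["city"])
by_city_and_age = acceptance_rates(rows, ["city", "age_range"])

for group, (accepted, total, rate) in sorted(by_city_and_age.items()):
    print(group, f"{accepted}/{total} = {rate:.0%}")
```

Keeping the count next to each rate matters for the second part of my question, since a rate built on one or two people is very different from one built on hundreds.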
1) Is there a more efficient/better way to do this?
2) Now that I have a list of %s split by different variables, how can I show whether the differences between them are significant? The sample size varies between cuts, so one cut could show a higher % simply because its sample is so small that a few individuals skew the % a lot. In that case the difference may not be statistically significant.
My total sample size is around 2k rows in Excel, but this gets smaller for certain cuts.
Any thoughts or advice on how to take this process to the next stage would be very helpful.
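To illustrate the small-sample concern, here is a rough sketch of a 95% Wilson score interval for a single acceptance rate. This is just one standard approach I've come across, not necessarily the right one for my data, and the counts below are invented. The function name `wilson_interval` is my own.

```python
import math

def wilson_interval(accepted, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = accepted / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

# Same 75% acceptance rate, very different sample sizes (invented counts).
small = wilson_interval(3, 4)
large = wilson_interval(300, 400)

print(f"3/4:     {small[0]:.3f} to {small[1]:.3f}")
print(f"300/400: {large[0]:.3f} to {large[1]:.3f}")
```

Both cuts show 75%, but the interval for the 4-person cut is far wider than the one for the 400-person cut, which is exactly the effect I'm worried about when comparing %s across cuts.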
Thank you all