
Suppose we are holding exit interviews with two employees: John and James. We have two simple questions: “What are the Pros of working at Company X?” and “What are the Cons of working at Company X?” One of those employees genuinely loves the company, but the other one hates it. I’d imagine the interviews would go something like this:

John
Pohnson: John, what are the pros of working at Company X?
John: Pohnson, that question is simple, but I’m not sure if our one-hour meeting can fit all that I have to say about our company. Let’s talk about the top 10 in my list. First, I love because… and the list goes on.

Pohnson: John, what are the cons of working at Company X?
John: [long pause] I cannot think of any.

Then I interview James, who hates Company X.

James
Pohnson: James, what do you like about our company?
James: [long pause] I cannot think of any.

Pohnson: James, what are the cons of working at this company?
James: Pohnson, that question is simple, but I’m not sure if our one-hour meeting can fit all that I have to say about our company. Let’s talk about the top 10 in my list. First, I hate because… and the list goes on.

If this holds true, counting the number of words in each answer should give us a clue into an employee’s mind. I mean, if you like a company, you should find a lot more to say about Pros than Cons, right? But if you don’t, you probably have a lot to say in Cons.
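As a minimal sketch of the idea, a word count can be taken by splitting each answer on whitespace. The helper name `count_words` and the sample answers are made up for illustration; they are not from the scraped data.

```r
# Count whitespace-separated words in each answer.
# `count_words` is a hypothetical helper name used only for this sketch.
count_words <- function(text) {
  text <- trimws(text)
  # strsplit() returns one character vector of words per input string
  pieces <- strsplit(text, "\\s+")
  # lengths() gives the number of words in each of those vectors
  lengths(pieces)
}

# Made-up answers: a long Pros answer and a one-word Cons answer
answers <- c(
  pros = "Great people, great pay, and interesting problems to solve",
  cons = "None"
)
count_words(answers)
```

A reviewer who writes far more under Pros than under Cons would show up as a large gap between the two counts.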

Let’s do exactly that. But first, we need to load the goodies.

I scraped the reviews by creating a separate R project for each company, so I had to use read.csv() for each firm and combine the results in this project. As the files are in CSV format, we can use a for loop to load them into the work environment.
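Here is a rough sketch of loading several CSV files with a for loop. The directory, file names, and columns are placeholders: the sketch writes two tiny files to a temporary folder first so that it runs on its own.

```r
# Hypothetical setup: write two tiny CSV files so the sketch is runnable.
# In practice these would be the scraped review files for each company.
data_dir <- file.path(tempdir(), "reviews_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(company = "A", pros = "Good pay", cons = "Long hours"),
          file.path(data_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(company = "B", pros = "Nice team", cons = "None"),
          file.path(data_dir, "b.csv"), row.names = FALSE)

# Read every CSV in the folder into a named list of data frames
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
reviews_list <- list()
for (path in csv_files) {
  name <- tools::file_path_sans_ext(basename(path))
  reviews_list[[name]] <- read.csv(path, stringsAsFactors = FALSE)
}
names(reviews_list)
```

Keeping the data frames in a list, rather than as separate variables, makes it easy to clean and combine them later in one go.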

As I didn’t change the scraping code, the columns are the same across files. So, what we need to do is very simple: format the date, remove punctuation, and remove a placeholder row. But since Microsoft has two different date formats, I’ll deal with it first, then start the for loop.
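The cleaning steps might look something like the sketch below. The two date formats shown are assumptions picked for illustration (the actual formats in the scraped files are not stated here), and the column names are hypothetical.

```r
# Hypothetical raw data with two date formats mixed in one column
raw <- data.frame(
  date = c("01/05/2018", "2018-02-10"),
  pros = c("Great pay, great people!", "Smart co-workers."),
  stringsAsFactors = FALSE
)

# Try the first (assumed) format; where it fails and yields NA,
# fall back to the second (assumed) format
parsed   <- as.Date(raw$date, format = "%m/%d/%Y")
fallback <- as.Date(raw$date, format = "%Y-%m-%d")
parsed[is.na(parsed)] <- fallback[is.na(parsed)]
raw$date <- parsed

# Remove punctuation from the review text
raw$pros <- gsub("[[:punct:]]", "", raw$pros)
raw
```

Note that stripping punctuation joins hyphenated words ("co-workers" becomes "coworkers"), which slightly changes what counts as a word.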

Now, we are ready to combine them using rbind().
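Because the columns match, stacking the data frames is a one-liner. A small sketch with made-up data, assuming the per-company data frames are held in a list:

```r
# Made-up per-company data frames with identical columns
reviews_list <- list(
  a = data.frame(company = "A", pros = "Good pay",  cons = "Long hours"),
  b = data.frame(company = "B", pros = "Nice team", cons = "None")
)

# rbind() stacks rows; do.call() applies it across the whole list
all_reviews <- do.call(rbind, reviews_list)
rownames(all_reviews) <- NULL  # drop the list-derived row names
all_reviews
```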

Next, we remove the placeholder row, and create new variables.
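A sketch of this step, under two assumptions that are mine rather than the post's: the placeholder row is the one with a missing company, and the new variables are word counts for Pros and Cons (named `pros_words` and `cons_words` here).

```r
# Made-up combined data; the first row stands in for the placeholder
reviews <- data.frame(
  company = c(NA, "A", "B"),
  pros = c(NA, "Good pay and smart people", "Nice team"),
  cons = c(NA, "Long hours", "None"),
  stringsAsFactors = FALSE
)

# Assumption: the placeholder row is identified by a missing company
reviews <- reviews[!is.na(reviews$company), ]

# New variables: number of whitespace-separated words in each answer
reviews$pros_words <- lengths(strsplit(trimws(reviews$pros), "\\s+"))
reviews$cons_words <- lengths(strsplit(trimws(reviews$cons), "\\s+"))
reviews
```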

Let’s start by creating boxplots.
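A minimal boxplot sketch in base R, using made-up word counts. The plotting library used for the original charts is not stated, so treat this as one possible way to draw them.

```r
# Made-up word counts for two companies
reviews <- data.frame(
  company    = rep(c("A", "B"), each = 5),
  pros_words = c(10, 12, 15, 11, 14, 20, 25, 22, 30, 28)
)

# One box per company; boxplot() also returns the underlying statistics
bp <- boxplot(pros_words ~ company, data = reviews,
              main = "Words in Pros", xlab = "Company", ylab = "Word count")
bp$names
```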

1,200 words for FB’s Pros… whoever wrote this sure was an admirer. But what about Cons?

Over 2,000 words for Amzn’s Cons… I sense discontent.

I think we need to make some adjustments, as the outliers distort the chart.
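One possible adjustment, not necessarily the one used for the charts here, is to hide the outlier points so the boxes themselves are readable. In base R that is the `outline = FALSE` argument; the data below is made up.

```r
# Made-up word counts with one extreme review
words <- c(10, 12, 15, 11, 14, 13, 1200)

# Which values does the boxplot rule flag as outliers?
boxplot.stats(words)$out

# Draw the box without the outlier points, so the y-axis is not stretched
boxplot(words, outline = FALSE, ylab = "Word count")
```

The outliers are only hidden from the plot; they stay in the data and still affect any mean computed later.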

It seems like Google has the lowest word count on average. But we can do better than that. Let’s compute summary values and create a slope chart.

That’s too many observations. Since we only need ten distinct values, let’s use distinct().
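A sketch of why distinct() helps, assuming the dplyr package and made-up data: a grouped mutate() attaches the company mean to every review row, and distinct() then collapses those repeats to one row per company. The column names are hypothetical.

```r
library(dplyr)

# Made-up review-level word counts
reviews <- data.frame(
  company    = c("A", "A", "B", "B"),
  pros_words = c(10, 20, 5, 15),
  cons_words = c(30, 50, 10, 10)
)

company_means <- reviews %>%
  group_by(company) %>%
  # mutate() keeps every review row, repeating the group mean on each
  mutate(mean_pros = mean(pros_words),
         mean_cons = mean(cons_words)) %>%
  ungroup() %>%
  # distinct() keeps one row per unique combination of these columns
  distinct(company, mean_pros, mean_cons)

company_means
```

Using summarise() instead of mutate() would give one row per company directly; the mutate-then-distinct route is shown because it matches the steps described above.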

Now, that’s much better. Let’s see the mean slope plot.
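For reference, a slope chart can be sketched in base R by drawing one line per company from its Pros value to its Cons value. The numbers are made up and the original chart may well have been drawn with a different library.

```r
# Made-up mean word counts per company
means <- data.frame(
  company = c("A", "B"),
  pros    = c(15, 10),
  cons    = c(40, 12)
)

# Empty canvas with two x positions: 1 = Pros, 2 = Cons
plot(NULL, xlim = c(0.8, 2.4), ylim = range(means$pros, means$cons),
     xaxt = "n", xlab = "", ylab = "Mean word count")
axis(1, at = 1:2, labels = c("Pros", "Cons"))

# One segment per company, labelled at the Cons end
segments(1, means$pros, 2, means$cons)
text(2, means$cons, means$company, pos = 4)

# The "slope" is simply the Cons mean minus the Pros mean
slope <- means$cons - means$pros
slope
```

A line that rises from left to right means more words in Cons than in Pros.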

Only Uber and FB have more words in Pros than Cons. The slopes are about the same across companies, with Amazon’s being the steepest.

But as we have seen in the boxplot, outliers will make the mean higher than it should be. So, let’s try median.
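A quick illustration with made-up numbers of how a single very long review pulls the mean up while leaving the median almost untouched:

```r
# Six ordinary reviews and one extremely long one (made-up word counts)
words <- c(10, 12, 15, 11, 14, 13, 1200)

mean_words   <- mean(words)    # dragged upward by the 1200-word review
median_words <- median(words)  # middle value, barely affected

c(mean = mean_words, median = median_words)
```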

It’s just about the same. With the median, AirBNB has fewer words in Cons than in Pros.

So, it seems that when it is time to talk about their employer, reviewers have more to say about Cons than Pros, except at Uber and Facebook.