There’s no shortage of third parties offering their AI for text analytics. But here’s a secret: the best general purpose model will struggle to keep pace with a purpose-built one.


Introduction

Plague, pending climate apocalypse, and post-truth politics aside, we live in interesting times: AI is now so common that it’s sold in pre-packaged bundles that claim to be one-size-fits-all. With every Tom, Dick, and Harry peddling their wares, it’s only natural that our business leaders ask: “why don’t we just outsource this instead of hiring someone?” And so they seek out consultations with salespeople who promise the world.

In fact, I even caught one vendor (whose name I shall not confirm) promising that their text analytics AI was “more accurate than a human rater” – an impressive claim given that its accuracy is ultimately measured against the classifications made by human raters. Think about that for a second: the model’s accuracy is measured by how well it can replicate a human’s classification of text. By definition, it cannot read better than a human. Even if their model had 100% accuracy in training, all that would mean is that it agreed 100% with the humans who labeled the training data. So when a business leader sees a tagline reading:

Artificial Intelligence (AI) Software Platform that Outperforms Human Category Analyses

…Can you blame them for getting excited? In reality, the line above is only correct if you’re talking about speed. AI can be faster than the humans it’s trying to replicate, but in a task like reading it can only ever be as literate as the humans it learns from! Dodgy marketing aside, there’s another issue: any predictive model suffers performance degradation as the data it’s asked to predict on drifts away from the data it was trained on. The further a fish gets from water, the worse it swims. So even if the service advertising superhuman literacy really did deliver on the data it was trained on, its performance on your data will be worse in proportion to how different your data is from theirs. And since they own the IP, it’s difficult, if not impossible, to know how similar their training data is to your data.

That brings us to the immediate problems with outsourcing machine learning:

  1. You’ll get sold on hype from people with a vested interest in making a sale.

  2. You get a jack-of-all-trades (or a master of something else entirely) rather than a model built for the thing you’re specifically trying to do.

  3. You don’t actually own the model and will likely need to pay a subscription in perpetuity.

Here, I’ll use Tensorflow via Keras to show you that, with relatively little effort, it’s entirely possible to outdo even giants like Azure by simply training your own model on your specific problem.

Hypothesis: a specialized, if poorly optimized, AI should be able to compete with a premium AI service selling a model that wasn’t actually trained on the specific task at hand: detecting sentiment in Glassdoor reviews (which come naturally labeled as pro or con by the people who wrote them).

(Feel free to skip to the conclusion)




The Test

Here’s a simple-sounding task: take employee reviews from Glassdoor.com, separate out the pros and cons, and see whether a model I build can identify which comments were positive or negative better than traditional NLP methods or a premium AI-driven service like the Azure Cognitive Services Text Analytics API. What makes this relatively straightforward is that reviews tend to be adjective-heavy (e.g., “good tea selection in the breakroom” or “horrible stench in the break room”), which makes it easy for a sentiment-scoring algorithm to lazily look for a few common “up” or “down” words. However, because the reviews are employment-related, a few words like “benefits” can be either positive or negative, which keeps the problem from being entirely trivial.

In this simple demonstration, I’ll use ~40,000 comments scraped from Glassdoor.com (using rvest), each one labeled by its author as either a pro or a con. The data are split 70:30 into a training and a validation set. I use the training set to train an artificial neural network in Tensorflow (via Keras, to keep the code simple), then compare the accuracy (and kappa, which corrects accuracy for chance agreement!) against Azure and traditional sentiment analysis on the hold-out validation data. This way, the model I make is evaluated on data it wasn’t trained on, which keeps the comparison between methods fair! What’s important to note is that this took only a Saturday afternoon, so it should be more than possible for your company to achieve the same results!

All of the code is included below.




Dependencies & Data

First we load up the dependencies, then load in the data, do a little bit of last-minute cleaning and filtering, and create the training and testing indexes. The text cleaning simply removes any leftover HTML that could mess things up.

require(data.table) # Fast data manipulation
require(keras)      # Interface to Tensorflow
require(caret)      # ML suite
require(magrittr)   # Function Pipes
require(tm)         # Text mining functions
require(pbapply)    # Parallel processing
require(parallel)   # Parallel back-end

require(extrafont)
#loadfonts()
 # read data in
dt <- fread("data/siop_2020_txt_long.csv", nThread=2) 
 # limit to just pro and con
dt <- dt[text != "" & type %in% c("pro", "con"), ]
# remove some formatting junk
dt[, text  := gsub("^.*</b><br/>", "" ,text)]  # Remove pros/cons html tag
dt[, text  := gsub("<.*?>", "", text)] # remove HTML junt
dt[, text  := gsub("brbr$", "" ,text)]  
dt[, text  := gsub("[\\\r\\\n]", " " , text)] 

# Going to leave stopwords in - they shouldn't hurt an LSTM!
#dt[, text2 := tm::removePunctuation(text)]
# dt[, text2 := tm::removeWords(text2, tm::stopwords("en"))]
dt[, text := tolower(text)]
dt[, nword := vapply(gregexpr("[[:alpha:]]+", dt$text), 
                     function(x) sum(x > 0), numeric(1))]

# keep only reviews with more than 3 words
dt <- data.table::copy(dt[nword > 3, ])

# Create binary column - 1 means text is pro, 0 is con
dt[, target := ifelse(type == "pro", 1, 0)]

set.seed(3364900) # fixes random number generation - for reproducibility (within R, not CUDA!)

# Partition into training/test sets balanced by type (pro/con), using 70% for training
train_idx <- caret::createDataPartition(dt$type, p = .7, list = FALSE)


The 40,284 rows of data that we will use for the model now look like this:

Review Pro/Con Target Word Count
no cons, other than constant change that some employees may not enjoy. con 0 12
immersed in different tasks relatively high salary for internship have co-interns pro 1 12
great place to do an internship great culture and environment pro 1 10
great work-life balance and they provide time to be innovative pro 1 11
3m stock is a barometer of the market and the macroeconomy con 0 11
great place to work! pay and benefits are some of the best. pro 1 12
many options to develop internally nearly all jobs expect a regional or global collaboration pro 1 14
many changes overwhelm employees options to work abroad for limited time would make the comany more attractive con 0 17
get to work with interesting technologies. at times, can be a collaborative work environment. working in a large multinational conglomerate. pro 1 20
a great organizational culture with limitless opportunities to grow and develop pro 1 11
experienced people have no chance. no career plans. obbey instead of talent. con 0 12
a large corporate with the restrictions and lack of pace that comes along with that pro 1 15
if outside of corporate hq, opportunities for advancement will most likely require relocation con 0 13
senior management more interested in lining up their next job then managing the current business. con 0 15
it is difficult to make substantially more than your planned commission con 0 11
can be very slow to start out when first hired con 0 10
good job,on time salary, proper reimbursement ,proper variable pay pro 1 10
excellent work culture people will respect you for your work opportunities in multiple domains pro 1 14
tedious and repetitive work. but what can you expect… it’s manufacturing. con 0 12
relatively low salaries lot of work, less number of people stressfull con 0 11



Building A Custom Neural Net

Step 1: Tokenizer & Data Prep

First, I need to prepare the data for the neural net by tokenizing the text - a fancy way of saying turning strings into sequences of integers. These sequences will feed the embedding layer. To do that, I need to decide on a maximum length for an input sequence (the number of words per review). This is a balancing act between throwing away information and wasting precious VRAM on my graphics card (where Tensorflow will run).

Most reviews are fairly short, so the LSTM doesn’t need to provision memory for many positional tokens. I’ll set a cutoff at 75 words; reviews longer than that are truncated.
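As a quick sanity check (not part of the original analysis, just a sketch using the nword column computed earlier), you can look at the word-count distribution before committing to a cutoff:

# Word-count distribution - most reviews should fall well under the cutoff
quantile(dt$nword, probs = c(0.50, 0.90, 0.95, 0.99))
# Proportion of reviews that would be truncated at 75 words
mean(dt$nword > 75)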

Once the data is prepared:

  • Set up the parameters for the vocabulary size of the model (multiplies VRAM usage).

  • Set the number of words in the LSTM sequence (multiplies VRAM usage).

  • Tokenize text (turn to integer codes for each word)

  • Create token sequences of identical length

  • Split data into testing and training sets using the training index.

max_words <- 7000 # max number of words to use
max_len   <- 75 # text cutoff at n (make bigger for longer texts at expense of size/time)

# Create a tokenizer to reduce text to integer vectors
# ...the embedding layer will later map those token codes to dense vectors

# First, the tokenizer (to create the int vectors)
tokenizer <- text_tokenizer(num_words = max_words) %>%
             fit_text_tokenizer(dt$text)

# Create equal length sequences of tokens
txt_sequences <- texts_to_sequences(tokenizer, dt$text)
text_data     <- pad_sequences(txt_sequences, maxlen = max_len)


# Now split by balanced test/train indexes
x_train <- text_data[train_idx, ] %>% as.matrix() %>% unname()
x_test  <- text_data[-train_idx,] %>% as.matrix() %>% unname()
 
y_train <- dt[train_idx, target] %>% as.matrix() %>% unname()
y_test  <- dt[-train_idx, target] %>% as.matrix() %>% unname()

# Here's what the tokenized input will look like:
# head(x_train, 20) %>%
#   kbl() %>%
#   kable_paper() %>%
#   scroll_box(width = "100%", height = "200px")
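As a quick peek (a minimal sketch, not from the original write-up), you can compare a single review with its integer-coded, zero-padded representation:

# The first training review as text...
dt$text[train_idx[1]]
# ...and as a padded sequence of word indexes (length = max_len)
x_train[1, ]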

It’s also important to quickly check that the data are evenly split between the outcome categories, in this case 0 = con, 1 = pro. Here the data are balanced, which makes life easy!
Split  Con    Pro
train  14101  14099
test   6042   6042
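If you want to reproduce that check yourself, a minimal sketch with base R’s table() will do it (the counts above came from the split created earlier):

# Class balance (pro/con counts) in each split
table(train = dt$type[train_idx])
table(test  = dt$type[-train_idx])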

Step 2: Embeddings

Embeddings are vectors that encode the similarity (and dissimilarity) of words, so an embedding layer carries knowledge of language structure and synonyms. For example, it knows that buck is to doe what man is to woman, or that emerald is more closely related to sapphire than it is to granite. Training language embeddings from scratch is a colossal task, well beyond what my hardware can possibly process. HOWEVER! I don’t need to, because I can take an open-source, pre-trained language embedding model! I’ll use a relatively small (2.1 GB, 300-dimensional) model.
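To make “similarity” concrete, here’s a small, purely illustrative sketch of cosine similarity between word vectors. It assumes the wiki_embeddings list built in the chunk below is still in memory (i.e., before the rm(wiki_embeddings) cleanup at the end of that chunk), and the exact values will depend on the vectors you download:

# Cosine similarity between two word vectors; related words should score
# noticeably higher than unrelated ones
cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
cos_sim(wiki_embeddings[["good"]], wiki_embeddings[["great"]])   # related pair
cos_sim(wiki_embeddings[["good"]], wiki_embeddings[["granite"]]) # unrelated pair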

Extracting these embeddings and creating a weight matrix that can be fed into a new neural network is computationally intensive. As such, it’s best to do it in parallel, and preferably only once per combination of dictionary size and maximum sequence length.

# NOTE - Run this outside of markdown first -- else you'll create a thread panic!
#        After that, let the if condition handle executing in the render. 
# Check if this has already ran - no need to do it more than once! 
filename <- paste0("serialized/wiki_embedding_mx-300-", max_words, "-", max_len, ".RDS")
# print(filename)

if(file.exists(filename)){
  wiki_embedding_mx <- readRDS(filename)
} else {
  # Get Embeddings - give similarity between words used in English
  lines <- readLines('data/wiki-news-300d-1M/wiki-news-300d-1M.vec')
  # omit junk line 1
  lines <- lines[2:length(lines)]
  
  # Container for embedding structure
  wiki_embeddings <- list()
  
  # This process takes a long time, so run it in parallel (~24x faster).
  # The cluster apply drops names, so the words are extracted in a second pass.
  cl <- parallel::makeCluster(parallel::detectCores())
  wiki_embeddings <- pblapply(lines, 
                              function(line){
                                  values <- strsplit(line, " ")[[1]]
                                  as.double(values[-1])
                              }, 
                              cl = cl)
  # Extract the embedding names (the word is the first field of each line)
  embedding_names <- pblapply(lines, 
                              function(line){
                                  strsplit(line, " ")[[1]][[1]]
                              }, 
                              cl = cl)
  parallel::stopCluster(cl)
  # It really pains me that I can't return to names index in a cluster apply 
  names(wiki_embeddings) <- embedding_names
  str(head(wiki_embeddings))
  
  # free some RAM
  rm(lines)
  rm(embedding_names)
  gc();gc()
  
  # Create our embedding matrix
  word_index         <- tokenizer$word_index 
  wiki_embedding_dim <- 300
  wiki_embedding_mx  <- array(0, c(max_words, wiki_embedding_dim))
  
  for (word in names(word_index)){
    index <- word_index[[word]]
    if (index < max_words){
      wiki_embedding_vec <- wiki_embeddings[[word]]
      if (!is.null(wiki_embedding_vec))
        wiki_embedding_mx[index+1,] <- wiki_embedding_vec # Words without an embedding are all zeros
    }
  }
  
  # Save so you don't need to do that again
  saveRDS(wiki_embedding_mx, filename)
  rm(wiki_embeddings); gc()

}

Step 3: Compile

The neural network schematic is created using the Keras interface to the Tensorflow back end. Once the network’s topology is set, the layers are combined, and the weights from the pre-trained embedding (which carry a general understanding of the English language) are superimposed onto the network’s embedding layer and frozen (set to non-trainable).

Note that as this is just for demonstration purposes, the model structure has not undergone pruning or optimization. I’m certain that with tuning & pruning I could get even better results!

wiki_embedding_dim <- 300

# Setup input layer. Using 16-bit ints since smaller vocab to save VRAM
input <- layer_input(
  shape = list(NULL),
  dtype = "int16",
  name  = "input"
)

# Model layers

# Embedding - will populate with weights later
embedding <- input %>% 
    layer_embedding(input_dim = max_words, 
                    output_dim = wiki_embedding_dim, 
                    name = "embedding")

# Long Short Term Memory - including some dropout
lstm <- embedding %>% 
        layer_lstm(units = max_len, 
                   dropout = 0.25, 
                   recurrent_dropout = 0.25, 
                   return_sequences = FALSE, 
                   name = "lstm")

# Hidden Layers - using trusty tanh + a little l1/l2 regularization (can fall back to relu if speed is dragging)
# Add dropout if it overfits. 
hidden <- lstm %>%
          layer_dense(units = max_len, 
                      activation = "tanh", 
                      name = "hidden",  
                      kernel_regularizer = regularizer_l1_l2(l1 = 0.005, l2 = 0.005)) %>%
           layer_dense(units = 32, 
                      activation = "tanh", 
                      name = "hidden2",
                      kernel_regularizer = regularizer_l1_l2(l1 = 0.005, l2 = 0.005)) %>%
          layer_dense(units = 32, 
                      activation = "tanh", 
                      name = "hidden3",
                      kernel_regularizer = regularizer_l1_l2(l1 = 0.005, l2 = 0.005)) %>%
          layer_dense(units = 16,
                      activation = "tanh",
                      name = "hidden4")

# Output - sigmoid for probabilities
predictions <- hidden %>% 
               layer_dense(units = 1, 
                           activation = "sigmoid",
                           name = "predictions")

  
# Bring model together - predictions is a chain of embedding -> lstm -> hidden -> output
model <- keras_model(input, predictions)

# Set embedding layer to be wiki news weights
# Freeze the embedding weights initially to prevent updates back propagating
get_layer(model, name = "embedding") %>% 
  set_weights(list(wiki_embedding_mx)) %>% 
  freeze_weights()



# Compile
model %>% compile(
  optimizer = optimizer_adam(),
  loss = "binary_crossentropy",
  metrics = "binary_accuracy"
)


#print(model)

Step 4: Train

Now, to train the model, we feed in the training inputs and outputs, set the batch size, supply the validation data (to see how it performs on data it isn’t being optimized on), set the number of epochs (times to cycle through all of the training data), tell it to shuffle the order in which it sees the data each epoch, and tell it to keep logs for us to evaluate.

NOTE - this will use a LOT of system resources, so assume your computer may freeze (or outright crash) while the code below runs.

#tensorboard("logs/md_log")
checkpoint_path <- "serialized/cp.ckpt"

# Checkpoint callback - to take the top validating model
cp_callback <- callback_model_checkpoint(
  filepath = checkpoint_path,
  save_weights_only = TRUE,
  save_best_only = TRUE,
  verbose = 1
)

history <- 
  model %>% 
  fit(x_train,
      y_train,
      batch_size      = 640,
      validation_data = list(x_test, y_test),
      epochs          = 42,
      shuffle         = TRUE,
      view_metrics    = TRUE,
      callbacks       = list(callback_tensorboard("logs/md_log"), cp_callback),
      verbose         = 1)

# Look at training results
print(history)

# Save model and training history
keras::save_model_hdf5(model, "serialized/md_model.hdf5")
saveRDS(history, file="serialized/md_history.RDS")
## 
## Final epoch (plot to see history):
##                loss: 0.1397
##     binary_accuracy: 0.9565
##            val_loss: 0.1867
## val_binary_accuracy: 0.9451

Step 5: Evaluate

To evaluate the network, look for evidence of convergence and test the model’s performance on validation data (data the model was not trained on). This shows whether the algorithm is able to arrive at a solution, and whether it’s reasonable to assume that solution will hold up on new data. Note that if you are building a model for production, you will typically have two hold-out datasets (one for model selection, one for the final test).

For convergence, we see the neural network learning rapidly, reaching ~80% accuracy after the first training epoch (an epoch being one full cycle through all 28,200 elements in the training data), followed by gradual convergence past 90% accuracy by the end of training. Performance is slightly better on the training data than the validation data (some overfit), which is to be expected; the gap is small, and validation accuracy was also converging (not declining), so I can conclude it’s “good enough” for what I need it to do.
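If you want to eyeball the convergence yourself, the training history object can be plotted directly (keras provides a plot method for its history objects):

# Loss and accuracy per epoch, for both training and validation data
plot(history)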

To formally assess the neural network’s performance on the hold-out validation data, we create what is called a confusion matrix: a table of real versus predicted values. From that we can read off the accuracy, kappa, specificity, and sensitivity, and estimate how the AI will perform on similar data.

y_pred <- predict(model, x_test)
y_pred <- factor(round(y_pred))
y_real <- factor(round(y_test))

tf_cmx <- confusionMatrix(y_pred, y_real)
print(tf_cmx)


Metric Score
Overall
Accuracy 95%
Kappa 89%
By Class
Sensitivity 94%
Specificity 96%
Pos Pred Value 95%
Neg Pred Value 94%
Precision 95%
Recall 94%
F1 94%
Prevalence 50%
Detection Rate 47%
Detection Prevalence 49%
Balanced Accuracy 95%




Azure Cognitive Services

As an example, let’s compare the unoptimized demo model we created against the might of Microsoft’s Azure Cognitive Services Text Analytics AI. If my hypothesis holds water, the unpolished, unoptimized demo sentiment-scoring model should perform in the same league as a premium, general-purpose AI sentiment-scoring service such as Azure.

The code below requires a private Azure API key, which was obtained with a 30-day free trial of ACS that included a 200.00 USD credit. Bear in mind that ACS pricing can run to over 5,000 USD per month (for sentiment scoring alone), so be careful not to run the code willy-nilly; it could cost you a pretty penny!

# REMEMBER THAT THIS CAN COST MONEY! CURRENTLY HAVE 200 IN FREE AZURE CREDITS FOR A TRIAL
# AFTER 30 DAYS, THIS WILL COST ~20 DOLLARS TO RUN.

require(httr)
require(jsonlite)

azure_file_name <- paste0("serialized/azure_scores-", length(y_test) ,"-rows.csv")

# Stored in .Rprofile -- you'll need one of your own. 
api_key      <- Sys.getenv("AZURE_TEXT_ANALYTICS_KEY")
api_endpoint <- "https://text-analytics-aitmi-payg.cognitiveservices.azure.com/"
api_url_lan  <- "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/languages"
api_url_sen  <- "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment"
# text to pass into the API - the hold-out validation data (for an apples-to-apples comparison)
text <- dt[-train_idx, .(text, type, target)]
text$language <- "en"
text$id <- (1:nrow(text))
setkey(text, id)


if(file.exists(azure_file_name)){
  azure_scores <- fread(azure_file_name)
} else{
  # As simple df
  text_data <- list(documents = text[, .(id, language, text)])
  object.size(jsonlite::toJSON(text_data)) %>% format("MB")
  # ~2 MB, but the max request payload is 1 MB -- need to REDUCE
  
  # Need to make multiple small requests
  # max 1000 records...
  text_data_list <- list()
  idx_max        <- length(y_test)
  set_max        <- 999
  sets_needed    <- ceiling(idx_max/set_max)
  
  # make as many sub dataframes as needed (13)
  idx_current    <- 1
  for(i in 1:sets_needed){
    this_limit <- min((i*set_max), idx_max)
    text_data_list[[i]] <- list(documents = text[idx_current:this_limit, .(id, language, text)])
    idx_current <- this_limit+1
  }

  azure_function <- function(x){
        out  <- httr::POST(url    = api_url_sen, 
                           config = httr::add_headers(`Ocp-Apim-Subscription-Key`=api_key),
                           body   = jsonlite::toJSON(x)) %>%
                httr::content( as="text")%>%
                jsonlite::fromJSON() %>%
                extract2("documents") %>%
                data.table() %>%
                .[, id := as.integer(id)]
        return(out)
  }
  
  # Be 100% sure you want to run this -- a single line of code could cost $$$
  # (flip FALSE to TRUE when you actually mean to hit the paid API;
  #  the rbindlist() below assumes `responses` exists)
  if(FALSE){
    responses    <- lapply(text_data_list, azure_function)
  }

  azure_scores <- rbindlist(responses) %>%
                  .[text, on="id", nomatch=0L]
  #Save to avoid making repeat API calls ($$)
  write.csv(azure_scores, azure_file_name, row.names=FALSE)

}

Parsing the returned data from Azure Cognitive Services sentiment scoring yields results like so:

Text Pro/Con Target Azure Prediction
great developer of people, managers allowed discretion pro 1 0.9551809
work life balance, respect, trust, development opportunities pro 1 0.2162145
opportunity to learn and flexibility pro 1 0.5000000
innovation, creativity, learning s and integrity pro 1 0.5000000
brand name, product quality, innovation pro 1 0.5000000
nice people. easy hours. good pay. pro 1 0.9964200
good discounts on 3m products! pro 1 0.9900347
great place to do an internship great culture and environment pro 1 0.9914644
good company, stable pay, and good amount of autonomy pro 1 0.9630058
good job,on time salary, proper reimbursement ,proper variable pay pro 1 0.9763930
benefits and pay were nice pro 1 0.9009468
many options to develop internally nearly all jobs expect a regional or global collaboration pro 1 0.5000000
innovative, collaborative, global, and ethical company pro 1 0.9774423
great pay and chance for advancement pro 1 0.8708185
great pay and challenging work. most people there are dedicated, intelligent, hardworking, and wonderful human beings. i enjoyed that most of all: working with a team of people that are great at their jobs and work together under somewhat difficult circumstances. you are constantly encouraged to suggest and implement better ways of doing things. generous paid time off programs too. pro 1 0.9989006
great innovative and quality products, good compensation package pro 1 0.9795235
company culture diversity and inclusion smart decisions employees appreciation code of conduct career growth opportunities innovative thinking company pro 1 0.9826248
flexible work arrangements–many people take wfh days or essentially work remotely. match is 5% on 401k and company gives an additional 3% so it’s 8% on 401k. health is good, get $600 (for individual; family is higher) toward hsa if you take high deductible plan. the work is exceedingly simple and there is minor stress given the lack of challenge. if you work with your """""""“manager”""""""" (many are really not managers at all……), you can ensure a good work balance and never work more than 40 hours per week. i know people who get their work done in under 40 every single week. if you want to settle for low salary but great benefits and essentially a mindless job that is impossible to get fired from, then sign up. maternity and paternity leave are good and so many people have a few kids then leave once they don’t need/can’t use that benefit anymore. pro 1 0.0000003
it was a good environment pro 1 0.9587986
great company to work for pro 1 0.9655693
good health care benefits, few nice people, a lot of money due to overtime on the machine. earned double sometimes when am available to work. pro 1 0.7587888
collaborative culture, good products, and highly ethical pro 1 0.9339762
good people all around, supporting leaders, very friendly cullege pro 1 0.9902489
not a stressful environment to work in pro 1 0.3005069
really good pay with long-term incentive compensation at relatively low management levels. benefits decent, but getting more expensive every year like everywhere. work life balance great. pro 1 0.0222268
friendly peers and clean work environment pro 1 0.8804867
benefits and pay. overall great company pro 1 0.9037578
15% culture high ethics diversity & inclusion pro 1 0.5000000
good career growth for the employees pro 1 0.9337417
great culture , freedom to try new approaches pro 1 0.9943991
being in the 3m system, you can take your career anywhere you want to go. there are so many options within the company. working in red wing, mn makes commuting very easy as you do not have to deal with the city traffic. the products manufactured (fall protection equipment) is very meaningful and gives you a great """""""“why”""""""" to the job. this site is a great representation of minnesota nice as everyone is kind and… friendly. pro 1 0.9690015
solid and profitable company with excellent core strengths and corporate culture. pro 1 0.8930109
there is nothing positive to say pro 1 0.8239341
excellent pay, benefits, 401k, vacation to start pro 1 0.8770834
ethical and highly process driven pro 1 0.9155611
nice environment, friendly people, nice infrastructure pro 1 0.9847615
good benefits. health and dental. life insurance. pro 1 0.9841331
the compensation package for salespeople is pretty solid compared to other companies. benefits are good as well. pro 1 0.9263697
stable company, strong brand, great people pro 1 0.9897828
excellent benefits, opportunity to move around within company, encourage professional and personal growth. pro 1 0.9849688
people contact, exposure, work management in difficult conditions. pro 1 0.0731521
great brand reputation, talented professionals, well respected products and services, global focus, employee oriented pro 1 0.9917057
steady work for 30 years, the pay was good for the area. pro 1 0.7508763
decent pay for the area pro 1 0.1849081
broad based company allows for multiple career types pro 1 0.5000000
little bit of work, no on site supervisor, little responsibility, very easy work. pro 1 0.5000000
flexible working hours and conducive work environment pro 1 0.7758822
met a few nice people along the way. pro 1 0.9407599
it was a paycheck, barely pro 1 0.1459821
great place to work for. pro 1 0.9655693
healthy work/life balance, innovative, appreciates new ideas, is willing to try new approaches. pro 1 0.9972401
flexible schedule, work from home options, ot available pro 1 0.7578286
pay benefits work place culture tuition reimbursement multiple positions within company which allow for job changes or career shifts pro 1 0.5000000
great products. focussed on new development pro 1 0.9893661
hours, flexibility, co-workers, work from home pro 1 0.1623306
good wages and benefits, reduced stock purchase price. matching 401k to 4% pro 1 0.9564674
the benefits and pay are great pro 1 0.8659391
opportunity for growth, excellent team members, and many networking groups within the company. pro 1 0.9618406
easy to build a career here pro 1 0.9389312
the best part is all sales representative earn incentives on secondary data basis pro 1 0.9437200
friendly colleagues & nice office environment pro 1 0.9087094
good team members good benefits work life balance good support team working environment harmony pro 1 0.9577067
scale, innovation, differentiation, people culture pro 1 0.5000000
work-life balance. good benefits. pays premium for external talent. pro 1 0.9756182
well established company, progressive thinking, beautiful office pro 1 0.9652684
relevancy across multiple market segments, career growth opportunities, and a focus on people and culture pro 1 0.5000000
work balance with your life pro 1 0.2430444
innovative, choose your own path, international opportunity, very ethical and values-driven, open to new ideas, highly collaborative, people want to do the right thing, diverse industries and geographies pro 1 0.9962343
3m does a great job of recognizing value in all levels of employment. they do a great job in allowing for work-life balance and in cases where appropriate, work-life balance. pro 1 0.2376331
great 401k match & stock purchase plan pro 1 0.9481255
3m is still a really solid company, good benefits, work environment, with all the divisions there are opportunities to move around. pro 1 0.2060166
overall, they do pay well with good benefits. the overall base pay is strong, they match 401(k) at varying percentages based on level, offer an employee stock purchase program with a 15% discount and have the standard yearly pay increases pro 1 0.9174085
benefits, security and life balance pro 1 0.5000000
good structure, develops their people, pro 1 0.9275093
great work place and room for advancement. 3m also has a great wage for the work performed. 3m’s managerial hierarchy helps individual develop personally and professionally. pro 1 0.9975671
friendly workers, generous benefit and work-life balance pro 1 0.8811601
a work place that respect work life balance pro 1 0.5000000
stability culture bright coworkers respected appreciated salary benefits promote from within career change opportunities flex time (salary, not production) red tape (yes, it can be positive… it keeps people in check and is the reason we have such great stability in the market and in our processes) pro 1 0.9989728
innovative, inclusive, sustainability focus, great history and new products pro 1 0.9819574
culture, people, products are great. pro 1 0.9844176
ethical, stable, collaborative, diverse, growing pro 1 0.9619355
challenging great salary & benefits good experience pro 1 0.8934698
good company to work for with professional staff members. pro 1 0.9728546
best place to work for pro 1 0.9655693
there are some good folk with leadership qualities. shame about the bullies. pro 1 0.0262452
to be in contact with customers and colleagues all around the world makes it be very interesting to work here. 3m is very social and meets the staff according to certain criteria. pro 1 0.8101511
flexible, allows you to work at your own pace. pro 1 0.7981701
i worked in 3m for 5 years. every day i learned something new, it’s a great company with innovative approach to business. corporate culture is exceptional. pro 1 0.9994848
great pay and excellent benefits pro 1 0.9300742
dedicated to inclusivity. ample opportunities to change to different jobs and different businesses. will invest in costly projects and new ways of thinking. pro 1 0.9994803
great benefits, good pay, modern office environment, they either pay for metro or garage parking, very established pro 1 0.8181764
competitive salary, pension, benefits. shift premium rate on 12 hour shifts. pro 1 0.9182831
extreme amounts of flexibility. i talk to my manager once every other week and am able to completely make my schedule by myself, meaning i can have 5 home office days where i answer emails in my pjs, or i can be out in front of 10 different customers every day. you get to determine how to spend your time base on how you think it will grow your territory. pro 1 0.9755369
known in the market and a large international company pro 1 0.5000000
client database management and analysis, data cleaning using mainly ms excel macros; occasionally contacting business clients pro 1 0.5000000
paid time off and good safety conditions pro 1 0.1507154
team, initiatives, culture, people, environment pro 1 0.5000000
good pay and benefits. lots of opportunity to find various career paths. pro 1 0.9365184
the people & the benefits are fantastic. pro 1 0.9151832
benefits are awesome, the pay is pretty great. pro 1 0.9804236
Note:
First 100 results only

From this data, I’ll round the Azure scores to a 0/1 classification and place the now-binary predictions into a confusion matrix using the same data we used to validate the neural network (minus one row that was not returned by Azure).

The results:

Azure Cognitive Services had significantly lower accuracy than my DIY model.

azure_pred <- factor(round(azure_scores$score))
azure_real <- factor(azure_scores$target) # index by ids - in case any weren't returned from Azure

azure_cmx <- confusionMatrix(azure_pred, azure_real, positive = "1")
Metric Score
Overall
Accuracy 79%
Kappa 59%
By Class
Sensitivity 82%
Specificity 77%
Pos Pred Value 78%
Neg Pred Value 81%
Precision 78%
Recall 82%
F1 80%
Prevalence 50%
Detection Rate 41%
Detection Prevalence 53%
Balanced Accuracy 79%


Traditional NLP

Let’s also test against traditional, dictionary-based sentiment analysis. The specific package we will use is a popular open-source offering called sentimentr, which in addition to advanced dictionary scoring also handles valence shifters. It really is a good package, especially given that it isn’t trying to be AI and can be used by pretty much anyone on pretty much any computer! I pick sentimentr because it’s good, not to pick on it. The scores sentimentr returns are centered on zero (roughly -1 to +1), so they are converted to 0/1 by treating anything above zero as positive and passed into a confusion matrix using the validation data set.

The result:

Traditional NLP had significantly lower accuracy than the AI based solutions.

require(sentimentr)
require(kableExtra) # needed for the kbl()/kable_paper() table output below

# peek at our text data:
text <- dt[-train_idx, .(text, type, target)]
text[, rid := 1:.N]
#head(text)

nlp_file_name <- paste0("serialized/nlp_scores-", length(y_test) ,"-rows.csv")

if(!file.exists(nlp_file_name)){
  # This takes a while. Can make it parallel with things like pbapply hacks...
  nlp_scores <- data.table(sentiment_by(get_sentences(text$text), by = text$rid), key = "rid")
  # join sentiment to text
  nlp_scores <- text[nlp_scores, on="rid"]
  setnames(nlp_scores, old = "ave_sentiment", new = "score")
  write.csv(nlp_scores, nlp_file_name, row.names = FALSE)
} else {
  nlp_scores <- fread(nlp_file_name)
}

nlp_scores[i = sample.int(nrow(nlp_scores), 100), 
           j = .(text, type, target, as.integer(score>0))] %>%
  kbl(col.names = c("Text", "Pro/Con", "Target", "NLP Prediction")) %>%
  footnote("First 100 results only") %>%
  kable_paper() %>%
  scroll_box(width = "100%", height = "200px")
Text Pro/Con Target NLP Prediction
solving customer’s problems from the counter, and in the parking lot. pro 1 1
everything else, esp the management or so called leaders con 0 1
employee friendly, no micro management pro 1 1
shifts can change, pressure to sell, high call volume con 0 0
very less hike, amdocs related technologies,if u want to work on algorithms please don’t come here. con 0 0
pretty laid back place to work pro 1 1
  1. work-life balance is great: they really appreciate family and know how important it is 2. company has a clear conciese vision and goals set forth 3. even though we are growing there is little growing pains and there are always plans laid out to move forward with the growth 4. middle and upper management are both easy to get along with and talk to on a continuing basis
pro 1 1
no cons here at ansys con 0 1
more communication needed from co-workers. con 0 0
no career advancement opportunity. only get holiday off. you still have to come in christmas eve. con 0 0
great benefits time off and pay pro 1 1
unrealistic sales goals in many cases. managers given free reign to do what ever they think is necessary and often treat employees bad. i have seen several run off or quit over the years. work load can be excessive. lots of duplication. large territories. only thing that matters to sales management and senior management is meeting wall street expectations. lots of brown nosing and politics. con 0 0
abbott labs offers great growth planning and career opportunities for those willing to relocate and sacrifice their work life balance. they offer great benefits with stock options, 401 k matching and year end profit sharing. pro 1 1
lot of travel with lot of work. pro 1 1
i really enjoyed my time working at autozone, there were some fantastic people that i worked with, not only at my store, but surrounding stores as well. store management was fantastic, we had to buckle down and meet numbers, however we always had a fun time doing so. working with the customers is definitely the best part about this job. being able to install somebody’s battery and get them on their way is a really… great feeling. pro 1 1
good place to work, employee friendly company. pro 1 1
good food, great culture a great place to work and have fun pro 1 1
in general i have no cons about this position and company con 0 1
good first job out of college. professional. good training. steady work. consistent and reliable. good job security pro 1 1
micro-management, not enough resources for career mapping and moving into a more fitting career within the company, work-life balance, can’t go to the restroom for 2 minutes without getting in trouble for being off the phone. con 0 1
brand access to corporate guidance and protocol pro 1 1
3m northridge facility is smaller than their other locations with limited growth and very limited learning opportunity. terrible management. employees perform redundant jobs everyday with no motivation or encouragement. con 0 0
good knowledge transfer sessions are rare con 0 1
amdocs is employee friendly company. good work culture. pro 1 1
stable company with outstanding brand recognition. pro 1 1
can get to do some real interesting development. pro 1 1
i work in the mobile financial services division - fantastic team, interesting product and challenges, plenty of opportunities to contribute ideas and drive changes. working with some of the biggest telcos and banks on a mission to bring financial inclusion for millions of people around the world. excellent local management. opportunities to travel to work with remote teams and customers. pro 1 1
good hours good pay. opportunities to learn many different processes pro 1 1
flexible time and friendly environment. pro 1 1
great company culture. very flexible with schedules. allowed me to work and go to college without being too stressful. pro 1 1
  • great benefits, 5 weeks paid vacation, flexible scheduling with opportunities to make up time missed instead of spending pto, 401k. all benefits start day one, including pro-rated pto. - great work/life balance. allstate cares about its employees and does a lot of different things to make coming to work less stressful. - they ease you into the job with 3 months of paid training and 60 days of on-the-floor… mentorship from senior reps. if you’re new to insurance this is absolutely essential.
pro 1 1
amdocs does not treat their employees well. they have instituted a ticketing system for their it maintenance personnel which is demeaning at best. con 0 0
have to work really hard for career growth which is not always a bad thing con 0 1
short staffed. bad scheduling. it’s retail. con 0 0
delivering news of premium increases. being that middle man when the news to give us not good. when we can’t give the best rates. con 0 0
organizational structure is not optimized to get things done quickly or empower lower level employees; phd’s are highly respected sometimes over others, change is slow con 0 1
work environment is great. get to learn new stuff almost. on site opportunity. pro 1 1
great co workers learn automotive skills pro 1 1
many re-organizations very conservative with ideas slow to change (implementations) no intentional succession planning con 0 0
salary is bellow then avarage in industry. complicated career growth con 0 1
corporate management is terrible at making decisions. this directly impacts sales. con 0 0
variety of work duties from day to day keeps job from becoming boring. pro 1 0
culture and salary employee care, not much politics pro 1 1
employees are great and fun to work with, it makes the work day go by much faster and more enjoyable pro 1 1
very team oriented, nice people, pto, pay, benefits are meh pro 1 1
it’s common to have to work long extra hours. con 0 1
3m columbia mo: pay for the same job grade is less than the other in state 3m plants and team leaders are one job grade less. there is no profit sharing for hourly. no raises for service. team leaders are taking on a supervisor role without the pay or benefit. heavy work loads. more work than staff. excessive overtime. management disconnected from production reality and needs. con 0 1
slow growth of company despite being a leader. con 0 1
advancement within the company, working many hours con 0 1
very dysfunctional company. if they had not grown so large while they were a part of sears, this model now would never have lasted. direction from the top is a disaster, no clue or integrity. con 0 0
free breakfast ac bus service wfh option 35/- flat lunch for all food counters no variable component in your salary annual gift once in a year (amazon voucher) bonus every year which is not part of salary 9 hrs a day good exposure to emerging technologies 20 casual leaves, 10 sick leaves and 10 festival leaves per year pro 1 1
pay, work/life balance, hours, work environment pro 1 1
some nice person to work with pro 1 1
takes good care of employees. organizes lots of events and parties for employees. pro 1 1
constant pressure and communication gaps exist con 0 0
great pto, great benefits, good work/life balance pro 1 1
outstanding benefits and with a solid pension plan for long term growth. employee resource business groups, online learning, long term career path. company is production based so you have to work hard but the rewards are worth it. community service hours can be given back to your organization in a form of a cash donation by allstate. they have big heart on community service. most teams have yearly outings. … internal promotions are good. access to allstate benefits services and financial planning products are outstanding. legal plan benefits, outstanding supplemental health coverage benefits. allstate is for someone who wants a 20 to 30 year career growth. strong branding and leadership keeps you posted on company growth each quarter. pro 1 1
its a good company to move up and get into managment. pro 1 1
-culture, environment and morale is very low: no team presence, competitive among other tms, limited management presence, no investment made in employees -benefits have been better at other smaller companies -territory goals are unattainable for larger territories. so much so, corporate lowered them end of q1, but continues to offer ascs bulk buy price without compensating territory for difference or factoring in… bulk buy year over year in goals -field expectations are work number one and family second -mandatory conference calls last minute and held after 5:00 -quarterly weekend meetings -quarterly weekend inventory audits -expect zero work/life balance con 0 1
almost always very demanding and stressful, along with repetitive and mind-numbing. as long as you know your limits and don’t let yourself get burnt out - because you will catch up to all the work you have to do - you’ll do great. con 0 0
very bad place to work with their new office of future, it is extremely difficult to get in, but one day out of nowhere they lay you off, no loyalties to the employee. you are just a number. no ethics, nothing positive about this company except pretty campus. con 0 1
i done work on .net. friendly nature. working capacity. interested in new technology. pro 1 1
very tight dead lines for project delivery con 0 0
people don’t use their brains much, so the answer to every problem is to just work harder! con 0 0
inside focused not looking at the customer con 0 0
office culture, decent work-life balance, commision plus salary, flexible hours, company values, everybody needs insurance so your customer can be anybody in prospecting pro 1 1
unusual scheduling senior management involvement con 0 0
good work culture many opportunities pro 1 1
fleixible work schedule, name brand product. pro 1 1
bureaucratic, outsourced finance systems make getting anything done that has to do with payments or using external vendors nearly impossible. things that should take a few minutes take hours due to complicated, user-unfriendly systems. con 0 0
many departments may be too complicated in terms of processes. con 0 0
as per my exp. with aspl:- good teams available work life balance good chances to learn varied technologies as all dev team are available on single floor pro 1 1
weekly schedule, had a bad store manager, corporate rules that dont budge, con 0 0
very top heavy, with too many-mid level managers who are a hinderance for advancement con 0 0
professional development and challenging work environment pro 1 1
not everyone in the program saw the same progression in their career and some of it does involve luck in the rotations that you’re able to get. however, i do truly believe that some are able to make the best out of their situation and show they’re capable of more - and it’s going to come down to your initiative, attitude, and ability to rapidly grow as a leader. con 0 1
long commute bad managers at times workload can be very heavy at times con 0 0
no communication bad management treat employees like kids con 0 1
you will make lots of friends with other employees. easy to move up. pro 1 1
no work life balance. terrible senior management con 0 0
challenging work with good work/life balance. pro 1 1
working for an independent agent means no benefits and no residuals. con 0 1
where do i begin? let’s start with i agree with many reviews current and former employees have made in regards to this company. - upper management is so disconnected from the daily business in the stores, they have no clue how to run a store or manage a store manager. the district managers are often hard to find, as they only appear in your store’s office when they feel something needs to be improved so that… their numbers impress the corporate office. - autozoners (red shirts) are the backbone of the company in regards to tedious tasks and up-selling. there is no incentive for a red shirt to excel, as there is little if any recognition, and any hard work is only to the benefit of the store manager. - store managers are more concerned with receiving their bonuses than anything else. coc and wittdjr play far too large a role in employee reviews. sell $400 in parts, it doesn’t matter, you didn’t tack on that extra few dollars in brake lubricant. - customer service is stated as a company priority, when in reality the bottom line of az is to push products on to customers that they do not need. - the statement """""""“no employee is too big for the smallest job”""""""“, which has been repeated to me at nausea, is completely false. good luck finding a store manager, or psm for that matter, willing to clean the bathrooms or mop the floors. - psm’s are glorified red shirts. they receive nothing but ft hours and a password. there is no formal training, and they are often inexperienced and lack the knowledge one would expect from someone in”""""""“management.”""""""" they also outnumber red shirts 2/1, meaning there is usually 2-3 members of management on shift to 1 red shirt. management is not a title to be thrown around so loosely. exceptional knowledge and work do not qualify one for a promotion, as time put in with the company seems the only way to move up in the ranks. - if your dm is for any reason dissatisfied with you (on the one or two occasions they have ever seen you), do not expect to ever receive promotions. they make the final call on people they have met maybe twice and most likely never seen work. con 0 1
pay, is not competitive but fair. con 0 0
high volume, limited growth, office politics con 0 0
friendly environment. everyone supports each other. generous budgets for equipment. pro 1 1
lack of communication from management con 0 0
  • salary offer is competitive - flexi-time/hours - work-life balance
pro 1 1
nothing i can think of con 0 0
sometimes management seemed short sited. that stifled creativity new products. con 0 1
great flexibility, boss is very willing to allow for special circumstances. we work as a team in our office. the work atmosphere is relaxed, but professional. we each know our personal goals and our office goals. we keep each other posted on where we are at and how we are doing. there is a competitive edge that is not threatening, but inspiring. you are always learning from those around you and there is always… something new to learn to do at our office. the coorporate goals are communicated and enforced appropriately. most importantly, vacation is provided as well as medical/health stipends. there is never a """""""“war”""""""" within the office. no catty remarks, no intentional problem making, great to work with people who understand you and where you need to be in acheiving your goals. you will be rewarded when you work hard and challenge yourself in the office. you will find yourself enjoying the customers as the people they are, not just as dollar signs. you will be able to network with vast majorities of different people and cultures. it’s wonderful to help people understand their insurance needs and provide the coverage for them. pro 1 1
good food ,good benefits,gym ,food pro 1 1
excellent training, great and helpful staff, well defined company values, competitive salary, safe and clean work environment. pro 1 1
dysfunctional, appear to care more about the bottom-line than the quality of life for their employees. very disappointing. health insurance and pharmacy benefits are terrible. stagnate wages. con 0 0
help you learn about cars, great benefits pro 1 1
good benefits, flex work hours, the people who work there are very nice and welcoming pro 1 1
detailed training provided at no cost. pro 1 0
the requirements for the position wasn’t the right ones. customer service people did more than they have to for the position. con 0 0
unless you work at headquarters in miinn you are treated like a second class citizen. lots of age discrimination at 3m con 0 0
to many chiefs not enough indians con 0 1
Note:
First 100 results only
# positive scores become 1; zero or negative scores become 0

nlp_scores[, score := as.integer(score > 0)]
nlp_pred <- factor(nlp_scores$score)
nlp_real <- factor(nlp_scores$target)

nlp_cmx <- confusionMatrix(nlp_pred, nlp_real, positive = "1")
Metric Score
Overall
Accuracy 75%
Kappa 49%
By Class
Sensitivity 94%
Specificity 55%
Pos Pred Value 68%
Neg Pred Value 90%
Precision 68%
Recall 94%
F1 79%
Prevalence 50%
Detection Rate 47%
Detection Prevalence 69%
Balanced Accuracy 75%




Conclusion

Overall, our DIY model, despite being only a quick and untuned demonstration, was able to beat the general-purpose behemoth that is Azure Cognitive Services, as well as handily beating traditional sentiment analysis.



“That seemed too easy!” you say. I am inclined to agree. A specialized model will typically have an easy time competing against a generalist model on any given task. Think of it as comparing pliers to a socket wrench when changing a flat tire. Sure, the pliers will be useful in a wider range of applications, but right now you’re broken down on I-95 and need to change a tire ASAP; the versatility of the pliers doesn’t matter. In the same way, the neural network I made here does better than Azure on Glassdoor reviews because it’s only trying to be good at Glassdoor reviews. If I were to apply it to, say, Amazon reviews, it’d be like taking our socket wrench and trying to pull out a staple.

So, should you (or your boss) outsource ML and AI? For our specific use case, outsourcing to a pliers type of AI such as Azure is certainly a bad idea: worse performance and a premium price tag! If you have the data and a data scientist, you should absolutely use them before paying more money (in the long run) for a worse solution that your organization doesn’t even own. Now, if you don’t have a labeled training set like we did here then, sure, you can’t train your own AI without data! In that case, using either traditional techniques (like sentimentr) or paying to outsource to something like Azure makes sense (at least temporarily).

Remember that using a third-party vendor has significant up-front and yearly recurring costs; a “cheap” service could end up costing close to $500,000 over a 10-year period, assuming it works. And to know that it really works, you would still need to perform a validation test by assessing the service against a large body of manually scored data, which is most of the work involved in constructing your own model in-house! Then, should a third-party service either go bust or prove unreliable, the costs are sunk. Heck, even the simple act of attempting an in-house solution gives your data science team valuable experience, professional development, and insight. Your internal talent may not give you a sleek sales pitch or promise definitionally impossible levels of accuracy, but in the long run there’s no substitute for internal talent.