Title: | An Easy Text and Sentiment Analysis Library |
---|---|
Description: | Implement text and sentiment analysis with 'texter'. Generate sentiment scores on text data and also visualize sentiments. 'texter' allows you to quickly generate insights on your data. It includes support for lexicons such as 'NRC' and 'Bing'. |
Authors: | Simi Kafaru [aut, cre] |
Maintainer: | Simi Kafaru <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.9 |
Built: | 2024-11-09 03:37:23 UTC |
Source: | https://github.com/simmieyungie/texter |
This dataset contains news articles on Brexit.
SimiKafaru [email protected]
This function retrieves the number of times each requested word occurs in a corpus. It returns a data frame containing each word and its corresponding count.
counter(word_vec, words)
word_vec |
The corpus from which word frequencies are extracted |
words |
A vector of words whose frequency counts you want to retrieve |
A data frame of the requested words and their corresponding counts.
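To make the idea of counting selected words concrete, here is a minimal base R sketch. It is not the implementation of `counter`; the helper name `count_words`, the tokenisation rule, and the sample texts are assumptions made for illustration.

```r
# Hypothetical helper, not part of 'texter': count how often each
# requested word appears in a character vector.
count_words <- function(word_vec, words) {
  # Lower-case the text and split on anything that is not a letter or apostrophe
  tokens <- unlist(strsplit(tolower(word_vec), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]
  # Tabulate only the requested words; words that never occur get a count of 0
  counts <- table(factor(tokens, levels = tolower(words)))
  data.frame(word = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Made-up sample texts
corpus <- c("Brexit talks resume", "Talks stall as Brexit deadline nears")
count_words(corpus, c("brexit", "talks", "deal"))
```

Using a factor with fixed levels keeps absent words in the result with a zero count, which may or may not match what `counter` does.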
This dataset contains tweets about Dogecoin collected using the Twitter API.
SimiKafaru [email protected]
The dataset is saved from the textdata package (https://github.com/EmilHvitfeldt/textdata/blob/master/R/lexicon_nrc.R) for easier access.
A tibble with 13,901 rows and 4 variables:
An English word
Indicator for sentiment or emotion: "negative", "positive", "anger", "anticipation", "disgust", "fear", "joy", "sadness", "surprise", or "trust"
http://saifmohammad.com/WebPages/lexicons.html
This function removes punctuation and numbers from your text.
removeNumPunct(x)
x |
The text column from which punctuation and numbers are removed |
a character vector.
{ removeNumPunct("is this your number? 01234") }
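For readers who want to see what this kind of cleaning involves, the following is a minimal base R sketch. The helper `strip_num_punct` is hypothetical and is not the package's own code; it assumes that POSIX character classes are an acceptable definition of "punctuation" and "numbers".

```r
# Hypothetical helper, not part of 'texter'
strip_num_punct <- function(x) {
  # Remove every punctuation character and digit, then trim outer whitespace
  trimws(gsub("[[:punct:][:digit:]]", "", x))
}

strip_num_punct("is this your number? 01234")
```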
This function removes URLs from text. It is designed particularly for tweets.
removeURL(x)
x |
The text value from which URLs are removed |
a character vector.
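As an illustration of URL stripping, here is a small base R sketch. The helper `strip_urls`, its regular expression, and the sample string are assumptions; `removeURL` may use a different pattern.

```r
# Hypothetical helper, not part of 'texter'
strip_urls <- function(x) {
  # Drop anything that starts with http://, https:// or www.
  # and runs up to the next whitespace character
  gsub("(https?://|www\\.)[^[:space:]]+", "", x)
}

# Surrounding whitespace is left untouched by this sketch
strip_urls("Check this out https://example.com/page now")
```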
This function extracts the weight of the emotions conveyed in a tweet.
sentimentAnalyzer(word_vec, details)
word_vec |
This is the corpus you want to extract the sentiments from |
details |
A TRUE/FALSE value. If TRUE, a more detailed distribution of emotions is returned; if FALSE, the result is summarised as positive or negative |
A data frame of each emotion and its corresponding weight in the text.
sentimentAnalyzer(doge$text, details = TRUE)
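The general idea behind lexicon-based scoring can be sketched in base R as follows. This is not how `sentimentAnalyzer` is implemented: the four-word `toy_lexicon`, the helper `score_sentiment`, the sample tweets, and the choice to report each sentiment's share of matched words are all assumptions made for illustration.

```r
# Toy lexicon for illustration only; real lexicons such as NRC are far larger
toy_lexicon <- data.frame(
  word = c("love", "great", "hate", "fear"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of 'texter'
score_sentiment <- function(word_vec, lexicon) {
  tokens <- unlist(strsplit(tolower(word_vec), "[^a-z']+"))
  # Map each token to its sentiment; tokens missing from the lexicon become NA
  hits <- lexicon$sentiment[match(tokens, lexicon$word)]
  hits <- hits[!is.na(hits)]
  counts <- table(hits)
  data.frame(sentiment = names(counts),
             n = as.integer(counts),
             share = as.numeric(counts) / sum(counts),
             stringsAsFactors = FALSE)
}

# Made-up sample tweets
tweets <- c("I love doge, great coin", "I hate the fear around it")
score_sentiment(tweets, toy_lexicon)
```

Swapping in a lexicon with more categories (for example anger, joy, or trust) would give a finer-grained breakdown without changing the function.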
This dataset contains stop_words from the tidytext package. It is saved for easier access.
tidytext
This function gets the top N bigrams from a corpus. It retrieves the most frequent two-word combinations.
top_bigrams(word_vec, remove_these, bigram_size)
word_vec |
The corpus from which bigrams are extracted |
remove_these |
This is a vector of characters you want cleaned out of the text |
bigram_size |
This is the Top N number of rows to be retrieved as an integer value |
a data frame object.
{ top_bigrams(brexit[, c("content")], remove_these = c("rt"), bigram_size = 20) }
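To show what counting bigrams means in practice, here is a minimal base R sketch. The helper `top_bigrams_sketch` and the sample texts are hypothetical, and the sketch skips the `remove_these` cleaning step, so it should not be read as the package's implementation.

```r
# Hypothetical helper, not part of 'texter'
top_bigrams_sketch <- function(word_vec, size = 10) {
  bigrams <- unlist(lapply(tolower(word_vec), function(doc) {
    tokens <- unlist(strsplit(doc, "[^a-z']+"))
    tokens <- tokens[nzchar(tokens)]
    if (length(tokens) < 2) return(character(0))
    # Pair each token with the one that follows it in the same text
    paste(head(tokens, -1), tail(tokens, -1))
  }))
  counts <- sort(table(bigrams), decreasing = TRUE)
  head(data.frame(bigram = names(counts), n = as.integer(counts),
                  stringsAsFactors = FALSE), size)
}

# Made-up sample texts
docs <- c("no deal brexit looms", "mps reject no deal brexit")
top_bigrams_sketch(docs, size = 2)
```

Building bigrams per text, rather than over the concatenated corpus, avoids pairing the last word of one text with the first word of the next.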
This function returns the top 10 positive and negative words expressed in a text. By default, it returns a data frame of words classified as positive or negative based on weights.
top_Sentiments(word_vec, plot)
word_vec |
This is the corpus you want to extract the sentiments from |
plot |
(TRUE/FALSE) If TRUE, a plot is returned that you can customize further. If FALSE, a data frame is returned |
A data frame if plot = FALSE; a ggplot object if plot = TRUE.
top_Sentiments(doge$text, plot = TRUE)
This function gets the top N words from a corpus. It retrieves the most frequently occurring words.
top_words(word_vec, remove_these, size)
word_vec |
The corpus from which the top words are extracted |
remove_these |
This is a vector of characters you want cleaned out of the text |
size |
This is the Top N number of rows to be retrieved as an integer value |
a data frame object.
{ top_words(brexit$content, remove_these = c("news","uk"), size = 10) }
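The frequency ranking described here can be approximated in base R as shown below. The helper `top_words_sketch` and the sample texts are assumptions; unlike a full text-mining pipeline, this sketch does not remove stop words, and `top_words` itself may behave differently.

```r
# Hypothetical helper, not part of 'texter'
top_words_sketch <- function(word_vec, remove_these = character(0), size = 10) {
  tokens <- unlist(strsplit(tolower(word_vec), "[^a-z']+"))
  # Drop empty tokens and any word the caller asked to remove
  tokens <- tokens[nzchar(tokens) & !(tokens %in% tolower(remove_these))]
  counts <- sort(table(tokens), decreasing = TRUE)
  head(data.frame(word = names(counts), n = as.integer(counts),
                  stringsAsFactors = FALSE), size)
}

# Made-up sample texts
docs <- c("UK news: Brexit vote delayed",
          "Brexit vote splits UK parliament",
          "Brexit news")
top_words_sketch(docs, remove_these = c("news", "uk"), size = 2)
# word "brexit" appears 3 times and "vote" 2 times
```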
This function finds the top N words, but only in texts or rows that contain a given keyword. It is particularly useful when you want the most frequent words surrounding a certain keyword.
top_words_Retriever(word_vec, word_ret, remove_these, size)
word_vec |
The corpus from which the top words are extracted |
word_ret |
The keyword to search for |
remove_these |
A vector of characters you want cleaned out of the text |
size |
The number of rows (N) to retrieve, as an integer value |
a data frame object.
{ top_words_Retriever(brexit$content, word_ret = "brexit", remove_these = c("news","uk"), size = 10) }
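The keyword-filtering step can be sketched in base R as follows. The helper `words_near_keyword` and the sample texts are hypothetical, and dropping the keyword itself from the ranking is a choice made for this sketch, not something the package documentation states.

```r
# Hypothetical helper, not part of 'texter'
words_near_keyword <- function(word_vec, word_ret, size = 10) {
  keyword <- tolower(word_ret)
  # Keep only the texts that mention the keyword (case-insensitive, literal match)
  kept <- word_vec[grepl(keyword, tolower(word_vec), fixed = TRUE)]
  tokens <- unlist(strsplit(tolower(kept), "[^a-z']+"))
  # Drop empty tokens and the keyword itself (an assumption of this sketch)
  tokens <- tokens[nzchar(tokens) & tokens != keyword]
  counts <- sort(table(tokens), decreasing = TRUE)
  head(data.frame(word = names(counts), n = as.integer(counts),
                  stringsAsFactors = FALSE), size)
}

# Made-up sample texts; the second one does not mention the keyword
docs <- c("Brexit deal rejected",
          "Markets rally on tech earnings",
          "New Brexit deal proposed")
words_near_keyword(docs, "brexit", size = 1)
```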
This function extracts any tagged handles (mentions) from text.
users(x, ...)
x |
This is the corpus you want to extract the mentions from |
... |
More inputs |
a character vector.
{ users("Come See this @simmie_kafaru") }
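Handle extraction of this kind is usually a regular-expression match. The base R sketch below illustrates the idea; the helper `extract_handles` and its pattern (an "@" followed by letters, digits, or underscores) are assumptions and may differ from what `users` matches.

```r
# Hypothetical helper, not part of 'texter'
extract_handles <- function(x) {
  # Find every "@" followed by one or more letters, digits or underscores
  unlist(regmatches(x, gregexpr("@[A-Za-z0-9_]+", x)))
}

extract_handles("Come See this @simmie_kafaru")
# returns "@simmie_kafaru"
```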