Source:

The dataset below contains fictitious tweets posted shortly after the release of an application called "Myco ExampleApp".

| Date | twitterId | isEmployee | tweet |
|---|---|---|---|
| 11/5/15 | lawrencetlu38141 | FALSE | Just downloaded Myco ExampleApp! Transforming data in 5 mins! |
| 11/5/15 | petramktng024 | TRUE | Try Myco ExampleApp, our new free data wrangling app! See www.example.com. |
| 11/5/15 | joetri221 | TRUE | Proud to announce the release of Myco ExampleApp, the free version of our enterprise product. Check it out at www.example.com. |
| 11/5/15 | datadaemon994 | FALSE | Great start with Myco ExampleApp. Super easy to use, and actually fun. |
| 11/5/15 | 99redballoons99 | FALSE | Liking this new ExampleApp! Good job, guys! |
| 11/5/15 | bigdatadan7182 | FALSE | @support, how can I find example datasets for use with your product? |

There are two areas of analysis:

  • For non-employees, you want to know if they are mentioning the new product by name.
  • For employees, you want to know if they are including cross-references to the web site as part of their tweet.

Transformation:

The following counts the occurrences of the string ExampleApp in the tweet column. Note the use of the ignoreCase parameter to capture capitalization differences:

Wrangle step: countpattern col:tweet on:'ExampleApp' ignoreCase:true

Transformation name: Count matches

  • Column: tweet
  • Option: Text or pattern
  • Text or pattern to count: 'ExampleApp'
  • Ignore case: true
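
As a rough illustration of what this step computes, the following Python sketch counts case-insensitive occurrences of a literal string. It is not the product's implementation of countpattern; the function name count_pattern and its parameters are hypothetical, and the sketch assumes the quoted on: value is matched as literal text.

```python
import re

def count_pattern(text: str, pattern: str, ignore_case: bool = False) -> int:
    """Count non-overlapping occurrences of a literal string in text."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape treats the pattern as literal text, so '.' matches only a dot.
    return len(re.findall(re.escape(pattern), text, flags))

tweet = "Liking this new ExampleApp! Good job, guys!"
print(count_pattern(tweet, "exampleapp", ignore_case=True))
print(count_pattern(tweet, "exampleapp", ignore_case=False))
```

With ignore_case=True the lowercase search term still finds the mixed-case product name; without it, the same search finds nothing.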

For non-employees, you want to track if they have mentioned the product in their tweet:

Wrangle step: derive type:single value:if(isEmployee=='FALSE' && countpattern_tweet == 1, true, false) as:'nonEmployeeExampleAppMentions'

Transformation name: New formula

  • Formula type: Single row formula
  • Formula: if(isEmployee=='FALSE' && countpattern_tweet == 1, true, false)
  • New column name: 'nonEmployeeExampleAppMentions'
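
The formula's logic can be sketched in Python as follows. This is only an illustration under assumptions: the function name non_employee_mention is hypothetical, and the sketch assumes isEmployee holds the text values 'TRUE' and 'FALSE' as shown in the source table.

```python
def non_employee_mention(is_employee: str, product_count: int) -> bool:
    # isEmployee appears as the text 'TRUE' or 'FALSE' in the sample data,
    # so it is compared as a string rather than as a boolean.
    # if(condition, true, false) reduces to the condition itself.
    return is_employee == "FALSE" and product_count == 1

print(non_employee_mention("FALSE", 1))  # non-employee, one mention
print(non_employee_mention("TRUE", 1))   # employee, one mention
print(non_employee_mention("FALSE", 0))  # non-employee, no mention
```

Because the test is for a count of exactly one, a tweet that names the product twice would not be flagged. That makes no difference for this dataset, where no tweet contains more than one mention.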

The following counts the occurrences of example.com in the tweet column. The next formula references the resulting count column as countpattern_tweet1:

Wrangle step: countpattern col:tweet on:'example.com' ignoreCase:true

Transformation name: Count matches

  • Column: tweet
  • Option: Text or pattern
  • Text or pattern to count: 'example.com'
  • Ignore case: true

For employees, you want to track if they included the above cross-reference in their tweets:

Wrangle step: derive type:single value:if(isEmployee=='TRUE' && countpattern_tweet1 == 1, true, false) as:'employeeWebsiteCrossRefs'

Transformation name: New formula

  • Formula type: Single row formula
  • Formula: if(isEmployee=='TRUE' && countpattern_tweet1 == 1, true, false)
  • New column name: 'employeeWebsiteCrossRefs'

Results:

After you delete the two count columns (countpattern_tweet and countpattern_tweet1), you end up with the following:

| Date | twitterId | isEmployee | tweet | employeeWebsiteCrossRefs | nonEmployeeExampleAppMentions |
|---|---|---|---|---|---|
| 11/5/15 | lawrencetlu38141 | FALSE | Just downloaded Myco ExampleApp! Transforming data in 5 mins! | false | true |
| 11/5/15 | petramktng024 | TRUE | Try Myco ExampleApp, our new free data wrangling app! See www.example.com. | true | false |
| 11/5/15 | joetri221 | TRUE | Proud to announce the release of Myco ExampleApp, the free version of our enterprise product. Check it out at www.example.com. | true | false |
| 11/5/15 | datadaemon994 | FALSE | Great start with Myco ExampleApp. Super easy to use, and actually fun. | false | true |
| 11/5/15 | 99redballoons99 | FALSE | Liking this new ExampleApp! Good job, guys! | false | true |
| 11/5/15 | bigdatadan7182 | FALSE | @support, how can I find example datasets for use with your product? | false | false |
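
To cross-check the results outside the application, the whole example can be approximated in a few lines of Python. This is a sketch under assumptions, not the product's behavior: the helper names count_literal and flag_row are hypothetical, the search strings are treated as literal text, and isEmployee is compared as the text values shown in the source table.

```python
import re

# (twitterId, isEmployee, tweet) copied from the source dataset.
ROWS = [
    ("lawrencetlu38141", "FALSE", "Just downloaded Myco ExampleApp! Transforming data in 5 mins!"),
    ("petramktng024", "TRUE", "Try Myco ExampleApp, our new free data wrangling app! See www.example.com."),
    ("joetri221", "TRUE", "Proud to announce the release of Myco ExampleApp, the free version of our enterprise product. Check it out at www.example.com."),
    ("datadaemon994", "FALSE", "Great start with Myco ExampleApp. Super easy to use, and actually fun."),
    ("99redballoons99", "FALSE", "Liking this new ExampleApp! Good job, guys!"),
    ("bigdatadan7182", "FALSE", "@support, how can I find example datasets for use with your product?"),
]

def count_literal(text: str, needle: str) -> int:
    """Case-insensitive count of a literal string, mirroring ignoreCase:true."""
    return len(re.findall(re.escape(needle), text, re.IGNORECASE))

def flag_row(is_employee: str, tweet: str) -> tuple[bool, bool]:
    """Return (employeeWebsiteCrossRefs, nonEmployeeExampleAppMentions)."""
    app_count = count_literal(tweet, "ExampleApp")
    site_count = count_literal(tweet, "example.com")
    employee_cross_ref = is_employee == "TRUE" and site_count == 1
    non_employee_mention = is_employee == "FALSE" and app_count == 1
    return employee_cross_ref, non_employee_mention

for twitter_id, is_employee, tweet in ROWS:
    print(twitter_id, *flag_row(is_employee, tweet))
```

The intermediate counts live only in local variables here, which corresponds to deleting the two count columns before reviewing the results.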