
Comment: Published by Scroll Versions from space DEV and version r0811

...

Before               After
socks, socks, socks  ["socks", "socks", "socks"]
pants, pants         ["pants", "pants"]
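The Before/After pairs above show a delimited string turning into an array. As a rough illustration only, the same conversion can be sketched in Python, assuming the values are separated by commas with optional surrounding spaces. `split_to_array` is a hypothetical helper, not part of the product.

```python
import json


def split_to_array(value: str, delimiter: str = ",") -> list[str]:
    """Split a delimited string into a list, trimming spaces around each item.

    Assumption: items are comma-separated with optional spaces, as in the
    Before column above. Empty items are dropped.
    """
    return [item.strip() for item in value.split(delimiter) if item.strip()]


if __name__ == "__main__":
    for before in ["socks, socks, socks", "pants, pants"]:
        # json.dumps renders the list in the same style as the After column
        print(json.dumps(split_to_array(before)))
```

Running the script prints `["socks", "socks", "socks"]` and then `["pants", "pants"]`, matching the After column.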

...

Extract hashtags

...


EXAMPLE - Extract Values


...


...


...

Source:

The following dataset contains customer tweets from different locations.

...

Excited to announce that we’ve transitioned Wrangler from a hybrid desktop application to a completely cloud-based service! #dataprep #businessintelligence #CommitToCleanData # London

...

Learnt more about the importance of identifying issues in your data—early and often #CommitToCleanData #predictivetransformations #realbusinessintelligence

...

Clean data is the foundation of your analysis. Learn more about what we consider the five tenets of sound #dataprep, starting with #1a prioritizing and setting targets.  #startwiththeuser #realbusinessintelligence #Paris

...

Learn how #NewYorklife onboarded as part of their #bigdata  #dataprep initiative to unlock hidden insights and make them accessible across departments.

...

How can you quickly determine the number of times a user ID appears in your data?#dataprep #pivot #aggregation#machinelearning initiatives #SFO

Transformation:

The following transformation extracts the hashtags from each customer tweet into an array in a new column.

Transformation Name: Extract matches into Array
Parameter: Column: customer_tweets
Parameter: Pattern matching elements in the list: `{hashtag}`
Parameter: New column name: Hashtag tweets

Results:

...

["#dataprep", "#businessintelligence", "#CommitToCleanData", " # London"]

...

["#CommitToCleanData", "#predictivetransformations", "#realbusinessintelligence", "0"]

...

["#dataprep", "#startwiththeuser", "#realbusinessintelligence", "# Paris"]

...

["#NewYorklife", "dataprep", "bigdata", "0"]
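To make the extraction step concrete outside the product, here is a minimal Python sketch. It assumes that `{hashtag}` behaves roughly like the regular expression `#\w+` (a `#` followed by letters, digits, or underscores); that regex is a stand-in chosen for illustration, not the product's actual pattern definition, and it will not reproduce every result shown above (for example, it does not match "# London" with a space after the `#`). `extract_hashtags` and `hashtag_tweets` are hypothetical names.

```python
import json
import re

# Assumed stand-in for the {hashtag} pattern: '#' followed by one or more
# word characters. The product's own pattern may handle edge cases differently.
HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(text: str) -> list[str]:
    """Return every hashtag match in text, in order of appearance."""
    return HASHTAG_RE.findall(text)


# Two tweets copied from the source dataset above.
tweets = [
    "Learnt more about the importance of identifying issues in your data—early and often "
    "#CommitToCleanData #predictivetransformations #realbusinessintelligence",
    "How can you quickly determine the number of times a user ID appears in your data?"
    "#dataprep #pivot #aggregation#machinelearning initiatives #SFO",
]

# Build the new column (called "Hashtag tweets" in the step above) row by row.
hashtag_tweets = [extract_hashtags(t) for t in tweets]

if __name__ == "__main__":
    for row in hashtag_tweets:
        print(json.dumps(row))
```

Because `\w` does not match `#`, adjacent tags such as `#aggregation#machinelearning` come back as two separate matches.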

...
