NLP: Feature Extraction

Text feature extraction
  • API

    • feature_extraction.text.CountVectorizer
    • feature_extraction.text.HashingVectorizer
    • feature_extraction.text.TfidfTransformer
    • feature_extraction.text.TfidfVectorizer

1.Bag of Words

  • Tools provided by scikit-learn

    • tokenizing
    • counting
    • normalizing
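
The three steps above can be sketched in plain Python, without scikit-learn, to show what each one does. This is only an illustration under assumptions: the helper names (tokenize, count, normalize) and the two sample sentences are hypothetical, the token pattern mirrors CountVectorizer's default of two or more word characters, and L2 scaling is just one possible normalization.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def count(tokens, vocabulary):
    # Count how often each vocabulary term occurs in one document.
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def normalize(vector):
    # Scale the count vector to unit L2 length (left unchanged if all zeros).
    norm = sum(v * v for v in vector) ** 0.5
    return [v / norm for v in vector] if norm else vector


# Hypothetical sample documents.
docs = ["The cat sat on the mat", "The dog chased the cat"]

tokens_per_doc = [tokenize(d) for d in docs]
vocabulary = sorted({t for tokens in tokens_per_doc for t in tokens})
count_vectors = [count(tokens, vocabulary) for tokens in tokens_per_doc]
normalized_vectors = [normalize(v) for v in count_vectors]
```

Each document becomes a fixed-length vector indexed by the shared vocabulary; word order is discarded, which is where the "bag" in Bag of Words comes from.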

2.Sparsity
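
A small sketch of why sparsity matters: most documents use only a small part of the corpus vocabulary, so most entries of the document-term matrix are zero. The three-document corpus below is a made-up example, chosen so that no two documents share a word.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpus in which every document uses different words.
corpus = ["apple banana", "cherry date", "elderberry fig"]

X = CountVectorizer().fit_transform(corpus)

is_sparse = issparse(X)           # fit_transform returns a SciPy sparse matrix
n_docs, n_terms = X.shape         # rows are documents, columns are terms
density = X.nnz / (n_docs * n_terms)  # fraction of non-zero entries
```

Here only 6 of the 18 cells are non-zero. On a realistic corpus with tens of thousands of terms the density is typically far lower, which is why a sparse representation is used instead of a dense array.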


3.Common Vectorizer usage
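
A minimal usage sketch for CountVectorizer with its default settings. The corpus is a hypothetical example. The feature names are read from the fitted vocabulary_ mapping so the snippet does not depend on a particular scikit-learn version.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample corpus.
corpus = [
    "This is the first document.",
    "This is the second second document.",
    "And the third one.",
    "Is this the first document?",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)  # learn the vocabulary and count terms

# vocabulary_ maps each term to its column index; sort terms by that index.
feature_names = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# Words not seen during fit are silently ignored by transform.
unseen = vectorizer.transform(["Something completely new."]).toarray()
```

With the defaults, text is lowercased and single-character tokens are dropped. A document made up entirely of unseen words maps to an all-zero row.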

4.TF-IDF term weighting
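
A short sketch of how the TF-IDF classes listed under API relate to each other, using a hypothetical corpus. With default settings, TfidfVectorizer should give the same result as CountVectorizer followed by TfidfTransformer. The defaults assumed here are smooth_idf=True, so idf(t) = ln((1 + n) / (1 + df(t))) + 1, and norm="l2", so each row is scaled to unit length.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical sample corpus.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

# Two steps: raw counts first, then TF-IDF reweighting.
counts = CountVectorizer().fit_transform(corpus)
two_step = TfidfTransformer().fit_transform(counts)

# One step: TfidfVectorizer does both at once.
one_step = TfidfVectorizer().fit_transform(corpus)

same = np.allclose(two_step.toarray(), one_step.toarray())
row_norms = np.linalg.norm(one_step.toarray(), axis=1)
```

Terms that appear in many documents, such as "the", receive a lower idf and therefore contribute less than rarer terms with the same count.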

5.Decoding text files
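
A sketch of decoding byte input, assuming the documents arrive as Latin-1 encoded bytes (the sample strings are hypothetical). The vectorizers decode bytes with the encoding parameter, which defaults to "utf-8"; the decode_error parameter ("strict", "ignore", or "replace") controls what happens when bytes do not match that encoding.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical raw documents stored as Latin-1 bytes rather than str.
raw_docs = [
    "café au lait".encode("latin-1"),
    "thé vert".encode("latin-1"),
]

# Tell the vectorizer how to decode the bytes before tokenizing.
vectorizer = CountVectorizer(encoding="latin-1")
X = vectorizer.fit_transform(raw_docs)

terms = sorted(vectorizer.vocabulary_)
```

If the encoding of the files is unknown, it has to be determined or guessed before vectorizing. Decoding with the wrong encoding either raises an error (with decode_error="strict") or produces garbled tokens.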