Python - How to make a feature vector from lists

June 15, 2015

I'm new to Python. I have training data in bag-of-words form. Each line of the training data is an article. The labels for the training data are in a separate file, and each label corresponds to one article in the training data. I did stemming on the training data and removed stop words. The output is a list of words for each article (line). I want to extract a feature vector from these lists and use it in a kNN classifier in Python, but I don't know how to do it. I'd appreciate a quick answer. Here's the code for the things I did:

    import nltk
    from nltk.corpus import stopwords
    from nltk import stem

    stemmer = stem.PorterStemmer()

    with open('data.txt') as file:
        while True:
            # Each line of the file is one article.
            line = file.readline().split()
            filtered_words = [w for w in line if w not in stopwords.words('english')]
            documents = [stemmer.stem(word) for word in filtered_words]
            print(documents)
            # An empty token list means end of file (or a blank line).
            if not line:
                break

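Take a look at scikit-learn's CountVectorizer or TfidfVectorizer. These can take a list of documents (these are the lists of tokens, in your example) as input, and return a feature matrix. Note that by default both vectorizers expect each document to be a string, so lists that are already tokenized need to be joined back into strings or passed through a custom analyzer. The basic call looks like this: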

    from sklearn.feature_extraction.text import CountVectorizer

    count_vect = CountVectorizer()
    X_train_counts = count_vect.fit_transform(your_list_of_documents)
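To make "feature matrix" concrete, here is a hand-rolled sketch of roughly what a count-based vectorizer computes: one column per distinct token, one row per document, each cell holding a count. The documents and the helper name to_count_vector are made up for illustration; this is not scikit-learn's implementation.

```python
# Made-up, already stemmed token lists standing in for the articles.
documents = [
    ["cat", "sat", "mat"],
    ["dog", "chase", "cat", "cat"],
]

# Vocabulary: every distinct token, sorted, mapped to a column index.
all_tokens = sorted({token for doc in documents for token in doc})
vocabulary = {token: idx for idx, token in enumerate(all_tokens)}


def to_count_vector(tokens, vocabulary):
    """Return one row of the feature matrix: a count per vocabulary column."""
    vector = [0] * len(vocabulary)
    for token in tokens:
        if token in vocabulary:  # tokens not in the vocabulary are ignored
            vector[vocabulary[token]] += 1
    return vector


feature_matrix = [to_count_vector(doc, vocabulary) for doc in documents]

print(vocabulary)      # {'cat': 0, 'chase': 1, 'dog': 2, 'mat': 3, 'sat': 4}
print(feature_matrix)  # [[1, 0, 0, 1, 1], [2, 1, 1, 0, 0]]
```

Each row is the feature vector for one article, and the rows line up with the labels in the label file.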

You can find more information in the Working With Text Data tutorial.
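For the kNN part of the question, a minimal end-to-end sketch might look like the following. It assumes the articles are already lists of stemmed tokens, so it passes a callable analyzer that returns each list unchanged instead of letting the vectorizer tokenize strings. The token lists, the labels, and n_neighbors=1 are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Made-up stand-ins for the stemmed, stop-word-free token lists and their labels.
train_tokens = [
    ["stock", "market", "fall"],
    ["market", "trade", "stock"],
    ["team", "win", "match"],
    ["player", "score", "match"],
]
train_labels = ["finance", "finance", "sport", "sport"]

# The documents are already tokenized, so hand them through unchanged:
# a callable analyzer replaces the built-in preprocessing and tokenization.
vectorizer = TfidfVectorizer(analyzer=lambda tokens: tokens)
X_train = vectorizer.fit_transform(train_tokens)  # sparse matrix, one row per article

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, train_labels)

# New articles must be transformed with the same fitted vectorizer.
X_new = vectorizer.transform([["stock", "trade"], ["team", "score"]])
predictions = knn.predict(X_new)
print(predictions)
```

Swapping TfidfVectorizer for CountVectorizer gives raw counts instead of TF-IDF weights; which one works better for kNN depends on the data, so it is worth trying both.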
