    Word Embedding

Word embedding is a general term for creating a vector representation of a word's meaning. When you work with term-document matrices that contain word counts, frequencies, or tf-idf values, you are already building something similar to a word embedding model. However, "word embedding" usually refers to slightly more sophisticated models that do not rely on representations in which each element of a vector directly corresponds to a document.

This means that word embeddings often capture information about the close-range context of a word rather than relying simply on co-occurrence within a whole document. Word embedding models are, by and large, machine learning algorithms that aim to give you meaningful representations of word semantics; Word2Vec, discussed below, is among the most successful of these models.

• Suppose you write a program that downloads tweets as people post them on Twitter and saves them to a file. After running it for a month or two, you have collected a massive file, over 5 gigabytes of tweets even after compression. It is just raw text that people typed on the internet. If you feed all of this data directly into a word embedding algorithm, it is able to figure out a huge number of relationships between words purely from what people happened to type.

• So, for example, if you put in a color, it will return a bunch of other colors.
• The algorithm never actually knew the idea of "color" going in, and it was never told that these words are related; you can also put in a kind of food and get other kinds of food back. You can try this yourself to see how well it really learned the relationships between words; a short query sketch follows below.
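Below is a minimal sketch of this kind of query, assuming the gensim library and its downloadable "glove-twitter-25" vectors (word vectors trained on tweets); any pretrained word vectors would work the same way.

    # Query a pretrained word-embedding model for nearby words.
    import gensim.downloader as api

    vectors = api.load("glove-twitter-25")        # downloads the vectors on first use

    print(vectors.most_similar("red", topn=5))    # other colors come back
    print(vectors.most_similar("pizza", topn=5))  # other foods come back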

• You are probably wondering how this actually works, because it is kind of baffling.
• Every word embedding algorithm uses the idea of context.

For example, take a sentence with a word missing:

"I painted the bench ____."

You would expect the blank to be filled with some color, but that is not always true: you could also say "I painted the bench today", and 'today' is not a color. The main takeaway is that context is closely related to meaning. That was an example where multiple different words can go into the same context, so you can presume that those words are somehow related, at least a lot of them. But there is another way context can help: two words may tend to appear in the same context at the same time.
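As a rough sketch of what "context" means here, the helper below (a hypothetical function, not from any library) collects the words that appear within a small window around each word; notice that 'red' and 'today' end up sharing the same neighbours.

    # Collect (centre word, context word) pairs with a small sliding window.
    def context_pairs(tokens, window=2):
        pairs = []
        for i, centre in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((centre, tokens[j]))
        return pairs

    for sentence in ["i painted the bench red", "i painted the bench today"]:
        print(context_pairs(sentence.split()))
    # 'red' and 'today' both appear next to 'bench', i.e. in the same context.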

So, take the example of three different sentences; they help you understand the idea.

  1. First sentence: Donald Trump is the United States president.

Here, 'Donald' and 'Trump' are likely to appear together because one is the first name and the other is the last name of the same person, so those words are closely related. You also have 'United States', which is really one logical word broken up into smaller words, so 'United' and 'States' are likely to appear together.

– And then there are the second and third sentences:

  2. Second sentence: I laughed at his joke.
  3. Third sentence: His joke didn't make me laugh.

– In sentences (2) and (3), 'joke' and 'laugh' are clearly related words.

– You laugh at a joke, so the two words are also likely to appear in the same context. There is one subtle thing worth pointing out in this example: 'laughed' and 'laugh' are technically different words; 'laugh' is present tense and 'laughed' is past tense.

– The same goes for 'joke' and 'jokes': one is singular and one is plural. These are different forms of the same word, and ideally a word embedding has to learn that different forms of the same word are related, for example that 'laughed' is somehow related to 'laugh'.

The examples above give you an idea of how the model might be able to do that: 'laughed' and 'laugh' each appear near the word 'joke', which tells the model that both are related to 'joke' and therefore to each other. That is where a word embedding gets its knowledge from: it learns through context, by seeing which words occur near other words. But what is a word embedding, actually?
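A rough way to see this mechanically: count which words appear near each word in the three example sentences, and 'laughed' and 'laugh' both end up with 'joke' as a neighbour. This is only an illustrative sketch, not what a real embedding algorithm literally does.

    # Build a simple word -> neighbouring-words table from the example sentences.
    from collections import defaultdict

    sentences = [
        "donald trump is the united states president".split(),
        "i laughed at his joke".split(),
        "his joke didn't make me laugh".split(),
    ]

    neighbours = defaultdict(set)
    window = 4                        # wide window, since the sentences are short
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    neighbours[word].add(tokens[j])

    print("joke" in neighbours["laughed"])  # True
    print("joke" in neighbours["laugh"])    # True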

– In one sentence: a word embedding just converts words into vectors. You might feed in a word like 'hamburger' and get out a list of, say, 64 numbers, and those numbers describe the word.

– For the word embedding to be good, the vectors should carry some meaning: if you put 'hamburger' and 'cheeseburger' into the model, you want those two vectors to be very close to each other because they are related words, whereas something like 'Ferrari', a kind of car, is totally unrelated to 'hamburger'.

– You want the vector for 'Ferrari' to be far away from the vector for 'hamburger', and all of these distances should be related: the closeness of the vectors should resemble the closeness of the words they represent. Beyond this idea of closeness, you might also want the vectors to have even more structure.
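A quick way to check this notion of closeness, again assuming gensim's downloadable "glove-twitter-25" vectors and that all three words are in their vocabulary:

    # Compare cosine similarities: related words should score higher.
    import gensim.downloader as api

    vectors = api.load("glove-twitter-25")
    print(vectors.similarity("hamburger", "cheeseburger"))  # expected: relatively high
    print(vectors.similarity("hamburger", "ferrari"))       # expected: noticeably lower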

At a high level, to get an idea of how Word2Vec works:

[Image: the Word2Vec neural network]

– You essentially feed in a word; in the middle, the network produces a small vector, which is the word embedding; and as output it produces something like the word's context.
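A minimal sketch of that shape of model in TensorFlow/Keras is shown below. The vocabulary size and embedding size are arbitrary choices for illustration; this is the general skip-gram style architecture, not the exact network from any particular implementation.

    # Centre word in -> 64-number embedding in the middle -> context word out.
    import tensorflow as tf

    vocab_size = 10_000      # assumed vocabulary size
    embedding_dim = 64       # size of the word-embedding vector

    embedding_layer = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(1,)),                               # index of the centre word
        embedding_layer,                                          # the word embeddings live here
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(vocab_size, activation="softmax"),  # "which words appear nearby?"
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # After training on (centre word index, context word index) pairs,
    # each row of this matrix is the 64-number embedding of one word.
    embeddings = embedding_layer.get_weights()[0]   # shape: (vocab_size, 64)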

     
